Job Description:

SRE (Azure, AWS, GCP, Docker, Kubernetes-AKS, EKS, GKE)
Tampa, FL 33607
12 Months

Does this role require travel (Yes/No): No
Is this a 100% remote role? No
Minimum years of experience required: 10+

Responsibilities:
Demonstrates extensive abilities and/or a proven record of success in the following areas:
• Providing SRE support for multiple distributed software applications (client-facing – internal & external);
• Managing and continually improving platform infrastructure and applications for high reliability, resiliency, performance, and quality, and for faster time-to-market, while taking a holistic view of system health into account;
• Gathering and analyzing metrics from both systems and applications for performance tuning and fault finding;
• Partnering with development teams to improve services through rigorous testing and release procedures that meet security, compliance, and performance requirements;
• Participating in systems design, platform management, and capacity planning, and ensuring that platforms are designed with "operability" in mind;
• Pursuing the discovery of system faults throughout the application lifecycle – before & after release;
• Defining, implementing, and being accountable for Velocity & Reliability (SLIs, SLOs, Error Budgets);
• Creating and supporting sustainable systems and services through automation (aimed at eliminating problems, not automation for its own sake) and uplifts for infrastructure, testing, failover solutions, failure mitigation, etc.;
• Writing, updating, and using documentation, including runbooks/playbooks; and
• Using Chaos Engineering to test the robustness of the systems and applications.

Qualifications
• 5+ years of professional experience with various flavors of Linux and/or Windows
• 5+ years of experience supporting and troubleshooting full stack applications (monolithic and microservices), infrastructure, and legacy applications (root cause analysis by identifying, analyzing, and remediating service performance and availability issues to ensure maximum service uptime and availability)
• 5+ years of experience with cloud computing technology and its concepts (Azure, AWS, GCP)
• 3+ years of experience balancing service reliability, metrics, sustainability, technical debt, and operational toil for live services running at scale
• 3+ years of experience with container technologies and orchestration (Docker, Kubernetes-AKS, EKS, GKE)
• 3+ years of experience implementing DevOps practices at scale

Demonstrates extensive abilities and/or a proven record of success in the following areas:
• Experience in one or more of the following: Go, Python, Ruby, Java, Perl, Shell, or PowerShell;
• Experience with the CI/CD toolchain: Git, Jenkins, Azure DevOps, Veracode, SonarQube, JFrog Artifactory;
• Experience with IaC using Terraform, ARM templates, and/or AWS CloudFormation templates;
• Experience with configuration management tools like Ansible, Puppet, and/or Chef;
• Experience with DBaaS/managed cloud database technologies such as CosmosDB, DynamoDB, managed SQL (RDS, SQL Database), and in-memory caches (Cache for Redis, ElastiCache);
• Experience with application performance monitoring tools (AppDynamics, Azure Application Insights, Dynatrace, or Datadog) and log management tools (Azure Monitor Log Analytics, Elastic Stack, and/or Splunk), including defining, creating, and configuring metrics for dashboards and alerts;
• Experience with distributed storage technologies like Azure (Blob, Files, Tables), S3, NFS, HDFS;
• Experience with web server technologies: HTTP, Nginx, Apache, Tomcat;
• Experience with Kafka, Azure Event Hubs, or similar message queue technologies;
• Experience with service mesh platforms such as Istio and HashiCorp Consul;
• Experience with secrets lifecycle management (Azure Key Vault, HashiCorp Vault);
• Experience with minimal or near-zero downtime deployments such as blue-green, canary, rolling upgrades, etc.;
• Experience defining and implementing HA, DR, and rollback strategies together with the product and build teams;
• Proficiency in networking concepts (HTTP/S, TCP/IP, DNS, virtual networks (VNet, VPC), subnets, routing, firewalls, network security, triaging packet loss, etc.) and knowledge of RESTful APIs.

             
