Job Description:

Role 1: Site Reliability Engineer

Location – Mountain View, CA (Remote for now)

 

·        Proactively monitor the golden signals of mid- to large-scale service-based applications end to end.

·        Dig into details, anomalies, and unusual patterns, and act on them.

·        Assess and address anomalies in a timely manner, working with stewards and capability owners.

·        Develop the capabilities required to improve operational excellence (such as proactive monitoring, dashboards, and anomaly detection).

·        Improve processes to maintain and improve operational rigor for systems.

 

Role 2: SRE Consultant

Location – Atlanta, GA (Remote initially)

 

Job Description:

 

• Provisioning cloud infrastructure (AWS, GCP) using infrastructure as code (Terraform)

• Creating pipelines and automation to deploy Dockerized applications onto Kubernetes clusters

• Build, release and configuration management of production systems

• Consulting with engineering teams to help them use our platform and tools to run their applications

• Developing, deploying, and maintaining tools built from the ground up in support of self-service, quality, security, and compliance initiatives.

• Collaborating with product, architecture, and engineering groups to build a platform that streamlines application developer productivity and throughput

• Participating in metrics gathering, monitoring, and alerting activities, as well as on-call rotations.

• Solving new problems with modern technologies.

• Automating infrastructure builds/configurations

• Building and managing CI/CD pipelines using Jenkins.

• Defining, implementing, and assigning ownership for stability/reliability (SLIs, SLOs, error budgets)

• Collaborating with tribes/dev teams on reliability development (fixes, logging, delivery metrics)

Key Skillsets:

• 3+ years of experience developing and/or administering software in public cloud.

• Experience in monitoring infrastructure and application uptime and availability to ensure functional and performance objectives are met.

• Experience in languages such as Python, Ruby, Bash, Java, Go, Perl, JavaScript, and/or Node.js

• Demonstrable cross-functional knowledge of systems, storage, networking, security, and databases

• System administration skills, including automation and orchestration of Linux/Windows using Chef, Puppet, Ansible, SaltStack, and/or containers (Docker, Kubernetes, etc.)

• Proficiency with continuous integration and continuous delivery tooling and practices

• Experience managing Infrastructure as code via tools such as Terraform or CloudFormation

• Experience in setting up, managing, and modifying CI/CD pipelines using Jenkins.

• Significant experience in configuring industry-leading infrastructure/application monitoring tools (Stackdriver, Kibana, Grafana, Datadog, Splunk, Dynatrace, AppDynamics, etc.).

Job Requirements: Kubernetes, Python, Bash, Design Review, Document Code Versioning Script

 

Role 3: Site Reliability Engineer

Job Location: Denver, CO

 

Experience: 10+ years

Must-have skills: SRE, DevOps, media domain

 

Experience:

BS/MS in Computer Science or a closely related field, with 5-15 years of hands-on programming experience (coding test mandatory)

 

Job Description:

·        Proficient in SRE and DevOps

·        Secondary Skills: C, C++ (Nice to have)

·        Familiar with scripting languages such as Python and shell, and with Linux OS concepts, multi-threading, filesystems, and makefiles

·        Media know-how: video codecs (H.264, H.265) and audio codecs (MPEG, AAC)

·        Streaming protocols: HLS, DASH

·        Experience working with cloud and microservices: Kubernetes terminology, administration, and monitoring

·        Lambdas, EC2 instances, batch jobs

The desire and ability to learn and adapt are required for this position.

 

             
