Role 1 - Site Reliability Engineer
Location – Mountain View, CA (Remote for now)
· Proactive end-to-end monitoring of golden signals for mid- to large-scale services-based applications.
· Ability to dig into the details and identify anomalies or unusual patterns to act on.
· Assess and address anomalies in a timely manner, working with stewards and capability owners.
· Develop the capabilities required to improve operational excellence (such as proactive monitoring, dashboards, and anomaly detection).
· Refine processes to maintain and improve operational rigor for systems.
Role 2 - SRE Consultant
Location – Atlanta, GA (Remote initially)
Job Description:
• Provisioning cloud infrastructure (AWS, GCP) using infrastructure as code (Terraform)
• Creating pipelines and automation to deploy Dockerized applications onto Kubernetes clusters
• Build, release and configuration management of production systems
• Consulting with engineering teams to help them leverage our platform and tools to run their applications
• Developing, deploying, and maintaining tools built from the ground up in support of self-service, quality, security, and compliance initiatives.
• Collaborating with product, architecture, and engineering groups to build a platform that streamlines application developer productivity and throughput
• Participating in metrics gathering, monitoring, and alerting activities, as well as on-call rotations.
• Solving new problems with modern technologies.
• Automating infrastructure builds/configurations
• Build and manage CI/CD pipelines using Jenkins.
• Define, implement, and assign ownership for stability/reliability (SLIs, SLOs, error budgets)
• Collaboration with tribes/dev teams on reliability development (fixes, logging, delivery metrics)
Key Skillsets:
• 3+ years of experience developing and/or administering software in public cloud.
• Experience in monitoring infrastructure and application uptime and availability to ensure functional and performance objectives are met.
• Experience in languages such as Python, Ruby, Bash, Java, Go, Perl, JavaScript and/or Node.js
• Demonstrable cross-functional knowledge with systems, storage, networking, security and databases
• System administration skills, including automation and orchestration of Linux/Windows using Chef, Puppet, Ansible, SaltStack and/or containers (Docker, Kubernetes, etc.)
• Proficiency with continuous integration and continuous delivery tooling and practices
• Experience managing Infrastructure as code via tools such as Terraform or CloudFormation
• Experience in setting up and managing/modifying CI/CD pipelines using Jenkins.
• Significant experience in configuring industry-leading infrastructure/application monitoring tools (Stackdriver, Kibana, Grafana, Datadog, Splunk, Dynatrace, AppDynamics, etc.).
Job Requirements: Kubernetes, Python, Bash, Design Review, Documentation, Code Versioning, Scripting
Role 3 - Site Reliability Engineer
Job Location: Denver, CO
Experience: 10+ years
Must-have skills: SRE, DevOps, Media Domain
Experience:
BS/MS in Computer Science or a closely related field, with 5-15 years of hands-on programming experience (coding test mandatory)
Job Description:
· Proficient in SRE and DevOps
· Secondary Skills: C, C++ (Nice to have)
· Familiar with scripting languages such as Python and shell, and with Linux OS concepts, multi-threading, filesystems, and makefiles
· Media know-how: video codecs (H.264, H.265) and audio codecs (MPEG, AAC)
· Streaming protocols: HLS, DASH
· Experience working with cloud and microservices: Kubernetes terminology, administration, and monitoring; Lambdas, EC2 instances, and Batch jobs
The desire and ability to learn and adapt is required for this position.