Job Description:

Job Title: Sr. Site Reliability Engineer
Location: Newark, CA
Duration: FULL-TIME with End Client (Electric Car)

We are looking for a Senior Site Reliability Engineer – Connectivity & Cyber Security who enjoys thinking big and is looking to make their mark on an incredibly fast-growing company. In this role you will maintain, monitor, and recover a highly scalable infrastructure for connectivity and cybersecurity using AWS Cloud and Kubernetes. If managing large, secure, fast infrastructure,

The Role

Maintain, enhance, and monitor a highly scalable infrastructure for a data processing platform using Kubernetes
Use AWS Cloud and open-source services to address critical business needs
Ensure the 24/7 availability of the system, with proper alerting and monitoring
Identify and fix bugs and performance issues in the platform
Work with agile teams on setting error budgets, root cause analysis exercises, and blameless post-mortems
Utilize continuous delivery (CI/CD) with GitLab CI, Jenkins, ArgoCD, Artifactory, and Docker
Monitor data pipelines and applications, and handle failure recovery
Set up and monitor application access and connectivity
Advocate for a DevOps culture of automation, self-service, and engineering best practices to enable development teams
Handle autoscaling and performance monitoring for Kubernetes and running applications using Prometheus and Grafana or similar tools
Perform all SRE activities, such as availability and reliability monitoring and reporting
Tune, monitor, and configure tools such as Kafka, Spark, Presto, Airflow, and MQTT
Use infrastructure as code with Terraform
Operate and maintain code repositories with GitLab

Qualifications

Bachelor’s degree in Computer Science or a related field, or equivalent work experience
5+ years of experience in DevOps engineering or software development
Strong coding and scripting experience with Bash, Python, Go, or similar languages
Comprehensive experience with AWS, including a solid understanding of CI/CD, Amazon S3, EC2, IAM, CloudFormation, and Route 53
Experience optimizing storage classes, lifecycle rules, instance classes, and throughput to reduce cost without sacrificing performance
Experience with user access, authentication, user permission management, and security (LDAP, AD, OIDC, Kerberos)
Experience with AWS Direct Connect or setting up and maintaining a hybrid cloud
Experience with secure infrastructure networking on AWS using different types of load balancers, and with setting up and working with VPCs, subnets, and routing tables
Experience with containerization and scheduling using Docker and Kubernetes
Strong distributed systems implementation experience
Experience with autoscaling, performance testing, and capacity planning
Experience with tools such as Jenkins and Artifactory to build automation, CI/CD, and self-service pipelines
Experience with configuration management tools: Puppet, Chef, Kustomize, or Ansible
Experience owning infrastructure in production, as well as designing and creating build/deploy & monitoring systems using CloudFormation/Terraform
Experience with RESTful services, the pub/sub communication model, service-oriented architecture, distributed systems, cloud systems (AWS), and the microservices architecture pattern

             
