Job Description:
Senior Site Reliability Engineer

Need this ASAP. Local candidates only; no out-of-state candidates.

Length of service: 1 year
Number of resources: 1
City: Columbus
Start date of resource: ASAP
Interview process: Can be Skype/phone

Essential Accountabilities:
- Partner with engineering, security and product teams to improve the availability, scalability and efficiency of our products
- Design, develop, deploy, monitor and support large-scale production systems and event-driven services hosted within AWS
- Lead technology initiatives that drive scalability and reliability improvements
- Build tools and automation that eliminate repetitive tasks, minimize downtime and achieve human-free operations
- Participate in 24x7 operations support and on-call rotation

Skills and Qualifications:
- Intermediate proficiency in one of the following programming languages: Java, Go or C#
- Strong experience in Linux systems administration
- You can design, build and support highly available production systems in AWS using technologies like ELB, EC2, ECS, RDS, ElastiCache, CloudFront, S3, IAM, Route 53 and DynamoDB
- Experienced using automation technologies like Terraform, Puppet and Packer
- Proficient with APM, infrastructure and log aggregation tooling to monitor system health and customer experience (e.g., New Relic, CloudWatch, Datadog, Sumo Logic, ELK)
- You have participated in a 24x7 on-call rotation with your team and responded to incidents
- A proven track record of diagnosing and fixing time-sensitive and critical production issues
- Experience with Jenkins and/or CircleCI for continuous integration: build, package, release and deploy
- Git, GitHub and GitFlow skills
- Experience operating common middleware (e.g., Apache, NGINX, Tomcat, JBoss)
- Solid understanding of DNS, DHCP, SSH, HTTP, TCP/IP and other common network protocols

Description:
- Hands-on design, analysis, development and troubleshooting of highly distributed, large-scale production systems and event-driven services spanning on-prem and AWS-based hosting
- Ownership of reliability, uptime, system security, cost, operations, capacity and performance analysis
- Share a 24x7 on-call rotation with your team and respond to incidents; lead triage bridges during incidents and provide needed status updates
- Create and maintain monitoring, alerting and dashboarding solutions that improve visibility into our applications' performance and business metrics and keep operational workload in check
- Use automation technologies to ensure repeatability, eliminate toil, and reduce time to action and time to repair services
- Participate in technical training events and game day scenarios
- Partner with engineering, security, performance, QA and product management teams to improve the availability and quality of service of our products

Required Skills:
- Strong Linux administration/build/management skills
- Development experience in at least one of these languages: Java, Go, C# and/or Python, with strong skills in reading, understanding and writing code in that language
- Demonstrated expertise building and managing highly scaled production infrastructure in on-prem and AWS-based environments
- Extensive experience troubleshooting n-tier architectures with diverse sets of technologies strongly desired (e.g., load balancers, web/app/caching/database servers, queues, threading, memory, CPU, heap, storage, network, OS)
- Strong experience using application and infrastructure monitoring systems (like Splunk, CloudWatch, Datadog, New Relic, Sumo Logic, ELK)
- Excellent presentation and communication skills
- Mastery of infrastructure automation technologies (like Terraform, Puppet, Ansible, Chef)
- Expertise with software development lifecycles based on continuous deployment (e.g., CI/CD)
- Experience with common middleware (e.g., Apache, NGINX, IIS, Tomcat, JBoss)
- Experience with SQL databases (e.g., PostgreSQL, Oracle, MySQL)
- Expertise with SDLC branching, SCM and code deployment systems (git/gitflow, Jenkins, CircleCI, TravisCI, etc.)
- Expertise in container/container-fleet-orchestration technologies (like Docker, Vagrant, Mesosphere)
- BS degree in Computer Science (or related technical field and/or equivalent industry experience)

Big Pluses:
- Database administration skills (AWS Aurora, MySQL, Postgres, Oracle)
- Have leveraged deployment strategies such as blue-green and canary
- Experience building RESTful services and/or web applications
- Experience automating software deployments and following a continuous delivery and deployment model
- Experience with system analysis and troubleshooting in large-scale Linux environments

People who have been successful in this role:
- Are passionate about and adept at software development and/or system engineering
- Love to understand how new technologies and architectures work, educate coworkers and channel their knowledge into improving system reliability and performance
- Are continuously learning about application scalability, availability, reliability and security
- Are intensely curious about how complex distributed systems operate and fail at scale
- Think freely and independently, and are ready to share their views
- Are eager to learn from mistakes and socialize the lessons learned
- Like to take ownership of infrastructure components and lead projects

With Regards,
Sourabh Kumar