Job Description:
Site Reliability Engineer
San Francisco, CA or Charlotte, NC
2 years
Phone and Skype

All candidates must have:
Excellent verbal communication skills and the ability to clearly articulate their skills and experience.
Manager references from recent jobs so that I can confirm their skills and past job performance (specifically the managers they directly reported to who were employees of the company they worked for).
An established LinkedIn profile.

Summary:
Work with local API development squads, platform teams, product owners, scrum masters, and architects. The SRE ensures that both our internally critical and our externally visible systems have reliability and uptime appropriate to users' needs while keeping an ever-watchful eye on capacity and performance. Work toward spending less time on operational work and tickets and more time on improving site performance, availability, and capacity.

Responsibilities:
Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement.
Support APIs before they go live through system design review, developing software platforms and frameworks, capacity planning, and performance reviews.
Maintain APIs once they are live by measuring and monitoring availability, latency, and overall system health.
Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
Troubleshoot and mitigate the thorniest problems in our most mission-critical systems.
Advise the team during postmortems on effectively avoiding repeated incidents.

Required Skills:
5+ years of experience designing, analyzing, and troubleshooting large-scale distributed systems.
Experience in one or more of: C, C++, Java, Perl, Python, Go, or scripting experience in Shell and Perl.
Experience working with Unix/Linux systems from kernel to shell and beyond, including system libraries, file systems, and client-server protocols.
Networking experience that includes network theory (e.g. TCP/IP, UDP, ICMP), MAC addresses, IP packets, DNS, OSI layers, and load balancing.
In-depth knowledge of operating systems (processes, threads, concurrency issues, locks, mutexes, semaphores, monitors, and how they work).
Familiarity with algorithms, data structures, and complexity analysis.
Systematic problem-solving approach, coupled with a strong sense of ownership and drive.
Experience with Puppet is a plus.
Experience with Apigee is a plus.

Thanks and Regards,
Shivangi Singh | Team Lead | KPG99, INC
Certified Minority Business Enterprise (MBE)