Job Description:
Automation-first mindset with proficiency in languages and data formats such as Ruby, Python, or JSON
Engage in and improve the entire lifecycle of services, from inception and design through deployment, operation, and refinement
Support and implement techniques such as chaos engineering and slowing snail
Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning, and launch reviews
Maintain services once they are live by measuring and monitoring availability, latency, and overall system health
Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity
Practice sustainable incident response and blameless postmortems; fix support escalation issues
Proficiency in configuration management tools such as Chef and Ansible to efficiently manage the fleet of servers
Load balancing of the application, including proxies and CDNs
Proficiency in monitoring and metrics tools such as Prometheus and Grafana, and their integrations with Slack, PagerDuty, etc.
Contribute to and leverage reusable artefacts from repositories such as GitLab
Understanding of SRE principles and their implementation in practice