Job Description:
Develops SRE capabilities to meet SLI/SLO/SLA requirements.
Drives engagements with Development and Business teams to define key business and system metrics.
Drives reliability activities utilizing SRE/DevOps concepts.  
Reduces Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR), increases system availability, and reduces overall incidents.
Implements observability using tools such as Azure Monitor and Dynatrace.
Creates error budgets for each component, builds availability dashboards, and sets up fast-burn and slow-burn alerts.
Develops software components, using .NET and Microsoft Azure, that will be consumed by SREs.
Background in architecture-level work.
Develops dashboards, alerts, and monitoring for various systems using Azure dashboards, workbooks, and Power BI.
Designs, codes, tests, and implements automation for manual tasks using C# .NET, PowerShell, and Azure CLI (or Power Automate), following a Git process.
Develops self-healing capabilities for key business processes using Azure Automation runbooks.
Exports telemetry from transactional systems to a data lake using Azure Data Factory and Azure Data Explorer.
Performs chaos engineering by artificially injecting faults into systems to simulate SLO breaches.
Coordinates structured walkthroughs and technical reviews to ensure reliability, resiliency, and scalability.
Ensures overall quality through continuous monitoring throughout the development cycle.
Identifies performance bottlenecks in an architecture and proposes solutions.
Mentors and coaches other members of the team (SREs and Support).
Coordinates feasibility studies and proofs of concept to evaluate solutions.
Works with ITS Security and Infrastructure teams to ensure cloud-based systems and programs are secure.
Works within the ITIL framework and SAFe.     
Actively participates in all team Agile ceremonies.      
Works with Support teams to identify reliability measures for P1s.
Takes part in SRE governance and leads Community of Practice activities.
