Job Description:
Job Title: Senior Site Reliability Engineer (SRE)
Location: New York City, NY (1 New York Plaza, 21st Floor | New York, NY 10004)
Duration: FTE role and 6+ Months C2H


Role:

We are seeking a Senior Site Reliability Engineer (SRE) to help ensure the health and availability of the application's production and test environments. This will involve designing and deploying a robust monitoring and alerting strategy, establishing and administering on-call structures for all technical teams, assembling runbooks and failure points for all primary products and interfaces, and triaging, routing, and resolving production incidents as they occur.
You will work closely with the Development, Business Analyst, Analytics, and DevOps teams to determine the root causes of incidents and problems and to implement permanent solutions.

Primary Responsibilities:
Influence and drive policies, operational processes, standards/guidelines, and solutions that proactively address issues before they impact system functionality or performance
Enable high-availability, fault-tolerant infrastructure and systems to support critical products and processes
Design and implement an automated monitoring strategy across critical processes, systems, and interfaces; identify the correct routing for each alert and establish thresholds for immediate notification
Establish application logging requirements to enable more effective troubleshooting for issue resolution and identification of root causes
Work with Development, Quality Assurance, and DevOps teams to assemble and cultivate application and tool runbooks and failure points
Provide advanced support for incident resolution for technical problems involving the full application stack
Identify persistent or recurring problems and recommend creative solutions
Own and drive improvement on Mean-Time-To-Repair (MTTR), Mean-Time-Between-Incidents (MTBI), uptime, and bug count metrics
Participate in 24/7 on-call rotation; respond to alerts in a timely fashion, and escalate issues as needed
Identify and lead availability, stability, and reliability improvement projects
Stay abreast of new technologies and practices to further enhance the team's capabilities and your own skills

Required Qualifications:
5+ years of hands-on experience in Application Development
3+ years operating in an Agile environment
2+ years working with Microsoft® SQL Server, including using indexes, stored procedures, views, and triggers
2+ years in a 24/7 on-call environment and 2+ years working in Amazon Web Services (AWS)/Azure/GCP
Bachelor’s degree in Computer Science or related field/equivalent experience

Preferred Qualifications:
Ability to thrive in a fast-paced, rapid growth environment
Ability to solve problems and learn business rules and processes
Ability to demonstrate strategic, data-driven thinking combined with efficient implementation
Ability to communicate clearly and effectively to both technical peers and business customers
Ability to work on a team as well as complete projects and tasks individually
Hands-on self-starter with a positive attitude and strong work ethic

Technical Skills:
Required – Amazon Web Services (VPC, EC2, ECS, Lambda, CloudWatch, etc.), .NET Core/Java Spring Boot, HTML, XHTML, XML, CSS, Bootstrap, JavaScript, JSON, Microsoft SQL Server 2016-2019, Microsoft Excel, Visual Studio 2017/2019, Team Foundation Server, Entity Framework 5/6, Application Performance Management tools such as Dynatrace and AppDynamics
Preferred – T-SQL, PowerShell, GitHub, Python, New Relic, Datadog, PagerDuty, Splunk, Jira


Client: Morgan Stanley