Job Description:
Senior Site Reliability Engineer

Intelligent Conversation and Communications Cloud (IC3) Carrier Operations Team
Intelligent Conversation and Communications Cloud (IC3) powers billions of real-time customer conversations across Microsoft’s first-party (Teams, Skype) and second-party (Dynamics) solutions. IC3 enables reliable, high-quality audio/video calling, meeting, and messaging services that work every time, from anywhere, seamlessly across all customer touchpoints. IC3 makes conversations on our platform more intelligent in real time, empowering best-in-class productivity tools for the modern workplace, where every call, meeting, or chat makes the next one better.

The mission of the IC3 Carrier Operations SRE team is to operate the IC3 PSTN services with end-to-end high availability, performance, and reliability so that customer objectives are consistently met or exceeded. To achieve this, we work closely with our product and engineering teams and use a variety of home-grown toolsets aimed at aggressive automation for reliability. We are also a service engineering-focused team that runs at scale while supporting deployments for new carriers across the globe.

Responsibilities

Work with a team of engineers focused on improving the reliability, scalability, latency, and efficiency of the PSTN services powering cloud communications.
Manage problem resolution with service providers.
Learn existing tools and enhance them to meet new scale and feature requirements, reducing manual intervention and improving prevention, detection, and mitigation of service impacts.
Participate in the on-call rotation of the local follow-the-sun team.
Manage incident response and perform root cause analysis investigations.
Review existing processes and drive improvements to support the scale and excellence of PSTN services.
Analyze data and provide operational insights into service reliability and customer experience to Design and Product teams.
Participate in recruiting and developing a team of experienced site reliability engineers.



Qualifications

Required:

7+ years of experience as a software engineer or site reliability engineer directly supporting development and quality in a product engineering team environment.
5+ years of experience shipping distributed systems, services, and highly available infrastructure.
5+ years of experience scripting or coding in one or more of the following: PowerShell, C#, Python.
Expertise with Power BI: creating data models, writing queries, and building effective visualizations.
Experience with T-SQL, Kusto Query Language (KQL), Azure Log Analytics, Cosmos.


Preferred:

Experience with Microsoft Azure, Azure DevOps, ServiceNow, Microsoft Dynamics or FLOW.
Passion for Site Reliability Engineering practices.
Knowledge or experience of cloud-based distributed systems and microservices architecture.
Knowledge or experience of Internet network architecture and its operating principles.
Experience with Voice over IP (VoIP) is highly desirable.
Experience analyzing network packet captures and signaling traces.
Experience working with SBCs, media gateways, circuit-switched telephony, SS7, ISDN/ISUP.