Job Description:

Job Title: Senior Production Support Engineer 

Job ID: 36871 

Location: Plano, TX 75075 

Duration: 12+ Months with possible extensions 

Interview Process: Phone/WebEx 

Number of Positions: 5 

Additional Sites: Bothell, WA 98011

LOCATION: Bothell, WA or Dallas/Plano, TX ONLY. Candidates selected for this role will begin the assignment working remotely; however, once COVID-19 restrictions are lifted, they will be required to return to the local office on a flexible work-time schedule.

Required Skills: 

Production Support Engineer: 

Java, Python, and Shell scripts: 

Docker/Kubernetes: 

Cloud (Azure Preferred): 

UNIX/Networking/Troubleshooting:

Agile/Lean Agile/Scaled Agile: 

Quantum Metric/CatchPoint: 

Dynatrace/AppDynamics/Introscope: 

Kibana/Grafana: 

EFK stack (preferred): 

Required Qualifications   

•Bachelor’s degree in Computer Science or related field 

•5+ years’ experience in a Production Support, Operations, or Development environment

•3+ years’ experience in Java, Python, and Shell scripts  

•2+ years’ experience using Docker, Kubernetes, and Cloud environments 

•2+ years’ experience working in cloud environments (Azure preferred)

•2+ years of strong UNIX, Networking and troubleshooting knowledge  

•3+ years of experience in Agile, Lean Agile and/or Scaled Agile methodologies  

•2+ years of experience with Customer Experience Analytics tools such as Quantum Metric or CatchPoint

•Solid understanding of and experience with Application Performance Monitoring tools such as Dynatrace, AppDynamics, Introscope, etc.

•Experience with visualization tools like Kibana and Grafana. EFK stack experience preferred.  

•Excellent communication and collaboration skills

Preferred Qualifications

•Kubernetes Certified Engineer or equivalent certification 

•Azure / AWS certification  

•Experience mentoring & training others 

•Experience with Site Reliability Engineering

Roles & Responsibilities:  

•  Build software to help operations and support teams - Proactively build and implement services that make operations more effective and reduce toil. This ranges from adjusting monitoring and alerting to automating scripts and code in production. The candidate may be tasked with building a homegrown tool from scratch to help with software delivery issues or to resolve the impact of outages and incidents.

•  Fix support escalation issues; Optimize on-call rotations and processes - Improve system reliability through the optimization of on-call processes. Add automation and context to alerts – leading to better real-time collaborative response from on-call responders. Additionally, update runbooks, tools and documentation to help prepare on-call teams for future incidents. 

•  Document “tribal” knowledge - Gain exposure to systems in both staging and production, and take part in work with software development, support, IT operations and on-call duties to build up historical knowledge over time. Instead of siloing this knowledge, keep documentation and runbooks continuously up to date so that teams get the information they need right when they need it.

•  Conduct post-incident reviews - Run thorough and transparent post-incident reviews to keep teams honest and ensure that everyone is conducting reviews, documenting their findings and acting on what they learn. Take on action items for building or optimizing parts of the SDLC or incident lifecycle to bolster reliability of the service.

•  Develop automation for mission-critical applications using scripts and programs

•  Provide customer impact analysis and troubleshoot complex issues using domain knowledge of AT&T Sales & Ordering flows, applications and downstream interfaces  

•  Support APIs in K8s environment 

•  Contribute to design and implementation of new system layers utilizing principles of high-complexity compute environments. 

•  Provide on-call support for customer-facing Production issues

•  Work with developers and environment teams to identify necessary resources and remove constraints to increase application availability.

1) Uses appropriate programming languages and technologies, writes code, completes programming and documentation, and performs testing and debugging of applications for the enterprise.

2) Provides technical and analytical input and guidance to the project team and assists developers with project architecture and application programming practices.

3) Manages individual projects and works as an individual contributor; is responsible for completing projects within allotted timeline.  

4) Assists with definition of project scope and objectives, as well as provides technical architecture input and coordinates programming practices of a project team, and identifies resource needs.  

5) Develops detailed work plans, schedules, project estimates and status reports.  

6) Conducts project meetings and is responsible for project tracking and analysis.  

7) Ensures adherence to quality standards and reviews project deliverables.

8) Recommends and takes action to direct the analysis and solutions of problems. 

             
