Job Description:
Responsibilities:

Operational Performance & Stability: Works with various teams to ensure that the in-scope applications/platforms meet performance and stability requirements. Manages Major Incidents through to mitigation/resolution.

Problem Management: Performs Post-Incident Reviews of all Major Incidents and determines the Action Items required to avoid similar issues and minimize downtime in future Incidents.

Monitoring and Metrics: Works with Application Development to ensure that assigned applications/platforms have the appropriate monitoring and metrics in place to measure performance and stability.

Identify Functional and Non-Functional Improvements: Acts as the Operations representative in Value Stream planning and prioritization sessions to ensure that the Operational needs of assigned applications/platforms are addressed. Holds quarterly Operational Performance Reviews with Value Stream management.

Release Planning & Coordination: Works with SCM and Development teams to ensure that Production releases for their in-scope applications/platforms are properly planned and coordinated. This includes the following. Holds Change/Release implementation reviews to ensure thorough and appropriate implementation plans. Provides review and sign-off/approval of change tickets for the assigned Value Stream. Participates in Program Increment Planning Sessions as a liaison for Operations and Infrastructure support. Provides information regarding upcoming critical changes to the Value Stream.

Operational Readiness: Ensures that applications/platforms are Operationally ready for Production. This includes an annual review of all SOPs/Knowledge Articles; a monitoring review for any new Feature launch or other significant change that may impact monitoring; an SOP/Knowledge Article review for any new Feature launch or other significant change that may impact support documentation; and training of Command Center and Application 1st-level Support on new SOPs, Knowledge Articles, and any other support-related needs. Performs monthly Capacity Analysis of applications/platforms within the Value Stream. Creates and maintains Operationally focused ELK Dashboards for the Value Stream.

Additional responsibilities may include:
Actively providing data for, and participating in, root cause analysis.
Sharing knowledge globally between various teams.
Analyzing systems and making recommendations to prevent possible incidents.
Striving for continuous improvement and making recommendations.

Skills:
Bachelor's Degree in Computer Information Systems, Computer Science, MIS, Engineering, Science, or a related field.
4+ years of experience in Information Technology or a related field.
Experience administering Unix/Linux in a production environment.
Understanding of Unix/Linux systems from kernel to shell and beyond, taking in system libraries, file systems, and client-server protocols along the way.
Programming experience in one or more of the following languages: Python, Shell, Java.
Experience working with and developing enterprise monitoring/tooling solutions such as Grafana, Kibana, Splunk, Graphite, Nagios, New Relic, Netcool, and BigPanda.
Working knowledge of one or more DevOps technologies such as Jenkins, Ansible, and Puppet.
Working knowledge of web technologies such as HTML, JavaScript, CSS, and APIs.
Networking knowledge and understanding of network concepts, such as different protocols (TCP/IP, UDP, ICMP, etc.), MAC addresses, IP packets, DNS, OSI layers, and load balancing.