Job Description:
Monitoring and Automation Engineer

Location: Reston, VA
Duration: 11 months

Description:
The Tools Operations team is seeking an administrator and automation engineer to support enterprise-wide application monitoring and performance systems. The candidate should be skilled and experienced with monitoring tools such as Riverbed/OPNET AppInternals and AppResponse, CA NimSoft and Spectrum, HP SiteScope, and Icinga, along with development and automation experience in Java, Python, or an equivalent language. The monitoring services we provide include network and systems monitoring, database monitoring, synthetic transaction monitoring, and application performance analysis. The qualifying engineer will plan, coordinate, and implement product upgrades and maintenance, troubleshoot system-related problems, and be responsible for the 24x7 availability of the tools infrastructure as part of an on-call rotation. Responsibilities also include maintaining and developing automation scripts, maintaining and developing the monitoring web application, providing tier 3 troubleshooting and restoration of services, and enabling additional monitoring capabilities for infrastructure and applications based on project requirements.

The ideal candidate will communicate effectively both verbally and in writing, lead operational initiatives and projects, and act with the highest sense of accountability. The incumbent will take work assignments from management but is expected to work independently to define, drive, and execute initiatives.

Responsibilities:
- Plan, coordinate and implement the product upgrades, patches and maintenance activities for the monitoring tools (CA – UIM/NIMSOFT, Spectrum, HP SiteScope, OPNET/Riverbed)
- Perform customizations and configurations, and develop scripts and interfaces as applicable, to enhance monitoring capabilities.
- Work with Network, System and Storage administrators for routine operations such as performance tuning, upgrades and backup
- Work with application teams to understand the Java/WebLogic framework and architecture of applications and recommend performance monitoring best practices accordingly.
- Develop and maintain BladeLogic jobs for installation of monitoring agents.
- Work towards automating repeatable processes and tasks using programming scripts.
- Be a highly cohesive team member and a change agent while serving as a subject matter expert (SME).
- Maintain all environments and handle all end-to-end aspects of monitoring as a service.
- Engage with projects, drive the deliverables, manage expectations with all stakeholders.
- Analyze application and infrastructure monitoring and performance needs by engaging with application owners, design appropriate solutions, and work toward implementation.
- Work closely with server, network, database, and storage administrators for routine operations such as performance tuning, upgrades and backup.
- Plan Disaster Recovery, maintain documentation, and be prepared to conduct periodic testing.
- Set up a governance model for application monitoring.
- Maintain service level agreements with both customer and support organizations.
- Maintain licenses and provide monthly metrics on tool usage.
- Maintain and document procedures, data profiles, design, and architecture.
- Provide on-call support for troubleshooting critical issues and for planned maintenance.
- Lead all changes through appropriate release and change management procedures.
- Communicate routinely and effectively to customers, team, inter-team, and management.

Qualifications:
- At least 5 to 7 years of experience in application, systems, and network performance monitoring tools in a large enterprise environment with emphasis on high-availability.
- 2 to 3 years of experience with Java development for automation scripts and web development.
- 5+ years working with application, systems, or network monitoring tools.
- Strong scripting experience (batch/shell/Perl/Python).
- Strong system administration skills (Unix/Windows).
- Application development background.
- Solid understanding of Java/J2EE solutions using WebLogic and Tomcat web servers.
- Familiarity with SQL databases and commands and fundamentals of relational database design.
- Experience working with HTML/web-based technologies and monitoring them with synthetic test tools.
- Experience in Disaster Recovery planning, documentation, implementation, and periodic testing.
- Strong verbal and written communication skills.
- Must be able to work effectively in a team environment.

Education:
- Technical Degree or the equivalent combination of education, training, and experience.
- 7+ years of progressive experience in an environment similar to the one described above, with at least 3 to 5 years as the subject matter expert.
- Industry certifications are desirable.
             
