Site Reliability Engineer

Manassas, VA 20112

Date : Jun-28-21

Work Authorization

US Citizen
GC
H1B
GC EAD, L2 EAD, H4 EAD, TN EAD

Preferred Employment

Corp-Corp
W2-Permanent
W2-Contract
1099-Contract
Contract to Hire

Job Details

Experience

Expert, Senior

Rate/Salary ($)

Market

Duration

6 months

Sp. Area

C, C++, Middleware, Embedded

Sp. Skills

C/C++

Consulting / Contract

Required Skills :

Bitbucket, C, C++, DNS, Firewall, Java, Kibana, Korn Shell, Middleware, MQSeries, Oracle, Perl, TCP/IP

Preferred Skills :

Domain :

IT/Software

Napa Analytics LLC
Herndon, VA

Job Description :

In line with Division objectives and under the guidance of a manager, the Site Reliability Engineer develops the methods and measures of analysis based on customer and contractual obligations. Analyzes the reliability in the design of company products and services. Coordinates technical support and administration for moderate to highly complex systems, databases, and applications for internal or external customers, ensuring that reliability requirements ranging from high to mission critical are met. Prepares reports, charts, and diagrams to disclose results and highlight areas for further investigation. Identifies toil and leverages automation to help eliminate it.
Responsibilities

Exert technical influence to improve the reliability of our production products and systems.
Resolve highly complex problem management issues through investigation and solution development, mitigating them effectively and preventing recurrence through process, procedure, or tooling improvements.
Execute production installations, including configuration setup, error message handling, service verification, and review of operational procedures, in accordance with the established process.
Design, develop, test, and maintain automation tools for infrastructure and problem management analysis.
Provide effective and detailed systems analysis that can contribute to definition of throughput requirements, information and application data flows, hardware and software requirements, and alternative approaches.
Actively lead and participate in design review meetings for medium to large size/complexity/risk projects.
Participate in system and network projects and enhancements by representing the department and providing technical advice and solutions, ensuring adherence to documented processes and procedures and supporting risk mitigation efforts.
Provide expert on-call support.
Work regularly on weekends, mainly Saturdays, in support of production deployments.
Interact with network services, software systems engineering, and applications development teams to restore availability of services and identify the root cause of complex problems.
Provide technical guidance, mentorship, and coaching to less senior team members.
Remain engaged in industry trends and best practices and share them with the team and management.
Steward reliability as a feature across the organization through concepts such as SLOs and service maturity.

Qualifications

University degree in IT / Engineering or equivalent experience.
At least 6 years of experience in a similar position in a technical support environment including software development / debugging and problem analysis in support of mission critical applications and services.
Professional Knowledge and Skills:

Strong problem-solving orientation and skills
Excellent verbal and written communication skills
Experience with distributed systems with high availability requirements and with balancing service reliability, sustainability, and technical debt for services running at scale
Demonstrated use of a methodical and analytical mindset during problem investigations and incident management
Ability to work under pressure
Comfort with RHEL and Client-UX
Familiarity with configuration and deployment management software such as Bitbucket, Jenkins, and Ansible
Familiarity with analytics software such as Elastic and Kibana
Database: Proficient working knowledge of Oracle DB
Middleware: Tuxedo, MQSeries
Public Key Infrastructure technologies
Scripting languages, including ksh and Perl
Exposure to programming languages such as C/C++ and Java