Site Reliability Engineer (SRE)

Plano, TX 75094

Date : Jun-13-22

Work Authorization

US Citizen
GC
H1B
GC EAD, TN EAD

Preferred Employment

Corp-Corp
W2-Permanent
W2-Contract
1099-Contract
Contract to Hire

Job Details

Experience

Architect

Rate/Salary ($)

Duration

6 Months

Sp. Area

Project, Product Management, Dev Ops

Sp. Skills

[CICD] Continuous integration, Build, Deploy

Consulting / Contract

Required Skills :

Google Cloud Platform, JAVA, Kubernetes, Linux, Ubuntu, Windows Azure, Apache, C++, Continuous deployment, DNS, Elasticsearch, Go, Groovy, JSON, Kiban

Preferred Skills :

Domain :

IT/Software

cbittech.com
Scotch Plains, NJ

Job Description :

Job Title: Site Reliability Engineer (SRE) (multiple openings)

Location: Plano, TX (onsite from day 1, or within a month or two)

Mandatory Skills:

Jenkins, Puppet, Dynatrace, AppDynamics, Kubernetes, monitoring tools, cloud, AWS, Java, microservices, Ubuntu (Linux), Maven, Grafana

Responsibilities:

• Responsible for how code is deployed, configured, and monitored, as well as the availability, latency, change management, emergency response, and capacity management of services already in or going to production.

• Design, code, test, and deliver software to automate manual operational work and to develop self-service, auto-detection, and auto-healing capabilities.

• Develop software for reliability and scale, ensuring minimal refactoring or changes.

• Define, monitor, and defend SLOs.

• Deploy closed-loop remediation (continuous testing and remediation) to fix problems in pre-production before software is released to production.

• Build custom tooling from scratch to meet specific needs in the incident management workflow.

• Resolve complex incidents across public cloud, private cloud, third-party, and on-premises technology.

• Leverage Chaos Engineering to find and prevent future problems and to confirm that fixes from past incidents function as intended.

• Focus on end-user experiences and partner with development teams to implement changes that increase uptime and performance, based on empirical evidence.

• Troubleshoot priority incidents, facilitate blameless post-incident evaluations, and ensure permanent closure of incidents.

• Identify application patterns and analytics in support of better service level objectives.

• Design performance tests, identify bottlenecks, optimization opportunities, and capacity demands, and present solutions for continuous improvement.

• Design best-in-class monitoring frameworks to accomplish end-to-end flow monitoring and noiseless alerting.

• Design automated software and product upgrades, change management, and release management solutions.

Skills/Qualifications

• Bachelor’s degree or equivalent experience in a software engineering discipline.

• 2-3 years of SRE or System Engineering experience.

• Expert in designing, coding, testing, and delivering software in at least one technology stack, e.g., Java, Python, C++, Go.

• Deep knowledge of Internet protocols and web services technologies, e.g., HTTP, DNS, TCP/UDP, SOAP, JSON, Apache, Tomcat, and REST.

• Experience working with containers, e.g., Docker, Kubernetes, Cloud Foundry.

• Experience working with automation tools, e.g., Ansible, Puppet, Selenium.

• In-depth OS experience, e.g., RHEL, Ubuntu, Windows Server, with strong debugging, troubleshooting, and problem-solving skills.

• Testing and build automation with a continuous integration/continuous delivery (CI/CD) pipeline, e.g., Travis CI, Maven, Gradle, Groovy, Git, Terraform, Jenkins.

• Experience deploying and managing services on modern platforms, e.g., AWS, GCP, Azure.

• Strong experience using industry-standard monitoring tools, e.g., AppDynamics, Dynatrace, APICA, Splunk, ELK, FluentD, Prometheus, Kibana, Elasticsearch, Grafana, Nagios, Datadog, New Relic.

• Advanced understanding of the application monitoring stack (logs, events, metrics, and alerts) and the ability to visualize and set up end-to-end observability.

• Certification in one or more cloud technologies, e.g., AWS, Azure, GCP, or Red Hat, is a big plus.