Job Description:

Role: Site Reliability Engineer (SRE), Support and Operations team

Location: Irving, TX (initially remote)

Duration: Long Term

You will work on implementing operational efficiencies and work closely with the Tier 1/Tier 2 and Business Operations teams on application support, ticket management, CMD/outage management, and proactive alerting, while applying Site Reliability Engineering (SRE) practices and principles across all customer/agent interactions and applications. As part of the Operations team, you will monitor and operate mission-critical applications in production and do whatever is necessary to keep the site up and running. You will be responsible for establishing and maintaining the service levels agreed upon with the business, and for managing MTTR and error budgets for each system. You will be expected to balance operational work (making sure systems work as expected) with improving those systems by writing software that automates processes and reduces wasted effort. Responsibilities include:
• Implementing SRE automation, developing automation across the stack, and optimizing operations hours by reducing manual operations.
• Implementing automation across all the layers - infrastructure provisioning, configuration management, deployment, testing, and operation.
• Working on retooling our infrastructure to provide an agile, cloud-based foundation with a common infrastructure management and automation framework.
• Interfacing directly with senior staff members within the organization to discuss and assess compliance with IT policies, standards and procedures, suggest opportunities for improvement, and report on the status of specific.
• Working with development teams throughout the software life cycle to ensure sustainable software releases.
• Practicing sustainable incident response and blameless postmortems.
• Developing automation scripts to automate the creation and maintenance of environments.
• Creating and implementing monitoring tools and solutions that provide visibility into infrastructure and application performance.
• Diagnosing complex system problems using dumps, traces, or other diagnostic aids.
• Diagnosing system performance problems using standard performance tools such as Datadog and system indicators such as queue lengths and CPU utilization.
• Looking for opportunities for continual improvements in infrastructure and processes.
• Ensuring scalable processes and infrastructure with a focus on high availability.


Required Skills:
• Strong analytical, communication, leadership, and presentation skills.
• Bachelor's degree in Computer Science, Information Systems, Information Technology, Electrical Engineering, Engineering Science and Mechanics, Analytics, Business Administration, Business Intelligence, or a related major.
• 5 or more years of relevant work experience.
• Experience with web application development using React JS, Angular JS.
• Experience with server-side development using Spring Boot, Spring MVC.
• Experience with interfacing technologies such as REST APIs, SOAP, and asynchronous processing using Kafka.
• Experience with caching technologies such as Redis.
• Knowledge of Oracle and MongoDB.
• Knowledge and understanding of DevOps and CI/CD tools for automating build, packaging, deployment, and testing.
• Experience using version control tools such as Git.
• Knowledge of APM tools like CA Wily, New Relic, Catchpoint, or Datadog.

Basanthi

US IT Recruiter
RITWIK Infotech, Inc.

38345 W. 10 Mile Rd, Suite 253, Farmington, MI 48335, USA

 

 **We strengthen Oracle EBS/Fusion Cloud, NetSuite Cloud and BI Partners & End Customers by accelerating career development.**

 

             
