Job Description:

Recovery & Resiliency Manager (Infrastructure & Production)

Location: Fort Mill, SC (Onsite)

Employment Type: Contract

Industry: Financial Services

About us

We are a leading U.S.-based financial services organization providing investment, wealth management, and advisory solutions to financial advisors, institutions, and banks. With a strong focus on digital transformation and cloud modernization, we invest in secure, scalable, and high-performing technology platforms that enhance advisor productivity, operational efficiency, and client outcomes while upholding rigorous compliance standards and a commitment to innovation.

About the role

We are seeking a Recovery & Resiliency Manager to ensure the continuous availability and rapid recovery of critical infrastructure and production systems. In this role, you will lead disaster recovery planning, implement observability and monitoring strategies, manage major incidents, and drive initiatives that strengthen system resilience. The ideal candidate is a leader who can bridge infrastructure engineering and production support to maintain “always-on” operations across on-premises and cloud environments.

General Expectations

1. Ensure 24/7 availability and rapid recovery of critical infrastructure and production systems.
2. Proactively identify risks and implement measures to prevent service disruptions.
3. Lead incident management and coordinate cross-functional teams during outages.
4. Maintain and test Disaster Recovery (DR) plans, ensuring alignment with business priorities.
5. Continuously monitor system health, driving improvements in reliability and observability.
6. Provide executive-level updates on resilience, risks, and recovery readiness.
7. Enforce compliance with industry standards and internal governance frameworks.
8. Foster a culture of resilience, accountability, and continuous improvement across teams.

What is needed

· Proven Experience: 5–10+ years in IT disaster recovery, business continuity, production support, or infrastructure operations.
· Infrastructure Knowledge: Deep understanding of on-premises (VMware, SAN/NAS, Linux/Windows) and cloud environments (AWS, Azure).
· DR & Recovery Expertise: Experience defining RTO/RPO (recovery time and recovery point objectives), creating DR plans, and leading recovery exercises.
· Monitoring & Observability: Skilled in tools such as Splunk, Datadog, Prometheus, or Grafana.
· Incident Management: Strong ability to lead major incidents, coordinate cross-functional teams, and drive root cause analysis (RCA).
· Automation & DevOps: Familiarity with Infrastructure-as-Code (IaC) and DevOps practices.
· Compliance & Governance: Knowledge of ITIL, NIST, ISO 22301, or equivalent standards.
· Soft Skills: Excellent communication, problem-solving, crisis management, and decision-making under pressure.

Experience

· 5–10+ years in IT disaster recovery, business continuity, or infrastructure resiliency.
· Proven track record in production support, managing high-severity incidents, and ensuring system availability.
· Experience coordinating cross-functional teams across network, server, storage, database, and cloud environments.
· Hands-on expertise with on-premises (VMware, SAN/NAS, Linux/Windows) and cloud platforms (AWS, Azure).
· Skilled in observability and monitoring tools such as Datadog, Splunk, Dynatrace, Prometheus, or Grafana.
· Familiarity with ITIL, NIST, ISO 22301, and other compliance or governance standards.
· Experience implementing automation and Infrastructure-as-Code (IaC) for recovery and resiliency processes.

What’s in It for You

· Opportunity to lead critical infrastructure resilience programs and make a measurable impact on business continuity.
· Exposure to hybrid and cloud environments, working with modern infrastructure and observability tools.
· Chance to drive innovation through automation, Infrastructure-as-Code (IaC), and DevOps practices.
· Collaboration with cross-functional teams and leadership, enhancing your strategic and technical influence.
· Professional growth in disaster recovery, business continuity, and IT operations leadership.
· Work in a fast-paced, high-impact environment, developing skills in crisis management and incident resolution.

Apply Today

Take the next step in your engineering career. Join a company that values your ideas, supports your growth, and challenges you to be your best.

Submit your resume and let’s build the future—together.

Client: Lotus Technology Group