Job Description:

Role: Lead Linux Specialist

Location: Houston, TX (Onsite)

Description:

We are seeking an experienced Lead Linux Specialist to join the Linux support team. The ideal candidate will be responsible for troubleshooting, deployment, operational support, and ongoing maintenance of Linux-based systems. As part of the Linux team, the candidate will work closely with business and other IT groups to provide third-level support and technical expertise as needed.

Responsibilities include:

  • Support the operation and maintenance of Linux servers: ensure operational availability and performance, conduct health checks, manage software upgrades and patching (including testing and implementation), and handle system optimization and administration
  • Monitor server health and performance to identify issues, bugs, or potential improvements
  • Adhere strictly to change management processes to ensure changes are properly planned, documented, and deployed
  • Develop, review, and update operational documentation (SOPs, application checklists, playbooks, etc.)
  • Provide after-hours on-call technical support
  • Collaborate with the Security Operations Center (SOC) team on process optimization, tool tuning and integration, information sharing, playbook development, and incident response
  • Implement automated near real-time monitoring of all tools to ensure proper operation and collection of pertinent data
  • Handle incident and problem management, both during and after incidents, including root cause analysis
  • Provide application support, issue management, and escalation
  • Perform incident investigation, diagnosis, and resolution
  • Perform system monitoring and remediation

The successful candidate will meet the following qualifications:

  • 7+ years of experience installing, administering, and maintaining Oracle or Red Hat Linux-based servers
  • 5+ years of experience designing and implementing redundant systems including data backups/recoveries, high availability, load balancing, and disaster recovery
  • 5+ years of experience designing, analyzing, and repairing large-scale distributed systems
  • Experience with deploying and maintaining AWS and on-premises Linux servers
  • Experience in application deployment automation, modern DevOps practices, and infrastructure as code
  • Experience with IT automation tools such as Ansible Automation Platform, Chef, Puppet, or Terraform
  • Knowledge of core IT infrastructure technologies including virtualization, networking, and storage management
  • Technical documentation skills
  • Comfortable interacting with management at various levels in a professional manner
  • Takes ownership of areas of responsibility and makes recommendations and decisions on the improvement and operation of those areas
  • High level of organizational skills
  • Knowledge of and experience with Security Design and Implementation
  • Ability to participate in after-hours on-call rotation
  • Knowledge of backup and recovery methods and verification
  • Knowledge of EMC PowerMax and Isilon storage, including snapshots
  • Excellent written and verbal communication skills
  • Ability to work in a fast-paced, schedule-driven, and customer-oriented environment
  • Experience with Bash, Perl, and Python scripting
  • Experience with LVM, including online expansion of file systems

Preferred Qualifications:

  • Experience supporting container-based platforms
  • Experience with SUSE Manager for patching Linux servers
  • Experience with Red Hat Satellite for patching Linux servers
  • Experience with Prometheus and Grafana for system performance monitoring
             
