Job Description:
Job Title: Site Reliability Engineer
Location: Reston, VA
Design and manage an OpenStack private cloud environment. Deploy instances, create security groups, manage volumes, create users, and migrate instances from one environment to another.
Deploy and manage servers, create templates, and migrate servers across clusters in a VMware/vSphere environment.
Manage services such as LDAP for user authentication; Nginx, HAProxy, and HTTP for proxying; GlusterFS and NFS for volume sharing; SSL and SELinux for security; SMTP for mail; DNS for name resolution; F5 for load balancing; Palo Alto firewalls; and DHCP.
Develop QRH (quick reference handbook) documentation used by Tier 1 SREs for first-response troubleshooting in order to keep downtime to a minimum.
Troubleshoot, track, and resolve complex issues on all Linux servers in the environment by reading log messages and making sound decisions.
Install, maintain, administer, and troubleshoot physical and virtual server builds for Red Hat/Oracle Linux.
Deploy software and use automation tools such as Ansible, GitLab (CI/CD), Bash scripting, and other UNIX shell scripting to automate processes.
Troubleshoot server and software-related issues by reading log files, and use good practices such as log centralization through Elasticsearch and Kibana to ease the process.
Deploy and manage MySQL databases to ensure high availability and high-speed functioning of all applications.
Broad experience in utilizing cloud storage, S3, MongoDB, DynamoDB, Amazon Kinesis, OpenShift for analytics, etc.
Deploy and manage containers using DCOS/Docker UCP and migrate them from one environment to another.
Experience with the OpenStack components Neutron, Nova, Cinder, Horizon, Ceilometer, Heat, Glance, and Swift.
Experience in DevOps processes using Jenkins and Ansible for on-demand OpenStack deployment. Experience with the OpenStack command line.
Manage users in Active Directory and configure single sign-on for several applications and Linux servers.
Deploy and manage servers in a RevM/KVM environment. Participate in daily scrums, and track and triage issues.
Create and manage user accounts in Active Directory and use them to authenticate applications in order to promote single sign-on.
Deploy and manage MySQL clusters, recover lost nodes within clusters, and configure MySQL binlog replication.