Remote - Site Reliability Engineer (C++, Python, Prometheus)

San Francisco, CA 94188

Date : Apr-28-21

Work Authorization

US Citizen
GC
H1B
GC EAD, L2 EAD, H4 EAD

Preferred Employment

Corp-Corp
W2-Permanent
W2-Contract
1099-Contract
Contract to Hire

Job Details

Experience: Expert, Senior, Midlevel
Rate/Salary ($): Market
Duration: 6 Months
Sp. Area: Python, Open Source
Sp. Skills: x-Other

Permanent Direct Hire

Consulting / Contract

Remote Work from Home

Required Skills: C++, Python, Blockchain, Unix, DevOps, DNS, Linux, Perl, PHP, Security, TCP/IP

Preferred Skills:

Domain: IT/Software, Financial, Government, HealthCare, Retail, Dot Com, Insurance, Pharmaceuticals, Manufacturing, Telecom

InfoObjects Inc
Santa Clara, CA

Job Description :

Site Reliability Engineer

Location: SF – 100% remote

Duration: 6-month CTH (contract to hire)

Must haves:

Rust or C++ (if no Rust experience, be open to learning it)
Python for scripting
5+ years of experience
Prometheus/Grafana and ELK for monitoring
Networking protocols

Product Launch – in May

Launching new products that are fully open; people can buy the company's blockchain products and its tokens.
There will be an event for the new launch.
Blockchain company: today, when engineers want to build on the internet, their code runs on AWS, and AWS gets to decide.
They are reinventing the internet: anyone who uses this platform does not need to use AWS.

Job Description:

The SRE team is charged with creating tools, processes, and frameworks that ensure the stability of the Internet Computer, which is distributed and scalable.
As a member of the team you will work with engineering, infrastructure, and security teams to bake reliability and operability into the product from the start, by participating in design and code reviews, identifying risks, problems, and mitigations.
This is not a team that exists to be on-call; this is a team that elects to be on-call because it helps do the job better.

Responsibilities:

Implement tools that ensure high availability of our product
Gain deep knowledge of our complex applications
Identify opportunities to automate or improve processes and then implement the automation
Coordinate incident response across multiple teams -- clearly understanding and communicating what is going on, next steps, who is responsible for what, and so on
Implement observability tools to ensure visibility into service stability and performance
Be on-call for production services
Operate, troubleshoot, and deploy software on Unix systems
Think about things in a systematic, methodical way, especially when troubleshooting

Required Skills:

Expertise in observability and monitoring of applications, services, and networks, using tools such as Prometheus/Grafana and ELK logging
Unix/Linux experience, including application installation, configuration, and maintenance
Significant experience with site reliability, developer productivity, DevOps, or server infrastructure engineering (including on-call incident response)
Understanding of Internet networking protocols: TCP/IP, TLS, DNS, HTTP/S, SMTP
Experience troubleshooting issues across the entire stack (hardware, software, network, etc.)
Experience writing automation scripts and utilities in a scripting language such as Python, Perl, Shell, PHP, etc.
Experience with incident and problem management
Strong communication and interpersonal skills