Position: Data Engineer (AWS)
Location: Sunnyvale, CA (2-3 months remote)

Job Description:
- 8+ years of professional experience, including supporting and working with cross-functional teams
- At least four (4) years of experience supporting AWS cloud infrastructure deployments (e.g., JupyterHub, Airflow)
- Experience with AWS cloud services: EC2, EMR, RDS, Redshift, Athena, Glue, SageMaker
- At least three (3) years of technical architecture experience integrating identity management, access management, and access governance software into cloud infrastructure and applications
- Proficient in identity and access management, with a background in OAuth 2.0, OpenID Connect, SAML, single sign-on (SSO), multi-tenancy, and API authorization/access management
- Proficient with containerization and cluster management technologies like Docker and Kubernetes
- Experience with workflow scheduling/orchestration tools such as Airflow or Oozie
- Expertise with infrastructure-as-code tools, such as Ansible, Chef, Terraform, or CloudFormation
- Experience with Python packaging and conda environments
- Experience with revision control systems (e.g., Git/GitHub), CI/CD, unit testing, and configuration management systems
- Experience developing, maintaining, and debugging distributed systems
- Proficiency with relational databases (e.g., SQL Server, Oracle, Postgres)
- Experience using one or more scripting languages (e.g., Python, Bash)
- Unix-based command line experience required
- Experience with applying data encryption and data security standards
- Ability to quickly learn new and existing technologies
- Strong attention to detail and excellent analytical capabilities
- Most importantly, a sense of humor and an eagerness to learn!
- Extract Transform Load (ETL) experience using Spark, Kafka, Hadoop, or similar technologies
- SQL expertise, data modeling, and relational database experience required
- Presto, Hive, SparkSQL, Cassandra, Solr, or other big data query and transformation experience
- Ability to design and implement effective testing and operations strategies for data products
- Experience implementing machine learning and data science workloads is a plus
- Data visualization experience using R, Python, or Tableau