Job Description:
Role Name: Data Engineer

Location: San Francisco, CA

Duration: 6 Months+, contract-to-hire

No. of openings: 10


Exp: Ideally 4+ years

The team is migrating from Redshift to Hive, so consultants will initially help either with that migration or with batch processing. Some of it will be grunt work, but they will have the chance to take on more once they prove themselves.
Candidates need to be strong in Python or Java, with experience in scalable data pipelines; data processing (batch, real-time, or streaming); and Hadoop, Spark, Hive, or Presto (a combination of those skills is fine). AWS EC2 experience is preferred.
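For orientation, here is a minimal sketch of the kind of batch job such a migration involves, assuming PySpark with Hive support enabled; the S3 path and table names are hypothetical:

```python
# Illustrative only: load a Redshift snapshot (assumed unloaded to S3
# as Parquet via UNLOAD) and rewrite it as a partitioned Hive table.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("redshift-export-to-hive")
    .enableHiveSupport()  # needed to write managed Hive tables
    .getOrCreate()
)

# Read the unloaded snapshot from S3 (hypothetical path)
events = spark.read.parquet("s3://example-bucket/redshift_unload/events/")

# Partition by date so downstream Hive/Presto queries can prune partitions
(
    events.write
    .mode("overwrite")
    .partitionBy("event_date")   # assumes an event_date column exists
    .format("parquet")
    .saveAsTable("analytics.events")
)

spark.stop()
```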


Experience & Skills:

· Extensive experience with the Hadoop (or similar) ecosystem (MapReduce, YARN, HDFS, Hive, Spark, Presto, Pig, HBase, Parquet)

· Proficient in at least one SQL dialect (MySQL, PostgreSQL, SQL Server, Oracle)

· Good understanding of SQL engines and the ability to conduct advanced performance tuning

· Strong skills in a scripting language (Python, Ruby, Perl, Bash)

· Experience with workflow management tools (Airflow, Oozie, Azkaban, UC4); a minimal example follows this list

· Comfortable working directly with data analysts to bridge business requirements and data engineering
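For illustration, a minimal Airflow DAG of the sort these tools manage; the DAG id, schedule, and placeholder callable are hypothetical, and a real pipeline step would replace the print:

```python
# Illustrative only: a daily batch DAG with retries.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def run_daily_batch(**context):
    # Placeholder for the real Spark/Hive batch step
    print(f"processing partition {context['ds']}")


with DAG(
    dag_id="daily_events_batch",  # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
) as dag:
    PythonOperator(
        task_id="run_daily_batch",
        python_callable=run_daily_batch,
    )
```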

Responsibilities:

· Own the core company data pipeline, scaling data processing to meet rapid data growth

· Continuously evolve the data model and schema based on business and engineering needs

· Implement systems tracking data quality and consistency

· Develop tools supporting self-service data pipeline management (ETL)

· Tune SQL and MapReduce jobs to improve data processing performance (an example sketch follows below)
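As one example of the kind of tuning meant above, here is a sketch of a common Spark optimization: broadcasting a small dimension table so a join avoids a full shuffle. Table and column names are hypothetical:

```python
# Illustrative only: turn a shuffle join into a broadcast hash join.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

facts = spark.table("analytics.events")      # large fact table (hypothetical)
dims = spark.table("analytics.event_types")  # small dimension table (hypothetical)

# broadcast() hints Spark to ship the small table to every executor,
# avoiding the shuffle of the large table that a sort-merge join would need
joined = facts.join(broadcast(dims), on="event_type_id")

joined.write.mode("overwrite").saveAsTable("analytics.events_enriched")
```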
             
