Job Description:
Role: PySpark Developer with AWS EMR Experience
Work Location: Portland, OR
Duration: 12 Months

Technical/Functional Skills (Mandatory skills):
5+ years of programming experience and strong proficiency in Python
Familiarity with functional programming concepts
3+ years of hands-on experience developing ETL data pipelines using PySpark on AWS EMR
Hands-on experience with XML processing using Python
Good understanding of Spark’s RDD API
Good understanding of Spark’s DataFrame API
Experience in configuring EMR clusters on AWS
Experience with and good understanding of the Apache Spark Data Sources API.
Experience working with AWS S3 object storage from Spark.
Experience troubleshooting Spark jobs. Knowledge of monitoring Spark jobs using the Spark UI
Performance tuning of Spark jobs.
Understanding of the fundamental design principles behind business processes

Nice to have skills:
Knowledge of the AWS SDK and CLI
Experience setting up continuous integration/deployment of Spark jobs to EMR clusters
Knowledge of scheduling Spark applications on AWS EMR clusters.
Understanding of the differences between Hadoop MapReduce and Apache Spark
Proficient understanding of code versioning tools such as Git and SVN

Roles & Responsibilities:
Design, development, and implementation of performant ETL pipelines using the Python API (PySpark) of Apache Spark on AWS EMR
Writing reusable, testable, and efficient code
Integration of data storage solutions in Spark, especially AWS S3 object storage.
Performance tuning of PySpark scripts.
Ensuring good overall build and delivery quality and on-time delivery at all times.
Should be able to handle meetings with customers with ease.
Need to have excellent communication skills to interact with customers.
Be a team player, willing to work in an onsite-offshore model and to mentor other team members (onsite as well as offshore)