Job Description:
Position: PySpark Developer with AWS EMR Experience
Location: Portland, OR
Duration: 12 Months

Mandatory skills:
5+ years of programming experience in Python, with strong proficiency in the language
Familiarity with functional programming concepts
3+ years of hands-on experience developing ETL data pipelines using PySpark on AWS EMR
Hands-on experience with XML processing using Python
Good understanding of Spark’s RDD API
Good understanding of Spark’s DataFrame API
Experience with and good understanding of the Apache Spark Data Sources API.
Experience working with AWS S3 object storage from Spark.

Roles & Responsibilities:
Design, develop, and implement performant ETL pipelines using the Python API (PySpark) of Apache Spark on AWS EMR
Write reusable, testable, and efficient code
Integrate data storage solutions with Spark, especially AWS S3 object storage.
Tune the performance of PySpark scripts.
Ensure that build deliveries are of high quality and are always delivered on time.
Handle customer meetings with ease.
Communicate clearly and effectively with customers.
Be a team player, willing to work in an onsite-offshore model and to mentor other team members (onsite as well as offshore).