Job Description:

Job Title: PySpark Developer

Location: Remote

Duration: Long Term Contract

Job Description:

  • Design and develop real-time streaming pipelines for sourcing data from IoT devices; define strategy for data lakes, data flow, retention, aggregation, and summarization to optimize the performance of analytics products.
  • Experience building large-scale batch and streaming data pipelines with data processing frameworks on the AWS cloud platform using PySpark (on EMR) and Glue ETL
  • Deep experience developing data processing and data manipulation tasks using PySpark, such as reading data from external sources, merging data, performing data enrichment, and loading into target data destinations.
  • Proficiency with Big Data processing technologies (Hadoop, Hive, or Databricks)
  • Experience deploying and operationalizing code using CI/CD tools such as Bitbucket and Bamboo
  • Experience with SQL and relational databases
  • Strong AWS cloud computing experience; extensive experience with Lambda, S3, EMR, and Redshift
  • Experience with Data Warehousing applications; responsible for the Extraction, Transformation, and Loading (ETL) of data from multiple sources into a Data Warehouse
  • Experience optimizing Hive SQL queries and Spark jobs.
  • Experience implementing frameworks such as Data Quality Analysis, Data Governance, Data Trending, Data Validation, and Data Profiling, using technologies such as Big Data, DataStage, Spark, Python, and Mainframe, with databases such as Netezza, DB2, Hive, and Snowflake
  • Experience creating technical documents such as Functional Requirements, Impact Analyses, Technical Design documents, and Data Flow Diagrams with MS Visio.
  • Experience delivering highly complex projects using Agile and Scrum methodologies.
  • Quick learner, up to date with industry trends; excellent written and oral communication, analytical, and problem-solving skills; a good team player with the ability to work independently and stay well organized.
