Job Description:

Design and develop real-time streaming pipelines for sourcing data from IoT devices, defining strategy for data lakes, data flow, retention, aggregation, and summarization to optimize the performance of analytics products (a minimal streaming sketch appears after this list).
Experience building large-scale batch and streaming data pipelines with data processing frameworks on the AWS cloud platform using PySpark (on EMR) and Glue ETL
Deep experience developing data processing and data manipulation tasks using PySpark, such as reading data from external sources, merging datasets, performing data enrichment, and loading into target data destinations (see the batch ETL sketch after this list).
Proficiency with Big Data processing technologies (Hadoop, Hive, or Databricks)
Experience deploying and operationalizing code using CI/CD tools such as Bitbucket and Bamboo
Experience with SQL and relational databases
Strong AWS cloud computing experience, including extensive experience with Lambda, S3, EMR, and Redshift (a Lambda-to-S3 sketch follows this list)
Experience in Data Warehousing applications, responsible for the Extraction, Transformation, and Loading (ETL) of data from multiple sources into a Data Warehouse
Experience optimizing Hive SQL queries and Spark jobs (both tuning techniques are sketched at the end of this list).
Implemented various frameworks such as Data Quality Analysis, Data Governance, Data Trending, Data Validation, and Data Profiling using technologies such as Big Data, DataStage, Spark, Python, and Mainframe, with databases including Netezza, DB2, Hive, and Snowflake
Experience creating technical documents for Functional Requirements, Impact Analysis, and Technical Design, as well as Data Flow Diagrams, with MS Visio.
Experience delivering highly complex projects using Agile and Scrum methodologies.
Quick learner, up-to-date with industry trends, with excellent written and oral communication skills.
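
Below is a minimal PySpark Structured Streaming sketch of the IoT sourcing pattern described above. The Kafka broker, topic, event schema, and S3 paths are all hypothetical placeholders, not details from this posting:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import from_json, col, window, avg
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

    spark = SparkSession.builder.appName("iot-streaming").getOrCreate()

    # Hypothetical schema for IoT sensor readings
    schema = StructType([
        StructField("device_id", StringType()),
        StructField("temperature", DoubleType()),
        StructField("event_time", TimestampType()),
    ])

    # Read raw events from Kafka (broker and topic names are assumptions)
    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")
           .option("subscribe", "iot-events")
           .load())

    # Parse the JSON payload, then aggregate into 5-minute per-device summaries
    events = raw.select(from_json(col("value").cast("string"), schema).alias("e")).select("e.*")
    summary = (events
               .withWatermark("event_time", "10 minutes")
               .groupBy(window(col("event_time"), "5 minutes"), col("device_id"))
               .agg(avg("temperature").alias("avg_temperature")))

    # Land summarized data in the lake; the checkpoint tracks streaming progress
    (summary.writeStream
     .outputMode("append")
     .format("parquet")
     .option("path", "s3://example-lake/iot/summary/")
     .option("checkpointLocation", "s3://example-lake/checkpoints/iot-summary/")
     .start()
     .awaitTermination())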
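A similar sketch of the batch read-merge-enrich-load flow mentioned above. The source paths, JDBC connection details, and column names are illustrative assumptions:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, upper

    spark = SparkSession.builder.appName("batch-etl").getOrCreate()

    # Read from external sources (paths and connection details are placeholders)
    orders = spark.read.parquet("s3://example-raw/orders/")
    customers = (spark.read.format("jdbc")
                 .option("url", "jdbc:postgresql://example-host:5432/sales")
                 .option("dbtable", "customers")
                 .option("user", "etl_user")
                 .option("password", "***")  # supply via a secrets manager in practice
                 .load())

    # Merge: attach customer attributes to each order
    merged = orders.join(customers, on="customer_id", how="left")

    # Enrich: derive a normalized region column (illustrative transformation)
    enriched = merged.withColumn("region", upper(col("region")))

    # Load into the target destination, partitioned for downstream queries
    enriched.write.mode("overwrite").partitionBy("region").parquet("s3://example-curated/orders/")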
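One common way Lambda and S3 fit together in such a stack is an event-driven ingest hook. The sketch below assumes the standard S3 put-notification event shape and a newline-delimited JSON payload; both are assumptions for illustration:

    import json
    import boto3

    s3 = boto3.client("s3")

    def handler(event, context):
        """Triggered by an S3 put notification; reads the new object and counts records."""
        record = event["Records"][0]["s3"]
        bucket = record["bucket"]["name"]
        key = record["object"]["key"]

        # Fetch the newly landed object (assumed to be newline-delimited JSON)
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
        rows = [json.loads(line) for line in body.splitlines() if line]

        print(f"Ingested {len(rows)} records from s3://{bucket}/{key}")
        return {"records": len(rows)}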
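Finally, two routine Spark/Hive tuning techniques referenced above: partition pruning (filtering on a partition column so only the relevant files are scanned) and broadcast joins (shipping a small dimension table to every executor instead of shuffling the large fact table). The table, partition, and column names here are hypothetical:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast, col

    spark = SparkSession.builder.appName("query-tuning").enableHiveSupport().getOrCreate()

    # Partition pruning: filter on the Hive table's partition column (dt is assumed)
    sales = spark.table("warehouse.sales").where(col("dt") == "2024-01-01")

    # Broadcast join: avoid shuffling the large fact table by broadcasting the dimension
    dims = spark.table("warehouse.product_dim")
    joined = sales.join(broadcast(dims), on="product_id")

    joined.groupBy("category").count().show()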

Client: Financial
