Job Description:
Mandatory Skills: PySpark, Python, Spark, ETL/ELT architecture.
· The Senior Data Engineer will support and provide expertise in data ingestion, wrangling, and cleansing technologies. In this role, they will work with relational and unstructured data formats to create analytics-ready datasets for analytics solutions.
· The Senior Data Engineer will partner with the Data Analytics team to understand their data needs and build data pipelines using cutting-edge technologies.
· They will perform hands-on development to create, enhance, and maintain data solutions that enable seamless integration and flow of data across our data ecosystem.
· These projects will include designing and developing data ingestion and processing/transformation frameworks leveraging open-source tools such as Python, Spark, and PySpark.
Responsibilities:
· Translate data and technology requirements into our ETL/ELT architecture.
· Develop real-time and batch data ingestion and stream-analytics solutions leveraging technologies such as Kafka, Apache Spark, Java, NoSQL databases, and AWS EMR.
· Develop data-driven solutions utilizing current and next-generation technologies to meet evolving business needs.
· Develop custom cloud-based data pipelines.
· Provide support for deployed data applications and analytical models by identifying data problems and guiding issue resolution with partner data engineers and source data providers.
· Provide subject matter expertise in the analysis and preparation of specifications and plans for the development of data processes.
Qualifications:
· Strong experience with data ingestion, gathering, wrangling, and cleansing tools such as Apache NiFi, Kylo, scripting, Power BI, Tableau, and/or Qlik.
· Experience with data modeling and data architecture design, and with large-scale data ingestion from complex data sources.
· Experience building and optimizing "big data" pipelines, architectures, and datasets.
· Advanced SQL knowledge, including experience with relational databases and query authoring, as well as working familiarity with a variety of databases.
· Strong knowledge of analysis tools such as Python, R, Spark, or SAS, plus shell scripting; R/Spark on Hadoop or Cassandra preferred.
· Strong knowledge of data pipelining software (e.g., Talend, Informatica).