Job Description:


Job Summary:

Our focus is on enabling data science and search-based applications on large, low-latency data sets in both batch and streaming processing contexts. To that end, this role engages with team counterparts in exploring and deploying technologies for creating data sets using a combination of batch and streaming transformation processes. These data sets support both off-line and in-line machine learning training and model execution; other data sets support search engine-based analytics. Exploration and deployment activities include identifying opportunities that impact business strategy, collaborating on the selection of data solutions software, and contributing to the identification of hardware requirements based on business requirements.

Responsibilities also include coding, testing, and documenting new or modified scalable analytic data systems, including automation for deployment and monitoring. This role works alongside team counterparts to develop solutions in an end-to-end framework on a group of core data technologies.

Job Duties:

  • Significantly contribute to evaluation, research, and experimentation efforts with batch and streaming data engineering technologies in a lab, keeping pace with industry innovation while assessing business impact and viability for the use cases at hand
  • Work with data engineering-related groups to inform them about and showcase the capabilities of emerging technologies, and to enable the adoption of these technologies and their associated techniques
  • Significantly contribute to the definition and refinement of processes and procedures for the data engineering practice
  • Work closely with data scientists, data architects, ETL developers, other IT counterparts, and business partners to identify, capture, collect, and format data from external sources, internal systems, and the data warehouse to extract features of interest
  • Code, test, deploy, monitor, document and troubleshoot data engineering processing and associated automation


  • Proficient in working on the MS Azure Platform
  • Experienced in building Data Lakes on the Azure platform
  • Experience with Data Store and Data Fabric, and the ability to bridge the gap between On Prem H3 Data Fabric and cloud databases
  • Proficient in processing large data sets with Kafka, RabbitMQ, Flume, Hadoop, HBase, Cassandra, and/or Spark or similar distributed systems
  • Proven track record with NoSQL data stores such as MongoDB, Cassandra, HBase, Redis, Riak or other technologies that embed NoSQL with search such as MarkLogic or Lily Enterprise
  • In-depth knowledge of ETL concepts and Business Intelligence technologies such as Informatica, DataStage, Ab Initio, Cognos, BusinessObjects or Oracle Business Intelligence
  • Strong knowledge of operating systems, applications and associated hardware (e.g., Windows Desktop OSs, Windows Server OSs, OS/400, UNIX/Linux)
