Job Description:
In this role, you will be responsible for:

Data modeling, coding, analytical modeling, root-cause analysis, investigation, debugging, testing, and collaboration with business partners, product managers, architects, and other engineering teams.
Adopting and enforcing best practices for ingesting data into, and extracting data from, the big data platform.
Extracting business data from multiple data sources and storing it in the MapR DB HDFS location.
Working with Data Scientists to build scripts that meet their data needs.
Working with the Enterprise Data Lake team to maintain data and information security for all use cases.
Building AUTOSYS automation scripts to automate data loads.
Designing and developing scripts and configurations to load data using Data Ingestion Frameworks or Ab Initio.
Coordinating user access requests for data loaded into the Data Lake.
Providing post-production support for the AIES Open Source Data Science (OSDS) Platform.
Supporting end-to-end platform application delivery, including infrastructure provisioning, automation, and integration with Continuous Integration/Continuous Delivery (CI/CD) platforms, using existing and emerging technologies.
Providing design and development support for production enhancements, problem tickets, and other issue resolution.
Following SDLC documentation requirements for code fixes.
Developing new documentation, departmental technical procedures, and user guides.
Monitoring production execution and responding to processing failures.
Reviewing code execution and recommending optimizations for production processes.
Candidates must:

Be willing to work non-standard hours to support production execution or issue resolution
Be willing to provide on-call/pager support for production escalations
Required Qualifications

BS/BA degree
1+ years of experience with the Ab Initio suite of tools – GDE, Express>IT
3+ years of experience with big data platforms – Hadoop, MapR, Hive, Parquet
5+ years of ETL (Extract, Transform, Load) programming with tools including Informatica
2+ years of experience with Unix or Linux systems, including scripting in Shell, Perl, or Python
Experience with advanced SQL, preferably Teradata
Strong Hadoop scripting skills to process petabytes of data
Experience working with large data sets and distributed computing (MapReduce, Hadoop, Hive, HBase, Pig, Apache Spark, etc.)
Excellent analytical and problem-solving skills with high attention to detail and accuracy
Demonstrated ability to translate business requirements into code, metadata specifications, analytical reports, and tools
Good verbal, written, and interpersonal communication skills
Experience with the SDLC (System Development Life Cycle), including an understanding of project management methodologies used in Waterfall or Agile development projects
Desired Qualifications

MS/MA degree
Experience with Java and Scala development
Experience with analytic databases, including Hive, Presto, and Impala
Experience with multiple data modeling concepts, including XML and JSON
Experience with loading and managing data using technologies such as Spark, Scala, NoSQL (MongoDB, Cassandra), and columnar MPP SQL stores (Redshift, Vertica)
Experience with Change and Release Management Processes
Experience with streaming frameworks, including Kafka, Spark Streaming, Storm, or RabbitMQ
Experience working with cloud architectures, including Amazon Web Services (AWS) cloud services: EC2, EMR, ECS, S3, SNS, SQS, CloudFormation, CloudWatch