Job Description:
AWS with Big Data

Philadelphia, PA

6 to 12 month contract



Job Description:

The candidate will primarily develop Big Data solutions in AWS covering, but not limited to:

· Journaling process on historical data

· Real-time ingress

· Real-time egress

· Interface with client project sponsors to gather, assess and interpret client needs and requirements

· Develop a data model and Data Lake design around stated use cases to capture the client's KPIs and data transformations

· Identify one or more relevant AWS services - especially Amazon EMR and/or Databricks, RDS, Redshift, EC2, S3, VPC, IAM, AWS monitoring, CloudFormation, Kubernetes - and an architecture that can support client workloads/use cases; evaluate pros and cons among the identified options before arriving at a recommended solution optimal for the client's needs.

· Be able to explain to the client the trade-offs among the various AWS options, and why the recommended solution(s) and architecture were chosen as optimal for the client's needs.

· Work closely with the client and the broader Architecture, Platform, and Delivery teams to implement, in Agile fashion, the architecture and chosen AWS services using AWS best practices and principles from the AWS Well-Architected Framework

· Assess, document and translate goals, objectives, problem statements, etc. to our offshore team and onshore management

· Advise on database performance, alter the ETL process, provide transformations in SQL or MapReduce, discuss API integration, and derive business and technical KPIs

· Help transition the implemented solution into the hands of the client, including providing documentation the client can use to operate and maintain the solution.

· Help the organization with its Continuous Improvement processes to learn from each customer project, including doing project retrospectives and writing up “Lessons Learned”.

· Strong Design / Development Experience on Amazon EMR and/or Databricks, preferably with Spark (PySpark, Scala)

· Strong troubleshooting/admin experience with EMR-specific infrastructure (CloudFormation) code, deployment via the AWS CLI, and bootstrap actions.

· Ability to implement transient infrastructure (e.g. transient EMR clusters) that leverages decoupled storage (S3) and compute. Implement these using reproducible automated mechanisms like AWS CLI scripts, CloudFormation templates, and custom code leveraging AWS SDKs.
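As a hypothetical sketch of the transient-cluster pattern described above, the following builds the parameter set for a short-lived EMR cluster that logs to S3 and tears itself down when its steps finish. Bucket names, the release label, instance types, and the script URI are illustrative assumptions, not values from this posting; in practice the dict would be passed to `boto3.client("emr").run_job_flow(**params)`.

```python
# Sketch: parameters for a transient EMR cluster (auto-terminating,
# with storage decoupled onto S3). All names and sizes below are
# illustrative assumptions. In practice this dict would be passed to:
#   boto3.client("emr").run_job_flow(**params)

def build_transient_cluster_params(log_bucket: str, script_uri: str) -> dict:
    return {
        "Name": "transient-etl-cluster",
        "ReleaseLabel": "emr-6.15.0",
        "LogUri": f"s3://{log_bucket}/emr-logs/",
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            # The key to the transient pattern: the cluster does not
            # outlive its work, and all durable state lives in S3.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [
            {
                "Name": "pyspark-etl",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": ["spark-submit", "--deploy-mode", "cluster", script_uri],
                },
            }
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }

params = build_transient_cluster_params("my-data-lake", "s3://my-data-lake/jobs/etl.py")
```

The same shape can be expressed as a CloudFormation `AWS::EMR::Cluster` resource or an `aws emr create-cluster` CLI invocation; the reproducibility comes from keeping the definition in version-controlled code rather than the console.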

· Strong experience with one or more MPP Data Warehouse platforms, preferably Amazon EMR (incl. Presto), Amazon Athena, Amazon Redshift, PostgreSQL, Teradata, or similar

· Possess in-depth working knowledge and hands-on development experience in building Distributed Big Data Solutions including ingestion, caching, processing, consumption, logging & monitoring

· Strong development experience with at least one event-driven streaming platform, preferably Kinesis, Firehose, Kafka, Spark Streaming, or Apache Flink

· Strong Data Orchestration experience using one or more of these tools: AWS Step Functions, Lambda, AWS Data Pipeline, AWS Glue orchestration, Apache Airflow, Luigi or related
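To illustrate the kind of orchestration the bullet above refers to, here is a minimal Amazon States Language definition (expressed as a Python dict) chaining an ingestion Lambda into a Glue job. The state names, ARNs, and job name are hypothetical placeholders invented for the example:

```python
import json

# Sketch: a minimal Step Functions state machine (Amazon States Language)
# chaining an ingestion Lambda into a Glue transform. Resource ARNs and
# state/job names are hypothetical placeholders.
state_machine = {
    "Comment": "Ingest then transform (illustrative only)",
    "StartAt": "IngestRawData",
    "States": {
        "IngestRawData": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:ingest-raw",
            # Retry transient failures before giving up.
            "Retry": [
                {"ErrorEquals": ["States.TaskFailed"], "IntervalSeconds": 5, "MaxAttempts": 3}
            ],
            "Next": "RunGlueTransform",
        },
        "RunGlueTransform": {
            "Type": "Task",
            # The .sync integration waits for the Glue job to finish.
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": "transform-to-parquet"},
            "End": True,
        },
    },
}

# The JSON string is what would be uploaded as the state machine definition.
definition = json.dumps(state_machine)
```

Airflow or Luigi would express the same dependency as a DAG in code; Step Functions keeps it as declarative JSON with retries and error handling built into each state.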

· Strong understanding and experience with Cloud Storage infrastructure, and operationalizing AWS-based storage services & solutions preferably S3 or related

· Strong technical communication skills and the ability to engage a variety of business and technical audiences, explaining features and metrics of Big Data technologies based on experience with previous solutions

· Strong understanding of at least one cluster/resource manager (YARN, Kubernetes) and related execution engines (Hive, Presto, Pig, etc.)



Nice to Have:

· Strong data cataloguing experience, preferably using AWS Glue or similar

· Strong development experience with at least one NoSQL or document database

· Experience with at least one ingestion/integration tool such as Hortonworks, Apache Gobblin, StreamSets, or related

· Strong development experience with at least one caching tool, such as Amazon ElastiCache (with Redis or Memcached) or Lucene

· Strong understanding of and experience with Big Data audit logging and monitoring solutions like AWS CloudTrail and CloudWatch.
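To make the monitoring bullet concrete, the following sketches the payload a pipeline might publish as a custom CloudWatch metric. The namespace, metric name, and dimensions are made-up assumptions; in practice the dict would be sent via `boto3.client("cloudwatch").put_metric_data(**payload)`.

```python
from datetime import datetime, timezone

# Sketch: a CloudWatch custom-metric payload for pipeline monitoring.
# Namespace, metric name, and dimension values are illustrative
# assumptions. In practice:
#   boto3.client("cloudwatch").put_metric_data(**payload)

def build_ingest_metric(records: int, pipeline: str) -> dict:
    return {
        "Namespace": "DataLake/Ingestion",
        "MetricData": [
            {
                "MetricName": "RecordsIngested",
                "Dimensions": [{"Name": "Pipeline", "Value": pipeline}],
                "Timestamp": datetime.now(timezone.utc),
                "Value": float(records),
                "Unit": "Count",
            }
        ],
    }

payload = build_ingest_metric(1200, "realtime-ingress")
```

A CloudWatch alarm on such a metric (e.g., zero records ingested over a window) is a common way to surface pipeline stalls, while CloudTrail covers the audit side by recording the API calls themselves.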



Additional Qualifications:

· 5+ years of AWS solutions implementation and professional services experience, preferably with real-time data processing and analytics.

· Hands-on experience with AWS development frameworks and languages such as Python, Scala, Java, and Node.js.

· Proven analytical, problem solving, and troubleshooting expertise.

· Proficiency in SQL, preferably across a number of dialects (we commonly write MySQL, PostgreSQL, Redshift, SQL Server, Presto, Hive, Spark SQL, and Oracle)
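As an illustration of the dialect-portable SQL this bullet expects, the snippet below runs a plain aggregate query against an in-memory SQLite database standing in for any of the listed engines. The table and rows are invented for the example; the query itself would run largely unchanged on PostgreSQL, Redshift, Presto, or Hive.

```python
import sqlite3

# Illustrative only: an in-memory SQLite database standing in for any
# SQL engine; the events table and its rows are invented for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT, event_type TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [("u1", "purchase", 10.0), ("u1", "purchase", 5.0), ("u2", "refund", -3.0)],
)

# A portable aggregate: total amount per user, highest totals first.
rows = conn.execute(
    """
    SELECT user_id, SUM(amount) AS total
    FROM events
    GROUP BY user_id
    ORDER BY total DESC
    """
).fetchall()
# rows == [("u1", 15.0), ("u2", -3.0)]
```

Dialect differences tend to show up at the edges (window-function support, date arithmetic, `LIMIT` vs `TOP`), which is why the posting asks for breadth rather than a single engine.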

· Exposure to developer tools/workflow (e.g., git/GitHub, *nix, SSH)

· Experience optimizing database/query performance.

· Experience with the AWS ecosystem (EC2, S3, RDS, Redshift)

· Experience with business intelligence tools with a physical model (e.g., MicroStrategy, Business Objects, Cognos)

· Experience with data warehousing.

· Exposure to NoSQL-based, SQL-like technologies (e.g., Hive, Pig, Spark SQL/Shark, Impala, BigQuery)

· Excellent verbal and written communication skills