Job Description:
Position: Big Data/Hadoop Developer
Location: Boston, MA
Duration of contract: Long term
Job Description:
Expertise with the Big Data/Hadoop ecosystem: Spark, Hive, Kafka, Sqoop, Impala, Oozie, HBase, NiFi, Flume, Storm, ZooKeeper, Elasticsearch, Solr, and Kerberos.
In-depth understanding of Spark architecture; performed batch and real-time streaming operations using Spark (Core, Streaming, SQL), RDDs, DStreams, DataFrames, and Datasets. Experience handling large datasets using Spark's in-memory capabilities, partitions, broadcast variables, accumulators, and efficient joins. Used Scala to develop Spark jobs, as sketched below.
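A minimal, illustrative sketch of such a Spark batch job in Scala; the paths, table, and column names are hypothetical, and a standard SparkSession is assumed:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object SparkBatchSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("spark-batch-sketch").getOrCreate()
        import spark.implicits._

        val orders  = spark.read.parquet("/data/orders")   // large fact table (hypothetical)
        val regions = spark.read.parquet("/data/regions")  // small dimension table (hypothetical)

        // broadcast() ships the small dimension to every executor so the
        // large table is never shuffled, one form of "efficient join".
        val enriched = orders.join(broadcast(regions), Seq("region_id"))

        enriched
          .groupBy($"region_name")
          .agg(sum($"amount").as("total_amount"))
          .write.mode("overwrite").parquet("/data/order_totals")

        spark.stop()
      }
    }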
Performed Hive operations on large datasets; proficient in writing HiveQL queries using transactional and performance-oriented features such as MERGE, partitioning, bucketing, and efficient join strategies (see the sketch below).
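A minimal sketch of the partitioning and bucketing side driven from Scala, assuming Spark built with Hive support and a configured metastore; database, table, and column names are hypothetical. MERGE itself is executed in Hive against a transactional (ACID) table and is shown only as a comment:

    import org.apache.spark.sql.SparkSession

    object HiveTableSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("hive-table-sketch")
          .enableHiveSupport()
          .getOrCreate()

        val events = spark.read.parquet("/data/events")  // hypothetical input

        // Partition by date and bucket by user_id so date filters prune
        // partitions and user_id joins avoid full shuffles; bucketBy
        // requires writing through saveAsTable.
        events.write
          .partitionBy("event_date")
          .bucketBy(32, "user_id")
          .sortBy("user_id")
          .format("orc")
          .mode("overwrite")
          .saveAsTable("analytics.events")

        // A Hive-side MERGE against a transactional ORC table would look like:
        //   MERGE INTO analytics.users t USING staging.users s
        //   ON t.user_id = s.user_id
        //   WHEN MATCHED THEN UPDATE SET email = s.email
        //   WHEN NOT MATCHED THEN INSERT VALUES (s.user_id, s.email);
        spark.stop()
      }
    }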
Imported data from relational databases into HDFS/Hive using Sqoop, performed operations on it, and exported the results back with Sqoop.
Performed Cassandra data modeling and data operations using the Cassandra Query Language (CQL), as sketched below.
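A minimal CQL data-modeling sketch driven from Scala, assuming the DataStax Java driver 4.x on the classpath and a local node; the keyspace, table, and columns are hypothetical:

    import com.datastax.oss.driver.api.core.CqlSession

    object CassandraModelSketch {
      def main(args: Array[String]): Unit = {
        // With no explicit contact points, the driver connects to 127.0.0.1:9042.
        val session = CqlSession.builder().build()

        session.execute(
          "CREATE KEYSPACE IF NOT EXISTS iot WITH replication = " +
          "{'class': 'SimpleStrategy', 'replication_factor': 1}")

        // Query-first model: partition by sensor_id, cluster by time descending,
        // so "latest readings for a sensor" is a single-partition read.
        session.execute(
          """CREATE TABLE IF NOT EXISTS iot.readings (
            |  sensor_id  text,
            |  reading_ts timestamp,
            |  value      double,
            |  PRIMARY KEY ((sensor_id), reading_ts)
            |) WITH CLUSTERING ORDER BY (reading_ts DESC)""".stripMargin)

        session.execute(
          "INSERT INTO iot.readings (sensor_id, reading_ts, value) " +
          "VALUES ('s-42', toTimestamp(now()), 21.5)")

        val row = session.execute(
          "SELECT value FROM iot.readings WHERE sensor_id = 's-42' LIMIT 1").one()
        println(s"latest value: ${row.getDouble("value")}")

        session.close()
      }
    }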
Experience designing Talend ETL processes and developing source-to-target mappings. Performed data profiling, migration, extraction, transformation, loading, and data conversions.
Performed operations on real-time data using Storm and Spark Streaming, reading from sources such as Kafka and Flume (see the sketch below).
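A minimal Spark Streaming (DStream) sketch consuming from Kafka in Scala, assuming the spark-streaming-kafka-0-10 integration; the broker address, topic, and group id are hypothetical:

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

    object KafkaStreamSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("kafka-stream-sketch")
        val ssc  = new StreamingContext(conf, Seconds(10))  // 10-second micro-batches

        val kafkaParams = Map[String, Object](
          "bootstrap.servers"  -> "localhost:9092",
          "key.deserializer"   -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id"           -> "stream-sketch",
          "auto.offset.reset"  -> "latest",
          "enable.auto.commit" -> (false: java.lang.Boolean))

        // Each micro-batch arrives as a DStream RDD of Kafka records.
        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

        // Simple word count per batch, printed to the driver log.
        stream.map(_.value)
          .flatMap(_.split("\\s+"))
          .map((_, 1L))
          .reduceByKey(_ + _)
          .print()

        ssc.start()
        ssc.awaitTermination()
      }
    }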
Experience writing Pig Latin scripts to process, analyze, and manipulate data files to derive the required statistics.
Experience storing and retrieving documents using the ELK Stack and Apache Solr (see the SolrJ sketch below).
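A minimal store-and-retrieve sketch against Solr using the SolrJ client from Scala; the core URL and field names are hypothetical:

    import org.apache.solr.client.solrj.SolrQuery
    import org.apache.solr.client.solrj.impl.HttpSolrClient
    import org.apache.solr.common.SolrInputDocument

    object SolrSketch {
      def main(args: Array[String]): Unit = {
        // Points at a hypothetical core named "docs" on a local Solr instance.
        val client = new HttpSolrClient.Builder("http://localhost:8983/solr/docs").build()

        // Index a document.
        val doc = new SolrInputDocument()
        doc.addField("id", "1")
        doc.addField("title", "hadoop streaming notes")
        client.add(doc)
        client.commit()

        // Retrieve it back with a field query.
        val results = client.query(new SolrQuery("title:hadoop")).getResults
        results.forEach(d => println(d.getFieldValue("title")))

        client.close()
      }
    }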
Experience with different file formats, including Parquet, ORC, Avro, SequenceFile, CSV, XML, JSON, and plain text (see the sketch below).
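A minimal Scala sketch of reading and writing several of these formats with Spark; the paths are hypothetical, and Avro assumes the external spark-avro package is on the classpath:

    import org.apache.spark.sql.SparkSession

    object FileFormatSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("file-format-sketch").getOrCreate()

        // Row-oriented text formats: schema inferred at read time.
        val csv  = spark.read.option("header", "true").option("inferSchema", "true")
          .csv("/data/in.csv")
        val json = spark.read.json("/data/in.json")

        // Columnar formats: better compression and predicate pushdown for analytics.
        csv.write.mode("overwrite").parquet("/data/out_parquet")
        json.write.mode("overwrite").orc("/data/out_orc")

        // Avro requires the spark-avro package.
        csv.write.mode("overwrite").format("avro").save("/data/out_avro")

        spark.stop()
      }
    }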
Experience with Big Data/Hadoop distributions: Cloudera, Hortonworks, and Amazon AWS.