Job Description:

Expertise with the Big Data/Hadoop ecosystem: Spark, Hive, Kafka, Sqoop, Impala, Oozie, HBase, NiFi, Flume, Storm, ZooKeeper, Elasticsearch, Solr, Kerberos
In-depth understanding of Spark architecture; performed batch and real-time data streaming operations using Spark (Core, Streaming, SQL) with RDDs, DStreams, DataFrames, and Datasets
Experience handling large datasets using Spark's in-memory capabilities, partitioning, broadcast variables, accumulators, and efficient join strategies
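
A minimal Scala sketch of the join tuning described above, assuming hypothetical orders and dim_customers inputs; the broadcast hint and repartition call are standard Spark DataFrame APIs:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.broadcast

    object JoinTuningSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("join-tuning-sketch").getOrCreate()

        // Hypothetical inputs: a large fact table and a small dimension table.
        val orders       = spark.read.parquet("/data/orders")        // large
        val dimCustomers = spark.read.parquet("/data/dim_customers") // small

        // Broadcast the small side so the join avoids shuffling the large side.
        val enriched = orders.join(broadcast(dimCustomers), Seq("customer_id"))

        // Repartition on the grouping key before a wide aggregation.
        val daily = enriched
          .repartition(200, enriched("order_date"))
          .groupBy("order_date")
          .count()

        daily.write.mode("overwrite").parquet("/data/daily_order_counts")
        spark.stop()
      }
    }
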
Used Scala to develop Spark jobs. Performed Hive operations on large datasets, with proficiency in writing HiveQL queries using transactional and performance-oriented constructs: MERGE, partitioning, bucketing, and efficient join operations
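
As one illustration of the partitioning and bucketing concepts above, this Scala sketch writes a partitioned, bucketed table through Spark SQL; the database, table, and column names are hypothetical, and a Hive ACID MERGE itself would be issued in Hive rather than through this API:

    import org.apache.spark.sql.SparkSession

    object HiveLayoutSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("hive-layout-sketch")
          .enableHiveSupport()
          .getOrCreate()

        val events = spark.read.parquet("/data/raw_events") // hypothetical input

        // Partition by date and bucket by user_id so filters and joins on those keys prune work.
        events.write
          .mode("overwrite")
          .partitionBy("event_date")
          .bucketBy(32, "user_id")
          .sortBy("user_id")
          .saveAsTable("analytics.events_bucketed")

        // HiveQL-style query against the managed table.
        spark.sql(
          """SELECT event_date, COUNT(*) AS events
            |FROM analytics.events_bucketed
            |WHERE event_date = '2020-01-01'
            |GROUP BY event_date""".stripMargin).show()

        spark.stop()
      }
    }
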
Imported data from relational databases into HDFS/Hive with Sqoop, performed transformations, and exported the results back. Performed Cassandra data modeling and data operations using the Cassandra Query Language (CQL)
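
Sqoop itself is a command-line tool; as a rough Scala analog of the import/transform/export flow described above, the sketch below uses Spark's built-in JDBC reader and writer instead, with hypothetical connection details and assuming the matching JDBC driver is on the classpath:

    import org.apache.spark.sql.SparkSession

    object RdbmsRoundTripSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("rdbms-roundtrip-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // Hypothetical JDBC connection details.
        val jdbcUrl = "jdbc:mysql://db-host:3306/sales"
        val props = new java.util.Properties()
        props.setProperty("user", "etl_user")
        props.setProperty("password", "secret")

        // "Import": pull a relational table into Hive.
        val customers = spark.read.jdbc(jdbcUrl, "customers", props)
        customers.write.mode("overwrite").saveAsTable("staging.customers")

        // Transform, then "export" the result back to the database.
        val summary = spark.table("staging.customers")
          .groupBy("country")
          .count()
        summary.write.mode("overwrite").jdbc(jdbcUrl, "customer_counts_by_country", props)

        spark.stop()
      }
    }
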
Experience designing Talend ETL processes and developing source-to-target mappings
Performed data profiling, migration, extraction, transformation, loading, and data conversions
Performed operations on real-time data using Storm and Spark Streaming, consuming from sources such as Kafka and Flume
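
A minimal Scala sketch of consuming a Kafka topic with Spark Structured Streaming, one of the real-time paths mentioned above; the broker addresses, topic name, and checkpoint path are hypothetical, and the spark-sql-kafka connector is assumed to be on the classpath:

    import org.apache.spark.sql.SparkSession

    object KafkaStreamSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("kafka-stream-sketch").getOrCreate()

        // Subscribe to a hypothetical "clickstream" topic.
        val raw = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
          .option("subscribe", "clickstream")
          .option("startingOffsets", "latest")
          .load()

        // Kafka delivers key/value as binary; cast the payload to string for downstream parsing.
        val messages = raw.selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")

        // Write micro-batches to the console; a real job would parse and persist them instead.
        val query = messages.writeStream
          .format("console")
          .option("checkpointLocation", "/tmp/checkpoints/clickstream")
          .start()

        query.awaitTermination()
      }
    }
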
Experience writing Pig Latin scripts to process, analyze, and manipulate data files to produce the required statistics
Experience storing and retrieving documents using the ELK Stack and Apache Solr
Experience with file formats such as Parquet, ORC, Avro, SequenceFile, CSV, XML, JSON, and plain text
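
The sketch below reads and writes a few of the formats listed above through Spark's DataFrame API in Scala; the paths are hypothetical, and writing Avro assumes the spark-avro package is available:

    import org.apache.spark.sql.SparkSession

    object FileFormatSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("file-format-sketch").getOrCreate()

        // CSV with a header row; schema is inferred here for brevity.
        val csvDf = spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("/data/in/trades.csv")

        // JSON (one record per line).
        val jsonDf = spark.read.json("/data/in/events.json")

        // Columnar outputs: Parquet and ORC are built in; Avro needs the spark-avro package.
        csvDf.write.mode("overwrite").parquet("/data/out/trades_parquet")
        jsonDf.write.mode("overwrite").orc("/data/out/events_orc")
        jsonDf.write.mode("overwrite").format("avro").save("/data/out/events_avro")

        spark.stop()
      }
    }
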
Experience with Big Data Hadoop distributions: Cloudera, Hortonworks, and Amazon AWS.
             
