Job Description :
Position : Big Data Architect (Spark Streaming)

Location: Milwaukee, WI

Contract Duration : 6+ Months

Interview process: Webex

Visa: USC, GC, GC-EAD, H1B, H4-EAD, TN


12+ years of total IT experience, including 3+ years of Big Data experience (Amazon EMR,
Amazon Kinesis, Java, Spark Streaming, Spark SQL, HBase, Hive, and Sqoop). Hands-on
experience with Big Data tools and technologies is mandatory.
Experience in building real-time data streaming pipelines from Amazon Kinesis to Hive using
Spark Streaming on Amazon EMR.
At least one year of experience designing and executing Hadoop solutions on Amazon EMR.
Knowledge of Amazon services including S3, EC2, Kinesis, Firehose, and CloudWatch.
Proven experience driving technology and architectural execution for enterprise-grade
solutions based on Big Data platforms.
Has designed at least one Hadoop data lake end to end using the above Big Data technologies.
Experience in designing Hive and HBase data models for storage and high-performance queries.
Knowledge of standard methodologies, concepts, best practices, and procedures within Amazon
EMR Big Data environment.
Proficient in Linux/Unix scripting.
Bachelor's degree in Engineering (Computer Science or Information Technology). Master's
degree in Finance, Computer Science, or Information Technology is a plus.
Experience in Agile methodology is a must.
Experience with Storm and NoSQL databases (e.g., Cassandra) is desirable.
Good communication and problem-solving skills.
Responsibilities:
Define big data solutions that deliver value to the customer; understand customer use cases
and workflows and translate them into engineering deliverables.
Architect and design Hadoop solutions.
Actively participate in Scrum calls; work closely with the product owner and Scrum Master on
sprint planning, estimates, and story points.
Break user stories into actionable technical stories, identify dependencies, and plan the
execution across sprints.
Design batch and real-time load jobs from a broad variety of data sources into Hadoop, and
design ETL jobs to read data from Hadoop and deliver it to a variety of consumers/destinations.
Perform analysis of vast data stores and uncover insights.
Maintain security and data privacy; create scalable, high-performance web services for data
tracking.
Propose best practices and standards and implement them in the deliverables.
Analyze long-running queries and jobs, and performance-tune them using query optimization
techniques and Spark code optimization.