The Data Architect is an expert in the definition, design, and implementation of data solutions that adhere to enterprise architecture strategies, processes, and standards. This role requires expertise in big data technologies: the Hadoop ecosystem, NoSQL and other distributed storage, Spark, Kafka, and their cloud-compatible equivalents among the AWS and Azure analytics services. Candidates should have experience implementing big data solutions on a large-scale project.
This individual must be hands-on, responsible for solution development and demos using IoT platforms, cloud infrastructure, and analytics services.
Responsibilities
• Shape and drive the architecture, design, and technical capabilities in big data ecosystem technologies, with hands-on experience implementing scalable data ingestion, storage, processing, and publication solutions
• Experience architecting large-scale cloud systems that handle large volumes of data in both real-time and batch modes
• Expertise in designing end-to-end data engineering solutions using a variety of data processing patterns
• Experience building cloud deployment architectures for large-scale distributed systems
• Solid experience developing enterprise-level data governance practices: metadata, data lineage, data classification, data security, and data lifecycle (a minimal catalog-registration sketch follows this list)
• Hands-on experience with public and private cloud capabilities, including compute, storage, databases, and APIs
• Experience with normalized data stores, operational data stores, dimensional data stores, and enterprise data lakes
• Experience with some of the following analytical use cases: fraud detection (credit cards), forecasting and budgeting (finance), designing cellular/mobile packages by analyzing call patterns (telecommunications), market basket analysis (retail), customer risk profiling (insurance), usage monitoring (energy and utilities), and machine service times (manufacturing)
• Own the design and development of the solutions being delivered
• Design and implement the non-functional requirements (NFR) strategy across the different layers of the architecture
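As one hedged illustration of the metadata management work referenced above, the sketch below registers a data-lake table in the AWS Glue Data Catalog with boto3. The region, database, table, columns, and S3 location are hypothetical placeholders, and the catalog database is assumed to already exist.

```python
# Minimal sketch: register a Parquet data-lake table in the AWS Glue Data Catalog.
# All names (region, database, table, columns, bucket) are hypothetical examples.
import boto3

glue = boto3.client("glue", region_name="us-east-1")  # hypothetical region

glue.create_table(
    DatabaseName="example_lake_db",  # assumed to exist in the catalog
    TableInput={
        "Name": "iot_events_raw",
        "Parameters": {"classification": "parquet"},
        "StorageDescriptor": {
            "Columns": [
                {"Name": "device_id", "Type": "string"},
                {"Name": "reading", "Type": "double"},
                {"Name": "event_time", "Type": "timestamp"},
            ],
            "Location": "s3://example-data-lake/raw/iot-events/",  # hypothetical bucket
            "InputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
            "SerdeInfo": {
                "SerializationLibrary": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
            },
        },
    },
)
```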
Skills
• Experience with big data processing and analytics frameworks (Hadoop/EMR/HDInsight), AWS analytics services, and Azure analytics services
• Ingestion frameworks – Sqoop, Flume, AWS Glue, Azure Data Factory, Apache Kafka/AWS Kinesis/Azure Event Hubs (a minimal Kafka-to-data-lake sketch follows this list)
• Big data storage (HDFS, AWS S3, Azure Blob Storage); big data file formats, compression, and serialization
• Data lakes – Azure and AWS offerings
• Experience with some of the distributed processing engines – Apache Spark, Apache Flink, Apache Solr
• Experience with data warehouse systems – Apache Hive, AWS Athena, AWS Redshift, Azure Synapse Analytics
• Metadata management for big data systems – HCatalog, AWS Glue Data Catalog, Azure Data Catalog
• Workflow frameworks – Apache Oozie, Apache Airflow, Luigi
• NoSQL database modelling – Cassandra, MongoDB
• Modelling for cloud databases – Aurora, RDS, Azure SQL Database, DynamoDB, Cosmos DB
• Data visualization – Kibana, Power BI, and Tableau
• Experience with ETL tools such as Informatica, Talend, and cloud-specific tools
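To make the ingestion and processing expectations concrete, the following is a minimal, hedged sketch of a PySpark Structured Streaming job that reads events from a Kafka topic and lands them in a data lake as Parquet. The broker, topic, schema, and s3a:// paths are hypothetical, and the cluster is assumed to have the spark-sql-kafka connector available.

```python
# Minimal sketch: stream events from Kafka into a Parquet data lake with PySpark.
# Broker, topic, schema, and paths are hypothetical; requires the spark-sql-kafka package.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("kafka-to-datalake").getOrCreate()

# Assumed event schema for this example
schema = StructType([
    StructField("device_id", StringType()),
    StructField("reading", DoubleType()),
    StructField("event_time", TimestampType()),
])

# Ingest raw events from a hypothetical Kafka topic
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker-1:9092")  # hypothetical brokers
       .option("subscribe", "iot-events")                   # hypothetical topic
       .load())

# Parse the Kafka value payload into typed columns
events = (raw
          .select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Publish to the data lake as Parquet with checkpointing for fault tolerance
query = (events.writeStream
         .format("parquet")
         .option("path", "s3a://example-data-lake/raw/iot-events/")               # hypothetical bucket
         .option("checkpointLocation", "s3a://example-data-lake/checkpoints/iot/")  # hypothetical path
         .outputMode("append")
         .start())

query.awaitTermination()
```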
Nice to have
• Machine learning and deep learning experience
• Certifications – AWS Certified Data Analytics, Azure Data Solution certifications