Job Description:
Current data platform to be moved from HDFS to AWS
EC2, S3, databases, batch processing using Spark (lang
Start of a data fabric: a platform for data
Taking a cafeteria-style model: walk up, pick the data you want, and use it
Current setup is specific data shapes in S3, then loaded to a database for apps (it doesn’t scale; each one is a one-off)
Goal is to build for scale and democratize the data
Spark runs on clusters now; they want to move to a platform service and offer database as a service (auto-load data shapes)
AWS is required as they want to build a platform team around this resource going forward.
Spark is also required (Scala is the base code, some in Python, but they want to move to Java and Scala)
Need to drive the system by metadata (semantic layer, model, etc.)
Abstract design for metadata systems. Client is using Airflow.
Moving toward a serverless app framework.
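
For candidates unfamiliar with the term, here is a minimal sketch of what "driving the system by metadata" could look like. It is an illustration only, not the client's design: DatasetSpec, CATALOG, build_load_plan, the bucket paths, and the table names are all hypothetical. The idea is that each data shape is described once in a catalog, and load steps are derived from that description instead of being hand-built per application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetSpec:
    """Hypothetical metadata record describing one data shape."""
    name: str
    source_uri: str    # illustrative S3 prefix, not a real location
    fmt: str           # file format, e.g. "parquet" or "csv"
    target_table: str  # table the shape would be loaded into


# Hypothetical catalog; in practice this would live in a metadata store.
CATALOG = [
    DatasetSpec("orders", "s3://example-bucket/orders/", "parquet", "app.orders"),
    DatasetSpec("customers", "s3://example-bucket/customers/", "csv", "app.customers"),
]


def build_load_plan(catalog, wanted):
    """Return one load step per requested dataset, derived only from metadata."""
    by_name = {spec.name: spec for spec in catalog}
    missing = [name for name in wanted if name not in by_name]
    if missing:
        raise KeyError(f"unknown datasets: {missing}")
    return [
        {
            "task_id": f"load_{name}",
            "read": by_name[name].source_uri,
            "format": by_name[name].fmt,
            "write": by_name[name].target_table,
        }
        for name in wanted
    ]


if __name__ == "__main__":
    # "Cafeteria style": the consumer picks datasets by name.
    for step in build_load_plan(CATALOG, ["orders"]):
        print(step)
```

Each step in the plan could then be handed to a scheduler task or a Spark job; how that wiring is done is left open here.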