Job Description:
Title: Data Engineer
Location: Menlo Park, CA
Interview Mode: Phone and Skype
Client: HCL/Facebook

Key skills – Python, Tableau, Data Pipelines, SQL, Hive

Preferred Skills
These skills are listed in order of importance and include some of the programming languages that will be used.
Ability to develop shared and/or independent micro services for CaMPe using Python and Thrift.
Expertise with SQL and building Tableau dashboards.
Building data pipelines using Python against Hive and MySQL tables.
Ability to write JavaScript (in particular with the React framework) to help with custom UI features where needed across all systems in CaMPe.
Ability to write integration and unit tests, as well as benchmarking tests in Python.
Ability to debug and diagnose PHP code to help support our legacy systems (e.g., CaMP
Expertise in Excel, plotting, basic analysis and support of various views needed by the business but not supported by our systems yet.

Projects
Here we outline a number of projects for the contractors, based on our current business needs and resourcing gaps.
As with the skills, these are listed in order of importance.

DEVELOPING MICRO SERVICES
A few examples of these micro services include:
A micro service that acts as a wrapper around the new Hardware Roadmap Portal maintained by Bizapps. Given a hardware program or CEA rack type, along with a time period, return resource info (e.g., disk, flash, RCU, memory) and other attributes (e.g., power, CFM, cost
A micro service that acts as a dictionary look-up for all our planning units (e.g., server types, rack types, regions
A micro service that maintains our product hierarchy to be used across all tools maintained by CEA. A mapping from Product Group to Product to Service.
A micro service that archives all critical MySQL tables to Hive on a daily basis for further processing and analytics.
Extracting MySQL statements from code into separate micro services to make the codebase more modular and flexible.
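
As a rough illustration of the product hierarchy service described above, the following minimal sketch keeps the Product Group to Product to Service mapping in a nested dictionary and builds a reverse index from it. All group, product and service names are hypothetical sample data, and the sketch leaves out the Thrift interface and any persistent storage that a real service would need.

```python
# Hypothetical sample data: Product Group -> Product -> list of Services.
HIERARCHY = {
    "Infrastructure": {
        "Storage": ["blob-store", "cold-archive"],
        "Cache": ["kv-cache"],
    },
    "Ads": {
        "Delivery": ["ad-ranker"],
    },
}


def build_reverse_index(hierarchy):
    """Map each service name to its (product_group, product) pair."""
    index = {}
    for group, products in hierarchy.items():
        for product, services in products.items():
            for service in services:
                # A service listed twice would make the mapping ambiguous.
                if service in index:
                    raise ValueError(f"service {service!r} appears twice")
                index[service] = (group, product)
    return index


def services_for_group(hierarchy, group):
    """Return all services under a product group, sorted by name."""
    products = hierarchy.get(group, {})
    return sorted(s for services in products.values() for s in services)


REVERSE_INDEX = build_reverse_index(HIERARCHY)
```

With this layout, REVERSE_INDEX answers "which Product Group and Product own this service?" and services_for_group answers the opposite question, which are the two look-ups other tools would most likely need.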

These micro services have clear requirements, inputs, outputs and well-defined test cases. They are also relatively independent and will not require a lot of domain expertise or extensive code reviews.
They are also relatively small in size and the designs are somewhat simple in nature. It is just that we have many such micro services that need to be
built but we currently don't have the bandwidth to tackle them, especially since it is mostly plumbing work that doesn't
directly help our stakeholders, but is critical for building the underlying platform for CaMPe (our suite of capacity management and performance evaluation solutions).


CREATING TABLEAU DASHBOARDS
We essentially need a Tableau dashboard to serve as the landing page for every system we maintain as part of CaMPe.
A few, non-exhaustive, examples of such dashboards include:
CORD
o Breakdown of orders by service, major rack type and region, based on pre-defined selectors.
o Further breakdowns by CEA rack type where appropriate.
o Overview of top services ordering a particular rack type, globally and per region.
o Overview of top services based on power used or Capex, globally and per region.
o Time-series view of orders across quarters.
o A map-based view of our orders across regions, data centers and clusters.
o A histogram of delivery weeks for all our orders, broken down by rack type and region.
Expense Allocation Portal (EAP)
o Breakdowns and views that are essentially identical to CORD, with some minor differences: they cover demand as opposed to orders, use need-by weeks as opposed to delivery weeks, and only look at major rack types, limiting location info to regions.
o Overview of top services that got cut, globally, per region and per rack type.
o Overview of top services that will pull from spares, globally, per region and per rack type.
o Overview of data discrepancies when compared to CORD, to capture last-minute changes in CORD (due to urgent business needs) that were not reflected in EAP.
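
The "top services" views listed for CORD and EAP come down to a grouped aggregation with an optional region filter. The sketch below shows that logic in plain Python rather than in Tableau or SQL, purely to make the expected output concrete. The row layout, service names, rack types and regions are all hypothetical.

```python
from collections import defaultdict

# Hypothetical order rows: (service, rack_type, region, rack_count).
ORDERS = [
    ("svc-a", "type-1", "region-x", 40),
    ("svc-b", "type-1", "region-x", 25),
    ("svc-a", "type-1", "region-y", 10),
    ("svc-c", "type-2", "region-x", 30),
    ("svc-b", "type-1", "region-y", 35),
]


def top_services(orders, rack_type, region=None, limit=3):
    """Rank services by total racks ordered for one rack type.

    With region=None the ranking is global; otherwise it is limited
    to the given region.
    """
    totals = defaultdict(int)
    for service, row_rack_type, row_region, count in orders:
        if row_rack_type != rack_type:
            continue
        if region is not None and row_region != region:
            continue
        totals[service] += count
    # Largest totals first; ties are broken by service name.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]
```

For example, top_services(ORDERS, "type-1") gives the global ranking, while top_services(ORDERS, "type-1", region="region-x") restricts it to a single region.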

Capacity Requests Portal (CRP)
o End of quarter summary report for planned quotas, operational quotas and allocations, broken out by Product Group (or Product or Service), major rack type and region.
o Status report for a Product Group (or Product or Service) in terms of allocations against operational quota and remaining headroom, with headroom broken down by major rack type and region.

HW Refresh Portal (HWR)
o Reconciliation report that ensures orders (in CORD) exceed agreed counts (in the HW Refresh Portal) to support all migrating clusters.
o An initial reconciliation view already exists in the HW Refresh Portal.
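
The reconciliation check behind such a report can be sketched as a comparison of two count tables. The version below is only an illustration: the (cluster, rack type) key, the sample numbers and the function name are hypothetical, and it assumes that an order count equal to the agreed count is sufficient, which may not match the actual business rule.

```python
def find_shortfalls(ordered, agreed):
    """Return {key: missing_count} for every key where orders fall short.

    Both arguments map a (cluster, rack_type) key to a rack count.
    Keys missing from `ordered` are treated as zero racks ordered.
    """
    shortfalls = {}
    for key, needed in agreed.items():
        have = ordered.get(key, 0)
        if have < needed:
            shortfalls[key] = needed - have
    return shortfalls


# Hypothetical sample data for two migrating clusters.
AGREED = {("cluster-1", "type-1"): 20, ("cluster-2", "type-1"): 15}
ORDERED = {("cluster-1", "type-1"): 22, ("cluster-2", "type-1"): 9}
```

Here find_shortfalls(ORDERED, AGREED) reports that cluster-2 is six racks short, while cluster-1 is covered.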

On Demand Planner (ODP)
o Output of regionalization module broken out by Product Group, rack type and region.
o Variety of views used to present the ODP data in leadership meetings that have already been compiled in a number of slide decks.

CODE AND DATA MANAGEMENT
There are a number of platform pieces related to code and data management, namely:
Moving and refactoring the codebase so that it is consistent across all systems in CaMPe.
Writing integration and unit tests where needed.
Writing performance benchmarks for our APIs and UIs.
Excel support for data pulled out of various systems to enable various types of analyses needed by our stakeholders.
Helping with on-call rotations, especially debugging legacy systems, which often involves finding data integrity problems rather than making code changes.
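
To give a sense of the testing and benchmarking work listed above, here is a minimal sketch that uses only the Python standard library (unittest and timeit). The lookup_unit function and its data are hypothetical stand-ins for a real CaMPe API, and the team's actual test framework and benchmarking tools may differ.

```python
import timeit
import unittest

# Hypothetical planning-unit table standing in for a real look-up service.
PLANNING_UNITS = {"type-1": "rack", "region-x": "region"}


def lookup_unit(name):
    """Return the kind of a planning unit, or None if it is unknown."""
    return PLANNING_UNITS.get(name)


class LookupUnitTest(unittest.TestCase):
    def test_known_unit(self):
        self.assertEqual(lookup_unit("type-1"), "rack")

    def test_unknown_unit(self):
        self.assertIsNone(lookup_unit("missing"))


def run_tests():
    """Run the unit tests above and return True if they all pass."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(LookupUnitTest)
    return unittest.TextTestRunner(verbosity=0).run(suite).wasSuccessful()


def best_seconds_per_call(fn, repeat=5, number=10_000):
    """Time `fn` with timeit and return the best average seconds per call."""
    totals = timeit.repeat(fn, repeat=repeat, number=number)
    return min(totals) / number
```

Calling run_tests() checks correctness, and best_seconds_per_call(lambda: lookup_unit("type-1")) gives a rough per-call latency that can be tracked over time.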


REVAMPING W8 DASHBOARD
There is a need to revamp the W8 dashboard, based on guidance from CEA members, to make the data more actionable
and more connected to utilization statistics (and other views) in Overwatch. This may also involve cleaning up and generalizing the underlying data pipelines.