Responsibilities:
1. Working with the team to prototype human rating and semi-automatic model evaluation protocols.
2. For high-stakes and/or high-complexity protocols, conducting small-to-medium-scale in-house
human rating of generated content.
3. Working with engineers to identify and summarize model gaps based on the above evaluations.
4. Working with engineers to identify the data needed to close those gaps and mine that data to
prepare training data for the next model iteration.
5. Working with PDO teams to scale up validated evaluation protocols, including coordination and
auditing.
Qualifications:
1. Associate's degree or equivalent training in Computer Science, Electronic Engineering,
Physics, Bioinformatics, or another STEM field required.
2. Basic knowledge of Python, Unix, and SQL.
3. Basic knowledge of computer vision and generative models.
4. Hands-on experience working with large language models (LLMs).
5. Strong understanding of core AI/ML concepts.
6. Solid experience working with data pipelines, data engineering workflows, and image data
processing.
7. Experience prompting models and interpreting model outputs.
8. Demonstrated ability with model-assisted annotation or analysis tools.
9. Attention to detail and an aptitude for experimental investigation.
10. Prior industry experience in software development and testing and/or research experience in
human-computer interaction is preferred.