ABOUT THE TEAM
OpenAI’s People team hires, engages, and retains world-class talent to safely build and deploy AGI that benefits all of humanity. The People Analytics team helps leaders make rigorous, evidence-based talent decisions and ensures that the systems supporting those decisions are valid, reliable, fair, and accountable.
ABOUT THE ROLE
As a People Data Scientist focused on AI fairness and bias testing, you will help establish how OpenAI evaluates AI-assisted People systems and high-impact talent processes. You will design and conduct rigorous assessments to identify, measure, and mitigate potential bias across the lifecycle of models, agents, decision-support tools, and automated workflows.
Your work will span the entire employee lifecycle, including hiring, performance, promotion, employee development, and workforce planning. You will evaluate both technical systems and the broader human-AI decision processes in which they operate, examining not only model performance but also data quality, measurement validity, differential outcomes, human oversight, and unintended consequences.
We’re looking for an experienced data scientist or applied researcher who can translate complex fairness questions into defensible evaluation strategies, scalable testing infrastructure, and clear recommendations for technical teams and senior leaders.
This role is preferably based in San Francisco, CA.
IN THIS ROLE, YOU WILL
- Define and lead fairness and bias-testing strategies for AI-assisted People processes, models, agents, and decision-support systems from development through deployment and ongoing monitoring.
- Design rigorous algorithmic audits and validation studies, including adverse-impact analysis, subgroup and intersectional evaluation, error-rate analysis, calibration, measurement invariance, reliability, criterion-related validity, and sensitivity testing.
- Identify the appropriate fairness criteria for each use case, evaluate tradeoffs among competing definitions of fairness, and clearly document the assumptions, limitations, and residual risks of each approach.
- Evaluate end-to-end human-AI decision systems, including model outputs, user behavior, human overrides, escalation pathways, and whether AI assistance changes the quality, consistency, or equity of decisions.
- Develop evaluation approaches for generative and agentic AI, including test-set design, counterfactual testing, behavioral evaluation, human-rating studies, robustness testing, and analysis of disparate performance across populations and contexts.
- Investigate the sources of observed disparities, including data representation, label and measurement bias, proxy variables, model design, decision thresholds, workflow design, and differential adoption or usage.
- Partner with engineering, People Operations, Legal, Privacy, Security, and People Systems teams to recommend and evaluate mitigations such as data improvements, model changes, threshold adjustments, workflow redesign, monitoring controls, and additional human oversight.
- Build scalable fairness-evaluation infrastructure, including reusable datasets, automated validation pipelines, regression tests, monitoring systems, self-service tools, and standardized reporting.
- Establish research and documentation standards for fairness test plans, dataset and model documentation, validation reports, limitations, monitoring plans, and decision records.
- Translate complex findings into concise, decision-ready narratives, helping leaders understand the significance of identified risks, the strength of the evidence, available mitigation options, and remaining uncertainty.
YOU MIGHT THRIVE IN THIS ROLE IF YOU HAVE
- Deep expertise in algorithmic fairness, bias measurement, responsible AI, psychometrics, applied statistics, or the evaluation of high-impact decision systems.
- Exceptional strength in research design, measurement, experimentation, causal inference, and statistica