Full-time, Product, Worldwide

Senior+ Software Engineer - Research Platform, Consumer Devices

at OpenAI

Join OpenAI's Future of Computing Research team to build tools and services that enable AI research, evaluation, and data generation workflows. Work with world-class ML researchers and design talent to push the frontier of model capabilities.

Job Description

ABOUT THE TEAM

The Future of Computing Research team is an Applied Research team within the Consumer Devices group focused on developing new methods and models as we advance our mission of building AGI that benefits all of humanity.

As a Software Engineer on the Future of Computing Research team, you will work together with both the best ML researchers in the world and the greatest design talent of our generation to push the frontier of model capabilities.

ABOUT THE ROLE

We are looking for a Software Engineer to join our team to build tools and services that enable AI research, evaluation, and data generation workflows.

The best work in this role will start with an ambiguous design question and turn it into working research systems. You will work closely with researchers, designers, and engineers to build the evaluation systems, synthetic data generation pipelines, review tools, and supporting platform services.

The goal is to make these workflows easier to create, run, and trust without requiring bespoke engineering support for each new design concept. You will help ensure that research artifacts have a clear lifecycle, runs are reproducible and observable, and results provide useful evidence for product and model-training decisions while the underlying systems remain reliable and reusable.

This role is based in San Francisco, CA. We use a hybrid work model of four days in the office per week and offer relocation assistance to new employees.

IN THIS ROLE, YOU WILL

Build web applications, APIs, data models, and backend services for AI research workflows.

Build tools to author and manage evaluation tasks, rubrics, graders, suites, and rollout configurations, including workflows for publishing, versioning, auditing, and sharing research artifacts.

Automate evaluation runs and generate useful reports for design, research, and engineering teams.

Support synthetic data generation workflows for multimodal and conversational research, including tools that combine transcripts, media, and model comparisons.

Translate product and research questions into measurable scenarios, automated graders, and human-evaluation campaigns, and develop measures of task quality, coverage, diversity, and semantic spread.

Diagnose issues across application code, workers, model endpoints, deployments, and compute infrastructure, and improve reliability through health checks, observability, reproducible launch paths, data integrity safeguards, and automated verification.

Lead migrations and dependent changes across research tools, evaluation systems, and supporting services.

Partner closely with designers, model researchers, research engineers, and infrastructure teams, and onboard contributors to create high-quality evaluation and synthetic-data workflows.

YOU MIGHT THRIVE IN THIS ROLE IF YOU

Have 7+ years of professional software engineering experience.

Have strong full-stack experience across web applications, backend services, APIs, and data models, including ownership of complex systems spanning multiple services or repositories.

Have expertise in generative AI, multimodal models, or model-evaluation systems.

Have built effective internal tools for both technical and non-technical users.

Are comfortable debugging distributed workflows and production infrastructure.

Have strong product judgment and can translate ambiguous requirements into concrete plans.

Are energized by working between designers and researchers in a multidisciplinary team, connecting qualitative judgment to rigorous evidence.

Communicate clearly and work effectively across engineering, design, and research.

NICE TO HAVE

Expertise in synthetic data generation, simulation, conversational AI, speech, video, motion, or embodied interaction.

Experience with automated graders, human evaluation, supervised fine-tuning, reinforcement learning, or experiment- and dataset-management platforms.

Benefits & Perks

Hybrid work model of four days in the office per week
Relocation assistance for new employees

Skills

Full-stack development, Backend services, APIs, Generative AI, Model evaluation, Distributed systems, Data pipelines, Internal tools, Research platforms, Multimodal models
