ABOUT THE TEAM
The Future of Computing Research team is an Applied Research team within the Consumer Devices group focused on developing new methods and models as we advance our mission of building AGI that benefits all of humanity.
As a Software Engineer on the Future of Computing Research team, you will work with both the best ML researchers in the world and the greatest design talent of our generation to push the frontier of model capabilities.
ABOUT THE ROLE
We are looking for a Software Engineer to build tools and services that enable AI research, evaluation, and data generation workflows.
The best work in this role will start with an ambiguous design question and turn it into working research systems. You will work closely with researchers, designers, and engineers to build the evaluation systems, synthetic data generation pipelines, review tools, and supporting platform services.
The goal is to make these workflows easier to create, run, and trust without requiring bespoke engineering support for each new design concept. You will help ensure that research artifacts have a clear lifecycle, runs are reproducible and observable, and results provide useful evidence for product and model-training decisions while the underlying systems remain reliable and reusable.
This role is based in San Francisco, CA. We use a hybrid work model of four days in the office per week and offer relocation assistance to new employees.
IN THIS ROLE, YOU WILL
- Build web applications, APIs, data models, and backend services for AI research workflows.
- Build tools to author and manage evaluation tasks, rubrics, graders, suites, and rollout configurations, including workflows for publishing, versioning, auditing, and sharing research artifacts.
- Automate evaluation runs and generate useful reports for design, research, and engineering teams.
- Support synthetic data generation workflows for multimodal and conversational research, including tools that combine transcripts, media, and model comparisons.
- Translate product and research questions into measurable scenarios, automated graders, and human-evaluation campaigns, and develop measures of task quality, coverage, diversity, and semantic spread.
- Diagnose issues across application code, workers, model endpoints, deployments, and compute infrastructure, and improve reliability through health checks, observability, reproducible launch paths, data integrity safeguards, and automated verification.
- Lead migrations and dependent changes across research tools, evaluation systems, and supporting services.
- Partner closely with designers, model researchers, research engineers, and infrastructure teams, and onboard contributors so they can create high-quality evaluation and synthetic-data workflows.
YOU MIGHT THRIVE IN THIS ROLE IF YOU
- Have 7+ years of professional software engineering experience.
- Have strong full-stack experience across web applications, backend services, APIs, and data models, including ownership of complex systems spanning multiple services or repositories.
- Have expertise in generative AI, multimodal models, or model-evaluation systems.
- Have built effective internal tools for both technical and non-technical users.
- Are comfortable debugging distributed workflows and production infrastructure.
- Have strong product judgment and can translate ambiguous requirements into concrete plans.
- Are energized by working between designers and researchers in a multidisciplinary team, connecting qualitative judgment to rigorous evidence.
- Communicate clearly and work effectively across engineering, design, and research.
NICE TO HAVE
- Expertise in synthetic data generation, simulation, conversational AI, speech, video, motion, or embodied interaction.
- Experience with automated graders, human evaluation, supervised fine-tuning, reinforcement learning, or experiment- and dataset-management platforms.