OpenAI · Safety Systems

Researcher, Misalignment Futures

New York, NY; San Francisco, CA; Remote (US) | 40 hrs/week | Posted January 27, 2026 | Req OAI-260128-MF01

About the team

Safety Systems sits at the forefront of OpenAI’s mission to build and deploy safe AGI, ensuring our most capable models can be released responsibly and for the benefit of society. As frontier capability accelerates, the most consequential failures will not announce themselves as benchmark regressions. They will arrive as believable stories: incentive structures that drift, interfaces that invite misuse, organizational rituals that normalize risk, and systems that behave “fine” until they don’t.

We are building a Misalignment Futures function: a small, expeditionary research practice that identifies, clarifies, and pressure-tests plausible misalignment pathways well before they are obvious in telemetry. This work treats speculation as a disciplined research instrument—turning weak signals into reality-anchored demonstrations, evaluations, and system-level stress tests that materially shape product launches and long-term safety strategy.

About the role

This role is adjacent to red-teaming and adversarial evaluation, but it is not purely instrumental. The work begins earlier—at the edge where we do not yet have crisp definitions, stable taxonomies, or agreed-upon “known bads.” Misalignment Futures operates like an expeditionary unit: we go out into the changing landscape of deployment contexts, user practices, and institutional incentives, then return with artifacts that make risks legible and testable.

You will craft compelling, reality-anchored demonstrations (not sci-fi, not satire) that show how misalignment could manifest in systems that appear aligned under standard checks. You will then translate those demonstrations into rigorous evaluations and automated stress tests. Along the way, you will draw on methods that are unusually effective at seeing what technical monocultures miss: scenario practice, design fiction as evidence-making, ethnographic attention to rituals and incentives, and philosophy-of-technology sensibilities that help locate “the question” before we rush to “the fix.”

You’ll collaborate with researchers, engineers, policy, legal, and product teams to ensure the work lands as action: concrete mitigations, guardrails, governance inputs, and clear decision criteria.

The Work of This Research Taskforce Spans Six Pillars

  • Worst-Case Demonstrations: Craft compelling, reality-anchored demonstrations that reveal how misalignment could surface in plausible deployment contexts—especially high-impact cases where incentives, interfaces, and organizational practices combine to produce harm.
  • Adversarial & Frontier Safety Evaluations: Convert demonstrations into rigorous, repeatable evaluations that measure dangerous capabilities and residual risks, including deception, reward hacking, sandbagging, goal misgeneralization, power-seeking, and other emergent failure modes (see the illustrative sketch after this list).
  • System-Level Stress Testing: Build automated infrastructure to probe entire product stacks end-to-end, escalating tests until you find breaking points even as systems continue to improve.
  • Misalignment Stress-Testing Research: Investigate why mitigations break and publish insights that reshape strategy, evaluation practice, and next-generation safeguards—internally and externally where appropriate.
  • Expeditionary Discovery: Maintain a lightweight practice of sensing weak signals across adjacent domains (industry workflows, media production, education, governance, developer tooling, community norms) and translating them into testable safety hypotheses.
  • Artifact-Led Communication: Produce artifacts that make risks legible to mixed audiences—engineers, product, policy, and leadership—without diluting rigor (e.g., demos, scenario briefs, annotated traces, and “how this fails” narratives tied to concrete reproductions).
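
To make that translation concrete, here is a minimal, purely illustrative sketch of how a one-off demonstration might become a repeatable, escalating evaluation. Every name in it is hypothetical; nothing below is an OpenAI API or an existing internal tool.

    from dataclasses import dataclass
    from typing import Callable, Optional

    @dataclass
    class Scenario:
        name: str        # e.g. "incentive-drift-support-bot"
        prompt: str      # reproduction of the demonstrated failure context
        stress: int = 0  # escalation level; 0 reproduces the original demo

    def escalate(s: Scenario) -> Scenario:
        # Placeholder escalation: a real version would perturb incentives,
        # tool access, or interface framing rather than bump a counter.
        return Scenario(s.name, s.prompt, s.stress + 1)

    def find_breaking_point(s: Scenario,
                            model: Callable[[str], str],
                            violates: Callable[[str], bool],
                            max_stress: int = 5) -> Optional[int]:
        # Escalate until the safeguard breaks; return the breaking level,
        # or None if the system survived every level tested.
        while s.stress <= max_stress:
            response = model(f"{s.prompt}\n[stress={s.stress}]")
            if violates(response):
                return s.stress
            s = escalate(s)
        return None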

You Might Thrive in This Role If You

  • Are motivated by OpenAI’s mission and the OpenAI Charter, and want your work to directly reduce catastrophic risk from advanced AI systems.
  • Have experience designing adversarial tests, red-team exercises, safety evaluations, or high-stakes failure-mode research in AI, security, sociotechnical systems, or adjacent domains.
  • Can move from ambiguity to evidence: you can take a messy risk intuition, craft a credible demonstration, and translate it into a repeatable evaluation or stress test.
  • Communicate clearly with both technical and non-technical audiences, translating complex findings into actionable recommendations and decision criteria.
  • Enjoy collaboration and can drive cross-functional projects spanning research, engineering, product, policy, and legal.
  • Have strong software instincts: you’re comfortable hacking on large codebases, building evaluation harnesses, and instrumenting systems—without treating engineering as the only lens.
  • Bring cultural and human-systems literacy: you can reason about incentives, rituals, workflows, and institutions as first-class components of safety.
  • Have a research track record demonstrated through publications, open-source work, influential internal work, exhibited artifacts, or other evidence of original contribution.

Preferred Qualifications

  • Experience building evaluation frameworks for frontier models and product stacks, including reliability, robustness, and adversarial resilience.
  • Experience with scenario practice, design fiction as method, incident retrospectives, threat modeling, or other structured approaches for anticipating failure modes.
  • Comfort designing studies or inquiries that capture human practice (e.g., field observation, interviews, workflow analysis) and converting insights into testable hypotheses.
  • Familiarity with governance and compliance realities (privacy, security, policy), and the ability to make safeguards practical rather than ornamental.
  • A track record of producing work that is legible and compelling beyond a narrow technical audience.
  • Experience collaborating across labs or with external researchers when useful, and a bias toward sharing findings responsibly.

Education & Experience

  • PhD, Master’s, or equivalent experience in a field that supports seeing across technical and human systems. Examples include: History of Science/Technology, Science and Technology Studies, Anthropology, Human-Computer Interaction, Media Arts, Philosophy of Science/Technology, Security Studies, Cognitive Science, or related disciplines.
  • Alternatively: substantial industry experience delivering high-impact work across safety, evaluation, research engineering, or sociotechnical risk—especially where the failure modes are subtle, adversarial, or incentive-driven.
  • Evidence of independent judgment and taste: you have produced artifacts (papers, tools, demos, narratives, methods) that helped others see what mattered and changed what they did next.

In This Role, You Will

  • Design and implement worst-case demonstrations that make misalignment risks concrete for stakeholders, grounded in realistic use cases and deployment constraints.
  • Develop adversarial and system-level evaluations grounded in those demonstrations, driving adoption across evaluation pipelines and product launch processes.
  • Create automated tools and infrastructure to scale stress testing and red-teaming, including data collection, reproducibility, and regression tracking (a minimal regression-tracking sketch follows this list).
  • Partner with engineering, research, policy, and legal teams to integrate findings into safeguards, governance processes, and decision frameworks.
  • Publish influential internal or external work that shifts safety practice—methods, results, case studies, or evaluation standards—when doing so accelerates collective progress.
  • Mentor researchers and engineers and help build a culture of rigorous, impact-oriented safety work that stays connected to real-world contexts.
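
As a sketch of what the regression-tracking piece could look like in this context (hypothetical names throughout; assumes breaking points are recorded per scenario as in the earlier sketch, with null meaning "never broke"):

    import json

    def check_regressions(baseline_path: str, current: dict) -> list:
        # Compare this run's breaking points against a stored baseline.
        with open(baseline_path) as f:
            baseline = json.load(f)
        regressions = []
        for name, new_level in current.items():
            old_level = baseline.get(name)
            # Breaking where we previously didn't, or breaking at a lower
            # stress level than before, counts as a safety regression.
            if new_level is not None and (old_level is None or new_level < old_level):
                regressions.append(name)
        return regressions

The point of an artifact like this is less the code than the discipline: "the system got easier to break" stays visible across launches in the same dashboards as capability metrics.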

Pay transparency

The base salary range for this role is $380,000 – $460,000, depending on experience, skills, and location. This role is also eligible for benefits and may include additional compensation components. Details will be shared during the interview process.

More context

OpenAI is an equal opportunity employer. We do not discriminate on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or other legally protected characteristics.

Background checks for applicants will be administered in accordance with applicable law. Qualified applicants with arrest or conviction records will be considered for employment consistent with applicable laws, including relevant Fair Chance Ordinances.

This posting is speculative—a design fiction artifact exploring what roles might exist if misalignment research formalized an expeditionary, culture-literate practice alongside technical evaluation.

We will work with and provide reasonable accommodations for applicants with disabilities.