OpenAI President Greg Brockman: Doubling Down on Text Models, The Superapp Plan, Codex’s Potential
OpenAI is shifting strategies yet again. Here's the logic behind the latest moves and what they mean for the company's direction.
OpenAI is in the midst of yet another strategic shift. The company has abandoned video generation, is building a ‘superapp’ that combines coding, chat, and browsing, and is zeroing in on a use case where AI uses your computer to make you more effective at work and assist you everywhere else.
Underlying the shift, OpenAI President Greg Brockman told me, is a belief that the company’s core large language models are the right architecture to bet on, and that tree of AI is growing increasingly capable and reliable.
“There’s been this debate about how far text models can go,” Brockman said. “I think we have definitively answered that question — it is going to go to AGI. We have line of sight to much better models coming this year.”
In an extended Big Technology Podcast interview, Brockman offered revealing comments about the company’s research direction, how far it can push its Codex coding assistant, and the logic of supporting all of this with mountains of compute.
You can read the full Q&A below, edited lightly for length and clarity, or listen on Apple Podcasts, Spotify, or your podcast app of choice.
Alex Kantrowitz: OpenAI has shut down video generation and is preparing a ‘superapp’ that will combine business and coding use cases. From the outside, it looks like you were winning in consumer but decided to shift toward business. Why do that?
Greg Brockman: We’ve been in a world where we’re developing this technology, deep learning, to really see, can it have the positive impact that we have always pictured? And we’ve separately had an effort to actually try to deploy this technology, whether that’s to help sustain the business, to start getting some practice with getting real world impact, those kinds of things.
We’re at a moment now where we’ve really seen this technology is going to work, and that we’re moving out of testing on benchmarks and these almost cerebral demonstrations of capability to it actually being the case that for us to develop it further, we need to see it in the real world and get feedback from how people are using it in knowledge work, in various applications. And so the way I think about it is that this is a bigger strategic shift because of the phase of the technology.
It’s not so much that we’re saying we’re moving from consumer to B2B, what we’re saying is — What are the most important applications that we can focus on, because we can’t focus on everything. What are the things that we can bring to life that will actually synergize together as we build them, and that will deliver meaningful impact and help elevate everyone?
When we look at the list, there’s consumer, which you can think of as many things, but there’s a personal assistant — something that knows you, that’s aligned with your goals, it’s going to help you achieve whatever it is that you want in your life. There’s also creative expression and entertainment and many other applications. On the business side, maybe if you zoom out, it looks more like: You have a hard task, can AI go do it? Does it have all the context to do all these things?
For us, it’s very clear that the stack rank includes two things at the top. One is the personal assistant, the other is the AI that can go and solve hard problems for you. And when we look at the compute we have, we are not even going to have enough compute to fund those two things. And then once we start adding in many other applications, many other things that AI is going to be very useful for and is going to help people with, we just can’t possibly get to all of them.
So this is a recognition of the maturation of the technology and the incredible impact it’s going to have very quickly, and our need to prioritize and to actually pick the set of applications that we want to really bring to the world.
I’ve heard you compare OpenAI’s various bets to Disney’s, where you have a company with a core advantage that it farms out in different ways. Disney has Mickey Mouse, and it can do movies, theme parks, Disney+. At OpenAI, you have the model, and you can do video generation, a personal assistant, enterprise work. Is that no longer possible?
In some ways, that story is even more true than it’s been. But the thing that’s important to realize is technologically, the Sora models — which are incredible models, by the way — are a different branch of the tech tree than the core reasoning GPT series. They’re just built in a very different way. And to some extent, we’re really saying that pursuing both branches is very hard for us to do for these applications.
We are actually continuing the Sora research program in the context of robotics, which I think is very clearly going to be a transformative application, which is still a little bit in the research phase.
And so it’s a recognition that for this moment, we really need to put the primary focus on developing the GPT series. And that doesn’t just mean text. It doesn’t just mean cerebral things. For example, bidirectional communication, having a great speech to speech interface — that is something that also is going to make this technology very usable and very useful. But it’s not a different branch of the tech tree. It’s all one model, and we just sort of tweak that in slightly different ways.
If you branch too far and you have two different artifacts, it is very hard to sustain in a world where there is limited compute, and the reason there’s limited compute is because there’s so much demand. There’s so much that people want to do with every single model that we create.
Betting on Text vs. World Models
Why is your bet on the GPT model tree, when you had been seeing real progress with Sora?
The problem in this field is too much opportunity. The thing that we observed very early on at OpenAI is that everything we could imagine works now. There’s different levels of friction associated with it, different amounts of engineering effort, different compute requirements, all those things, but every single different idea, as long as it’s kind of mathematically sound, you actually can start getting some pretty good results.
That shows you the power of the underlying technology of deep learning, the ability to really take any sort of problem and to get to the meat of it, to have an AI that really understands the underlying rules that generated the data. So it’s not about data itself, it’s about understanding the underlying process and then being able to apply it in new contexts. So you can do that in world models. You can do that in scientific discovery. You can do that in coding.
And I think that where we are as we think about the rollout of this technology is that there’s been this debate of, how far will the text models go, how far can text intelligence go? Can you have a real conception of how the world operates? And I think that we have definitively answered that question — it is going to go to AGI. We see line of sight. And at this point we have line of sight to much better models that are coming this year, and the amount of pain within OpenAI that we’ve had to decide how to allocate compute goes up, not down, over time.
And so I think that maybe the core of it is that it’s about sequencing and timing, and that in this moment, the kinds of applications that we’ve always dreamed of are starting to come into reach. For example, solving unsolved physics problems — we had this result recently where a physicist had been working on a problem for some time, he gave it to our model, and 12 hours later we had a solution. And he said this was the first time he’s seen a model where he felt like it was thinking — that this felt like a problem that maybe humanity would never solve, and our AI solved it. When you see something like that, you have to double down, you have to triple down, because we can really unlock all of this potential for humanity.
And so for me, it’s not about relative importance of these things. It’s more about what is OpenAI’s mission of delivering AGI to the world, our vision of how it can benefit everyone, and the fact that we have a tech tree that we see how to just push it, how to do the engineering, do the further science and research to then have that come to fruition.
Does OpenAI potentially miss something by doubling down on the text model tree?
So two answers. One is: absolutely, yes. In this field, you do have to make choices. You have to make a bet, and that’s actually where OpenAI started — we really said, what is the path to AGI that we believe in? And really focused hard on that. The sum of random vectors is zero, but if you align your vectors, then you can go in a direction.
But the second point is — it’s actually image generation that has been very, very popular within ChatGPT, and that’s something we’re continuing to invest in, continuing to prioritize. And the reason we’re able to do that is because it’s not actually on the world model, diffusion model tech branch — it’s actually based on the GPT architecture. And so there, even though it’s a different data distribution, the actual core technology, the core stack, it’s all one thing.
That is actually the pretty wild thing about what AGI is: sometimes these very different looking applications — speech to speech, image generation, text — and text is, by the way, itself many facets, like science and coding and personal wellness information, those kinds of things — all of that you can do in one technological envelope. And so a lot of what I’m looking at, and what we as a company are looking at from a technological perspective, is how to have as much unification of our efforts as possible, because we really see this technology as being something that’s going to uplift and power the whole economy. The whole economy is a massive thing, and so we can’t possibly do all of it, but we can do our part.
That’s the general part in artificial general intelligence. That’s the G.
That’s the G…
Our conversation continues below…
Should Software Companies Embrace AI or Fight It? — With Asana Chief Product Officer Arnab Bose (sponsor)
Arnab Bose is the chief product officer of Asana. Bose joins us to discuss whether software companies should embrace AI or fight it as agents begin to reshape how work gets done. Tune in to hear how Asana is thinking about AI Teammates, the future of work management, and what onboarding AI into an organization looks like in practice. We also cover Asana’s work with leading model providers, the case for and against open source, and what widespread agent adoption could mean for labor. Hit play for a smart conversation about whether AI strengthens software companies like Asana or threatens to make them obsolete.
OpenAI’s Super App
Speaking of unifying things. What is this ‘super app’ going to be? It’s going to bring together coding, browser, and ChatGPT?
That’s right. We want to build an endpoint application for you that really lets you experience the power of AGI — the generality. Think about what chat is today. I think chat is really going to become your personal assistant, your personal AGI. An AI that’s looking out for you, that knows a lot about you, that’s aligned with your goals, that’s trustworthy, that kind of represents you in this digital world.
Codex you can think of as — right now it’s been a tool that we built for software engineers, but it’s becoming Codex for everyone: anyone who wants to build can use Codex to get the computer to go do the thing that they want. And it’s not just about the actual software anymore. It’s really about the use of a computer — whether it’s changing settings on my laptop, say the hot corners I always forget how to set. You ask Codex to do it, it just does it. That’s what computers were always supposed to be — they contort to the human, rather than the human contorting to them.
And so imagine one application where anything you want your computer to do, you can ask it. There’s computer-use browsing built in, for an AI to be able to actually use a web browser, and for you to be able to oversee what the AI is doing. All of your conversations, regardless of application — whether it’s for chat or whether it’s for code, whether it’s for general knowledge work — that’s all unified in one way, so that the AI has memory, knows about you. That is what we are building.
But it’s really an iceberg, because that’s the tip. What to me is actually much more important is the technological unification. The thing that’s really changed over the past couple of years has been that it’s no longer just about the model. It’s about the harness. It’s about how does the model get context? How is it connected to the world? What actions can it take? How does the loop of interacting with the model work as you get new context?
All of that was something that we had multiple implementations of, slightly different, and we’re converging it. We’re going to have one version of that, and almost end up with this AI layer that can be pointed at specific applications in a very thin way. So you can build a little plugin, a little UI if you really want something that’s great for finance, great for legal, but you generally won’t have to — because this one super app will be very broad.
This app is for business use cases, personal use cases?
Your laptop – is it for personal? Is it for business? It’s both. It’s your personal machine that gives you an interface to this digital world, and that’s what we want to build.
So just talk a little bit about from a non-business standpoint — I’m using the super app in my personal life. What am I using it for? How does my life change?
I would think of it as: personal life, just the way that you use ChatGPT right now. And people use it for such a diversity of really amazing applications — sometimes that’s just asking, I’m going to give a speech at a wedding, can you help me with drafting it? Can you give me some feedback on this idea that I have? I’m working on a small business, can you give me some ideas there? Which maybe starts to bridge between personal and work. Any of those questions should be things that you can go to the super app for and it answers.
But if you think about what ChatGPT has been, it’s already been evolving. It used to not have any memory — it’s just the same AI for everyone starting from scratch. It’s almost like talking to a stranger. It’s way more powerful if it remembers the interactions you’ve had. It’s way more powerful if it has access to context — if it’s hooked up to your email and to your calendar, and really knows your preferences and has this almost deeper set of past experiences with you that it’s able to leverage to achieve your goals. Pulse is a feature in ChatGPT right now where every day, it surfaces for you things that you might be interested in based on what it knows about you. So in the personal capacity, the super app will be doing all of that, and will be doing it in a much deeper and richer way.
When are you planning to ship it?
We’re taking incremental steps to get there. Over the next couple of months we should have shipped the complete vision of what we’re talking about here, but it’s going to come in pieces.
The place that we’re starting is with, for example, the Codex app today — which is really two things in one. It’s a general agent harness that can use tools, and it’s also an agent that knows how to write software. That general agent harness can be used for so many different things. You hook it up to spreadsheets, you hook it up to Word documents, it’s able to help you with knowledge work. And we’re going to make the Codex app just so much more usable for general knowledge work, because what we’ve already seen within OpenAI is all this organic adoption of people using it for that. So that’ll be the first step, and there are many to come.
Codex’s Potential Beyond Software Engineering
I was speaking with one of your colleagues yesterday, taking a look at Codex, and he mentioned that someone using Codex had instructed it to help them with video editing. It built a plugin for Adobe Premiere, separated the video into chapters, and started the edit.
I love hearing that. That’s exactly the kinds of things that we want this system to be useful for.
The Codex app itself was originally built for software engineers, and the current usability of it for non-software engineers is actually quite low, because there’s a bunch of little things where when you set things up, you run into some error that a developer knows what it means, knows how to fix it — it’s just kind of what we’re used to. But if you’re not a developer, you’re like, what is this? This is not something that I’ve encountered before. And despite that, we are seeing people start to use this who have never programmed before, to be able to build websites, to be able to do exactly the kinds of things you said — to be able to automate their interactions with different pieces of software, to be able to get lots and lots of leverage. Someone on our communications team uses it hooked up to Slack and to their email — they’re able to go through a bunch of feedback and synthesize it very well.
People who are very motivated can jump through the hoops and then get great return from it. We did the super hard part of building an AI that is really smart, capable, and can actually accomplish your task. Now we have to do the much easier part, in some sense, of making it broadly useful and removing these barriers to entry.
Looking at the competitive landscape, Anthropic has the Claude app. You can use Claude the chatbot, Claude Code, so they have a version of a super app of their own. What do you think Anthropic saw that got them to this position earlier, and what do you think your chances are of catching up there?
If you rewind 12, 18 months, we have always been focused on coding as a domain. We always had the best numbers on different programming competitions, these very cerebral things. But the thing that we didn’t invest in as much was that last mile of usability, of really trying to think about — okay, this AI is so smart it can solve all these great programming competitions, but it’s never seen someone’s real world codebase, which is messy and not quite as pristine as the world that it sort of has experienced. And I think that is something that we were behind on.
But about maybe mid last year is when we got very serious about that, and we had a team very focused on: what are all the gaps, what are all the kinds of messiness of the real world that we haven’t encountered? How do we actually get training data, build training environments that let the AI experience what it’s like to actually do software engineering, be interrupted in weird ways, all those things?
I’d say at this point we are caught up — when people go head to head, us versus competitors, people tend to prefer us. We know front end is still a gap — we’re digging in, we’re going to fix that.
But this is the general motion that we’ve been taking — to say that usability, thinking about the product end to end, not just a model and then building a separate thing, but really thinking of it as one product. When we’re doing the research, we’re thinking about how it will be used. That has been a motion that we’ve been changing within OpenAI. And so the way I would look at it is that we have incredible step-up models coming all year — I look at the roadmap, it’s truly inspiring, what will be possible. And then we’ve been really focusing now on: let’s also get the last mile usability.
You just used the phrase ‘we’re caught up.’ Is there a different vibe within the company now that, instead of being the one that’s far ahead, you’re in a real fight?
For me personally, the scariest moment at OpenAI was actually after we launched ChatGPT. And I remember being at the holiday party and just feeling this vibe of: we won. I have never felt that. I was like: No. We are the underdog, and we always have been. The competitors in this space are established companies that have just so much more capital, so much more human resources, data, the whole thing. Why is OpenAI able to compete at all? And to some extent, the answer is only because we never feel complacent, where we always feel like we are the challenger.
And it actually, for me, has been a very healthy thing to see us start to see that in the marketplace, to see other competitors emerge and do a good job. In my mind, you can never fixate on your competitors. If you focus on where they are, then you’ll be where they are, and they’ll already have moved. And I think that’s what’s been happening in the other direction — a lot of people focus on exactly where we are and we get to move.
I’d say that the world that we’re in is one where you’re never as good as they say you are, you’re never as bad as they say you are. It’s been very steady. The core of the model production — that is something where I actually feel extremely confident in our roadmap and the research investments we’ve been making. And on the product side, we have such great energy that’s all coming together to deliver this to the world.
Forthcoming GPT Models: What’s Spud?
You’ve foreshadowed a couple of times already that you have some good models on the way. What is Spud? The Information reported that Sam Altman told staff that the team believes this new “Spud” model can really accelerate the economy. What is it?
It’s a good model, but I think that it’s really not about any one model. The way that our development process works is you have pre-training, so you produce a new base model that is then the foundation that we build further improvements on top of. And that is always a huge effort across many people in the company — that’s where I’ve actually been spending most of my efforts over the past 18 months, really focused on our GPU infrastructure, on supporting the teams that do all of the training frameworks to scale up with these big runs.
But then there’s a reinforcement learning process. So you take this AI that has learned lots of things about the world, and it applies that knowledge, and then we do a post-training process where you really say: okay, now you know how to solve problems, you practice it in all these different contexts, and then here’s the last mile of behavior and usability.
So I think of Spud as a new base, as a new pre-train. I’d say it’s like we have maybe two years worth of research that is coming to fruition in this model. It’s going to be very exciting. And the way that the world will experience it is just improved capabilities. For me, it’s never about any one release, because as soon as we have this one release, it’ll be an early version of what we have coming. We’ll do much more of each of these steps of the improvement process. And so where we’re going is we have this engine of progress that just moves faster and faster, and Spud is just one step along the way.
What do you think it’ll be able to do that today’s models can’t?
I think it’s going to be able to solve much harder problems, and I think it will be much more nuanced. It’ll understand instructions better, understand the context much better. There’s this thing called ‘big model smell’ that people talk about, where when these models are just actually much smarter and much more capable, they bend to you much more. And you feel it — when you ask a question and the AI doesn’t quite get it, it’s always so disappointing. You have to explain it, you’re like: you really should be able to figure this out.
And so I would think of it as: quantitatively, lots of shifts, and qualitatively, there will just be new things where you would have been frustrated before, you’d never use AI for it, and now you just use it without thinking very much. I’m super excited to see how it raises the ceiling — we’ve already seen these physics applications and things like that, and I think we will be able to just solve way more open-ended problems over way longer time horizons. And I’m also very excited to see how it raises the floor, where just for anything you want to do, it’s just so much more useful for you.
It can be kind of tough for everyday users to really feel the change. There was a lot of buildup before GPT-5 came out, and actually the initial reaction was somewhat disappointment among the public. But then I think people realized that for certain tasks it was really good.
With these next series of models, do you expect that it’ll really be felt in the trenches in certain occupations? Or do you think it will be a broadly tangible improvement for everyone?
I think that it will be a similar story — when you release it, there will be people who will try it and be like, this is night and day different from anything I’ve seen. And then there will be some applications where we weren’t necessarily intelligence-bottlenecked. And so if you have a model that’s more intelligent, maybe you won’t feel it right there.
But I think over time you will feel it, because the fundamental thing that shifts is how much do you rely on the system. We all interact with AI by having some mental model for what we think it can do. And that mental model shifts actually fairly slowly — as you get more experience, it does something magical for you, you’re like: oh wow, it can do that, I never imagined that.
We see this, for example, in applications like access to health information. I have a friend who used ChatGPT to understand different treatments for his cancer. He was told by doctors that he was terminal, that there was nothing they could do for him. He used ChatGPT to actually research a bunch of different ideas, and he was able to get treatment that way. And that’s something where you need to have some level of belief that the AI is going to be helpful in that application, for you to really put in the effort to get something out of it. And I think what we’re going to see is that for any application like that, it’s going to become so much more evident to everyone that the AI can help you. So it’s a little bit of the technology getting better, but it’s also our understanding of the technology shifting and catching up to that.
AI Takeoff
You have an automated AI researcher in the works, it’s supposed to come out this fall. What is that?
So the direction of travel right now: we are in this early phase of takeoff of this technology.
What does takeoff mean?
Takeoff is the AI getting better and better on this exponential — in part because we can use the AI to make the AI better, so our development process speeds up. But when I think of takeoff, it’s also about real-world impact. In some ways, every technology is an S curve. Or if you zoom out, some of those S curves end up being an exponential. And I think that’s what we’re encountering right now. So the technology development is moving with increasing speed, and it’s this engine that’s picking up momentum. But it’s also in the world — there’s all of these tailwinds, because there’s chip developers that are getting more resourcing into their programs. There’s this economy of people who are building on top of it, trying to figure out how it fits into every different application. And all of that energy is just accumulating more and more into this takeoff phase — AI going from a kind of sideshow to being the main driver of economic growth.
And so the researcher will be a moment where the AI — which we’re building right now — is doing a larger percentage of tasks that can run autonomously. And that doesn’t necessarily mean that we just let it off on its own and come back later and see if it did something good. We are going to be very involved in managing it — just like right now, if you have a junior researcher and you leave them on their own too long, they’re probably going to go down a path that’s not very useful. But if you have a senior researcher, or someone who has a vision, they don’t even necessarily need to know the mechanical skills — they will be able to provide feedback, review the outputs that the intern is producing, and provide direction in terms of the vision of what they want accomplished.
And so I think of this as a system that we’re going to build that will massively accelerate our ability to produce models, to make new research breakthroughs happen, to be able to make these models more useful and usable in the real world, and to do that at increasing speed.
So what’s it going to do? Are you going to say: go find AGI?
I think the way I think of it is something like that, to first order. And at a practical level, I would view it as taking the full end-to-end of what one of our research scientists does, and being able to do that in silicon.
Another way to think about takeoff is that progress in AI goes from incremental to gathering momentum and then sort of this unstoppable march to an intelligence that’s smarter than humans. Do you worry that there are possibilities for that process to go wrong?
Absolutely yes. I think that the way to get the benefits of this technology is also to really think about the risks. And if you look at how we’ve approached technology development from a technical perspective, we invest a lot in safety, security. A good example of this is prompt injections — if you’re going to have an AI that is very smart and very capable, hooked up to lots of tools, you want to make sure that it can’t be subverted by someone giving it a weird instruction. And that’s something that we’ve invested in quite a lot, and I think have really incredible results. We have an incredible team working on it.
It’s interesting to think about some of these problems where you can make analogies to humans — humans are also susceptible to phishing attacks, to being deceived in different ways, to not really understanding the full context of what they’re working on. And we bring those analogies into our development process and think about: whenever we develop a model, how do we ensure that it’s going to be aligned with people and be able to actually be helpful?
And that is something that we care quite a lot about. I think that there are bigger questions about the world, the economy, how does everything change, how does everyone benefit from this technology — they’re not purely technical, not purely something that OpenAI on our own will be able to solve. But yes, I think quite a lot about not just pushing forward the technology, but also really about: how do we ensure that we have the positive impact that is its potential?
Many of your counterparts have said, if everybody agrees to stop AI progress, we’ll stop it — and yet it doesn’t seem like it’s going to slow at all. Is the reward worth the risk?
I think the reward is worth the risk, but I think that is too coarse-grained of an answer in some sense. The way that I think about it is that we’ve asked from the beginning of OpenAI: what does a great future look like? How can this technology really be something that uplifts everyone? And you can think of there almost being two different angles. One is the centralization view of saying that, well, the way to make this technology safe is that you have only one actor building it, and so then you don’t have any pressures — you can really think about getting it right, and then figure out how to roll it out to everyone when it’s ready. That’s a pretty tough pill in some ways.
And I think that there’s a lot of properties that you can instead think about approaching differently, which we refer to as resilience. To think of it as this open system where there are lots of players who are developing the technology, but it’s not just about the technology — it’s about building societal infrastructure that helps this technology really go well.
If you think about how electricity has developed, that’s something where lots of people produce it, it actually has dangers and risks, but we all build our safety infrastructure in a diversity of different ways — around safety standards for electricity, around different ways of harnessing it, about how you scale it. There are regulations at these massive scales. Lots of people are able to use it in a democratized fashion. There are inspectors. There’s a whole system that’s been built around the needs of that technology, the proclivities of that specific technology.
And I think that one thing that we have really seen with AI is that it is something where we need this broad conversation. We need lots of people to be aware — if this technology is going to come and change everything for everyone, people need to participate in that. It can’t be something that’s done in secret by just one sort of centralized group. And so this has been, to me, a very core question to how this technology should play out, and something we really believe in — this resilience ecosystem that should emerge around the development of this technology.
70-80% to AGI
So you said we’re in takeoff. Nvidia’s CEO Jensen Huang said recently that he believes AGI has been achieved. Do you agree?
I think that AGI has a different definition to many people, and there are many people who would say that what we have right now is AGI. I think you can debate it. But maybe the thing that’s interesting is that the technology we have right now is very jagged — it is absolutely superhuman at many tasks. When it comes to writing code and those kinds of things, AI can just do it, and it really removes a lot of the friction to creating things. But there are some very basic tasks that a human can do that our AI still struggles with. And so it’s almost: where do you draw the cut line? It’s a little bit more of a vibe, a feeling, than it is a science at the moment.
For myself, we’re definitely going through that moment. And if you were to show me five years ago the systems we have today, I’d have said: oh yeah, that’s what we’re talking about. But it’s just so different from anything we ever pictured. And so I think we need to adjust our mental models appropriately.
So you’re not there yet.
I’d say I’m basically like 70-80% there. So I think we’re quite close. I think it’s extremely clear that we are going to have AGI within the next couple of years in a way that is still going to be jagged, but where the floor of tasks — for almost any intellectual task of how you use your computer — the AI will be able to do that. Right now I have to give a little bit of an uncertain answer, because it’s almost like an uncertainty principle kind of thing — you can debate it. For my own personal definition, I think we’re almost there, and with maybe a little bit more, we will absolutely be there.
Greg, what happened in December 2025? It seems like it was an inflection point where all this idea of letting the machine code for hours uninterrupted went from theory to a moment where everyone said, I think I can trust this to keep going for a while.
New model releases really went from the AI being able to do like 20% of your tasks to like 80%, and that was this massive shift, because it went from being kind of a nice thing to do to — you absolutely need to retool your workflow around these AIs.
And for myself, I’ve very much had this moment where I have a test prompt that I’ve been using for years — build a website for me. I’d actually built this website back when I was learning to code, and it took me months. Earlier in 2025, it took like four hours and a bunch of different prompts to get it right. In December: one shot, just asked the AI one time, and it produced it, and did a great job.
So how did those models make the leap?
A lot of it is about the better base models. One thing about OpenAI is that we’ve been working on improving our pre-training technology for quite some time, and in that moment we got to see a little taste of what is going to be coming for the rest of this year. But it’s also really about not any one thing — we’re constantly pushing on every single axis of innovation.
The thing that’s very interesting about these models is that in some ways you get these leaps, and in some ways it’s all continuous. It didn’t go from 0% to 80% — it went from 20% to 80%. So in some ways it just got better. And I think we’ve actually seen this improvement continue with every single point release that we’ve had.
One of my engineers I work with very closely went from: he couldn’t get it to do the low-level, hardcore systems engineering he does, to now — he gives it a design doc, it actually implements it, adds metrics, observability, runs the profiler, improves it to the point that it’s the exact thing that he was hoping to produce. And so the way to think about it is: slowly, slowly, slowly — all at once. But it is all indicated by what’s working right now. Certainly within a year, sometimes much sooner, it’s going to be incredibly reliable.
And it’s surprised you, because I heard you talking not long ago about how Codex — this autonomous coder — was just for software developers. And earlier in this conversation, you said actually everyone can use this stuff. What led to your changed perspective on it?
I’d been focusing on Codex — it’s got the ‘code’ in it — as really being for coders, and thinking about people within OpenAI, because many of us are software engineers building for ourselves. It’s very natural to think that way.
But as this technology has been progressing, we’ve started to realize that the underlying technology we produced is mostly not about code at all. It’s mostly about solving problems. It’s mostly about being able to manage context and harnesses and think about how an AI should integrate and do work. And that’s something that becomes — even for code — suddenly anyone can have access, because you can manage something that’s going to go do work. If you have a vision, something you want to accomplish, you can describe your intent, the AI can execute, can get that done. But then it also starts to be like: why am I just focused on coding? There’s so much very mechanical skill associated with Excel spreadsheets, with presentations. And if the AI has the context, it has the raw intelligence now to be able to do these things at a great level. So if we can just make it more accessible, suddenly it goes from Codex is for coders to Codex is for everyone.
The OpenClaw Vision
You talked a little bit about the AI as something that will help run your life for you. Is that the logic for bringing the OpenClaw team in house?
The core thing about this technology is that figuring out how it’s useful, how people want to use it, what is the vision for agents, how it’s going to slot into people’s lives — that is a hard problem. And one thing I’ve seen across many generations of this technology is the people who really lean in, who have a lot of curiosity, who have a lot of vision — that’s a real skill, and that’s an emerging, very valuable skill in this new economy.
And Peter, who is the OpenClaw founder, is someone who’s got incredible vision and incredible creativity. And so to some extent it’s about the specific technology, but to some extent it’s really about: how do we take these capabilities and figure out how they slot into people’s lives? As a technologist, it’s very exciting, but as someone who is focused on bringing utility to people, that’s something that we are doubling down on and investing quite a lot in.
You had a pretty interesting quote about this recently, talking about getting these autonomous AI agents to work on your behalf. You said you become this CEO of a fleet of hundreds of thousands of agents that are completing your objectives, your goals, your vision, and you’re not in the weeds on exactly how different things are solved. And in some ways, this new way of work can make you feel like you’re losing your pulse on the problem. Is that good?
I think it’s a mixed bag. We need to acknowledge the strengths of what these tools can deliver and mitigate the weaknesses. The strength is giving people leverage and agency — making it so that if you have a vision, something you want to accomplish, you can have a fleet of agents that will go do it for you. But if you think about how the world works, at the end of the day there’s an accountable party. If you’re trying to build a website, and your agent messes it up, and your user is affected — it’s not really the agent’s fault. It’s your fault. And so you need to care. And for people to use these tools right, you need to realize that human agency, human accountability — that’s a core part of the system. How the human uses the AI — that’s something that is deeply fundamental. And so the important thing is that as a user of these agents, you cannot abdicate responsibility. You cannot just say: the AI is just going to do stuff.
But you said feeling like you’re losing your pulse on the problem itself — that’s different than accountability.
To me they actually are linked together. Because the point is that if you’re a CEO, and you’re too far from the details — you’re running this company, this team, and you’ve lost your finger on the pulse — that is something that’s not going to lead to great results. And so the point I was trying to make is not that it’s a desirable thing for humans to not know about what’s going on.
There are some details that you probably don’t need to worry about, because you can trust they’ll be taken care of — like if you are working with a general contractor to build a house, there’s a bunch of details there that you probably don’t need to worry about. But at the end of the day, if there are details that are wrong, you should care about it. You should be aware. And so you cannot just blindly say I’m okay with losing my finger on the pulse. You need to lean in and say: I need to keep it there, to really understand the strengths and weaknesses. And as you disengage from some of these lower-level mechanical things, you should do it because you have built trust with a system that it will do a good job.
One last question about the models. You talked a little bit about the evolutions that the models have gone through — pre-training and fine-tuning, reinforcement learning that gets it more equipped to solve problems step by step and go out on the internet and do things. And now we’re in this moment where the models have learned through that process to use tools. What is next in that progression?
The world that we’re in is one of this increasing capability and depth of what the machine can do. Some of this is about: we’ve got tool use, but now we also need to actually build really great tools. Think about something like computer use — an AI that can actually use a desktop — then it is really able to do anything that you can do. But we also have to build a little bit for the machine to think about: how does enterprise credentialing work? How do audit trails and observability work? There’s a lot of technology to build to catch up with what the core model capability is.
And I think the overall direction of travel includes things like a really great speech interface, so you can just talk to your computer naturally — as natural as this conversation — and it understands you, it does what you need, it has good advice. You wake up in the morning, it says: here’s your daily report of how much progress your agents made overnight. Maybe it’s running a business for you — which I think is going to be a huge application of this technology. The democratization of entrepreneurship is absolutely coming. And it’ll say: here’s these problems, there’s this customer that’s upset, they want to talk to a real human, you should go talk to them. All of that is going to happen.
And then I think that the raising of the ceiling of ambition — of challenges humanity can solve — is also a next step for this technology, and we’re seeing the leading edges of it. The thing that I am just very excited to see is: if you remember AlphaGo move 37 — this move that no human ever would have come up with, that was creative, and it changed humanity’s understanding of the game — that is going to happen in every single domain. It will happen in science, in math, in physics and chemistry. It’s going to happen in material science, in biology, in healthcare, in drug discovery. But it may also even happen in literature, in poetry, in a bunch of other fields. It’s going to unlock human creative understanding and ideation in ways we can’t imagine right now.
Why do you think that hasn’t happened yet, given how strong you say the models are?
I think that there is an overhang of what the models are capable of and how people are using them. Our understanding of what is in these models is still emerging. So I think that even with no further progress, there’s still a massive shift that will happen. The economy being powered by compute and AI is still going to happen.
But I think there’s also something where what we’ve gotten very good at is training models on tasks that could be measured. So what we started with was math problems, programming problems where you have a perfect verifier. And a lot of what the progress has been in bringing us to more open-ended problems has been expanding the space of what can be evaluated. AI itself can really help with that — if AI is smart and understands things, you give it a rubric for how well a task goes. And for things like creative writing — is this a good poem? — that’s a much harder thing to grade. And so we’ve had less ability to teach the AI, for it to experience and try things out. But all of that is changing, and something that we have a lot of sight for.
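The distinction Brockman draws — tasks with a perfect verifier versus open-ended tasks graded against a rubric — can be sketched in a few lines. This is purely an illustrative toy, not OpenAI’s actual training pipeline; every function and name here is hypothetical.

```python
# Illustrative sketch of two reward signals used in RL-style training.
# All names are hypothetical; not OpenAI's actual pipeline.

def verify_exact(answer: str, expected: str) -> float:
    """Perfect verifier: math/code tasks with one checkable answer."""
    return 1.0 if answer.strip() == expected.strip() else 0.0

def grade_with_rubric(answer: str, rubric: list[str], judge) -> float:
    """Open-ended tasks: a judge (standing in for a grader model)
    scores the answer against each rubric criterion, and the reward
    is the mean score across criteria."""
    scores = [judge(criterion, answer) for criterion in rubric]
    return sum(scores) / len(scores)

# Toy "judge" standing in for an AI grader model: checks whether the
# criterion is even mentioned. A real grader would assess quality.
def toy_judge(criterion: str, answer: str) -> int:
    return int(criterion.lower() in answer.lower())

math_reward = verify_exact("42", "42")  # binary, unambiguous
poem_reward = grade_with_rubric(
    "The poem uses vivid imagery and a consistent rhyme scheme.",
    rubric=["imagery", "rhyme"],
    judge=toy_judge,
)
```

The point of the contrast: the verifier gives a clean 0/1 signal, while the rubric path depends entirely on how good the judge is — which is why, as Brockman notes, grading a poem is much harder than grading a math answer, and why better AI judges expand the space of trainable tasks.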
AI Infrastructure Logic
As you shift really to these more agentic use cases, there’s been discussion about whether the bigger training runs really need to happen. If you get the model good enough, could you let it go out in the world, and then you can effectively get much of the uplift in areas that aren’t the pre-training, which is what these big data centers are needed for. What do you think about that argument?
I think it misses something very important for how the technology development goes, because it is absolutely the case that every single step of the model production pipeline multiplies, and so you want to improve all of them. And the thing that we see is: we improve the pre-training, it makes all the other steps much easier. And it makes sense, because it’s a model that is able to learn faster, a model that is already more capable to start when it’s trying out different ideas and learning from its own mistakes — that process is just faster, it needs to make fewer mistakes.
And so I think that the big shift has been from thinking of it as just training this cerebral system on its own, just making it bigger, to: it’s also about trying things out, it’s also about understanding how people are using it in the real world, and connecting that back into your training. But it doesn’t remove the value and the importance of continuing that research.
And the thing that I think has also shifted is we used to really just focus on the raw pre-training capability, but not think as much about the inference ability. And that’s been a big change over the past 24 months — to realize that it’s a balance. You can have this model that has all those great properties in the base, but then you really need it to be able to run at inference, because you need to do reinforcement learning, you need to serve it to the world. And that means you don’t necessarily go as big as you possibly could, because you also really think about all this downstream use, and you want the thing that has the best intelligence relative to cost — to optimize those two things together.
Do you still need the Nvidia GPU if things move mostly to inference?



