LessWrong.com News
A community blog devoted to refining the art of rationality

An Exercise in Rational Cooperation and Communication: Let's Play Hanabi

Published on April 14, 2021 8:36 AM GMT

Why Play Hanabi?

Hanabi is a game requiring modeling others' minds, communication, and strategy. Its unique challenges and cooperative (rather than adversarial) objective have piqued the interest of the AI community, which recently began using it as a testing environment for AI agents. 

Hanabi is a card game in which 2-5 players must cooperate to put on a dazzling firework display. The cards in the game represent different stages of fireworks. The most basic version of the game has five different colors of fireworks (cards), each of which has five stages, numbered 1-5, that must be set up in ascending order. The players must play cards from their hands to add them to the communal display on the table. The players' score at the end of the game is the sum of the latest stage of each color they successfully deployed; reaching stage 5 for all 5 colors receives the maximum score of 25.

However, a slight wrinkle: players must hold their cards so that they face away from them; no player is allowed to look at their own cards, only those of other players. To play their cards at the right time, each player must rely on clues from their teammates.

I like this game a lot. I also think it is good for people to play, because it forces you to think about how others will interpret your communications when they don't have the same information or perspective that you do. The communication skills and theory of mind that Hanabi requires are also valuable in real life - in handling interpersonal conflicts with people you ultimately want to cooperate with, or in explaining your expertise to a layperson, for example. (I believe this strongly enough that I will volunteer myself as a partner for any reader interested in trying the game online; just leave a comment or send a message.)

Hanabi Rules

Instead of or in addition to reading this section, you can watch a humorous explainer video.

To start, shuffle all the cards together and deal 5 cards to each player in a 2 or 3 player game, or 4 cards to each player in a 4 or 5 player game. Place the rest of the cards in the middle of the table and set up the 8 clue tokens and 3 fuse tokens face-up.

Players take turns until the game is over. On their turn, a player must take one of three actions:

  1. Give a hint to another player
  2. Play a card from their hand
  3. Discard a card from their hand

Give a Hint: To give a hint, a player picks one other player, then points to cards in that player's hand that match either a number or a color (e.g., "this card is a 5" or "these three cards are red"). They must point to all the cards that match that color or number. This is the only way to communicate card values or colors in Hanabi. To give a hint, the player must flip over one of the 8 clue tokens. If there are no tokens left to flip over, the player cannot give a hint.

Example: Bob is holding a red 3, a green 2, a blue 2, and a blue 1. Alice wants to give Bob a clue. Two of the clues she is allowed to give are "This card is a 3" and "These two cards are blue," while pointing at the corresponding card(s). She is not allowed to say "This card is a 2" while pointing to only one of the two 2s; she must indicate both cards if she wants to tell Bob which cards are 2s.
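The "point to every matching card" rule is easy to state as code. Here is a minimal sketch in Python; the card and hint representations are illustrative assumptions, not from any official implementation:

```python
# Sketch of Hanabi's hint rule: a hint names a color or a number, and
# must indicate every card in the target player's hand that matches.

def indicated_cards(hand, hint):
    """Return the positions a hint points to.

    hand: list of (color, number) tuples, e.g. [("red", 3), ("green", 2)]
    hint: ("color", "red") or ("number", 2)
    """
    kind, value = hint
    index = 0 if kind == "color" else 1
    return [i for i, card in enumerate(hand) if card[index] == value]

# Bob's hand from the example above.
bob = [("red", 3), ("green", 2), ("blue", 2), ("blue", 1)]
indicated_cards(bob, ("number", 3))      # [0]: "this card is a 3"
indicated_cards(bob, ("color", "blue"))  # [2, 3]: "these two cards are blue"
```

Note that a hint of `("number", 2)` necessarily returns both matching positions, which is why Alice cannot single out just one of Bob's 2s.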

Play a Card: If a player thinks one of their cards is ready to be added to the display, they can announce they are playing a card, and then put it on the table. If it is ready to be added to the display, great! The card is added to its color's pile and the players' score increases by 1. If instead it is too early to play that card, or another copy of that card has already been played, it is discarded and the players lose one of the three fuse tokens. If all three fuse tokens are lost, the display explodes and the players score 0 points.

Example: The game has just begun and there are no cards on the table. On the first turn, Alice tells Bob he is holding a 1. On Bob's subsequent turn, he plays the card Alice pointed out. It is a blue 1, which is added to the display.

Example: In the middle of the game the green fireworks are at stage 3 and the yellow fireworks are at stage 2. Bob is holding a card that he thinks is a green 4, so he announces he is playing a card and puts it on the table. However, Bob's card was actually a yellow 4, which cannot be played before a yellow 3. Bob discards his yellow 4 and one fuse token.

Playing a 5 of a particular color suit completes that color firework and un-flips a clue token as a bonus.
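The play action's bookkeeping can be summarized in a short sketch, assuming a simple representation of the shared state (the field and function names here are my own, for illustration):

```python
# Sketch of resolving "play a card": a card fits the display iff its
# number is exactly one above its color's current pile height.

def play_card(piles, card, fuses, clues, max_clues=8):
    """piles: dict mapping color -> highest stage played (0 if none).
    Returns the updated (piles, fuses, clues)."""
    color, number = card
    if piles.get(color, 0) == number - 1:  # card is the next stage
        piles[color] = number
        if number == 5 and clues < max_clues:
            clues += 1                     # completing a suit un-flips a clue token
    else:                                  # too early, or a duplicate: misfire
        fuses -= 1                         # card is discarded and a fuse is lost
    return piles, fuses, clues

# Bob's misplay from the example above: green at stage 3, yellow at stage 2.
piles = {"green": 3, "yellow": 2}
piles, fuses, clues = play_card(piles, ("yellow", 4), fuses=3, clues=5)
# piles is unchanged and fuses drops from 3 to 2
```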

Discard a Card: Players can announce they are discarding a card and then remove one of the cards in their hand from the game. Doing this returns one clue token to its unflipped state, allowing the team to give another clue in the future. Other than playing 5s, this is the only way to un-flip a clue token. Most cards in the game have multiple copies. Since each color suit only needs one card of each value to be played, there are plenty of cards that can be "safely" discarded.

Whenever a player loses a card by playing or discarding, they draw another at the end of their turn. When the deck runs out, everyone takes one last turn, and then the game ends. The game can also end early if the display is completed.

Optional Rules

There are lots of variations on the base game which range from fun to challenging to absolutely diabolical. I will outline the two most common below. 

Rainbow Cards: The game also includes a rainbow suit, which can be included as a 6th suit alongside the other 5 solid colors. Depending on the desired difficulty, players can have the rainbow cards be their own distinct color alongside the basic 5, or have the rainbow cards match all color clues. In the latter case, a player who is told one of their cards is red, for instance, cannot be sure if the indicated card is red or rainbow until they receive more information.

Black Powder: An expansion to the base game adds a "black powder" suit of cards, which are added to the deck like the rainbow cards. Black cards differ from the other cards in two ways. First, they cannot be indicated by any color clues (e.g. "this card is black" is an illegal clue). Second, they must be played in descending order, instead of the normal ascending order that the other suits use.

Why is this on LessWrong?

Playing Hanabi online over the past month or two taught me lessons about communication. It's also a case study in how an anonymous crowd finds and uses Schelling points. 

People who play Hanabi on a particular site (or in person with the same group of people) will gradually evolve, by group consensus, norms and expectations about how to play. For example, in online settings:

  • Most experienced players will discard from the right side of their hand, and will expect others to do the same. 
  • If a color clue indicates multiple cards, in the absence of any other information, it is assumed that the leftmost card indicated by that clue is ready to be played. 

The benefit of having such norms is hopefully obvious; players can take advantage of "pre-loaded" information to correctly indicate which cards to play while using fewer clues. Most, if not all, of the norms are arguably the "best" norms that could be chosen, often because they are Schelling points in player strategy that arise naturally from the rules of the game. For example, with respect to the above norms: 

  • The "oldest" cards - that a player has held for the longest time - are displayed on the right in online games. If a player's rightmost card were important, other players would have had the most opportunity to indicate that to be the case. Since they chose not to, it is likely safe to discard. 
  • Newly drawn cards are displayed on the left side of a player's hand. Color clues usually mean "play." So if a color clue touches multiple cards, it is most natural to assume it is intended for the newest card that it touches - if your teammate had wanted you to play the older card, they would have said so, instead of waiting until now.
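Assuming hands are displayed newest-on-the-left (position 0) and oldest-on-the-right, the two conventions above reduce to a pair of one-line defaults; the function names are mine, not from any site's ruleset:

```python
# Sketch of the two online-play conventions: position 0 is the newest
# (leftmost) card, the last position is the oldest (rightmost).

def default_play_target(indicated_positions):
    """A color clue touching several cards is read, absent other
    information, as 'play the newest (leftmost) card it touches'."""
    return min(indicated_positions)

def default_discard(hand_size):
    """Absent any information, discard the oldest (rightmost) card."""
    return hand_size - 1

default_play_target([1, 3])  # 1: the newest of the touched cards
default_discard(4)           # 3: the rightmost slot in a 4-card hand
```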

The casual reference to Schelling points is not to be overlooked - anonymous players online approaching Hanabi with a common strategy is not unlike the New York City question.

A highly ranked player will be expected by other highly ranked players to know these informal rules, to the point where not following them is met with confusion (if not outright hostility - it is internet gaming, after all). However, there isn't perfect consensus about particular edge cases of these norms, and there are sometimes situations where perfectly communicating information to another player is simply not possible. With respect to this reality, there are basically two types of players, which I will nickname "Goofus" and "Gallant."

Goofus believes that the correct way to play Hanabi is to perfectly understand and follow the informal rules. If a player he gave a clue to misunderstands it and misplays, Goofus is likely to be confused or angry. To Goofus, to play Hanabi is to execute an algorithm that he has only partial control of. He relies on other players to correctly execute their parts of the algorithm, and feels helpless when they don't.

Gallant understands that what makes communication good or bad is whether or not it is correctly understood by its recipient. If a player he gave a clue to misunderstands it and misplays, Gallant asks himself, "why didn't my clue mean what I thought it meant?" To Gallant, to play Hanabi is to set an objective; the algorithm of clue-giving and interpreting can and must change to reach it.

(An aside: This post was inspired by my experience after I had the poor fortune to be matched with a Goofus yesterday. During our game, he/she gave me a clue that was ambiguous: I could tell that the indicated card was important, but not if it was ready to be played immediately or if it should be saved for later. In fact, he/she had meant "play this card now," but I instead took the risk-averse action and discarded some other card instead of playing. In response the player sent me an angry message, intentionally (I think) lost the game for us, and then wrote a negative comment about me, visible to other players, about how I don't follow the right informal norms of play. I glanced at the player's recent activity: it contained two full pages of the player writing negative comments about people with whom he/she had played Hanabi, stretching back weeks. It struck me as impressive that one can consistently fail at communicating and yet insist that they are communicating correctly.)

I see this as the ultimate "moral of the story" when it comes to Hanabi: the value of good communication lies in whether or not it is properly understood, and ultimately is measured by whether or not it produces the desired behavior in its recipient. 

Don't be Goofus. Be Gallant.

Where to Play or Buy Hanabi

I play Hanabi regularly online (usually with strangers) on Board Game Arena; there are a few other websites with different features available as well. You can also order a physical copy of the game from your favorite online retailer or board game shop. The game components are simple enough that you could make your own set, with some effort.

A benefit of playing online instead of in person is that most online implementations of Hanabi track clues for you, eliminating the memory aspect of the game and allowing you to focus only on the logic and communication. (In my opinion, this is a serious plus.)

Hanabi can be played in less than half an hour, and I recommend it for adults and families with children aged 8-10 and up, or perhaps even younger if they are particularly clever.


Against "Context-Free Integrity"

Published on April 14, 2021 8:20 AM GMT

Sometimes when I talk to people about how to be a strong rationalist, I get the impression they are making a specific error.

The error looks like this: they think that good thinking is good thinking irrespective of environment. If they just learn to avoid rationalization and setting the bottom-line first, then they will have true beliefs about their environment, and if there's something that's true and well-evidenced, they will come to believe it in time.

Let me give an extreme example.

Consider what a thoughtful person today thinks of a place like the Soviet Union under Stalin. This was a nation with evil running through its streets. People were vanished in the night, whole communities starved to death, information sources were controlled by the powerful, and many other horrendous things happened every day.

Consider what a strong rationalist would have been like in such a place, if they were to succeed at keeping sane. 

(In reality a strong rationalist would have found their way out of such places, but let us assume they lived there and couldn't escape.)

I think such a person would be deeply paranoid (at least Mad-Eye Moody level), understanding that the majority of their world was playing power games and trying to control them. They'd spend perhaps the majority of their cognition understanding the traps around them (e.g. what games they were being asked to play by their bosses, what sorts of comments their friends would report them for, etc) and trying to build some space with enough slack to occasionally think straight about the narratives they had to live out every day. It's kind of like living in The Truman Show, where everyone is living a narrative, and punishing you / disbelieving you when you deviate. (Except much worse than what happened in that show.)

Perhaps this is too obvious to need elaborating on, but the cognition of a rationalist today who aims to come to true beliefs about the Soviet Union, and the cognition of a rationalist in the Soviet Union who aims to come to true beliefs about the Soviet Union, are not the same. They're massively different. The latter of them is operating in an environment where basically every force of power around you is trying to distort your beliefs on that particular topic – your friends, your coworkers, the news, the police, the government, the rest of the world.

(I mean, certainly there are still today many distortionary forces about that era. I'm sure the standard history books are altered in many ways, and for reasons novel to our era, but I think qualitatively there are some pretty big differences.)

No, coming to true beliefs about your current environment, especially if it is hostile, is very different from coming to true beliefs about many other subjects like mathematics or physics. Being in the environment can be especially toxic, depending on the properties of that environment and what relationship you have to it.

By analogy, I sometimes feel like the person I'm talking to thinks that if they just practice enough Fermi estimates and calibration training, notice rationalization in themselves, and practice the principle of charity, then they'll probably have a pretty good understanding of the environment they live in and be able to take positive, directed action in it, even if they don't think carefully about the political forces acting upon them.

And man, that feels kinda naive to me.

Here's a related claim: you cannot get true beliefs about what are good actions to take in your environment without good accounting, and good record-keeping. 

Suppose you're in a company that has an accounting department that tells you who is spending money and how. This is great, you can reward/punish people for things like being more/less cost-effective. 

But suppose you understand one of the accounting people is undercounting the expenses of their spouse in the company. Okay, you need to track that. (Assume you can't fire them for political reasons.) Suppose another person is randomly miscounting expenses depending on which country the money is being spent. Okay, you need to track that. Suppose some people are filing personal expenses as money they spent supporting the client. Okay, now you need to distrust certain people's reports more-so.

At some point, to have accurate beliefs here, it is again not sufficient to avoid rationalization and be charitable and be calibrated. You need to build a whole accounting system for yourself to track reality.

[A]s each sheep passes out of the enclosure, I drop a pebble into a bucket nailed up next to the door. In the afternoon, as each returning sheep passes by, I take one pebble out of the bucket. When there are no pebbles left in the bucket, I can stop searching and turn in for the night. It is a brilliant notion. It will revolutionize shepherding.

The Simple Truth

I sometimes see quite thoughtful and broadly moral people interact with systems I know to have many power games going internally. Moral Mazes, to some extent or another. The system outputs arguments and trades, and the person sometimes engages with the arguments and sometimes engages in the trade, and thinks things are going well. But I feel like, if they knew the true internal accounting mechanisms in that entity, then they would be notably more disgusted with the parts of that system they interacted with. 

(Imagine someone reading a scientific paper on priming, and seeking deep wisdom in how science works from the paper, and then reading about the way science rewards replications.)

Again, I sometimes talk to such a person, and they can't "see" anything wrong with the system, and if they introspect they don't find a trace of any rationalization local to the situation. And if they've practiced their calibration and fermis and charity, they think they've probably come to true beliefs and should expect that their behavior was net positive for the world. And yet I sometimes feel that it clearly wasn't.

Sometimes I try to tell the people what I can see, and that doesn't always go well. I'm not sure why. Sometimes they have a low prior on that level of terrible accounting, so don't believe me slash think it's more likely that I'm attempting to deceive them. 

More often I think they're just not that interested in building that detailed of a personal accounting system for the thing they're only engaging with some of the time, it's more work than it's worth to them, so they get kind of tired of talking about it. They'd rather believe the things around them are pretty good rather than kinda evil. Evil means accounting, and accounting is boooring.

Anyway. All this is me trying to point to an assumption that I suspect some people make, an assumption I call "Context-Free Integrity", where someone believes they can interact with complex systems, and as long as they themselves are good and pure, their results will be good and pure. But I think you need to actually build your own models of the internals of the complex systems before you can assess this claim.

...writing that down, I notice it's too strong. Eliezer recommends empirical tests, and I think you can get a broad overall sense of the morality of a system with much less cost than something like "build a full-scale replica accounting model of the system in google sheets". You can run simple checks to see what sorts of morality the people in the system have (do they lie often? do they silence attempts to punish people for bad things? do they systematically produce arguments that the system is good, rather than trying to simply understand the system?) and also just look at its direct effects in the world.

(In my mind, Zvi Mowshowitz is the standard-bearer on 'noping out' of a bad system as soon as you can tell it's bad. The first time was with Facebook, where he was way in advance of me coming to realize what was evil about it.)

Though of course, the more of a maze the system is, the more it will actively obscure a lot of these checks, which itself should be noted and listed as a major warning. Just as many scientific papers will not give you their data, only their conclusions, many moral mazes will not let you see their results, or tell you metrics that are confusing and clearly goodharted (again on science, see citation count).

I haven't managed to fully explain the title of this post, but essentially I'm going to associate all the things I'm criticizing with the name "Context-Free Integrity". 

Context-Free Integrity (noun): The notion that you can have true beliefs about the systems in your environment you interact with, without building (sometimes fairly extensive) models of the distortionary forces within them.


Auctioning Off the Top Slot in Your Reading List

Published on April 14, 2021 7:11 AM GMT

Or, A Nicer Way to Commodify Attention

Observation 1: I read/listen to a lot of public intellectuals--podcasters, authors, bloggers, and so on--and I frequently find myself thinking:

Ugh, I hate it when this guy spouts ignorant nonsense about [some domain]. I wish he would just read [relevant book]. Then he’d at least be aware of the strongest counterarguments.

Sometimes a single public thinker will trigger this reaction repeatedly, year after year, apparently never confronting [relevant book].

Observation 2: Sometimes podcasters/bloggers will poll their followers with questions like, “Hey guys, who should I interview next?” or “what book should I review next?”


Having observed these things I wonder if they could be improved by monetary transactions.

For a specific example, it would be cool for David Deutsch to let the highest bidder choose a book for him to read and review. If we're lucky (or spendthrift), we get to see him finally give a considered response to the particular claims in Human Compatible or Superintelligence or Life 3.0.

Less specifically, there are a lot of cognoscenti who command substantial influence while holding themselves to disappointingly low epistemic standards. For example, Sean Carroll is a science communicator who dabbles in a little bit of everything on the side; and although I consider his epistemic standards to be above-average, I can tell he has not read the best of Slate Star Codex. I think if he did read a post such as "Asymmetrical Weapons", there’s a decent chance he would feel compelled to raise the bar for himself.

I have some close friends who sometimes spew (what I perceive to be) ignorant drivel. For some of them, I might be willing to pay a surprisingly high price to see them write a review of a book that cogently challenges their stupid priors. I would pay the highest price for friends that I already know can update on sound arguments, I would pay a lower price to find out if a friend has that ability, and for the attention of the hopelessly obstinate I would not want to pay anything.

Why This Wouldn’t Work
  • It's easy to underestimate how pervasive and sticky those signaling/tribal motivations are. The gains from trade available here might not be enough to overcome the pressure to protect narratives and maintain appearances.
  • There might be perverse incentives. Maybe the cognoscenti want to charge more, so they reduce supply (that is, they read less). Maybe they spout more ignorant nonsense in order to increase the bids.

Those are just off the top of my head. I would appreciate it if someone who understands economics could give better reasons for why this wouldn't work.


Intermittent Distillations #2

Published on April 14, 2021 6:47 AM GMT

Servant of Many Masters: Shifting priorities in Pareto-optimal sequential decision-making (Andrew Critch and Stuart Russell)

Servant of Many Masters: Shifting priorities in Pareto-optimal sequential decision-making


A policy (over some partially observable Markov decision process (POMDP)) is Pareto optimal with respect to two agents with different utility functions if it is not possible to construct a policy that achieves higher utility for one of the agents without doing worse for the other agent. A result by Harsanyi shows that for agents that have the same beliefs, Pareto optimal policies act as if they are maximizing some weighted sum of the two agents' utility functions. However, what if the agents have different beliefs?

Interestingly, if two agents disagree about the world, it is possible to construct policies that each agent, by its own lights, expects to do better under. For example, suppose that Alice and Bob are deciding how to split a cake. Suppose also that the cake is either red or green. Alice believes the cake is red with probability 0.9 and Bob believes the cake is green with probability 0.9. A policy that says "If the cake is red, give it to Alice. If the cake is green, give it to Bob." will be viewed favorably by both of them. In fact, the sum of the utility Alice expects to get and the utility Bob expects to get is greater than can be achieved by any policy maximizing a weighted linear combination of their two utility functions.
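The arithmetic here is easy to check: under the conditional policy each agent expects to get the cake with probability 0.9, so their expected utilities sum to 1.8, while any policy that splits the cake without looking at its color can only make the expectations sum to 1. A quick sketch:

```python
# Alice believes the cake is red with probability 0.9; Bob believes it
# is green with probability 0.9. Each agent's utility is their expected
# share of the cake.

p_red_alice = 0.9    # Alice's credence that the cake is red
p_green_bob = 0.9    # Bob's credence that the cake is green

# Conditional policy: red -> all to Alice, green -> all to Bob.
eu_alice = p_red_alice                  # Alice expects the cake w.p. 0.9
eu_bob = p_green_bob                    # so does Bob, by his own lights
total_conditional = eu_alice + eu_bob   # 1.8

# Any color-independent policy gives Alice a share s and Bob 1 - s, so
# the expected utilities sum to s + (1 - s) = 1 regardless of beliefs.
s = 0.5
total_unconditional = s + (1 - s)       # 1.0
```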

Intuitively, when Alice and Bob both agree to the conditional policy, they're betting against each other about empirical facts about the world. More specifically, Alice can be viewed as offering to bet Bob that the cake is red, which Bob readily accepts. In this way, the conditional policy ties the expected utility of the two agents to previously irrelevant facts about their world models, giving them both higher expected utility from their perspectives.

The key result of the paper shows that all Pareto-optimal policies will have an implicit "bet settling" mechanism. One way of thinking about this is that since any difference in empirical beliefs can produce positive-sum bets between agents, a Pareto-optimal policy must implicitly make all such bets between those agents. Loosely speaking, the result shows that any policy that is Pareto-optimal with respect to a collection of agents will maximize a weighted linear combination of "how much does this agent benefit" and "how well did this agent predict empirical observations." Since Harsanyi assumes the agents have the same beliefs about the world, the second component is identical for all agents, so Harsanyi's theorem is a special case of the authors' result.

The result implies that if a contract between parties is designed to be Pareto-optimal, it will tend to "settle bets" among the empirical beliefs of those parties (provided they have different beliefs). The authors suggest that making this "bet settling" explicit might improve contract efficiency and transparency.


A perspective I've been developing recently is something along the lines of "betting is fundamental." For instance, Dutch book arguments pin down Bayes' rule as the proper update formula (given some assumptions). If you relax the Dutch books to those that are efficiently computable, you get Logical Induction (sorta).

Abram Demski writes in The Bayesian Tyrant:

It is a truth more fundamental than Bayes' Law that money will flow from the unclever to the clever.

This paper represents another place where betting arises out of seemingly unrelated considerations.

I also appreciate the frequent intuitive motivation of the result.

I have a slight lingering confusion about how the assumption that agents have knowledge about other agents' beliefs interacts with Aumann's Agreement theorem, but I think it works because they don't have common knowledge about each other's rationality? I suspect I might also be misunderstanding the assumption or the theorem here.

AI Governance: A Research Agenda (Allan Dafoe)

AI Governance: A Research Agenda

Alignment Newsletter Summary


This research agenda divides the field of AI Governance into three rough categories: the technical landscape, the political landscape, and ideal governance. The technical landscape roughly seeks to answer questions like "what's technologically feasible?", "how quickly is AI going to develop?", and "what safety strategies are possible?"

The political landscape seeks to answer roughly the same set of questions for governance: what's politically possible? What's the economic/political impact of AI going to be?

Finally, the ideal governance section points out that the previous two sections depend on an understanding of what the "good futures" are going to look like and that we currently don't have a clear picture.

Given that this is already a summary of an entire field, my summary is hopelessly lossy, so I have not made much of an effort to be thorough.


Before reading this, I had a bunch of questions about what the key problems in AI governance were. After reading this, it turns out that the key problems are just the obvious things: what's going to happen? is that going to be good or bad? what do we want to happen? how can we use what we have to get what we want? It's both heartening and disappointing to learn that I wasn't missing anything major.

I expected some sort of broader vision about how the entire landscape fit together, but instead, I just got an endless series of questions. This feeling might be an artifact of how this is a research agenda and so isn't trying to provide answers to questions. If anything, this agenda gave me a much better sense of how tangled the entire field is. Turns out that trying to take actions that have positive consequences a million years down the line is a very difficult problem.

Also, note that I skimmed large portions of this because I was familiar with the problems, so I might have missed key sections that made connections to a broader picture.

Alignment By Default (John S. Wentworth)

Alignment By Default

Alignment Newsletter Summary


To what extent is AI Alignment going to be a big problem? More specifically, if we just made a powerful AI system by pre-training a model on an extremely large dataset, then fine-tuned it on something like "do the thing the human wants", what are the chances that it's going to be aligned?

A number of results in neural network transparency demonstrate that image classifiers/generators seem to learn abstractions like "curve", "tree" and "dog." Since the data we use to train the AI system contains a lot of information about human values, it is likely that a powerful model will learn "human values" as an abstraction that gives it better predictive power. For example, GPT-N will be able to produce compelling moral philosophy papers, a task that is made easier by having a strong conception of human values.

One might hope, then, that the model learns human values as a useful abstraction during pre-training, and that this concept then gets "pointed to" during fine-tuning. However, Goodhart's law virtually guarantees that whatever we fine-tune our model on will not be maximally satisfied by human values, so the hope is that pre-training will create a basin of attraction where human values are a good enough proxy for our training metric (which is itself a proxy for human values), that the model ends up using its proxy of human values as a proxy for our proxy for human values.

If this works, what's next? One key task our AI systems will be used for is to build the next generation of AI systems. If our systems are aligned using a learned abstraction of human values, how useful will they be for this task?

We can roughly outline two approaches for building the next generation of AI systems. The first approach relies on rigorous theory and provable correctness. The second approach relies on experimentation and empirical evidence about generalization capabilities. A system aligned using a proxy as a proxy for a proxy does not seem reliable enough to use for rigorous alignment. However, if we condition upon pre-training conveying a robust abstraction of human values and fine-tuning reliably finding that abstraction, then the second, more approximate approach to AI alignment is a lot more promising. In other words, if systems are aligned by default, the alignment problem is easy, suggesting that even shoddily aligned systems can be used to make additional progress.


The author gives 10% to systems being aligned by default, which is around the same range that I give. I think work like aligning superhuman models will give empirical evidence as to whether pre-training conveys some notion of "human values" to a model and how difficult "pointing to" that abstraction turns out to be.

The point where I think the above story for alignment by default is most likely to go wrong is in "pointing to" human values. In particular, I am concerned that agents will become deceptively aligned faster than they will start using their model of human values as a proxy. See Does SGD Produce Deceptive Alignment for further discussion.

Zoom In: An Introduction to Circuits (Chris Olah, Nick Cammarata, Ludwig Schubert, Gabriel Goh, Michael Petrov, and Shan Carter)

Zoom In: An Introduction to Circuits

Alignment Newsletter Summary


The Circuits agenda seeks to understand the functioning of neural networks by analyzing the properties of individual neurons (features), then analyzing how they combine together into algorithms (circuits). This agenda claims that features are the fundamental units of neural networks, that they can be understood, and that they combine into circuits that are themselves understandable. Furthermore, the agenda claims that these features and circuits will be universal across networks, e.g. if a circuit is found that appears to detect curves in one vision network, similar circuits will be found in other vision networks. Other motifs that appear in multiple places are the "union motif", where two features are combined by a union operation, and a "mirror motif", where neurons often have mirror images. For instance, a dog detector might be composed of "leftwards facing dog" and "rightwards facing dog" combined with a union operation.

But how do we know that a "curve detector" is really detecting curves? Couldn't it be detecting something else? The authors offer 7 pieces of evidence.

  • If you optimize an image that maximally activates the "curve neuron", it looks like a curve.
  • If you take the dataset examples that maximally activate the "curve neuron", they all have curves in them.
  • If you manually draw curves, they strongly activate the neuron.
  • If you rotate the examples that strongly activate the neuron, the activation curve looks approximately bell-shaped, which is what you would expect if the activation tracked how well the curve lined up with the detector's preferred orientation.
  • If you look at the weights, they look like a curve.
  • If you look at the neurons that are downstream of the "curve neuron", they typically involve curved things, e.g. circles.
  • If we hand-implement a curve detection algorithm that is based on our understanding of the "curve neuron", it detects curves.

The authors also explore high/low-frequency detector neurons and naturally occurring symmetries in neural networks in more detail.
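
To make the first line of evidence concrete, here is a toy sketch of feature visualization by gradient ascent. Everything here is invented for illustration (a hand-made "neuron" over a 1-D "image", not the actual InceptionV1 curve detector from the paper); the point is only the mechanism: optimize the input to maximize a neuron's activation and inspect what comes out.

```python
# Toy feature visualization: ascend the gradient of a neuron's activation
# with respect to the input. The hypothetical "neuron" below fires most
# strongly on a fixed template pattern (a stand-in for "a curve").

TEMPLATE = [0.0, 0.3, 0.7, 1.0, 0.7, 0.3, 0.0]  # the pattern the neuron "wants"

def activation(x):
    """Toy neuron: activation peaks when the input matches the template."""
    return -sum((xi - ti) ** 2 for xi, ti in zip(x, TEMPLATE))

def visualize(steps=200, lr=0.1):
    x = [0.5] * len(TEMPLATE)  # start from a flat gray "image"
    for _ in range(steps):
        # analytic gradient of the activation w.r.t. each input "pixel"
        grad = [2 * (ti - xi) for xi, ti in zip(x, TEMPLATE)]
        x = [xi + lr * gi for xi, gi in zip(x, grad)]
    return x

img = visualize()
print([round(v, 2) for v in img])  # converges toward the template pattern
```

In a real network the gradient comes from autodiff through the whole model rather than a closed form, but the loop is the same shape.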

I once heard Anna Salamon describe "the map/territory game", a form of third-person mental narration that involves inserting the phrase "whose map was not the territory" after each time you say your own name. For example, I might say "Mark, whose map was not the territory, sat at his computer describing the map/territory game." The point of the game is to repeatedly emphasize that your beliefs about the world are not the world itself.

As I was reading the circuits thread, I found myself making similar mental motions. "The authors, whose map is not the territory, write that these neurons are high/low-frequency detectors." Are they? Are they really? How do they know?

Overall, I'm impressed by the rigor of the analysis. The evidence isn't quite overwhelming, but it is highly suggestive. I currently think that the ability to understand neural networks will be critical to building aligned systems, so I'm excited about furthering this work.

In terms of the style of this work, I am reminded of a line from Hamming's famous talk You and Your Research:

One of the characteristics of successful scientists is having courage. Once you get your courage up and believe that you can do important problems, then you can. If you think you can't, almost surely you are not going to. Courage is one of the things that Shannon had supremely. You have only to think of his major theorem. He wants to create a method of coding, but he doesn't know what to do so he makes a random code. Then he is stuck. And then he asks the impossible question, "What would the average random code do?" He then proves that the average code is arbitrarily good, and that therefore there must be at least one good code. Who but a man of infinite courage could have dared to think those thoughts? That is the characteristic of great scientists; they have courage. They will go forward under incredible circumstances; they think and continue to think.

Imagine wanting to understand neural networks and deciding the best way to do that is to look at the individual weights of the network until you understand exactly what they're doing. Who but a person of infinite courage would have thought that this was possible? And yet it was!

Thoughts on Human Models (Ramana Kumar and Scott Garrabrant)

Thoughts on Human Models

Alignment Newsletter Summary


A common way to assess the ability of software engineering candidates is to have them solve simple coding problems. Imagine two candidates, one that knows they're going to be assessed in this way and one that does not. Intuitively, the results of the assessment are more reliable for the candidate that does not know about it. More specifically, the candidate that doesn't know about the assessment is likely to have applied less optimization pressure to doing well on it, so the results are less the result of adversarial Goodharting. In other words, we want the assessment to provide an independent check on the engineer's performance; if the engineer knows about the check, it is no longer independent.

In machine learning, engineers are careful to maintain a clear separation of the training set and the test set. This separation allows the test set to be used as an independent check for overfitting. If the model was optimized for doing well on the test set, then the check would no longer be independent.
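
As a minimal (toy) illustration of why the check must be independent: a model that simply memorizes its training data looks perfect on that data, and only a held-out set it never saw exposes the failure to generalize.

```python
# Hypothetical toy data: the underlying rule is y = 2x.
train = {1: 2, 2: 4, 3: 6}
test = {4: 8, 5: 10}   # held out; the model never sees these pairs

# "Training" by pure memorization of input-output pairs.
model = dict(train)

def predict(x):
    return model.get(x, 0)  # clueless off the training set

train_acc = sum(predict(x) == y for x, y in train.items()) / len(train)
test_acc = sum(predict(x) == y for x, y in test.items()) / len(test)
print(train_acc, test_acc)  # perfect on train, useless on test
```

If the memorizer had been shown the test set during training, both numbers would be 1.0 and the check would tell us nothing.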

If we're trying to train models to do tasks that humans cannot do easily, we might not have a "test set" in the strictest sense. For example, we might train a predictive model, but evaluate its outputs in terms of human approval. How can we get an independent check for performance?

Naively, since we're not training on human approval, any check will be independent. However, the relevant question is whether there's any channel by which the model learns about how it's going to be assessed, not just whether the model is being trained on the assessment metric. Since humans are doing the final evaluation, any method by which the model learns how humans tend to behave will contaminate the check, similar to how a software engineering candidate learning about how the interviewer thinks will also contaminate the interview.

In other words, the capability of human modeling makes it easier to deceive human-judgment-based checks, in the same way that knowing the contents of the test makes it easier to game. If we want to be confident our AI systems are aligned, we want their capability profiles to be as far away from deception as possible, suggesting we should train systems that cannot do human modeling, for example, STEM AI.


I am sympathetic to the view that agents that have no human models will find it very difficult to be deceptive. My main concern is that human models are basically required for models to be performance competitive. For example, the authors mention that it's hard to see how vaguely defined tasks can be automated without some sort of human modeling. Additionally, since the largest datasets are something like "stuff that humans tend to do" it seems very difficult to get large amounts of training data that won't include information that can be used to construct human models.

One can also think of a spectrum of "how much do you restrict the data you give your models?" Excluding any data that carries information about humans is one extreme; giving the model all of Common Crawl is the other. Other points on this spectrum are "don't give the model information about itself" and "don't give the model information about deception." However, it seems like if you want data-cleaning to lend any semblance of safety, you have to be at the restrictive extreme. Otherwise, you might as well include as much data as possible for capabilities.


What if AGI is very near?

April 14, 2021 - 03:11
Published on April 14, 2021 12:05 AM GMT

Consider the following observations:

  • The scaling hypothesis is probably true: we will likely continue to see great improvements in AI capabilities as model sizes increase.
    • Sutskever mentioned that models currently under development already have dramatic and qualitative improvements (e.g. going more multimodal) over those already made in 2020.
  • AI model sizes are likely to increase very quickly over the short term.
    • NVIDIA’s CEO: “We expect to see models with greater than 100 trillion parameters by 2023". Something nearly 600x the size of GPT-3, given its already shocking performance, is scary to imagine, to say the least.
    • Even if OpenAI is cautious and will not go ahead with potentially catastrophic projects (dubious), the wide availability and rapidly decreasing cost of efficient hardware, along with publicly available information about how to train GPT-like architectures, means that some organization will achieve scaling.
  • We will likely not be able to solve AI alignment within the next few years, even if AI safety research were to speed up dramatically.
    • Near-term deep learning AIs cannot plausibly be anything but badly misaligned.


  • What takeoff speeds are likely with large deep learning models, if they attain AGI? Are medium takeoffs more plausible due to deep learning type AIs having less obvious “tweaks” or ways to recursively self improve by altering code?
    • The key extension of that question is: how will society react to the presence of AGI? Will it even shut it down, stop development, and wait for safety?
    • Perhaps takeoff speeds are moot beyond human level, because even a human-level AGI would have the wits to proliferate itself over the internet to computers all over the world, making its eradication impossible once it has come into existence. If so, would its creation mean a slow but certain, inexorable death?
  • Given short term AGI, what options are available to reduce the risk of existential catastrophe? Would global nuclear war or a similarly devastating event that prevents technological progress be the only thing which could stave off AGI annihilation in the near term?

More serious thought needs to be given to this, to solemnly consider it as a looming possibility. 


Keylogging: Continuous Convenient Cryonics

April 14, 2021 - 02:13
Published on April 13, 2021 11:13 PM GMT

If you are like me, you don't really remember further than a year back, and versions of you a decade apart are further from each other than from the most similar other human. In such a case, cryonics doesn't cut it - you'll be dead before you die. I have long hoped that the future will resurrect a continuum of my past selves, so that I might actually make it there. And it sure looks like GPT-5 can do that! The accuracy will merely depend on the available data. And so I have decided to log my thoughts to the extent that my fingers can keep up.

Why identify with future reconstructions?

Definitions for continuity of identity and consciousness are only important to decide which entities to care about. Evopsychologically, humans care because of game theory. Characters in books and dreams and GPT outputs have more claim to consciousness than real-life animals, but the latter's choices actually impact us. Therefore, a wrapper around GPT-5 that acts indistinguishably from a human, that is guaranteed to continue running on some server, that does whatever they want on the internet, like making money and friends, should be treated by us as a human with rights. Analogously, you ought to consider a faithful reproduction of your behavior, abilities and access level as yourself.

How is this convenient?

You don't need to invest effort keeping your log organized. Install a keylogger, type anywhere. The rest is software. Cryonics requires money, bureaucracy and is questionably legal. If progress stops, such a log will still come in handy. GPT-4 can match all excerpts against millions of abstracts of papers to show you approaches you missed. Even GPT-3 could be fine-tuned on the log so in your old age, you at least have a caricature of your young self to talk to, one still more accurate than your memories.


The Case for Extreme Vaccine Effectiveness

April 14, 2021 - 00:08
Published on April 13, 2021 9:08 PM GMT

I owe tremendous acknowledgments to Kelsey Piper, Oliver Habryka, Greg Lewis, and Ben Shaya. This post is built on their arguments and feedback (though I may have misunderstood them).

I plead before the Master of Cost-Benefit Ratios. “All year and longer I have followed your dictates. Please, Master, can I burn my microCovid spreadsheets? Can I bury my masks? Pour out my hand sanitizer as a libation to you? Please, I beseech thee.”

“Well, how good is your vaccine?” responds the Master. 

“Quite good!” I beg. “We’ve all heard the numbers, 90-95%. Even MicroCOVID.org has made it official: a 10x reduction for Pfizer and Moderna!” 

The Master of Cost-Benefit Ratio shakes his head. “It helps, it definitely helps, but don’t throw out that spreadsheet just yet. One meal at a crowded restaurant is enough to give even a vaccinated person hundreds of microCovids. Not to mention that your local prevalence could change by a factor of 5 in the next month or two, and that’d be half the gains from this vaccine of yours!”

I whimper. “But what if...what if vaccines were way better than 10x? What about a 100x reduction in the risks from COVID-19?” 

He smiles. “Then we could go back to talking about how fast you like to drive.”

In its most extreme form, I have heard it claimed that the vaccines provide 10x reduction against regular Covid, 100x against severe Covid, and 1000x against death. That is, for each rough increase in severity, you get 10x more protection.

This makes sense if we think of Covid as some kind of "state transition" model where there's a certain chance of moving from lesser to more severe states, and vaccines reduce the likelihood at each stage.

I think 10x at multiple stages is too much. By the time you're at 1000x reduction, model uncertainty is probably dominating. I feel more comfortable positing up to 100x, maybe 500x reduction. I dunno.

There is a more limited claim of extreme vaccine effectiveness that I will defend today:

  1. In the case of the Pfizer vaccine (and likely Moderna too), the effectiveness in young healthy people is 99% against baseline symptomatic infection, or close to it.
  2. That we can reasonably expect the effectiveness of the vaccine against more severe cases of Covid to be greater than effectiveness against milder cases of Covid.

(Maybe it's 2x more effective against severe-Covid and 3x more effective against death compared to just getting it at all. Something like that, it doesn't have to be 10x– it'd still be a big deal because more severe outcomes are where most of the disutility lies.)

It's a very simple argument, really. First, the data very clearly suggests effectiveness of ~99% for young people, with nice tight confidence intervals. Second, across all the data we see trends of increasing effectiveness against increasing severity, granted that the confidence intervals are wide in some cases. Third, a very reasonable (imo) mechanistic model supports this interpretation of the data.

The 1.2 Million-Person Pfizer Israeli Observational Study

This observational study matched ~600k vaccinated people 1:1 with ~600k demographically similar controls. It covered the period December 20 to February 1. As far as I know, it is by far the largest Covid-19 vaccine study published to date. The other studies are clinical trials with sample sizes on the order of 20k-40k, and some other observational studies, typically with healthcare workers, in the single-digit thousands.

Why it's not as big as it sounds

Before we get to looking at data, I think it's important to note why uncertainty remains despite the huge N. To start with, the outcomes are all quite rare. Eyeballing it, Israel had a Covid-19 prevalence of ~0.5% during the study period. Out of a million people, a few thousand might be expected to actually catch Covid. Of that few thousand, only dozens or hundreds will progress to more severe forms of Covid. When sample sizes are in the dozens, confidence intervals are wide.
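
To see roughly how wide these intervals get, we can compute a normal-approximation binomial confidence interval for an outcome rate with event counts in the ballpark of the study's severe cases. The numbers below are illustrative stand-ins, not the study's exact person-time figures.

```python
import math

def ci_width(events, n, z=1.96):
    """Normal-approximation 95% CI for an event rate of `events` out of `n`."""
    p = events / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

# ~hundreds of thousands at risk, but only dozens of severe outcomes
lo, hi = ci_width(55, 600_000)
print(f"rate {55/600_000:.6f}, 95% CI ({lo:.6f}, {hi:.6f})")
```

Even with 600k people, 55 events gives a confidence interval spanning roughly ±25% of the point estimate, and the ratio of two such uncertain rates (the effectiveness estimate) is wider still.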

The authors report:

During a mean follow-up of 15 days (interquartile range, 5 to 25), 10,561 infections [6,101 control vs 4,460 vaccinated] were documented...of which 5,996 [2,494 vs 2,071] (57%) were symptomatic Covid-19 illness, 369 required hospitalization [259 vs 110], 229 were severe cases of Covid-19 [174 vs 55], and 41 resulted in death [32 vs 9].

See Appendix B for complete breakdown of outcomes by period and vaccination status. 

What's more, those numbers are for the entire study period (44 days). Only in a subset of days had participants been vaccinated long enough for it to be a real test. Nominally it is a 1.2 million person study, but practically, when you're looking at results 2 weeks after the first dose (~day 14) or one week after the second dose (~day 28), the numbers are much lower. ~80% lower.

96% of participants received a second dose on day 21 or after; 95% received it before day 24 

All that to say, sample sizes aren't as big as they sound. Well, let's look at the results. This is the main outcome table. Definitions in Appendix A.

It's a bit hard to track trends formatted like that, so here's an equivalent graph:

Left: 1-RR for 14-20 days after 1st dose; 
Middle: 21-27 days after 1st dose; 
Right: 7 days after 2nd dose until end of follow-up

To me, the headline result is that at 2nd-dose + 7 days, vaccine effectiveness against Symptomatic Illness is 94% (87-98). Pretty good! Also, efficacy clearly rises from the earlier to later periods after vaccine administration.

Unfortunately, we don't see efficacy improvements moving to the right on the rightmost graph (more severe outcomes), counter to claims of extreme vaccine effectiveness. We do see upwards-right trends in the earlier periods (left and middle graphs). 

Well, 2 out of 3 ain't bad! Too bad the last one is the one we care about most.

It's okay. I've got more. The astute reader will have noted that the above graphs have "subgroup = Full" in their title. The study made available endpoints (outcomes) for multiple subgroups. Below are some of them. (Here are ALL OF THEM.)

Since there was data for it, I also added in the "2nd dose and 6 days after" period.

To provide a sense of the changing sample size between time periods: there were on average 221k participants per day in each experimental group between days 14-20, 160k between days 21-27, and 39k from days 28-44.

Many of the points are missing. The authors did not compute vaccine effectiveness if the control and vaccine group combined did not have 10 or more instances of an outcome. For example, the Age 16-39 subgroup did not have 10 instances of hospitalization, severe-Covid, or death for almost all of the time periods.

In cases where there are 10 instances of an outcome in the control group but 0 in the vaccine group, a value is reported without confidence intervals, e.g. the dots in the Females Subgroup, 2nd-2nd+6 period.

The up-and-to-right shape of the graphs persists across subgroups, except for Males and when the values are already maxing out near 100% (the right-most graphs). Overall, I think this is suggestive of the general trend that vaccines are more effective against progressively more severe outcomes. I also suspect that an uncertainty model which accounted for the correlations between neighboring values would shrink the error bars relative to the naive bootstrap method.

I'll comment more below on why the flat/missing trend in the rightmost graph doesn't bother me much, beyond the fact that the small sample size makes it hard for that graph to show much at all.

Also! If you prefer Tables, here's the top half from the paper itself corresponding to the above graphs:

As someone falling in the Age 16-39 subpopulation, I'm quite pleased to see 99% effectiveness against Symptomatic Infection, with a nice tight 96-100 confidence interval. This is higher than any number anyone had cited to me, and is approximately a 5x increase in how effective I believe my Pfizer vaccine to be. That's even before we get to more severe outcomes.

So why is the absence of data showing increasing efficacy not evidence of absence?

Because we expect to see data that looks like this even in worlds where the vaccine is 100% effective (at least for all vaguely healthy people). To be evidence against something, it has to be less common in worlds where that thing is true, and that's not the case here.

Why would we see these numbers with a 100% effective vaccine?

"Saturation" and "noise" 

When you have 100 true positives and 3 false positives, the false positives aren't such a big deal. When you have 0 true positives and 3 false positives, the false positives can change the entire picture.

I argue this is very likely what is going on with Covid-vaccine effectiveness, above and beyond sample sizes. 

Consider that PCR Covid-19 tests have both a false-negative and a false-positive rate (FPR). According to this random site I found by Googling that looks legit enough, the FPR for Covid-19 tests is between 0.2% and 0.9%. Let’s choose a point estimate of 1% for the FPR to be safe, but require the test be run twice for every case to compensate. Both runs coming back falsely positive happens 1% × 1% of the time, so now our effective FPR is 0.01%.

Let’s now imagine using this test on a 99% effective vaccine (the same argument holds for 99.9% and 99.99% even more so). We run an RCT with 100,000 people receiving the vaccine and 100,000 receiving placebo. Covid-19 prevalence is 0.1% in the region our hypothetical test is running. 

99 people from the control group catch actual Covid and receive positive test results (we lose one to a realistic false-negative rate of 10%, which, run twice, becomes 1%), plus 0.01% false positives, for a total of 109. From the treatment group with the 99% effective vaccine, we get 1 true-positive and 10 false-positive test results. Our final effectiveness estimate is 1 - 11/109 = 90%.

90%! And that’s from what is in truth a 99% effective vaccine. The control is mostly unaffected by the noise (109 vs actual 100) but the treatment is enormously changed. Instead of 1, it’s 11. 
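
The arithmetic above can be reproduced in a few lines. All the inputs are the post's hypothetical numbers (100k per arm, 0.1% prevalence, 99% true effectiveness, doubled-up tests), not real study data.

```python
# How test noise drags a truly 99% effective vaccine down to a measured ~90%.
N = 100_000           # participants per arm
prevalence = 0.001    # 0.1% catch Covid during the study
true_ve = 0.99
fnr = 0.10 ** 2       # 10% false-negative rate, test run twice -> 1%
fpr = 0.01 ** 2       # 1% false-positive rate, test run twice -> 0.01%

control_true = N * prevalence * (1 - fnr)    # 99 real cases detected
control_pos = control_true + N * fpr         # + 10 false positives = 109
treat_true = N * prevalence * (1 - true_ve)  # 1 real breakthrough case
treat_pos = treat_true + N * fpr             # + 10 false positives = 11

measured_ve = 1 - treat_pos / control_pos
print(round(measured_ve, 3))  # ~0.899, well below the true 0.99
```

The false positives barely move the control arm but multiply the treatment arm's count by eleven, which is the whole effect.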

Of course, the outcomes we’re interested in are hospitalization, severe Covid, and death. I’d expect the false positives on these to be lower than for having Covid at all, but across tens of thousands of people (the Israel study did still have thousands even in later periods), it’s not crazy that some people would be very ill with pneumonia and also get a false positive on Covid. The lower false positives on these outcomes are likely balanced by the much lower rate of them occurring even in the control group.

The greatest noise of all is selection effects

Never mind fluke false positive tests, on priors we have reason to suspect that the people who are still getting severe Covid, or even being hospitalized, despite being vaccinated are very likely not like you. Why isn’t the number of vaccinated people who ended up in critical condition zero?

Because within the Israel observational study, the vaccinated group contains some very sick people. (In a half-million-person group with no exclusion criteria screening them out, there simply will be; and beyond that, we know many participants are elderly and many explicitly meet risk criteria such as cancer, obesity, pulmonary disease, Type 2 diabetes, etc. See Table 1. Demographic and Clinical Characteristics of Vaccinated Persons and Unvaccinated Controls at Baseline.)

Vaccines work by stimulating an immune response. If your immune system is in tatters, unable to manufacture healthy antibodies or something, your vaccine might not do much. You might fare little better than the unvaccinated.

There’s actually a term for immune system failure in the elderly: immunosenescence. A couple of papers on that topic: Immunosenescence and vaccine failure in the elderly (2009) and Immunosenescence: A systems-level overview of immune cell biology and strategies for improving vaccine responses (2019)

Quoting the abstract from the first one:

An age-related decline in immune responses in the elderly results in greater susceptibility to infection and reduced responses to vaccination. This decline in immune function affects both innate and adaptive immune systems...Essential features of immunosenescence include: reduced natural killer cell cytotoxicity on a per cell basis; reduced number and function of dendritic cells in blood; decreased pools of naive T and B cells; and increases in the number of memory and effector T and B cells...Consequently, vaccine responsiveness is compromised in the elderly, especially frail patients...In the future, the development and use of markers of immunosenescence to identify patients who may have impaired responses to vaccination, as well as the use of end-points other than antibody titers to assess vaccine efficacy, may help to reduce morbidity and mortality due to infections in the elderly.

The last line there suggests that antibodies can be present even when a vaccine overall is not protective against disease.

Don’t get old, kids; it’s bad for you.

When I see that the treatment group has lower but not zero severe cases than the control group, I assume it’s coming from the very ill, the immunocompromised and immunosenescent.

I am pretty confident that I am not in those groups. I’m pretty sure my vaccine elicited some real response (I had some side effects, not terrible, but some). When I see the effectiveness numbers showing globally that there’s still some chance of really bad outcomes, I adjust them downwards because they were very likely not happening to people with remotely my level of health. 

To invoke the math of the previous section regarding false positives: the immunocompromised might contribute only 10% of the severe Covid patients in the control group (because their prevalence is low), but after the filter/selection effect of vaccination, they make up ~95% of severe cases in the treatment group.

Here’s a diagram for good measure:

This is a causal diagram. Arrows indicate the direction of causality but not the specific causal relationship. (Having a poor immune system, e.g. because you’re older, might make you more likely to be vaccinated, but of course, in our study we’re conditioning on vaccination status, so it doesn’t matter.)

Since we didn’t condition on immune system health, we can’t expect that a naive interpretation of the table of efficacies (above) tells us the true relationship between vaccines and outcomes.

Well, we can approximately filter. The vaccine efficacy is higher for the Age 16-39yr subgroup (99%, 96-100) vs the entire population (94%, 87-98). That's not just younger people getting less Covid; the vaccine worked better on them.

The Age 16-39yr subgroup didn't have zero cases of symptomatic disease (though it may have had zero hospitalizations, etc.), but as above, I'm guessing those cases almost all occurred in people who knowably had weak immune systems even within this overall healthier group.

Priors & Trends

In the above section I argued two things:

  • that we see something of a trend towards increasing effectiveness with increasing severity in the Pfizer Israeli mass study;
  • that even if the vaccine were 99.9% effective, we would expect to observe effectiveness data lower than this.

In this section, I want to offer 1) a plausible mechanistic model for why this should be true, 2) further indications of increasing effectiveness from other trials and studies.

Shifting the Distribution of “Infection”/Viral Load

Epistemic status: I'm not a biology/virology/immunology person and this feels kinda hand-wavy to me.

This post started with a table that lists a progression of discrete outcomes: Documented Infection, Symptomatic, Hospitalization, Severe Disease, Death...and I’ve been referring to them since. Obviously, the underlying biological reality isn’t quite as discrete as that.

It’s probably more something like there’s a continuous value of how infected you are, and the higher that value gets, the worse your condition will be.

How infected you are is probably a fuzzy thing in reality, but viral load might be an adequate proxy. It’s been documented that viral load varies together with Covid-19 severity. See SARS-CoV-2 viral load is associated with increased disease severity and mortality and Saliva viral load is a dynamic unifying correlate of COVID-19 severity and mortality

The graphs in the second paper particularly make this point. 

(a) Comparison of first recorded saliva viral load between individuals hospitalized for COVID-19 and non-hospitalized individuals within the first 10 days from symptom onset using a two-sided t-test. Comparison of only first recorded saliva viral load measurements amongst (b) moderate and severe disease or (c) alive and deceased individuals throughout the course of disease. 

Presumably, your viral load is the result of competition between the virus replicating and your immune system fighting it. Vaccination gives a significant boost to your immune system. (Cf the oft-cited claim about “4x lower” viral load in vaccinated people)

To engage in some inexpert armchair speculation, I’d guess that in the virus-immune system race, about x% of the time the virus gets the upper hand and makes an infected person symptomatic (~Level 1), and then in x% of those cases the virus wins out again and progresses to Level 2 before the immune system can stop it. So the virus progresses two levels (x%)^2 of the time: if x% is 20%, then overall 4% of cases get two levels worse.

At each level, the virus only has an x% of winning out and progressing to the next level. Alternatively, in each time period, there’s some chance, y%, that the immune system will catch up and win. 

In cases where someone has a weak immune system (high x%, small y%), increasing levels of case severity aren’t much less likely than earlier ones. You might get an approximately flat effectiveness curve.

But suppose that someone is vaccinated and has a real boost to their immune system. Intuitively, I’d reckon they’re now at (x/5)% for each stage. For the virus to progress two levels, that’s ((x/5)%)^2, or 0.16% when x% is 20%.

Maybe it’s a factor 2 or 3 instead of 5, but either way, it’d be a compounding effect. More severity means the virus has to replicate more times, which is more time for the immune system to catch up and beat it, ergo less chance for it to get that bad.
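
A minimal sketch of this compounding model, assuming the post's hypothetical per-stage win probability of 20% and a 5x per-stage reduction from vaccination:

```python
def p_reach(level, x):
    """Probability the virus progresses `level` severity steps when each
    step succeeds independently with probability x."""
    return x ** level

x_unvax = 0.20         # virus wins a round 20% of the time, unvaccinated
x_vax = x_unvax / 5    # vaccination cuts the per-round win chance 5x (assumed)

for level in (1, 2, 3):
    ve = 1 - p_reach(level, x_vax) / p_reach(level, x_unvax)
    print(level, round(ve, 4))  # effectiveness compounds with severity
```

Under these assumptions, effectiveness goes 80% against Level 1, 96% against Level 2, 99.2% against Level 3: exactly the "more effective against more severe outcomes" shape the data hints at.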

It's tough being a dude

Incidentally, Silva et al (2021) who provided the graphs of viral load immediately above, also had this to say re male vs female:

Y-axis is Saliva Viral Load Log (GE/ml)

The difference in viral loads lines up with the Male subgroup having worse vaccine efficacy than the Female subgroup: 88% (71-98) vs 96% (90-100) against symptomatic infections at 2nd dose + 7 days. It is also the case that across the world, women live longer and die less often from cardiovascular disease, cancer, diabetes, and chronic respiratory disease (Our World in Data).


Johnson & Johnson & Friends

Although the J&J vaccine is not an mRNA vaccine like today's star, Pfizer's BNT162b2, its clinical trial tracked multiple endpoints that show our hoped-for trend.

  • ~40,000 participants (treatment and control)
  • 65% of cohort is 18-59yr

The design/reporting of the J&J clinical trial differs from others, particularly the large Pfizer observational study. The time periods and outcomes are defined differently. 

To avoid copying multiple tables, I’ve extracted the numbers as I’ve understood them.


Onset at Least 14 Days

| Endpoint | Treatment | Placebo | VE% (95% CI) |
|---|---|---|---|
| Moderate to Severe, 18-59 yrs | 95 | 260 | 63.7% (53.9-71.6) |
| Moderate to Severe, >=60 yrs | 21 | 88 | 76.3% (61.6-86.0) |
| Severe/Critical, 18-59 yrs | 12 | 52 | 76.9% (56.2-88.8) |
| Severe/Critical, >=60 yrs | 7 | 28 | 75.1% (41.7-90.8) |
| Requiring Medical Intervention | 2 | 14 | 85.7% (37.8-98.4) |

Onset at Least 28 Days

| Endpoint | Treatment | Placebo | VE% (95% CI) |
|---|---|---|---|
| Moderate to Severe, 18-59 yrs | 66 | 193 | 66.1% (53.3-75.8) |
| Moderate to Severe, >=60 yrs | 14 | 41 | 66.2% (36.7-83.0) |
| Severe/Critical, 18-59 yrs | 5 | 33 | 85% (61.2-95.4) |
| Severe/Critical, >=60 yrs | 3 | 15 | 80.2% (30-96.3) |
| Requiring Medical Intervention | 0 | 7 | 100% (31.1-100) |
  • Moderate Covid-19: positive test, plus any one severe symptom or two regular symptoms like fever, sore throat, or cough
  • Severe/Critical: 3+ regular symptoms, or things like respiratory rate ≥30 breaths/minute, heart rate ≥125 beats/minute, oxygen saturation (SpO2) ≤93%, shock, admission to ICU, or death
  • Requiring Medical Intervention: hospitalization, ICU admission, mechanical ventilation, and/or ECMO.

There were deaths in the J&J study, all within the placebo group.

The authors note that all these cases occurred at study sites in South Africa. Hmmm.

Using a somewhat different formula, the authors also report on interim asymptomatic results. They present four different operationalizations of which I choose two, the ones with the highest and lowest efficacy after 29 days. See Table 20 for further detail.


Day 1-Day 29

| Case definition | Treatment | Placebo | VE% (95% CI) |
|---|---|---|---|
| FAS seronegative at baseline, +PCR and/or serology, no signs or symptoms | 159 | 182 | 12.5% (-8.9-29.7) |
| Seroconverted without previous symptoms | 84 | 180 | 22.6% (-3.9-42.5) |

After Day 29

| Case definition | Treatment | Placebo | VE% (95% CI) |
|---|---|---|---|
| FAS seronegative at baseline, +PCR and/or serology, no signs or symptoms | 22 | 54 | 59.7% (32.8-76.6) |
| Seroconverted without previous symptoms | 10 | 37 | 74.2% (47.88-88.6) |

Despite the J&J trial using overlapping criteria, both by explicitly lumping outcomes together and within the endpoint definitions themselves, we see the progression we’d expect to see if vaccines work better to prevent worse outcomes than they do milder ones. Modulo confidence intervals, that is.

While efficacy against moderate to severe Covid-19 is 65% (for 18-59 years old), it jumps to 85% for severe alone, granted the overlap in confidence intervals. 0 cases in the vaccine group fell into the Requiring Medical Intervention endpoint, compared to 7 with placebo. It’s as good as we could hope to see with this data.

However, there isn’t a clear jump between “asymptomatic” and “moderate to severe”. Partly that's because the operationalization isn’t clear (59.7% to 66% is a jump), and the error bars are still wide. The After Day 29 antibody test was conducted at Day 71 and had only been completed by ~30% of participants at the time of publication.

On net, I think the overall endpoint trend lines up with increasing vaccine efficacy against more severe outcomes. The asymptomatic vs symptomatic comparison doesn't clearly show it, but part of that is that I don't really understand the groups or how to interpret them.

Let's get clinical, clinical

Moderna Clinical Trial

The Moderna Phase III Trial tracked Covid and severe Covid but no other endpoints. There were no severe Covid cases or deaths in the vaccine group, versus 30 severe Covid cases and one death in the control group. That doesn’t let you compute a precise value, but it is consistent with the vaccines being very good.

  • 59% of cohort is aged 18-64 yr and not in any risk categories
  • Further breakdown of participant characteristics in Table 1
  • Asymptomatic Infections and Hospitalizations were not tracked.
  • Table S13 in the appendix provides a very detailed breakdown of symptoms. No severe symptoms at all occurred in the treatment group, granted that very few occurred in the control group either.


Pfizer Clinical Trial

  • 58% of cohort is aged 16-55yr, median 52yr, range 16-89
  • 35% have BMI >= 30
  • After the second dose, there were 5 cases of severe Covid in the control group and 1 in treatment, with no reported deaths.

It’s nice (for me) to note that the vaccine efficacy, and particularly the confidence intervals, are higher for the 16-55yr group: 95.6% with CI 89.4%-98.6%. That doesn’t prove the main point, but it is in line with the immunosenescence model.

No Covid deaths are reported in either the placebo or vaccine groups. Table S5 from the appendix lists severe-Covid outcomes: a total of 9 after the first dose in the control group and 1 in the vaccine group, for an 89% reduction with a confidence interval of 20.1%-99.7%. If we break it down into different time periods (before/after 1st/2nd dose), we end up with confidence intervals like (-3800% to 100%). Yes, maybe taking the vaccine will increase your chance of severe Covid by 39x!
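For concreteness, here's how a point estimate like that 89% figure falls out of raw case counts. This is a sketch assuming roughly equal-sized treatment and placebo arms (as in this trial), where efficacy reduces to one minus the case ratio:

```python
# Vaccine efficacy from raw case counts, assuming equal-sized arms:
# VE = 1 - (attack rate in vaccine arm / attack rate in placebo arm),
# which reduces to 1 - (vaccine cases / placebo cases).

def vaccine_efficacy(cases_vaccine, cases_placebo):
    return 1 - cases_vaccine / cases_placebo

ve_severe = vaccine_efficacy(1, 9)  # 1 severe case vs 9 in control
print(f"{ve_severe:.0%}")           # 89%
```

With counts this small, the point estimate is almost meaningless on its own, which is why the confidence interval (20.1%-99.7%) is the thing to look at.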

As expected, the data doesn’t show that the reduction in severe Covid is greater than in milder Covid, but it doesn’t show the opposite either.

As described above, we have to note the greatly reduced sample size in the later periods.

Evidence of Increased Asymptomatic/Symptomatic Ratios

This was already shown in the mass Pfizer study, but several other sources indicate the ratio of asymptomatic-to-symptomatic cases is increased for vaccinated people. In other words, vaccination works better against symptomatic Covid (more severe) than asymptomatic Covid (less severe). This is at the milder end of "severity" compared to hospitalization, severe Covid, and death, but it suggests the same trend; plus, there's more data than when looking at more severe outcomes.

I've copied most of the numbers here from MicroCOVID.org; see their analysis for calculation details.

| Study | Unvaccinated (symptomatic vs asymptomatic) | Vaccinated (symptomatic vs asymptomatic) |
|---|---|---|
| Mass Pfizer Study, Day 28+ | 210 vs 191 | 31 vs 59 |
| MicroCOVID Moderna/CDC | 185 vs 37 | 11 vs 11 |
| MicroCOVID J&J | 351 vs 182 | 117 vs 159 |
| MicroCOVID AstraZeneca | 248 vs 73 | 84 vs 57 |
Pfizer's asymptomatic percentage goes from 48% -> 65%, Moderna's from 17% -> 50%, J&J's from 34% -> 57%, and AstraZeneca's from 23% -> 40%.
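Those percentages are just the asymptomatic share of each group's cases; a quick sketch, using the Pfizer and Moderna counts quoted above:

```python
# Recomputing the asymptomatic percentages from the raw
# "symptomatic vs asymptomatic" case counts.

def asymptomatic_fraction(symptomatic, asymptomatic):
    return asymptomatic / (symptomatic + asymptomatic)

# (symptomatic, asymptomatic) counts: unvaccinated first, then vaccinated
pfizer = asymptomatic_fraction(210, 191), asymptomatic_fraction(31, 59)
moderna = asymptomatic_fraction(185, 37), asymptomatic_fraction(11, 11)

print([f"{f:.1%}" for f in pfizer])   # unvaccinated vs vaccinated share
print([f"{f:.1%}" for f in moderna])
```

A rising asymptomatic share among vaccinated cases is exactly what you'd expect if vaccination shifts infections toward the milder end.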

I expect the very different absolute numbers to come from the widely varying study methodologies as much as differences between the vaccines. (AstraZeneca uses home tests, for example.) 

I didn't exhaustively look through all possible sources of asymptomatic vs symptomatic efficacy. I would be very interested if someone had a credible source not showing this trend.

I also didn't scrutinize these calculations much, so I wouldn't be that surprised if it turned out there were deep flaws that undermine the trend seen here.

It gets better

If we go back to the big Israeli Pfizer observational study, we see increasing vaccine effectiveness as more time passes since first/second dose. Unfortunately, the study didn't have enough time/data to show us things two weeks after the 2nd dose.

Fortunately, there was a follow-up. On March 11, Pfizer/Israel Ministry of Health made the following press release:

Findings from the analysis were derived from de-identified aggregate Israel MoH surveillance data collected between January 17 and March 6, 2021, when the Pfizer-BioNTech COVID-19 Vaccine was the only vaccine available in the country and when the more transmissible B.1.1.7 variant of SARS-CoV-2 (formerly referred to as the U.K. variant) was the dominant strain. Vaccine effectiveness was at least 97% against symptomatic COVID-19 cases, hospitalizations, severe and critical hospitalizations, and deaths. Furthermore, the analysis found a vaccine effectiveness of 94% against asymptomatic SARS-CoV-2 infections. For all outcomes, vaccine effectiveness was measured from two weeks after the second dose.

The lack of actual paper makes this a little harder to interpret, but I don’t find it surprising given that (1) at this later date in their roll-out, an even greater proportion of people will be young and healthy, (2) this data is only counting two weeks after the second dose, whereas the previous large observational study only had a “7 days after 2nd dose until end of follow-up” (maximum of day 28 to 44).

And, of no small significance, the Pfizer vaccine appears fully effective against the UK variant. (Yay!!)

If the vaccine is showing 97+% amongst everyone, I would expect that's at least as true when you filter for younger/healthier people and filter out those with comorbidities. 

What I believe

I believe that what I wrote above supports my initial assertions:

  1. In the case of the Pfizer vaccine (and likely Moderna too), the effectiveness in young healthy people is 99% or close to it.
  2. That we can reasonably expect the effectiveness of the vaccine against more severe Covid to be greater than effectiveness against milder cases of Covid.

The initial Pfizer mass study has 99% (96-100) for the age 16-39yr group, and the subsequent follow-up gives 97% for everyone. At baseline for symptomatic cases, we're talking 30-100x reductions, which is hella good.

Further, across multiple studies, vaccines, and outcomes we see trends of increasing effectiveness against progressively severe outcomes. In some cases, we don't definitively see it, but that's easily attributable to lack of sample size and inherent limitations in methodology due to noise and selection effects.

If we're talking 99% against symptomatic cases (100x reduction), then I think it's reasonable to expect at least that for hospitalization, 99.5% (200x). Hence the title, extreme vaccine effectiveness.

What about J&J (and AstraZeneca)? Granted, the effectiveness numbers for J&J look lower than for Pfizer and Moderna, but I think they're higher than MicroCOVID.org's numbers imply. First, we get 85% (61.2-95.4) effectiveness against Severe/Critical in the 18-59yr subgroup, and that number matters more. Second, that is a very wide age range; I would bet that restricting it to 18-39 would show an improvement relevant to most of those reading this. Lastly, I suspect all the factors mentioned above (selection effects, noise/saturation) affect it too, making the result lower than it would otherwise be.

On net, J&J might not be quite as extremely effective as the mRNA vaccines, but it's no pushover either.

AstraZeneca isn't on offer in the Bay, and was recently abandoned in my home country of Australia too, so I apologize for not examining it.

Tell me where I’m wrong

I want the case I've made today to be true, but even more than that I want to believe true things (and I certainly don't want people to believe false things because of me). If you think any of this is wrong, PLEASE SAY SO.

Many thanks!


Ben Shaya's Thoughts

Ben Shaya, one of the people responsible for MicroCOVID.org's models, has been kindly taking time to discuss the topic of vaccine effectiveness with me. 

He wrote a document arguing Why I think vaccines don’t bring the chance of severe COVID to 0. While that's a stronger claim than I would make, his arguments and models are still valuable when thinking about the topic generally.

Read them all in his doc, but I'll highlight one that I found very relevant to mechanistic models of Covid severity:

There's a bit of mechanistic nuance - COVID tends to start with an upper respiratory tract infection - that's when nasal swabs work - and then moves to the lower respiratory tract (lungs) - which is where it screws over your blood oxygen. There's research suggesting that the immune response of the lining of your windpipe is what determines whether the virus reaches your lungs: https://www.biorxiv.org/content/10.1101/2021.02.20.431155v1.abstract?%3Fcollection=

That is to say, there isn't a single "immune response" that determines how bad your infection is; immune response in your upper respiratory tract determines if you get COVID while immune response in your lungs and windpipe determine whether you get a severe reaction. Immune response in your blood correlates with these, but it's not the same as either.

(this comes from Riley Drake, who is a virologist and one of the authors on the paper, and who strongly cautioned against reading too deeply into antibody concentrations as a proxy for immunity)

When we treat these three immune systems as separate, we see there are at least 3 hidden models - upper respiratory response, lower respiratory response, and blood response. Of the 3, the side effects to the vaccine only reflect the blood response. The overall efficacy of the vaccine only reflects the upper response.

Further, we know that humans have significant heterogeneity in all three of these responses, since only some people get severe covid, some people get lung damage from mild covid, and some people get exposed to COVID and don't contract the virus at all.

Note further that assuming a strong immune response protects you is also not necessarily true; a very strong response can cause cytokine storms (which we now know how to handle, but which will land you in a hospital).

This gets at more gears in immune response and is the kind of thing that can expose where simple state transition models and immune response as a single thing don't hold up.

What if the vaccine boosts one part of the immune system but not another? In that case, you might see the vaccine be very effective against symptomatic and asymptomatic Covid, but not more severe disease. If the virus makes its way deep into your lungs, and the lungs are protected by a different immune response that isn't helped by the vaccine, then, conditional on having gotten to that point, you might not be better off than a non-vaccinated person.

All this to say we should be cautious in putting too much stock in simple mechanistic models.

What if vaccines are all or nothing?

The model behind the claim of extreme vaccine efficacy is that even if your post-vaccine immune response isn’t enough to stop you getting Covid at all, it should be stronger than it would have otherwise been, and you’ll do better at fighting off severe-Covid. This takes vaccine efficacy as a continuous thing.

But maybe the vaccine is 100% effective against all outcomes! So long as it’s correctly transported and administered, that is. Except sometimes vaccines are left at high temperature for too long, the delicate contents are damaged, and the people receiving them are effectively not vaccinated. If this happens 5% of the time, then 95% of people are completely immune to Covid and 5% are identical to the unvaccinated. Whatever chance they had of getting severe Covid before, it’s the same now.

In this world, not knowing whether your vaccine was a dud or not, post-vaccine you should assume you have a 95% reduction across all outcomes equally. 

I originally found this argument very persuasive. How could I assume that the “continuous” model of vaccine-immune response was true? But actually, most things in the real world are continuous, particularly in biology. People aren’t old or young, but somewhere on a continuous measure. Immune response isn’t an all-or-nothing affair, and some people’s bodies will produce more antibodies than others. Some enough to stymie any symptoms at all, but some only enough to prevent them getting hospitalized.

One person told me that the mRNA vaccines induce 60x the antibodies of a recovered Covid patient (closest source I found), such that even if your response was weaker, it should still be more than powerful enough to deal with any actual Covid. Therefore, we should take people still getting quite sick with Covid as a sign that some people’s vaccines must not be working at all.

I would be surprised if complete vaccine failure didn't occur some of the time; the question is how often. Suppose we have a vaccine that, when it works, confers a 100x reduction (99% efficacy). If it fails to work at all 5% of the time, we'd see 1 - (0.01*0.95 + 1*0.05) = 94% efficacy overall. Something like that could be a big part of what we observe.
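The mixture arithmetic above, as a minimal sketch (the 99% working efficacy and 5% failure rate are the illustrative numbers from the text):

```python
# "All-or-nothing" mixture: a vaccine that confers some efficacy when it
# works, but fails completely some fraction of the time. Observed efficacy
# is one minus the blended relative risk.

def observed_efficacy(efficacy_when_working, failure_rate):
    relative_risk = (1 - efficacy_when_working) * (1 - failure_rate) + 1.0 * failure_rate
    return 1 - relative_risk

print(f"{observed_efficacy(0.99, 0.05):.0%}")  # 94%
```

Notice the failure rate dominates: even a perfect vaccine that fails 5% of the time can't show better than 95% observed efficacy against any outcome.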

Also, Kelsey Piper says that she couldn't find anything about any other vaccines working this way (all or nothing).

Caveats – read before you act

In my long case for the extreme effectiveness of vaccines, there are some topics of crucial practical importance. These imply that maybe you don't want to throw caution to the wind just yet. I'm not sure; I didn't get around to looking into these.


Long-Covid is not an endpoint tracked by any of the studies I've looked at. I would think that'd be related to increased viral load and behave like an outcome more severe than just a symptomatic case, but there isn't data for that. Anecdotally, I've heard of a couple of cases where someone experienced mild Covid yet was dramatically affected for months afterwards.

My conservative gut estimate is that your odds of getting long-Covid are reduced by a vaccine by as much as your chance of getting symptomatic Covid at all, but not necessarily any more than that.


The Israeli study showed supreme efficacy even against the much-feared UK variant (B.1.1.7). However, there are fears that the vaccines aren't nearly as good against the South African variant (B.1.351) or the Brazilian P.1 variant.

A new study published on MedRxiv last week, Evidence for increased breakthrough rates of SARS-CoV-2 variants of concern in BNT162b2 mRNA vaccinated individuals, states that vaccinated individuals were more likely to contract B.1.1.7 and B.1.351 than unvaccinated controls, suggesting both variants are more resistant to vaccination than other strains. (Caveat: I didn't read the paper in detail.)

...we performed a case-control study that examined whether BNT162b2 vaccinees with documented SARS-CoV-2 infection were more likely to become infected with B.1.1.7 or B.1.351 compared with unvaccinated individuals. 

Vaccinees infected at least a week after the second dose were disproportionally infected with B.1.351 (odds ratio of 8:1). Those infected between two weeks after the first dose and one week after the second dose, were disproportionally infected by B.1.1.7 (odds ratio of 26:10), suggesting reduced vaccine effectiveness against both VOCs under different dosage/timing conditions.

Not good.

The CDC's CovidTracker page actually has some nice dashboards for tracking variant proportion, though I haven't looked into the data quality. Their brief is helpful too.


There's a breakdown by US state too. I'm in California and am pleased to see that currently, B.1.351 is only 0.3% of the Covid cases and P.1 is 1.6%

Of course, if B.1.351 and friends are resistant to vaccination, we will see them rise in prevalence.

This needs more investigation. Without looking into it more, the sensible strategy would be something like act according to your local prevalences and beliefs about how vaccine-resistant the different strains are. Right now the suspicions are on B.1.351 and P.1, but they're uncommon in California (0.3% and 1.6% respectively).

If you've got the time and skill to look into this more, please do, I can provide you money and glory.

Spreading to the Unvaccinated

Even if the vaccine protects you 200x against more severe outcomes, that doesn't help the unvaccinated if they catch Covid from you when you had an asymptomatic or mild case. This means that until such time as those you interact with most are vaccinated, you might want to be more conservative in your microCovids.

MicroCOVID.org calculated reductions in contagiousness for vaccinated people (10x reduction for Pfizer/Moderna, 3x for J&J), and that's what I'd stick to myself if interacting with the unvaccinated. (But hey, California is just about at universal eligibility, now is the time!)


Modified bases in mRNA vaccines against Covid-19

April 13, 2021 - 16:00
Published on April 13, 2021 1:00 PM GMT

What are the modified RNA bases in mRNA Covid vaccines, and/or how are they metabolised in the human body? Have you heard about any safety issues with these modified bases?

For Pfizer, I know that the modified base is N1-methylpseudouridine. From there, Wikipedia directs me to the page on pseudouridine, which is naturally present in tRNA. However, it is not the same chemical. I do not know how easy or difficult it is for the human body to remove the methyl moiety.

For Moderna, I do not know what the modifications are at all.

The reason I am worried: I have seen an unsourced claim on Facebook that Moderna had trouble getting several of their products approved because of the toxicity of the modified bases. Maybe it is a failure mode to let an unsourced claim start a process of worrying and searching for information on the modified bases. But the process has already started in my head... I would very much appreciate some details, if you guys know them. (Apart from the obvious info that the clinical trials did not show any short- or medium-term problem.)


How do you deal with decision paralysis?

April 13, 2021 - 09:01
Published on April 13, 2021 6:01 AM GMT

I have too many choices in too many things and I can see that it's crippling my ability to get things done. 

I'm struggling a great deal now to come up with plans and stick with them. I know too many choices can be bad, but I'm not sure what to do about it.

I think the solution from this video on the paradox of choice is to lower expectations but how do you actually do that, routinely?


Wanting to Succeed on Every Metric Presented

April 12, 2021 - 23:43
Published on April 12, 2021 8:43 PM GMT

There’s a tendency to want to score high on every metric you come across. When I first read Kegan’s 5 stages of adult development, I wanted to be a stage 5 meta-rationalist! Reading the meditation book “The Mind Illuminated” (TMI), I wanted to be stage 10 (and enlightened and stage 8 jhana and…)!  I remember seeing this dancer moonwalk sideways and wanting to be that good too! 

This tendency is harmful.

But isn’t it good to want to be good at things? Depends on the "things" and your personal goals. What I’m pointing out is a tendency to become emotionally invested in metrics and standards, without careful thought on what you actually value. If you don’t seriously investigate your own personal preferences and taste, you may spend years of your life invested in something you don’t actually care about. By adding this habit of reflection, you could become much happier than you are right now.

[Note: I believe most people are bad at figuring out what they actually value and prefer. For example, I thought skilled pianists are cool and high status, but when I actually became decent enough to wow your average Joe, being cool in those moments wasn’t as cool as I thought it would be. As they say, “Wanting is better than having”.]

There’s a difference between wanting to score 100’s/all A+’s and scoring well enough to get a job. There’s a difference between reading multiple textbooks cover-to-cover and reading the 40% or so that seems relevant to your tasks. There are tradeoffs; you can’t optimize for everything. When you come across a metric you really want to score highly on, nail down the tradeoffs in fine-grained detail. What about this do you actually care about? What’s the minimum you could score on this metric and still get what you want? What do you actually want? Speaking out loud or writing this out is good for getting an outside view and noticing confusion.

Noticing this pattern is half the battle. To make it concrete, here are examples from my life:

Running - I ran cross country and track for 3 years, but then I realized I don’t enjoy running long distance. Later I found out that sprinting is fun! If I was better at knowing my values, I could’ve just played ultimate frisbee with friends instead.

Dancing - I used to imagine dancing at weddings and such and looking really cool! I remember being really self-conscious and slightly miserable when I did dance in front of others. Trying to impress people is disappointing (and trying to be cool is so uncool). Now I value dancing because it’s fun and a good workout; I don’t worry about recording myself and consistently improving or dancing hypotheticals.

Kegan’s 5 stage development - I used to want to be stage 5, and I remember reading lots of David Chapman’s work to figure this out. I believe I benefited from this, but I ironically would’ve understood it better if I considered my values better. Now I value it as a useful framing for how large segments of people interpret the world. [See? I pointed out that it’s just another system with its own set of limits. I’m a cool kid now, right?]

Meditation - Becoming enlightened or TMI stage 10 sounded really cool! I’ve spent 100’s of hours meditating now, but I would’ve been much better off if I crystallized in my head the skills being optimized and how improving those skills improved my life. It wasn’t the “wanting to be enlightened prevented becoming enlightened” trope, but optimizing for a fuzzy “enlightened” metric was worse than more tractable metrics with clear feedback.

What I value now from meditation is being happier, accepting reality, being okay with metaphysical uncertainty (not freaking out when realizing I can’t directly control all my thoughts, or noticing my sense of self being constructed), and maintaining awareness of context, all of which are much clearer metrics that I actually care about.

Grades - I wanted all A’s and to work my hardest on every assignment, wasting a lot of time I could’ve spent elsewhere! Afterwards, I learned to do just enough to graduate and signal with my GPA that I’m a hard worker/smart. [Once, I missed my final exam where I needed a 60 to keep an A, dropping me to a C. I thought it was hilarious. Thanks Nate!]

People’s Opinions - I used to be emotionally affected by most everybody’s social rewards and punishments (i.e. attention and praise vs ignoring and criticism). I’ve felt awful and disappointed so many times because of this! I’ve come to realize that I actually only care about <10 people’s opinions, and they all care about me and know me well. [Note: this is separate from taking someone’s thoughts into consideration; I couldn’t think of a better word than “opinions”]

The post that prompted this was Specializing in problems we don’t understand. Great post! I noticed the compulsion to work on this problem immediately without considering my current context and goals, so I wrote this post instead.

Topics people in this community may benefit from re-evaluating are:

  • Existential AI risks and other EA areas. Not just whether or not you actually want to work in these fields, but also “do you actually enjoy pursuing it the way you are currently pursuing it?” 
  • Reading text books cover-to-cover and doing all the exercises 
  • Writing posts and comments in this forum in general

So… do you feel compelled to succeed according to the metric I’ve presented?


Using Flashcards for Deliberate Practice

April 12, 2021 - 22:07
Published on April 12, 2021 7:07 PM GMT

Interleaving Deliberate Practice

One criticism of using automated flashcard systems, like Anki, for conceptual topics, like math or physics, is that they don't offer you an easy way to practice solving problems. In theory, you might be able to use some flashcard platforms to automatically generate new problems to solve. However, this seems like a complicated, time-consuming, and ongoing challenge.

An alternative idea is to create flashcards that tell you what exercises to solve from a textbook. Such a flashcard might read:

"Solve 5 problems from Chapter 1.1 pg. 10-13 of Zill's First Course in Differential Equations textbook."

Assume that the spacing of flashcard reviews starts at about 4 days apart, and increases by about 2.5x each time. If so, you'd see that flashcard about 4 times in the first two months, 6 times in the first year, and 9 times over 15 years.
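Those review counts follow from the assumed 4-day initial interval and 2.5x multiplier, counting the initial pass as day 0; a quick sketch:

```python
# Review schedule implied by the assumptions above: first interval ~4 days,
# each subsequent interval ~2.5x longer, counting day 0 as the first pass.

def review_days(first_interval=4, multiplier=2.5, horizon_days=15 * 365):
    """Days on which the card is seen within the horizon."""
    days, day, interval = [0], 0, first_interval
    while day + interval <= horizon_days:
        day += interval
        days.append(day)
        interval *= multiplier
    return days

schedule = review_days()
print(sum(1 for d in schedule if d <= 60))   # 4 reviews in the first two months
print(sum(1 for d in schedule if d <= 365))  # 6 in the first year
print(len(schedule))                         # 9 over 15 years
```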

Imagine you wanted to spread your practice out over 15 years at increasing intervals, for a total of 9 reviews. Also, imagine there were 60 problems in Chapter 1.1. In that case, you'd want to assign yourself 60 ÷ 9 ≅ 7 problems every time that flashcard came up.

Since you'd most likely not want to do all the easy problems first and leave the challenge problems for 15 years later, you might want to refine this by requiring that you work on every 9th problem. Hence, on your first review, you'd do problems 1, 10, 19, 28, 37, 46, and 55 from Chapter 1.1. On the second review, you'd do problems 2, 11, 20, 29, 38, 47, and 56.
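The every-9th-problem rotation can be written down directly, using the 60-problem, 9-review example above:

```python
# Which problems to work on a given review: on review k, take problems
# k, k+9, k+18, ... so easy and hard problems mix across reviews.

def problems_for_review(review_number, total_problems=60, total_reviews=9):
    """1-indexed problem numbers assigned to the given review."""
    return list(range(review_number, total_problems + 1, total_reviews))

print(problems_for_review(1))  # [1, 10, 19, 28, 37, 46, 55]
print(problems_for_review(2))  # [2, 11, 20, 29, 38, 47, 56]
```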

As you go, you'd need to keep a record of which problems you'd solved. You could do this in several ways:

  1. Keep track in the flashcard itself, by editing it with the next set of "to-do" problems so that you'll know just what to do when it flashes up.
  2. Mark off the problems in a physical book, or use a highlight annotation in an e-book.
  3. Keep a physical document, such as a notebook, where you work the exercises for a specific subject so that you can look up what you did last time you worked on Chapter 1.1.
  4. Keep a separate document, like a text file, that notates just what you have to work on next.

Of all these options, I like #1 the best. This whole strategy presumes you'll keep track of your flashcards long-term, and this way you don't have to worry about losing any other document. It's built into your workflow.

You could also use this for non-textbook-based forms of learning. You might have lots of physical tasks you wanted to learn about. Look up a website that gives lists of applied projects, and factor the projects that look interesting into flashcards:

Microscope projects

Bioinformatics and programming

Computer programming projects

Linux system administrator projects

Graded piano pieces sorted by difficulty

Photography projects

Core strengthening exercises

Learning what you could learn

Problems With This Approach

There's often a premium on getting projects done quickly. It costs time to switch between tasks, and there's a reward for finishing. Certainly, this method is not appropriate for work-related tasks.

Flashcards may not be the best way to randomize projects. For example, you might want to interleave 50 different grade 1 piano pieces for your first year of learning the piano, but then abandon them when you move up to grade 2. To do that, you might want to have a special deck for "piano" where you insert one flashcard for every passage of a piece you've learned. Perhaps you have one flashcard for every 8-16 measures, for example.

If you practice one hour per day, you'd divide that hour by the number of flashcards you have to do, to allocate the amount of time per flashcard. Then you'd select "again/hard/good/easy" depending on how that passage was sounding at the end of those few minutes of practice.

More challenging is when a task doesn't lend itself to factoring into flashcards very easily. For example, solving a bioinformatics problem on Rosalind.info often requires a lot of programming and creative mathematical thinking. They also take long enough in many cases that you wouldn't want to solve the entire project in a single day. Yet you finish an individual project quickly enough that you wouldn't want to be returning to it 15 years later.

It might therefore be better to use a simple randomizer to pick a project category. Imagine you wanted to learn 50 diverse skills over several years. Each project used to practice those skills requires at least several hours, if not several days. Some of them "stack."

In that case, you might want to put all your possibilities on the rows of a spreadsheet. Then pick a row using a random number generator to select the next skill. On that row, you'd include a link to a list of projects for that skill. Then use the random number generator again to pick a project that's at your skill level.
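A minimal sketch of that spreadsheet-plus-RNG idea; the skill and project names here are invented placeholders, not recommendations:

```python
# Random project selection: pick a skill row at random, then a project
# from that skill's list. All names below are made-up examples.

import random

skills = {
    "microscopy": ["prepare a slide", "image pond water", "measure cell sizes"],
    "programming": ["write a CLI tool", "build a web scraper", "parse a CSV"],
    "piano": ["grade 1 piece", "grade 2 piece", "grade 3 piece"],
}

def pick_project(skills, rng=random):
    """Return a (skill, project) pair chosen uniformly per level."""
    skill = rng.choice(sorted(skills))
    project = rng.choice(skills[skill])
    return skill, project

skill, project = pick_project(skills)
print(skill, "->", project)
```

In a real setup the project list per skill would be a link to an external list, and you'd filter it to your current skill level before choosing.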


How long would you wait to get Moderna/Pfizer vs J&J?

April 12, 2021 - 21:29
Published on April 12, 2021 6:23 PM GMT

My assessment:

  • People who can easily continue to guard against significant COVID risks for several weeks without much downside other than quality of life should wait several weeks for Pfizer or Moderna.
  • (People who have to expose themselves to a non-trivial amount of COVID risk no matter what should take the J&J vaccine if they'd have to wait several more weeks for Pfizer / Moderna. I haven't run the numbers on this, but at some point I'd expect the additional risk exposure from being unvaccinated for several weeks to outweigh the difference in efficacy.)

My rationale:

1. Moderna and Pfizer provide significantly better protection against non-severe infections.

  • J&J provides:
    • 28 days after the injection: 66% protection against moderate to severe COVID infections (72% "in the United States", but I don't know to what extent that is robust to further spread of foreign variants in the US) and 85% against severe disease [1][2]
    • 48 days after the injection: 100% protection against severe COVID [3]
  • Moderna and Pfizer provide 94.1% and 95% protection, respectively, against symptomatic infection after the 2nd dose [1]

2. I'd expect that even non-severe infections increase your risk of long-term lingering effects (in addition to being fairly unpleasant in the meantime, but I'm less concerned about that). 

  • I don't have great evidence for #2 yet. While mild infections carry a non-trivial risk of long COVID [4], it seems that even initially asymptomatic cases account for about a third of long COVID cases [5]. I would hypothesize that the risk of long COVID is significantly lower for asymptomatic cases than for symptomatic ones, but I haven't researched that much yet.

Zvi said in his 2/4 COVID post, "I’d pay a substantial amount to get Pfizer or Moderna instead of J&J if I could get either one today, but given the choice between waiting and taking what’s available, I will happily accept the J&J vaccine now rather than hold out for Pfizer or Moderna."


What am I missing? Is this just a difference in the weight we place on resuming higher-risk activities sooner rather than later? Or am I overweighting the superior efficacy of the Pfizer and Moderna vaccines?

[1] https://www.statnews.com/2021/02/02/comparing-the-covid-19-vaccines-developed-by-pfizer-moderna-and-johnson-johnson/

[2] https://www.fda.gov/media/146265/download?fbclid=IwAR1eMK87XHF7ibOLoojaoH-ZFPNgTuOLqgkqur9D1SCtSGbrQj3A3VT5C5k 

[3] https://twitter.com/VirusesImmunity/status/1355149007220310019 

[4] https://www.webmd.com/lung/news/20210111/even-mild-cases-of-covid-can-leave-long-haul-illness-study-shows#1

[5] https://www.pharmacytimes.com/view/study-many-long-haul-covid-19-patients-were-asymptomatic-during-initial-infection 


D&D.Sci April 2021: Voyages of the Gray Swan

April 12, 2021 - 21:23
Published on April 12, 2021 6:23 PM GMT

You were prepared for gratitude, a commendation from the Admiral, your own department, parades in your name. You were also prepared to hear that your ‘list of helpful suggestions for ensuring supply ships survive random encounters’ was an impudent insult to the collective intellect of High Command, and receive a public execution for your trouble. What you weren’t prepared for was what happened: being allocated a modest stipend, assigned to a vessel, and told that if you’re so clever you should implement your plans personally.

You have 100gp to spend, and your options are as follows:

  • Shark repellent (40gp): Coating the underside of the ship in shark repellent would ensure that no journey would feature shark attacks; however, Vaarsuvius’ Law (“every trip between plot-relevant locations will have exactly one random encounter”) means something else would attack instead.
  • Arming the carpenters (20gp): You’ve given up trying to understand what it is about woodwork that makes its practitioners so good at fighting Crabmonsters, but your findings are undeniable: arming the ship’s carpenters would halve the damage done by Crabmonster attacks.
  • Tribute to the Merpeople (45gp): Offering tribute to the Merpeople would ensure they won’t attack the ship, similar to the effect of shark repellent.
  • Extra oars (1gp/oar, up to twenty): There’s enough space in the lower decks to add up to twenty more oars, so when fleeing is the best option, the entire crew can work together to escape. Each extra oar would decrease the damage done by Krakens and Demon Whales by 2%.
  • Extra cannons (10gp/cannon, up to three): You wouldn’t think these ships could fit more artillery, but clever ergonomics allow you to add up to three more cannons. Your studies suggest each cannon would reduce the damage suffered in Nessie and Pirate attacks by 10%.
  • Rifles for the Crow’s Nest (35gp): Arming the Crow’s Nest with state-of-the-art rifles would give lookouts a 70% chance of ensuring a given Harpy attack does no damage.
  • Foam swords (15gp): Giving the deck crew novelty foam swords to wield alongside their standard-issue cutlasses would improve their effectiveness when fighting Water Elementals, reducing the damage these creatures do by 60%.

You’re completely confident in the effectiveness of your ideas, but much less confident that you know which combination would make the best use of your limited budget. To investigate this angle, you’ve procured a record of random encounters encountered by the ships travelling your assigned route; unfortunately, it’s missing some important information for the ships that sank, due to everyone who could fill in those details being dead.

As you board the Gray Swan (why do they give these ships such charmingly unique names when they’re all built and operated identically?), it occurs to you that this might have been intended as an execution after all. The dataset suggests that without any of your clever plans, the survival rate for a journey along your route is a little below 90%, and the Gray Swan is scheduled to make ten trips – five northbound voyages, five southbound – in quick succession. Hopefully this indicates nothing more than your superiors wanting to test your interventions very very thoroughly.

Your top priority is to save your skin. Secondary priorities are minimizing total damage taken and spending as little gold as possible, to impress High Command and return to their good graces.

What will you do?


  • As a passenger, you’ll be kept away from any fights, but the Gray Swan has no lifeboats; keeping the ship from sinking is necessary and sufficient to ensure your survival.
  • Ships are fully repaired every time they make port.
  • Interventions stack such that two 10% reductions are equivalent to one 20% reduction.
  • Each journey takes a month; it is currently Month 5, Year 1406.

I’ll be posting an interactive letting you test your decision, along with an explanation of how I generated the dataset, sometime next Monday. I’m giving you a week, but the task shouldn’t take more than a few hours; use Excel, R, Python, a priori knowledge, or whatever other tools you think are appropriate. Let me know in the comments if you have any questions about the scenario.
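If you attack this in Python, a first step might be enumerating the combinations that fit the budget. The intervention names below are shorthand for the options above; scoring combinations against the encounter record is the actual puzzle and is left out:

```python
from itertools import product

# (cost in gp, max count) per intervention, from the scenario above.
items = {
    "shark_repellent": (40, 1),
    "arm_carpenters": (20, 1),
    "merpeople_tribute": (45, 1),
    "oars": (1, 20),
    "cannons": (10, 3),
    "rifles": (35, 1),
    "foam_swords": (15, 1),
}
budget = 100

def affordable_combos(items, budget):
    """Yield every (counts, cost) combination within budget."""
    names = list(items)
    ranges = [range(items[n][1] + 1) for n in names]
    for counts in product(*ranges):
        cost = sum(c * items[n][0] for c, n in zip(counts, names))
        if cost <= budget:
            yield dict(zip(names, counts)), cost

n_combos = sum(1 for _ in affordable_combos(items, budget))
print(n_combos, "affordable combinations")
```

The search space is small enough to evaluate every affordable combination exhaustively against whatever damage model you fit to the data.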

If you want to investigate collaboratively and/or call your decisions in advance, feel free to do so in the comments; however, please use spoiler tags or rot13 when sharing inferences/strategies/decisions, so people intending to fly solo can look for clarifications without being spoiled.


Post-COVID Integration Rituals

April 12, 2021 - 19:54
Published on April 12, 2021 4:54 PM GMT

I'm writing this because I am realizing we're at the cusp of the end of a very long "retreat." 

At MAPLE (Monastic Academy), we go on retreat once a month, and so we exit retreat once a month. 

This can be quite a shock to the system, so we deliberately set aside a day for integration purposes, where we slowly transition out of retreat into our normal work period. 

For those of you who are champing at the bit for in-person social interaction, I would advise being intentional about how you go about this. 

It might be good if people made deliberate efforts to emphasize integration, as part of their transition process. It might be worthwhile to think through how you want your "integration rituals" to go. 

Considerations for Rituals 

I advocate for rituals—they seem to be an unintended victim of our modern age. We have forgotten about them. They hold no space in our minds as a thing to value. In the past, we wouldn't really have to think about our rituals; they would just happen as part of traditions passed down to us. Since we are in somewhat of a "tradition-scarce" age, it takes activation energy to prioritize things like intentional rituals. 

However, this pandemic has meaning to people on every level. To the individual, to the family, to the group house, to the community, to the workplace, to the globe. 

And so it is obviously a meaningful event for it to end. For the enforced retreat to come to a close, however gradually or unevenly the vaccination phase may take place across groups, states, and nations. 

What kinds of meanings would you want to symbolically acknowledge? What would help your body, heart, and mind through this transition? What things need to be set aside now? What things need to be seeded and fostered now? What things need to be remembered? What things need to be envisioned for the future? 

Who should be involved? Who do we acknowledge? Perhaps our own selves, having gone through this process. Perhaps the people we've lost. Perhaps the new life that has emerged. Perhaps the people we've been together with through this period. Perhaps the people we now are able to connect with in person. 

Who is coming back together for this integration? Who is the new "we"? Who are we now? 

The Integration Period

The other consideration here is how to integrate in such a way that you don't: 

  • Burn out on too much social interaction all at once
  • Get too disoriented 
  • Destabilize or harm others
  • Rush unskillfully or heedlessly forward
  • Stay stuck in a dead past with old assumptions

I'd consider doing an integration event with other people—the people you love or the people you plan on communing with. 

Integration events can include:

  • Shared meals with intentional topics of conversation
    • This can include norms for taking turns or only one person speaking at a time
  • Circling or other group practices
  • Fire rituals / letting go rituals
  • Singing or chanting together
  • Sharing intentions about the future
  • Creating accountability or sharing commitments; each person shares how they want to show up in the next day, week, or month* 
  • Sharing stories or expressions* 
  • Each person takes X minutes asking for something from the whole group (e.g. touch, listening, fun group activity, etc.)* 
  • Each person shares a song with the group (using Spotify or something), and everyone listens
  • Whatever other ways you come up with to create common knowledge about what just happened, where we are, and where we're going

*I have specific examples on how to do these in particular. Feel free to ask me for details.

Consider how you can facilitate workplace integrations to occur. This may require some leadership on your part. How can you lead the way forward in a way that promotes emotional health in the workplace? How do you lead people through a group process that allows people to settle in, to orient, and not "pretend" like everything is the same as it was? Because it probably is not the same as it was! But also not everything is different, either. What is the same? What is different? Honor both. 

People will need time to orient, adjust, and stabilize. 

Even if some people are quick and others are slow, everyone should acknowledge this is a group process. This isn't something anyone is going through alone. I'd emphasize interconnection and interdependence, while honoring the individuality and uniqueness of each person's experience. 

We should acknowledge that some people may have quite a difficult time with this transition—they may need extra care and attention. 

Please consider how to best take care of yourselves and others. This may be a confusing and destabilizing phase. Or it may lead to a manic release of energy that results in regrettable actions and breaking things unnecessarily. Can we leave room for certain mistakes while minimizing harm? 

If you have leadership capacity, or space for holding others, now seems like a good time to be available. 

Personally, I am more worried about "too fast" than "too slow." People moving too quickly to the next phase without reflection. People pretending that things are "back to normal." People acting like everything is OK or that nothing significant happened. People wanting to escape into work, activity, social energy, mania. 

"Too slow" or "getting stuck" may also be quite a problem however. Like people who've lived in a cave emerging into bright light—if they get stunned or pained, they might curl back into their caves or get frozen in immobility. Residual fears may create a more paranoid atmosphere, a mistrusting atmosphere. It may take time to rebuild social cohesion, comfort, trust. 

Whatever ways people can stay connected to themselves and each other seem good. Whatever ways you can cultivate a sense of calm ease, appreciation, and purpose seem good. 


You may want to deliberately take time to appreciate this past period of time. No matter how hard it was or how miserable it was. It's generally a healthy approach to find ways to appreciate your lives, your efforts, and that of others. I recommend taking time (individually or as a group) to do gratitude rituals. 

This post doesn't contain that many specific prescriptions on how to go about this. It's more a way to open up the conversation in case you haven't already done that. 

I think many, if not all, of us will need to step into our own forms of leadership in this whole process, and this may be a good opportunity for practicing leadership qualities and skills. I believe it is needed now, so I invite you to take the opportunity. 


A New Center? [Politics] [Wishful Thinking]

April 12, 2021 - 18:19
Published on April 12, 2021 3:19 PM GMT

Political polarization in the USA has been increasing for decades, and has become quite severe. This may have a variety of causes, but it seems highly probable that the internet has played a large role, by facilitating the toxoplasma of rage to an unprecedented degree.

Recently I have the (wishful) feeling that the parties have moved so far apart that there is "room in the center". The left is for people who are fed up with the extremes of the right. The right is for people who are fed up with the extremes of the left. But where do people go if they've become fed up with both extremes?

The question is: how would the new center work? There's not room for a new political party; plurality voting makes that too difficult, because if the new party doesn't gain more than 1/3rd of the vote, it's basically a wasted vote.

Here is my proposal for what it could look like:

  • Rather than operating as a traditional political party, New Center would attempt to be a formalized group of swing voters: it makes recommendations about which candidates from other parties to vote for. Given how some elections are consistently very close (most notably, the US presidential election), New Center might be able to achieve a kingmaker status even with only a relatively small portion of voters.
  • In order to accomplish this, New Center has to make recommendations which credibly represent centrist values (and only centrist values).
  • The New Center needs a strong set of criteria by which it judges politicians. These criteria must be based on a critique of the extreme left and the extreme right, to capture people's frustrations with both sides.
  • Registering with the movement might involve pledging your vote to their recommended candidates. In return, registering might give you a voice in the selection process.
    • For example, New Center candidates might be selected by New Center members rating other parties' candidates on each New Center criterion. Of course, this process is easily manipulated by rating your favorite candidates highly on every criterion; however, arranging the ballot this way nudges people to judge honestly. Also, strategic voting here isn't that bad, provided that those who join are actually fairly centrist in the first place.
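A minimal sketch of such a selection mechanism, with hypothetical members, candidates, and criteria. Using the median rating per criterion is a design choice not specified above; it blunts any single member inflating a favorite:

```python
from statistics import median

# Hypothetical ballots: member -> candidate -> {criterion: rating 0-10}.
ballots = {
    "member_a": {"cand_x": {"bipartisanship": 7, "civility": 8},
                 "cand_y": {"bipartisanship": 3, "civility": 4}},
    "member_b": {"cand_x": {"bipartisanship": 6, "civility": 9},
                 "cand_y": {"bipartisanship": 10, "civility": 10}},
    "member_c": {"cand_x": {"bipartisanship": 8, "civility": 7},
                 "cand_y": {"bipartisanship": 2, "civility": 3}},
}

def recommend(ballots):
    """Score each candidate by the median member rating on each criterion,
    summed over criteria; return the top candidate and all scores."""
    candidates = next(iter(ballots.values())).keys()
    scores = {}
    for cand in candidates:
        criteria = next(iter(ballots.values()))[cand].keys()
        scores[cand] = sum(
            median(b[cand][crit] for b in ballots.values()) for crit in criteria
        )
    return max(scores, key=scores.get), scores

winner, scores = recommend(ballots)
print(winner, scores)
```

Here member_b's inflated ratings for cand_y don't move the medians, so the broadly-rated cand_x wins.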

It might also be good for the initial set of criteria, or at least the rhetoric, to appeal to moderate libertarians as well, since that's a pre-existing group which considers its issues to be orthogonal to the usual political spectrum. I would personally think the core values of the new center should resemble Scott Alexander's take on classical liberalism:

So let’s derive why violence is not in fact The One True Best Way To Solve All Our Problems. You can get most of this from Hobbes, but this blog post will be shorter.

Suppose I am a radical Catholic who believes all Protestants deserve to die, and therefore go around killing Protestants. So far, so good.

Unfortunately, there might be some radical Protestants around who believe all Catholics deserve to die. If there weren’t before, there probably are now. So they go around killing Catholics, we’re both unhappy and/or dead, our economy tanks, hundreds of innocent people end up as collateral damage, and our country goes down the toilet.

So we make an agreement: I won’t kill any more Catholics, you don’t kill any more Protestants. The specific Irish example was called the Good Friday Agreement and the general case is called “civilization”.

So then I try to destroy the hated Protestants using the government. I go around trying to pass laws banning Protestant worship and preventing people from condemning Catholicism.

Unfortunately, maybe the next government in power is a Protestant government, and they pass laws banning Catholic worship and preventing people from condemning Protestantism. No one can securely practice their own religion, no one can learn about other religions, people are constantly plotting civil war, academic freedom is severely curtailed, and once again the country goes down the toilet.

So again we make an agreement. I won’t use the apparatus of government against Protestantism, you don’t use the apparatus of government against Catholicism. The specific American example is the First Amendment and the general case is called “liberalism”, or to be dramatic about it, “civilization 2.0”

Every case in which both sides agree to lay down their weapons and be nice to each other has corresponded to spectacular gains by both sides and a new era of human flourishing.

The classical-liberal rhetoric of the new center might be very similar to counterweight, except that counterweight only combats the extremes of one side (as expressed vividly by their name), rather than extremes on both sides.


On Falsifying the Simulation Hypothesis (or Embracing its Predictions)

April 12, 2021 - 13:50
Published on April 12, 2021 12:12 AM GMT

Disclaimer: This is my first post on this website, I tried to follow the proper etiquette, but please let me know if something is off.  :)

Briefly about me: former academic (PhD in theoretical physics, quantum black holes, string theory, information paradox) turned entrepreneur (currently building a company in the AI/Robotics space).


A widespread belief surrounding the Simulation Hypothesis (SH) is that being or not being in a simulation doesn't really have any implication for our lives. Or equivalently, SH is often criticised as unscientific and unfalsifiable, since no definite universal testable predictions have (so far) been made. By universal prediction I mean a prediction that all (or at least a very large part) of the simulations must make. 

In this post I would like to challenge this view by noticing that, in the space of all simulations, some families of simulations are more likely than others. Knowing at least the rough behaviour of the probability distribution over the space of simulations then allows us to extract probabilistic predictions about our reality, therefore bringing SH into the realm of falsifiable theories. Of course, there will be some assumptions to stomach along the way. 

The whole line of reasoning of this post can be summarised in a few points: 

1- We are equally likely to be in one of the many simulations.

2- The vast majority of simulations are simple.

3- Therefore, we are very likely to be in a simple simulation.

4- Therefore, we should not expect to observe X, Y, Z, ...


I will now expand on those points.


1- We are equally likely to be in one of the many simulations.

First of all, let's assume that we are in a simulation. Since we have no information that could favour a given simulation, we should treat our presence in any given simulation as equally likely among all the simulations. This "bland indifference principle" tells us that what matters is the multiplicity of a given reference class of simulations, that is, what percentage of all possible simulations belongs to that reference class. The definition of a reference class of a civilisation simulation is tricky and subjective, but for our purposes it is enough to fix a definition; the rest of the post will apply to that definition. For instance, we may say that a simulation in which WWII never started is part of our reference class, since we can conceive of being reasonably "close" to such an alternative reality. But a simulation in which humans have evolved tails may be considered out of our reference class. Again, the choice is pretty much arbitrary, though I haven't fully explored what happens for "crazy" choices of the reference class.


2- The vast majority of simulations are simple.

This is pretty much the core assumption of the whole post. In particular, we arrive at it if we assume that, in the space of all simulations ever run, the likelihood that a given simulation is run is inversely correlated with its computational complexity. We can call this the Simplicity Assumption (SA). The SA mainly follows from the instantaneous finiteness of the resources available to the simulators (all the entities that will ever run civilization simulations: governments, AIs, lonely developers, etc.). By instantaneous I mean that the simulators may have infinite resources in the long run, for instance due to an infinite universe, but that they should not be able to harness infinite energy at any given time. 

We observe this behaviour in many systems: a large number of small instances, a medium number of medium-sized instances, and a small number of large ones. For instance, the lifetime of UNIX processes has been found to scale roughly as 1/T, where T is the CPU age of the process. Similarly, many human-made artifacts have been found to follow Zipf's-law-like distributions. 
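As a toy illustration, suppose the SA takes the same 1/T-like form seen in UNIX process lifetimes, i.e. a simulation class of computational complexity c gets weight 1/c (an assumed functional form, not a derived law). Most of the probability mass then lands on the simplest classes:

```python
# Toy Simplicity Assumption: weight complexity class c by 1/c, normalize.
# The 1/c form is an assumption mirroring the 1/T observation above.
complexities = [2 ** k for k in range(20)]   # classes of complexity 1, 2, 4, ..., 2^19
weights = [1.0 / c for c in complexities]
total = sum(weights)
probs = [w / total for w in weights]

mass_simplest_three = sum(probs[:3])
print(round(mass_simplest_three, 3))  # the 3 simplest classes hold ~87.5% of the mass
```

Under this weighting, an observer applying the indifference principle of point 1 should bet heavily on being in one of the simplest simulations compatible with their observations.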

In the case of civilization simulations, there are multiple observations that point to the SA being valid:

-While the first ancestor simulation may be a monumental government-size project, at some point the simulators will be so advanced that even a single developer will be able to run a huge number of simulations. At that point, any simulator will be able to decide between running a single bleeding-edge simulation or, for instance, 10^6 simpler ones.
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-math-I; src: local('MathJax_Math Italic'), local('MathJax_Math-Italic')} @font-face {font-family: MJXc-TeX-math-Ix; src: local('MathJax_Math'); font-style: italic} @font-face {font-family: MJXc-TeX-math-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Math-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Math-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Math-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size1-R; src: local('MathJax_Size1'), local('MathJax_Size1-Regular')} @font-face {font-family: MJXc-TeX-size1-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size1-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size1-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size1-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size2-R; src: local('MathJax_Size2'), local('MathJax_Size2-Regular')} @font-face {font-family: MJXc-TeX-size2-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size2-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size2-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size2-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size3-R; src: local('MathJax_Size3'), local('MathJax_Size3-Regular')} @font-face {font-family: MJXc-TeX-size3-Rw; src /*1*/: 
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size3-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size3-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size3-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size4-R; src: local('MathJax_Size4'), local('MathJax_Size4-Regular')} @font-face {font-family: MJXc-TeX-size4-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size4-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size4-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size4-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-vec-R; src: local('MathJax_Vector'), local('MathJax_Vector-Regular')} @font-face {font-family: MJXc-TeX-vec-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-vec-B; src: local('MathJax_Vector Bold'), local('MathJax_Vector-Bold')} @font-face {font-family: MJXc-TeX-vec-Bx; src: local('MathJax_Vector'); font-weight: bold} @font-face {font-family: MJXc-TeX-vec-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Bold.woff') format('woff'), 
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Bold.otf') format('opentype')}  simple simulations. While it is reasonable to imagine the majority of simulators not being interested in running simple simulations, it’s hard to imagine that ALL of them would not be interested (this is similar to the flawed solutions to the Fermi's paradox claiming that ALL aliens are not doing action X). It is enough for a small number of simulators to make the second decision to quickly outnumber the number of times complex simulations have been run. The advantage for simple simulations will only become more dramatic as the simulators get more computational power. 

-If simulations are used for scientific research, the simulators will be interested in settling on the simplest possible simulation that is still complex enough to feature all the elements of interest, and then running that simulation over and over.

-Simple simulations are the only simulations that can be run inside nested simulations or on low-powered devices.

A partial illustration (partial because there is no intelligent observer inside!) is given by Atari games. Take Asteroids. No doubt more complex and realistic space-shooting games exist nowadays. But the fact that Asteroids is so simple has allowed it to be embedded as a playable game inside other games (a nested game!) and used as a reinforcement learning benchmark. So if we simply count the number of times an Asteroids-like space-shooting game (this is our reference class) has been played, the original Asteroids is well positioned to be the most played space-shooting game ever.

The exact scaling of the SA is unclear. One day we may be able to measure it, if we become advanced enough to run many ancestor simulations. In what follows, let's suppose the scaling is at least Zipf's-law-like: if simulation A takes n times more computation than simulation B, then A is n times less likely than B in the space of all simulations.
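The Zipf's-law-like weighting can be sketched as a toy model (this is my own minimal formalization, not the formal model from the longer writeup): each simulation gets a weight inversely proportional to its computational cost, and the weights are normalized into probabilities.

```python
# Toy sketch of the Zipf's-law-like weighting over simulations:
# weight proportional to 1/cost, normalized to a probability distribution.

def zipf_weights(costs):
    """Normalized probabilities proportional to 1/cost."""
    raw = [1.0 / c for c in costs]
    total = sum(raw)
    return [w / total for w in raw]

# Simulation B costs 1 unit; simulation A costs n = 10 times more:
p_a, p_b = zipf_weights([10.0, 1.0])
print(p_b / p_a)  # A is 10 times less likely than B
```

Any other strictly decreasing weighting would give qualitatively similar conclusions; the 1/cost form is just the linear-scaling case assumed in the text.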


3- Therefore, we are very likely to be in a simple simulation.

This follows from 1+2.


4- Therefore, we should not expect to observe X, Y, Z, ...

We don't know how the simulation is implemented, but we only need a lower bound on how complexity scales in a simulation; we can then factor out our ignorance of the implementation details by computing how likely one simulation is relative to another. Let's assume an incredible level of optimisation, namely that the simulators can simulate the whole universe, including the interactions of all its entities, with O(N) complexity, where N is the number of fundamental entities (quantum fields, strings, etc.; it doesn't matter what the real fundamental entity is). Nor do we really care about the approximation level used, how granular the simulation is, whether time is dilated, or whether large parts of the universe are just an illusion, since the SA tells us that the most likely simulations are those with the highest level of approximation. Taking the highest approximation level compatible with the experience of our reference class, the lower bound on the computational cost is proportional to the time the simulation is run multiplied by the number of fundamental entities simulated. Since our universe is roughly homogeneous at large scales, N is also proportional to how large the simulated space is.

Now consider a civilization simulation A that simulates our solar system in detail while mocking the rest of the universe, and a simulation B that simulates the whole Milky Way in detail while mocking the rest. Simulating the Milky Way in detail is about 10^12 times harder, if we count the number of stars and black holes. According to the SA with linear scaling, being in simulation B is therefore about 10^12 times less likely than being in A. Some interesting predictions follow: we are very unlikely to achieve significant interstellar travel or to build von Neumann probes, and we are not going to meet extraterrestrial civilizations unless they are very close, which in turn explains the Fermi paradox.
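The back-of-envelope arithmetic behind the 10^12 figure can be written out explicitly (the body counts are rough orders of magnitude; treating the solar system as one detailed star is my simplifying assumption):

```python
# Rough order-of-magnitude arithmetic behind the 10^12 figure above.
detailed_bodies_A = 1.0    # simulation A: one star (the Sun) in detail
detailed_bodies_B = 1e12   # simulation B: Milky Way stars + black holes

cost_ratio = detailed_bodies_B / detailed_bodies_A
# Under linear (Zipf-like) scaling, the likelihood ratio equals the
# cost ratio: being in B is ~1e12 times less likely than being in A.
print(cost_ratio)
```

The point is that under linear scaling no further modelling is needed: the likelihood penalty is just the ratio of detailed entity counts.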

Similarly, given two simulations of the same patch of simulated space, long-lived simulations are less likely than short-lived ones. In particular, universes with infinite lifetimes have measure zero.

More generally, this argument applies to any other feature that would provide a large enough "optional" jump in the complexity of our universe. Notice that the argument is significantly weakened if super-efficient ways of simulating a universe can exist (log(N) or better, depending on how sharp the SA distribution is).
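To see how much a super-efficient simulator would weaken the argument, here is a toy comparison of the likelihood penalty under linear versus logarithmic cost scaling (the baseline entity count N_small = 1e50 is an arbitrary placeholder of mine, not a figure from the post):

```python
import math

# Toy comparison: likelihood penalty for a simulation 10^12 times larger,
# under linear vs logarithmic cost scaling.
N_small = 1e50            # arbitrary placeholder entity count
N_large = N_small * 1e12  # the 10^12-times-larger simulation

penalty_linear = N_large / N_small                   # cost ~ N     -> 1e12
penalty_log = math.log(N_large) / math.log(N_small)  # cost ~ log N -> ~1.24

print(penalty_linear, penalty_log)
```

Under log(N) scaling the penalty collapses from twelve orders of magnitude to a factor close to 1, which is why the argument's predictions hinge on the simulation cost not being super-efficiently compressible.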

In turn, if humanity were to achieve these feats, it would be a pretty strong indication that we do not live in a simulation after all. Of course the SH can never be completely falsified, but the same is true of any physical theory with a tunable parameter. What we can do is make the SH arbitrarily unlikely, for instance by colonizing larger and larger regions of space. One might even argue that the achievements we have already made, such as the exploration of the solar system, are already a strong argument against the SH; but this depends on the exact shape of the SA.

In this post I have tried to keep details and subtleties to a minimum. For those interested in digging deeper, I have written a longer writeup, available here: https://osf.io/ca8se/

Please let me know your comments; critiques of the assumptions of this post are very welcome.