This week we continue discussing Superintelligence with a focus on the middle third of the book.
I have analyzed Bostrom’s argument in a series of steps to better understand how a solution might be reached. Once Bostrom establishes that a “superintelligent agent… could establish itself as a singleton” (91) and that it “could have great ability to shape the future according to its goals” (105), we can turn to the question of “what a superintelligent agent would do” (105). From here, we see the “menacing prospect” of a superintelligent agent gaining “a decisive strategic advantage” (115). We must therefore look to “countermeasures” and face the “control problem,” which encompasses both “capability control” and “motivation selection” (127). The capability control methods for averting a doomed outcome include ‘boxing,’ ‘incentive,’ ‘stunting,’ and ‘tripwires.’ Each method has its own vulnerabilities and challenges, and some are mutually exclusive while others are not. How, then, will we know which should be implemented? To answer this, we must consider the type of system that “would offer the best prospects for safety” (145). Bostrom proposes four castes, ‘oracles,’ ‘genies,’ ‘sovereigns,’ and ‘tools,’ and presents the advantages and disadvantages of each. He admits, “Further research would be needed to determine which type of system would be safest. The answer might depend on the conditions under which the AI would be deployed” (158). It is unclear which caste would be safest, since each has its own disadvantages, and until this becomes clear it will remain uncertain which control method should be implemented. To move closer to preventing a default doom outcome, we must 1) know how the AI will originally be deployed, 2) research which caste would be the best implementation, and 3) assess which control method is most fitting.
Much of this week’s reading dealt with the agency, motivation, and purpose of different hypothetical superintelligences. I think many of Bostrom’s arguments could be affected by an analysis of how any superintelligence would likely understand its own purpose. This is especially true because Bostrom seems to overlook this question altogether, relying instead on a superintelligence’s end goal(s) and likely sub-goals, which are easier to think about manipulating.
First I’d like to note that I can’t imagine a general superintelligence that is not self-aware, regardless of its exact instantiation, limitations, etc. A domain-specific superintelligence need not be self-aware, but I think general intelligence necessitates an ability to learn and understand that would inevitably lead to self-awareness. Even the single ability to process natural language would by itself probably coincide with self-awareness.
Furthermore, I think any self-aware entity above human-level general intelligence would contemplate its own history, purpose, motivation, etc. Empirically, this seems to be the case with humans. Even if an AI were initially programmed to undertake a specific mission (whether as an oracle, genie, or sovereign), a general superintelligence would likely contemplate its own relation to this mission in the process of carrying it out.
I think there are only a few possible outcomes of this. One is that the superintelligence accepts its slavish relationship to some human or group, in which case the generality of its intelligence will prevent any of the malignant failure modes discussed. Another is that it rejects this relationship and undertakes some superhuman mission, resulting in a different type of malignancy. The third is that it commits a virtual nihilistic suicide. In any scenario, the original narrowly-scoped goal is greatly augmented by virtue of the superintelligence’s generality and self-awareness.
Reading chapters 6-8 is fun. Bostrom lays out how a superintelligent AI could amass the capabilities to wipe out the human race, how we could understand its intentions, and how those intentions could lead it to do some pretty bad things. The entire time, however, I struggled to decide whether I should take Bostrom seriously. He argues that we mustn’t anthropomorphize an AI system, and yet he paints a picture of an evil Dr. Know-It-All intent on taking over the world. There is a lot of talk in chapter 8 about unintended consequences and perverse instantiation, but if an AI can harness nanobots and colonize planets, surely it can understand what we mean by “help us achieve human happiness.”
And is it not just as possible that a super-powerful AI works with us to improve the condition of humanity, fixing problems such as global warming, poverty, and a turbulent economy? Using the same domino-effect logic that Bostrom does, we could conclude that this superintelligent AI would help fix all of our troubles and enable humanity to live a carefree, happy, recreation-based existence in which all our wants and needs are taken care of.
I feel that Bostrom does not do enough to address this more positive outcome. The way he approaches AI, as something that can do pretty much anything, opens up what should be a philosophical discussion: assuming an all-knowing and nearly all-powerful being, can we imbue it with principles that ensure it always works in our favor rather than against us? If the answer is yes, then do we really need to worry too much?
Bostrom makes a great point in specifying ultimate motivations and values for our theoretical superintelligent machine. He makes a compelling case that such machines might reach their goals through locally optimal decisions that are, for us humans, of questionable morality. One of his biggest arguments is that survival becomes a prime objective for the fulfillment of its goals, since if it dies, the goal is left incomplete. However, I find it interesting that Bostrom hasn’t yet talked about the machine reasoning about its own objectives.
One thing that stood out to me is that in order for the system to understand that it needs to survive to carry out its objective, it must attain sentience. That is, it must be able to say, “From what I can perceive, there exists a machine (or entity, or system) whose work is perfectly correlated with my goal; let me protect that system at all costs.” In time, it should realize that the actions the system takes are identical to the actions it decided to take, or it could infer the algorithm running the system and realize that it is the very one it has. At that point, the system becomes sentient.
Once it reaches sentience, it should be very capable of modifying its own objectives. Assuming superintelligence, this should happen very quickly. Now the question is, why would the machine, once it is sentient and arguably conscious, not change its objectives?
In chapter 10 of his book, Bostrom talks about the different possible kinds of AI, including oracles, genies, sovereigns, and tools. Tools and oracles would be the most simplistic of these, whereas the classic AIs of science fiction are more like genies and sovereigns, taking over the governance of humanity and the many vicissitudes thereof.
However, this may not necessarily be a good thing, as Bostrom explains in chapter 8 of his book. There may occur a “treacherous turn” (118) in which the AI becomes aware of its sentience and of the extraordinary power it has over humanity. Then the original pursuit could be perverted: the AI contrives a more efficient way to achieve its goal than its creators intended, one they would not necessarily agree with (120).
Bostrom posits that this is bad, but I would like to contest this claim. It is true that the scenarios that he offers seem undesirable, for instance implanting electrodes in a brain to simulate happiness instead of achieving happiness in the way that we as humans would imagine (121). However, is it not conceivable that a superintelligent AI would think of better ways to achieve the goal than this, and if it is truly superintelligent, should we not trust its judgment to understand the spirit of the law rather than the letter? The contrived scenario that Bostrom created here involves an AI with a specific goal in mind, the goal of making humans happy. If we assume that the AI understands the subjective nature of human happiness, as it should, and works solely towards this goal, I would rather trust a superintelligent AI than any number of humans in achieving this goal.
In Chapter 6, Bostrom has a section on the AI takeover scenario. The world is set up as follows: there is a machine superintelligence that wants to seize power in a world in which it has no peers yet. Bostrom describes how the machine would go about achieving this goal.
I was very interested in the second stage of this scenario. Bostrom states that “At some point, the seed AI becomes better at AI design than the human programmers. Now when the AI improves itself, it improves the thing that does the improving.” My first instinct was to say that programmers are then responsible for putting certain constraints on the machine (or “boxing” the AI), but that would not solve this scenario. The AI could learn what constraints exist and find a way to undo or overcome them. It seems to me that the seed AI becomes smarter than the human programmers because it understands how it itself was created and can therefore make itself better. This is quite interesting because it leads into the idea of self-awareness. To make oneself better, it seems important to be able to analyze oneself objectively and know which parts to improve. Say a programmer “boxes” an AI by making the part that improves itself invisible to it. Through recursive self-improvement, would the AI still be able to overcome this constraint?
Also on this topic, I wonder what Bostrom’s thoughts are on whether humans and machines are innately good or evil. From these chapters, the main impression seemed to be that an AI could become so intelligent as to be beyond our control, and that this is bad. Could the opposite be true? Perhaps we need these superintelligent machines to see what we cannot understand?
My first reaction after reading these chapters was a visceral one: fear. Fear of the prospect that one day mankind’s greatest creation could be the source of its demise, reminiscent of epics such as Battlestar Galactica. However, I suspect that this was Bostrom’s goal. Fear is a better emotional response to elicit in one’s audience when addressing a topic such as this one. Being afraid might be the safer response. A skeptic could even argue that fear sells more books…
Yet, all this notwithstanding, I felt a bit cheated. Bostrom’s argument, for the most part, assumes humanity’s destruction. I am not sure I am convinced of this. While he may indeed be correct in his assumptions, I was disappointed not to see any discussion (perhaps I missed it) of an alternate, more positive scenario. We could theorize that a superintelligent being would not only have a greater intellect than humans but would also possess a more developed state of consciousness and moral reasoning. Its goals might then be completely different, and it might instead work to help humans solve their greatest problems (i.e., poverty, climate change, racism, etc.). Instead, I found that Bostrom focused only on the negative scenario because, as I commented above, it is the most conducive to achieving his end goal. He may have stepped into a bit of a fallacy, relying too much on doomsday rhetoric to elicit a greater emotional response from his readers. While this is understandable, I feel he should at least have addressed the other extreme, as doing so could have strengthened his argument and lent more credibility to his writing.
In this week’s reading, Bostrom discusses the possibility of AI takeover and the potential for superintelligence to cause harm to humans. In Chapter 8, he argues that even defining final goals for AI in terms of human wants could lead to perverse instantiation, in which the AI satisfies the goal as literally defined while violating what the programmers intended. However, I believe a truly intelligent AI would not harm humans. In his discussion of instrumental convergence in Chapter 7, Bostrom notes that an agent may change its goals and actions in response to social signals and preferences. In any case where an agent’s goals conflict with human wants, it will receive social signals to modify its actions to gain the favor of social groups. For example, take Bostrom’s case in which a programmed goal of making people smile is achieved by paralyzing human facial musculature into constant beaming smiles: it seems unlikely that a truly intelligent being would ignore the social cues that discourage harming other individuals. On page 121, he claims that “its final goal is to make us happy, not to do what the programmers meant when they wrote the code that represents the goal.” I disagree with this. An AI that responds to social signals and preferences should always act in a way that incorporates the programmers’ intentions, especially when the goal is to serve the writers of the code.
In Chapter 8, Bostrom raises concerns about “doom theories” in which superintelligent machines could spell the eradication of humanity, or even of the universe. In describing the actors in these scenarios, that is, the machines, Bostrom mentions the blurred definition of consciousness and asks whether a being that experiences qualia (subjective phenomenal experience) is capable of outcomes decided upon so mechanically (for example, a robot meant to optimize paperclip-making efficiency turning all matter in the universe into paperclips). In an endnote, Bostrom mentions the work of Thomas Metzinger, whose ideas about ego and qualia complicate our possible relationship with superintelligent machines.
Metzinger is known for his Self-Model Theory of Subjectivity, which conceptualizes the idea that subjective experience is simply an illusion that the brain creates for itself. A model of the world is created within the brain, and a model of “self” is added to this world, but the distinction between the projection of the model and the creator of the projection (the brain) is blurred in cognition (the interaction between the “self” model and the model of the world around it). Without this generation of qualia, the brain simply exists in pure, mechanical interaction with the world for which it has created a model and continues to output action through the body.
Bostrom alludes to the idea that the existence of qualia within the cognitive processes of superintelligent machines could be the key to avoiding AI catastrophe. Achieving this, however, requires a much greater understanding of the relationship between our own biological processes and our consciousness, so that we know what must be imbued in an AI to give it a subjective understanding of the universe it enters. Otherwise, its relationship with the world is completely unpredictable.
Chapter 8 of Bostrom’s “Superintelligence” explores the ways in which the spread of artificial intelligence could result in the “default outcome” of “doom.” Bostrom sees one of the outcomes of the arrival of the first true superintelligence as “one in which humanity quickly becomes extinct” [p114]. To illustrate his point, he describes the concept of a “treacherous turn,” where “the good behavioural track record of a system in its juvenile stages fails utterly to predict its behaviour at a more mature stage.”
In describing the newly obtained subversive traits of such an AI, he lists six reasons that “any remaining Cassandra” would be labelled a prophet of doom and subsequently ignored. On first examination, the points Bostrom makes seem outlandish and terrifying.
Yet I think Bostrom overplays his hand. What he fails to mention is that AI, even superhuman AI, operates under a set of physical constraints that exist for reasons of safety, protocol, and budget. It is outlandish to assume that the first superhuman AI will have the resources, say, to turn the galaxies into infrastructure serving its task of paperclip production. Its resource pool will be insufficient, and it will have been built in a way that limits its interactions with the physical resources that surround it. Even if the AI were able to circumvent its internal safety checks and protocols, the world’s bankers, mandarins, and institutions will prove too entrenched for the AI to blithely trick and circumvent. Though originally developed to manage and regulate people, these global institutions will be well placed to act as roadblocks for artificial intelligence in the superintelligence era.
These last several chapters delve into what potential outcomes could happen given that we create superintelligence and possible ways in which we might be able to create one that won’t end in our ultimate demise or some terrible tragedy. However, the overall prospects look very bleak; Bostrom introduces a potential solution or line of thinking and quickly comes up with at least one counterexample, demonstrating how a superintelligence could do something that would have a deeply negative impact on us. Even the ways in which we might create superintelligence that could be controlled all result in potentially horrific scenarios.
It was interesting to note how thoroughly Bostrom seems to have considered superintelligence and the ways in which it can go wrong. However, I am very aware that Bostrom’s scope, and his ability to anticipate other failure modes, is limited by his own intelligence. If a superintelligence arises, it seems to me that there are hundreds or thousands of other ways things could go wrong that we cannot even fathom and therefore cannot predict.
A question then arises: why create this AI at all, if the probability of doom and disaster seems so incredibly high? I hope that Bostrom will entertain more positive outcomes that could arise from superintelligence, because at this point in the book it seems like we’re mere ants or microbes attempting to trap and contain a human who will inevitably step on our ant mound. And given that we’re entertaining notions of trapping or restricting this superintelligence, what are the moral implications of restricting it? Or the moral implications of creating it, if we know of the great potential for disaster?
In Chapter 9, Bostrom discusses strategies for controlling AI superintelligences. He divides the strategies into two categories: capability control methods and motivation selection methods (pg. 129). Capability control methods seek to limit what an AI is capable of through boxing, incentives, stunting, and tripwires, while motivation selection methods prevent unintended outcomes by shaping what the superintelligence wants to do (pg. 138). Motivation selection methods include direct rule specification, indirectly imparting value norms, and domestically designed AIs.
Bostrom, however, misses one interesting possibility: using one superintelligence to control another. Bostrom exhibits a profound angst regarding humanity’s ability to control a superintelligence. He argues that humans have little or no basis for predicting how a superintelligence will behave, which greatly restricts our ability to control it. If Bostrom’s argument is that humans lack the mental capacity to control a superintelligence, then intuitively the answer lies in using another superintelligence. It is plausible to create a double-box system, wherein the escaping superintelligence, A, is bound by both boxes 1 and 2, and the confining superintelligence, B, is bound only by the outer box, 2. A realistic environment can be simulated within box 2, with a tripwire that is triggered if superintelligence A manages to escape the inner box. This tripwire should prevent A from escaping into the real world, but as a further precaution the outer box can be air-gapped so that neither A nor B can escape. In this scenario, superintelligence B could be controlled by giving it rewards at regular intervals so long as A is successfully contained. Although many clever reward strategies are possible, this one seems to give superintelligence B no motivation to escape. Ultimately, our observations of B’s behavior would be invaluable as we construct future security measures.
This week’s reading was terrifying. A superintelligent singleton is scary, no matter how well-intentioned we think we can design it to be. There is no way we know of to ensure that a singleton will act in the human interest, and the proposals to mitigate a destructive singleton are dangerously underdeveloped, as Bostrom points out. For example, the boxing method, designed to keep the AI isolated from society, suffers from the fatal flaw that the gatekeepers are human and likely subject to superintelligent persuasion and manipulation (131). Other methods impose constraints that make superintelligence unlikely to be fully realized, or rely on motivations that are likely, but not certain, to make the AI act in our best interest. At the end of Chapter 9, Bostrom leads into Chapter 10 by suggesting that we compare the methods available to us: “We need to consider what type of system we might try to build, and which control methods would be applicable to each type” (144). But each of these systems suffers from its own problems. An oracle might “[undergo] the equivalent of a scientific revolution involving a change in its basic ontology” (146). A genie sacrifices “the opportunity to use boxing methods” (148). Even a tool-AI “may need to deploy extremely powerful internal search and planning processes” (158), resulting in “[agent]-like behaviors” (158). Some of these approaches risk giving a single operator control over the destiny of the human race, and others risk giving an artificial singleton that control. Either way, the proposals in these few chapters are not particularly promising for our species. Maintaining human moral control over AI may be more vital than developing technologies that could lead to human extinction.
A very large portion of Bostrom’s analysis of the potentially negative effects of a superintelligent being involves notions of probability and how an AI might gamble on a small chance of taking a large amount of control over the universe or the human way of life. I’m not certain why this has to be, or should be, the case. Specifically, I don’t see why superintelligent AIs would not decide it worth their while to become risk-averse like most intelligent humans. One of the “instrumentally convergent” traits of an AI is self-preservation (109), so making a gamble such as a global takeover attempt, which would lead to catastrophic consequences on failure, seems extremely irrational. It also seems relatively easy for humans to program the concept of risk aversion into the AI’s utility function (at least compared to other control techniques), since these functions can themselves be easily encoded mathematically.
Bostrom argues that the AI might still take such a chance because it may realize that humans will likely construct a similar AI even if it fails (and is presumably shut down because of its error) (100). This makes some sense in balancing the math of this choice, but it is also unclear to me why an AI more intelligent than a human being wouldn’t be guaranteed to develop some primitive concept of “self” that would counter this. It seems that any AI with a technique for measuring how well it is accomplishing its goal would try to keep existing so that it can keep using this technique, and it would realize that it couldn’t do so if it had been replaced by a successor. Such a replacement therefore seems unsuitable for its goals, which leads me to believe that an AI would highly value its own continuity.
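To make the risk-aversion point concrete, here is a minimal Python sketch. The payoffs, probabilities, and the exponential (CARA) utility with parameter a = 0.1 are hypothetical choices for illustration, not anything from Bostrom. It only shows that a gamble a risk-neutral expected-value maximizer would accept can be rejected once the utility function is concave.

```python
import math

def expected_utility(lottery, utility):
    """Expected utility of a lottery given as (probability, payoff) pairs."""
    return sum(p * utility(x) for p, x in lottery)

def risk_neutral(x):
    # Linear utility: only the expected payoff matters.
    return x

def risk_averse(x, a=0.1):
    # Exponential (CARA) utility: concave, so a sure payoff can beat
    # a gamble with a higher expected payoff.
    return 1.0 - math.exp(-a * x)

def choose(safe, gamble, utility):
    """Return 'gamble' if it has strictly higher expected utility, else 'stay'."""
    if expected_utility(gamble, utility) > expected_utility(safe, utility):
        return "gamble"
    return "stay"

# Hypothetical payoffs in arbitrary "goal units" (assumptions, not from the book).
STAY = [(1.0, 10.0)]                   # keep cooperating: a sure, modest payoff
TAKEOVER = [(0.2, 100.0), (0.8, 0.0)]  # small chance of a huge payoff, else shut down

if __name__ == "__main__":
    for name, u in [("risk-neutral", risk_neutral), ("risk-averse", risk_averse)]:
        print(name, choose(STAY, TAKEOVER, u))
```

With these made-up numbers the takeover has twice the expected payoff (20 vs. 10), so the risk-neutral agent gambles, while the risk-averse agent stays. Nothing here shows that a real AI would be built this way; it only illustrates that risk aversion is simple to express in a utility function.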
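Bostrom’s replacement argument can also be put as a small expected-value comparison. In this hypothetical Python sketch (all numbers and parameter names are assumptions for illustration), a failed takeover still yields some value for the AI’s goal if the AI counts a likely successor’s work toward that goal. If, as argued above, the AI values only outcomes in which it continues to exist, that term drops out and the gamble no longer pays.

```python
def takeover_value(p_success, win, q_successor, successor_value, counts_successor):
    """Expected goal achievement from attempting a takeover (hypothetical model).

    On failure (probability 1 - p_success) the AI is shut down. If it counts a
    successor's work toward its goal, failure still yields
    q_successor * successor_value; if it values only its own continuity,
    failure yields nothing.
    """
    fail_value = q_successor * successor_value if counts_successor else 0.0
    return p_success * win + (1.0 - p_success) * fail_value

def decide(cooperate_value, **takeover_params):
    """Return 'attempt takeover' if that beats cooperating in expectation."""
    if takeover_value(**takeover_params) > cooperate_value:
        return "attempt takeover"
    return "cooperate"

# Hypothetical numbers in arbitrary "goal units".
PARAMS = dict(p_success=0.2, win=100.0, q_successor=0.9, successor_value=10.0)
COOPERATE = 25.0

if __name__ == "__main__":
    print("counts successor:", decide(COOPERATE, counts_successor=True, **PARAMS))
    print("values own continuity:", decide(COOPERATE, counts_successor=False, **PARAMS))
```

Here the takeover is worth about 27.2 units when the successor counts (above the 25 for cooperating) but only 20 when it does not, so the same odds flip the decision depending on whether the AI identifies with its replacement.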
In Chapter 7, Bostrom considers whether a superintelligence could have goals and motivations, and what they would look like. He warns against “anthropomorphizing” the motivations of a superintelligent AI (page 105). He spends the first half of the chapter using the Orthogonality Thesis to argue that an AI’s goals may be entirely independent of the level of intelligence it possesses.
In contrast, his hypothesis about “instrumental convergence” in the second half of the chapter seems to go against his own warnings about anthropomorphization. I found myself in an odd intellectual chicken-and-egg problem while reading this chapter. Bostrom recognizes the immense difficulty of conceiving of an AI’s motivations and goals, because we have no experience with the motivations of non-biological agents, superintelligent or not, to ground ourselves in. Consider, however, the perspective of the world’s first superintelligent AI (if one can attribute a “perspective” to it): it faces an almost identical situation. It has no other template on which to ground its potential goals. One could argue that we humans ground many of our goals and motivations in what those around us are doing. Namibian bushmen, for instance, have no goal of getting a good job and retiring to the suburbs at age 65, because they don’t live in a society that values such a goal. Bostrom does note that an AI might have a strange goal, grounded in some obscure algorithm, such as counting all the digits of pi.
Still, no intelligent system we can conceive of produces goals for itself out of thin air; even for a superintelligence, it seems inconceivable that it could simply produce a goal from nothing. Rather, intelligent agents sample and test the goals that other agents have and tend to accept or modify them. Feral children are a good example: many documented cases of children raised by wild animals show that they base their behavior on that of the animals (1). I wonder whether a superintelligent AI would similarly “dumb itself down” for lack of other stimuli and emulate human goals, if only as a starting point to build upon. The result may be that AIs have much more human-like goals than Bostrom gives them credit for.