This week we conclude our discussion of Superintelligence by Nick Bostrom.
Filed under Class sessions
In Chapter 13, Bostrom discusses how we should fundamentally model an AI’s ability to make decisions, ultimately with the aim of averting catastrophes for humanity. His discussion of morality models raises essential questions that are not readily solved. One such example is the “trolley problem” as applied to self-driving cars (I believe we have mentioned this example in class already; if not, I will explain). These kinds of situations show why high-powered tech companies need philosophers, but even then, ethical solutions to modern problems are not easy to come by (e.g., only 68% of philosophers in a 2013 survey said they would pull the lever in the trolley problem).
With this kind of dissent, alongside ordinary human error, one can question to what extent we want humans at the helm of artificially intelligent decisions. At the end of the chapter, Bostrom illustrates a scenario in which, guided by an oracle AI that tells us what our robots will do in any given situation, we can “ratify” certain AI decisions and prevent others from happening. Are there scenarios in which our ability to reverse or stop AI decisions will have negative effects on humanity? Will AI know what is good for us better than humans do? This poses some uncomfortable existential questions as well, especially concerning our effect on the biosphere and how relatively positive an outcome human extinction would be for the environment. In this way, programming and overseeing AI never to harm humans, as one of its core tenets, may directly contradict how a naturally intelligent being would make decisions in our world.
Life on earth has been predicated on the principle of evolution. The archaeological record shows that modern humans arose only a few hundred thousand years ago, next to nothing compared to the vast stretch of time preceding our existence. Thus, to assume that humans will remain unchanged from their present state for the rest of time, I find, is naive. Just as early hominids paved the path for our existence, this book has left me wondering whether we are simply paving the path for a future species. The big question I have been tackling is: what if the next logical step in human evolution is the arrival of a superintelligent being? What if we are simply the forebears, or parents, if you will, of a strongly intelligent AI that will be better equipped to carry forward the torch of humanity?
In some ways, I found that Bostrom’s book automatically pits humans against superintelligent machines in an antagonistic setting. However, I think one can argue that the two are not as dissimilar as the book makes them out to be. Obviously, this is a deeply philosophical train of thought that is far too complicated to develop in 300 words. I am not saying it is the right way of processing the prospect of superintelligence; however, I think it is valid enough to have been considered by Bostrom. Bostrom’s argument naturally plays on people’s desire for self-preservation to further his cautionary tale. All I want to point out is that, maybe, in some possible world, a superintelligent being is simply the next step in human evolution, and maybe this framing changes the way we think about the argument: instead of fearing it, maybe we embrace it?
In his final chapter, entitled “Crunch Time,” Bostrom gives a brief overview of his thoughts on artificial intelligence, discusses potential strategies for future development, and considers how we might address the imminent possibility (or threat) of a superintelligence.
In the section entitled “philosophy with a deadline”, he explains how we could potentially defer all other progress until after the development of artificial intelligence because a superintelligence would make all this progress much easier and faster. He calls this “a strategy of deferred gratification” (page 256). However, we, as humans, much prefer instant gratification, and therefore would never be content with this. We would not accept deferring all scientific and humanitarian progress until the arrival of a superintelligence because we would be too busy attempting to solve the problems ourselves.
Bostrom does acknowledge this aspect of human nature when he makes the analogy of artificial intelligence being like a bomb, and humans being like children poking at it until something happens (259). Faced with the prospect of a superintelligence, we would throw caution to the wind and focus all our efforts on simply achieving it, regardless of the consequences. This is why I agree with Bostrom when he says that more effort should be made to ensure that artificial intelligence is safe to use before we develop it. Moreover, I think the idea of an artificial intelligence gaining control and abusing its power over humanity in any of the ways Bostrom has previously described (perverse instantiation, turning the universe into paperclips…) is so pressing that before any progress is made on actually achieving machine intelligence, humanity should figure out how to control it.
In Chapter 11, Bostrom discusses the possible consequences of multipolar outcomes for elements of human experience such as society and wealth. He warns us of the potential substitution of machines for human labor and how that might lead to poorly distributed wealth and widespread poverty. He even describes a scenario where the majority of people’s funds are so depleted that they resort to stunting their growth and slowing their metabolism to lower their cost of living. While he does consider redistribution of wealth, capital income, and niche human work, he does not give attention to two possible positive implications of an intelligence explosion that would be relevant to this chapter.
First, highly efficient machine workers would increase production and thus, lower costs for the capital-holders. And so, they could afford to sell their products at reduced costs, lowering living expenses for consumers. For example, food suppliers could offer their goods for cheaper prices. Another way an intelligence explosion could lower the cost of living and actually support the well-being of biological humans is in the medical arena. Devices to detect and treat illness more effectively and cheaply would ultimately reduce personal, private, and public health care spending and free more consumer funds. Second, these technological advancements could actually create jobs for humans. New machines could be supported by human labor in production, maintenance, regulation, and design.
At the very end of Chapter 15, Bostrom makes an analogy comparing humans developing artificial intelligence to small children playing with a ticking bomb. He claims that superintelligence is something we are not ready for now and will not be for a long time. At the same time, however, we all know that the sensible thing to do when a child has a bomb is to put it down and back away. Yet Bostrom claims that very few people will be able to do that; in his words, “some little idiot is bound to press the ignite button just to see what happens.”
What I found interesting was that, prior to that section, Bostrom also states that “we want to work on problems that are elastic to our efforts at solving them. Highly elastic problems are those that can be solved much faster, or solved to a much greater extent, given one extra unit of effort.” He gives the example of “encouraging more kindness in the world.” Although Bostrom backs that claim up by noting that bigger problems are more complex to tackle, it reminded me of the analogy of the child and the idea of the immaturity of our conduct: tackling the smaller problems rather than confronting larger, more complex issues. I had many questions when Bostrom said that we should aim for more elastic problems. First, I wondered how he defines kindness in the context of bringing more kindness into the world. Kindness, like many other words, is subjective. Additionally, I wondered whether larger problems (e.g., world peace) could be perpetuated by our only trying to solve highly elastic problems. For example, would increasing world kindness include being kind to individuals who hurt other people and communities? If so, wouldn’t that in turn cause some pain?
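Bostrom’s notion of elasticity can be made concrete with a toy calculation (all numbers here are invented for illustration, not from the book): model each problem as a progress curve over effort and compare the marginal return of one extra unit of effort.

```python
import math

def marginal_return(progress, effort, step=1.0):
    """Extra progress gained from one more unit of effort."""
    return progress(effort + step) - progress(effort)

# A highly elastic problem: progress keeps scaling with added effort.
elastic = lambda e: 0.9 * e

# An inelastic problem: progress saturates, so extra effort buys little.
inelastic = lambda e: 10 * (1 - math.exp(-0.05 * e))

# At the same current effort level, the elastic problem rewards the
# marginal unit of effort far more than the saturating one.
print(round(marginal_return(elastic, 20.0), 3))    # larger marginal gain
print(round(marginal_return(inelastic, 20.0), 3))  # smaller marginal gain
```

On this sketch, Bostrom’s heuristic amounts to directing effort wherever `marginal_return` is highest, which is exactly what invites the worry above: saturating curves like “world peace” never get worked on.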
The element from the reading that stood out to me the most was the principle of Coherent Extrapolated Volition (CEV). I was very happy and impressed to see Yudkowsky’s formulation, offered as an attempt to solve the problem of malignant instantiations of value functions.
If we think about it, for the superintelligence to optimize our CEV, the AI would need to simulate a smarter, better, nicer version of mankind that is close enough to ours that we could understand its motives, reasoning, and actions. This raises the interesting point that a good enough simulation of this alternate mankind must exist even before the AI is able to evaluate the results of its (perhaps merely potential) actions.
Yudkowsky’s postulation excels at taking a value function and making it recursively self-referential between mankind (or whoever is defined as ‘we’) and the agent. At each time step, it is the burden of our agent to explain why it decided to do what it did, which in turn changes both the state of mankind and the state of the agent.
I personally think that a formulation that self-contained is beautiful, since it forces our superintelligent agent into understanding our condition as its first step. However, if the data fed into our agent comes from biased sources (‘we’ is a group that is very much against all possible versions of ‘them’ and it is a subset of humanity), then we run the risk of optimizing for the CEV of evil people.
I honestly wish I had an answer as to how to solve this problem without falling into the paradox explained on page 239. Moreover, I wish I could confidently say that our wishes and ideals could be appropriately extrapolated from the evidence we leave in the world. In the end, CEV depends on the assumption that our best selves can be extrapolated from that evidence. I really, really hope that’s true.
Bostrom’s “common good principle” (254) is, I think, essential to promote human happiness, however happiness is defined. The principle says “Superintelligence should be developed only for the benefit of all of humanity and in the service of widely shared ethical ideals” (254). It seems obvious at first, but the phrase “all of humanity” is radically inclusive and rules out rent-seeking and all other abuses of power. The key to developing a superintelligence (whether it be multipolar or a singleton) that works for the common good is to ensure that, at each stage of development, as many human interests as possible are represented in the decision-making process. This requires a massive amount of collaboration between humans, and it requires a massive reduction in competition between humans, so that our interests are aligned and we don’t have horror scenarios like “a winner-takes-all situation” (249) where “a lagging project might be tempted to launch a desperate strike against its rival” (249). The common good principle commands a cautious approach that is incompatible with competition and unilateral decision-making, which is unlikely to “[minimize] the risk of catastrophic error” (227).
Technocracy is dangerous, as we’ve seen throughout the book. In a technocracy, the technology rules and the people obey, and superintelligent technology is unpredictable. If we care about things like “music, humor, romance, art, play, dance, conversation, philosophy, literature” (175) and “justice, freedom, glory, human rights, [and] democracy” (187), and we’re not sure robots will want the same things, we have to be extremely cautious in ceding power to technology. Superintelligence development should be inclusive and democratic, and the superintelligence itself should maximize the ability of humans to retain control over our own affairs. Then we can design a superintelligent world where the people, “all of humanity,” rule and the technology obeys.
After discussing the issues with both unipolarity and multipolarity, Bostrom suggests that the more productive task is to “return to the question of how we could safely keep a single superintelligent AI” (184). Safety then seems to come down to the “value-loading problem,” which is the question of how to get a value into an AI and make it “pursue that value as its final goal” (185). The value-loading problem is difficult because “if we delay the procedure until the agent is superintelligent, it may be able to resist our attempt to meddle with its motivation system” (185). Therefore, this problem must be confronted and dealt with. Once we figure out how to transfer human values to a digital computer, we would confront a further problem: “the problem of deciding which values to load. What… would we want a superintelligence to want? … The decision as to which value to install could then have the most far-reaching consequences” (208-209). I believe that this is ultimately the most important question and must be considered first and foremost. It seems likely that if we act soon enough, we as humans have the power to direct the destiny of AI and superintelligence. The last section title of Chapter 15 sums it up perfectly: “Will the Best in Human Nature Please Stand Up.” We have no control over whose hands the fate of AI lands in, and after seeing all the variability in the possible outcomes and solutions to the control problem and the value-loading problem, I think we are left simply hoping that the best in human nature will direct the future of AI in deciding how to load values and control the system.
Bostrom seems to take the stance that superintelligence is inevitable, and we need more research and more minds working on it to ensure that the process goes smoothly (Ch. 15), but I think the concern of the fickleness of the human race itself is something that Bostrom has not fully addressed and accounted for in this book.
For example, Bostrom talks about (p. 215) how CEV could result in a dynamic process in which we aren’t necessarily “micromanaged”. I can perhaps see how people with differing views could agree to go ahead with CEV, each believing that their own views are the correct ones that will emerge; however, given the range of opinions in America alone about gun control, abortion, religion, etc., it’s difficult for me to see how the end result could be enforced while still allowing people to be “in charge of our own destinies”.
If CEV determines that X (e.g., guns) should be outlawed, the faction of people who feel their rights are being impinged upon would surely protest, claiming the process was somehow rigged or unfair. Given the diversity of opinion in the world, it seems that “unfairness” will inevitably result and could very well lead to mayhem if the outcome is enforced.
Ultimately, we come to a tricky crossroads: balancing our rights to certain freedoms against the quality of people’s lives as a whole. But since the superintelligence knows much more than we do, and its “clearest” wish was this, shouldn’t we enforce these determinations to optimize our world? How can we even evaluate whether the superintelligence is “right” in a certain regard when we are biased toward these notions of freedom and rights and have so little knowledge of philosophy?
One of the arguments of Bostrom’s that I found most interesting from these final chapters is how even some universal, enforceable decision against pursuing AI research could be catastrophic if humans were able to develop another technology (e.g., nanotechnology or more advanced particle physics) that subsequently spiraled out of our control (231). Interestingly, it appears that, even if we can never be certain how well we can control some superintelligent mind, we might become confident that we will be more efficient at controlling it than at controlling other technological advancements that might become possible in the near future. While this does seem to presuppose that many current areas of scientific research are bound to destroy us, I think it’s fascinating that creating computational intelligence seems like a scarier proposition than pursuing many of these technologies that may have at least as much potential to cause the world harm.
I wonder, then, how much our fear of superintelligence comes from our concern of building something that may operate too much like ourselves. We do not particularly understand how we came into being, so the idea of us trying to build something that replicates our capabilities seems more likely to end in failure than projects that have more to do with particular disciplines and less with fundamental questions that concern all of us. Bostrom certainly has convinced me that there are many potentially harmful consequences of superintelligence, but it’s difficult for me to determine whether these arguments are genuinely extremely pressing or whether they are just easier to relate to than concerns about other technologies. To me, this may be the most interesting question to consider.
In previous posts, I questioned whether superintelligence should be taken seriously, and whether it was really such a bad thing. However, once I accept the premises that Bostrom sets out for superintelligence, his arguments make a lot of sense to me. Toward the end of the book, Bostrom delves deeper into the philosophical arguments surrounding how we could try to control superintelligence. I was particularly struck by the game-theoretic idea that if multiple teams were working to achieve superintelligence, they would rather spend money on trying to be the first to build a working AI than on ensuring there were safety checks in place. This makes me wonder whether, going forward, there should be government regulation of AI research in the same way there is regulation of research into defense or rocket systems.
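The race dynamic described above has the structure of a prisoner’s dilemma, and a minimal sketch makes the logic explicit (the payoff numbers are invented; only their ordering matters): skimping on safety improves a team’s chance of winning the race regardless of what the rival does, so both teams skimp even though both would prefer mutual safety.

```python
# Toy two-team AI race: each team chooses to invest in safety or skimp.
# Payoffs (invented) encode Bostrom's scenario: skimping speeds you up,
# but mutual skimping risks a bad outcome for everyone.
PAYOFFS = {  # (my choice, rival's choice) -> my payoff
    ("safe",  "safe"):  3,
    ("safe",  "skimp"): 0,
    ("skimp", "safe"):  4,
    ("skimp", "skimp"): 1,
}

def best_response(rival_choice):
    """My best reply given the rival's fixed choice."""
    return max(("safe", "skimp"), key=lambda c: PAYOFFS[(c, rival_choice)])

# Skimping is a dominant strategy: the best reply to either rival choice.
assert best_response("safe") == "skimp"
assert best_response("skimp") == "skimp"
# Yet mutual safety would leave both teams better off than mutual skimping.
assert PAYOFFS[("safe", "safe")] > PAYOFFS[("skimp", "skimp")]
```

This is exactly the kind of equilibrium that outside regulation (or coordination between teams) could shift, which is what motivates the regulation question above.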
I also feel that much more time could have been spent on whole brain emulation (WBE) and how it interlinks with AI. After all, if WBE is easier to achieve and easier to comprehend, we should pay attention to the risks surrounding this method, and to how a WBE connected to the internet could change its behavior into something more like the AI discussed throughout the book. If WBE really is a stepping stone toward AI, we need to spend more time discussing how we might regulate it or ensure that it doesn’t advance inequality (e.g., billionaires ensuring immortality by having their brains emulated or transferred into a computer system). I think WBE leading to human synthesis with computers would be a less scary outcome than some of the things Bostrom has predicted, but it leaves us with the question of how we would be different if our brains were significantly faster and had access to infinite information.
While the preceding 14 chapters of Bostrom’s “Superintelligence” mostly point out concerns about the impending arrival of artificial intelligence with a capacity for thought superior to our own, Bostrom concludes by examining “what is to be done”.
One of the heuristics Bostrom proposes to “limit the risk of doing something actively harmful or morally wrong” is to “prefer to work on problems that seem robustly positive-value”. Bostrom justifies this risk-mitigation strategy by explaining that it is often impossible to tell whether a given action is negative-value or positive-value, and that it is far better to err on the side of caution when dealing with such a potentially volatile issue. Bostrom’s conservatism, however, seems misplaced, for at least two reasons.
A large amount of technical innovation comes from the military sphere. Superintelligence clearly has huge military potential, as discussed to some extent in Chapter 4. Much as decidedly ‘negative-value’ research into the hydrogen bomb led to groundbreaking leaps in nuclear power generation, military advances in the development of superintelligence are both inevitable and worth emphasizing, and they can and will lead to unintended positive consequences.
Military matters aside, considering the imminent coming of superintelligence, it would seem obvious to have research in this field conducted by known good actors. If, as Bostrom painstakingly explains, superintelligence is so dangerous even in the hands of ‘the good guys’, it seems imperative that research into the potential ‘negative-value’ aspects of superintelligence be done by known good actors, so that rules, regulations, and standards can be put in place to protect the population from the negative externalities described at length in the preceding chapters.
By the conclusion of this book, I have been convinced that an unrestricted singleton superintelligence has the potential to pose a significant risk to humanity. On the other hand, I remain unconvinced that any group working toward the creation of one will be short-sighted enough not to incorporate barriers against a malevolent superintelligence.
Bostrom presents what he considers the dangerous game-theory scenario of AI development, in which multiple organizations are so caught up in the profit motive of being first to develop a superintelligence that no group takes the requisite time to make the end product safe. This didn’t make much sense to me, and it ties into a broader problem with much of his negative forecasting about AI development: it could be quite easy to ensure that an AI lacks the capability to wreak destruction. Why, for instance, couldn’t a prototypical human-level intelligence or superintelligence be “confined” to a set of hardware by ensuring that there is no external connection from that hardware to the outside world? Would there be no way to create a hardware/software system that can make only pull requests to the Internet while forbidding push requests, so that the AI could learn and grow without being able to affect the external world?
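The pull-only gateway imagined above can be sketched as a policy check that a mediating proxy might apply: permit retrieval-style requests, block anything that pushes data out. This is purely illustrative (a real confinement scheme would have to worry about side channels, and even the contents of a “pull” request can encode outbound information, which this toy check ignores).

```python
# Toy "pull only" gateway policy for a boxed AI: allow fetching data,
# block any request shape that could transmit information outward.
READ_ONLY_METHODS = {"GET", "HEAD"}  # retrieval-style HTTP methods

def allow_request(method: str, body: bytes = b"") -> bool:
    """Permit only body-less retrieval requests; reject write methods
    (POST, PUT, DELETE, ...) and any request carrying a payload."""
    return method.upper() in READ_ONLY_METHODS and not body

assert allow_request("GET")                          # pulling data: allowed
assert not allow_request("POST", b"outbound data")   # pushing data: blocked
assert not allow_request("GET", b"smuggled payload") # pull with payload: blocked
```

Even granting the sketch, the next paragraph’s point stands: the hard part is not writing the filter but guaranteeing nothing routes around it.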
This train of thought is easily set aside by the reality that secret or dangerous software is made inadvertently public all the time (see Stuxnet, for instance), and from that perspective Bostrom seems correct to worry about how to implement values in an intelligent system. His return to whole brain emulation as a prototype for a true AI felt undeveloped. In particular, I felt as if emulation was almost a more dangerous strategy than a true AI: an emulation, for example, is almost guaranteed to feel rage, hate, prejudice, and so on. Perhaps, though, it serves as a preliminary point of exploration, provided we are able to select a cooperative and benevolent brain to emulate.
Why could the future not be like the Culture?