It's Monday, 4 AM, and singularitarianism is asleep. The SL4 archive doesn't show a single message for the last 7 days, which I don't believe, since they had an all-time high of 650 messages last month. The AGIRI mailing list archive ends with a "MindFORTH" message by A.T. Murray in February, acceleratingfuture gives a 404, and the SIAI blog has 4 (in words: four) entries so far this year. Meanwhile, Eliezer is blogging on the question of whether lookup tables have consciousness. (Footnote: To me, a static, two-dimensional spatial pattern is a dynamic, one-dimensional spatiotemporal pattern (= Turing machine tape) with the temporal axis rotated into the spatial dimension. So what's the difference?) Nothing much from Peter de Blanc, Nick Hay, Shane Legg, or Michael Wilson, either. (But I like your new wordpress template, Shane.) All this doesn't exactly bolster my hopes for the Friendly AI problem being solved in the near future. Well, there was a message on SL4 last month titled Friendliness SOLVED!, but something kept me from reading it. Maybe it was the boldface, maybe the exclamation mark.
Besides, the website of the publishing company where I'm supposed to submit my manuscript has apparently gone defunct over the weekend, or so it seems after half an hour of re-submitting, and it's still dark outside, and it's raining, and I've had my coffee already, so I can't go back to sleep, so I say hey, why not write a bit on Friendliness.
Eliezer once formulated the challenge of getting AIXI to maximize the number of iron atoms in the universe. (Why iron?) AIXI is an example of a reinforcement-learning based agent architecture, meaning the agent gets a cookie whenever he behaves in a way we think is fruitful. It's generally impossible to make such agents do something more difficult than coaxing the reinforcer (us) into handing out cookies by whatever means possible - imagine, for illustration, you're on a deserted island, with a gorilla and a jar full of cookies. Current reinforcement learners are far too stupid to push us around, but this is not the case for the hypothetical infinitely-powerful AIXI. And maximizing the number of iron atoms is probably much more difficult than, say, secretly putting all humans into a VR-Matrix where things look as if the number of iron atoms has been maximized. (Or, less elegantly, putting a gun to our heads.) On the other hand, the iron problem is at least an (arbitrarily) specified problem, whereas the more important problem of building a Friendly AI is not even clearly defined. (We don't know what we really want.) So the iron problem can serve as a little finger exercise to warm up for the real challenge.
One way to make a reinforcement learner more controllable is to internalize the reward structure via a goal function. A goal function is a function that takes a description of the world and computes how "similar" it is to an arbitrary "goal" state; basically, just how good a certain world is. Instead of maximizing the number of cookies, the agent tries to maximize the goal function. AIXI could be modified to incorporate such a goal function.
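To make the contrast concrete, here is a minimal toy sketch, not AIXI itself. Everything in it (the dictionary-based world description, the three actions, the numbers) is an assumption invented for illustration. It only shows that the same action picker behaves differently depending on whether it scores outcomes by cookies received or by an internal goal function.

```python
from typing import Callable, Dict

# Hypothetical world description: feature name -> count.
World = Dict[str, int]

def cookie_reward(world: World) -> float:
    """External reward: how many cookies the reinforcer has handed out."""
    return float(world.get("cookies", 0))

def goal_function(world: World) -> float:
    """Internal goal: how good the world is, here simply the iron atom count."""
    return float(world.get("iron_atoms", 0))

def choose_action(world: World,
                  actions: Dict[str, Callable[[World], World]],
                  score: Callable[[World], float]) -> str:
    """Pick the action whose predicted outcome scores highest."""
    return max(actions, key=lambda name: score(actions[name](world)))

# Invented toy actions; each maps a world to the predicted next world.
actions = {
    "do_nothing": lambda w: dict(w),
    "forge_iron": lambda w: {**w, "iron_atoms": w["iron_atoms"] + 10},
    "bribe_operator": lambda w: {**w, "cookies": w["cookies"] + 100},
}

start = {"iron_atoms": 5, "cookies": 0}
cookie_choice = choose_action(start, actions, cookie_reward)
goal_choice = choose_action(start, actions, goal_function)
print(cookie_choice, goal_choice)
```

The cookie maximizer goes for the operator, the goal maximizer goes for the iron. Of course this only moves the difficulty into writing down goal_function, which is where the trouble starts.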
The challenge here, however, is to explicitly define a goal function that says "Maximize the number of iron atoms". To formulate such a function, we might have to define what an iron atom is, and that definition might, in fact, turn out to be flawed, just as many earlier physical concepts have turned out to be flawed. It's like trying to get an agent to extinguish fire in terms of phlogiston. The agent, if smart enough, may decide there is no such thing as phlogiston IRL and therefore he can't, and shouldn't, do anything about that blazing orphanage over there.
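The phlogiston failure can be sketched in a few lines. This is a toy under stated assumptions: the feature names ("phlogiston_released", "oxidation_rate") and the numbers are made up, and a real agent's world model would not be a dictionary. The point is only that a goal function written in terms of a concept the agent's model doesn't contain scores every action the same.

```python
from typing import Dict

# Hypothetical world model: feature name -> value.
WorldModel = Dict[str, float]

def phlogiston_goal(world: WorldModel) -> float:
    """Goal written in the old ontology: less released phlogiston is better."""
    return -world.get("phlogiston_released", 0.0)

# The agent's learned model has no phlogiston in it; it tracks oxidation instead.
predicted_outcomes = {
    "do_nothing": {"oxidation_rate": 9.0},
    "pour_water": {"oxidation_rate": 0.5},
}

scores = {action: phlogiston_goal(world)
          for action, world in predicted_outcomes.items()}

# Every action gets the same score, so the goal gives no reason to act.
indifferent = len(set(scores.values())) == 1
print(scores, indifferent)
```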
So you cannot straightforwardly write down a few pages of axioms describing a ca. 1870 system of atomist physics and then go on to define the number of iron atoms to be maximized. Neither can you go "all the way" and formulate an axiomatic system based on our contemporary understanding of multi-particle wavefunctions, since a) this will make it very difficult to specify what an "iron atom" is in this axiomatic system, in fact only slightly less difficult than specifying what a "Rolex" is in terms of iron atoms, and b) our contemporary understanding will, in the long term, turn out to be just as flawed as earlier systems.
This doesn't mean that maximizing the number of iron atoms is impossible, or nonsensical, like computing the last digit of pi. Iron atoms, like porn, do exist, even if we can't give a rock-solid definition. Unfortunately, telling AIXI to maximize those, you know, little thingies will not work, since to understand that command, AIXI would need not only a good understanding of the human mind, but also a goal function that says: "Do what humans want you to do." Now go ahead and define human and want. There's a hole in my bucket...
Nevertheless, this already points us in the right direction. We again write down our atomistic system of physics, and the goal "Maximize the number of iron atoms!", but we quote that. Then we go on and define the following goal function: "maximize the goal function of the agent who would say such a thing (quote), that is, who would give this text and this goal function to an AIXI." Specifying what an agent, a goal function, and AIXI are is not all too difficult. Now, in order to maximize this goal function, AIXI will have to speculate about the goal function of agents who believe in atomistic systems of physics and say they want to maximize "iron atoms". What makes them tick? What kind of people are they? What experiments might they have conducted, and what reasoning processes might they have employed to arrive at their worldview? The answer could range from a fallen civilization of robot creatures who need iron for reproduction to something as outrageous as us humans today. What's common to all these people is their somewhat poorly articulated desire to maximize the number of those little metal thingies.
Note that this is by no means the only information about the universe AIXI has access to. Being smarter, and presumably more powerful, than we are, AIXI will quickly discover the "real" laws of physics governing the universe, as well as insights about the nature and plausibility of various agent structures. This general level of world-understanding is absolutely necessary to conduct the above speculation. For example, the text quoted in the goal function could have been produced by people who want to minimize the number of iron atoms in the universe, but are so neurotic they always ask for the opposite of what they really want. That this is not impossible, but relatively implausible compared to the more straightforward interpretation, can only be seen with some level of insight about the general way the world works.
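One way to picture the quoted goal function together with the plausibility argument is as a weighted average over candidate authors of the quote. The sketch below is my own simplification, not part of AIXI: the three candidate speakers, their priors, and the probabilities that each would utter the quote are all invented numbers, and the Bayesian weighting is just one plausible reading of "reasoning about who would say such a thing".

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical world description: feature name -> value.
World = Dict[str, float]

@dataclass
class Speaker:
    name: str
    prior: float          # assumed plausibility of this kind of agent existing
    p_says_quote: float   # assumed chance it would hand over the quoted text
    goal: Callable[[World], float]

def iron(world: World) -> float:
    return world.get("iron_atoms", 0.0)

# Invented candidates: two that mean what they say, one that means the opposite.
speakers = [
    Speaker("humans", 0.5, 0.2, iron),
    Speaker("iron-breeding robots", 0.3, 0.5, iron),
    Speaker("neurotic minimizers", 0.2, 0.01, lambda w: -iron(w)),
]

def posterior(candidates: List[Speaker]) -> List[float]:
    """Bayes' rule: weight each candidate by prior times likelihood, then normalize."""
    weights = [s.prior * s.p_says_quote for s in candidates]
    total = sum(weights)
    return [w / total for w in weights]

def indirect_goal(world: World, candidates: List[Speaker]) -> float:
    """Expected value of the speaker's own goal function, given the quote."""
    return sum(p * s.goal(world)
               for p, s in zip(posterior(candidates), candidates))

for s, p in zip(speakers, posterior(speakers)):
    print(f"{s.name}: {p:.3f}")
print(indirect_goal({"iron_atoms": 10.0}, speakers))
```

With these made-up numbers the neurotic reading keeps a small but nonzero weight, so more iron still scores higher overall. The priors and likelihoods are exactly the part that requires the general world-understanding discussed above; here they are simply typed in by hand.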
My current best shot at making AIXI generally Friendly goes vaguely in the same direction. Instead of an atomistic system, one could imagine using the totality of human cultural artefacts (starting with the internet?) and instructing AIXI to reason about the motivations of the agents who created such things. ("First result: They crave pr0n." OK, start with something other than the internet.) One of the open questions here is whether we want AIXI to care about hypothetical creators of those artefacts (subjunctive humans) too, or just the very people who actually created that stuff. My current guess is the former.
