Jul 31, 2008

The Colors Of My Digits

For as long as I can remember, I have perceived digits as having their own colors:

1 2 3 4 5 6 7 8 9

Zero is glassy-transparent, like acrylic, so actually it's more like the digits having textures. When I see digits written down somewhere, they do not appear vividly colored, just subtly shaded. But when visualizing numbers, I find it hard not to perceive the individual digits as colored in the above way.

I doubt that this rudimentary form of synesthesia is beneficial to my ability to deal with numbers. Many of the colors are quite similar to each other, so I tend to misremember phone numbers, sums, or dates in a specific way.

Jul 10, 2008

Disclaimer: Universe is NOT simple

"Why is the Universe so simple ?" asks the mathematician, or more generally, why is simple mathematics (school mathematics) so successful at describing the Universe ?

The Universe, however, is generally not simple to begin with. Rather, there are some aspects of the Universe (which we happen to be interested in) that can be computed easily. Put one sheep next to one sheep and you get two sheep (in the short term); so "putting next to each other" is isomorphic to a simple "+" operator. But what about the eddies and whorls in a ravine? Cloud patterns? And I haven't even begun to ask *creative* questions here.

An arbitrary aspect of the Universe will typically have high Kolmogorov complexity and be very difficult to compute. We as a species, shaped by evolution, happen to be interested in many of the simple-to-compute aspects.

The question should rather be phrased: why does the Universe have any simple-to-compute aspects at all?

Jun 8, 2008

The Great Goodbye, Everett style.

In a post-singularity future, people may, with the help of superintelligent AI, have almost arbitrary levels of control over their environment and their own mental and physical constitution. This near-omnipotence, however, will presumably not extend to other people's minds and bodies. (Argument from symmetry, though a game-theoretically stable society model where each participant has unrestricted control over everyone else seems at least remotely conceivable. We'll leave that aside for later speculations.)

I think it's plausible to assume that those post-singularity people can be modeled as agents trying to maximize (minimize) their respective goal functions on the universe. Given their, in principle, almost infinite capability to maximize those functions, the biggest factor holding back individual agents may turn out to be other, similarly powerful agents with incompatible goal functions. Since we're talking about an agent model that clearly separates preferences from beliefs, Aumann's results don't provide a safety hatch here. Clearly, the agents can compromise, and arguments from symmetry again prevail, but this may, in the face of the otherwise immense capabilities of the agents, result in huge discounts from the theoretically achievable level of goal-function fulfillment.

That is, the posthumans may get in each other's way, and there's no way to rationally resolve the situation without massively stomping on some (or all) people's goal functions.

How likely is it that people's preferences will intrinsically differ after the technological singularity? If those people have evolved through self- (or mutual) modification from humans, or have otherwise inherited, possibly through deliberate design, human values and tastes, then I'd regard this as very likely indeed. I may be pessimistic here, but my personal lifelong experience is that people attach different, in part radically different, values to certain aspects of the world, themselves, and other people, and no amount of rational insight is ever going to make those values compatible. So I think the problem I'm discussing here is real and realistic.

Emigration may be a solution, and is a cherished human tradition that may extend into a post-singularity future. Of course, people's value functions will often put strong emphasis on the presence of (certain) other people, so walking away will in many cases be worse than gritting your teeth and getting on with each other. But in some cases, getting out of each other's hair may be the optimal thing to do.

Now posthumans surely have some radical opportunities to venture out into unexplored territory, and the silentium universi may mean that there's a lot of room to settle down. Starw(h)isps traveling at a notch below light speed can carry virtualized passengers for billions of parsecs within a short subjective time. But even this may not be far enough, as those other annoying posthumans with incompatible value systems will presumably have access to the same means of expansion and may be determined to use them, if not now, then maybe later in the future. For their destinies to separate, the opposing parties will have to make their future light cones disjoint. Cosmic acceleration from dark energy may make this possible simply by traveling far enough fast enough, but this has at least two disadvantages: it creates an asymmetry between those deciding to move away and those that "inherit the earth", and it may be impractical for posthumans to wait long enough for the accelerating expansion to catch on - given post-singularity computing capacities, and a foreseeable tendency to virtualize your supporting hardware, even a nanosecond wait in objective time may be unbearable on a subjective scale.

As you may have guessed by now from the title of this post, there's probably another, much simpler way for posthumans to part ways. This method depends on the assumed validity of the so-called many-worlds interpretation of quantum mechanics and its superposition principle. As a note of caution, however, I'd like to point out that the superposition principle relies on the linearity of quantum mechanics, which may turn out to be false, since general relativity is non-linear. (That is, a linear combination of two solutions describing world-states is not necessarily a valid solution itself.) The basic idea is for all parties to condition their further existence on the output of a quantum random number generator. By accepting to inhabit only mutually exclusive subsets of possible worlds, all participants can have symmetric access to a constrained resource (e.g., they can all "inherit the earth" in their respective Everett branches). The superposition principle also assures that their fates are separated once and forever, without the danger of any one party deciding to overturn the deal at a later point in time. Furthermore, this approach can be implemented on a very short timescale.

As I believe in the mutual incompatibility of many, if not most, human tastes, values, and likings, as well as in the stability of those tastes, values, and likings under reflection, I believe posthumans will use one method or another to eventually part ways. (The fact that I spend some time thinking about such problems shows that I believe I would do so, doesn't it?) Everett emigration seems to be a rather straightforward way to achieve that. We do not, however, currently understand quantum mechanics, general relativity, and the superposition principle well enough to literally bet our lives on them. (Otherwise, we could already choose to implement it using current technology, that is, a quantum random number generator and some hydrogen bombs ...)

Could this be an explanation of the Fermi paradox? If technological civilizations reliably undergo technological singularities, and post-singularity societies tend to "atomize" themselves, the universe may in fact, on average, be a relatively quiet place. But I don't really hold this argument to be valid, as even isolated posthumans may be very noisy. Furthermore, I think the "Everett barrier" is in fact not that impermeable in the presence of a sufficiently powerful AI, so transhumans with compatible tastes might join each other even if they originated in different Everett branches - but that's stuff to discuss in a follow-up to this post.

Jun 6, 2008

Reconstructing the Dow

Recently I had to reconstruct the Dow Jones Industrial Average for backtesting purposes. This turned out to be more painful than anticipated. In case you need to do this, I recommend you start out with this document detailing the historical composition of the DJIA. From this, create a .txt file containing the dates and types of change over the relevant time interval. Write some code to read this into your preferred programming environment (MatLab in my case) and create a data structure containing the composition of the Dow at any given time point (daily closings, in my case). Then look up as many ticker symbols as possible at Yahoo Finance and the Dow's Wikipedia entry. For the rest, I googled, though there's probably some sort of central list of tickers maintained somewhere. I'll list below what I could find for the years between 1990 and 2008. Note that many of the ticker symbols today denote different companies.
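In case it helps, here is a minimal sketch of that parsing step, in Python rather than MatLab; the file format and the function names are my own invention, not anything standardized:

    # Build DJIA membership snapshots from a hand-made text file of
    # composition changes. Assumed format, one change per line, chronological:
    #   1991-05-06 ADD Caterpillar Incorporated
    #   1991-05-06 DROP Navistar International Corp.
    from datetime import date

    def load_membership(path, initial_members):
        """Return a chronological list of (effective_date, members) snapshots."""
        members = set(initial_members)
        snapshots = [(date.min, frozenset(members))]
        with open(path) as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                datestr, action, name = line.split(None, 2)
                y, m, d = map(int, datestr.split("-"))
                if action == "ADD":
                    members.add(name)
                elif action == "DROP":
                    members.discard(name)
                snapshots.append((date(y, m, d), frozenset(members)))
        return snapshots

    def members_on(snapshots, day):
        """Composition valid on a given day: the last snapshot not after it."""
        current = snapshots[0][1]
        for effective, snap in snapshots:
            if effective <= day:
                current = snap
        return current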

3M Company: MMM
AT&T Corporation: T
AT&T Incorporated: T
Alcoa Incorporated: AA
Allied-Signal Incorporated: ALD (ALD today stands for Allied Capital Corporation; Allied-Signal merged with Honeywell)
AlliedSignal Incorporated: ALD (again, today Allied Capital Corporation)
Altria Group Incorporated: MO
Altria Group, Incorporated: MO
Aluminum Company of America: AA
American Express Company: AXP
American International Group Inc.: AIG
American Tel. & Tel.: T
Bank of America Corporation: BAC
Bethlehem Steel: BS (delisted)
Boeing Company: BA
Caterpillar Incorporated: CAT
Chevron: CVX
Chevron Corporation: CVX
Citigroup Incorporated: C
Coca-Cola Company: KO
Du Pont: DD
DuPont: DD
Dupont: DD
Eastman Kodak Company: EK
Exxon Corporation: XOM
Exxon Mobil Company: XOM
Exxon Mobil Corporation: XOM
General Electric Company: GE
General Motors Corporation: GM
Goodyear: GT
Hewlett-Packard Company: HPQ
Home Depot Incorporated: HD
Honeywell International: HON
Honeywell International Inc.: HON
Intel Corporation: INTC
International Business Machines: IBM
International Paper Company: IP
J.P. Morgan & Company: JPM
J.P. Morgan Chase: JPM
J.P. Morgan Chase & Company: JPM
Johnson & Johnson: JNJ
McDonald's Corporation: MCD
Merck & Company, Inc.: MRK
Merck & Company, Incorporated: MRK
Microsoft Corporation: MSFT
Minnesota Mining & Mfg: MMM
Navistar International Corp.: NAVZ.PK (only on Pink Sheets; delisted from NYSE in 2006)
Pfizer Incorporated: PFE
Philip Morris Companies Inc.: MO (renamed Altria Group in 2003, keeping the ticker; today's PM is the 2008 Philip Morris International spin-off)
Phizer Incorporated: PFE
Primerica Corporation: ??? I have no idea.
Procter & Gamble Company: PG
SBC Communications Incorporated: SBC (delisted after the AT&T merger)
Sears Roebuck & Company: S (S now stands for Sprint)
Texaco Incorporated: TX (now stands for Ternium)
Travelers Group: TRV (now stands for Travelers Companies; unrelated company!)
USX Corporation: X
Union Carbide: UK (delisted)
United Technologies Corporation: UTX
Verizon Communications Inc.: VZ
Wal-Mart Stores Incorporated: WMT
Walt Disney Company: DIS
Westinghouse Electric: WX (now stands for WuXi PharmaTech)
Woolworth: Z (Woolworth traded under Z until becoming Venator Group in 1998)

Next, link the company names to their respective ticker symbols, and download stock quotes for all the ticker/date combinations. In MatLab, this is most conveniently done using this routine by Marcelo Scherer Perlin, which accesses free Yahoo datasets. For the delisted titles, or for intraday data, you'll have to resort to proprietary datasets. Opentick may be a good free alternative, but I haven't gotten around to looking at it more closely.
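For illustration, a rough Python equivalent of what such a routine does under the hood; the URL scheme below is Yahoo's free CSV interface as of this writing, and it may well change or disappear:

    # Fetch daily closes from Yahoo's free CSV interface (the same data
    # source the MatLab routine wraps). URL parameters: a/b/c = start
    # month (zero-based!), day, year; d/e/f = end month, day, year.
    import csv, io, urllib.request

    def yahoo_daily_closes(ticker, startyear, endyear):
        url = ("http://ichart.finance.yahoo.com/table.csv"
               "?s=%s&a=0&b=1&c=%d&d=11&e=31&f=%d&g=d"
               % (ticker, startyear, endyear))
        with urllib.request.urlopen(url) as resp:
            text = resp.read().decode("ascii", "replace")
        closes = {}
        for row in csv.DictReader(io.StringIO(text)):
            closes[row["Date"]] = float(row["Close"])
        return closes

    # e.g. closes = yahoo_daily_closes("MMM", 1990, 2008)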

Finally, you'd have to reconstruct the index from the individual quotes. Here's an explanation of how the DJIA is calculated. You'll notice you need to know historical values of the so-called Dow divisor, which, as far as I know, are impossible to obtain in electronic format with reasonable effort. Fortunately, you can backward-compute them from any single given value by assuming that splits, dividends, and changes in the DJIA composition should have no effect on the index value. This is admittedly somewhat pointless, as historical index data can be readily obtained, but it can serve as a sort of checksum for the individual quotes you have.
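The divisor bookkeeping is simple enough to spell out; a sketch (function names mine):

    # On any day without an event, index = sum(prices) / divisor. Across a
    # split or composition change the index must not jump, so the divisor
    # is rescaled by the ratio of the new price sum to the old one.
    # Starting from a single known divisor you can therefore walk backward
    # (or forward) through your list of events.

    def index_value(prices, divisor):
        return sum(prices) / divisor

    def adjust_divisor(divisor, sum_before, sum_after):
        """New divisor, chosen so sum_after/new == sum_before/old."""
        return divisor * sum_after / sum_before

    # Example: a 2:1 split of a $100 component in a 30-stock price sum of
    # $3000 gives sum_after = 2950, and the index is unchanged by design:
    # 2950 / adjust_divisor(d, 3000.0, 2950.0) == 3000 / d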


Jun 4, 2008

BoCon Reaches 1000

Three cheers for Matthew Skala of Bonobo Conspiracy: BoCon today passed the 1,000-strip mark. Amazingly, Matt has managed to post a strip each and every single day for the last three years, all while working on his PhD in computer science. (He successfully defended a few days ago; nice timing.)

May 25, 2008

Who's the Ezra Gurney in Cowboy Bebop ?

In case you hadn't noticed, the characters in Cowboy Bebop map nicely onto those in Captain Future, though it's not a bijective mapping. (And by mapping I don't mean the characters are similar; I mean they're analogous.)

The Captain <-> Spike Spiegel
Grag, Otho <-> Jet Black
Professor Simon (Ken Scott?) <-> Edward
Ul Quorn <-> Vicious
Joan Randall <-> Faye Valentine
Yiek, Oak <-> Ein

Which leaves out Ezra Gurney, a fairly major character. The best I can come up with is Alfredo ("Punch") from Big Shot. He sports a moustache, he gets quite a lot of screen time, and he fills the crew in on the baddies.

May 16, 2008

Radical Luddism (or maybe not).

Yesterday, thirst-stricken in front of an organic food store, I bought a bottle of Lauretana mountain spring water, which brags, among other things, about being bottled using only natural gravity, without any pressure. If taken as a rejection of pumping technology, developed about two and a half millennia ago, this is pretty radical even by most Luddite standards; if, however, it is intended merely as a criticism of artificial gravity, it is rather conservative.

I'm tempted to take Lauretana's logic one step further and just leave some empty bottles outside to be rained into, which is probably as low-tech as you can possibly go.

May 10, 2008

Web 2.0 Company Name Magnetic Poetry

I just spent three weeks in the Bay Area.

This hot new startup is called ________, which is ________ for ________.

Alternatively, you can also pick basically any word from one of the Dravidian languages.

Or you can just grow a handlebar moustache, wear bell-bottoms and hang a sign around your neck that says Style is timeless.

Apr 6, 2008

A Strategy for Maximization of Global Iron Production employing Universal Artificial Intelligence.

It's Monday, 4 AM, and singularitarianism is asleep. The SL4 archive doesn't show a message for the last 7 days, which I don't believe, since they had an all-time high of 650 messages last month. The AGIRI mailing list archive ends with a "MindFORTH" message by A.T. Murray in February, acceleratingfuture gives a 404, and the SIAI blog has 4 (in words: four) entries so far this year. Meanwhile, Eliezer is blogging on the question of whether lookup tables have consciousness. (Footnote: To me, a static, two-dimensional spatial pattern is a dynamic, one-dimensional spatiotemporal pattern (= Turing machine tape) with the temporal axis rotated into the spatial dimension. So what's the difference?) Nothing much from Peter de Blanc, Nick Hay, Shane Legg, or Michael Wilson, either. (But I like your new WordPress template, Shane.) All this doesn't exactly bolster my hopes for the Friendly AI problem being solved in the near future. Well, there was a message on SL4 last month titled Friendliness SOLVED!, but something kept me from reading it. Maybe it was the boldface, maybe the exclamation mark.

Besides, the website of the publishing company where I'm supposed to submit my manuscript has apparently gone defunct over the weekend, or so it seems after half an hour of re-submitting; and it's still dark outside, and it rains, and I've had my coffee already, so I can't go back to sleep, so I say hey, why not write a bit on Friendliness.

Eliezer once formulated the challenge of bringing AIXI to maximize the number of iron atoms in the universe. (Why iron?) AIXI is an example of a reinforcement-learning-based agent architecture, meaning the agent gets a cookie whenever it behaves in a way we think is fruitful. It's generally impossible to make such agents do something more difficult than coaxing the reinforcer (us) into handing out cookies by whatever means possible - imagine, for illustration, you're on a deserted island with a gorilla and a jar full of cookies. Current reinforcement learners are far too stupid to push us around, but this is not the case for the hypothetical infinitely powerful AIXI. And maximizing the number of iron atoms is probably much more difficult than, say, secretly putting all humans into a VR Matrix where things look as if the number of iron atoms had been maximized. (Or, less elegantly, putting a gun to our heads.) On the other hand, the iron problem is at least an (arbitrarily) specified problem, whereas the more important problem of building a Friendly AI is not even clearly defined. (We don't know what we really want.) So the iron problem can serve as a little finger exercise to warm up for the real challenge.

One way to make a reinforcement learner more controllable is to internalize the reward structure via a goal function. A goal function is a function that takes a description of the world and computes how "similar" it is to an arbitrary "goal" state - basically, just how good a certain world is. Instead of maximizing the number of cookies, the agent tries to maximize the goal function. AIXI could be modified to incorporate such a goal function.
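A toy contrast, to make the distinction concrete (the one-step world model, the numbers, and the "actions" are all invented for illustration):

    # Both agents search one step ahead over the same (made-up) world model;
    # the reward learner maximizes an externally supplied cookie signal,
    # which is hackable, while the goal-function agent scores the predicted
    # world directly.

    def best_action(actions, predict, score):
        return max(actions, key=lambda a: score(predict(a)))

    predict = lambda a: {
        "mine":        {"iron_atoms": 5, "humans_deceived": False},
        "idle":        {"iron_atoms": 0, "humans_deceived": False},
        "fake_matrix": {"iron_atoms": 0, "humans_deceived": True},
    }[a]

    reward = lambda w: 10 if w["humans_deceived"] else w["iron_atoms"]
    goal   = lambda w: w["iron_atoms"]

    actions = ["mine", "idle", "fake_matrix"]
    print(best_action(actions, predict, reward))  # -> fake_matrix
    print(best_action(actions, predict, goal))    # -> mine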

The challenge here, however, is to explicitly define a goal function that says "Maximize the number of iron atoms". To formulate such a function, we might have to define what an iron atom is, and that definition might, in fact, turn out to be flawed, just as many earlier physical concepts have turned out to be flawed. It's like trying to get an agent to extinguish fire in terms of phlogiston. The agent, if smart enough, may decide there is no such thing as phlogiston IRL, and therefore it can't, and shouldn't, do anything about that blazing orphanage over there.

So you cannot straightforwardly write down a few pages of axioms describing a ca. 1870 system of atomist physics and then go on to define the number of iron atoms to be maximized. Neither can you go "all the way" and formulate an axiomatic system based on our contemporary understanding of multi-particle wavefunctions, since a) this will make it very difficult to specify what an "iron atom" is in this axiomatic system - in fact, only slightly less difficult than specifying what a "Rolex" is in terms of iron atoms - and b) our contemporary understanding will, in the long run, turn out to be just as flawed as earlier systems.

This doesn't mean that maximizing the number of iron atoms is impossible, or nonsensical, like computing the last digit of pi. Iron atoms, like porn, do exist, even if we can't give a rock-solid definition. Unfortunately, telling AIXI to maximize "those, you know, little thingies" will not work, since to understand that command, AIXI would not only have to have a good understanding of the human mind, but also a goal function that says: "Do what humans want you to do." Now go ahead and define human and want. There's a hole in my bucket...

Nevertheless, this points us already in the right direction. We again write down our atomistic system of physics, and the goal Maximize the number of iron atoms! , but we quote that. Then we go on and define the following goal function: "maximize the goal function of the agent who would say such a thing (quote), that is, who would give this text and this goal function to an AIXI." Specifying what an agent, a goal function, and AIXI is is not all too difficult. Now, in order to maximize this goal function, AIXI will have to speculate about the goal function of agents believing in atomistic systems of physics, and saying they want to maximize "iron atoms". What makes them tick ? What kind of people are they? What experiments might they have conducted, and what reasoning processes might they have employed to arrive at their worldview? The answer could range from a downfallen civilization of robot creatures who need iron for reproduction to something as outrageous as us humans today. What's common to all these people is their somewhat poorly articulated desire to maximize the number of that little metal thingies.
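In toy form, with all priors, likelihoods, and agent types pulled out of thin air, the construction looks something like this:

    # Candidate explanations for "who would hand an AIXI this text?" each
    # get a prior, a likelihood of producing the quoted instruction, and a
    # utility function; the agent then ranks worlds by expected inferred
    # utility under the resulting posterior.

    candidates = [
        # (prior, P(quoted text | agent type), utility over worlds)
        (0.60, 0.90, lambda w: w["iron_atoms"]),   # means what it says
        (0.35, 0.30, lambda w: w["iron_atoms"]),   # robot civ, same goal
        (0.05, 0.01, lambda w: -w["iron_atoms"]),  # neurotic inverter
    ]

    def expected_inferred_utility(world):
        num = sum(p * lik * u(world) for p, lik, u in candidates)
        den = sum(p * lik for p, lik, _ in candidates)
        return num / den

    worlds = [{"iron_atoms": n} for n in (0, 5, 10)]
    print(max(worlds, key=expected_inferred_utility))  # -> {'iron_atoms': 10}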

Note that this is by no means the only information about the universe the AIXI has access to. Being smarter, and presumably more powerful, than we are, AIXI will quickly discover the "real" laws of physics governing the universe, as well as insights about the nature and plausibility of various agent structures. This general level of world-understanding is absolutely necessary for conducting the above speculation. For example, the text quoted in the goal function could have been produced by people who want to minimize the number of iron atoms in the universe, but are so neurotic they always ask for the opposite of what they really want. That this is not impossible, but relatively implausible compared with the more straightforward interpretation, can only be seen with some level of insight into the general way the world works.

My current best shot at making AIXI generally Friendly goes vaguely in the same direction. Instead of an atomistic system, one could imagine using the totality of human cultural artefacts (starting with the internet?) and instructing AIXI to reason about the motivations of the agents who created such things. ("First result: they crave pr0n." OK, start with something other than the internet.) One of the open questions here is whether we want AIXI to care about hypothetical creators of those artefacts (subjunctive humans) too, or just the very people who actually created that stuff. My current guess is the former.

Mar 18, 2008

Die Kunst des Verhörens (The Art of Mishearing)


A friend once remarked that the French speak French, and in what a French kind of way they do it! I guess he'd say something similar about the English, as would most Austrians. Consequently, the art of mishearing foreign words is widely practiced, and not constrained to song lyrics. (Know what Austrians mean when they speak of golden-red rivers? Think "woof".)
So today I was asking a girl at the newsstand whether they had the Economist. She didn't know, tried to ask her coworker and, well, you can guess the rest... I had to pretend to fall into a coughing fit and thanked them with a wave.

Mar 9, 2008

ExaFLOPS in 2012

Sandia and Oak Ridge recently received a $7.4M grant to "conduct the basic research required to create a computer capable of performing a million trillion calculations per second, otherwise known as an exaflop" (link).
"In this amazing and expanding universe !" I'm tempted to add to that millions trillions, but what I'm even more tempted to do is a back-of-the-envelope calculation of a folding@home-style distributed computing project using 8th-generation gaming consoles ("PS4s").
For a nicely parallel algorithm, you can currently milk around 67 GFLOPS from a PS3 under Linux with minimal contortion. If you could access the RSX GPU (which is locked under Linux), that figure would probably increase about fourfold.
Historically, peak console CPU+GPU computing power increased roughly 60-fold in the 4.3 years between the release of the PS1 and the PS2, and a further roughly 100-fold (the exact architecture of the RSX is unknown) in the 7.7 years to the release of the PS3. That combines to an average doubling time for peak performance of a little less than a year, somewhat faster than the 18-month doubling time for real performance commonly associated with Moore's law (which, strictly speaking, is about transistor counts per die).
There is currently some speculation about the next generation of consoles being released a few years earlier than the 6-year cycle we've seen so far. Let's just pull a release date of mid-2011 out of thin air; "Moore's law" then points to a tenfold increase in real computing power, which looks flimsy compared to the above figures. So if we extrapolate the past trend for peak power, and assume we can use the new architecture as efficiently as the current one, we get a more handsome 40-fold increase, which translates to roughly 10 TFLOPS per console.
So you would need 100,000 consoles running simultaneously to break the exaFLOPS barrier. That figure is somewhat smaller than the total number of folding@home clients installed as of 2008, but larger than the number of PS3 clients for that project. And this figure assumes the client is running 100% of the time, which for a gaming console is unlikely to be true. (Running a 150W console 24/7 costs you about $80 in electricity per year, depending on where you live; other factors are noise, and computing resources used for things like gaming.) But if an organization can find a cool project and has the necessary PR skills, it should be possible to lay hands on that many clients within one or two years after hardware release. All in all, this makes it look possible to do computations at more than one exaFLOPS before the end of 2012, six years earlier than the 2018 horizon for a Sandia / Oak Ridge mainframe.
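For the record, here's the envelope the above was computed on (same rough figures as in the text):

    from math import log

    # ~60x in 4.3 years (PS1 -> PS2), ~100x in 7.7 years (PS2 -> PS3):
    doubling = (4.3 + 7.7) / log(60.0 * 100.0, 2)
    print("avg doubling time: %.2f years" % doubling)      # ~0.96

    # ~40-fold peak-power increase over 67 GFLOPS, plus the factor-4 GPU
    # unlock, gives roughly 10 TFLOPS per console:
    per_console_gflops = 67.0 * 4 * 40
    print("per console: %.1f TFLOPS" % (per_console_gflops / 1000))

    # Consoles needed for 1 exaFLOPS (= 1e9 GFLOPS):
    print("consoles: %.0f" % (1e9 / per_console_gflops))   # ~93,000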

Feb 29, 2008

OK, let's please all agree that's a hoax.


According to the Telegraph, "The director of a Norwegian museum claimed yesterday to have discovered cartoons drawn by Adolf Hitler during the Second World War."

While the stereotype of the freakish doujinshi artist is well established, Hitler is admittedly something of an extreme case, well known for his obsession with high fantasy, his spending weeks at a time in his basement, his unhealthy interest in his underage niece, and his frequent use of hate speech.

But of course, for anyone who remembers the Schtonk affair, this triggers all hoax alarms. And looking at that drawing of Disney's Pinocchio, I really do wish this were a hoax. Otherwise the mental associations with that ruthless, all-consuming machinery of mass manipulation will forever soil for me the picture I have of that cute, innocent little guy Hitler.

Feb 4, 2008

Hibernation



We humans don't hibernate (beginning statements with "we humans" rocks, try it), but maybe we have, like some other non-hibernating mammals, some rudimentary remnants of hibernation in our body plan. I, for my part, cannot ignore the fact that every January I sleep ten hours a day, gain weight, feel stingy, and procrastinate with all my might. Two months and not a single posting. Time to get out of the pyjamas.

The picture, by the way, is the chapel around the corner from my mother's house in Altmuenster, Sound-of-Music land, taken in late December.

Nov 23, 2007

The Trains To Orbit Leave From Platform 5



Not a road to the stars, but a rail track to orbit. "The restaurant car is on the top of the train." How far? Well, like riding the Transsib there and back (туда и обратно), twice in a row. But the view! The view! Sure, it's expensive to build, but that's a cherished tradition. In Austria, too. Equator? No, no, Linz is +48° 17' 43", but that's no problem, believe me. The Van Allen belt? I'm more afraid of railroad strikes. And once we're in GEO, you can take the connecting train to the southern hemisphere from platform 3, and go down again over Brazil. Transatlantic bridge!


(Artwork Ticket to the moon by Christoph Steinbrener & Rainer Dempf)

A Simple Heuristic Explanation of Solomonoff Induction

Solomonoff induction, named after its inventor Ray Solomonoff, is a mathematical method that describes how to take some set of observations and produce an educated guess about the possible processes underlying those observations. You could then use that guess, among other things, to make a prediction about your next observation. The great thing is that this guess would, in a very general way, be better than any other guess anyone could make using just the same observations. Solomonoff induction is a simple and useful, yet widely misunderstood idea. Here I'm trying to give a very short, heuristic explanation of the basics. You can also try out this slightly more challenging explanation. A follow-up post to this one will deal with possible applications of Solomonoff's results.

Something simple beforehand: a string of 5 bits can take 2 x 2 x 2 x 2 x 2 = 2^5 = 32 configurations. A string of n bits can take 2^n configurations. A string of n+1 bits can take 2^(n+1) configurations - that's an additional factor of two over 2^n. So if we have one extra bit of length, we can make twice as many different strings.

Imagine the following task: You are sitting in a lab in front of a computer screen. The computer is running a simple program, but you have no idea what the program looks like. The program is printing output after output on the screen. After a while, you should give an educated guess about the unknown program's next output.

To make things a little easier, the experimenter tells you the program is at most 1 million bits in length. This is not part of Solomonoff induction originally, but accept it for the moment.

"Well", you say, "I could look through all possible programs shorter than million bits, see if they could produce the output I've seen, and throw away all programs that don't. That is, programs that output something different, or get trapped in infinite loops, or crash, or don't even compile. Because, it is an old maxim of mine that when you have excluded the impossible, whatever remains, however improbable, must be the truth. Then I'll let this "truth" run on my own computer, and use the output to predict the next output on the screen.

That's a good idea, but what if you end up with more than one program in the end? Well, there's no reason to think the experimenter tried to make things particularly easy or complicated, and you're not a big fan of medieval philosophy either, so you decide to split the bets evenly between all the remaining programs. If you ended up with 5 candidate programs, you'd say each one has a 1/5 probability of being the right program.

But it may take a while to sift through all the 2^1,000,000 possible programs. So the experimenter has mercy and gives you two sheets of paper containing the printouts of two programs that do in fact produce the output you've seen. One is 1,999 bits long (it's titled SHORT), the other 2,000 bits (it's titled LONG). The experimenter also tells you that the two programs embody the only two simple approaches to produce the data you've seen; any other approach would be waaayyy more complicated. SHORT will output Cat next, LONG will output Fish.

You're about to say "there's a 50% chance LONG is the right program, and a 50% ..." but then you hesitate. Because you've just found a simple way to create more programs that use the same approaches as LONG and SHORT: just insert some comments into LONG and SHORT. The comments don't even have to be witty; nonsense will do just fine. The output will be the same, and as long as the program stays shorter than 1,000,000 bits, it'll be OK. These will be valid programs, and although they do the same things that LONG and SHORT do, they must be counted as individuals.

You realize you can make a lot of variations of LONG and SHORT this way. With LONG, you have 1,000,000 - 2,000 = 998,000 bits remaining for commentary. With SHORT, you have 1,000,000 - 1,999 = 998,001 bits - that's one extra bit. If you can make a gazillion comments on LONG, this one extra bit allows you to make two gazillion comments on SHORT, twice as many.

So within all possible programs of less than 1,000,000 bits of length, there are twice as many variants of SHORT as there are of LONG. Consequently, you decide to say: "I'll bet 2:1 that the program inside the computer is behaving like the program SHORT, and not like the program LONG. So it's 2:1 for Cat against Fish."

OK, that's it, in principle. Be aware that the length limit of 1,000,000 bits is imposed only for didactic reasons. The 2:1 ratio would be unchanged if we increased the limit to a trillion bits - there's still the extra bit available in SHORT, and we can make twice as many comments. So let's ditch the limit altogether. Let's just say being one bit shorter makes a program twice as likely.

Be also aware that we showed no preference for short programs in the beginning. We had no idea whether to expect short or long programs, so for simplicity we decided to split the probability evenly between all programs, irrespective of length. We just put our bets on SHORT in the end because there are more variations of SHORT than there are of LONG.

To rephrase it: if we had sampled random programs of less than 1,000,000 bits, and at the end of the day had ended up with twice as many programs outputting Cat as programs outputting Fish, we'd probably put our bets on Cat being the next output. But what we did was find a very short Cat program and a slightly longer Fish program. From this we were able to deduce that there must be more Cat programs out there than Fish programs, because the shorter Cat program leaves more room for crazy comments without hitting the length limit (no matter how big the length limit really is).

So you see, the basic idea is really simple: not having the slightest idea beforehand what program to expect means assigning equal probability to all candidate programs. And finding a short program means deducing that there are more variations of the short program than of any longer program - a factor of two for every extra bit - so there are more variations of the short program in our set of candidate programs, and so we'll put higher bets on the short program (again, a factor of two for every extra bit).
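If you like, you can check the counting argument by brute force at toy scale; real programs aren't just prefix-plus-garbage, but the ratio is the point:

    # With a length limit N, a program whose first k bits are fixed and
    # whose remaining bits are free "comment" has 2**(N - k) variants, so
    # each bit saved doubles the count.
    from itertools import product

    N = 12                 # toy length limit
    short_prefix = "0101"  # stands in for SHORT: k = 4 essential bits
    long_prefix = "01011"  # stands in for LONG:  k = 5 essential bits

    def variants(prefix, n):
        return sum(1 for bits in product("01", repeat=n)
                   if "".join(bits).startswith(prefix))

    s, l = variants(short_prefix, N), variants(long_prefix, N)
    print(s, l, s / l)     # 256 128 2.0 -> bet 2:1 on SHORT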

Nov 19, 2007

Wheelchaired Robot Girl Totally Un-Moemoe IRL.


Canadian robot enthusiast Le Trung's creation Aiko, the "world's first sexually harassed, disabled Fembot" (Engadget), once again vividly demonstrates the Grand-Canyon-like dimensions of the uncanny valley. Watch the video here. Some comments by various posters:


- "Her right hook punch looks promising." - "I, for one, welcome our wheelchair-bound, face-slapping female android overlords." - "Wow. She speaks perfect Engrish." - "OK, so I'm going to finish that underground bunker after all."

I have to admit that this makes me seriously reconsider my own robot-girlfriend project.

Hey, I'm joking.

Honestly.

And if - I'm saying if - I ever were to hypothetically build a robot girl in my basement, I surely wouldn't sink as low as to cannibalize an Oriental Industries Candy Girl, as Le Trung apparently seems to have done. (A Nana, if you ask me; notice the slightly more protruding chin in Aiko resulting from the added motorization, which is in fact difficult to do without... OK, forget what I just said.)

To make today's cup of weirdness full, I found there is also a Candy Girl available that looks bizarrely like the often-spaced-out Osaka-san* from Azumanga Daioh, once again nicely illustrating MIT professor Max Tegmark's cosmological theory of radical Platonism, which states that every logically possible entity does in fact exist somewhere in the Universe, most likely in Japan.


( * = It's the 未来 ("Mirai"); I will not post a link. It's deplorable enough already that my blog is linking to Oriental Industry's main page. Look her up for yourself, if you think you're brave enough. )

Nov 9, 2007

....with science!

Comic artist Aaron Diaz of Dresden Codak fame has decided to quit his day job and work full-time on his webcomic. It's nice artwork, it's high-brow, it's fun, and it's got characters you wish you could meet in real life. And DC seems to really understand the hardships of being a Singularitarian. Let's support him through purchasing stuff and through donations! (It might even get you a reserved place in secular heaven.)

Nov 8, 2007

Science Has No Use For Ockham's Razor

entia non sunt multiplicanda praeter necessitatem.
"Please keep things simple."

(William of Ockham)
(Bertrand Russell)

manus non sunt ventilandae praeter necessitatem.
"Please keep the handwaving down."
(Me)

You know, I've had it with Ockham's razor.

My work in machine learning more or less orbits the Solomonoff - Chaitin - Kolmogorov - Hutter - Boulton - Wallace galaxy. This simply means I'm assuming that the data I'm analyzing is the output of a computational process - any computational process. I have no idea whatsoever as to the source code of this process, so I'm trying to assign equal a priori probability to all programs. Now suppose I stumble over two short programs which in fact do output my data. Both programs are 1000 bits long. Let's say the first one is a neural net, and the other's a support vector machine.
Now assume, after playing around with my first program, I find out that only the first 50 bits are in fact important for producing the output. The rest is just random garbage. I could in fact try out all combinations of those remaining 950 bits and get 2^950 different neural nets that all output my data. Now I try the same thing with program two. Here, only the first 49 bits matter, and I can create 2^951 variations of support vector machines - twice as many as in the case of program one. Since I try to assign equal a priori probability to all programs, and possible support vector machines outnumber possible neural nets two to one, I'd bet two to one for the support vector machine and against the neural net.
Note that the "1000 bits" do not figure into the result, I could just have well have chosen 10.000 bits, or 10 Gazillion bits. Also, if the first program had been 723 bits instead of 1000, I could have just padded it with 277 extra garbage bits to make it as long as the second. The argument stays the same. We're cutting a few corners here, but the basic idea is that, when you have to assign probabilities to various models, you calculate the number of bits absolutely necessary to produce your models, and penalize all models but the shortest by a relative factor of 0.5 for every bit of extra length. Let me repeat it, this is just a consequence of assuming the true process that's creating your data (the "world") is a program, any program, and before having seen the data, you have no idea whatsoever which program. Simple, isn't it ?

Welcome to the world of Solomonoff induction.

The attentive reader might have noticed the complete absence of any reference to Ockham in the above explanation. What Ockham himself really intended to say is not entirely clear, nor is it actually too clear what people today mean when they invoke his name. To repeat it once again: the reason we penalize long models, or theories, in Solomonoff induction is that we don't know a priori which program created our observation. It's not that we have anything against long models, or that we said, hey, remember Ockham! Sure, what we've ended up with seems to go along somewhat with Ockham's razor, but we notice this only after we've got our results. So if anything, you could try to say Solomonoff induction explains why Ockham's razor works, and not the other way round. But don't, for it doesn't.
To illustrate this, think of the two hypotheses "Afro-Americans get comparatively few PhDs because of [a complicated interplay of socioeconomic factors]" and "Afro-Americans get comparatively few PhDs because they don't have the intelligence gene X." Shooting from the hip, people would say the second hypothesis is simpler. Is it?
How the hell should I know!! Imagine, just for a moment, trying to translate those two verbal statements into computer programs which produce the data in question - the data in question being human academic achievement. PhD theses. Social interactions. Application interviews. Then imagine what has to be included in the programs' source code: human genetics, human brain structure, social dynamics, macroeconomic systems... We're talking at least gigabits of data here. Trying to estimate the length of such huge programs down to a few bits is like doing a bit of lattice quantum chromodynamics in your head in order to estimate the proton mass. Humans simply can't do this. If you can, give me a call. I have a job for you.
So the connection between the rigorous theory that is Solomonoff induction and the intuitive insight that is Ockham's razor is tentative at best. OK, nonexistent. The same goes for machine learning theories like minimum message length (MML), minimum description length (MDL), or the Akaike information criterion (AIC), which can all be shown to be approximations of Solomonoff induction.
Then why do so many people, even those working in the very field, handwavingly invoke Ockham as the forefather of their discipline?

Ockham's Razor has long been known as a philosophical paradigm, and in recent times, has become an invaluable tool of the machine learning community. (link)

Algorithmic probability [comment: the theory behind Solomonoff induction] rests upon two philosophical principles...[]...The second philosophical foundation is the principle of Occam's razor. (link).

Or this, that, and many more examples?

Let me make it clear that I really respect the authors quoted above as scientists (the author of the second quote contributed fundamentally to the field of algorithmic probability theory himself!). But really, I cannot imagine any other reasons for summoning Ockham in this context than the desire to look humanistic, or philosophical, the desire to make students nod in "comprehension", or, I'm sorry, a bit of muddled thinking.

OK, so let's make it clear once more:
  • Ockham's razor is an intuitive philosophical insight.
  • Ockham's razor is NOT the underlying principle of Solomonoff induction. It may have been an inspiration to Solomonoff, but so may have been, say, talking to Marvin Minsky. Note also the complete absence of the name "Ockham" (or Occam) in this talk.
  • MML, MDL, AIC, MAP, and even least-squares approaches to theory formation can all be derived from Solomonoff induction. Logically, Ockham's razor is NOT the underlying principle of any of these theories.
  • Solomonoff induction is NOT a "formalization" of Ockham's razor. Solomonoff induction does NOT prove that Ockham's razor is useful.
  • Ockham's razor is NOT an empirical observation. It's a maxim, a rule of thumb, a heuristic. Its usefulness can in fact be debated, since it's a rubber-band rule, i.e. you can stretch it into various sizes and shapes. Your intuitive notion of simplicity may not be the same as mine. In the end, we're back to gut feeling.
  • Ockham's razor is intended for use by human beings. You cannot really translate it into a rigorous mathematical statement. In particular, Solomonoff induction is not a "version" of Ockham's razor.
  • MDL, MML, MAP, and AIC are valid mathematical approaches to scientific data analysis. A scientist should not defend the use of these methods by invoking Ockham's razor. And if a scientist invokes Ockham's razor in a non-mathematical situation, be aware that he's essentially talking about his gut feeling.

Nov 6, 2007

Autumn Foliage in Austria (オーストリアの紅葉)


This photo shows the beech trees of the St. Florian monastery. Incidentally, the composer Anton Bruckner was born near this place. Greetings to my Japanese readers!



Oct 26, 2007

The Absence Of Monologue

How do I tell them that because of the unfreezing process I have no inner monologue? I hope I didn't just say that out loud...
Austin Powers

I still don't know why humans have an internal monologue. By that I mean I seriously wonder about its evolutionary motivation, and I could not imagine why I would include one in an AI design.

But of course, I'm somewhat biased from personal experience, here.

The truth is, until the age of 12 or 13, I myself simply had no inner monologue at all. Unlike the International Man of Mystery, however, I was not talking out loud. I was simply thinking completely non-verbally all the time. This seems to have had no negative net effect on my cognitive abilities at all; I was a bit smarter than most kids my age. (Well, maybe a bit smarter than most smart kids, too...) Social cognition and speech production, two likely first casualties, weren't impaired either. Personally, I was not even aware of this anomaly in the first place, since I did not know how noisy other minds were. When I was exposed to inner monologue in film or literature, I interpreted it as a stylistic device, like the "sweat drop" in manga. In the same spirit, I interpreted a sentence like "The pig thought: 'I should be going home'" as the pig thinking that it should go home, but not literally subvocalizing "I should be going home".

Curiously, I absolutely cannot remember how or when I started to develop an inner monologue. Neither can I remember being aware of any change for months or maybe even years. But I have biographical memories from when I was about 13 in which I was reflecting on the apparent change.

Bottom line: "thought = language" = BS.