Defining the Question

Different researchers have different ways of arriving at the questions that drive their work, especially the big defining themes that organize the many little projects. For AI researchers, a popular approach is to look at various aspects of intelligence, as exhibited in the natural world, and to ask how one might explain, understand or recreate parts of it. What follows is one installment of an ongoing attempt to sketch my own, biased and necessarily incomplete, view on this.

For me, the need for ‘intelligence’ is closely tied to the need for clever variation when dealing with a hostile, arbitrarily complicated and infinite world. Autonomous agents (animals, people, robots, trading or bidding agents) must exist in a world that is constantly changing, in ways that are not easy to enumerate, and in a closed-loop setting where one cannot separate the acquisition of experience, learning and decision making.

What does this mean in practice? I like to think of this in terms of a game, played out continually over the entire lifetime of an agent. An agent is born into an unknown world (endowed with varying levels of preliminary skill – ‘level zero’, i.e., no skill being a perfectly valid option) and finds itself able to play an arbitrary and infinite set of games – some are easy, some are hard, some are downright lethal in high doses. Any instance of a game must be solved using the agent’s bounded resources in a suitably game-dependent finite interval (after which, the world being non-stationary, the details of the game would have changed anyway). The only way for the agent to play the more complex and ‘dangerous’ games is to reuse its knowledge from simpler games. However, information about this setup is only available in an experience-based sense. The agent needs algorithmic procedures for figuring out how to navigate this world and incrementally learn to become more skilled.
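The game-playing setup above can be caricatured in a few lines of code. This is only a toy sketch under my own assumptions (the `Agent` class, the `attempt` method and the game records are hypothetical illustrations, not an existing framework): skill is modelled crudely as the set of game types already mastered, and a harder game is winnable only once its simpler prerequisites have been solved – the reuse-of-knowledge constraint described above.

```python
class Agent:
    """A toy lifelong learner. Its 'skill' is simply the set of game
    types it has mastered so far; a game becomes solvable only once
    all the simpler games it builds on have been solved."""

    def __init__(self):
        self.skills = set()  # knowledge carried over between games

    def attempt(self, game):
        # The agent can win only if it already holds every
        # prerequisite skill the game depends on.
        if game["requires"] <= self.skills:
            self.skills.add(game["name"])
            return True
        return False


# An (in principle open-ended) stream of games, ordered from easy to
# hard; the later ones are only reachable via the earlier ones.
games = [
    {"name": "stack",  "requires": set()},
    {"name": "bridge", "requires": {"stack"}},
    {"name": "tent",   "requires": {"stack", "bridge"}},
]

agent = Agent()
results = [agent.attempt(g) for g in games]
```

Note that presenting ‘tent’ directly to a fresh agent fails: only an agent that has worked incrementally through the simpler games can solve it, which is precisely the incremental character the question above asks learning algorithms to capture.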

All this sounds terribly abstract but I claim that it happens all the time. A favorite example I often quote to my students is the CO2 filter scene from the movie Apollo 13:

Here is another very relevant one – if you are impatient, skip to the final 15 seconds of the clip.

Yes, these are unusual scenarios in the sense that normal people may not be formally called upon to solve problems in this dramatic way. However, people do rise to such challenges (see below). Also, yes, this specific problem may be posed as an instance of state space search, which we all know, after chess, is an incomplete account of intelligent behavior. However, that is only true if I literally put a pile of stuff on the table and say, find me a solution. If, instead, I ask more open-ended questions with potentially infinite solutions, I am no longer just beating the familiar, possibly lifeless, horse.

So, notwithstanding some potential criticisms, the question remains – what kind of learning algorithms do we need that can incrementally learn from a lifetime of such encounters to one day solve problems in genuinely creative and original ways? Such a learning algorithm must surely be one of the more important parts of the final solution to the AI question.

To support my claim that such scenarios are not all that unusual, I offer the following pictures of my daughter as I found her this morning when I woke up:

Apparently, she had come to the conclusion that she and her teddy bear were in need of a tent. We don’t actually have one easily available in our living room, so she set about making one. She put some of her chairs together, connected them with a towel for a roof, lined up some cushions for a wall, and even set up two other cushions as doors through which they could crawl in and out.

Admittedly, this is not rocket science. Lots of kids, and most adults, can do this with ‘limited’ cognitive effort. However, there is no denying the fact that she did something involving some open-ended ‘reasoning’ about an arbitrary, large collection of objects and what they could potentially do – outside of the normal circumstances in which they are often found. Anyone who has played ‘pretend’ games with little kids knows how all this works. I claim that these behaviors are the ‘drosophila and yeast’ of our area under discussion, and getting agents to do this well is a very worthwhile and nontrivial exercise.

At some level, these are not entirely original thoughts. This is the developmental approach to AI and many researchers work in this area. However, as yet, there are no generic techniques for learning like this in general domains. A lot of very good work is done by replicating early infancy (right down to rattles and brightly colored toys) and trying to get algorithms to do what cognitive psychologists tell us kids are able to do. That is excellent. But I am sceptical of being able to directly scale up from this to the Apollo 13 scenarios entirely on the shoulders of the behavioral theories. At the same time, it is worth noting that very little of the current thrust of development in either statistical machine learning or logic-based learning is directly focused on addressing this issue. So, I would very much like to see (hopefully, develop myself) an algorithmically principled approach to this type of learning. This is my big theme – from which many of the little questions for my individual papers are derived (and, in broad terms, my chosen application domains emphasize the importance of this kind of learning and thinking).


What does Polymath tell us about problem solving?

Gowers and Nielsen have written a nice opinion piece (Nature 461, 879-881, 15 October 2009) on The Polymath Project, an open-source and collaborative attempt at solving an unsolved math problem – to find a new proof of a result in ergodic theory called the density Hales-Jewett theorem using only ‘elementary’ building blocks. The protocol for the collaboration was that each participant (who could be anyone – from beginning student to Fields medalist – with an interest in the topic, from anywhere in the world) could post one nugget of an idea at a time, to the weblog. There was no requirement regarding the individual contributions. Indeed, it seems like some of the posts were just comments clarifying small points. In this first trial, it took a group of 27 people 37 days and approx. 800 serious comments to get the desired result.

The authors ask, who would have guessed that the working record of a mathematical project would read like a thriller? But, of course, both of them know very well (as does every serious scientist) that this is exactly the nature of research – this is why we do what we do.

To me, there were two important messages to take away:

(a) The protocol begins to demystify the process of creativity. I have always been turned off by the macho posturing by researchers who deny that big ideas are, in the end, just clever compositions of carefully chosen smaller ones. Instead, this project strongly suggests that the collaborative effect of multiple incomplete but properly diverse viewpoints is what it takes, much like Minsky’s Society of Mind.

(b) As the article notes,

“Although DHJ Polymath was large compared with most mathematical collaborations, it fell short of being the mass collaboration initially envisaged. Those involved agreed that scaling up much further would require changes to the process. A significant barrier to entry was the linear narrative style of the blog. This made it difficult for late entrants to identify problems to which their talents could be applied.”

With my AI researcher hat on (i.e., as someone who looks at this project as inspiration for the design of corresponding ‘intelligent’ computational systems), I find that this is exactly the challenge that an autonomous agent must come to terms with when trying to learn useful skills in a lifelong sense. It is not that hard to devise learning procedures that can do the equivalent of making nuggets of suggestions. The harder problem is to learn context and measures of appropriateness for the individual components. However, if we begin to understand how people really do this, it shouldn’t be impossible to get machines to follow (although, as someone used to ask me in response to such bold statements, ‘…famous last words?!’).

Simon on Discovery

“For a variety of reasons, perhaps best understood by psychoanalysis, when we talk or write about scientific discovery, we tend to dwell lovingly on great events – Galileo and uniform acceleration, Newton and universal gravitation, Einstein and relativity. We insist that a theory of discovery postulate processes sufficiently powerful to produce these events. It is right to so insist, but we must not forget how rare such events are, and we must not postulate processes so powerful that they predict discovery of first magnitude as a daily matter.

On the contrary, for each such event there is an investment of thousands of man-years of investigation by hundreds of talented and hard-working scientists. This particular slot machine produces many stiff arms for every jackpot. At the same time that we explain how Schrödinger, in 1926, came to quantum mechanics, we must explain why Planck, Bohr, Einstein, de Broglie, and other men of comparable ability struggled for the preceding twenty years without completing this discovery. Scientific discovery is a rare event; a theory to explain it must predict innumerable failures for every success.”

– Herbert A. Simon

Scientific Discovery and the Psychology of Problem Solving, in Models of Discovery, 1977.

What does Wolfram Alpha do?

I heard about Wolfram Alpha through a variety of news articles such as this one (from a couple of months back). It is still a bit unclear to me precisely what this product does. What I have managed to figure out so far is that this is an engine that accepts queries in some subset of natural language and tries to perform a certain level of ‘understanding’ in order to answer the queries. The news articles suggest similarities with Powerset and other start-ups I had heard about from some of my colleagues who considered taking jobs in this area. However, reading between the lines, Wolfram seems to promise something more in the direction of calculations. My current best guess is a nice interface to the core technology of something like Mathematica so that the net effect is an “intelligent computation engine”.

I have requested access through the Wolfram web site. Let us see if I get to play with it and, if I do, what the details look like…

Why do simple techniques work?

My past few posts have been driven by an underlying question that was pointedly raised by someone in a discussion group I follow on LinkedIn (if you’re curious, it is a Quant Finance group that I follow due to my interest in autonomous agent design, and the question was posed by a hedge fund person with a Caltech PhD and a Wharton MBA):

I read Ernest Chan’s book on quantitative trading. He said that he tried a lot of complicated advanced quantitative tools, and it turns out that he kept on losing money. He eventually found that the simplest things often generated the best returns. From your experience, what do you think about the value of advanced econometric or statistical tools in developing quantitative strategies? Are these advanced tools (say, wavelet analysis, frequency domain analysis, state space models, stochastic volatility, GMM, GARCH and its variations, advanced time series modeling and so on) more like alchemy in scientific camouflage, or do they really have some value? Stochastic differential equations might have some value in trading vol. But I am talking about quantitative trading of futures, equities and currencies here. Now, technical indicators, Kalman filters, cointegration, regression, PCA or factor analysis have been proven to be valuable in quantitative trading. I am not so sure about anything beyond these simple techniques.

This is not just a question about trading. The exact same question comes up over and over in the domain of robotics and I have tried to address it in my published work.

My take on this issue is that before one invokes a sophisticated inference algorithm, one has to have a sensible way to describe the essence of the problem – you can only learn what you can succinctly describe and represent! All too often, when advanced methods do not work, it is because they are being used with very little understanding of what makes the problem hard. Often, there is a fundamental disconnect in that the only people who truly understand the sophisticated tools are tool developers, who are more interested in applying their favourite tool(s) to any given problem than in really understanding a problem and asking what is the simplest tool for it. Moreover, how many people out there have a genuine feel for Hilbert spaces and infinite-dimensional estimation while also having the practical skills to solve problems in constrained ‘real world’ settings? Anyone who has this rare combination would be ideally placed to solve the complex problems we are all interested in, whether using simple methods or more sophisticated ones (i.e., it is not just about tools but about knowing when to use what, and why). But such people are rare indeed.

On bottom-up/top-down development of concepts

For many years now, beginning with some questions that were part of my doctoral dissertation research, I have been curious about multi-level models that describe phenomena and strategies. A fundamental question that arises in this setting is regarding which direction (top-down/bottom-up) takes primacy.

A particular sense in which this directly touches upon my work is in the ability of unsupervised and semi-supervised learning methods to model “everything of interest” in a complex domain (e.g., robotics) so that any detailed analysis of the domain is rendered unnecessary. A claim that is often made is that the entire hierarchy will just emerge from the bottom up. My own experience with difficult problems such as synthesizing complex humanoid robot behaviours makes me sceptical of the breadth of this claim. I find that, often, the easily available observables do not suffice and one needs to work hard to get the true description. However, I am equally sceptical of the chauvinistic view that the only way to solve problems is to model everything in the domain and dream up a clever strategy, and of the defeatist view that the only way to solve the problem is to look at pre-existing solutions somewhere else and copy them. Instead, in my own work, I have searched for a middle ground where one seeks general principles on both ends of the spectrum and tries to tie them together efficiently.

Recently, while searching Google Scholar for some technical papers on multi-level control and learning, I came across an interesting philosophical paper (R.C. Bishop and H. Atmanspacher, Contextual emergence in the description of properties, Foundations of Physics 36(12):1753-1777, 2006) that makes the case that extracting deep organizational principles for a higher level from a purely bottom-up approach is, in a certain sense, a fundamentally ill-posed problem. Even in “basic” areas like theoretical physics one needs more context. Yet, all is not lost. What this really means is that there are some top-down contextual constraints (much weaker than arbitrary rigid prescriptions) that are necessary to make the two levels mesh together. You will probably have to at least skim the paper to get a better idea, but I think this speaks to the same issue I raise above and says something quite insightful.