Decision Making when there are Unknown Unknowns

Unknown unknowns are everywhere, a bit like dark matter in the universe. Yet everything we seem to do in terms of algorithms for learning and inference either assumes a simplified setting that is closed in terms of the hypothesis space (hence not even allowing for these unknown unknowns), or depends on our being able to set up priors so generally expressive that computation is far from tractable. How do real people bridge the gap? We don’t know, of course, but we have started to take a stab at this from a different direction. With my colleague, Alex Lascarides, and my former student, Benji Rosman, we have been looking into this issue in a specific setting – that of asking how an agent incrementally grows its model to reach the level of knowledge of a more experienced teacher, while dealing with a world that requires our agent to expand its hypothesis space during the process of learning and inference.

This is very much ongoing work, of the kind wherein we have an idea of where we might like to end up (a lighthouse on the horizon) but only a very limited idea of the way there, or of the nature of the rocky shores we’ll need to navigate to get there. For local Informatics folks, a status report on the current state of this work will be an upcoming talk as part of the DReaM series (10th March, 11:30 am in room IF 2.33) – abstract below.

—————————————————————————————-

Decision Making when there are Unknown Unknowns

Alex Lascarides

Joint work with Ram Ramamoorthy and Benji Rosman

Existing approaches to learning how to solve a decision problem all
assume that the hypothesis space is known in advance of the learning
process.  That is, the agent knows all possible states, all possible
actions, and also has complete knowledge of his or her own intrinsic
preferences (typically represented as a function from the set of
possible states to numeric reward).  In most cases, the models for
learning how to behave optimally also assume that the probabilistic
dependencies among the factors that influence behaviour are known as
well.

But there are many decision problems where these high informational
demands on learning aren’t met.  An agent may have to act in the
domain without knowing all possible states or actions, or with only
partial and uncertain information about his or her own preferences.
And yet if one changes the random variables one uses to represent a
decision problem, or one changes the reward function, then this is
viewed as a different and unrelated decision problem.  Intuitively,
one needs a logic of change to one’s decision problem, where change is
informed by evidence.

I will present here some relatively half-baked ideas about how to
learn optimal behaviour when the agent starts out with incomplete and
uncertain information about the hypothesis space: that is, the agent
knows there are ‘unknown unknowns’.  The model is one where the agent
adapts the representation of the decision problem, and so revises
calculations of optimal behaviour, by drawing on two sources of
evidence: their own exploration of the domain by repeatedly performing
actions and observing their consequences and rewards; and dialogues
with an oracle who knows the true representation of the decision
problem.

Our hypothesis is that an agent that abides by certain defeasible
principles for adapting the representation of the decision problem to
the evidence learns to converge on optimal behaviour faster than an
agent who ignores evidence that his current representation entails the
wrong hypothesis space or intrinsic rewards, or an agent who adapts
the representation of the decision problem in a way that does not make
the defeasible assumptions we’ll argue for here.
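To make the flavour of this concrete, here is a minimal sketch (my own, not the model from the talk) of an agent that grows its hypothesis space as evidence demands and can consult an oracle; the environment and oracle interfaces (e.g., an assumed actions_in method) are placeholder assumptions, not anything from the actual framework.

```python
import random

class ExpandingAgent:
    """Toy Q-learning agent whose state/action sets start incomplete and grow.

    Hypothetical sketch only: 'oracle' is assumed to expose actions_in(state),
    returning the set of actions available in a state. Nothing here is the
    actual model discussed in the talk.
    """

    def __init__(self, known_states, known_actions, oracle=None):
        self.states = set(known_states)    # believed state space
        self.actions = set(known_actions)  # believed action space
        self.q = {}                        # Q-values over the *current* representation
        self.oracle = oracle

    def _expand(self, state):
        """Defeasibly grow the representation when evidence contradicts it."""
        if state not in self.states:
            self.states.add(state)         # an unknown unknown just became known
            if self.oracle is not None:
                self.actions |= self.oracle.actions_in(state)

    def act(self, state, epsilon=0.1):
        self._expand(state)
        if random.random() < epsilon:
            return random.choice(sorted(self.actions))
        return max(self.actions, key=lambda a: self.q.get((state, a), 0.0))

    def update(self, state, action, reward, next_state, alpha=0.5, gamma=0.9):
        self._expand(next_state)
        best_next = max((self.q.get((next_state, a), 0.0) for a in self.actions),
                        default=0.0)
        old = self.q.get((state, action), 0.0)
        self.q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
```

The point of the sketch is only the shape of the loop: every new observation is checked against the current representation, the representation is revised when it falls short, and optimal behaviour is recomputed over the revised representation rather than being treated as a brand-new, unrelated problem.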


Uncertainty vs. Risk

In the eloquent words of Nietzsche:

“Oh heaven over me, pure and high! That is what your purity is to me now…that to me you are a dance floor for divine accidents, that you are to me a divine table for divine dice and dice players. But you blush? Did I speak the unspeakable?”

– Also Sprach Zarathustra, Third Part, Before Sunrise.

On the fallacy of prediction

This is an interesting op-ed type piece on prediction: http://www.scientificamerican.com/article.cfm?id=financial-flimflam.

It is a bit predictable and doesn’t have too many new things to say, but the following paragraph caught my attention:

There was one significant factor in greater prediction success, however, and that was cognitive style: “foxes” who know a little about many things do better than “hedgehogs” who know a lot about one area of expertise. Low scorers, Tetlock wrote, were “thinkers who ‘know one big thing,’ aggressively extend the explanatory reach of that one big thing into new domains, display bristly impatience with those who ‘do not get it,’ and express considerable confidence that they are already pretty proficient forecasters.” High scorers in the study were “thinkers who know many small things (tricks of their trade), are skeptical of grand schemes, see explanation and prediction not as deductive exercises but rather as exercises in flexible ‘ad hocery’ that require stitching together diverse sources of information, and are rather diffident about their own forecasting prowess.”

This bit I completely agree with. What we need is a computational framework for doing exactly what the author says above! Then maybe the problem wouldn’t be as hopeless as it is now, even though (I agree) we will never have a perfect crystal ball.

Extended uncertainty

I have been following Rick Bookstaber’s blog, and there is a very interesting discussion there about robustness in the face of large-scale uncertainty. As a conceptual issue, this has been on my mind for a while. In fact, it was one of the motivations behind some aspects of my doctoral thesis, and I am actively trying to address this question in the context of my robotics and autonomous agents work. All this has made me realize that this is one of the big hard problems (albeit a somewhat under-appreciated one) within the science of decision processes.

Bookstaber’s take on this issue is well put in this paper in the Journal of Theoretical Biology. In a nutshell, he argues that if an animal has to make decisions in a world where it can observe some states, s, but not other states, t, whose evolution it simply cannot model (this is the notion of extended uncertainty), then making an optimal decision in this setting implies “averaging” over the extended uncertainty in a proper decision-theoretic sense. The main implication is that the animal will adopt a set of coarse behaviours that are sub-optimal with respect to many observable features in the restricted state space, s. There is a lot more detail in that paper, mainly focussed on theoretical-biology issues of animal behaviour.
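As a toy numerical illustration of the averaging argument (my own numbers, not Bookstaber’s): if the payoff of an action depends on both an observable state s and an unmodelled state t, the best the animal can do is pick the action whose payoff, averaged over t, is highest, and that choice can be a coarse, insensitive behaviour rather than the action finely tuned to any particular value of t.

```python
# Toy illustration of "averaging over extended uncertainty".
# rewards[(s, a)] lists payoffs for action a in observable state s, one entry
# per configuration of the unmodelled state t (assumed equally likely here).
rewards = {
    ("s1", "fine"):   [10, -8],   # finely tuned: great under t=0, poor under t=1
    ("s1", "coarse"): [4, 3],     # mediocre, but insensitive to t
}

def best_action(s, actions=("fine", "coarse")):
    # Decision-theoretic choice: maximise the payoff averaged over t.
    avg = {a: sum(rewards[(s, a)]) / len(rewards[(s, a)]) for a in actions}
    return max(avg, key=avg.get), avg

print(best_action("s1"))   # -> ('coarse', ...): averaging favours the coarse behaviour
```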

I find this particularly interesting because I too had been approaching the problem of learning robust behaviours by assuming that there are coarse behaviours (defined differently from how Bookstaber does it) for most tasks of real interest, such that fine behaviours with respect to specific quantitative optimality criteria are specializations of these coarse behaviours. Two questions arise within this program. How does one find concise descriptions of these coarse behaviours while learning from experience in a complex world? Many learning techniques, such as reinforcement learning, run into deep computational difficulties in this hierarchical setting. However, this is a well-posed technical question that is already being addressed by many people, and one I am actively working on as well. A somewhat more foundational conceptual question is – why should these types of coarse behaviours exist in general? I am a control theorist who would ideally like to find solutions to these learning problems that are independent of the specific domain (e.g., is it clear that coarse behaviours would exist in arbitrary decision processes such as, say, autonomous trading?), so this generality question is of interest to me. Bookstaber’s paper points towards the answer in a general decision-theoretic sense.
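One simple (and entirely hypothetical) way to picture “fine behaviours as specializations of coarse behaviours” is a two-level policy: the coarse level fixes the qualitative decision, and the fine level tunes continuous parameters within it against a quantitative criterion. The sketch below is only a cartoon of that idea, not any particular algorithm from the literature.

```python
def coarse_behaviour(position, goal):
    """Qualitative decision: which way to head along a line."""
    return "right" if goal > position else "left"

def fine_behaviour(position, goal, step_sizes=(0.1, 0.5, 1.0)):
    """Specialization: given the coarse direction, pick the step size that
    minimises a quantitative cost (residual error plus effort), by enumeration."""
    direction = 1.0 if coarse_behaviour(position, goal) == "right" else -1.0

    def cost(step):
        new_pos = position + direction * step
        return abs(goal - new_pos) + 0.1 * step   # tracking error + effort penalty

    return direction * min(step_sizes, key=cost)

print(fine_behaviour(0.0, 0.7))   # coarse level says "right"; fine level picks the 0.5 step
```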

In an earlier post, I tried to make an argument that the point of “intelligence” is to be able to act robustly in an uncertain world. Now we see that optimal decisions in the face of extended uncertainty imply the existence of coarse behaviours for many common decision tasks. So, perhaps, an agent learning a complex behaviour in an uncertain world is better off structuring this process in a similar multi-level way…

Uncertainty and non-determinism

What is the difference between uncertainty and non-determinism? I am beginning to find that many graduate students, and even many researchers, I come across do not really distinguish between the two. In fact, even within the first class, what is the difference between probabilistic uncertainty and, say, a set-membership description of uncertainty?

Now, if I ask the question properly, it becomes clear that there are differences, and it is not that hard to guess what they are from the words used. However, these distinctions have important consequences for various techniques, e.g., the design of control strategies. If what you are dealing with is probabilistic uncertainty that can be modeled and statistically estimated, then one can design control strategies in a way that is not all that different from the procedure for deterministic, noise-free systems. The “right” procedure can be a lot harder to come by for set-membership uncertainty. For more general forms of non-determinism, the concerns and design procedures tend to be even further removed from the case of “noisy systems”.
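To make the contrast concrete, here is a toy sketch (made-up numbers, one scalar gain, a single-step cost) of how the two descriptions of uncertainty lead to different designs: under a probabilistic model one might minimise the expected cost, while under a set-membership model one typically minimises the worst-case cost over the set, and the two criteria can select different gains.

```python
# Toy comparison of two uncertainty descriptions in a one-step control design.
# The plant parameter theta is uncertain; cost(k, theta) trades tracking error
# against control effort. All numbers are invented for illustration.
thetas = [0.5, 1.0, 2.0]
probs  = [0.8, 0.15, 0.05]          # probabilistic description of the uncertainty
gains  = [0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0]

def cost(k, theta):
    return (1 - k * theta) ** 2 + 0.1 * k ** 2

# Probabilistic uncertainty: minimise the expected cost (stochastic design).
k_stochastic = min(gains,
                   key=lambda k: sum(p * cost(k, t) for p, t in zip(probs, thetas)))

# Set-membership uncertainty: theta is only known to lie in a set; minimise the
# worst-case cost over that set (robust / minimax design).
k_robust = min(gains, key=lambda k: max(cost(k, t) for t in thetas))

print(k_stochastic, k_robust)   # -> 1.0 and 0.8: the two criteria pick different gains
```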

So what? Well, for one thing, we have very little work in the robot learning or control learning areas that can do anything beyond the data-driven equivalent of stochastic control in the presence of “small noise”. Currently, non-determinism is really only handled by “re-planning”, but we don’t have very efficient methods for that either, and even then this is built as a hierarchy of non-interacting black boxes. So this area needs more work – which begins with more awareness of the underlying issue!

PS: I recently came across this interesting paper that is loosely related to what I say above but has a different focus and motivation: Ken Binmore, Rational Decisions in Large Worlds, Annales d’Économie et de Statistique, No. 86, pp. 25-41, 2007 (from the looks of it, this is a short version of a book).

Using topology to overcome uncertainty

Over the past few years, I have been thinking quite a bit about what I consider to be a central issue at the interface of machine learning and robotics – can we incorporate prior knowledge regarding dynamical behavior into learning algorithms in such a way that it remains invariant with respect to the inevitable quantitative uncertainties that plague real systems? In my own work, I have taken the view that the right approach is to focus on the topological structure of strategies.
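As one deliberately simple concrete reading of “topological structure”, used here purely for illustration: two motion strategies that differ in their exact coordinates can still be equivalent if they pass an obstacle on the same side, and that qualitative label survives quantitative perturbations of the trajectories. The sketch below classifies planar paths by the signed angle they sweep around a point obstacle.

```python
import math

def swept_angle(path, obstacle):
    """Total signed angle a sampled path sweeps around a point obstacle.
    Its sign is a coarse topological label: which side the path passes on."""
    ox, oy = obstacle
    angles = [math.atan2(y - oy, x - ox) for x, y in path]
    total = 0.0
    for a0, a1 in zip(angles, angles[1:]):
        d = a1 - a0
        # unwrap to the shortest rotation between consecutive samples
        while d > math.pi:
            d -= 2 * math.pi
        while d < -math.pi:
            d += 2 * math.pi
        total += d
    return total

obstacle = (0.5, 0.0)
path_a = [(0, 0.2), (0.25, 0.3), (0.5, 0.35), (0.75, 0.3), (1, 0.2)]  # passes above
path_b = [(0, 0.1), (0.25, 0.2), (0.5, 0.25), (0.75, 0.2), (1, 0.1)]  # above, perturbed
path_c = [(0, -0.2), (0.5, -0.3), (1, -0.2)]                          # passes below

for p in (path_a, path_b, path_c):
    print("above" if swept_angle(p, obstacle) < 0 else "below")
# path_a and path_b get the same label despite quantitative differences; path_c does not.
```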

I am very pleased to see that this view is becoming more mainstream. There is a workshop at NIPS focusing on ways to learn topology from data. Meanwhile, in the general area of robotics, there is a stellar group of researchers (see, e.g., the SToMP project) studying the use of topological methods to deal with distributed robotics and sensor networks.

Of course, there is still a lot more to be done before these methods begin to show tangible results at the level of something like a humanoid robot. Nonetheless, I am pleased to see this emerging consensus. It’s the right way!