Unknown unknowns are everywhere, a bit like dark matter in the universe. Yet everything we seem to do in terms of algorithms for learning and inference either assumes a simplified setting that is closed in terms of the hypothesis space (hence not even allowing for these unknown unknowns), or depends on our being able to set up priors so generally expressive that computation is far from tractable. How do people actually bridge this gap? We don’t know, of course, but we have started to take a stab at it from a different direction. With my colleague Alex Lascarides and my former student Benji Rosman, we have been looking into this issue in a specific setting – that of asking how an agent incrementally grows its model to reach the level of knowledge of a more experienced teacher, while dealing with a world that requires our agent to expand its hypothesis space during the process of learning and inference.

This is very much ongoing work, of the kind wherein we have an idea of where we might like to end up (a lighthouse on the horizon) but only a very limited idea of the way there, and of the rocky shores we’ll need to navigate to get there. For local Informatics folks, a status report on the current state of this work is an upcoming talk as part of the DReaM talks (10th March, 11:30 am in room IF 2.33) – abstract below.

---

**Decision Making when there are Unknown Unknowns**

**Alex Lascarides**

*Joint work with Ram Ramamoorthy and Benji Rosman*

Existing approaches to learning how to solve a decision problem all assume that the hypothesis space is known in advance of the learning process. That is, the agent knows all possible states, all possible actions, and also has complete knowledge of his or her own intrinsic preferences (typically represented as a function from the set of possible states to numeric rewards). In most cases, the models for learning how to behave optimally also assume that the probabilistic dependencies among the factors that influence behaviour are known as well.
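To make this closed-hypothesis-space assumption concrete, here is a minimal tabular Q-learning sketch (entirely my own illustration; the toy chain MDP, its states, actions, and rewards are invented, and are not from the talk). Note how every state, every action, and the reward function must be enumerated before learning can even begin:

```python
import random

random.seed(0)

# Hypothetical toy chain MDP: the whole hypothesis space is fixed up front.
states = [0, 1, 2, 3]          # all possible states, known in advance
actions = ["left", "right"]    # all possible actions, known in advance

def step(s, a):
    """Transition and reward for the toy chain; the reward function
    (reach state 3) is also assumed known in advance."""
    s2 = min(s + 1, 3) if a == "right" else max(s - 1, 0)
    return s2, (1.0 if s2 == 3 else 0.0)

# The value table is allocated over the *fixed* state/action sets;
# nothing outside them can ever be represented.
Q = {(s, a): 0.0 for s in states for a in actions}
alpha, gamma, eps = 0.5, 0.9, 0.1

for episode in range(500):
    s = 0
    while s != 3:
        if random.random() < eps:
            a = random.choice(actions)           # explore
        else:
            a = max(actions, key=lambda b: Q[(s, b)])  # exploit
        s2, r = step(s, a)
        # Standard Q-learning update over the closed table:
        Q[(s, a)] += alpha * (
            r + gamma * max(Q[(s2, b)] for b in actions) - Q[(s, a)]
        )
        s = s2
```

A state or action the designer did not anticipate simply has no row in `Q`, which is exactly the limitation the abstract is pointing at.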

But there are many decision problems where these high informational demands on learning aren’t met. An agent may have to act in the domain without knowing all possible states or actions, or with only partial and uncertain information about his or her own preferences. And yet if one changes the random variables one uses to represent a decision problem, or one changes the reward function, then this is viewed as a different and unrelated decision problem. Intuitively, one needs a logic of change to one’s decision problem, where change is informed by evidence.

I will present here some relatively half-baked ideas about how to learn optimal behaviour when the agent starts out with incomplete and uncertain information about the hypothesis space: that is, the agent knows there are ‘unknown unknowns’. The model is one where the agent adapts the representation of the decision problem, and so revises calculations of optimal behaviour, by drawing on two sources of evidence: their own exploration of the domain by repeatedly performing actions and observing their consequences and rewards; and dialogues with an oracle who knows the true representation of the decision problem.
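A deliberately naive sketch of that idea (my own invention, not the model the talk presents) is an agent whose value table grows whenever evidence, whether from its own exploration or from an oracle’s answer, reveals a state or action outside its current hypothesis space:

```python
# Hypothetical illustration: the agent starts with an incomplete
# representation and expands it when evidence falls outside it.
known_states = {"start", "goal"}
known_actions = {"move"}
Q = {(s, a): 0.0 for s in known_states for a in known_actions}

def incorporate(state, action=None):
    """Grow the hypothesis space to cover new evidence. The evidence
    could come from exploration (an observation the agent cannot
    explain) or from a dialogue with the oracle."""
    if state not in known_states:
        known_states.add(state)
        for a in known_actions:
            Q[(state, a)] = 0.0      # new rows for the grown table
    if action is not None and action not in known_actions:
        known_actions.add(action)
        for s in known_states:
            Q[(s, action)] = 0.0     # new columns likewise

# Exploration surprises the agent with a state it never hypothesised...
incorporate("swamp")
# ...and an oracle dialogue reveals an action it did not know existed.
incorporate("swamp", action="jump")
```

The hard questions the talk is about, of course, are *when* such expansions are warranted by the evidence and how the revised table should inherit what was already learned; this sketch only shows the bookkeeping.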

Our hypothesis is that an agent that abides by certain defeasible principles for adapting the representation of the decision problem to the evidence learns to converge on optimal behaviour faster than an agent who ignores evidence that their current representation entails the wrong hypothesis space or intrinsic rewards, or an agent who adapts the representation of the decision problem in a way that does not make the defeasible assumptions we’ll argue for here.