Unknown unknowns are everywhere, a bit like dark matter in the universe. Yet everything we seem to do in terms of algorithms for learning and inference either assumes a simplified setting that is closed in terms of the hypothesis space (hence not even allowing for these unknown unknowns), or depends on our being able to set up priors so generally expressive that computation is far from tractable. How do real people really bridge the gap? We don’t know, of course, but we have started to take a stab at this from a different direction. With my colleague, Alex Lascarides, and my former student, Benji Rosman, I have been looking into this issue in a specific setting: that of asking how an agent incrementally grows its model to reach the level of knowledge of a more experienced teacher, while dealing with a world that requires our agent to expand its hypothesis space during the process of learning and inference.
This is very much ongoing work, of the kind wherein we have an idea of where we might like to end up (a lighthouse on the horizon) but only a very limited idea of the way there and of the rocky shores we’ll need to navigate along the way. For local Informatics folks, a status report on the current state of this work will be given in an upcoming talk, as part of the DReaM talks (10th March, 11:30 am in room IF 2.33); the abstract is below.
Decision Making when there are Unknown Unknowns
Joint work with Ram Ramamoorthy and Benji Rosman
Existing approaches to learning how to solve a decision problem all
assume that the hypothesis space is known in advance of the learning
process. That is, the agent knows all possible states, all possible
actions, and also has complete knowledge of his or her own intrinsic
preferences (typically represented as a function from the set of
possible states to numeric reward). In most cases, the models for
learning how to behave optimally also assume that the probabilistic
dependencies among the factors that influence behaviour are known as
well.
But there are many decision problems where these high informational
demands on learning aren’t met. An agent may have to act in the
domain without knowing all possible states or actions, or with only
partial and uncertain information about his or her own preferences.
And yet if one changes the random variables one uses to represent a
decision problem, or one changes the reward function, then this is
viewed as a different and unrelated decision problem. Intuitively,
one needs a logic of change to one’s decision problem, where change is
informed by evidence.
I will present here some relatively half-baked ideas about how to
learn optimal behaviour when the agent starts out with incomplete and
uncertain information about the hypothesis space: that is, the agent
knows there are ‘unknown unknowns’. The model is one where the agent
adapts the representation of the decision problem, and so revises
calculations of optimal behaviour, by drawing on two sources of
evidence: their own exploration of the domain by repeatedly performing
actions and observing their consequences and rewards; and dialogues
with an oracle who knows the true representation of the decision
problem.
Our hypothesis is that an agent that abides by certain defeasible
principles for adapting the representation of the decision problem to
the evidence learns to converge on optimal behaviour faster than an
agent who ignores evidence that his current representation entails the
wrong hypothesis space or intrinsic rewards, or an agent who adapts
the representation of the decision problem in a way that does not make
the defeasible assumptions we’ll argue for here.