New JAIR paper: Exploiting Causality for Selective Belief Filtering in Dynamic Bayesian Networks

Our paper proposes a new way to account for “passivity” structure in Dynamic Bayesian Networks, which enables more efficient belief computations and, through that, improvements for systems modelled as POMDPs and the like. When we started this project, I was surprised that, despite significant earlier attention to exploiting conditional independence structure, there had been no work on using constraints of this kind (often imposed by physics or other background regularities) to make belief updates more efficient.

Please read the paper in JAIR; the abstract is reproduced below:

Stefano V. Albrecht and Subramanian Ramamoorthy (2016) “Exploiting Causality for Selective Belief Filtering in Dynamic Bayesian Networks”, Journal of Artificial Intelligence Research, Volume 55, pages 1135-1178.

Dynamic Bayesian networks (DBNs) are a general model for stochastic processes with partially observed states. Belief filtering in DBNs is the task of inferring the belief state (i.e. the probability distribution over process states) based on incomplete and noisy observations. This can be a hard problem in complex processes with large state spaces. In this article, we explore the idea of accelerating the filtering task by automatically exploiting causality in the process. We consider a specific type of causal relation, called passivity, which pertains to how state variables cause changes in other variables. We present the Passivity-based Selective Belief Filtering (PSBF) method, which maintains a factored belief representation and exploits passivity to perform selective updates over the belief factors. PSBF produces exact belief states under certain assumptions and approximate belief states otherwise, where the approximation error is bounded by the degree of uncertainty in the process. We show empirically, in synthetic processes with varying sizes and degrees of passivity, that PSBF is faster than several alternative methods while achieving competitive accuracy. Furthermore, we demonstrate how passivity occurs naturally in a complex system such as a multi-robot warehouse, and how PSBF can exploit this to accelerate the filtering task.
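The selective-update idea in the abstract can be sketched in miniature. The following is an illustrative toy, not the PSBF algorithm itself: the flip/hit probabilities, the passivity flags, and the assumption of fully independent binary factors are all invented for the example.

```python
# Illustrative toy, not the PSBF algorithm: a factored belief over
# independent binary state variables. A variable that is "passive"
# under the current action cannot change, so its transition update
# can be skipped; only observed factors get a correction step.

def predict(p1, flip_prob):
    """Transition update for one binary factor: new P(x = 1)."""
    return p1 * (1 - flip_prob) + (1 - p1) * flip_prob

def correct(p1, obs, hit_prob):
    """Observation update for one binary factor via Bayes' rule."""
    like1 = hit_prob if obs == 1 else 1 - hit_prob
    like0 = (1 - hit_prob) if obs == 1 else hit_prob
    num = p1 * like1
    return num / (num + (1 - p1) * like0)

def selective_filter(belief, passive, obs, flip_prob=0.1, hit_prob=0.9):
    """Update each factor, skipping transitions for passive variables."""
    updated = []
    for i, p in enumerate(belief):
        if not passive[i]:           # only active variables can change
            p = predict(p, flip_prob)
        if obs[i] is not None:       # fold in any observation received
            p = correct(p, obs[i], hit_prob)
        updated.append(p)
    return updated

belief = [0.5, 0.5, 0.5]
passive = [False, True, True]        # variables 1 and 2 cannot change now
updated = selective_filter(belief, passive, obs=[1, None, None])
```

The saving is exactly the skipped work: the two passive factors are carried over unchanged, which is where the speed-up over full-belief updates comes from when passivity is widespread.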

Our agent, Edart, at the Trading Agent Competition 2015

My student, Stavros Gerakaris, has been applying our multi-agent learning ideas to the domain of Ad Exchanges. He is participating in the Sixteenth Annual Trading Agent Competition (TAC-15), conducted as part of AAMAS-15. His entry, entitled Edart, finished 2nd among the participants and 5th overall; last year’s winners still did better than all of us. This earns us a spot in the finals. If you’d like to know more about the background and setup of this competition, see this paper by Mariano Schain and Yishay Mansour.

People familiar with AMEC/TADA will realise that the main objective of these competitions is to try out our original ideas in a demanding, open-ended domain. In this sense, I am especially pleased that this agent has begun to validate our more theoretical work, in the form of the Harsanyi-Bellman Ad-hoc Coordination algorithm originally developed by Stefano Albrecht, which Stavros is using in a partially observable and censored-observation setting. In due course, this work will appear as a publication, so watch that space in our publications list.

Decision Making when there are Unknown Unknowns

Unknown unknowns are everywhere, a bit like dark matter in the universe. Yet, everything we seem to do in terms of algorithms for learning and inference either assumes a simplified setting that is closed in terms of the hypothesis space (hence not even allowing for these unknown unknowns), or depends on our being able to set up priors so generally expressive that computation is far from tractable. How do real people bridge the gap? We don’t know, of course, but we have started to take a stab at it from a different direction. With my colleague, Alex Lascarides, and my former student, Benji Rosman, we have been looking into this issue in a specific setting – that of asking how an agent incrementally grows its model to reach the level of knowledge of a more experienced teacher, while dealing with a world that requires the agent to expand its hypothesis space during the process of learning and inference.

This is very much ongoing work, of the kind wherein we have an idea of where we might like to end up (a lighthouse on the horizon) with only a very limited idea of the way there, and the nature of the rocky shores we’ll need to navigate to get there. A status report on the current state of this work, for local Informatics folks, will be an upcoming talk as part of the DReaM talks (10th March, 11:30 am in room IF 2.33) – abstract below.


Decision Making when there are Unknown Unknowns

Alex Lascarides

Joint work with Ram Ramamoorthy and Benji Rosman

Existing approaches to learning how to solve a decision problem all
assume that the hypothesis space is known in advance of the learning
process.  That is, the agent knows all possible states, all possible
actions, and also has complete knowledge of his or her own intrinsic
preferences (typically represented as a function from the set of
possible states to numeric rewards).  In most cases, the models for
learning how to behave optimally also assume that the probabilistic
dependencies among the factors that influence behaviour are known as
well.

But there are many decision problems where these high informational
demands on learning aren’t met.  An agent may have to act in the
domain without knowing all possible states or actions, or with only
partial and uncertain information about his or her own preferences.
And yet if one changes the random variables one uses to represent a
decision problem, or one changes the reward function, then this is
viewed as a different and unrelated decision problem.  Intuitively,
one needs a logic of change to one’s decision problem, where change is
informed by evidence.

I will present here some relatively half-baked ideas about how to
learn optimal behaviour when the agent starts out with incomplete and
uncertain information about the hypothesis space: that is, the agent
knows there are ‘unknown unknowns’.  The model is one where the agent
adapts the representation of the decision problem, and so revises
calculations of optimal behaviour, by drawing on two sources of
evidence: their own exploration of the domain by repeatedly performing
actions and observing their consequences and rewards; and dialogues
with an oracle who knows the true representation of the decision
problem.

Our hypothesis is that an agent that abides by certain defeasible
principles for adapting the representation of the decision problem to
the evidence learns to converge on optimal behaviour faster than an
agent who ignores evidence that his current representation entails the
wrong hypothesis space or intrinsic rewards, or an agent who adapts
the representation of the decision problem in a way that does not make
the defeasible assumptions we’ll argue for here.
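One way to picture the kind of hypothesis-space growth the abstract describes is a learner whose value table expands as previously unknown states and actions are revealed, by its own exploration or by a teacher. The sketch below is a toy under invented names and numbers, not the model discussed in the talk:

```python
from collections import defaultdict

# Toy sketch only (invented domain, not the model in the talk): a
# Q-learner whose state-action table grows as previously unknown
# states and actions are revealed by exploration or by a teacher.

class ExpandingQLearner:
    def __init__(self, actions, alpha=0.5, gamma=0.9):
        self.actions = set(actions)       # currently known actions
        self.q = defaultdict(float)       # (state, action) -> value
        self.alpha, self.gamma = alpha, gamma

    def reveal_action(self, action):
        """A teacher (or a surprise observation) enlarges the action set."""
        self.actions.add(action)

    def update(self, s, a, r, s_next):
        """Standard Q-learning step; unseen entries default to 0."""
        best_next = max((self.q[(s_next, b)] for b in self.actions),
                        default=0.0)
        target = r + self.gamma * best_next
        self.q[(s, a)] += self.alpha * (target - self.q[(s, a)])

agent = ExpandingQLearner(actions=["left", "right"])
agent.update("start", "right", 1.0, "goal")
agent.reveal_action("jump")               # hypothesis space expands
agent.update("start", "jump", 5.0, "goal")
```

The interesting questions in the talk begin where this toy stops: when to trust the teacher, and which defeasible principles should govern how the representation is revised.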

Uncertainty vs. Risk

In the eloquent words of Nietzsche:

“Oh heaven over me, pure and high! That is what your purity is to me now…that to me you are a dance floor for divine accidents, that you are to me a divine table for divine dice and dice players. But you blush? Did I speak the unspeakable?”

– Also Sprach Zarathustra, Third Part, Before Sunrise.

Upcoming talks – Microsoft Research Cambridge, University of Hertfordshire

I will be giving a couple of talks this week, describing some of our recent work on learning to interact with changing environments, such as settings where autonomous agents must work with people.

Here is the abstract from the MSR announcement:

Towards ad hoc interactions with robots

A primary motivation for work within my group is the notion of autonomous agents that can interact, robustly over the long term, with an incompletely known environment that continually changes. In this talk I will describe results from a few different projects that attempt to address key aspects of this big question.

I will begin by looking at how task encodings can be made effective using qualitative (geometric) structure in the strategy space. Using examples that may be familiar to many machine learning researchers – such as control of an inverted pendulum and bipedal walkers – we will explore the connection between the geometric structure of solutions and strategies for dealing with a continually changing task context. The key result here concerns ways to combine exploitation of ‘natural’ dynamics with the benefits of active planning.

Can there be similarly flexible encodings for more general decision problems, beyond the domain of robot control? I will describe recent results from our work on policy reuse and transfer learning, demonstrating how it is possible to construct agents that can learn to adapt, through a process of belief updating based on policy performance, to a changing task context including the case where the change may be induced by other decision making agents.
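The flavour of adaptation-by-belief-updating described above can be sketched roughly as follows. This is a hedged illustration, not the published algorithm: the task types, candidate policies, and performance model are all invented for the example.

```python
import math

# Rough sketch, not the published algorithm: maintain a belief over
# which known task type the current environment is, update it from
# observed policy performance, and reuse the stored policy that looks
# best under that belief. All names and numbers below are invented.

perf = {                      # perf[policy][task] = expected reward
    "cautious":   {"calm": 0.8, "adversarial": 0.6},
    "aggressive": {"calm": 0.9, "adversarial": 0.2},
}

def likelihood(reward, expected, spread=0.3):
    """Unnormalised Gaussian likelihood of an observed reward."""
    return math.exp(-((reward - expected) ** 2) / (2 * spread ** 2))

def update_belief(belief, policy, reward):
    """Bayes update of the belief over task types given a payoff."""
    posterior = {t: belief[t] * likelihood(reward, perf[policy][t])
                 for t in belief}
    z = sum(posterior.values())
    return {t: p / z for t, p in posterior.items()}

def best_policy(belief):
    """Reuse the policy with highest expected performance under belief."""
    return max(perf, key=lambda pi: sum(belief[t] * perf[pi][t]
                                        for t in belief))

belief = {"calm": 0.5, "adversarial": 0.5}
belief = update_belief(belief, "aggressive", reward=0.25)  # poor payoff
```

A poor payoff from the aggressive policy shifts the belief towards the adversarial task type, and the policy chosen for reuse changes accordingly; the same loop covers the case where the change is induced by other decision-making agents.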

Finally, building on this theme of making decisions in the presence of other decision making agents, I will briefly describe results from our recent experiments in human-robot interaction where agents must learn to influence the behaviour of other agents in order to achieve their task. This experiment is a step towards general and implementable models of ad hoc interaction where agents learn from experience to shape aspects of that interaction without the benefits of prior coordination and related knowledge. I will conclude with some remarks on the potential practical uses of such models and learning methods in a wide variety of applications ranging from personal robotics to intelligent user interfaces.

This talk is part of the Microsoft Research Machine Learning and Perception Seminars series.

Hamming Seminar: Games Robots Play

I will be giving a talk on 23rd Nov, as part of the Hamming Seminar series, where I will attempt to lay out my case for a line of research that I am pursuing with my research group. I welcome you to come and participate in the discussions.


Where are yesterday’s robots of tomorrow? The optimist might answer that some such robots have already begun to enter the realm of the possible, in the form of self-driving cars, robots that can fold towels, take an elevator to fetch you a sandwich, etc. The pessimist would rightly counter that although there are media clips showing robots doing these things, most of these systems are frightfully far from being ready for prime time (do we really expect to be able to take home an Asimo along with our iPhones?). The realist would note that although the pessimist’s objections are valid, a variety of systems ranging from automated vacuum cleaners to unmanned submarines are routinely and gainfully deployed around the world today. Moreover, what about the much larger ecosystem of virtual robots roaming in worlds ranging from eBay and Amazon to NASDAQ?

Siding with the pragmatic realist, I argue that autonomous robots that do succeed and become commonplace will perhaps come in the form of collections of heterogeneous modules that are put together in an ad hoc way to do whatever is the need of the hour. Moreover, people are more likely to accept these robots in mixed-initiative team situations, capable of strategic interaction, rather than as black box unknowns. In this milieu, one technical issue stands out as being particularly thorny – how does one marshal diverse modules into long-lived autonomous systems capable of interacting in such non-trivial ways with a continually changing and rich world?

A proper answer to this question requires a careful look at foundational issues in the science of autonomous decision making and an understanding of the space of implementable models for interactive behaviour. Conceptual advances in this area must also be informed by an empirical science of decisions in boundedly rational agents. One way to do this kind of science is by trying to construct complete agents that act in domains that are rich enough to fully exercise the theoretical framework being developed, while being sufficiently well posed to enable incremental experiments. Two such domains are robotic football and autonomous agents in electronic markets. This talk will provide glimpses into why and how AI researchers get robots to play these games, and how this helps us address the big questions.