Project COGLE, within the DARPA XAI Programme

We have been awarded one of the projects under the DARPA Explainable AI programme, to be kicked off next week. Our project, entitled COGLE (Common Ground Learning and Explanation), will be coordinated by Xerox Palo Alto Research Centre, and I will be a PI leading technical efforts on the machine learning side of the architecture.

COGLE will be a highly interactive sense-making system for explaining the learned performance capabilities of an autonomous system and the history that produced that learning. COGLE will be initially developed using an autonomous Unmanned Aircraft System (UAS) test bed that uses reinforcement learning (RL) to improve its performance. COGLE will support user sensemaking of autonomous system decisions, enable users to understand autonomous system strengths and weaknesses, convey an understanding of how the system will behave in the future, and provide ways for the user to improve the UAS’s performance.

To do this, COGLE will:

  1. Provide specific interactions in sensemaking user interfaces that directly support modes of human explanation known to be effective and efficient in human learning and understanding.
  2. Support mapping (grounding) of human conceptualizations onto the RL representations and processes.

This area is increasingly being discussed in the public sphere, in the context of the growing adoption of AI into daily life; see, for example, this article in the MIT Technology Review and this one in Nautilus, both of which refer directly to this DARPA programme. I look forward to contributing to this theme!

Team Edina selected to Compete for Alexa Prize

My student, Emmanuel Kahembwe, is part of Team Edina, a group of students and postdoctoral researchers from the School of Informatics at the University of Edinburgh, and one of 12 teams competing for The Alexa Prize. The grand challenge is to build a socialbot that can converse coherently and engagingly with humans on popular topics for 20 minutes.

Let us wish them all the best; I am very curious to see what comes out of this competition!

New JAIR paper: Exploiting Causality for Selective Belief Filtering in Dynamic Bayesian Networks

Our paper proposes a new way to account for “passivity” structure in Dynamic Bayesian Networks, which enables more efficient belief computation and, through that, improvements for systems modelled as POMDPs and the like. It was surprising to me, when we started this project, that despite significant earlier attention to exploiting conditional independence structure, there had been no work on using such constraints (often imposed by physics or other background regularities) to make belief updates more efficient.

Please read the paper in JAIR: http://dx.doi.org/10.1613/jair.5044, abstract reproduced below:

Stefano V. Albrecht and Subramanian Ramamoorthy (2016) “Exploiting Causality for Selective Belief Filtering in Dynamic Bayesian Networks”, Journal of Artificial Intelligence Research, Volume 55, pages 1135-1178.

Dynamic Bayesian networks (DBNs) are a general model for stochastic processes with partially observed states. Belief filtering in DBNs is the task of inferring the belief state (i.e. the probability distribution over process states) based on incomplete and noisy observations. This can be a hard problem in complex processes with large state spaces. In this article, we explore the idea of accelerating the filtering task by automatically exploiting causality in the process. We consider a specific type of causal relation, called passivity, which pertains to how state variables cause changes in other variables. We present the Passivity-based Selective Belief Filtering (PSBF) method, which maintains a factored belief representation and exploits passivity to perform selective updates over the belief factors. PSBF produces exact belief states under certain assumptions and approximate belief states otherwise, where the approximation error is bounded by the degree of uncertainty in the process. We show empirically, in synthetic processes with varying sizes and degrees of passivity, that PSBF is faster than several alternative methods while achieving competitive accuracy. Furthermore, we demonstrate how passivity occurs naturally in a complex system such as a multi-robot warehouse, and how PSBF can exploit this to accelerate the filtering task.
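
To give a flavour of the idea (this is not the PSBF algorithm from the paper, just an illustrative sketch; the factor structure, the passivity test, and all names here are hypothetical), here is a minimal factored filtering step that simply carries a factor's belief forward unchanged whenever that factor is judged to be passive at the current step:

```python
# Illustrative sketch only: NOT the PSBF algorithm from the paper, just the
# general flavour of selective updates over a factored belief. The factor
# structure, models, and the passivity test are hypothetical.
import numpy as np

class BeliefFactor:
    def __init__(self, name, n_states, transition, observation, can_change):
        self.name = name
        self.belief = np.full(n_states, 1.0 / n_states)  # uniform prior over this factor's states
        self.transition = transition    # transition[action] is an (s, s') matrix for this factor
        self.observation = observation  # observation[s, o] likelihoods for this factor
        self.can_change = can_change    # fn(all_beliefs) -> bool: is this factor active right now?

def filter_step(factors, action, obs):
    """One belief-filtering step that skips factors judged passive at this step."""
    beliefs = {f.name: f.belief for f in factors}
    for f in factors:
        if not f.can_change(beliefs):
            continue                                            # passive: carry belief forward unchanged
        predicted = f.belief @ f.transition[action]             # prediction step
        corrected = predicted * f.observation[:, obs[f.name]]   # observation update
        f.belief = corrected / corrected.sum()
```

In the paper the passivity relation is defined precisely and the approximation error of the selective updates is bounded; the sketch above only conveys why skipping updates for passive factors saves computation.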

Our agent, Edart, at the Trading Agent Competition 2015

My student, Stavros Gerakaris, has been working on applying our multi-agent learning ideas to the domain of Ad Exchanges. He is participating in the Sixteenth Annual Trading Agent Competition (TAC-15), conducted as part of AAMAS-15. His entry, entitled Edart, finished 2nd among the participants and 5th overall; last year’s winners still did better than all of us. This earns us a spot in the finals. If you’d like to know more about the background and setup of this competition, see this paper by Mariano Schain and Yishay Mansour.

People familiar with AMEC/TADA will realise that the main objective of these competitions is to try out our original ideas in a demanding, open-ended domain. In this sense, I am especially pleased that this agent has begun to validate our more theoretical work in the form of the Harsanyi-Bellman Ad-hoc Coordination algorithm, originally developed by Stefano Albrecht, which Stavros is using in a partially observable and censored-observation setting. In due course, this work will appear as a publication, so watch this space in our publications list.
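
For readers unfamiliar with that line of work, here is a heavily simplified sketch of the type-based reasoning behind Harsanyi-Bellman Ad-hoc Coordination: maintain a posterior over hypothesised opponent types and best-respond to the resulting mixture. The full algorithm plans over the future of the interaction (the Bellman part), and the TAC setting adds partial and censored observations; this sketch is one-step only, and all names in it are illustrative.

```python
# Minimal sketch of the type-based reasoning behind HBA: maintain a posterior
# over hypothesised opponent types and best-respond to the resulting mixture.
# The full algorithm plans over future interaction; this is one-step only and
# all names are illustrative.

def update_type_beliefs(beliefs, types, history, observed_action):
    """Bayesian update of P(type) given the opponent's observed action."""
    posterior = {t: prior * types[t](history).get(observed_action, 0.0)
                 for t, prior in beliefs.items()}
    total = sum(posterior.values()) or 1.0
    return {t: p / total for t, p in posterior.items()}

def best_response(beliefs, types, history, my_actions, payoff):
    """Pick my action maximising expected payoff under the type posterior."""
    def expected_value(my_action):
        return sum(p_type * p_act * payoff(my_action, opp_action)
                   for t, p_type in beliefs.items()
                   for opp_action, p_act in types[t](history).items())
    return max(my_actions, key=expected_value)
```

Here each entry of `types` is a hypothesised model mapping the interaction history to a distribution over the opponent's actions, and `payoff` is the agent's own reward model.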

Decision Making when there are Unknown Unknowns

Unknown unknowns are everywhere, a bit like dark matter in the universe. Yet, everything we seem to do in terms of algorithms for learning and inference either assumes a simplified setting that is closed in terms of the hypothesis space (hence not even allowing for these unknown unknowns), or depends on our being able to set up priors so generally expressive that computation is far from tractable. How do people actually bridge this gap? We don’t know, of course, but we have started to take a stab at it from a different direction. Together with my colleague, Alex Lascarides, and my former student, Benji Rosman, I have been looking into this issue in a specific setting: that of asking how an agent incrementally grows its model to reach the level of knowledge of a more experienced teacher, while dealing with a world that requires the agent to expand its hypothesis space during the process of learning and inference.

This is very much ongoing work, of the kind wherein we have an idea of where we might like to end up (a lighthouse on the horizon) but only a very limited idea of the way there, and of the nature of the rocky shores we’ll need to navigate to get there. A status report on the current state of this work, for local Informatics folks, will be an upcoming talk as part of the DReaM series (10th March, 11:30 am in room IF 2.33); the abstract is below.

—————————————————————————————-

Decision Making when there are Unknown Unknowns

Alex Lascarides

Joint work with Ram Ramamoorthy and Benji Rosman

Existing approaches to learning how to solve a decision problem all assume that the hypothesis space is known in advance of the learning process. That is, the agent knows all possible states, all possible actions, and also has complete knowledge of his or her own intrinsic preferences (typically represented as a function from the set of possible states to numeric reward). In most cases, the models for learning how to behave optimally also assume that the probabilistic dependencies among the factors that influence behaviour are known as well.

But there are many decision problems where these high informational demands on learning aren’t met. An agent may have to act in the domain without knowing all possible states or actions, or with only partial and uncertain information about his or her own preferences. And yet if one changes the random variables one uses to represent a decision problem, or one changes the reward function, then this is viewed as a different and unrelated decision problem. Intuitively, one needs a logic of change to one’s decision problem, where change is informed by evidence.

I will present here some relatively half-baked ideas about how to learn optimal behaviour when the agent starts out with incomplete and uncertain information about the hypothesis space: that is, the agent knows there are ‘unknown unknowns’. The model is one where the agent adapts the representation of the decision problem, and so revises calculations of optimal behaviour, by drawing on two sources of evidence: their own exploration of the domain by repeatedly performing actions and observing their consequences and rewards; and dialogues with an oracle who knows the true representation of the decision problem.

Our hypothesis is that an agent that abides by certain defeasible principles for adapting the representation of the decision problem to the evidence learns to converge on optimal behaviour faster than an agent who ignores evidence that his current representation entails the wrong hypothesis space or intrinsic rewards, or an agent who adapts the representation of the decision problem in a way that does not make the defeasible assumptions we’ll argue for here.
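
As a toy illustration of just one ingredient of this (emphatically not the model described in the talk), consider a tabular learner whose state and action sets grow as evidence arrives, and which can ask an oracle about actions it does not yet know exist. Everything here, from the class name to the oracle interface, is hypothetical:

```python
# Toy illustration only (not the model from the talk): a tabular learner whose
# hypothesis space grows as evidence arrives. Unseen states and actions are
# added on the fly, and an 'oracle' can be asked about actions the agent does
# not yet know exist. All names are hypothetical.
import random
from collections import defaultdict

class GrowingAgent:
    def __init__(self, oracle=None, epsilon=0.1, alpha=0.5, gamma=0.9):
        self.q = defaultdict(dict)        # q[state][action] -> estimated value
        self.known_actions = set()
        self.oracle = oracle              # optional: fn(state) -> iterable of actions
        self.epsilon, self.alpha, self.gamma = epsilon, alpha, gamma

    def act(self, state):
        if self.oracle is not None:
            self.known_actions |= set(self.oracle(state))   # the 'dialogue' source of evidence
        actions = sorted(self.known_actions) or ["wait"]
        if random.random() < self.epsilon or not self.q[state]:
            return random.choice(actions)                   # the 'exploration' source of evidence
        return max(self.q[state], key=self.q[state].get)

    def learn(self, state, action, reward, next_state):
        self.known_actions.add(action)                      # observing enlarges the hypothesis space
        best_next = max(self.q[next_state].values(), default=0.0)
        old = self.q[state].get(action, 0.0)
        self.q[state][action] = old + self.alpha * (reward + self.gamma * best_next - old)
```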

Dagstuhl talk – Learning action-oriented symbols

A few months back, I attended a Dagstuhl workshop on Neural-Symbolic Learning and Reasoning, a meeting that tried to bring together people looking at sub-symbolic (primarily, but not exclusively, neural network) and symbolic (i.e., logic-based) learning and reasoning.

I gave a talk on “Learning Action-oriented Symbols: Abstractions over Decision Processes”. This was an attempt at synthesising the idea behind a couple of different papers we have worked on over the past two years.

The report associated with this workshop has now been released: http://drops.dagstuhl.de/opus/volltexte/2015/4884/. It is an interesting collection of ideas, especially if you follow the links to the primary publications and associated background materials.
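
To hint at what “action-oriented symbols” might mean in the simplest possible terms (this is not the method from the talk or the underlying papers, just a toy illustration with made-up names and thresholds), one can group states whose action-value profiles are similar, so that each group behaves like a symbol affording the same choices:

```python
# Toy illustration (not the method from the talk): group states whose Q-value
# profiles are similar, so that each group behaves like one abstract 'symbol'.
# The clustering rule and the tolerance are made up for this example.
import numpy as np

def action_oriented_abstraction(q_table, tol=0.2):
    """Assign states to 'symbols': clusters whose Q-value profile is within
    tol of the profile of the first state that founded the cluster."""
    symbols = []                              # list of (representative profile, member states)
    for state, q_values in enumerate(q_table):
        for profile, members in symbols:
            if np.max(np.abs(q_values - profile)) < tol:
                members.append(state)
                break
        else:
            symbols.append((q_values.copy(), [state]))
    return symbols

# Six states, two actions: states 0-2 prefer action 0, states 3-5 prefer action 1.
q = np.array([[1.0, 0.0], [0.95, 0.05], [1.05, -0.02],
              [0.0, 1.0], [0.1, 0.9], [-0.05, 1.02]])
print(len(action_oriented_abstraction(q)))   # two abstract 'symbols'
```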

Smart watches are still pretty dumb

I have been looking into sensing technology (especially the wearable kind) with increasing interest. This is in part because of some current work I am involved with (e.g., our papers at IPSN ’13, and ACM TECS ’13, which were really initial forays), but more broadly because many of us are becoming convinced that persistent interaction between (wo)man and computational machines is a defining theme of the next decade or two of technology, and sensors are the mediating entities.

I have also felt for a while now that most sensors, even (or especially?) when they are called ‘smart’, actually seem quite dumb. For instance, there is a wealth of papers that talk about smart this and smart that, when what they really mean is just what an AI or robotics person would call reactive. In this context, I found this article in the MIT Technology Review to be interesting. As the author says,

After trying some smart watches, I’ve determined that a good one will need to be more than just reliable and simple to use—it will have to learn when and how to bother me. This means figuring out what I’m doing, and judging what bits of information among countless e-mails, app updates, and other alerts are most pressing. And, naturally, it must look good.

I think this is more than just a product design issue. It is fairly challenging to achieve properly even from a research point of view, requiring the learning of models of human choice behaviour, among other things. There is also the issue of designing good sensors that can be deployed, e.g., see this article. Lots to do, but it’s the kind of thing that’ll be fun to do!
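
As a toy illustration of what “learning when and how to bother me” might involve (the features, data, and choice of classifier below are entirely made up for the sake of the sketch), one could train a simple model of whether the user will engage with a notification in the current context, and interrupt only when that probability is high:

```python
# A toy sketch of 'learning when to bother me': the features, data, and choice
# of classifier are entirely made up; the point is only that interruption
# decisions can be learned from the user's own responses.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Context features: [in_meeting, is_moving, sender_is_frequent_contact, hour_of_day / 24]
X = np.array([[1, 0, 0, 0.40], [0, 1, 1, 0.50], [0, 0, 1, 0.90],
              [1, 0, 1, 0.40], [0, 0, 0, 0.20], [0, 1, 0, 0.60]])
# Label: did the user engage with the notification within a minute of delivery?
y = np.array([0, 1, 1, 1, 0, 0])

model = LogisticRegression().fit(X, y)

def should_interrupt(context, threshold=0.5):
    """Buzz the watch only if the predicted engagement probability is high enough."""
    return model.predict_proba(np.array([context]))[0, 1] > threshold

print(should_interrupt([0, 0, 1, 0.75]))   # deliver now, or hold for later?
```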