Project COGLE, within the DARPA XAI Programme

We have been awarded one of the projects under the DARPA Explainable AI (XAI) programme, which kicks off next week. Our project, entitled COGLE (Common Ground Learning and Explanation), will be coordinated by the Xerox Palo Alto Research Center (PARC), and I will be a PI leading the technical efforts on the machine learning side of the architecture.

COGLE will be a highly interactive sensemaking system for explaining the learned performance capabilities of an autonomous system and the history that produced that learning. COGLE will initially be developed using an autonomous Unmanned Aircraft System (UAS) test bed that uses reinforcement learning (RL) to improve its performance. COGLE will support user sensemaking of autonomous system decisions, enable users to understand autonomous system strengths and weaknesses, convey an understanding of how the system will behave in the future, and provide ways for the user to improve the UAS’s performance.

To do this, COGLE will:

  1. Provide specific interactions in sensemaking user interfaces that directly support modes of human explanation known to be effective and efficient in human learning and understanding.
  2. Support mapping (grounding) of human conceptualizations onto the RL representations and processes.
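
As a toy illustration of what point 2 could look like in practice (this is purely my own sketch, not the COGLE design; the concept names, thresholds and state fields below are invented for the example), one can think of grounding as a mapping from human-level concept labels to predicates over the representation the RL agent actually uses:

```python
# Illustrative only: a toy "grounding" of human concept labels onto an RL state.
# The concepts, thresholds and state fields below are invented for this sketch.

from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class UASState:
    altitude: float        # metres above ground
    speed: float           # m/s
    dist_to_target: float  # metres

# Human-level concepts expressed as predicates over the RL state representation.
CONCEPTS: Dict[str, Callable[[UASState], bool]] = {
    "flying low":      lambda s: s.altitude < 30.0,
    "moving fast":     lambda s: s.speed > 15.0,
    "close to target": lambda s: s.dist_to_target < 50.0,
}

def ground(state: UASState) -> List[str]:
    """Return the human-level concepts that hold in the given state."""
    return [name for name, pred in CONCEPTS.items() if pred(state)]

if __name__ == "__main__":
    s = UASState(altitude=20.0, speed=18.0, dist_to_target=200.0)
    print(ground(s))   # e.g. ['flying low', 'moving fast']
```

The real challenge, of course, is to learn and verify such mappings rather than hand-code them, and to make them work in both directions so that explanations can be phrased in the human's vocabulary.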

This area is increasingly being discussed in the public sphere, in the context of the growing adoption of AI in daily life; see, for example, this article in the MIT Technology Review and this one in Nautilus, both of which refer directly to this DARPA programme. I look forward to contributing to this theme!

The program induction route to explainability and safety in autonomous systems

I will be giving a talk, as part of the IPAB seminar series, in which I will try to further develop this framing of the problem that I expect our group to address in the medium term. In a certain sense, my case will hark back to fairly well established techniques, familiar to engineers but slowly lost with the arrival of statistical methods in the robotics space. Many others are also picking up on the underlying needs; for example, this recent article in the MIT Technology Review gives a popular account of current sentiment among some in the AI community.

Title:

The program induction route to explainability and safety in autonomous systems

Abstract:

The confluence of advances in diverse areas, including machine learning, large scale computing and reliable commoditised hardware, has brought autonomous robots to the point where they are poised to be genuinely a part of our daily lives. Some of the application areas where this seems most imminent, e.g., autonomous vehicles, also bring with them stringent requirements regarding safety, explainability and trustworthiness. These needs seem to be at odds with the ways in which recent successes have been achieved, e.g., with end-to-end learning. In this talk, I will try to make a case for an approach to bridging this gap, through the use of programmatic representations that intermediate between opaque but efficient learning methods and other techniques for reasoning that benefit from ‘symbolic’ representations.

I will begin by framing the overall problem, drawing on some of the motivations of the DARPA Explainable AI programme (under the auspices of which we will be starting a new project shortly) and on extant ideas regarding safety and dynamical properties in the control theorists’ toolbox – also noting where new techniques have given rise to new demands.

Then, I will shift focus to results from one specific project, Grounding and Learning Instances through Demonstration and Eye tracking (GLIDE), which serves as an illustration of the starting point from which we will proceed within the DARPA project. The problem here is to learn the mapping between abstract plan symbols and their physical instances in the environment, i.e., physical symbol grounding, starting from cross-modal input that combines high-level task descriptions (e.g., from a natural language instruction) with a detailed video or joint-angle signal. This problem is formulated in terms of a probabilistic generative model and addressed using an algorithm for computationally feasible inference, which associates traces of task demonstrations with sequences of fixations that we call fixation programs.

I will conclude with some remarks regarding ongoing work that explicitly addresses the task of learning structured programs, and using them for reasoning about risk analysis, exploration and other forms of introspection.
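
To make the GLIDE part of the abstract slightly more concrete, here is a deliberately minimal sketch, entirely my own and far simpler than the probabilistic generative model used in the project, of the kind of alignment problem involved: associating a known, ordered sequence of plan symbols with contiguous segments of a demonstration trace, given only per-step likelihoods (all symbols and numbers below are invented):

```python
# A minimal, illustrative sketch (not the GLIDE model itself): aligning a known
# sequence of plan symbols to the time steps of a demonstration trace, using a
# Viterbi-style dynamic programme over per-step symbol log-likelihoods.

import numpy as np

def align_symbols(log_lik: np.ndarray) -> list:
    """
    log_lik[t, k]: log-likelihood of the signal at time t under plan symbol k.
    Symbols must be visited in order 0..K-1, each for a contiguous block of
    time (a monotone segmentation). Returns one symbol index per time step.
    """
    T, K = log_lik.shape
    neg_inf = -np.inf
    score = np.full((T, K), neg_inf)
    back = np.zeros((T, K), dtype=int)

    score[0, 0] = log_lik[0, 0]          # the trace must start at the first symbol
    for t in range(1, T):
        for k in range(K):
            stay = score[t - 1, k]                               # remain in symbol k
            advance = score[t - 1, k - 1] if k > 0 else neg_inf  # move on to symbol k
            if stay >= advance:
                score[t, k], back[t, k] = stay + log_lik[t, k], k
            else:
                score[t, k], back[t, k] = advance + log_lik[t, k], k - 1

    # Backtrace from the final symbol at the final time step.
    labels = [K - 1]
    for t in range(T - 1, 0, -1):
        labels.append(back[t, labels[-1]])
    return labels[::-1]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Fake per-step log-likelihoods for 3 symbols (e.g. "reach", "grasp", "place").
    log_lik = rng.normal(size=(12, 3))
    log_lik[:4, 0] += 3; log_lik[4:8, 1] += 3; log_lik[8:, 2] += 3
    print(align_symbols(log_lik))  # roughly [0,0,0,0, 1,1,1,1, 2,2,2,2]
```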

New JAIR paper: Exploiting Causality for Selective Belief Filtering in Dynamic Bayesian Networks

Our paper proposes a new way to account for “passivity” structure in Dynamic Bayesian Networks, which enables more efficient belief computations and, through that, improvements for systems modelled as POMDPs and the like. It was surprising to me, when we started this project, that despite significant earlier attention to exploiting conditional independence structure, there had been no work on using constraints of this kind (often imposed by physics or other background regularities) to make belief updates more efficient.

Please read the paper in JAIR: http://dx.doi.org/10.1613/jair.5044, abstract reproduced below:

Stefano V. Albrecht and Subramanian Ramamoorthy (2016) “Exploiting Causality for Selective Belief Filtering in Dynamic Bayesian Networks”, Journal of Artificial Intelligence Research, Volume 55, pages 1135-1178.

Dynamic Bayesian networks (DBNs) are a general model for stochastic processes with partially observed states. Belief filtering in DBNs is the task of inferring the belief state (i.e. the probability distribution over process states) based on incomplete and noisy observations. This can be a hard problem in complex processes with large state spaces. In this article, we explore the idea of accelerating the filtering task by automatically exploiting causality in the process. We consider a specific type of causal relation, called passivity, which pertains to how state variables cause changes in other variables. We present the Passivity-based Selective Belief Filtering (PSBF) method, which maintains a factored belief representation and exploits passivity to perform selective updates over the belief factors. PSBF produces exact belief states under certain assumptions and approximate belief states otherwise, where the approximation error is bounded by the degree of uncertainty in the process. We show empirically, in synthetic processes with varying sizes and degrees of passivity, that PSBF is faster than several alternative methods while achieving competitive accuracy. Furthermore, we demonstrate how passivity occurs naturally in a complex system such as a multi-robot warehouse, and how PSBF can exploit this to accelerate the filtering task.
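
For readers who want a feel for the core idea without going through the paper, here is a toy sketch (this is not the PSBF algorithm itself, and the variables, transition models and passivity structure are invented): in a factored belief state, a factor whose variable cannot be changed by the current action is simply carried forward, rather than recomputed, during the prediction step.

```python
# Illustrative sketch of selective belief updating in a factored model.
# NOT the PSBF algorithm from the paper; just the flavour of the idea: if an
# action cannot change a variable (the variable is "passive" for that action),
# its belief factor is copied forward instead of being recomputed.

import numpy as np

# Factored belief: independent marginals over two binary variables.
belief = {
    "door_open":   np.array([0.5, 0.5]),   # P(closed), P(open)
    "battery_low": np.array([0.9, 0.1]),   # P(ok), P(low)
}

# Per-variable transition matrices (row: current value, column: next value).
T = {
    "door_open":   np.array([[0.8, 0.2], [0.1, 0.9]]),
    "battery_low": np.array([[0.95, 0.05], [0.0, 1.0]]),
}

# Which variables each action can actually change; the rest are passive.
AFFECTS = {
    "push_door": {"door_open"},
    "wait":      set(),
}

def selective_update(belief, action, observations):
    """observations: dict variable -> likelihood vector P(obs | value)."""
    new_belief = {}
    for var, b in belief.items():
        if var in AFFECTS[action]:
            b = b @ T[var]            # full prediction step for affected factors
        # passive factors are carried over unchanged (the selective shortcut)
        if var in observations:
            b = b * observations[var]
            b = b / b.sum()           # renormalise after incorporating evidence
        new_belief[var] = b
    return new_belief

if __name__ == "__main__":
    obs = {"door_open": np.array([0.2, 0.8])}   # sensor suggests the door is open
    print(selective_update(belief, "push_door", obs))
    print(selective_update(belief, "wait", obs))
```

The paper's notion of passivity is richer than this action-based shortcut, and comes with bounds on the approximation error, but the computational saving has the same flavour.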

Two talks

I will be giving two talks, to a broad audience, on Thursday, 19th November.

The first of these is at an interdisciplinary workshop on Embodied Mind, Embodied Design, aimed at connecting researchers within the university who share an interest in this general direction. The partial line-up of speakers looks interesting, ranging from psychology and health science to music. Those at the University of Edinburgh might look into this event, which is happening in Room G32, 7 George Square.

The second is to the Edinburgh U3A Science group. I will be speaking to this group about Interactively Intelligent Robots: Prospects and Challenges.


AimBrain

My former student, Alesis Novik, is the co-founder of a start-up company, AimBrain, whose product is a biometric security layer that can be used with any data-sensitive mobile application. They are actively fundraising and winning targeted funding competitions, e.g., http://www.ed.ac.uk/informatics/news-events/recentnews/alumni-in-ubs-finals.

These are still early days but I am quite curious about the potential of this general line of research – using personalised attributes (the ‘biometrics’) as a stable source of identity. It would certainly begin to solve the annoying problem of passwords, but does it also have the potential in the longer term to genuinely act as the security side of ‘connected things’?

Action Priors and Place Cells

My former student, Benji Rosman, and I worked on an idea that we call action priors: a way for an agent to learn a task-independent domain model of which actions are worth considering when learning a policy for a new instance of a sequential decision-making task. This is described in a paper that has recently been accepted to the IEEE Transactions on Autonomous Mental Development.
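
To give a flavour of the idea (a minimal sketch of my own, much simpler than the formulation in the paper; the tasks, counts and smoothing below are invented for illustration), one can learn an action prior by counting, across the value functions of previously solved tasks, how often each action is near-optimal in each state:

```python
# A minimal sketch of the action-prior idea: given Q functions from several
# previously solved tasks over a shared state and action space, count, for
# each state, how often each action was (near-)optimal, and turn those counts
# into a state-conditioned distribution over actions.

import numpy as np

def learn_action_priors(q_functions, alpha=1.0, tol=1e-6):
    """
    q_functions: list of arrays of shape (n_states, n_actions), one per task.
    alpha: Dirichlet-style pseudo-count for smoothing.
    Returns P of shape (n_states, n_actions), where P[s] is the prior
    probability that each action is worth considering in state s.
    """
    n_states, n_actions = q_functions[0].shape
    counts = np.full((n_states, n_actions), alpha)
    for Q in q_functions:
        best = Q.max(axis=1, keepdims=True)
        counts += (Q >= best - tol)          # count all near-optimal actions
    return counts / counts.sum(axis=1, keepdims=True)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Three fake tasks over 4 states and 3 actions, where action 0 happens to
    # be optimal in state 0 in every task.
    tasks = [rng.normal(size=(4, 3)) for _ in range(3)]
    for Q in tasks:
        Q[0, 0] += 10.0
    priors = learn_action_priors(tasks)
    print(priors[0])   # action 0 gets most of the prior mass in state 0
```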

This has been a rewarding project in that, along the way, I have found several quite interesting connections. Firstly, I had always been curious about the famous experiments by Herb Simon and collaborators on board memory in chess: do they tell us something of broader relevance to intelligent agents? For instance, watch the clip below, paying special attention to the snippet starting at 1:25.

The kind of mistake he makes, involving only those two rooks, suggests that his representation is not just a compression of his perception (these rooks are by no means the least salient or least memorable pieces), but is intrinsically driven by value, along very similar lines, we argue, to action priors.

Subsequently, through a nice bit of serendipity, namely conversations with my colleague Matt Nolan, I have come to know about some phenomena associated with place cells, as studied by neuroscientists. Apparently, there are open questions about how place cells seem to represent intended destination in goal-directed maze tasks. The question we are now trying to answer is: does the concept of action priors, and this way of re-representing space, also give us a handle on how place cells behave?

I’ll give a talk on this topic at the upcoming Spatial Computation Workshop, abstract below.

Priors from learning over a lifetime and potential connections to place and grid cells
– Subramanian Ramamoorthy (joint work with Benjamin Rosman)

An agent tasked with solving a number of different decision making problems in similar environments has an opportunity to learn over a longer timescale than each individual task. Through examining solutions to different tasks, it can uncover behavioural invariances in the domain, by identifying actions to be prioritised in local contexts, invariant to task details. This information has the effect of greatly increasing the speed of solving new problems. We formalise this notion as action priors, defined as distributions over the action space, conditioned on environment state, and show how these can be learnt from a set of value functions.

Applying action priors in the setting of reinforcement learning, using examples involving spatial navigation tasks, we show the benefits of this kind of bias over action selection during exploration. Aggressive use of action priors performs context based pruning of the available actions, thus reducing the complexity of lookahead during search. Additionally, we show how action priors over observation features, rather than states, provide further flexibility and generalisability, with the additional benefit of enabling feature selection.

An alternate interpretation to these priors is as a re-parameterisation of the domain within which the various tasks have been defined. In this sense, these prior distributions bear a resemblance to attributes, such as spatial distribution and firing patterns, of place cells. We conclude by discussing this connection using a variation of the above experiments.
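
Finally, a small sketch of how such a prior might be used during learning (again illustrative rather than the paper's exact procedure): a standard epsilon-greedy rule whose exploratory draws follow the learnt prior, optionally pruning actions with negligible prior mass, rather than sampling uniformly.

```python
# Illustrative only: epsilon-greedy action selection where exploration samples
# from a learnt action prior (e.g. from the learn_action_priors sketch above)
# instead of the uniform distribution. All quantities are invented.

import numpy as np

def prior_guided_action(Q_row, prior_row, epsilon, rng, prune_below=0.0):
    """
    Q_row:     current Q-value estimates for one state, shape (n_actions,).
    prior_row: learnt action prior for that state, shape (n_actions,).
    With probability epsilon, explore by sampling from the prior (optionally
    pruning actions whose prior mass falls below `prune_below`); otherwise
    exploit the current Q estimates.
    """
    if rng.random() < epsilon:
        p = np.where(prior_row >= prune_below, prior_row, 0.0)
        p = p / p.sum()
        return int(rng.choice(len(Q_row), p=p))
    return int(np.argmax(Q_row))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Q_row = np.zeros(3)                       # untrained Q-values
    prior_row = np.array([0.7, 0.2, 0.1])     # e.g. learnt from earlier tasks
    picks = [prior_guided_action(Q_row, prior_row, 1.0, rng) for _ in range(1000)]
    print(np.bincount(picks) / 1000.0)        # empirical frequencies follow the prior
```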