Rule-bound robots and reckless humans

I found this article, and the associated discussion about what exactly is needed for a useful level of autonomy, really interesting:

A point that immediately stands out is this: “Researchers in the fledgling field of autonomous vehicles say that one of the biggest challenges facing automated cars is blending them into a world in which humans don’t behave by the book.” Roboticists should of course recognise that this is the real, complete problem. We can’t just complain about humans who do not ‘behave by the book’ – that is exactly the wrong way to approach the design of a usable product. Instead, we need to focus on making the autonomous system capable enough to learn and reason about the world – including other agents – despite their idiosyncrasies and irrationality. This really is the difference between the rote precision of the past and the genuinely robust autonomy of the future.

In our own small way, we have been approaching such issues with projects such as the following:

If you are a UK student looking to work on a PhD project in this area, look into this studentship opening:

Belief and Truth in Hypothesised Behaviours

My PhD student, Stefano Albrecht, will have his viva voce examination this Wednesday. As is the convention in some parts of our School, he will give a pre-viva talk at IF 2.33 between 10 and 11 am on Wednesday, 19th August.

His talk abstract: This thesis is concerned with a specific class of multiagent interaction problems, called ‘ad hoc coordination problems’, wherein the goal is to design an autonomous agent which can achieve flexible and efficient interaction with other agents whose behaviours are unknown. This problem is relevant for a number of applications, such as adaptive user interfaces, electronic trading markets, and robotic elderly care. A useful method of interaction in such problems is to hypothesise a set of possible behaviours, or ‘types’, and to plan our own actions with respect to those types which we believe are most likely, given the observed actions. We investigate the potential and limitations of this method in the context of ad hoc coordination, by addressing a spectrum of questions pertaining to the evolution and impact of beliefs as well as the implications and detection of incorrect hypothesised types. Specifically, how can evidence (i.e. observed actions) be incorporated into beliefs and under what conditions will the resulting beliefs be correct? What impact do prior beliefs (before observing any actions) have on our ability to maximise payoffs in the long-term and can they be computed automatically? Furthermore, what relation must the hypothesised types have to the true types in order for us to solve our task optimally, despite inaccuracies in hypothesised types? Finally, how can we ascertain the correctness of hypothesised types during the interaction, without knowledge of the true types? The talk will conclude with interesting open questions and future work.
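To make the core idea in the abstract concrete, here is a minimal sketch of a Bayesian belief update over hypothesised types: each type is a candidate behaviour (a mapping from state to a distribution over actions), and the observed action is treated as evidence. All names here (`update_beliefs`, the toy types) are illustrative assumptions, not the thesis's implementation.

```python
def update_beliefs(beliefs, types, state, observed_action):
    """Posterior over hypothesised types after observing one action.

    beliefs: dict mapping type name -> prior probability
    types:   dict mapping type name -> function (state -> dict of
             action -> probability), i.e. a hypothesised behaviour
    """
    posterior = {}
    for name, prior in beliefs.items():
        # Likelihood of the observed action under this hypothesised type
        likelihood = types[name](state).get(observed_action, 0.0)
        posterior[name] = prior * likelihood
    total = sum(posterior.values())
    if total == 0.0:
        # No hypothesised type explains the action; keep the prior unchanged
        return dict(beliefs)
    return {name: p / total for name, p in posterior.items()}

# Two toy hypothesised types: one always cooperates, one acts at random.
types = {
    "cooperator": lambda s: {"C": 1.0, "D": 0.0},
    "random":     lambda s: {"C": 0.5, "D": 0.5},
}
beliefs = {"cooperator": 0.5, "random": 0.5}
beliefs = update_beliefs(beliefs, types, state=None, observed_action="C")
print(beliefs)  # mass shifts towards 'cooperator' after observing "C"
```

The questions the thesis asks sit exactly around this loop: under what conditions does this posterior converge to the truth, how much do the priors matter, and what happens when none of the hypothesised types matches the true behaviour.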

While his thesis will become available in due course, you can get an idea of the main argument in this submission to the AI Journal, entitled Belief and Truth in Hypothesised Behaviours.

Learning to be thick skinned!

The following anecdote came in a posting to one of the mailing lists I subscribe to, on decision theory. The message, of course, is quite domain-independent, and in many ways transcends time too!

On Christmas Eve 1874, Tchaikovsky brought the score of his Piano Concerto no. 1 to the renowned pianist and conductor Nikolai Rubinstein, founder of the Moscow Conservatory, for advice on how to make the solo part more effective. This is how Tchaikovsky remembered it:

“I played the first movement. Not a single word, not a single comment! … I summoned all my patience and played through to the end. Still silence. I stood up and asked, ‘Well?’”

“Then a torrent poured forth from Nikolai Gregorievich’s mouth… My concerto, it turned out, was worthless and unplayable – passages so fragmented, so clumsy, so badly written as to be beyond rescue – the music itself was bad, vulgar – here and there I had stolen from other composers – only two or three pages were worth preserving – the rest must be thrown out or completely rewritten…”

“‘I shall not alter a single note,’ I replied. ‘I shall publish the work exactly as it stands!’ And this I did.”

The moral of the story: if you believe in the merits of your work, don’t let a bad referee report get you down. Listen to Tchaikovsky’s Piano Concerto no. 1 to lift your spirits, and move on.

Our agent, Edart, at the Trading Agent Competition 2015

My student, Stavros Gerakaris, has been working on applying our multi-agent learning ideas to the domain of ad exchanges. He is participating in the Sixteenth Annual Trading Agent Competition (TAC-15), conducted as part of AAMAS-15. His entry, entitled Edart, finished 2nd among the participants and 5th overall; last year’s winners still did better than all of us. This earns us a spot in the finals. If you’d like to know more about the background and setup of this competition, see this paper by Mariano Schain and Yishay Mansour.

People familiar with AMEC/TADA will realise that the main objective of these competitions is to try out our original ideas in a demanding, open-ended domain. In this sense, I am especially pleased that this agent has begun to validate our more theoretical work, in the form of the Harsanyi-Bellman Ad-hoc Coordination algorithm originally developed by Stefano Albrecht, which Stavros is using in a partially observable and censored-observation setting. In due course, this work will appear as a publication, so keep an eye on our publications list.

Action Priors and Place Cells

My former student, Benji Rosman, and I worked on an idea that we call action priors – a way for an agent to learn a task-independent domain model of which actions are worth considering when learning a policy for a new sequential decision-making task instance. This is described in a paper that has recently been accepted to the IEEE Transactions on Autonomous Mental Development.

This has been an interesting project in that, along the way, I have found several unexpected connections. Firstly, I had always been curious about the famous experiment by Herb Simon and collaborators on board memory in chess – does it tell us something of broader relevance to intelligent agents? For instance, watch the clip below, paying special attention to the snippet starting at 1:25.

The kind of mistake he makes – involving only those two rooks – suggests his representation is not just based on compressing his perception (these rooks are by no means the least salient or otherwise least memorable pieces), but is intrinsically driven by the value – along very similar lines, we argue, as action priors.

Subsequently, through a nice kind of serendipity – conversations with my colleague Matt Nolan – I have come to know about some phenomena associated with place cells, as studied by neuroscientists. Apparently, there are open questions about how place cells seem to represent intended destination in goal-directed maze tasks. The question we are now trying to figure out is this: does the concept of action priors, and this way of re-representing space, give us a handle on how place cells behave as well?

I’ll give a talk on this topic at the upcoming Spatial Computation Workshop, abstract below.

Priors from learning over a lifetime and potential connections to place and grid cells
– Subramanian Ramamoorthy (joint work with Benjamin Rosman)

An agent tasked with solving a number of different decision making problems in similar environments has an opportunity to learn over a longer timescale than each individual task. Through examining solutions to different tasks, it can uncover behavioural invariances in the domain, by identifying actions to be prioritised in local contexts, invariant to task details. This information has the effect of greatly increasing the speed of solving new problems. We formalise this notion as action priors, defined as distributions over the action space, conditioned on environment state, and show how these can be learnt from a set of value functions.

Applying action priors in the setting of reinforcement learning, using examples involving spatial navigation tasks, we show the benefits of this kind of bias over action selection during exploration. Aggressive use of action priors performs context based pruning of the available actions, thus reducing the complexity of lookahead during search. Additionally, we show how action priors over observation features, rather than states, provide further flexibility and generalisability, with the additional benefit of enabling feature selection.

An alternative interpretation of these priors is as a re-parameterisation of the domain within which the various tasks have been defined. In this sense, these prior distributions bear a resemblance to attributes, such as spatial distribution and firing patterns, of place cells. We conclude by discussing this connection using a variation of the above experiments.
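As a concrete illustration of the idea in the abstract, here is a minimal sketch of learning action priors from the value functions of previously solved tasks: for each state, count how often each action was (near-)optimal across tasks, then normalise the counts into a distribution. The function name, the count-based smoothing, and the toy Q-tables are all illustrative assumptions, not the paper's implementation.

```python
from collections import defaultdict

def learn_action_priors(q_functions, states, actions, tol=1e-6):
    """Action priors from a set of task value functions.

    q_functions: list of dicts mapping (state, action) -> value,
                 one per previously solved task.
    Returns a dict mapping state -> {action: prior probability}.
    """
    # Start every count at 1 so unseen actions keep some prior mass
    counts = defaultdict(lambda: {a: 1.0 for a in actions})
    for q in q_functions:
        for s in states:
            best = max(q[(s, a)] for a in actions)
            for a in actions:
                if q[(s, a)] >= best - tol:  # action was (near-)optimal here
                    counts[s][a] += 1.0
    priors = {}
    for s in states:
        total = sum(counts[s].values())
        priors[s] = {a: c / total for a, c in counts[s].items()}
    return priors

# Two toy tasks in a one-state domain where 'left' is optimal both times
states, actions = ["s0"], ["left", "right"]
q_task1 = {("s0", "left"): 1.0, ("s0", "right"): 0.0}
q_task2 = {("s0", "left"): 2.0, ("s0", "right"): 0.5}
priors = learn_action_priors([q_task1, q_task2], states, actions)
print(priors["s0"])  # 'left' receives most of the prior mass
```

During exploration on a new task, sampling exploratory actions from `priors[s]` instead of uniformly is the kind of context-based pruning the abstract describes: actions that were never useful in past tasks are rarely tried.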