Learning to be thick skinned!

The following anecdote came in a posting to one of the mailing lists I subscribe to, on decision theory. The message, of course, is quite domain-independent, and in many ways it transcends time too!

On Christmas Eve 1874, Tchaikovsky brought the score of his Piano Concerto No. 1 to Nikolai Rubinstein – the renowned pianist and conductor, and founder of the Moscow Conservatory – for advice on how to make the solo part more effective. This is how Tchaikovsky remembers it.

“I played the first movement. Not a single word, not a single comment! … I summoned all my patience and played through to the end. Still silence. I stood up and asked, ‘Well?’”

“Then a torrent poured forth from Nikolai Gregorievich’s mouth… My concerto, it turned out, was worthless and unplayable – passages so fragmented, so clumsy, so badly written as to be beyond rescue – the music itself was bad, vulgar – here and there I had stolen from other composers – only two or three pages were worth preserving – the rest must be thrown out or completely rewritten…”

“‘I shall not alter a single note,’ I replied. ‘I shall publish the work exactly as it stands!’ And this I did.”

The moral of the story: if you believe in the merits of your work, don’t let a bad referee report get you down. Listen to Tchaikovsky’s Piano Concerto No. 1 to lift your spirits and move on.

Our agent, Edart, at the Trading Agent Competition 2015

My student, Stavros Gerakaris, has been working on applying our multi-agent learning ideas to the domain of Ad Exchanges. He is participating in the Sixteenth Annual Trading Agent Competition (TAC-15), conducted as part of AAMAS-15. His entry, entitled Edart, finished 2nd among the participants and 5th overall; last year’s winners still did better than all of us. This earns us a spot in the finals. If you’d like to know more about the background and setup of this competition, see this paper by Mariano Schain and Yishay Mansour.

People familiar with AMEC/TADA will realise that the main objective of these competitions is to try out our original ideas in a demanding, open-ended domain. In this sense, I am especially pleased that this agent has begun to validate our more theoretical work in the form of the Harsanyi-Bellman Ad-hoc Coordination algorithm, originally developed by Stefano Albrecht, which Stavros is using in a partially observable and censored observation setting. In due course, this work will appear as a publication, so watch that space in our publications list.

Action Priors and Place Cells

My former student, Benji Rosman, and I worked on an idea that we called action priors – a way for an agent to learn a task-independent domain model of which actions are worth considering when learning a policy for a new sequential decision-making task instance. This is described in a paper that has recently been accepted to the IEEE Transactions on Autonomous Mental Development.

This has been an interesting project in that, along the way, I have found several intriguing connections. Firstly, I had always been curious about the famous experiment by Herb Simon and collaborators on board memory in chess – does it tell us something of broader relevance to intelligent agents? For instance, watch the clip below, paying special attention to the snippet starting at 1:25.

The kind of mistake he makes – involving only those two rooks – suggests that his representation is not just based on compressing his perception (these rooks are by no means the least salient or otherwise least memorable pieces), but is intrinsically driven by value – along very similar lines, we argue, to action priors.

Subsequently, through a nice kind of serendipity – i.e., conversations with my colleague Matt Nolan – I have come to learn about some phenomena associated with place cells as studied by neuroscientists. Apparently, there are open questions about how place cells seem to represent intended destination in goal-directed maze tasks. The question we are now trying to figure out is: does the concept of action priors, and this way of re-representing space, give us a handle on how place cells behave as well?

I’ll give a talk on this topic at the upcoming Spatial Computation Workshop, abstract below.

Priors from learning over a lifetime and potential connections to place and grid cells
– Subramanian Ramamoorthy (joint work with Benjamin Rosman)

An agent tasked with solving a number of different decision making problems in similar environments has an opportunity to learn over a longer timescale than each individual task. Through examining solutions to different tasks, it can uncover behavioural invariances in the domain, by identifying actions to be prioritised in local contexts, invariant to task details. This information has the effect of greatly increasing the speed of solving new problems. We formalise this notion as action priors, defined as distributions over the action space, conditioned on environment state, and show how these can be learnt from a set of value functions.

Applying action priors in the setting of reinforcement learning, using examples involving spatial navigation tasks, we show the benefits of this kind of bias over action selection during exploration. Aggressive use of action priors performs context-based pruning of the available actions, thus reducing the complexity of lookahead during search. Additionally, we show how action priors over observation features, rather than states, provide further flexibility and generalisability, with the additional benefit of enabling feature selection.

An alternative interpretation of these priors is as a re-parameterisation of the domain within which the various tasks have been defined. In this sense, these prior distributions bear a resemblance to attributes, such as spatial distribution and firing patterns, of place cells. We conclude by discussing this connection using a variation of the above experiments.
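To make the idea concrete, here is a minimal sketch of one way action priors could be learnt and used in a tabular setting – a toy illustration of my own, not the implementation from the paper. The function names, the Dirichlet-style pseudo-count alpha0, and the epsilon-greedy wrapper are all illustrative assumptions.

```python
import numpy as np

def learn_action_priors(q_tables, alpha0=1.0):
    """Estimate action priors from the Q-tables of previously solved tasks.

    q_tables: list of (n_states, n_actions) arrays, one per task.
    Returns an (n_states, n_actions) array whose rows are distributions:
    the smoothed frequency with which each action was optimal in that
    state across the solved tasks.
    """
    n_states, n_actions = q_tables[0].shape
    counts = np.full((n_states, n_actions), alpha0)  # Dirichlet pseudo-counts
    for q in q_tables:
        greedy = q.argmax(axis=1)  # optimal action in each state for this task
        counts[np.arange(n_states), greedy] += 1.0
    return counts / counts.sum(axis=1, keepdims=True)

def explore_with_prior(state, q, prior, epsilon=0.1, rng=np.random):
    """Epsilon-greedy action selection with exploration biased by the prior.

    With probability epsilon, sample an exploratory action from the learnt
    prior (rather than uniformly); otherwise act greedily on the current
    task's Q estimates.
    """
    if rng.random() < epsilon:
        return int(rng.choice(len(prior[state]), p=prior[state]))
    return int(np.argmax(q[state]))

# Toy usage: priors from three hypothetical tasks on a 5-state, 3-action domain.
tasks = [np.random.default_rng(i).normal(size=(5, 3)) for i in range(3)]
prior = learn_action_priors(tasks)
a = explore_with_prior(state=2, q=tasks[0], prior=prior)
```

The point of the sketch is the shape of the idea: the prior is task-independent, depending only on how often each action was useful in each state, and a sharp prior effectively prunes the actions considered during exploration.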

Code Yourself!

My friend, Areti Manataki, is one of the co-organisers of this excellent MOOC on Coursera, entitled “Code Yourself! An Introduction to Programming”. As the blurb on Coursera says, “Have you ever wished you knew how to program, but had no idea where to start from? This course will teach you how to program in Scratch, an easy-to-use visual programming language. More importantly, it will introduce you to the fundamental principles of computing and it will help you think like a software engineer.”

I like the emphasis on basics, and the desire to reach the broad audience of pre-college children. Many MOOCs I encounter are just college courses recycled. Instead, if MOOCs are to matter, and if they are to matter in the ways MOOCs are ambitiously advertised – i.e., in the developing world and in pursuit of helping new students who would not otherwise be served by existing formal programmes – this is the kind of entry point from which I’d expect to see progress.

I made a small contribution to this course, by giving a guest interview about our work with the RoboCup project – as a case study. If you go to this course, you’ll find this under Unit 3 as “(Optional Video) Interview on football-playing robots [08:41]”.

On blue sky work…

Useful perspective to keep in mind for the next time one receives unfairly critical comments about speculative work:

Successful research enables problems which once seemed hopelessly complicated to be expressed so simply that we soon forget that they ever were problems. Thus the more successful a research, the more difficult does it become for those who use the result to appreciate the labour which has been put into it. This perhaps is why the very people who live on the results of past researches are so often the most critical of the labour and effort which, in their time, is being expended to simplify the problems of the future.

– Sir Bennett Melvill Jones, British aerodynamicist.