My former student, Benji Rosman, and I worked on an idea that we called action priors: a way for an agent to learn a task-independent domain model of which actions are worth considering when learning a policy for a new instance of a sequential decision-making task. This is described in a paper that has recently been accepted to the IEEE Transactions on Autonomous Mental Development.
This has been a rewarding project in that, along the way, I have found several interesting connections. Firstly, I had always been curious about the famous experiment by Herb Simon and collaborators on board memory in chess: does it tell us something of broader relevance to intelligent agents? For instance, watch the clip below, paying special attention to the snippet starting at 1:25.
The kind of mistake he makes, involving only those two rooks, suggests that his representation is not just based on compressing his perception (these rooks are by no means the least salient or otherwise least memorable pieces), but is intrinsically driven by value, along very similar lines, we argue, to action priors.
Subsequently, through a nice kind of serendipity (namely, conversations with my colleague Matt Nolan), I have come to know about some phenomena associated with place cells as studied by neuroscientists. Apparently, there are open questions about how place cells seem to represent the intended destination in goal-directed maze tasks. The question we are now trying to answer is: does the concept of action priors, and this way of re-representing space, give us a handle on how place cells behave as well?
I’ll give a talk on this topic at the upcoming Spatial Computation Workshop, abstract below.
Priors from learning over a lifetime and potential connections to place and grid cells
– Subramanian Ramamoorthy (joint work with Benjamin Rosman)
An agent tasked with solving a number of different decision making problems in similar environments has an opportunity to learn over a longer timescale than each individual task. Through examining solutions to different tasks, it can uncover behavioural invariances in the domain, by identifying actions to be prioritised in local contexts, invariant to task details. This information has the effect of greatly increasing the speed of solving new problems. We formalise this notion as action priors, defined as distributions over the action space, conditioned on environment state, and show how these can be learnt from a set of value functions.
Applying action priors in the setting of reinforcement learning, using examples involving spatial navigation tasks, we show the benefits of this kind of bias over action selection during exploration. Aggressive use of action priors performs context based pruning of the available actions, thus reducing the complexity of lookahead during search. Additionally, we show how action priors over observation features, rather than states, provide further flexibility and generalisability, with the additional benefit of enabling feature selection.
An alternative interpretation of these priors is as a re-parameterisation of the domain within which the various tasks have been defined. In this sense, these prior distributions bear a resemblance to attributes of place cells, such as their spatial distribution and firing patterns. We conclude by discussing this connection using a variation of the above experiments.
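To make the abstract's definition a little more concrete, here is a minimal Python sketch of action priors as per-state distributions over actions, estimated from a set of value functions and then used to bias exploration and prune actions. It is an illustration under stated assumptions, not the method from the paper: the counting rule (an action is credited whenever it is greedy in some task), the smoothing term `pseudo_count`, the pruning `threshold`, and all function names are hypothetical choices made for this example.

```python
import random


def learn_action_priors(q_functions, n_states, n_actions, pseudo_count=1.0):
    """Estimate P(action | state) from a list of per-task Q-tables.

    Each table is indexed as q[state][action]. Assumption for this sketch:
    an action is credited for a state each time it is greedy (ties included)
    in one of the tasks; pseudo_count smooths the counts so that no action
    ends up with probability zero.
    """
    counts = [[pseudo_count] * n_actions for _ in range(n_states)]
    for q in q_functions:
        for s in range(n_states):
            best = max(q[s])
            for a in range(n_actions):
                if q[s][a] == best:
                    counts[s][a] += 1.0
    # Normalise each state's counts into a probability distribution.
    return [[c / sum(row) for c in row] for row in counts]


def select_action(q, priors, state, epsilon, rng):
    """Epsilon-greedy selection where exploratory draws follow the prior
    for the current state instead of a uniform distribution."""
    n_actions = len(priors[state])
    if rng.random() < epsilon:
        return rng.choices(range(n_actions), weights=priors[state], k=1)[0]
    return max(range(n_actions), key=lambda a: q[state][a])


def prune_actions(priors, state, threshold):
    """Aggressive use of the prior: keep only actions with enough mass."""
    return [a for a, p in enumerate(priors[state]) if p >= threshold]


# Toy data (made up for illustration): two tasks, two states, three actions.
q_task1 = [[1.0, 0.0, 0.0], [0.0, 0.5, 0.2]]
q_task2 = [[0.9, 0.1, 0.0], [0.0, 0.0, 0.7]]

priors = learn_action_priors([q_task1, q_task2], n_states=2, n_actions=3)
kept_state0 = prune_actions(priors, state=0, threshold=0.3)  # [0]
kept_state1 = prune_actions(priors, state=1, threshold=0.3)  # [1, 2]

rng = random.Random(0)
explore = select_action(q_task1, priors, state=0, epsilon=1.0, rng=rng)
greedy = select_action(q_task1, priors, state=1, epsilon=0.0, rng=rng)  # 1
```

In state 0 both toy tasks agree on action 0, so the prior concentrates there and pruning leaves a single candidate. In state 1 the tasks disagree, so the prior keeps both of the actions that were useful somewhere and discards only the one that never was.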