Is there just one kind of learning problem out there?

… correspondingly, is there one ‘holy grail’ learning method that solves all these problems?

Over the past few months, while preparing courses and trying to identify the essential driving themes – as opposed to opportunistic or accidental advances – within AI and learning, I find myself settling on the opinion that there are a few major, essentially different, problem types or categories out there. Depending on which of these categories you find most appealing, you form a correspondingly different picture of what is hard about designing intelligent systems. So, many discussions about methodology, about what kinds of tools ought to be developed and what defines the essence of the field, are best understood by first stating one's affiliation in this sense, to clarify where one is coming from.

I am not yet able to list all such categories but I think there are at least three that stand out to the extent that I have tried to understand them first hand:

  1. Natural language, object recognition in vision, etc.: Here, one of the most important hard problems is to capture the immense variability of the underlying 'grammar'. So, one of the most successful approaches to solving the problem – a la Google – has been to find technologies to acquire and process massive databases of naturally occurring examples and to use them to induce the grammar.
  2. Decision making under uncertainty, especially in an 'open environment': Here, in my opinion, one of the hardest problems is that one is trying to respond autonomously to an immense variability in the possible evolutions of a system (many parallels with the previous category), but often in a setting where it is not possible or tractable to gather enough data to enumerate the possible worlds. The most successful approaches here have involved cleverly posing these problems so that this essential issue is reduced, essentially by reducing the extent to which the system is required to be autonomous, to one of a few familiar paradigms, e.g., MDPs for RL, where some version of the tools from the previous bullet point suffices (a minimal sketch of such a reduction appears after this list).
  3. Mechanism design, communication protocols, etc.: I think I need to come up with a better label for this category, but I am referring to the problem of trying to engineer mechanisms such as auctions and voting schemes. Most people thinking about the first bullet point would not even have this on their radar, but many people interested in the second bullet point would feel a bit more affinity with these issues.
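To make the reduction mentioned in the second category a bit more concrete, here is a minimal sketch of what 'posing a problem as an MDP' buys you: once states, actions, transition probabilities and rewards are written down, a generic solver such as value iteration produces a policy. The tiny two-state example below, and all its numbers, are my own illustrative assumptions rather than anything from a real domain.

```python
# A minimal sketch: given an explicit MDP, value iteration yields a policy.
# All states, actions, probabilities and rewards are made-up assumptions.

# states: 0 = "safe", 1 = "exposed"; actions: 0 = "hide", 1 = "forage"
P = {  # P[s][a] = list of (next_state, probability)
    0: {0: [(0, 1.0)],            1: [(0, 0.7), (1, 0.3)]},
    1: {0: [(0, 0.9), (1, 0.1)],  1: [(1, 1.0)]},
}
R = {  # R[s][a] = immediate expected reward
    0: {0: 0.0, 1: 1.0},
    1: {0: -0.5, 1: -2.0},
}
gamma = 0.95  # discount factor

V = {s: 0.0 for s in P}
for _ in range(1000):  # value iteration to (approximate) convergence
    V = {s: max(R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                for a in P[s])
         for s in P}

# greedy policy with respect to the converged values
policy = {s: max(P[s], key=lambda a: R[s][a] + gamma *
                 sum(p * V[s2] for s2, p in P[s][a]))
          for s in P}
print(V, policy)
```

Of course, the hard part discussed above is hidden in the assumption that the transition and reward tables are known and small; that is precisely the sense in which the reduction trades away some autonomy for tractability.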

My point in categorizing things this way is to understand how, and to what extent, rapid advances in specific types of tools change or define the game for each of these problem types. For instance, probabilistic graphical models have brought about an entire reformation, along with a wealth of tools for efficient inference and learning in a certain class of models. This has naturally resulted in many follow-on innovations such as Bayesian RL, where one rephrases the good old dilemma of exploration vs. exploitation as a different problem involving inference over distributions, using powerful new tools. Naively, on the face of it, one is led to believe that there are no real problems of type 2 – they just need to be mapped to type 1 and one ought somehow to adapt the machinery.
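To ground the Bayesian RL remark, here is a minimal sketch in the spirit of Thompson sampling for a Bernoulli bandit: exploration vs. exploitation is handled implicitly by maintaining a posterior over the unknown arm probabilities and acting greedily with respect to a sample from it. The three arms and their probabilities are made-up assumptions for illustration only.

```python
import random

# Exploration vs. exploitation recast as inference: keep a Beta posterior per
# arm, sample a plausible world from it, and act greedily in that sampled
# world (Thompson sampling). Arm probabilities below are invented.
true_arm_probs = [0.3, 0.5, 0.7]   # unknown to the agent
alpha = [1.0] * 3                  # Beta posterior parameters per arm
beta = [1.0] * 3

for t in range(10_000):
    sampled = [random.betavariate(alpha[i], beta[i]) for i in range(3)]
    arm = max(range(3), key=lambda i: sampled[i])
    reward = 1 if random.random() < true_arm_probs[arm] else 0
    alpha[arm] += reward           # Bayesian update of the chosen arm
    beta[arm] += 1 - reward

# posterior means per arm after learning
print([round(alpha[i] / (alpha[i] + beta[i]), 2) for i in range(3)])
```

Run as-is, the posterior means typically settle near the true probabilities of the arms the agent keeps pulling, which is exactly what makes the 'map type 2 onto type 1' story so tempting.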

This is exciting. However, is it true that there are no problems of type 2? Moreover, what happens by the time you approach problems like those in type 3? If your real goal is to achieve real autonomy, it seems like something more may be needed. In particular, consider how exceptions are treated: when classifying speech, one may simply put all utterances that are vague into one big miscellaneous basket. If you get to, say, 97% accuracy, you've done well. If you are acting in the real world, even if you can't identify your predator, you still need to make a move, unique to that time and place and with real consequences. Indeed, this may also be a loss minimization, and one could perhaps try to handle the uncertainty as a distribution over paths, but it is not clear that is the best way to proceed. Essentially always doing the right sort of thing with limited resources and information, at the cost of sub-optimality, is far more important than statistically often making the optimal inference. So, when one goes about learning, one may wish to think differently about the point of the exercise. What does this imply regarding the need for a science and tool set explicitly dedicated to these issues?
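To make the contrast a little more concrete, here is a toy sketch of one version of the 'handle the uncertainty as a distribution' route: rather than committing to the most probable label and stopping, the agent picks the action with the lowest expected loss under its posterior. The labels, posterior and loss table are invented assumptions; the only point is that the chosen action can differ from the most probable classification.

```python
# A classifier that is unsure can abstain, but an acting agent must still
# commit to a move. One option (not necessarily the best, as noted above) is
# to minimize expected loss under the current posterior. All numbers invented.

posterior = {"predator": 0.2, "prey": 0.3, "nothing": 0.5}  # uncertain belief

# loss[action][world]: cost of taking that action in each possible world
loss = {
    "flee":   {"predator": 0.0,  "prey": 5.0, "nothing": 1.0},
    "attack": {"predator": 9.0,  "prey": 0.0, "nothing": 2.0},
    "ignore": {"predator": 50.0, "prey": 3.0, "nothing": 0.0},
}

def expected_loss(action):
    return sum(posterior[w] * loss[action][w] for w in posterior)

# a pure classifier would output "nothing" (the most probable label) and stop;
# the acting agent instead commits to the action with lowest expected loss
best_action = min(loss, key=expected_loss)
print(best_action, {a: round(expected_loss(a), 2) for a in loss})
```

Here the most probable world is 'nothing', yet the lowest-expected-loss action is to flee; whether this kind of expected-loss calculation is really the right frame for always-on, resource-limited action is exactly the question raised above.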

My question is not rhetorical. I am genuinely seeking to understand to what extent, and how, these problems are essentially similar or different. Is there indeed a single unifying theme, with all of these problems being merely instances? Is it the case that the latter problems are hard in a way that is different from the former? I'm curious whether others have further clarifying insights.

I am also thinking here of trends in other established fields, e.g., solvers for ODEs and PDEs. These are fundamental to entire branches of engineering, especially mechanical and chemical. I have heard arguments to the effect that all there is to these fields is a few domain-specific facts and, mainly, fancy solvers and optimization procedures. However, a good designer (of cars, oil wells and prosthetics) seems to operate differently. Advances in these fields come from a clever combination of good (but far from perfect and often 'outdated') numerical tools with techniques that take into account the unique difficulty of design.
