Supervised (yes, supervised) learning with 0 examples and other methods for obviating those pesky training sets

Erik G. Learned-Miller
Department of Computer Science
University of Massachusetts, Amherst

ABSTRACT:
Sometimes the first examples we see of particular objects or patterns come at test time rather than at training time. A simple example is reading a highly stylized font, say, on a storefront. Appearance models trained a priori tend to do very poorly in classifying the letters of such new fonts. In this talk, I discuss our recent work in addressing the difficult problem of encountering new types of patterns at test time, especially those that are not well modeled by training data, either labeled or unlabeled. In the first part of the talk, I present ways of constraining the interpretation of patterns using cues that are invariant to their appearance. This sounds paradoxical, but is quite simple. For example, the string 01221221331 is an encoding of a common string where each letter has been substituted with a digit. (Can you guess the string?) We show how such techniques can be used to provide important constraints in difficult problems like scene text recognition. In the second part of the talk, I discuss our work in optical character recognition. I discuss a "font-free" OCR system which has never been trained on, or given any information about, the specific appearance of any character, and yet can correctly read most of the text in most documents. I also discuss new work in bootstrapping training sets in OCR problems. In this work, we automatically extract "training sets" from noisy documents so that we can dynamically build document-specific models. We call this "Learning on the Fly". Finally, I discuss potential applications of such ideas to other problems in computer vision and pattern recognition.
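
The digit-string example in the abstract can be illustrated with a small Python sketch. It is only an illustration of the idea of an appearance-invariant constraint, not the system described in the talk. Each string is reduced to its repetition pattern (which positions share a symbol), and a word list is filtered for entries with the same pattern. The function names `repetition_pattern` and `matching_words`, and the tiny word list `LEXICON`, are assumptions made up for this example.

```python
def repetition_pattern(s):
    """Return a tuple giving, for each position, the rank of first appearance
    of the symbol at that position (0 for the first distinct symbol, 1 for
    the second, and so on). Two strings related by a one-to-one symbol
    substitution have the same pattern."""
    first_seen = {}
    pattern = []
    for symbol in s:
        if symbol not in first_seen:
            first_seen[symbol] = len(first_seen)
        pattern.append(first_seen[symbol])
    return tuple(pattern)


def matching_words(code, lexicon):
    """Return the words in `lexicon` whose repetition pattern equals that of
    `code`. No information about what any symbol looks like is used."""
    target = repetition_pattern(code)
    return [word for word in lexicon if repetition_pattern(word) == target]


# Hypothetical word list, chosen only for illustration.
LEXICON = ["probability", "recognition", "engineering",
           "mississippi", "massachusetts"]

if __name__ == "__main__":
    print(matching_words("01221221331", LEXICON))
```

With a realistic dictionary the filter would usually return a short list of candidates rather than a single word, which is why a cue like this acts as a constraint to combine with other evidence rather than as a recognizer on its own.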