Supervised (yes, supervised) learning with 0 examples
and other methods for obviating those pesky training sets
Erik G. Learned-Miller
Department of Computer Science
University of Massachusetts, Amherst
ABSTRACT:
Sometimes the first examples we see of a particular object or pattern
come at test time rather than at training time. A simple example
is reading a highly stylized font, say, on a storefront. Appearance
models trained a priori tend to classify the letters of such new fonts
very poorly.
In this talk, I discuss our recent work in addressing the difficult problem
of encountering new types of patterns at test time, especially those that are not well
modeled by training data, either labeled or unlabeled. In the first part
of the talk, I present ways of constraining the interpretations of patterns,
using constraints that are invariant to the patterns' appearance. This sounds
paradoxical, but is quite simple. For example, the string 01221221331 is an
encoding of a common string in which each distinct letter has been replaced
consistently by a digit.
(Can you guess the string?) We show how such techniques can be used to
provide important constraints in difficult problems like scene text recognition.
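
As a minimal sketch of this kind of appearance-invariant constraint (an illustration only, not the system described in the talk), the code below reduces a string to its repetition pattern and keeps only the lexicon words that share it. The function names `pattern` and `candidates` and the tiny word list are hypothetical, chosen for this example.

```python
def pattern(symbols):
    """Map each distinct symbol to the index of its first appearance.

    The result depends only on which positions repeat, not on what the
    symbols look like, so "banana" and "012121" give the same pattern.
    """
    first_seen = {}
    return tuple(first_seen.setdefault(s, len(first_seen)) for s in symbols)


def candidates(code, lexicon):
    """Return the lexicon words whose repetition pattern matches `code`."""
    target = pattern(code)
    return [word for word in lexicon if pattern(word) == target]


# Hypothetical three-word lexicon, for illustration only.
print(candidates("012121", ["banana", "letter", "papaya"]))  # ['banana']
```

Run against a large word list, the same filter typically leaves very few candidates for a long pattern such as the one above, which is what makes it a useful constraint when the glyphs themselves are unfamiliar.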
In the second part of the talk, I discuss our work in optical character recognition.
I discuss a "font free" OCR system which has never been trained on, or given
any information about, the specific appearance of any character, and yet can
easily read most of the text in most documents correctly. I also discuss new work
in bootstrapping training sets in OCR problems. In this work, we automatically
extract "training sets" from noisy documents so that we can dynamically build
document-specific models. We call this "Learning on the Fly".
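
The bootstrapping idea can be sketched as follows, under strong simplifying assumptions that are not taken from the talk: each glyph is a small feature vector, a generic recognizer has already produced a (label, confidence) guess for each glyph, and the document-specific model is a nearest-mean classifier. The function name `bootstrap_relabel` and the `threshold` parameter are hypothetical.

```python
from collections import defaultdict
from math import dist


def bootstrap_relabel(glyphs, guesses, threshold=0.9):
    """Relabel low-confidence glyphs using a model built from this document.

    glyphs  : list of feature vectors (lists of floats), one per glyph
    guesses : list of (label, confidence) pairs from a generic recognizer
    """
    # 1. Harvest a "training set" from the confident guesses only.
    harvested = defaultdict(list)
    for vec, (label, conf) in zip(glyphs, guesses):
        if conf >= threshold:
            harvested[label].append(vec)
    if not harvested:
        # Nothing is confident enough; fall back to the generic guesses.
        return [label for label, _ in guesses]

    # 2. Build the document-specific model: one mean vector per label.
    prototypes = {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in harvested.items()
    }

    # 3. Keep confident labels; reassign the rest to the nearest prototype.
    relabeled = []
    for vec, (label, conf) in zip(glyphs, guesses):
        if conf < threshold:
            label = min(prototypes, key=lambda k: dist(vec, prototypes[k]))
        relabeled.append(label)
    return relabeled
```

One limitation of this sketch is that a label with no confident example in the document can never be assigned in step 3; a practical system would need some fallback for such characters.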
Finally, I discuss potential applications of such ideas to other problems in
computer vision and pattern recognition.