Searn: Search-based Structured Prediction
Searn (searn.hal3.name) is a generic algorithm for
solving structured prediction problems. This page contains papers,
software and notes about using Searn for solving a variety of
problems. Feel free to contact me with
questions or comments.
Papers
-
Practical Structured Learning Techniques for Natural Language Processing.
Hal Daumé III. PhD Thesis, 2006 (USC).
This thesis describes an algorithm for solving many of the complex
prediction problems encountered in NLP applications. The algorithm comes
with strong theoretical guarantees, is empirically effective in
applications such as IE and summarization, is efficient and is easy to
implement.
[ BIB | PDF | PS | HTML ]
-
Search-based Structured Prediction.
Hal Daumé III, John Langford and Daniel Marcu. Submitted to Machine Learning Journal, 2006.
We present Searn (for "search-learn"), an algorithm for decomposing
structured prediction problems into a set of classification problems
solvable by standard classification methods. Searn works for
any loss function and any underlying classifier. Our
algorithm satisfies a desirable reduction guarantee: good performance
on binary prediction problems implies good performance on the original
problem. Searn is also efficient whenever the loss function is
efficiently approximately optimizable. We test Searn and show
state-of-the-art performance on several diverse structured prediction
problems.
[ BIB | PDF | PS ]
-
Searn in Practice.
Hal Daumé III, John Langford and Daniel Marcu. Unpublished, 2006.
We recently introduced an algorithm, Searn, for solving hard
structured prediction problems. This algorithm
enjoys many nice properties: efficiency, wide applicability,
theoretical justification and simplicity. However, in the effort to
fit a lot of information into the original paper, we may not have made
clear just how simple the technique is. This report is
designed to showcase how Searn can be applied to a wide variety of
problems and what really goes on behind the scenes. We will
make use of three example problems, ranging from simple to complex.
These are: (1) sequence labeling, (2) parsing and (3) machine
translation. (These were chosen to be as widely understandable,
especially in the NLP community, as possible.) In the end, we will
come back to discuss Searn for general problems.
[ BIB | PDF | PS ]
Implementation
I'm releasing a very simple, stripped-down implementation of Searn
(limited to sequence labeling with Hamming loss) that should answer some
questions people have been asking. You can download it here: SimpleSearn.tgz (requires Perl).
I'm also releasing a much more featureful implementation of Searn that can
accommodate pretty much any structured prediction task. You can download it
here: SearnShell_v0.1.tgz (requires Perl).
The cool thing about SearnShell is that you provide your own code for doing
search and computing features, and it automatically connects this to a base
learning algorithm like megam or libsvm.
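To give a feel for the idea, here is a minimal Python sketch, with every name hypothetical (this is not SearnShell's actual interface): the user supplies the code that enumerates decision points and extracts features, and the shell turns these into ordinary classification examples that a base learner can consume.

def make_classification_examples(examples, decision_points, features, policy):
    # Roll a policy over each structured example and emit one
    # (feature vector, action) training pair per decision point.
    # All names here are hypothetical placeholders.
    data = []
    for x, y in examples:                         # structured input/output pairs
        for state in decision_points(x, policy):  # states visited while decoding x
            feats = features(x, state)            # user-supplied feature extraction
            action = policy(state, y)             # action the current policy prefers
            data.append((feats, action))
    return data                                   # hand these to the base learner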
Frequently Asked Questions
Here, I'll attempt to answer some questions that I've either been
asked by email or in person.
Q:
In the Searn training algorithm, could you tell me at which step we
should perform the search process, using some kind of search algorithm
such as greedy search or beam search?
A:
Essentially never. I like to think of Searn as follows. We have a
structured prediction problem, which entails solving an argmax during
prediction. This argmax is intractable. So, normally, we will
use some search algorithm to approximate it (beam or greedy or...).
Whatever search algorithm we apply, the search is performed by making
a bunch of sequential decisions (each decision is a step in the search
process). The idea behind Searn is that instead of learning some sort
of global model and then searching (as is standard), we will
simply learn a classifier to make each of these decisions optimally.
The question then becomes how to train such a classifier, and Searn is
one possible solution. In the end, there is no search going
on. All you're doing is making a bunch of sequential decisions,
as if you were performing search, but you aren't
actually searching. So there is nowhere in the training or prediction
algorithm where you will run some search algorithm.
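To make this concrete, here is a minimal sketch of what prediction looks like for sequence labeling; the classifier and feature extractor are hypothetical placeholders. Each word gets one ordinary classification decision, possibly conditioned on the decisions already made, and nothing is ever searched.

def predict_sequence(words, classifier, extract_features):
    # Label a sentence left to right. Each step is a single multiclass
    # decision; there is no beam, no agenda and no global argmax.
    tags = []
    for i in range(len(words)):
        feats = extract_features(words, i, tags)  # may look at earlier predicted tags
        tags.append(classifier.predict(feats))    # one classification decision per word
    return tags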
Q:
The Searn algorithm takes an optimal policy as input. Should I think
of the optimal policy as an initial classifier? Could you tell me how
to construct this initial classifier?
A:
When you think about Searn as described in response to the previous
question, you see that training a classifier to make incremental
decisions requires that we get labeled training data for incremental
decisions. This is essentially where the optimal policy comes in.
The optimal policy tells us: for a given input (e.g., a sentence), true
output (e.g., the part of speech tags for this sentence) and some location
in the search space (e.g., after 3 words have been tagged), what is the
best action to take. Importantly, this is based on having
access to the true output sequence. For instance, in sequence
labeling, under Hamming loss, the optimal policy will always choose
simply to label the next word with its correct label.
The optimal policy is not a classifier, in the sense that it is not
learned. It is, essentially, exactly what we want our
classifiers to learn, since it always makes correct choices. It is up
to us as the algorithm designer to come up with a way of computing the
optimal policy on the basis of the true output. Essentially, what
we need to do is to find some way of structuring the search
space so that computing an optimal policy for our given loss function
is easy. I don't know a magic method for constructing it, but if you
look at the examples for sequence labeling, parsing and machine
translation, hopefully that will give some sort of idea.
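As a tiny concrete example, here is a sketch of the optimal policy for sequence labeling under Hamming loss; the names are illustrative and not taken from any released code.

def optimal_policy(true_tags, tags_so_far):
    # Under Hamming loss, no matter what mistakes were made earlier,
    # labeling the next word with its true tag minimizes the remaining loss.
    position = len(tags_so_far)   # number of words already tagged
    return true_tags[position]    # correct label for the next word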
More to come...
Last updated 5 May 2006