Searn: Search-based Structured Prediction
Searn (searn.hal3.name) is a generic algorithm for
solving structured prediction problems. This page contains papers,
software and notes about using Searn for solving a variety of
problems. Feel free to contact me with
questions or comments.
Papers
-
Practical Structured Learning Techniques for Natural Language Processing.
Hal Daumé III. PhD Thesis, 2006 (USC).
This thesis describes an algorithm for solving many of the complex
prediction problems encountered in NLP applications. The algorithm comes
with strong theoretical guarantees, is empirically effective in
applications such as IE and summarization, is efficient and is easy to
implement.
[ BIB | PDF | PS | HTML ]
-
Search-based Structured Prediction.
Hal Daumé III, John Langford and Daniel Marcu. Submitted to Machine Learning Journal, 2006.
We present Searn (for "search-learn"), an algorithm for decomposing
structured prediction problems into a set of classification problems
solvable by standard classification methods. Searn works for
any loss function and any underlying classifier. Our
algorithm satisfies a desirable reduction guarantee: good performance
on binary prediction problems implies good performance on the original
problem. Searn is also efficient whenever the loss function is
efficiently approximately optimizable. We test Searn and show
state-of-the-art performance on several diverse structured prediction
problems.
[ BIB | PDF | PS ]
-
Searn in Practice.
Hal Daumé III, John Langford and Daniel Marcu. Unpublished, 2006.
We recently introduced an algorithm, Searn, for solving hard
structured prediction problems. This algorithm
enjoys many nice properties: efficiency, wide applicability,
theoretical justification and simplicity. However, in the effort to
fit a lot of information into the original paper, we may not have made
clear just how simple the technique is. This report is
designed to showcase how Searn can be applied to a wide variety of
problems and what really goes on behind the scenes. We will
make use of three example problems, ranging from simple to complex.
These are: (1) sequence labeling, (2) parsing and (3) machine
translation. (These were chosen to be as widely understandable,
especially in the NLP community, as possible.) In the end, we will
come back to discuss Searn for general problems.
[ BIB | PDF | PS ]
Implementation
I'm releasing a very simple, stripped-down implementation of Searn
(limited to sequence labeling with Hamming loss) that should answer some
questions people have been asking. You can download it here: SimpleSearn.tgz (requires Perl).
I'm also releasing a much more featureful implementation of Searn that can
accommodate pretty much any structured prediction task. You can download it
here: SearnShell_v0.1.tgz (requires Perl).
The cool thing about SearnShell is that you provide your own code for doing
search and computing features, and it automatically connects this to a base
learning algorithm like megam or libsvm.
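To give a feel for the idea, here is a minimal Python sketch, with every name hypothetical (this is not SearnShell's actual interface): the user supplies the code that enumerates decision points and extracts features, and the shell turns these into ordinary classification examples that a base learner can consume.

def make_classification_examples(examples, decision_points, features, policy):
    # Roll a policy over each structured example and emit one
    # (feature vector, action) training pair per decision point.
    # All names here are hypothetical placeholders.
    data = []
    for x, y in examples:                         # structured input/output pairs
        for state in decision_points(x, policy):  # states visited while decoding x
            feats = features(x, state)            # user-supplied feature extraction
            action = policy(state, y)             # action the current policy prefers
            data.append((feats, action))
    return data                                   # hand these to the base learner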
Frequently Asked Questions
Here, I'll attempt to answer some questions that I've either been
asked by email or in person.
Q:
In the Searn training algorithm, could you tell me at which step we
should perform the search process, using some kind of search algorithm
such as greedy search or beam search?
A:
Essentially never. I like to think of Searn as follows. We have a
structured prediction problem, which entails solving an argmax during
prediction. This argmax is intractable. So, normally, we will
use some search algorithm to approximate it (beam or greedy or...).
Whatever search algorithm we apply, the search is performed by making
a bunch of sequential decisions (each decision is a step in the search
process). The idea behind Searn is that instead of learning some sort
of global model and then searching (as is standard), we will
simply learn a classifier to make each of these decisions optimally.
The question then becomes how to train such a classifier, and Searn is
one possible solution. In the end, there is no search going
on. All you're doing is making a bunch of sequential decisions,
as if you were performing search, but you aren't
actually searching. So there is nowhere in the training or prediction
algorithm where you will run some search algorithm.
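To make this concrete, here is a minimal sketch of what prediction looks like for sequence labeling; the classifier and feature extractor are hypothetical placeholders. Each word gets one ordinary classification decision, possibly conditioned on the decisions already made, and nothing is ever searched.

def predict_sequence(words, classifier, extract_features):
    # Label a sentence left to right. Each step is a single multiclass
    # decision; there is no beam, no agenda and no global argmax.
    tags = []
    for i in range(len(words)):
        feats = extract_features(words, i, tags)  # may look at earlier predicted tags
        tags.append(classifier.predict(feats))    # one classification decision per word
    return tags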
Q:
The Searn algorithm takes an optimal policy as input. Should I think
of the optimal policy as an initial classifier? Could you tell me how
to construct this initial classifier?
A:
When you think about Searn as described in response to the previous
question, you see that training a classifier to make incremental
decisions requires that we get labeled training data for incremental
decisions. This is essentially where the optimal policy comes in.
The optimal policy tells us: for a given input (e.g., a sentence), true
output (e.g., the part of speech tags for this sentence) and some location
in the search space (e.g., after 3 words have been tagged), what is the
best action to take. Importantly, this is based on having
access to the true output sequence. For instance, in sequence
labeling, under Hamming loss, the optimal policy will always choose
simply to label the next word with its correct label.
The optimal policy is not a classifier, in the sense that it is not
learned. It is, essentially, exactly what we want our
classifiers to learn, since it always makes correct choices. It is up
to us as the algorithm designer to come up with a way of computing the
optimal policy on the basis of the true output. Essentially, what
we need to do is to find some way of structuring the search
space so that computing an optimal policy for our given loss function
is easy. I don't know a magic method for constructing it, but if you
look at the examples for sequence labeling, parsing and machine
translation, hopefully that will give some sort of idea.
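As a tiny concrete example, here is a sketch of the optimal policy for sequence labeling under Hamming loss; the names are illustrative and not taken from any released code.

def optimal_policy(true_tags, tags_so_far):
    # Under Hamming loss, no matter what mistakes were made earlier,
    # labeling the next word with its true tag minimizes the remaining loss.
    position = len(tags_so_far)   # number of words already tagged
    return true_tags[position]    # correct label for the next word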
More to come...
Last updated 5 May 2006