Training the System

Despite the existence of an efficient estimation algorithm, training is still too slow
- A 1500-word document yields ~10k states
- Use the extract instead: down to 511 words, ~4k states
- Beam search: explore only the most likely 50% of the state space
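
The beam-search bullet can be illustrated with a minimal sketch. The slide does not specify the model, so this assumes a generic toy HMM and a forward pass in log space; the names `prune_to_beam`, `beam_forward`, `keep_fraction`, and the toy parameters are all hypothetical and not taken from the original system. After each observation, only the most likely half of the states is kept.

```python
import math

def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def prune_to_beam(scores, keep_fraction=0.5):
    """Keep the highest-scoring fraction of states (always at least one)."""
    keep = max(1, math.ceil(len(scores) * keep_fraction))
    best = sorted(scores, key=scores.get, reverse=True)[:keep]
    return {s: scores[s] for s in best}

def beam_forward(obs, states, log_init, log_trans, log_emit, keep_fraction=0.5):
    """Forward pass that prunes the state set after every observation.

    Returns an approximate log-likelihood; with keep_fraction=1.0 nothing
    is pruned and the result is the exact forward log-likelihood.
    """
    alpha = {s: log_init[s] + log_emit[s][obs[0]] for s in states}
    alpha = prune_to_beam(alpha, keep_fraction)
    for o in obs[1:]:
        nxt = {}
        for t in states:
            # Only states that survived pruning contribute to the sum.
            terms = [a + log_trans[s][t] for s, a in alpha.items()]
            nxt[t] = logsumexp(terms) + log_emit[t][o]
        alpha = prune_to_beam(nxt, keep_fraction)
    return logsumexp(list(alpha.values()))

# Toy two-state HMM (illustrative values only).
states = ["A", "B"]
log_init = {"A": math.log(0.6), "B": math.log(0.4)}
log_trans = {
    "A": {"A": math.log(0.7), "B": math.log(0.3)},
    "B": {"A": math.log(0.4), "B": math.log(0.6)},
}
log_emit = {
    "A": {"x": math.log(0.9), "y": math.log(0.1)},
    "B": {"x": math.log(0.2), "y": math.log(0.8)},
}
obs = ["x", "y", "x"]

full = beam_forward(obs, states, log_init, log_trans, log_emit, keep_fraction=1.0)
pruned = beam_forward(obs, states, log_init, log_trans, log_emit, keep_fraction=0.5)
print(full, pruned)
```

Pruning drops probability mass, so the pruned score can only underestimate the exact one; the trade is less work per step for a looser estimate.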

The model learns in an unsupervised manner from 2033 document/extract pairs