Facts from Human Annotation
Function words are hard to align
16% of abstract words are unaligned
Average summary phrase length is ~1.3 words
- 80% are singletons, 6.1% are phrases of length 2, 2.2% are phrases of length 3
67% of aligned summary phrases are identical up to stem (86% for singletons, 48% for multi-word phrases)
Distance between the document words aligned to consecutive summary words is ~1
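As a rough illustration of how figures like the ones above could be computed from an annotated alignment, here is a minimal Python sketch. The data layout (half-open word-index spans), the function name `alignment_stats`, and the exact definition of "distance" are assumptions made for this example, not the annotation format or procedure actually used.

```python
from statistics import mean

def alignment_stats(summary_len, alignments):
    """Compute simple statistics over one summary/document alignment.

    Hypothetical layout: `alignments` is a list of
    ((s_start, s_end), (d_start, d_end)) pairs of half-open word-index
    spans, sorted by summary position. Summary words not covered by any
    span are counted as unaligned.
    """
    phrase_lengths = [s_end - s_start for (s_start, s_end), _ in alignments]
    unaligned_words = summary_len - sum(phrase_lengths)

    # Assumed distance definition: from the last word of one aligned
    # document span to the first word of the next aligned document span.
    # Two adjacent document spans therefore have distance 1.
    jumps = [
        abs(nxt_doc[0] - (cur_doc[1] - 1))
        for (_, cur_doc), (_, nxt_doc) in zip(alignments, alignments[1:])
    ]

    return {
        "unaligned_fraction": unaligned_words / summary_len,
        "avg_phrase_length": mean(phrase_lengths),
        "singleton_fraction": sum(n == 1 for n in phrase_lengths) / len(phrase_lengths),
        "mean_jump": mean(jumps) if jumps else 0.0,
    }

# Toy alignment: a 5-word summary with three aligned phrases and one
# unaligned word (summary index 4).
toy = [((0, 1), (10, 11)), ((1, 3), (11, 13)), ((3, 4), (20, 21))]
stats = alignment_stats(5, toy)
```

In the toy example, the first two document spans are adjacent (distance 1) and the third lies further away, so the mean distance is dominated by the one long jump. Corpus-level numbers would be obtained by pooling phrases and jumps over all annotated pairs before averaging.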