Project funded by the National Science Foundation (NCSE-1422492)
PI: Jordan Boyd-Graber
Labeling schemes such as the MeSH ontology help users understand large corpora of specialized text. However, despite the demonstrated utility of such techniques, they represent an uncertain value proposition, as they require huge investments of resources to both create and apply the labels.
For the latter problem, applying the labels, automatic labeling of text data improves the value proposition by reducing cost. However, the process of creating a broadly applicable, consistent, and generalizable label set and then applying it to a dataset is long and difficult.
To solve the problem of label creation, we present ALTO (Active Learning with Topic Overviews), an interactive tool for document labeling that uses topic models to help users assign appropriate labels to documents. We show that annotators given a topic modeling overview label a document collection more quickly (higher value, lower cost) and that their efforts result in a more useful (in our experiments, higher purity) system.
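To make the purity measure mentioned above concrete, here is a minimal sketch in Python. It assumes one common definition of purity: group documents by the label an annotator assigned, count the most frequent gold label in each group, and divide the total by the number of documents. The function name, the toy labels, and the exact definition are illustrative assumptions, not the evaluation code used in the project.

```python
from collections import Counter, defaultdict


def purity(assigned_labels, gold_labels):
    """Fraction of documents that carry the majority gold label
    of the assigned-label group they were placed in."""
    if len(assigned_labels) != len(gold_labels):
        raise ValueError("label lists must have the same length")
    if not assigned_labels:
        raise ValueError("label lists must not be empty")

    # For every assigned label, count how often each gold label occurs.
    groups = defaultdict(Counter)
    for assigned, gold in zip(assigned_labels, gold_labels):
        groups[assigned][gold] += 1

    # Each group contributes the size of its largest gold-label bucket.
    majority_total = sum(max(counts.values()) for counts in groups.values())
    return majority_total / len(assigned_labels)


# Hypothetical toy data: labels from an annotator vs. a gold standard.
user_labels = ["health", "health", "health", "energy", "energy", "tax"]
gold_labels = ["medicine", "medicine", "education", "energy", "energy", "finance"]

score = purity(user_labels, gold_labels)
print(f"purity = {score:.3f}")
```

A score of 1.0 means every group of identically labeled documents maps onto a single gold label. Purity rewards fine-grained label sets, so it is usually read alongside the number of labels used.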
Jordan Boyd-Graber, Assistant Professor, Computer Science (Colorado)
Fenfei Guo, PhD Student, Computer Science (Colorado)
You Lu, MS Student, Computer Science (Colorado)
Forough Poursabzi, PhD Student, Computer Science (Colorado)
@article{Gerow:Hu:Boyd-Graber:Blei:Evans-2018, Title = {Measuring Discursive Influence Across Scholarship}, Author = {Aaron Gerow and Yuening Hu and Jordan Boyd-Graber and David M. Blei and James A. Evans}, Journal = {Proceedings of the National Academy of Sciences}, Year = {2018}, }
@inproceedings{Lu:Lund:Boyd-Graber-2017, Title = {Why ADAGRAD Fails for Online Topic Modeling}, Author = {You Lu and Jeff Lund and Jordan Boyd-Graber}, Booktitle = {Empirical Methods in Natural Language Processing}, Url = {http://cs.umd.edu/~jbg//docs/2017_emnlp_adagrad_olda.pdf}, Year = {2017}, Location = {Copenhagen, Denmark}, }
@inproceedings{Poursabzi-Sangdeh:Boyd-Graber:Findlater:Seppi-2016, Title = {ALTO: Active Learning with Topic Overviews for Speeding Label Induction and Document Labeling}, Author = {Forough Poursabzi-Sangdeh and Jordan Boyd-Graber and Leah Findlater and Kevin Seppi}, Booktitle = {Association for Computational Linguistics}, Year = {2016}, Location = {Berlin, Germany}, Url = {http://cs.umd.edu/~jbg//docs/2016_acl_doclabel.pdf}, }
@inbook{Klochikhin:Boyd-Graber-2016, Editor = {Julia Lane and Ian Foster and Rayid Ghani and Ron Jarmin and Frauke Kreuter}, Title = {Text Analysis}, Author = {Evgeny Klochikhin and Jordan Boyd-Graber}, Booktitle = {Big Data and Social Science Research: Theory and Practical Approaches}, Publisher = {Taylor \& Francis}, Year = {2016}, }
@inproceedings{Poursabzi-Sangdeh:Boyd-Graber-2015, Title = {Speeding Document Annotation with Topic Models}, Author = {Forough Poursabzi-Sangdeh and Jordan Boyd-Graber}, Booktitle = {NAACL Student Research Workshop}, Year = {2015}, Location = {Denver, CO}, }
This work is supported by the National Science Foundation. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the researchers and do not necessarily reflect the views of the National Science Foundation.