Assistive Technology ------------------------- Sonya S. Nikolova, Jordan Boyd-Graber, and Christiane Fellbaum. Collecting Semantic Similarity Ratings to Connect Concepts in Assistive Communication Tools. Modeling, Learning and Processing of Text Technological Data Structures, 2011. http://umiacs.umd.edu/~jbg/docs/2011_book_chapter_evocation.pdf Sonya S. Nikolova, Jordan Boyd-Graber, Christiane Fellbaum, and Perry Cook. Better Vocabularies for Assistive Communication Aids: Connecting Terms using Semantic Networks and Untrained Annotators. ACM Conference on Computers and Accessibility, 2009. (31% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/evocation-viva.pdf Xiaojuan Ma, Jordan Boyd-Graber, Sonya S. Nikolova, and Perry Cook. Speaking Through Pictures: Images vs. Icons. ACM Conference on Computers and Accessibility, 2009. (31% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/image_icon.pdf Jordan Boyd-Graber, Sonya S. Nikolova, Karyn A. Moffatt, Kenrick C. Kin, Joshua Y. Lee, Lester W. Mackey, Marilyn M. Tremaine, and Maria M. Klawe. Participatory design with proxies: Developing a desktop-PDA system to support people with aphasia. Computer-Human Interaction, 2006. (23% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/paper673-boyd-graber.pdf Bayesian Non-parametrics ------------------------- Daniel Peterson, Jordan Boyd-Graber, Martha Palmer, and Daisuke Kawahara. Leveraging VerbNet to build Corpus-Specific Verb Clusters. Proceedings of the Fifth Joint Conference on Lexical and Computational Semantics, 2016. https://aclanthology.org/S16-2012/ Viet-An Nguyen, Jordan Boyd-Graber, Philip Resnik, Deborah Cai, Jennifer Midberry, and Yuanxin Wang. Modeling Topic Control to Detect Influence in Conversations using Nonparametric Topic Models. Machine Learning, 2014. http://umiacs.umd.edu/~jbg/docs/2014_mlj_influencer.pdf Ke Zhai, Jordan Boyd-Graber, and Shay B. Cohen. Hybrid Online Inference with Adaptor Grammars. NIPS Workshop on Advances in Variational Inference, 2014. Viet-An Nguyen, Jordan Boyd-Graber, Philip Resnik, and Jonathan Chang. Learning a Concept Hierarchy from Multi-labeled Documents. Neural Information Processing Systems, 2014. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2014_nips_l2h.pdf Ke Zhai, Jordan Boyd-Graber, and Shay B. Cohen. Online Adaptor Grammars with Hybrid Inference. Transactions of the Association for Computational Linguistics, 2014. http://umiacs.umd.edu/~jbg/docs/2014_tacl_ag_vb_online.pdf Ke Zhai and Jordan Boyd-Graber. Online Topic Models with Infinite Vocabulary. International Conference on Machine Learning, 2013. (20% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2013_icml_infvoc.pdf Viet-An Nguyen, Jordan Boyd-Graber, and Stephen Altschul. Dirichlet Mixtures, the Dirichlet Process, and the Structure of Protein Space. Journal of Computational Biology, 2013. http://umiacs.umd.edu/~jbg/docs/2013_dp_protein.pdf Yuening Hu, Jordan Boyd-Graber, Hal Daume III, and Z. Irene Ying. Binary to Bushy: Bayesian Hierarchical Clustering with the Beta Coalescent. Neural Information Processing Systems, 2013. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2013_coalescent.pdf Viet-An Nguyen, Jordan Boyd-Graber, and Philip Resnik. Lexical and Hierarchical Topic Regression. Neural Information Processing Systems, 2013. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2013_shlda.pdf Viet-An Nguyen, Yuening Hu, Jordan Boyd-Graber, and Philip Resnik. Argviz: Interactive Visualization of Topic Dynamics in Multi-party Conversations.
North American Association for Computational Linguistics, 2013. (50% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2013_argviz.pdf Naho Orita, Rebecca McKeown, Naomi H. Feldman, Jeffrey Lidz, and Jordan Boyd-Graber. Discovering Pronoun Categories using Discourse Information. Proceedings of the Cognitive Science Society, 2013. http://umiacs.umd.edu/~jbg/docs/2013_cogsci_pronoun.pdf Viet-An Nguyen, Jordan Boyd-Graber, and Philip Resnik. SITS: A Hierarchical Nonparametric Model using Speaker Identity for Topic Segmentation in Multiparty Conversations. Association for Computational Linguistics, 2012. (19% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/acl_2012_sits.pdf Yuening Hu, Ke Zhai, Sinead Williamson, and Jordan Boyd-Graber. Modeling Images using Transformed Indian Buffet Processes. International Conference on Machine Learning, 2012. (27% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/mtibp_icml_2012.pdf Yuening Hu and Jordan Boyd-Graber. Bayesian Hierarchical Clustering with Beta Coalescents. Mid-Atlantic Student Colloquium on Speech, Language, and Learning, 2012. Ke Zhai and Jordan Boyd-Graber. Online Topic Model with Infinite Vocabulary. Mid-Atlantic Student Colloquium on Speech, Language, and Learning, 2012. Viet-An Nguyen, Jordan Boyd-Graber, and Philip Resnik. "I Want to Talk About, Again, My Record On Energy…": Modeling Topic Control in Conversations using Speaker-centric Nonparametric Topic Models. Mid-Atlantic Student Colloquium on Speech, Language, and Learning, 2012. Eric Hardisty, Jordan Boyd-Graber, and Philip Resnik. Modeling Perspective using Adaptor Grammars. Empirical Methods in Natural Language Processing, 2010. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/adapted_naive_bayes.pdf Jordan Boyd-Graber and David M. Blei. Syntactic Topic Models. Neural Information Processing Systems, 2008. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/nips2008.pdf Computational Biology ------------------------- Viet-An Nguyen, Jordan Boyd-Graber, and Stephen Altschul. Dirichlet Mixtures, the Dirichlet Process, and the Structure of Protein Space. Journal of Computational Biology, 2013. http://umiacs.umd.edu/~jbg/docs/2013_dp_protein.pdf Yuening Hu, Jordan Boyd-Graber, Hal Daume III, and Z. Irene Ying. Binary to Bushy: Bayesian Hierarchical Clustering with the Beta Coalescent. Neural Information Processing Systems, 2013. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2013_coalescent.pdf Data Mining ------------------------- Weiwei Yang, Jordan Boyd-Graber, and Philip Resnik. A Multilingual Topic Model for Learning Weighted Topic Links Across Incomparable Corpora. Empirical Methods in Natural Language Processing, 2019. http://umiacs.umd.edu/~jbg/docs/2019_emnlp_mtm.pdf Aaron Gerow, Yuening Hu, Jordan Boyd-Graber, David M. Blei, and James A. Evans. Measuring Discursive Influence Across Scholarship. Proceedings of the National Academy of Sciences, 2018. Weiwei Yang, Jordan Boyd-Graber, and Philip Resnik. Adapting Topic Models using Lexical Associations with Tree Priors. Empirical Methods in Natural Language Processing, 2017. (18% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2017_emnlp_tree_prior.pdf Weiwei Yang, Jordan Boyd-Graber, and Philip Resnik. A Discriminative Topic Model using Document Network Structure. Association for Computational Linguistics, 2016. (28% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2016_acl_docblock.pdf Weiwei Yang, Jordan Boyd-Graber, and Philip Resnik.
Birds of a Feather in the Same Nest: A Discriminative Topic Model using Block-based Priors. Mid-Atlantic Student Colloquium on Speech, Language, and Learning, 2016. Mohit Iyyer, Anupam Guha, Snigdha Chaturvedi, Jordan Boyd-Graber, and Hal Daume III. Feuding Families and Former Friends: Unsupervised Learning for Dynamic Fictional Relationships. North American Association for Computational Linguistics, 2016. Best paper award (2 out of 1592) (24% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2016_naacl_relationships.pdf Weiwei Yang, Jordan Boyd-Graber, and Philip Resnik. Birds of a Feather Linked Together: A Discriminative Topic Model using Link-based Priors. Empirical Methods in Natural Language Processing, 2015. (28% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2015_emnlp_hinge_link.pdf Viet-An Nguyen, Jordan Boyd-Graber, Philip Resnik, and Jonathan Chang. Learning a Concept Hierarchy from Multi-labeled Documents. Neural Information Processing Systems, 2014. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2014_nips_l2h.pdf Viet-An Nguyen, Jordan Boyd-Graber, Jonathan Chang, and Philip Resnik. Tree-Based Label Dependency Topic Models. NIPS Workshop on Topic Models: Computation, Application, and Evaluation, 2013. Ke Zhai, Jordan Boyd-Graber, Nima Asadi, and Mohamad (Jude) Alkhouja. Mr. LDA: A Flexible Large Scale Topic Modeling Package using Variational Inference in MapReduce. ACM International Conference on World Wide Web, 2012. (12% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2012_www_mrlda.pdf Yuening Hu, Jordan Boyd-Graber, and Brianna Satinoff. Interactive Topic Modeling. Association for Computational Linguistics, 2011. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/itm.pdf Jonathan Chang, Jordan Boyd-Graber, and David M. Blei. Connections between the Lines: Augmenting Social Networks with Text. Knowledge Discovery and Data Mining, 2009. (9% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/kdd2009.pdf Jonathan Chang, Jordan Boyd-Graber, and David M. Blei. Discovering social networks from free text. 3rd Annual Machine Learning Symposium, 2008. Deep Learning ------------------------- Benjamin Börschinger, Jordan Boyd-Graber, Christian Buck, Jannis Bulian, Massimiliano Ciaramita, Michelle Chen Huebscher, Wojciech Gajewski, Yannic Kilcher, Rodrigo Nogueira, and Lierni Sestorain Saralegu. Meta Answering for Machine Reading. ArXiv, Preprint. https://arxiv.org/abs/1911.04156 Pedro Rodriguez, Shi Feng, Mohit Iyyer, He He, and Jordan Boyd-Graber. Quizbowl: The Case for Incremental Question Answering. ArXiv, Preprint. https://arxiv.org/abs/1904.04792 Ishani Mondal, Shwetha S, Anandhavelu Natarajan, Aparna Garimella, Sambaran Bandyopadhyay, and Jordan Lee Boyd-Graber. Presentations by the People, for the People: Harnessing LLMs for Generating Persona-Aware Slides from Documents. European Association for Computational Linguistics, 2024. (21% Acceptance Rate) Zongxia Li, Andrew Mao, Daniel Kofi Stephens, Pranav Goel, Emily Walpole, Juan Francisco Fung, Alden Dima, and Jordan Lee Boyd-Graber. TENOR: Topic Enabled Neural Organization and Recommendation: Evaluating Topic Models in Task Based Settings. European Association for Computational Linguistics, 2024. (21% Acceptance Rate) Michelle Yuan, Patrick Xia, Chandler May, Benjamin Van Durme, and Jordan Boyd-Graber. Adapting Coreference Resolution Models through Active Learning. Association for Computational Linguistics, 2022.
(21% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2022_acl_alcoref.pdf Yoshinari Fujinuma, Jordan Boyd-Graber, and Katharina Kann. How Does Multilingual Pretraining Affect Cross-Lingual Transferability? Association for Computational Linguistics, 2022. (21% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2022_acl_multilingbert.pdf Chenglei Si, Chen Zhao, Sewon Min, and Jordan Boyd-Graber. Re-Examining Calibration: The Case of Question Answering. Findings of Empirical Methods in Natural Language Processing, 2022. Accessible Abstract: Calibration is an important problem in question answering: if a search engine or virtual assistant doesn't know the answer to a question, it should probably abstain from showing an answer (to save embarrassment, as when Google said a horse had six legs). This EMNLP Findings paper shows that existing metrics for testing how well a QA system is calibrated push calibrated confidence toward the average confidence. We propose an alternate method, both for evaluation and for generating better calibration, that looks at how models change as they learn. http://umiacs.umd.edu/~jbg/docs/2022_emnlp_calibration.pdf Fenfei Guo, Chen Zhang, Zhirui Zhang, Qixin He, Kejun Zhang, Jun Xie, and Jordan Boyd-Graber. Automatic Song Translation for Tonal Languages. Findings of the Association for Computational Linguistics, 2022. (31% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2022_acl_ast.pdf Chen Zhao, Chenyan Xiong, Jordan Boyd-Graber, and Hal Daume III. Distantly-Supervised Dense Retrieval Enables Open-Domain Question Answering without Evidence Annotation. Empirical Methods in Natural Language Processing, 2021. Accessible Abstract: Answering questions sometimes requires tying multiple pieces of information together. Previous datasets have required annotators to explicitly build these reasoning chains (e.g., to answer "where do I know the cop from Die Hard from", you need to figure out that the actor's name is "Reginald VelJohnson" and then find out that he's best known as the dad on Family Matters). By exploring search queries that get to the right answer, we're able to answer these questions without expensive annotation. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2021_emnlp_weak_dpr.pdf Chenglei Si, Chen Zhao, and Jordan Boyd-Graber. What's in a Name? Answer Equivalence For Open-Domain Question Answering. Empirical Methods in Natural Language Processing, 2021. Accessible Abstract: Is Tim Cook the same person as Timothy Donald Cook? You might think so, but the way we train computers to answer questions would say they aren't. We show that keeping track of multiple names (and it's really simple) can create better question answering systems. Simply by adding alternate answers mined from knowledge bases, we can improve accuracy by 1-2 points on major QA datasets. (18% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2021_emnlp_answer_equiv.pdf Chen Zhao, Chenyan Xiong, Hal Daume III, and Jordan Boyd-Graber. Multi-Step Reasoning Over Unstructured Text with Beam Dense Retrieval. North American Association for Computational Linguistics, 2021. Accessible Abstract: For computers to answer complicated questions online, they often need to put together multiple pieces of information (Ronald Reagan was both governor of California and an actor in Bedtime for Bonzo). However, existing approaches use the links in Wikipedia to combine these clues. This research helps computers find connected information without using these explicit links.
(23% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2021_naacl_multi_ance.pdf Chen Zhao, Chenyan Xiong, Xin Qian, and Jordan Boyd-Graber. Complex Factoid Question Answering with a Free-Text Knowledge Graph. ACM International Conference on World Wide Web, 2020. (19.2% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2020_www_delft.pdf Mozhi Zhang, Yoshinari Fujinuma, Michael J. Paul, and Jordan Boyd-Graber. Why Overfitting Isn't Always Bad: Retrofitting Cross-Lingual Word Embeddings to Dictionaries. Association for Computational Linguistics, 2020. Accessible Abstract: Computers need to represent words in a computer-readable way. This work shows that nudging the representations of words in different languages closer to a small list of translations (like from a dictionary) after doing fancy machine learning helps on downstream tasks (e.g., guessing the grammatical category of a word) but hurts when you ask the algorithm for translations of unseen words. (17.6% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2020_acl_refine.pdf Mozhi Zhang, Yoshinari Fujinuma, and Jordan Boyd-Graber. Exploiting Cross-Lingual Subword Similarities in Low-Resource Document Classification. Association for the Advancement of Artificial Intelligence, 2020. (20.6% Acceptance Rate) https://arxiv.org/abs/1812.09617 Michelle Yuan, Hsuan-Tien Lin, and Jordan Boyd-Graber. Cold-start Active Learning through Self-Supervised Language Modeling. Empirical Methods in Natural Language Processing, 2020. Accessible Abstract: Labeling data is a fundamental bottleneck in machine learning, especially for NLP, due to annotation cost and time. For medical text, obtaining labeled data is challenging because of privacy issues or a shortage of expertise. Thus, active learning can be employed to recognize the most relevant examples and then query labels from an oracle. However, developing a strategy for selecting examples to label is non-trivial. Active learning is difficult to use in a cold-start setting: all examples confuse the model because it has not trained on enough data. Fortunately, modern NLP provides an additional source of information: pre-trained language models. In our paper, we propose an active learning strategy called ALPS to find sentences that perplex the language model. We evaluate our approach on sentence classification datasets spanning different domains. Results show that ALPS is an efficient active learning strategy that is competitive with state-of-the-art approaches. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2020_emnlp_alps.pdf Michelle Yuan, Mozhi Zhang, Benjamin Van Durme, Leah Findlater, and Jordan Boyd-Graber. Interactive Refinement of Cross-Lingual Word Embeddings. Empirical Methods in Natural Language Processing, 2020. Accessible Abstract: Language technologies sometimes need to be quickly deployed in low-resource languages. For example, in the 2010 Haiti earthquake, researchers used machine learning models to analyze social media and text messages to gain situational awareness. We introduce CLIME, an interactive system that can help in these scenarios: users see which task-related words the system thinks are similar and correct the model, pushing similar words together and dissimilar words apart. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2020_emnlp_clime.pdf Fenfei Guo, Jordan Boyd-Graber, Mohit Iyyer, and Leah Findlater. Which Evaluations Uncover Sense Representations that Actually Make Sense?
Language Resources and Evaluation Conference, 2020. http://umiacs.umd.edu/~jbg/docs/2020_lrec_sense.pdf Francesco Saverio Varini, Jordan Boyd-Graber, Massimiliano Ciaramita, and Markus Leippold. ClimaText: A Dataset for Climate Change Topic Detection. NeurIPS Workshop on Tackling Climate Change with Machine Learning, 2020. Yoshinari Fujinuma, Michael Paul, and Jordan Boyd-Graber. A Resource-Free Evaluation Metric for Cross-Lingual Word Embeddings Based on Graph Modularity. Association for Computational Linguistics, 2019. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2019_acl_modularity.pdf Mozhi Zhang, Keyulu Xu, Ken-ichi Kawarabayashi, Stefanie Jegelka, and Jordan Boyd-Graber. Are Girls Neko or Shōjo? Cross-Lingual Alignment of Non-Isomorphic Embeddings with Iterative Normalization. Association for Computational Linguistics, 2019. (18.3% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2019_acl_clwe.pdf Eric Wallace, Shi Feng, and Jordan Boyd-Graber. Misleading Failures of Partial-input Baselines. Association for Computational Linguistics, 2019. (18% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2019_acl_flipside.pdf Ahmed Elgohary Ghoneim, Denis Peskov, and Jordan Boyd-Graber. Can You Unpack That? Learning to Rewrite Questions-in-Context. Empirical Methods in Natural Language Processing, 2019. http://umiacs.umd.edu/~jbg/docs/2019_emnlp_sequentialqa.pdf Mozhi Zhang, Yoshinari Fujinuma, and Jordan Boyd-Graber. Exploiting Cross-Lingual Subword Similarities in Low-Resource Document Classification. ACL Workshop on Deep Learning Approaches for Low-Resource Natural Language Processing, 2018. Shi Feng, Eric Wallace, and Jordan Boyd-Graber. Interpreting Neural Networks with Nearest Neighbors. EMNLP Workshop on BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, 2018. http://aclweb.org/anthology/W18-5416 Ahmed Elgohary Ghoneim, Chen Zhao, and Jordan Boyd-Graber. Dataset and Baselines for Sequential Open-Domain Question Answering. Empirical Methods in Natural Language Processing, 2018. (23% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2018_emnlp_linked.pdf Shi Feng, Eric Wallace, Alvin Grissom II, Pedro Rodriguez, Mohit Iyyer, and Jordan Boyd-Graber. Pathologies of Neural Models Make Interpretation Difficult. Empirical Methods in Natural Language Processing, 2018. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2018_emnlp_rs.pdf Mohit Iyyer, Varun Manjunatha, Jordan Boyd-Graber, and Larry Davis. Learning to Color from Language. North American Association for Computational Linguistics, 2018. (29% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2018_naacl_colorization.pdf Mohit Iyyer, Varun Manjunatha, Anupam Guha, Yogarshi Vyas, Jordan Boyd-Graber, Hal Daume III, and Larry Davis. The Amazing Mysteries of the Gutter: Drawing Inferences Between Panels in Comic Book Narratives. Computer Vision and Pattern Recognition, 2017. (30% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2017_cvpr_comics.pdf Hadi Amiri, Philip Resnik, Jordan Boyd-Graber, and Hal Daume III. Learning Text Pair Similarity with Context-sensitive Autoencoders. Association for Computational Linguistics, 2016. (28% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2016_acl_context_ae.pdf Mohit Iyyer, Anupam Guha, Snigdha Chaturvedi, Jordan Boyd-Graber, and Hal Daume III. Feuding Families and Former Friends: Unsupervised Learning for Dynamic Fictional Relationships. North American Association for Computational Linguistics, 2016.
Best paper award (2 out of 1592) (24% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2016_naacl_relationships.pdf Mohit Iyyer, Varun Manjunatha, Jordan Boyd-Graber, and Hal Daume III. Deep Unordered Composition Rivals Syntactic Methods for Text Classification. Association for Computational Linguistics, 2015. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2015_acl_dan.pdf Jordan Boyd-Graber, Mohit Iyyer, He He, and Hal Daume III. Interactive Incremental Question Answering. Neural Information Processing Systems, 2015. This won the best demonstration award at NIPS 2015. Mohit Iyyer, Peter Enns, Jordan Boyd-Graber, and Philip Resnik. Political Ideology Detection Using Recursive Neural Networks. Association for Computational Linguistics, 2014. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2014_acl_rnn_ideology.pdf Mohit Iyyer, Jordan Boyd-Graber, Leonardo Claudino, Richard Socher, and Hal Daume III. A Neural Network for Factoid Question Answering over Paragraphs. Empirical Methods in Natural Language Processing, 2014. The partial derivatives of "C" and "J" with respect to the parameters should be switched in Equation 7. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2014_emnlp_qb_rnn.pdf Mohit Iyyer, Jordan Boyd-Graber, and Hal Daume III. Generating Sentences from Semantic Vector Space Representations. NIPS Workshop on Learning Semantics, 2014. Digital Humanities ------------------------- Mohit Iyyer, Anupam Guha, Snigdha Chaturvedi, Jordan Boyd-Graber, and Hal Daume III. Feuding Families and Former Friends: Unsupervised Learning for Dynamic Fictional Relationships. North American Association for Computational Linguistics, 2016. Best paper award (2 out of 1592) (24% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2016_naacl_relationships.pdf Clay Templeton, Travis Brown, Sayan Battacharyya, and Jordan Boyd-Graber. Mining the Dispatch under Supervision: Using Casualty Counts to Guide Topics from the Richmond Daily Dispatch Corpus. Chicago Colloquium on Digital Humanities and Computer Science, 2011. http://umiacs.umd.edu/~jbg/docs/slda_civil_war.pdf Empirical Human Data Collection ------------------------- Benjamin Börschinger, Jordan Boyd-Graber, Christian Buck, Jannis Bulian, Massimiliano Ciaramita, Michelle Chen Huebscher, Wojciech Gajewski, Yannic Kilcher, Rodrigo Nogueira, and Lierni Sestorain Saralegu. Meta Answering for Machine Reading. ArXiv, Preprint. https://arxiv.org/abs/1911.04156 Pedro Rodriguez, Shi Feng, Mohit Iyyer, He He, and Jordan Boyd-Graber. Quizbowl: The Case for Incremental Question Answering. ArXiv, Preprint. https://arxiv.org/abs/1904.04792 Ishani Mondal, Shwetha S, Anandhavelu Natarajan, Aparna Garimella, Sambaran Bandyopadhyay, and Jordan Lee Boyd-Graber. Presentations by the People, for the People: Harnessing LLMs for Generating Persona-Aware Slides from Documents. European Association for Computational Linguistics, 2024. (21% Acceptance Rate) Zongxia Li, Andrew Mao, Daniel Kofi Stephens, Pranav Goel, Emily Walpole, Juan Francisco Fung, Alden Dima, and Jordan Lee Boyd-Graber. TENOR: Topic Enabled Neural Organization and Recommendation: Evaluating Topic Models in Task Based Settings. European Association for Computational Linguistics, 2024. (21% Acceptance Rate) Chenglei Si, Navita Goyal, Tongshuang Wu, Chen Zhao, Shi Feng, Hal Daume III, and Jordan Lee Boyd-Graber. Large Language Models Help Humans Verify Truthfulness---Except When They Are Convincingly Wrong.
North American Association for Computational Linguistics, 2024. Neha Pundlik Srikanth, Rupak Sarkar, Heran Y. Mane, Elizabeth M. Aparicio, Quynh C. Nguyen, Rachel Rudinger, and Jordan Boyd-Graber. Pregnant Questions: The Importance of Pragmatic Awareness in Maternal Health Question Answering. North American Association for Computational Linguistics, 2024. Anna Rogers, Marzena Karpinska, Jordan Boyd-Graber, and Naoaki Okazaki. Program Chairs' Report on Peer Review at ACL 2023. Association for Computational Linguistics, 2023. http://umiacs.umd.edu/~jbg/docs/2023_acl_peer_review_report.pdf Michelle Yuan, Patrick Xia, Chandler May, Benjamin Van Durme, and Jordan Boyd-Graber. Adapting Coreference Resolution Models through Active Learning. Association for Computational Linguistics, 2022. (21% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2022_acl_alcoref.pdf Shi Feng and Jordan Boyd-Graber. Learning to Explain Selectively: A Case Study on Question Answering. Empirical Methods in Natural Language Processing, 2022. Accessible Abstract: Many AI methods are a black box: input goes in, predictions come out. While there are many AI explanation tools that you can add to these predictions, how do you know if they are any good? In this work presented at EMNLP, our hypothesis is that if you put a human in front of an AI that's trying to answer questions, you can measure how good the underlying explanations are by how much the human's score goes up. This 2022 EMNLP publication does not just measure which combinations of explanations are most effective; we use bandit exploration to quickly figure out which set of explanations best helps a specific user. http://umiacs.umd.edu/~jbg/docs/2022_emnlp_augment.pdf Wanrong He, Andrew Mao, and Jordan Boyd-Graber. Cheater's Bowl: Human vs. Computer Search Strategies for Open-Domain QA. Findings of Empirical Methods in Natural Language Processing, 2022. Accessible Abstract: When the Covid pandemic hit, trivia games moved online. With them came cheating: people tried to quickly Google answers. This is bad for sportsmanship, but a good source of training data for teaching computers how to find answers. We built an interface to harvest this training data from trivia players and fed the resulting queries into retrieval-based QA systems, showing that they were better than the automatically generated queries used by the current state of the art. http://umiacs.umd.edu/~jbg/docs/2022_emnlp_cheaters.pdf Fenfei Guo, Chen Zhang, Zhirui Zhang, Qixin He, Kejun Zhang, Jun Xie, and Jordan Boyd-Graber. Automatic Song Translation for Tonal Languages. Findings of the Association for Computational Linguistics, 2022. (31% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2022_acl_ast.pdf Pedro Rodriguez, Joe Barrow, Alexander Hoyle, John P. Lalor, Robin Jia, and Jordan Boyd-Graber. Evaluation Examples Are Not Equally Informative: How Should That Change NLP Leaderboards? Association for Computational Linguistics, 2021. Accessible Abstract: When can we call an AI "intelligent"? Just like humans, a common approach is to ask them a bunch of questions. These questions posed to modern machine learning methods are collected in metrics called leaderboards to monitor progress, but beyond ranking approaches, this does not help us better understand our problems or our systems very well.
This paper introduces probabilistic models inspired by psychometric approaches called item response theory models (think year-end standardized tests) to better understand how computers can answer questions and whether we are asking the right questions. This allows researchers to better compare what kinds of questions systems can answer, better compare human and machine ability, and discover problematic questions (e.g., questions that have incorrect answer keys, are vague, or "trick" those trying to answer the questions). (21% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2021_acl_leaderboard.pdf Pedro Rodriguez and Jordan Boyd-Graber. Evaluation Paradigms in Question Answering. Empirical Methods in Natural Language Processing, 2021. Accessible Abstract: Why do we answer questions? Sometimes it's to provide information, which has been the interpretation of the computer science community. But sometimes it's to probe or test intelligence. This paper argues we should think more about that application of question answering and its connection to the foundations of artificial intelligence: the Turing Test. Thus, in addition to the long-standing Cranfield paradigm popularized by information retrieval, this paper proposes an alternative "Manchester paradigm" closer to the Turing Test, trivia games, and education. (18% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2021_emnlp_paradigms.pdf Maharshi Gor, Kellie Webster, and Jordan Boyd-Graber. Toward Deconfounding the Influence of Subject's Demographic Characteristics in Question Answering. Empirical Methods in Natural Language Processing, 2021. Accessible Abstract: The data used to train computer question answering systems have three times as many men as women. This paper examines whether this is a problem for question answering accuracy. After a thorough investigation, we do not find evidence of serious accuracy discrepancies across demographic groups. However, an absence of evidence is not evidence of absence, and we would argue that we need more diverse datasets to better represent the world's population. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2021_emnlp_qa_fairness.pdf Denis Peskov, Viktor Hangya, Jordan Boyd-Graber, and Alexander Fraser. Adapting Entities across Languages and Cultures. Findings of Empirical Methods in Natural Language Processing, 2021. Accessible Abstract: If you ask who Germany's "Christian Drosten" is, a simple answer is that he's their "Anthony Fauci". We create a system to automatically generate these adaptations, which can help improve cross-cultural understanding and create new training data for tasks like question answering. (37% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2021_emnlp_adaptation.pdf Alexander Hoyle, Pranav Goel, Denis Peskov, Andrew Hian-Cheong, Jordan Boyd-Graber, and Philip Resnik. Is Automated Topic Model Evaluation Broken?: The Incoherence of Coherence. Neural Information Processing Systems, 2021. Accessible Abstract: Topic models help historians, journalists, and analysts make sense of large text collections. But how do you know if you have a good one? The field has settled on using "Automatic Coherence", but this paper argues that maybe that isn't the right choice if you want to actually make real users happy.
This paper builds on our 2009 paper, which showed that perplexity was not a good evaluation of interpretability for topic models; while the field adopted automatic topic coherence as a result of that 2009 paper, this paper argues that automatic topic coherence is not a good metric for neural topic models (even though it worked for probabilistic topic models). (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2021_neurips_incoherence.pdf Julian Martin Eisenschlos, Bhuwan Dhingra, Jannis Bulian, Benjamin Börschinger, and Jordan Boyd-Graber. Fool Me Twice: Entailment from Wikipedia Gamification. North American Association for Computational Linguistics, 2021. Accessible Abstract: Democracy and the free press depend on being able to recognize when facts online are true or not. For machine learning to help this critical problem, it needs good data identifying which statements are backed up by trusted sources and which are not. This research creates a game people can play online to craft difficult claims that can train computers to spot disinformation online. (28% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2021_naacl_fm2.pdf Denis Peskov, Benny Cheng, Ahmed Elgohary Ghoneim, Joe Barrow, Cristian Danescu-Niculescu-Mizil, and Jordan Boyd-Graber. It Takes Two to Lie: One to Lie and One to Listen. Association for Computational Linguistics, 2020. Accessible Abstract: Machine learning techniques to detect deception in online communications require training and evaluation data. However, there is a dearth of data either because of uncertain gold labels or privacy concerns; we create a new, large deception-centered dataset in the online game of Diplomacy. We gathered 17,289 messages from 12 games (each of which took over a month) involving 84 players, the majority of which were unique users. This data was collected with a custom-made bot that allowed us to collect messages and annotations. The user pool was created from scratch: we varied participant demographics across gender, age, nationality, and past game experience. Some of our participants included the former president of the Diplomacy players' association, several top ranked players in the world, a board game shop owner, and scientists. We create machine learning models to detect lies using linguistic, context, and power-dynamic features. Our best model had similar lie detection accuracy to humans. (25.4% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2020_acl_diplomacy.pdf Michelle Yuan, Hsuan-Tien Lin, and Jordan Boyd-Graber. Cold-start Active Learning through Self-Supervised Language Modeling. Empirical Methods in Natural Language Processing, 2020. Accessible Abstract: Labeling data is a fundamental bottleneck in machine learning, especially for NLP, due to annotation cost and time. For medical text, obtaining labeled data is challenging because of privacy issues or a shortage of expertise. Thus, active learning can be employed to recognize the most relevant examples and then query labels from an oracle. However, developing a strategy for selecting examples to label is non-trivial. Active learning is difficult to use in a cold-start setting: all examples confuse the model because it has not trained on enough data. Fortunately, modern NLP provides an additional source of information: pre-trained language models. In our paper, we propose an active learning strategy called ALPS to find sentences that perplex the language model. We evaluate our approach on sentence classification datasets spanning different domains.
Results show that ALPS is an efficient active learning strategy that is competitive with state-of-the-art approaches. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2020_emnlp_alps.pdf Michelle Yuan, Mozhi Zhang, Benjamin Van Durme, Leah Findlater, and Jordan Boyd-Graber. Interactive Refinement of Cross-Lingual Word Embeddings. Empirical Methods in Natural Language Processing, 2020. Accessible Abstract: Language technologies sometimes need to be quickly deployed in low-resource languages. For example, in the 2010 Haiti earthquake, researchers used machine learning models to analyze social media and text messages to gain situational awareness. We introduce CLIME, an interactive system that can help in these scenarios: users see which task-related words the system thinks are similar and correct the model, pushing similar words together and dissimilar words apart. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2020_emnlp_clime.pdf Fenfei Guo, Jordan Boyd-Graber, Mohit Iyyer, and Leah Findlater. Which Evaluations Uncover Sense Representations that Actually Make Sense? Language Resources and Evaluation Conference, 2020. http://umiacs.umd.edu/~jbg/docs/2020_lrec_sense.pdf Thomas Diggelmann, Jordan Boyd-Graber, Jannis Bulian, Massimiliano Ciaramita, and Markus Leippold. CLIMATE-FEVER: A Dataset for Verification of Real-World Climate Claims. NeurIPS Workshop on Tackling Climate Change with Machine Learning, 2020. https://research.google/pubs/pub50541/ Ahmed Elgohary Ghoneim, Denis Peskov, and Jordan Boyd-Graber. Can You Unpack That? Learning to Rewrite Questions-in-Context. Empirical Methods in Natural Language Processing, 2019. http://umiacs.umd.edu/~jbg/docs/2019_emnlp_sequentialqa.pdf Shi Feng and Jordan Boyd-Graber. What AI can do for me: Evaluating Machine Learning Interpretations in Cooperative Play. Intelligent User Interfaces, 2019. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2019_iui_augment.pdf Eric Wallace, Pedro Rodriguez, Shi Feng, Ikuya Yamada, and Jordan Boyd-Graber. Trick Me If You Can: Human-in-the-loop Generation of Adversarial Question Answering Examples. Transactions of the Association for Computational Linguistics, 2019. http://umiacs.umd.edu/~jbg/docs/2019_tacl_trick.pdf Eric Wallace and Jordan Boyd-Graber. Trick Me If You Can: Adversarial Writing of Trivia Challenge Questions. ACL Student Research Workshop, 2018. http://aclweb.org/anthology/P18-3018 Ahmed Elgohary Ghoneim, Chen Zhao, and Jordan Boyd-Graber. Dataset and Baselines for Sequential Open-Domain Question Answering. Empirical Methods in Natural Language Processing, 2018. (23% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2018_emnlp_linked.pdf Shi Feng, Eric Wallace, Alvin Grissom II, Pedro Rodriguez, Mohit Iyyer, and Jordan Boyd-Graber. Pathologies of Neural Models Make Interpretation Difficult. Empirical Methods in Natural Language Processing, 2018. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2018_emnlp_rs.pdf Paul Felt, Eric Ringger, Kevin Seppi, and Jordan Boyd-Graber. Learning from Measurements in Crowdsourcing Models: Inferring Ground Truth from Diverse Annotation Types. International Conference on Computational Linguistics, 2018. (37% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2018_coling_measurements.pdf Michelle Yuan, Benjamin Van Durme, and Jordan Boyd-Graber. Multilingual Anchoring: Interactive Topic Modeling and Alignment Across Languages. Neural Information Processing Systems, 2018.
(21% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2018_neurips_mtanchor.pdf Mohit Iyyer, Varun Manjunatha, Jordan Boyd-Graber, and Larry Davis. Learning to Color from Language. North American Association for Computational Linguistics, 2018. (29% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2018_naacl_colorization.pdf Shudong Hao, Michael J. Paul, and Jordan Boyd-Graber. Lessons from the Bible on Modern Topics: Multilingual Topic Model Evaluation on Low-Resource Languages. North American Association for Computational Linguistics, 2018. (35% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2018_naacl_mltm_eval.pdf Mohit Iyyer, Varun Manjunatha, Anupam Guha, Yogarshi Vyas, Jordan Boyd-Graber, Hal Daume III, and Larry Davis. The Amazing Mysteries of the Gutter: Drawing Inferences Between Panels in Comic Book Narratives. Computer Vision and Pattern Recognition, 2017. (30% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2017_cvpr_comics.pdf Tak Yeon Lee, Alison Smith, Kevin Seppi, Niklas Elmqvist, Jordan Boyd-Graber, and Leah Findlater. The Human Touch: How Non-expert Users Perceive, Interpret, and Fix Topic Models. International Journal of Human-Computer Studies, 2017. http://umiacs.umd.edu/~jbg/docs/2017_ijhcs_human_touch.pdf Alvin Grissom II, Naho Orita, and Jordan Boyd-Graber. Incremental Prediction of Sentence-final Verbs. Conference on Computational Natural Language Learning, 2016. (20% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2016_conll_verbpred.pdf He He, Jordan Boyd-Graber, Kevin Kwok, and Hal Daume III. Opponent Modeling in Deep Reinforcement Learning. International Conference on Machine Learning, 2016. (24% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2016_icml_opponent.pdf Anupam Guha, Mohit Iyyer, and Jordan Boyd-Graber. A Distorted Skull Lies in the Bottom Center: Identifying Paintings from Text Descriptions. NAACL Human-Computer Question Answering Workshop, 2016. http://umiacs.umd.edu/~jbg/docs/2016_naacl_paintings.pdf Mohit Iyyer, Anupam Guha, Snigdha Chaturvedi, Jordan Boyd-Graber, and Hal Daume III. Feuding Families and Former Friends: Unsupervised Learning for Dynamic Fictional Relationships. North American Association for Computational Linguistics, 2016. Best paper award (2 out of 1592) (24% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2016_naacl_relationships.pdf He He, Jordan Boyd-Graber, and Hal Daume III. Interpretese vs. Translationese: The Uniqueness of Human Strategies in Simultaneous Interpretation. North American Association for Computational Linguistics, 2016. (29% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2016_naacl_interpretese.pdf Vlad Niculae, Srijan Kumar, Jordan Boyd-Graber, and Cristian Danescu-Niculescu-Mizil. Linguistic Harbingers of Betrayal: A Case Study on an Online Strategy Game. Association for Computational Linguistics, 2015. Accessible Abstract: This paper introduces the application of natural language processing techniques to understand the relationships (and their dissolution) in the game of Diplomacy. This popular board game simulates Europe on the eve of World War I and forces players to work with each other to forge alliances and make plans together. However, the game's setup also encourages players to turn against each other. This paper analyzes whether we can predict these betrayals (we can!) and the linguistic and social phenomena (demands, politeness, and planning) that can predict when a betrayal will happen.
(25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2015_acl_diplomacy.pdf Paul Felt, Eric Ringger, Jordan Boyd-Graber, and Kevin Seppi. Making the Most of Crowdsourced Document Annotations: Confused Supervised LDA. Conference on Computational Natural Language Learning, 2015. This paper received the best paper award at CoNLL. (30% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2015_conll_cslda.pdf Anupam Guha, Mohit Iyyer, Danny Bouman, and Jordan Boyd-Graber. Removing the Training Wheels: A Coreference Dataset that Entertains Humans and Challenges Computers. North American Association for Computational Linguistics, 2015. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2015_naacl_qb_coref.pdf Jordan Boyd-Graber, David Mimno, and David Newman. Care and Feeding of Topic Models: Problems, Diagnostics, and Improvements. Handbook of Mixed Membership Models and Their Applications, 2014. http://umiacs.umd.edu/~jbg/docs/2014_book_chapter_care_and_feeding.pdf Jordan Boyd-Graber, Brianna Satinoff, He He, and Hal Daume III. Besting the Quiz Master: Crowdsourcing Incremental Classification Games. Empirical Methods in Natural Language Processing, 2012. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/qb_emnlp_2012.pdf Yuening Hu and Jordan Boyd-Graber. Suggesting Constraints for Interactive Topic Modeling. ICML Workshop on Machine Learning in Human Computation and Crowdsourcing, 2012. Clay Templeton, Kenneth R. Fleischmann, and Jordan Boyd-Graber. Simulating Audiences: Automating Analysis of Values, Attitudes, and Sentiment. IEEE International Conference on Social Computing, 2011. (10% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/simulating_audiences.pdf Sonya S. Nikolova, Jordan Boyd-Graber, and Christiane Fellbaum. Collecting Semantic Similarity Ratings to Connect Concepts in Assistive Communication Tools. Modeling, Learning and Processing of Text Technological Data Structures, 2011. http://umiacs.umd.edu/~jbg/docs/2011_book_chapter_evocation.pdf Jordan Boyd-Graber. Linguistic Resource Creation in a Web 2.0 World. NSF Workshop on Collaborative Annotation, 2011. http://umiacs.umd.edu/~jbg/docs/2011_resources.pdf Brianna Satinoff and Jordan Boyd-Graber. Trivial Classification: What features do humans use for classification? Workshop on Crowdsourcing Technologies for Language and Cognition Studies, 2011. Clay Templeton, Kenneth R. Fleischmann, and Jordan Boyd-Graber. Comparing Values and Sentiment Using Mechanical Turk. iConference, 2011. http://umiacs.umd.edu/~jbg/docs/iconference-2011-comparing.pdf Kenneth R. Fleischmann, Clay Templeton, and Jordan Boyd-Graber. Modeling Diverse Standpoints in Text Classification: Learning to Be Human by Modeling Human Values. iConference, 2011. http://umiacs.umd.edu/~jbg/docs/iconference-2011-learning.pdf Nitin Madnani, Jordan Boyd-Graber, and Philip Resnik. Measuring Transitivity Using Untrained Annotators. Creating Speech and Language Data With Amazon's Mechanical Turk, 2010. http://umiacs.umd.edu/~jbg/docs/madnani-boyd-graber-turk-workshop.pdf Jonathan Chang, Jordan Boyd-Graber, Chong Wang, Sean Gerrish, and David M. Blei. Reading Tea Leaves: How Humans Interpret Topic Models. Neural Information Processing Systems, 2009. Jonathan Chang and I shared a NIPS student award honorable mention for this paper (5 out of 1105) Accessible Abstract: Topic models are a tool that historians and social scientists use to explore large text corpora. But how do you know if you have a good topic model?
Before this paper, the consensus was to use held-out likelihood to evaluate if you had a good model. This paper argues that this does not fit how people actually use topic models and proposes new human-centered metrics for evaluating topic models. This method inspired a rethinking of model evaluation and showed that the complexity of a model does not always correspond to what a user might want. (24% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/nips2009-rtl.pdf Jordan Boyd-Graber, Christiane Fellbaum, Daniel Osherson, and Robert Schapire. Adding Dense, Weighted Connections to WordNet. Proceedings of the Global WordNet Conference, 2006. http://umiacs.umd.edu/~jbg/docs/jbg-jeju.pdf Fact Checking ------------------------- Chenglei Si, Navita Goyal, Tongshuang Wu, Chen Zhao, Shi Feng, Hal Daume III, and Jordan Lee Boyd-Graber. Large Language Models Help Humans Verify Truthfulness---Except When They Are Convincingly Wrong. North American Association for Computational Linguistics, 2024. Julian Martin Eisenschlos, Bhuwan Dhingra, Jannis Bulian, Benjamin Börschinger, and Jordan Boyd-Graber. Fool Me Twice: Entailment from Wikipedia Gamification. North American Association for Computational Linguistics, 2021. Accessible Abstract: Democracy and the free press depend on being able to recognize when facts online are true or not. For machine learning to help this critical problem, it needs good data identifying which statements are backed up by trusted sources and which are not. This research creates a game people can play online to craft difficult claims that can train computers to spot disinformation online. (28% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2021_naacl_fm2.pdf Thomas Diggelmann, Jordan Boyd-Graber, Jannis Bulian, Massimiliano Ciaramita, and Markus Leippold. CLIMATE-FEVER: A Dataset for Verification of Real-World Climate Claims. NeurIPS Workshop on Tackling Climate Change with Machine Learning, 2020. https://research.google/pubs/pub50541/ Human-Computer Interaction ------------------------- Ishani Mondal, Shwetha S, Anandhavelu Natarajan, Aparna Garimella, Sambaran Bandyopadhyay, and Jordan Lee Boyd-Graber. Presentations by the People, for the People: Harnessing LLMs for Generating Persona-Aware Slides from Documents. European Association for Computational Linguistics, 2024. (21% Acceptance Rate) Zongxia Li, Andrew Mao, Daniel Kofi Stephens, Pranav Goel, Emily Walpole, Juan Francisco Fung, Alden Dima, and Jordan Lee Boyd-Graber. TENOR: Topic Enabled Neural Organization and Recommendation: Evaluating Topic Models in Task Based Settings. European Association for Computational Linguistics, 2024. (21% Acceptance Rate) Alison Smith, Jordan Boyd-Graber, Ron Fan, Melissa Birchfield, Tongshuang Wu, Dan Weld, and Leah Findlater. No Explainability without Accountability: An Empirical Study of Explanations and Feedback in Interactive ML. Computer-Human Interaction, 2020. (24% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2020_chi_explanation.pdf Michelle Yuan, Hsuan-Tien Lin, and Jordan Boyd-Graber. Cold-start Active Learning through Self-Supervised Language Modeling. Empirical Methods in Natural Language Processing, 2020. Accessible Abstract: Labeling data is a fundamental bottleneck in machine learning, especially for NLP, due to annotation cost and time. For medical text, obtaining labeled data is challenging because of privacy issues or a shortage of expertise.
Thus, active learning can be employed to recognize the most relevant examples and then query labels from an oracle. However, developing a strategy for selecting examples to label is non-trivial. Active learning is difficult to use in a cold-start setting: all examples confuse the model because it has not trained on enough data. Fortunately, modern NLP provides an additional source of information: pre-trained language models. In our paper, we propose an active learning strategy called ALPS to find sentences that perplex the language model. We evaluate our approach on sentence classification datasets spanning different domains. Results show that ALPS is an efficient active learning strategy that is competitive with state-of-the-art approaches. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2020_emnlp_alps.pdf Michelle Yuan, Mozhi Zhang, Benjamin Van Durme, Leah Findlater, and Jordan Boyd-Graber. Interactive Refinement of Cross-Lingual Word Embeddings. Empirical Methods in Natural Language Processing, 2020. Accessible Abstract: Language technologies sometimes need to be quickly deployed in low-resource languages. For example, in the 2010 Haiti earthquake, researchers used machine learning models to analyze social media and text messages to gain situational awareness. We introduce CLIME, an interactive system that can help in these scenarios: users see which task-related words the system thinks are similar and correct the model, pushing similar words together and dissimilar words apart. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2020_emnlp_clime.pdf Alison Smith, Varun Kumar, Jordan Boyd-Graber, Kevin Seppi, and Leah Findlater. Digging into User Control: Perceptions of Adherence and Instability in Transparent Models. Intelligent User Interfaces, 2020. (23% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2020_iui_control.pdf Varun Kumar, Alison Smith, Leah Findlater, Kevin Seppi, and Jordan Boyd-Graber. Why Didn't You Listen to Me? Comparing User Control of Human-in-the-Loop Topic Models. Association for Computational Linguistics, 2019. (18% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2019_acl_control.pdf Alison Smith, Varun Kumar, Jordan Boyd-Graber, Kevin Seppi, and Leah Findlater. User-Centered Design and Evaluation of a Human-in-the-Loop Topic Modeling System. Intelligent User Interfaces, 2018. Alison won a best student paper honorable mention (3 out of 300) (23% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2018_iui_itm.pdf Alison Smith, Varun Kumar, Jordan Boyd-Graber, Kevin Seppi, and Leah Findlater. Accounting for Input Uncertainty in Human-in-the-Loop Systems. CHI 2017 Designing for Uncertainty Workshop, 2017. http://visualization.ischool.uw.edu/hci_uncertainty/papers/Paper11.pdf Tak Yeon Lee, Alison Smith, Kevin Seppi, Niklas Elmqvist, Jordan Boyd-Graber, and Leah Findlater. The Human Touch: How Non-expert Users Perceive, Interpret, and Fix Topic Models. International Journal of Human-Computer Studies, 2017. http://umiacs.umd.edu/~jbg/docs/2017_ijhcs_human_touch.pdf Jordan Boyd-Graber. Humans and Computers Working Together to Measure Machine Learning Interpretability. The Bridge, 2017. Alison Smith, Tak Yeon Lee, Forough Poursabzi-Sangdeh, Jordan Boyd-Graber, Kevin Seppi, Niklas Elmqvist, and Leah Findlater. Evaluating Visual Representations for Topic Understanding and Their Effects on Manually Generated Labels. Transactions of the Association for Computational Linguistics, 2017.
http://umiacs.umd.edu/~jbg/docs/2017_tacl_eval_tm_viz.pdf Forough Poursabzi-Sangdeh, Jordan Boyd-Graber, Leah Findlater, and Kevin Seppi. ALTO: Active Learning with Topic Overviews for Speeding Label Induction and Document Labeling. Association for Computational Linguistics, 2016. (28% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2016_acl_doclabel.pdf Alison Smith, Tak Yeon Lee, Forough Poursabzi-Sangdeh, Jordan Boyd-Graber, Kevin Seppi, Niklas Elmqvist, and Leah Findlater. Human-Centered and Interactive: Expanding the Impact of Topic Models. CHI Human Centred Machine Learning Workshop, 2016. He He, Jordan Boyd-Graber, Kevin Kwok, and Hal Daume III. Opponent Modeling in Deep Reinforcement Learning. International Conference on Machine Learning, 2016. (24% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2016_icml_opponent.pdf Forough Poursabzi-Sangdeh and Jordan Boyd-Graber. Speeding Document Annotation with Topic Models. NAACL Student Research Workshop, 2015. Alison Smith, Jason Chuang, Yuening Hu, Jordan Boyd-Graber, and Leah Findlater. Concurrent Visualization of Relationships between Words and Topics in Topic Models. ACL Workshop on Interactive Language Learning, Visualization, and Interfaces, 2014. Yuening Hu, Jordan Boyd-Graber, Brianna Satinoff, and Alison Smith. Interactive Topic Modeling. Machine Learning, 2014. http://umiacs.umd.edu/~jbg/docs/2014_mlj_itm.pdf Jason Chuang, John D. Wilkerson, Rebecca Weiss, Dustin Tingley, Brandon M. Stewart, Margaret E. Roberts, Forough Poursabzi-Sangdeh, Justin Grimmer, Leah Findlater, Jordan Boyd-Graber, and Jeffrey Heer. Computer-Assisted Content Analysis: Topic Models for Exploring Multiple Subjective Interpretations. NIPS Workshop on Human-Propelled Machine Learning, 2014. Viet-An Nguyen, Yuening Hu, Jordan Boyd-Graber, and Philip Resnik. Argviz: Interactive Visualization of Topic Dynamics in Multi-party Conversations. North American Association for Computational Linguistics, 2013. (50% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2013_argviz.pdf Jordan Boyd-Graber, Brianna Satinoff, He He, and Hal Daume III. Besting the Quiz Master: Crowdsourcing Incremental Classification Games. Empirical Methods in Natural Language Processing, 2012. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/qb_emnlp_2012.pdf Yuening Hu and Jordan Boyd-Graber. Suggesting Constraints for Interactive Topic Modeling. ICML Workshop on Machine Learning in Human Computation and Crowdsourcing, 2012. Yuening Hu, Jordan Boyd-Graber, and Brianna Satinoff. Interactive Topic Modeling. Association for Computational Linguistics, 2011. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/itm.pdf Brianna Satinoff and Jordan Boyd-Graber. Trivial Classification: What features do humans use for classification? Workshop on Crowdsourcing Technologies for Language and Cognition Studies, 2011. Sonya S. Nikolova, Jordan Boyd-Graber, Christiane Fellbaum, and Perry Cook. Better Vocabularies for Assistive Communication Aids: Connecting Terms using Semantic Networks and Untrained Annotators. ACM Conference on Computers and Accessibility, 2009. (31% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/evocation-viva.pdf Xiaojuan Ma, Jordan Boyd-Graber, Sonya S. Nikolova, and Perry Cook. Speaking Through Pictures: Images vs. Icons. ACM Conference on Computers and Accessibility, 2009. (31% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/image_icon.pdf Jordan Boyd-Graber, Sonya S. Nikolova, Karyn A. Moffatt, Kenrick C. Kin, Joshua Y. Lee, Lester W. Mackey, Marilyn M.
Tremaine, and Maria M. Klawe. Participatory design with proxies: Developing a desktop-PDA system to support people with aphasia. Computer-Human Interaction, 2006. (23% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/paper673-boyd-graber.pdf Images ------------------------- Mohit Iyyer, Varun Manjunatha, Jordan Boyd-Graber, and Larry Davis. Learning to Color from Language. North American Association for Computational Linguistics, 2018. (29% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2018_naacl_colorization.pdf Mohit Iyyer, Varun Manjunatha, Anupam Guha, Yogarshi Vyas, Jordan Boyd-Graber, Hal Daume III, and Larry Davis. The Amazing Mysteries of the Gutter: Drawing Inferences Between Panels in Comic Book Narratives. Computer Vision and Pattern Recognition, 2017. (30% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2017_cvpr_comics.pdf Anupam Guha, Mohit Iyyer, and Jordan Boyd-Graber. A Distorted Skull Lies in the Bottom Center: Identifying Paintings from Text Descriptions. NAACL Human-Computer Question Answering Workshop, 2016. http://umiacs.umd.edu/~jbg/docs/2016_naacl_paintings.pdf Yuening Hu, Ke Zhai, Sinead Williamson, and Jordan Boyd-Graber. Modeling Images using Transformed Indian Buffet Processes. International Conference on Machine Learning, 2012. (27% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/mtibp_icml_2012.pdf Xiaojuan Ma, Jordan Boyd-Graber, Sonya S. Nikolova, and Perry Cook. Speaking Through Pictures: Images vs. Icons. ACM Conference on Computers and Accessibility, 2009. (31% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/image_icon.pdf Interpretability ------------------------- Shi Feng and Jordan Boyd-Graber. Learning to Explain Selectively: A Case Study on Question Answering. Empirical Methods in Natural Language Processing, 2022. Accessible Abstract: Many AI methods are a black box: input goes in, predictions come out. While there are many AI explanation tools that you can add to these predictions, how do you know if they are any good? In this work presented at EMNLP, our hypothesis is that if you put a human in front of an AI that's trying to answer questions, you can measure how good the underlying explanations are by how much the human's score goes up. This 2022 EMNLP publication does not just measure which combinations of explanations are most effective; we use bandit exploration to quickly figure out which set of explanations best helps a specific user. http://umiacs.umd.edu/~jbg/docs/2022_emnlp_augment.pdf Wanrong He, Andrew Mao, and Jordan Boyd-Graber. Cheater's Bowl: Human vs. Computer Search Strategies for Open-Domain QA. Findings of Empirical Methods in Natural Language Processing, 2022. Accessible Abstract: When the Covid pandemic hit, trivia games moved online. With them came cheating: people tried to quickly Google answers. This is bad for sportsmanship, but a good source of training data for teaching computers how to find answers. We built an interface to harvest this training data from trivia players and fed the resulting queries into retrieval-based QA systems, showing that they were better than the automatically generated queries used by the current state of the art. http://umiacs.umd.edu/~jbg/docs/2022_emnlp_cheaters.pdf Alexander Hoyle, Pranav Goel, Denis Peskov, Andrew Hian-Cheong, Jordan Boyd-Graber, and Philip Resnik. Is Automated Topic Model Evaluation Broken?: The Incoherence of Coherence. Neural Information Processing Systems, 2021.
Wanrong He, Andrew Mao, and Jordan Boyd-Graber. Cheater's Bowl: Human vs. Computer Search Strategies for Open-Domain QA. Findings of Empirical Methods in Natural Language Processing, 2022. Accessible Abstract: When the Covid pandemic hit, trivia games moved online. With it came cheating: people tried to quickly Google answers. This is bad for sportsmanship, but a good source of training data for helping teach computers how to find answers. We built an interface to harvest this training data from trivia players and fed the queries into retrieval-based QA systems, showing that they were better than the automatically generated queries used by the current state of the art. http://umiacs.umd.edu/~jbg/docs/2022_emnlp_cheaters.pdf Alexander Hoyle, Pranav Goel, Denis Peskov, Andrew Hian-Cheong, Jordan Boyd-Graber, and Philip Resnik. Is Automated Topic Model Evaluation Broken?: The Incoherence of Coherence. Neural Information Processing Systems, 2021. Accessible Abstract: Topic models help historians, journalists, and analysts make sense of large text collections. But how do you know if you have a good one? The field has settled on using "Automatic Coherence", but this paper argues that maybe that isn't the right choice if you want to actually make real users happy. This paper builds on our 2009 paper, which showed that perplexity was not a good evaluation of interpretability for topic models; while the field adopted automatic topic coherence as a result of that 2009 paper, this paper argues that automatic topic coherence is not a good metric for neural topic models (even though it worked for probabilistic topic models). (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2021_neurips_incoherence.pdf Alison Smith, Jordan Boyd-Graber, Ron Fan, Melissa Birchfield, Tongshuang Wu, Dan Weld, and Leah Findlater. No Explainability without Accountability: An Empirical Study of Explanations and Feedback in Interactive ML. Computer-Human Interaction, 2020. (24% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2020_chi_explanation.pdf Alison Smith, Varun Kumar, Jordan Boyd-Graber, Kevin Seppi, and Leah Findlater. Digging into User Control: Perceptions of Adherence and Instability in Transparent Models. Intelligent User Interfaces, 2020. (23% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2020_iui_control.pdf Fenfei Guo, Jordan Boyd-Graber, Mohit Iyyer, and Leah Findlater. Which Evaluations Uncover Sense Representations that Actually Make Sense?. Language Resources and Evaluation Conference, 2020. http://umiacs.umd.edu/~jbg/docs/2020_lrec_sense.pdf Eric Wallace, Shi Feng, and Jordan Boyd-Graber. Misleading Failures of Partial-input Baselines. Association for Computational Linguistics, 2019. (18% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2019_acl_flipside.pdf Shi Feng and Jordan Boyd-Graber. What AI can do for me: Evaluating Machine Learning Interpretations in Cooperative Play. Intelligent User Interfaces, 2019. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2019_iui_augment.pdf Shi Feng, Eric Wallace, and Jordan Boyd-Graber. Interpreting Neural Networks with Nearest Neighbors. EMNLP Workshop on BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, 2018. http://aclweb.org/anthology/W18-5416 Shi Feng, Eric Wallace, Alvin Grissom II, Pedro Rodriguez, Mohit Iyyer, and Jordan Boyd-Graber. Pathologies of Neural Models Make Interpretation Difficult. Empirical Methods in Natural Language Processing, 2018. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2018_emnlp_rs.pdf Jordan Boyd-Graber. Humans and Computers Working Together to Measure Machine Learning Interpretability. The Bridge, 2017. Yuening Hu, Jordan Boyd-Graber, Brianna Satinoff, and Alison Smith. Interactive Topic Modeling. Machine Learning, 2014. http://umiacs.umd.edu/~jbg/docs/2014_mlj_itm.pdf Jonathan Chang, Jordan Boyd-Graber, Chong Wang, Sean Gerrish, and David M. Blei. Reading Tea Leaves: How Humans Interpret Topic Models. Neural Information Processing Systems, 2009. Jonathan Chang and I shared a NIPS student award honorable mention for this paper (5 out of 1105) Accessible Abstract: Topic models are a tool that historians and social scientists use to explore large text corpora. But how do you know if you have a good topic model? Before this paper, the consensus was to use held-out likelihood to evaluate if you had a good model.
This paper argues that this does not fit how people actually use topic models and proposes new human-centered metrics for evaluating topic models. This method inspired a rethinking of model evaluation and showed that the complexity of a model does not always correspond to what a user might want. (24% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/nips2009-rtl.pdf Large Language Models (or, more correctly, Muppet Models) ------------------------- Ishani Mondal, Shwetha S, Anandhavelu Natarajan, Aparna Garimella, Sambaran Bandyopadhyay, and Jordan Lee Boyd-Graber. Presentations by the People, for the People: Harnessing LLMs for Generating Persona-Aware Slides from Documents. European Association for Computational Linguistics, 2024. (21% Acceptance Rate) Chenglei Si, Navita Goyal, Tongshuang Wu, Chen Zhao, Shi Feng, Hal Daume III, and Jordan Lee Boyd-Graber. Large Language Models Help Humans Verify Truthfulness---Except When They Are Convincingly Wrong. North American Association for Computational Linguistics, 2024. Neha Pundlik Srikanth, Rupak Sarkar, Heran Y. Mane, Elizabeth M. Aparicio, Quynh C. Nguyen, Rachel Rudinger, and Jordan Boyd-Graber. Large Language Models Help Humans Verify Truthfulness---Except When They Are Convincingly Wrong. North American Association for Computational Linguistics, 2024. Chenglei Si, Weijia Shi, Chen Zhao, Luke Zettlemoyer, and Jordan Lee Boyd-Graber. Getting MoRE out of Mixture of Language Model Reasoning Experts. Findings of Empirical Methods in Natural Language Processing, 2023. Accessible Abstract: There are many ways for a computer to answer a question: a general knowledge question, a common sense question, or a math question. Each of these types of questions can be answered by a particular kind of expert. This paper investigates whether we can automatically detect what kind of expert is best suited to answer a question and route the question to the correct expert. (45% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2023_findings_more.pdf Chenglei Si, Zhe Gan, Zhengyuan Yang, Shuohang Wang, Jianfeng Wang, Jordan Boyd-Graber, and Lijuan Wang. Prompting GPT-3 To Be Reliable. International Conference on Learning Representations, 2023. http://umiacs.umd.edu/~jbg/docs/2023_iclr_reliable.pdf Michelle Yuan, Hsuan-Tien Lin, and Jordan Boyd-Graber. Cold-start Active Learning through Self-Supervised Language Modeling. Empirical Methods in Natural Language Processing, 2020. Accessible Abstract: Labeling data is a fundamental bottleneck in machine learning, especially for NLP, due to annotation cost and time. For medical text, obtaining labeled data is challenging because of privacy issues or a shortage of expertise. Thus, active learning can be employed to recognize the most relevant examples and then query labels from an oracle. However, developing a strategy for selecting examples to label is non-trivial. Active learning is difficult to use in a cold start: all examples confuse the model because it has not trained on enough data. Fortunately, modern NLP provides an additional source of information: pre-trained language models. In our paper, we propose an active learning strategy called ALPS to find sentences that perplex the language model. We evaluate our approach on sentence classification datasets spanning different domains. Results show that ALPS is an efficient active learning strategy that is competitive with state-of-the-art approaches. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2020_emnlp_alps.pdf
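For readers who want the gist of ALPS in code, the Python sketch below scores unlabeled sentences by their masked language model loss and clusters those "surprisal embeddings" to pick a diverse, confusing batch to label. It simplifies the paper (which masks a sample of tokens under BERT's pretraining objective; here every token is scored), and the model name and pool are placeholders:

import torch
from sklearn.cluster import KMeans
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

def surprisal_embedding(sentence):
    """Per-token language model losses, padded to a fixed length."""
    enc = tok(sentence, return_tensors="pt", truncation=True,
              max_length=32, padding="max_length")
    with torch.no_grad():
        logits = mlm(**enc).logits.squeeze(0)  # (sequence, vocabulary)
    losses = torch.nn.functional.cross_entropy(
        logits, enc["input_ids"].squeeze(0), reduction="none")
    return losses * enc["attention_mask"].squeeze(0)  # zero out padding

def alps_select(pool, k):
    """Cluster surprisal embeddings; label the sentence nearest each center."""
    embeddings = torch.stack([surprisal_embedding(s) for s in pool]).numpy()
    centers = KMeans(n_clusters=k, n_init=10).fit(embeddings).cluster_centers_
    return [int(((embeddings - c) ** 2).sum(axis=1).argmin()) for c in centers]

Because the selection never needs labels or a trained classifier, it sidesteps the cold-start problem the abstract describes.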
Lexical Semantics ------------------------- Michelle Yuan, Mozhi Zhang, Benjamin Van Durme, Leah Findlater, and Jordan Boyd-Graber. Interactive Refinement of Cross-Lingual Word Embeddings. Empirical Methods in Natural Language Processing, 2020. Accessible Abstract: Language technologies sometimes need to be quickly deployed in low-resource languages. For example, in the 2010 Haiti earthquake, researchers used machine learning models to analyze social media and text messages to gain situational awareness. We introduce CLIME, an interactive system that can help in these scenarios: users see which task-related words the system thinks are similar and correct the model to push similar words together and dissimilar words apart. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2020_emnlp_clime.pdf Fenfei Guo, Jordan Boyd-Graber, Mohit Iyyer, and Leah Findlater. Which Evaluations Uncover Sense Representations that Actually Make Sense?. Language Resources and Evaluation Conference, 2020. http://umiacs.umd.edu/~jbg/docs/2020_lrec_sense.pdf Daniel Peterson, Jordan Boyd-Graber, Martha Palmer, and Daisuke Kawahara. Leveraging VerbNet to build Corpus-Specific Verb Clusters. Proceedings of the Fifth Joint Conference on Lexical and Computational Semantics, 2016. https://aclanthology.org/S16-2012/ Sonya S. Nikolova, Jordan Boyd-Graber, and Christiane Fellbaum. Collecting Semantic Similarity Ratings to Connect Concepts in Assistive Communication Tools. Modeling, Learning and Processing of Text Technological Data Structures, 2011. http://umiacs.umd.edu/~jbg/docs/2011_book_chapter_evocation.pdf Jordan Boyd-Graber. Linguistic Resource Creation in a Web 2.0 World. NSF Workshop on Collaborative Annotation, 2011. http://umiacs.umd.edu/~jbg/docs/2011_resources.pdf Jordan Boyd-Graber and David M. Blei. PUTOP: Turning Predominant Senses into a Topic Model for WSD. 4th International Workshop on Semantic Evaluations, 2007. http://umiacs.umd.edu/~jbg/docs/jbg-SEMEVAL07.pdf Jordan Boyd-Graber, David M. Blei, and Xiaojin Zhu. A Topic Model for Word Sense Disambiguation. Empirical Methods in Natural Language Processing, 2007. (27% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/jbg-EMNLP07.pdf Jordan Boyd-Graber, Christiane Fellbaum, Daniel Osherson, and Robert Schapire. Adding Dense, Weighted, Connections to WordNet. Proceedings of the Global WordNet Conference, 2006. http://umiacs.umd.edu/~jbg/docs/jbg-jeju.pdf MCMC Inference ------------------------- Varun Kumar, Alison Smith, Leah Findlater, Kevin Seppi, and Jordan Boyd-Graber. Why Didn't You Listen to Me? Comparing User Control of Human-in-the-Loop Topic Models. Association for Computational Linguistics, 2019. (18% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2019_acl_control.pdf Weiwei Yang, Jordan Boyd-Graber, and Philip Resnik. A Multilingual Topic Model for Learning Weighted Topic Links Across Incomparable Corpora. Empirical Methods in Natural Language Processing, 2019. http://umiacs.umd.edu/~jbg/docs/2019_emnlp_mtm.pdf Weiwei Yang, Jordan Boyd-Graber, and Philip Resnik. Adapting Topic Models using Lexical Associations with Tree Priors. Empirical Methods in Natural Language Processing, 2017. (18% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2017_emnlp_tree_prior.pdf Weiwei Yang, Jordan Boyd-Graber, and Philip Resnik. A Discriminative Topic Model using Document Network Structure. Association for Computational Linguistics, 2016.
(28% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2016_acl_docblock.pdf Weiwei Yang, Jordan Boyd-Graber, and Philip Resnik. Birds of a Feather in the Same Nest: A Discriminative Topic Model using Block-based Priors. Mid-Atlantic Student Colloquium on Speech, Language, and Learning, 2016. Md Arafat Sultan, Jordan Boyd-Graber, and Tamara Sumner. Bayesian Supervised Domain Adaptation for Short Text Similarity. North American Association for Computational Linguistics, 2016. (24% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2016_naacl_sts.pdf Viet-An Nguyen, Jordan Boyd-Graber, Philip Resnik, and Kristina Miler. Tea Party in the House: A Hierarchical Ideal Point Topic Model and Its Application to Republican Legislators in the 112th Congress. Association for Computational Linguistics, 2015. Accessible Abstract: In the mid 2010s, the Republican party in the United States diverged: mainstream conservatives split from the so-called "tea party" caucus. However, the primary statistical tool for analyzing political factions in legislative bodies (ideal point models) fails to account for these changes. This is because the schism is not fully reflected in voting patterns but rather in how politicians present themselves: thus we need to extend these models to capture not just how politicians vote but also how they frame particular issues. This paper proposes a new model to capture framing differences within a voting bloc to start explaining the new subcoalitions of the Republican caucus. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2015_acl_teaparty.pdf Paul Felt, Eric Ringger, Jordan Boyd-Graber, and Kevin Seppi. Making the Most of Crowdsourced Document Annotations: Confused Supervised LDA. Conference on Computational Natural Language Learning, 2015. This paper received the best paper award at CoNLL. (30% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2015_conll_cslda.pdf Weiwei Yang, Jordan Boyd-Graber, and Philip Resnik. Birds of a Feather Linked Together: A Discriminative Topic Model using Link-based Priors. Empirical Methods in Natural Language Processing, 2015. (28% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2015_emnlp_hinge_link.pdf Yi Yang, Doug Downey, and Jordan Boyd-Graber. Efficient Methods for Incorporating Knowledge into Topic Models. Empirical Methods in Natural Language Processing, 2015. (24% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2015_emnlp_fast_priors.pdf Yuening Hu, Ke Zhai, Vlad Eidelman, and Jordan Boyd-Graber. Polylingual Tree-Based Topic Models for Translation Domain Adaptation. Association for Computational Linguistics, 2014. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2014_acl_ptlda_mt.pdf Viet-An Nguyen, Jordan Boyd-Graber, Philip Resnik, and Jonathan Chang. Learning a Concept Hierarchy from Multi-labeled Documents. Neural Information Processing Systems, 2014. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2014_nips_l2h.pdf Viet-An Nguyen, Jordan Boyd-Graber, Jonathan Chang, and Philip Resnik. Tree-Based Label Dependency Topic Models. NIPS Workshop on Topic Models: Computation, Application, and Evaluation, 2013. Yuening Hu, Jordan Boyd-Graber, Hal Daume III, and Z. Irene Ying. Binary to Bushy: Bayesian Hierarchical Clustering with the Beta Coalescent. Neural Information Processing Systems, 2013. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2013_coalescent.pdf Yuening Hu and Jordan Boyd-Graber. Efficient Tree-Based Topic Modeling. Association for Computational Linguistics, 2012.
(21% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/acl_2012_fttm.pdf Yuening Hu, Ke Zhai, Sinead Williamson, and Jordan Boyd-Graber. Modeling Images using Transformed Indian Buffet Processes. International Conference on Machine Learning, 2012. (27% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/mtibp_icml_2012.pdf Yuening Hu and Jordan Boyd-Graber. Bayesian Hierarchical Clustering with Beta Coalescents. Mid-Atlantic Student Colloquium on Speech, Language, and Learning, 2012. Machine Translation ------------------------- HyoJung Han, Marine Carpuat, and Jordan Boyd-Graber. Automatic Explicitation to Bridge the Background Knowledge Gap in Translation and its Evaluation with Multilingual QA. Empirical Methods in Natural Language Processing, 2023. Accessible Abstract: Sometimes when you are translating from one language to another, a literal translation is not enough: to actually understand what is being said, you need additional context. Professional translators know this, and the process they use to help a listener, called "explicitation", captures cultural differences between source and target audiences. We introduce techniques for automatically generating explicitations, motivated by WikiExpl (a dataset collected from Wikipedia and annotated by human translators), and evaluate them. (23% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2023_emnlp_explicitation.pdf Sander V Schulhoff, Jeremy Pinto, Anaum Khan, Louis-François Bouchard, Chenglei Si, Jordan Lee Boyd-Graber, Svetlina Anati, Valen Tagliabue, Anson Liu Kost, and Christopher R Carnahan. Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs Through a Global Prompt Hacking Competition. Empirical Methods in Natural Language Processing, 2023. This paper was selected as the Best Theme Paper at EMNLP 2023 (1 of 4909) Accessible Abstract: As more AI services online are provided by prompted language models, we need to be aware of the weaknesses and exploits of the models. We present the HackAPrompt competition to help elicit a broad array of exploits that get around large language models. (23% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2023_emnlp_hackaprompt.pdf Yoo Yeon Sung, Naeemul Hassan, and Jordan Boyd-Graber. Not all Fake News is Written: A Dataset and Analysis of Misleading Video Headlines. Empirical Methods in Natural Language Processing, 2023. Accessible Abstract: Misinformation online is not all text-based. More information is being consumed in video form, and both social media companies and external monitors need to know when misleading videos are being shared online. We create a new dataset of misleading videos and describe what makes the problem so challenging. (23% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2023_emnlp_videoheadline.pdf HyoJung Han, Marine Carpuat, and Jordan Boyd-Graber. SimQA: Detecting Simultaneous MT Errors through Word-by-Word Question Answering. Empirical Methods in Natural Language Processing, 2022. Accessible Abstract: Simultaneous interpretation (where a translation happens word by word before the source sentence is finished) is difficult to evaluate. We created a new evaluation framework based on the following scenario: imagine that you're thrown into a trivia gameshow where you don't know the language. Specifically, it's a game format where you interrupt the question word by word as soon as possible.
Our hypothesis is that a monolingual player (who doesn't speak the source language) will be able to do better in the game with a better simultaneous translation system. In this 2022 EMNLP publication, we show that this evaluation is not only cheaper (you just need to translate the answer) but can also detect hallucinations and undertranslations better than existing evaluation methods. http://umiacs.umd.edu/~jbg/docs/2022_emnlp_simqa.pdf Peter Jansen and Jordan Boyd-Graber. Picard understanding Darmok: A Dataset and Model for Metaphor-Rich Translation in a Constructed Language. Figurative Language Workshop 2022 @EMNLP, 2022. https://arxiv.org/abs/2107.08146 Denis Peskov, Viktor Hangya, Jordan Boyd-Graber, and Alexander Fraser. Adapting Entities across Languages and Cultures. Findings of Empirical Methods in Natural Language Processing, 2021. Accessible Abstract: If you ask who Germany's "Christian Drosten" is, a simple answer is that he's their "Anthony Fauci". We create a system to automatically generate these adaptations, which can help improve cross-cultural understanding and create new training data for tasks like question answering. (37% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2021_emnlp_adaptation.pdf Wenyan Li, Alvin Grissom II, and Jordan Boyd-Graber. An Attentive Recurrent Model for Incremental Prediction of Sentence-final Verbs. Findings of EMNLP, 2020. http://umiacs.umd.edu/~jbg/docs/2020_findings_verbs.pdf Tianze Shi, Chen Zhao, Jordan Boyd-Graber, Hal Daume III, and Lillian Lee. On the Potential of Lexico-logical Alignments for Semantic Parsing to SQL Queries. Findings of EMNLP, 2020. http://umiacs.umd.edu/~jbg/docs/2020_findings_qalign.pdf Craig Stewart, Nikolai Vogler, Junjie Hu, Jordan Boyd-Graber, and Graham Neubig. Automatic Estimation of Simultaneous Interpreter Performance. Association for Computational Linguistics, 2018. (24% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2018_acl_interpeval.pdf Khanh Nguyen, Jordan Boyd-Graber, and Hal Daume III. Reinforcement Learning for Bandit Neural Machine Translation with Simulated Human Feedback. Empirical Methods in Natural Language Processing, 2017. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2017_emnlp_bandit_mt.pdf Alvin Grissom II, Naho Orita, and Jordan Boyd-Graber. Incremental Prediction of Sentence-final Verbs. Conference on Computational Natural Language Learning, 2016. (20% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2016_conll_verbpred.pdf He He, Jordan Boyd-Graber, and Hal Daume III. Interpretese vs. Translationese: The Uniqueness of Human Strategies in Simultaneous Interpretation. North American Association for Computational Linguistics, 2016. (29% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2016_naacl_interpretese.pdf He He, Alvin Grissom II, Jordan Boyd-Graber, and Hal Daume III. Syntax-based Rewriting for Simultaneous Machine Translation. Empirical Methods in Natural Language Processing, 2015. (24% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2015_emnlp_rewrite.pdf Yuening Hu, Ke Zhai, Vlad Eidelman, and Jordan Boyd-Graber. Polylingual Tree-Based Topic Models for Translation Domain Adaptation. Association for Computational Linguistics, 2014. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2014_acl_ptlda_mt.pdf Alvin Grissom II, He He, Jordan Boyd-Graber, John Morgan, and Hal Daume III. Don't Until the Final Verb Wait: Reinforcement Learning for Simultaneous Machine Translation. Empirical Methods in Natural Language Processing, 2014.
(30% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2014_emnlp_simtrans.pdf Yuening Hu, Ke Zhai, Vlad Eidelman, and Jordan Boyd-Graber. Topic Models for Translation Domain Adaptation. NIPS Workshop on Topic Models: Computation, Application, and Evaluation, 2013. Vladimir Eidelman, Jordan Boyd-Graber, and Philip Resnik. Topic Models for Dynamic Translation Model Adaptation. Association for Computational Linguistics, 2012. For a more thorough evaluation and an exploration of more advanced topic models for machine translation, see: Yuening Hu, Ke Zhai, Vlad Eidelman, and Jordan Boyd-Graber. Polylingual Tree-Based Topic Models for Translation Domain Adaptation. Association for Computational Linguistics, 2014. (21% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/acl_2012_tm_for_mt.pdf Multilingual Corpora ------------------------- Yoshinari Fujinuma, Jordan Boyd-Graber, and Katharina Kann. How Does Multilingual Pretraining Affect Cross-Lingual Transferability?. Association for Computational Linguistics, 2022. (21% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2022_acl_multilingbert.pdf Mozhi Zhang, Yoshinari Fujinuma, Michael J. Paul, and Jordan Boyd-Graber. Why Overfitting Isn't Always Bad: Retrofitting Cross-Lingual Word Embeddings to Dictionaries. Association for Computational Linguistics, 2020. Accessible Abstract: Computers need to represent words in a computer-readable way. This work shows that, after doing fancy machine learning, slightly moving the representations of words in different languages closer to a small list of translations (like those in a dictionary) works better on downstream tasks (e.g., guessing the grammatical category of a word) but hurts when asking the algorithm for translations of unseen words. (17.6% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2020_acl_refine.pdf
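As a rough illustration of what "moving representations closer to a small list of translations" means, here is a classic retrofitting-style update in Python; the paper's actual objective and its analysis of when overfitting the dictionary helps are richer than this sketch, and the toy vocabulary is made up:

import numpy as np

def retrofit(emb, dictionary, iterations=10, alpha=1.0):
    """emb: word -> vector (both languages in one space);
    dictionary: (word, translation) pairs to pull together."""
    neighbors = {w: [] for w in emb}
    for src, tgt in dictionary:
        if src in emb and tgt in emb:
            neighbors[src].append(tgt)
            neighbors[tgt].append(src)
    new = {w: v.copy() for w, v in emb.items()}
    for _ in range(iterations):
        for w, ns in neighbors.items():
            if not ns:
                continue  # words outside the dictionary stay put
            # weighted average of the original vector and current translations
            new[w] = (alpha * emb[w] + sum(new[n] for n in ns)) / (alpha + len(ns))
    return new

emb = {"dog": np.array([1.0, 0.0]), "perro": np.array([0.0, 1.0])}
print(retrofit(emb, [("dog", "perro")]))  # the two vectors move toward each other

The alpha parameter controls the trade-off the abstract describes: larger values preserve the original (downstream-friendly) geometry, while smaller values fit the dictionary more tightly.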
Mozhi Zhang, Yoshinari Fujinuma, and Jordan Boyd-Graber. Exploiting Cross-Lingual Subword Similarities in Low-Resource Document Classification. Association for the Advancement of Artificial Intelligence, 2020. (20.6% Acceptance Rate) https://arxiv.org/abs/1812.09617 Yoshinari Fujinuma, Michael Paul, and Jordan Boyd-Graber. A Resource-Free Evaluation Metric for Cross-Lingual Word Embeddings Based on Graph Modularity. Association for Computational Linguistics, 2019. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2019_acl_modularity.pdf Mozhi Zhang, Keyulu Xu, Ken-ichi Kawarabayashi, Stefanie Jegelka, and Jordan Boyd-Graber. Are Girls Neko or Shōjo? Cross-Lingual Alignment of Non-Isomorphic Embeddings with Iterative Normalization. Association for Computational Linguistics, 2019. (18.3% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2019_acl_clwe.pdf Dasha Pruss, Yoshinari Fujinuma, Ashlynn Daughton, Michael Paul, Brad Arnot, Danielle Szafir, and Jordan Boyd-Graber. Zika discourse in the Americas: A multilingual topic analysis of Twitter. PlosOne, 2019. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0216922 Mozhi Zhang, Yoshinari Fujinuma, and Jordan Boyd-Graber. Exploiting Cross-Lingual Subword Similarities in Low-Resource Document Classification. ACL Workshop on Deep Learning Approaches for Low-Resource Natural Language Processing, 2018. Shudong Hao, Michael J. Paul, and Jordan Boyd-Graber. Lessons from the Bible on Modern Topics: Multilingual Topic Model Evaluation on Low-Resource Languages. North American Association for Computational Linguistics, 2018. (35% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2018_naacl_mltm_eval.pdf Yuening Hu, Ke Zhai, Vlad Eidelman, and Jordan Boyd-Graber. Polylingual Tree-Based Topic Models for Translation Domain Adaptation. Association for Computational Linguistics, 2014. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2014_acl_ptlda_mt.pdf Jordan Boyd-Graber and Philip Resnik. Holistic Sentiment Analysis Across Languages: Multilingual Supervised Latent Dirichlet Allocation. Empirical Methods in Natural Language Processing, 2010. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/jbg-mlslda-2010.pdf Jordan Boyd-Graber and David M. Blei. Multilingual Topic Models for Unaligned Text. Uncertainty in Artificial Intelligence, 2009. For coverage of current state-of-the-art in cross-lingual topic models see: Yuening Hu, Ke Zhai, Vlad Eidelman, and Jordan Boyd-Graber. Polylingual Tree-Based Topic Models for Translation Domain Adaptation. Association for Computational Linguistics, 2014. (31% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/uai2009.pdf Jordan Boyd-Graber and David M. Blei. Multilingual Topic Models. NIPS Workshop on Unsupervised Latent Variable Models, 2008. Question Answering ------------------------- Benjamin Börschinger, Jordan Boyd-Graber, Christian Buck, Jannis Bulian, Massimiliano Ciaramita, Michelle Chen Huebscher, Wojciech Gajewski, Yannic Kilcher, Rodrigo Nogueira, and Lierni Sestorain Saralegu. Meta Answering for Machine Reading. ArXiv, Preprint. https://arxiv.org/abs/1911.04156 Pedro Rodriguez, Shi Feng, Mohit Iyyer, He He, and Jordan Boyd-Graber. Quizbowl: The Case for Incremental Question Answering. ArXiv, Preprint. https://arxiv.org/abs/1904.04792 Quynh C. Nguyen, Elizabeth M. Aparicio, Michelle Jasczynski, Amara Channell Doig, Xiaohe Yue, Heran Mane, Neha Pundlik Srikanth, Francia Ximena Marin Gutierrez, Nataly Delcid, Xin He, and Jordan Boyd-Graber. Randomized Pilot of Rosie, a Health Education Question-and-Answer Chatbot for New Mothers. Journal of Medical Internet Research: Journal of Formative Research, 2024. Neha Pundlik Srikanth, Rupak Sarkar, Heran Y. Mane, Elizabeth M. Aparicio, Quynh C. Nguyen, Rachel Rudinger, and Jordan Boyd-Graber. Large Language Models Help Humans Verify Truthfulness---Except When They Are Convincingly Wrong. North American Association for Computational Linguistics, 2024. HyoJung Han, Marine Carpuat, and Jordan Boyd-Graber. Automatic Explicitation to Bridge the Background Knowledge Gap in Translation and its Evaluation with Multilingual QA. Empirical Methods in Natural Language Processing, 2023. Accessible Abstract: Sometimes when you are translating from one language to another, a literal translation is not enough: to actually understand what is being said, you need additional context. Professional translators know this, and the process they use to help a listener, called "explicitation", captures cultural differences between source and target audiences. We introduce techniques for automatically generating explicitations, motivated by WikiExpl (a dataset collected from Wikipedia and annotated by human translators), and evaluate them. (23% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2023_emnlp_explicitation.pdf Sander V Schulhoff, Jeremy Pinto, Anaum Khan, Louis-François Bouchard, Chenglei Si, Jordan Lee Boyd-Graber, Svetlina Anati, Valen Tagliabue, Anson Liu Kost, and Christopher R Carnahan.
Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs Through a Global Prompt Hacking Competition. Empirical Methods in Natural Language Processing, 2023. This paper was selected as the Best Theme Paper at EMNLP 2023 (1 of 4909) Accessible Abstract: As more AI services online are provided by prompted language models, we need to be aware of the weaknesses and exploits of the models. We present the HackAPrompt competition to help elicit a broad array of exploits that get around large language models. (23% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2023_emnlp_hackaprompt.pdf Yoo Yeon Sung, Naeemul Hassan, and Jordan Boyd-Graber. Not all Fake News is Written: A Dataset and Analysis of Misleading Video Headlines. Empirical Methods in Natural Language Processing, 2023. Accessible Abstract: Misinformation online is not all text-based. More information is being consumed in video form, and both social media companies and external monitors need to know when misleading videos are being shared online. We create a new dataset of misleading videos and describe what makes the problem so challenging. (23% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2023_emnlp_videoheadline.pdf Heran Y. Mane, Amara Channell Doig, Francia Ximena Marin Gutierrez, Michelle Jasczynski, Xiaohe Yue, Neha Pundlik Srikanth, Sourabh Mane, Abby Sun, Rachel Ann Moats, Pragat Patel, Xin He, Jordan Lee Boyd-Graber, Elizabeth M. Aparicio, and Quynh C. Nguyen. Practical Guidance for the Development of Rosie, a Health Education Question-and-Answer Chatbot for New Mothers. Journal of Public Health Management and Practice, 2023. https://journals.lww.com/jphmp/fulltext/2023/09000/practical_guidance_for_the_development_of_rosie,_a.9.aspx Shi Feng and Jordan Boyd-Graber. Learning to Explain Selectively: A Case Study on Question Answering. Empirical Methods in Natural Language Processing, 2022. Accessible Abstract: Many AI methods are a black box: input goes in, predictions come out. While there are many AI explanation tools that you can add to these predictions, how do you know if they are any good? Our hypothesis is that if you put a human in front of an AI that's trying to answer questions, you can measure how good the underlying explanations are by how much the human's score goes up. This 2022 EMNLP publication does not just measure which combinations of explanations are most effective overall; we use bandit exploration to quickly figure out what set of explanations best helps a specific user. http://umiacs.umd.edu/~jbg/docs/2022_emnlp_augment.pdf HyoJung Han, Marine Carpuat, and Jordan Boyd-Graber. SimQA: Detecting Simultaneous MT Errors through Word-by-Word Question Answering. Empirical Methods in Natural Language Processing, 2022. Accessible Abstract: Simultaneous interpretation (where a translation happens word by word before the source sentence is finished) is difficult to evaluate. We created a new evaluation framework based on the following scenario: imagine that you're thrown into a trivia gameshow where you don't know the language. Specifically, it's a game format where you interrupt the question word by word as soon as possible. Our hypothesis is that a monolingual player (who doesn't speak the source language) will be able to do better in the game with a better simultaneous translation system.
In this 2022 EMNLP publication, we show that this evaluation is not only cheaper (you just need to translate the answer) but can also detect hallucinations and undertranslations better than existing evaluation methods. http://umiacs.umd.edu/~jbg/docs/2022_emnlp_simqa.pdf Wanrong He, Andrew Mao, and Jordan Boyd-Graber. Cheater's Bowl: Human vs. Computer Search Strategies for Open-Domain QA. Findings of Empirical Methods in Natural Language Processing, 2022. Accessible Abstract: When the Covid pandemic hit, trivia games moved online. With it came cheating: people tried to quickly Google answers. This is bad for sportsmanship, but a good source of training data for helping teach computers how to find answers. We built an interface to harvest this training data from trivia players and fed the queries into retrieval-based QA systems, showing that they were better than the automatically generated queries used by the current state of the art. http://umiacs.umd.edu/~jbg/docs/2022_emnlp_cheaters.pdf Chenglei Si, Chen Zhao, Sewon Min, and Jordan Boyd-Graber. Re-Examining Calibration: The Case of Question Answering. Findings of Empirical Methods in Natural Language Processing, 2022. Accessible Abstract: Calibration is an important problem in question answering: if a search engine or virtual assistant doesn't know the answer to a question, it should probably abstain from showing an answer (to save embarrassment, as when Google said a horse had six legs). This EMNLP Findings paper shows that existing metrics for evaluating QA calibration push calibrated confidence toward the average confidence. We propose an alternate method both for evaluation and for generating better calibration by looking at how models change as they learn. http://umiacs.umd.edu/~jbg/docs/2022_emnlp_calibration.pdf
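The failure mode this calibration paper warns about is easy to demonstrate: under expected calibration error (a standard calibration metric), a QA system that always reports its overall accuracy as its confidence looks nearly perfect while being useless for deciding which answers to trust. The numbers in this Python sketch are synthetic:

import numpy as np

def ece(confidence, correct, bins=10):
    """Expected calibration error with equal-width confidence bins."""
    total = 0.0
    for lo in np.linspace(0, 1, bins, endpoint=False):
        mask = (confidence >= lo) & (confidence < lo + 1.0 / bins)
        if mask.any():
            total += mask.mean() * abs(confidence[mask].mean()
                                       - correct[mask].mean())
    return total

rng = np.random.default_rng(0)
correct = rng.random(10_000) < 0.6           # a QA system with 60% accuracy
constant = np.full(10_000, 0.6)              # always claim 60% confidence
informative = np.where(correct, 0.9, 0.15)   # confidence that tracks errors

print(ece(constant, correct))     # near zero: looks "perfectly calibrated"
print(ece(informative, correct))  # worse ECE, yet far more useful for abstaining

A metric that rewards the constant predictor cannot distinguish a system that knows when to abstain from one that never does, which motivates the paper's alternative evaluation.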
Pedro Rodriguez, Joe Barrow, Alexander Hoyle, John P. Lalor, Robin Jia, and Jordan Boyd-Graber. Evaluation Examples Are Not Equally Informative: How Should That Change NLP Leaderboards?. Association for Computational Linguistics, 2021. Accessible Abstract: When can we call an AI "intelligent"? Just like humans, a common approach is to ask them a bunch of questions. These questions posed to modern machine learning methods are collected in rankings called leaderboards to monitor progress, but beyond ranking approaches, leaderboards do not help us better understand our problems or our systems. This paper introduces probabilistic models inspired by psychometric approaches called item response theory models (think year-end standardized tests) to better understand how computers can answer questions and whether we are asking the right questions. This allows researchers to better compare what kinds of questions systems can answer, better compare human and machine ability, and discover problematic questions (e.g., questions that have incorrect answer keys, are vague, or "trick" those trying to answer the questions). (21% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2021_acl_leaderboard.pdf Chen Zhao, Chenyan Xiong, Jordan Boyd-Graber, and Hal Daume III. Distantly-Supervised Dense Retrieval Enables Open-Domain Question Answering without Evidence Annotation. Empirical Methods in Natural Language Processing, 2021. Accessible Abstract: Answering questions sometimes requires tying multiple pieces of information together. Previous datasets have required annotators to explicitly build these reasoning chains (e.g., to answer "where do I know the cop from Die Hard from", you need to figure out that the actor's name is "Reginald VelJohnson" and then find out that he's best known as the dad on Family Matters). By exploring search queries that get to the right answer, we're able to answer these questions without expensive annotation. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2021_emnlp_weak_dpr.pdf Pedro Rodriguez and Jordan Boyd-Graber. Evaluation Paradigms in Question Answering. Empirical Methods in Natural Language Processing, 2021. Accessible Abstract: Why do we answer questions? Sometimes it's to provide information, which has been the interpretation of the computer science community. But sometimes it's to probe or test intelligence. This paper argues we should think more about that application of question answering and its connection to the foundations of artificial intelligence: the Turing Test. Thus, in addition to the long-standing Cranfield paradigm popularized by information retrieval, this paper proposes an alternative "Manchester paradigm" closer to the Turing test, trivia games, and education. (18% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2021_emnlp_paradigms.pdf Maharshi Gor, Kellie Webster, and Jordan Boyd-Graber. Toward Deconfounding the Influence of Subject's Demographic Characteristics in Question Answering. Empirical Methods in Natural Language Processing, 2021. Accessible Abstract: The data used to train computer question answering systems have three times as many men as women. This paper examines whether this is a problem for question answering accuracy. After a thorough investigation, we do not find evidence of serious accuracy discrepancies between genders. However, an absence of evidence is not evidence of absence, and we would argue that we need more diverse datasets to better represent the world's population. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2021_emnlp_qa_fairness.pdf Chenglei Si, Chen Zhao, and Jordan Boyd-Graber. What's in a Name? Answer Equivalence For Open-Domain Question Answering. Empirical Methods in Natural Language Processing, 2021. Accessible Abstract: Is Tim Cook the same person as Timothy Donald Cook? You might think so, but the way we train computers to answer questions would say they aren't. We show that keeping track of multiple names (and it's really simple) can create better question answering systems. Simply by adding alternate answers mined from knowledge bases, we can improve accuracy 1-2 points on major QA datasets. (18% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2021_emnlp_answer_equiv.pdf
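A minimal Python sketch of the alias-aware matching idea: credit a prediction if it matches any known name for the gold answer after standard normalization. The tiny ALIASES table below stands in for the aliases the paper mines from knowledge bases:

import re
import string

ALIASES = {"tim cook": {"timothy donald cook", "timothy d. cook"}}

def normalize(text):
    """Lowercase, drop punctuation and articles (SQuAD-style cleanup)."""
    text = "".join(ch for ch in text.lower() if ch not in string.punctuation)
    return " ".join(re.sub(r"\b(a|an|the)\b", " ", text).split())

def equivalent(prediction, gold):
    answers = {normalize(gold)} | {normalize(a)
                                   for a in ALIASES.get(normalize(gold), set())}
    return normalize(prediction) in answers

print(equivalent("Timothy Donald Cook", "Tim Cook"))              # True
print(normalize("Timothy Donald Cook") == normalize("Tim Cook"))  # plain exact match: False

Scoring against the expanded answer set, both for evaluation and for harvesting extra training signal, is the "really simple" change the abstract refers to.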
Chen Zhao, Chenyan Xiong, Hal Daume III, and Jordan Boyd-Graber. Multi-Step Reasoning Over Unstructured Text with Beam Dense Retrieval. North American Association for Computational Linguistics, 2021. Accessible Abstract: For computers to answer complicated questions online, they often need to put together multiple pieces of information (Ronald Reagan was both governor of California and an actor in Bedtime for Bonzo). However, existing approaches use the links in Wikipedia to combine these clues. This research helps computers find connected information without using these explicit links. (23% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2021_naacl_multi_ance.pdf Chen Zhao, Chenyan Xiong, Xin Qian, and Jordan Boyd-Graber. Complex Factoid Question Answering with a Free-Text Knowledge Graph. ACM International Conference on World Wide Web, 2020. (19.2% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2020_www_delft.pdf Jordan Boyd-Graber and Benjamin Börschinger. What Question Answering can Learn from Trivia Nerds. Association for Computational Linguistics, 2020. Accessible Abstract: This paper reflects on the similarities between trivia competitions and computer question answering research. Modern machine learning requires large, quality datasets. The central thesis of this article argues that the same things that make trivia tournaments good (they're fun, fair, and consistently crown the best trivia players) can also improve question answering datasets. Concretely, we argue that question answering datasets should clearly specify what answers are requested, have systematic policies to deal with natural ambiguity and variation, have authors look at the data (and help others do the same), make sure questions separate the best from the rest, and ensure people can have fun. We draw on the authors' experience in the trivia community (including embarrassing episodes on Jeopardy!) to illustrate our arguments. (25.4% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2020_acl_trivia.pdf Thomas Diggelmann, Jordan Boyd-Graber, Jannis Bulian, Massimiliano Ciaramita, and Markus Leippold. CLIMATE-FEVER: A Dataset for Verification of Real-World Climate Claims. NIPS Workshop on Tackling Climate Change with Machine Learning, 2020. https://research.google/pubs/pub50541/ Eric Wallace, Shi Feng, and Jordan Boyd-Graber. Misleading Failures of Partial-input Baselines. Association for Computational Linguistics, 2019. (18% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2019_acl_flipside.pdf Denis Peskov, Joe Barrow, Pedro Rodriguez, Graham Neubig, and Jordan Boyd-Graber. Mitigating Noisy Inputs for Question Answering. Conference of the International Speech Communication Association, 2019. http://umiacs.umd.edu/~jbg/docs/2019_interspeech_asr Ahmed Elgohary Ghoneim, Denis Peskov, and Jordan Boyd-Graber. Can You Unpack That? Learning to Rewrite Questions-in-Context. Empirical Methods in Natural Language Processing, 2019. http://umiacs.umd.edu/~jbg/docs/2019_emnlp_sequentialqa.pdf Shi Feng and Jordan Boyd-Graber. What AI can do for me: Evaluating Machine Learning Interpretations in Cooperative Play. Intelligent User Interfaces, 2019. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2019_iui_augment.pdf Eric Wallace, Pedro Rodriguez, Shi Feng, Ikuya Yamada, and Jordan Boyd-Graber. Trick Me If You Can: Human-in-the-loop Generation of Adversarial Question Answering Examples. Transactions of the Association for Computational Linguistics, 2019. http://umiacs.umd.edu/~jbg/docs/2019_tacl_trick.pdf Eric Wallace and Jordan Boyd-Graber. Trick Me If You Can: Adversarial Writing of Trivia Challenge Questions. ACL Student Research Workshop, 2018. http://aclweb.org/anthology/P18-3018 Ahmed Elgohary Ghoneim, Chen Zhao, and Jordan Boyd-Graber. Dataset and Baselines for Sequential Open-Domain Question Answering. Empirical Methods in Natural Language Processing, 2018. (23% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2018_emnlp_linked.pdf Shi Feng, Eric Wallace, Alvin Grissom II, Pedro Rodriguez, Mohit Iyyer, and Jordan Boyd-Graber. Pathologies of Neural Models Make Interpretation Difficult. Empirical Methods in Natural Language Processing, 2018.
(26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2018_emnlp_rs.pdf Mohit Iyyer, Varun Manjunatha, Anupam Guha, Yogarshi Vyas, Jordan Boyd-Graber, Hal Daume III, and Larry Davis. The Amazing Mysteries of the Gutter: Drawing Inferences Between Panels in Comic Book Narratives. Computer Vision and Pattern Recognition, 2017. (30% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2017_cvpr_comics.pdf He He, Jordan Boyd-Graber, Kevin Kwok, and Hal Daume III. Opponent Modeling in Deep Reinforcement Learning. International Conference on Machine Learning, 2016. (24% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2016_icml_opponent.pdf Anupam Guha, Mohit Iyyer, and Jordan Boyd-Graber. A Distorted Skull Lies in the Bottom Center: Identifying Paintings from Text Descriptions. NAACL Human-Computer Question Answering Workshop, 2016. http://umiacs.umd.edu/~jbg/docs/2016_naacl_paintings.pdf Md Arafat Sultan, Jordan Boyd-Graber, and Tamara Sumner. Bayesian Supervised Domain Adaptation for Short Text Similarity. North American Association for Computational Linguistics, 2016. (24% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2016_naacl_sts.pdf Mohit Iyyer, Varun Manjunatha, Jordan Boyd-Graber, and Hal Daume III. Deep Unordered Composition Rivals Syntactic Methods for Text Classification. Association for Computational Linguistics, 2015. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2015_acl_dan.pdf Jordan Boyd-Graber, Mohit Iyyer, He He, and Hal Daume III. Interactive Incremental Question Answering. Neural Information Processing Systems, 2015. This won the best demonstration award at NIPS 2015. Anupam Guha, Mohit Iyyer, Danny Bouman, and Jordan Boyd-Graber. Removing the Training Wheels: A Coreference Dataset that Entertains Humans and Challenges Computers. North American Association for Computational Linguistics, 2015. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2015_naacl_qb_coref.pdf Mohit Iyyer, Jordan Boyd-Graber, Leonardo Claudino, Richard Socher, and Hal Daume III. A Neural Network for Factoid Question Answering over Paragraphs. Empirical Methods in Natural Language Processing, 2014. The partial derivatives of "C" and "J" with respect to the parameters should be switched in Equation 7. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2014_emnlp_qb_rnn.pdf Jordan Boyd-Graber, Brianna Satinoff, He He, and Hal Daume III. Besting the Quiz Master: Crowdsourcing Incremental Classification Games. Empirical Methods in Natural Language Processing, 2012. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/qb_emnlp_2012.pdf Reinforcement Learning ------------------------- Wenyan Li, Alvin Grissom II, and Jordan Boyd-Graber. An Attentive Recurrent Model for Incremental Prediction of Sentence-final Verbs. Findings of EMNLP, 2020. http://umiacs.umd.edu/~jbg/docs/2020_findings_verbs.pdf Tianze Shi, Chen Zhao, Jordan Boyd-Graber, Hal Daume III, and Lillian Lee. On the Potential of Lexico-logical Alignments for Semantic Parsing to SQL Queries. Findings of EMNLP, 2020. http://umiacs.umd.edu/~jbg/docs/2020_findings_qalign.pdf Khanh Nguyen, Jordan Boyd-Graber, and Hal Daume III. Reinforcement Learning for Bandit Neural Machine Translation with Simulated Human Feedback. Empirical Methods in Natural Language Processing, 2017. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2017_emnlp_bandit_mt.pdf He He, Jordan Boyd-Graber, Kevin Kwok, and Hal Daume III. Opponent Modeling in Deep Reinforcement Learning. International Conference on Machine Learning, 2016.
(24% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2016_icml_opponent.pdf Alvin Grissom II, He He, Jordan Boyd-Graber, John Morgan, and Hal Daume III. Don't Until the Final Verb Wait: Reinforcement Learning for Simultaneous Machine Translation. Empirical Methods in Natural Language Processing, 2014. (30% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2014_emnlp_simtrans.pdf Jordan Boyd-Graber, Brianna Satinoff, He He, and Hal Daume III. Besting the Quiz Master: Crowdsourcing Incremental Classification Games. Empirical Methods in Natural Language Processing, 2012. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/qb_emnlp_2012.pdf Sentiment and Perspective ------------------------- Denis Peskov, Benny Cheng, Ahmed Elgohary Ghoneim, Joe Barrow, Cristian Danescu-Niculescu-Mizil, and Jordan Boyd-Graber. It Takes Two to Lie: One to Lie and One to Listen. Association for Computational Linguistics, 2020. Accessible Abstract: Machine learning techniques to detect deception in online communications require training and evaluation data. However, there is a dearth of data either because of uncertain gold labels or privacy concerns; we create a new, large deception-centered dataset in the online game of Diplomacy. We gathered 17,289 messages from 12 games (each of which took over a month) involving 84 players, the majority of which were unique users. This data was collected with a custom-made bot that allowed us to collect messages and annotations. The user pool was created from scratch: we varied participant demographics across gender, age, nationality, and past game experience. Our participants included the former president of the Diplomacy players' association, several top ranked players in the world, a board game shop owner, and scientists. We create machine learning models to detect lies using linguistic, context, and power-dynamic features. Our best model had similar lie detection accuracy to humans. (25.4% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2020_acl_diplomacy.pdf Francesco Saverio Varini, Jordan Boyd-Graber, Massimiliano Ciaramita, and Markus Leippold. ClimaText: A Dataset for Climate Change Topic Detection. NeurIPS Workshop on Tackling Climate Change with Machine Learning, 2020. Mohit Iyyer, Varun Manjunatha, Jordan Boyd-Graber, and Hal Daume III. Deep Unordered Composition Rivals Syntactic Methods for Text Classification. Association for Computational Linguistics, 2015. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2015_acl_dan.pdf Vlad Niculae, Srijan Kumar, Jordan Boyd-Graber, and Cristian Danescu-Niculescu-Mizil. Linguistic Harbingers of Betrayal: A Case Study on an Online Strategy Game. Association for Computational Linguistics, 2015. Accessible Abstract: This paper introduces the application of natural language processing techniques to understand the relationships (and their dissolution) in the game of Diplomacy. This popular board game simulates Europe at the eve of World War I and forces players to work with each other to forge alliances and make plans together. However, the game's setup also encourages players to turn against each other. This paper analyzes whether we can predict these betrayals (we can!) and the linguistic and social phenomena (demands, politeness, and planning) that can predict when a betrayal will happen. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2015_acl_diplomacy.pdf
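To give a flavor of the modeling in these two Diplomacy papers, here is a toy feature-based lie detector in Python; the three features are drastically simplified stand-ins for the papers' linguistic, contextual, and power-dynamic features, and the training messages are fabricated examples:

from sklearn.linear_model import LogisticRegression

def featurize(message, sender_centers, receiver_centers):
    words = message.lower().split()
    return [len(words),                                              # longer messages often lay out plans
            sum(w in {"please", "thanks", "sorry"} for w in words),  # politeness cues
            sender_centers - receiver_centers]                       # power imbalance in the game

# (message, sender's supply centers, receiver's supply centers, was it a lie?)
messages = [("I will support you into Munich, I promise", 5, 4, True),
            ("Moving to the Channel, just defensive, sorry", 6, 3, True),
            ("Let's keep the demilitarized zone as agreed", 4, 4, False),
            ("I am holding in place this turn", 3, 5, False)]
X = [featurize(m, s, r) for m, s, r, _ in messages]
y = [lie for *_, lie in messages]

model = LogisticRegression().fit(X, y)
print(model.predict([featurize("Trust me, I will support Vienna", 7, 2)]))

A real model needs far more data and features (the released dataset has 17,289 annotated messages), but the pipeline shape is the same: featurize each message with its game context, then classify.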
Viet-An Nguyen, Jordan Boyd-Graber, Philip Resnik, and Kristina Miler. Tea Party in the House: A Hierarchical Ideal Point Topic Model and Its Application to Republican Legislators in the 112th Congress. Association for Computational Linguistics, 2015. Accessible Abstract: In the mid 2010s, the Republican party in the United States diverged: mainstream conservatives split from the so-called "tea party" caucus. However, the primary statistical tool for analyzing political factions in legislative bodies (ideal point models) fails to account for these changes. This is because the schism is not fully reflected in voting patterns but rather in how politicians present themselves: thus we need to extend these models to capture not just how politicians vote but also how they frame particular issues. This paper proposes a new model to capture framing differences within a voting bloc to start explaining the new subcoalitions of the Republican caucus. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2015_acl_teaparty.pdf Stephen H. Bach, Bert Huang, Jordan Boyd-Graber, and Lise Getoor. Paired-Dual Learning for Fast Training of Latent Variable Hinge-Loss MRFs. International Conference on Machine Learning, 2015. (20% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2015_icml_paired_dual.pdf Philip Resnik, William Armstrong, Leonardo Claudino, Thang Nguyen, Viet-An Nguyen, and Jordan Boyd-Graber. Beyond LDA: Exploring Supervised Topic Modeling for Depression-Related Language in Twitter. NAACL Workshop on Cognitive Modeling and Computational Linguistics, 2015. Thang Nguyen, Jordan Boyd-Graber, Jeff Lund, Kevin Seppi, and Eric Ringger. Is your anchor going up or down? Fast and accurate supervised topic models. North American Association for Computational Linguistics, 2015. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2015_naacl_supervised_anchor.pdf Mohit Iyyer, Peter Enns, Jordan Boyd-Graber, and Philip Resnik. Political Ideology Detection Using Recursive Neural Networks. Association for Computational Linguistics, 2014. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2014_acl_rnn_ideology.pdf Viet-An Nguyen, Jordan Boyd-Graber, and Philip Resnik. Sometimes Average is Best: The Importance of Averaging for Prediction using MCMC Inference in Topic Modeling. Empirical Methods in Natural Language Processing, 2014. (30% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2014_emnlp_howto_gibbs.pdf Viet-An Nguyen, Jordan Boyd-Graber, Philip Resnik, Deborah Cai, Jennifer Midberry, and Yuanxin Wang. Modeling Topic Control to Detect Influence in Conversations using Nonparametric Topic Models. Machine Learning, 2014. http://umiacs.umd.edu/~jbg/docs/2014_mlj_influencer.pdf Kimberly Glasgow, Clay Fink, and Jordan Boyd-Graber. Our grief is unspeakable: Measuring the community impact of a tragedy. The International AAAI Conference on Weblogs and Social Media, 2014. (20% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2014_icwsm_grief.pdf Jordan Boyd-Graber, Kimberly Glasgow, and Jackie Sauter Zajac. Spoiler Alert: Machine Learning Approaches to Detect Social Media Posts with Revelatory Information. ASIST 2013: The 76th Annual Meeting of the American Society for Information Science and Technology, 2013. http://umiacs.umd.edu/~jbg/docs/2013_spoiler.pdf Viet-An Nguyen, Jordan Boyd-Graber, and Philip Resnik. Lexical and Hierarchical Topic Regression. Neural Information Processing Systems, 2013. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2013_shlda.pdf Viet-An Nguyen, Yuening Hu, Jordan Boyd-Graber, and Philip Resnik.
Argviz: Interactive Visualization of Topic Dynamics in Multi-party Conversations. North American Association for Computational Linguistics, 2013. (50% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2013_argviz.pdf Viet-An Nguyen, Jordan Boyd-Graber, and Philip Resnik. SITS: A Hierarchical Nonparametric Model using Speaker Identity for Topic Segmentation in Multiparty Conversations. Association for Computational Linguistics, 2012. (19% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/acl_2012_sits.pdf Viet-An Nguyen, Jordan Boyd-Graber, and Philip Resnik. "I Want to Talk About, Again, My Record On Energy …'': Modeling Topic Control in Conversations using Speaker-centric Nonparametric Topic Models. Mid-Atlantic Student Colloquium on Speech, Language, and Learning, 2012. Asad B. Sayeed, Jordan Boyd-Graber, Bryan Rusk, and Amy Weinberg. Grammatical structures for word-level sentiment detection. North American Association for Computational Linguistics, 2012. (31% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/srt_naacl_2012.pdf Clay Templeton, Travis Brown, Sayan Battacharyya, and Jordan Boyd-Graber. Mining the Dispatch under Supervision: Using Casualty Counts to Guide Topics from the Richmond Daily Dispatch Corpus. Chicago Colloquium on Digital Humanities and Computer Science, 2011. http://umiacs.umd.edu/~jbg/docs/slda_civil_war.pdf Clay Templeton, Kenneth R. Fleischmann, and Jordan Boyd-Graber. Simulating Audiences: Automating Analysis of Values, Attitudes, and Sentiment. IEEE International Conference on Social Computing, 2011. (10% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/simulating_audiences.pdf Pranav Anand, Joseph King, Jordan Boyd-Graber, Earl Wagner, Craig Martell, Douglas W. Oard, and Philip Resnik. Believe Me: We Can Do This!. The AAAI 2011 workshop on Computational Models of Natural Argument, 2011. http://umiacs.umd.edu/~jbg/docs/persuasion.pdf Clay Templeton, Kenneth R. Fleischmann, and Jordan Boyd-Graber. Comparing Values and Sentiment Using Mechanical Turk. iConference, 2011. http://umiacs.umd.edu/~jbg/docs/iconference-2011-comparing.pdf Kenneth R. Fleischmann, Clay Templeton, and Jordan Boyd-Graber. Modeling Diverse Standpoints in Text Classification: Learning to Be Human by Modeling Human Values. iConference, 2011. http://umiacs.umd.edu/~jbg/docs/iconference-2011-learning.pdf Jordan Boyd-Graber and Philip Resnik. Holistic Sentiment Analysis Across Languages: Multilingual Supervised Latent Dirichlet Allocation. Empirical Methods in Natural Language Processing, 2010. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/jbg-mlslda-2010.pdf Eric Hardisty, Jordan Boyd-Graber, and Philip Resnik. Modeling Perspective using Adaptor Grammars. Empirical Methods in Natural Language Processing, 2010. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/adapted_naive_bayes.pdf Spectral Methods ------------------------- Jeffrey Lund, Piper Armstrong, Wilson Fearn, Stephen Cowley, Courtni Byun, Jordan Boyd-Graber, and Kevin Seppi. Automatic and Human Evaluation of Local Topic Quality. Association for Computational Linguistics, 2019. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2019_acl_local.pdf Michelle Yuan, Benjamin Van Durme, and Jordan Boyd-Graber. Multilingual Anchoring: Interactive Topic Modeling and Alignment Across Languages. Neural Information Processing Systems, 2018. (21% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2018_neurips_mtanchor.pdf Jeff Lund, Connor Cook, Kevin Seppi, and Jordan Boyd-Graber. 
Tandem Anchoring: A Multiword Anchor Approach for Interactive Topic Modeling. Association for Computational Linguistics, 2017. (22% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2017_acl_multiword_anchors.pdf Thang Nguyen, Jordan Boyd-Graber, Jeff Lund, Kevin Seppi, and Eric Ringger. Is your anchor going up or down? Fast and accurate supervised topic models. North American Association for Computational Linguistics, 2015. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2015_naacl_supervised_anchor.pdf Thang Nguyen, Yuening Hu, and Jordan Boyd-Graber. Anchors Regularized: Adding Robustness and Extensibility to Scalable Topic-Modeling Algorithms. Association for Computational Linguistics, 2014. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2014_acl_anchor_reg.pdf Thang Nguyen, Yuening Hu, and Jordan Boyd-Graber. Evaluating Regularized Anchor Words. NIPS Workshop on Topic Models: Computation, Application, and Evaluation, 2013. Speech ------------------------- Denis Peskov, Joe Barrow, Pedro Rodriguez, Graham Neubig, and Jordan Boyd-Graber. Mitigating Noisy Inputs for Question Answering. Conference of the International Speech Communication Association, 2019. http://umiacs.umd.edu/~jbg/docs/2019_interspeech_asr Syntax ------------------------- He He, Jordan Boyd-Graber, and Hal Daume III. Interpretese vs. Translationese: The Uniqueness of Human Strategies in Simultaneous Interpretation. North American Association for Computational Linguistics, 2016. (29% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2016_naacl_interpretese.pdf Mohit Iyyer, Varun Manjunatha, Jordan Boyd-Graber, and Hal Daume III. Deep Unordered Composition Rivals Syntactic Methods for Text Classification. Association for Computational Linguistics, 2015. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2015_acl_dan.pdf Anupam Guha, Mohit Iyyer, Danny Bouman, and Jordan Boyd-Graber. Removing the Training Wheels: A Coreference Dataset that Entertains Humans and Challenges Computers. North American Association for Computational Linguistics, 2015. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2015_naacl_qb_coref.pdf Naho Orita, Naomi Feldman, and Jordan Boyd-Graber. Quantifying the role of discourse topicality in speakers' choices of referring expressions. ACL Workshop on Cognitive Modeling and Computational Linguistics, 2014. Mohit Iyyer, Peter Enns, Jordan Boyd-Graber, and Philip Resnik. Political Ideology Detection Using Recursive Neural Networks. Association for Computational Linguistics, 2014. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2014_acl_rnn_ideology.pdf Mohit Iyyer, Jordan Boyd-Graber, Leonardo Claudino, Richard Socher, and Hal Daume III. A Neural Network for Factoid Question Answering over Paragraphs. Empirical Methods in Natural Language Processing, 2014. The partial derivatives of "C" and "J" with respect to the parameters should be switched in Equation 7. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2014_emnlp_qb_rnn.pdf Ke Zhai, Jordan Boyd-Graber, and Shay B. Cohen. Hybrid Online Inference with Adaptor Grammars. NIPS Workshop on Advances in Variational Inference, 2014. Mohit Iyyer, Jordan Boyd-Graber, and Hal Daume III. Generating Sentences from Semantic Vector Space Representations. NIPS Workshop on Learning Semantics, 2014. Ke Zhai, Jordan Boyd-Graber, and Shay B. Cohen. Online Adaptor Grammars with Hybrid Inference. Transactions of the Association for Computational Linguistics, 2014. 
Naho Orita, Rebecca McKeown, Naomi H. Feldman, Jeffrey Lidz, and Jordan Boyd-Graber. Discovering Pronoun Categories using Discourse Information. Proceedings of the Cognitive Science Society, 2013. http://umiacs.umd.edu/~jbg/docs/2013_cogsci_pronoun.pdf
Asad B. Sayeed, Jordan Boyd-Graber, Bryan Rusk, and Amy Weinberg. Grammatical structures for word-level sentiment detection. North American Association for Computational Linguistics, 2012. (31% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/srt_naacl_2012.pdf
Jordan Boyd-Graber and David M. Blei. Syntactic Topic Models. Neural Information Processing Systems, 2008. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/nips2008.pdf

Topic Models
-------------------------

Zongxia Li, Andrew Mao, Daniel Kofi Stephens, Pranav Goel, Emily Walpole, Juan Francisco Fung, Alden Dima, and Jordan Lee Boyd-Graber. TENOR: Topic Enabled Neural Organization and Recommendation: Evaluating Topic Models in Task Based Settings. European Association for Computational Linguistics, 2024. (21% Acceptance Rate)
Alexander Hoyle, Pranav Goel, Denis Peskov, Andrew Hian-Cheong, Jordan Boyd-Graber, and Philip Resnik. Is Automated Topic Model Evaluation Broken?: The Incoherence of Coherence. Neural Information Processing Systems, 2021. Accessible Abstract: Topic models help historians, journalists, and analysts make sense of large text collections. But how do you know if you have a good one? The field has settled on automatic coherence, but this paper argues that it may not be the right choice if you want to satisfy real users. Building on our 2009 paper, which showed that perplexity is a poor measure of topic model interpretability (and which led the field to adopt automatic coherence in the first place), this paper argues that automatic coherence is not a good metric for neural topic models, even though it worked for probabilistic ones (see the coherence sketch below). (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2021_neurips_incoherence.pdf
Alison Smith, Varun Kumar, Jordan Boyd-Graber, Kevin Seppi, and Leah Findlater. Digging into User Control: Perceptions of Adherence and Instability in Transparent Models. Intelligent User Interfaces, 2020. (23% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2020_iui_control.pdf
Fenfei Guo, Jordan Boyd-Graber, Mohit Iyyer, and Leah Findlater. Which Evaluations Uncover Sense Representations that Actually Make Sense? Language Resources and Evaluation Conference, 2020. http://umiacs.umd.edu/~jbg/docs/2020_lrec_sense.pdf
Francesco Saverio Varini, Jordan Boyd-Graber, Massimiliano Ciaramita, and Markus Leippold. ClimaText: A Dataset for Climate Change Topic Detection. NeurIPS Workshop on Tackling Climate Change with Machine Learning, 2020.
Jeffrey Lund, Piper Armstrong, Wilson Fearn, Stephen Cowley, Courtni Byun, Jordan Boyd-Graber, and Kevin Seppi. Automatic and Human Evaluation of Local Topic Quality. Association for Computational Linguistics, 2019. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2019_acl_local.pdf
Varun Kumar, Alison Smith, Leah Findlater, Kevin Seppi, and Jordan Boyd-Graber. Why Didn't You Listen to Me? Comparing User Control of Human-in-the-Loop Topic Models. Association for Computational Linguistics, 2019. (18% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2019_acl_control.pdf
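To make the "automatic coherence" discussed in the Incoherence of Coherence entry above concrete, here is a minimal sketch of one standard instantiation: average pairwise normalized PMI (NPMI) over a topic's top words, computed against a reference corpus. The function name, data structures, and counts are illustrative assumptions, not the paper's implementation:

    import math
    from itertools import combinations

    def npmi_coherence(top_words, doc_freq, joint_freq, n_docs):
        # Average normalized PMI over all pairs of a topic's top words,
        # computed from document frequencies in a reference corpus.
        scores = []
        for w1, w2 in combinations(sorted(top_words), 2):
            p1 = doc_freq.get(w1, 0) / n_docs
            p2 = doc_freq.get(w2, 0) / n_docs
            p12 = joint_freq.get((w1, w2), 0) / n_docs
            if p12 == 0.0:
                scores.append(-1.0)  # never co-occur: minimum NPMI
            elif p12 == 1.0:
                scores.append(1.0)   # always co-occur: maximum NPMI
            else:
                scores.append(math.log(p12 / (p1 * p2)) / -math.log(p12))
        return sum(scores) / len(scores)

    # Toy reference corpus of 10 documents: "cat" in 3, "dog" in 4, both in 2.
    print(npmi_coherence(["cat", "dog"], {"cat": 3, "dog": 4},
                         {("cat", "dog"): 2}, n_docs=10))  # ~0.32

The paper's point is that a score like this can diverge from what human users actually judge to be interpretable, especially for neural topic models.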
Weiwei Yang, Jordan Boyd-Graber, and Philip Resnik. A Multilingual Topic Model for Learning Weighted Topic Links Across Incomparable Corpora. Empirical Methods in Natural Language Processing, 2019. http://umiacs.umd.edu/~jbg/docs/2019_emnlp_mtm.pdf
Dasha Pruss, Yoshinari Fujinuma, Ashlynn Daughton, Michael Paul, Brad Arnot, Danielle Szafir, and Jordan Boyd-Graber. Zika discourse in the Americas: A multilingual topic analysis of Twitter. PLOS ONE, 2019. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0216922
Alison Smith, Varun Kumar, Jordan Boyd-Graber, Kevin Seppi, and Leah Findlater. User-Centered Design and Evaluation of a Human-in-the-Loop Topic Modeling System. Intelligent User Interfaces, 2018. Alison won a best student paper honorable mention (3 out of 300). (23% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2018_iui_itm.pdf
Michelle Yuan, Benjamin Van Durme, and Jordan Boyd-Graber. Multilingual Anchoring: Interactive Topic Modeling and Alignment Across Languages. Neural Information Processing Systems, 2018. (21% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2018_neurips_mtanchor.pdf
Shudong Hao, Michael J. Paul, and Jordan Boyd-Graber. Lessons from the Bible on Modern Topics: Multilingual Topic Model Evaluation on Low-Resource Languages. North American Association for Computational Linguistics, 2018. (35% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2018_naacl_mltm_eval.pdf
Aaron Gerow, Yuening Hu, Jordan Boyd-Graber, David M. Blei, and James A. Evans. Measuring Discursive Influence Across Scholarship. Proceedings of the National Academy of Sciences, 2018.
Jordan Boyd-Graber, Yuening Hu, and David Mimno. Applications of Topic Models. Foundations and Trends in Information Retrieval, 2017. http://www.nowpublishers.com/article/Details/INR-030
Jeff Lund, Connor Cook, Kevin Seppi, and Jordan Boyd-Graber. Tandem Anchoring: A Multiword Anchor Approach for Interactive Topic Modeling. Association for Computational Linguistics, 2017. (22% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2017_acl_multiword_anchors.pdf
Alison Smith, Varun Kumar, Jordan Boyd-Graber, Kevin Seppi, and Leah Findlater. Accounting for Input Uncertainty in Human-in-the-Loop Systems. CHI 2017 Designing for Uncertainty Workshop, 2017. http://visualization.ischool.uw.edu/hci_uncertainty/papers/Paper11.pdf
Weiwei Yang, Jordan Boyd-Graber, and Philip Resnik. Adapting Topic Models using Lexical Associations with Tree Priors. Empirical Methods in Natural Language Processing, 2017. (18% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2017_emnlp_tree_prior.pdf
You Lu, Jeff Lund, and Jordan Boyd-Graber. Why ADAGRAD Fails for Online Topic Modeling. Empirical Methods in Natural Language Processing, 2017. (18% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2017_emnlp_adagrad_olda.pdf
Tak Yeon Lee, Alison Smith, Kevin Seppi, Niklas Elmqvist, Jordan Boyd-Graber, and Leah Findlater. The Human Touch: How Non-expert Users Perceive, Interpret, and Fix Topic Models. International Journal of Human-Computer Studies, 2017. http://umiacs.umd.edu/~jbg/docs/2017_ijhcs_human_touch.pdf
Alison Smith, Tak Yeon Lee, Forough Poursabzi-Sangdeh, Jordan Boyd-Graber, Kevin Seppi, Niklas Elmqvist, and Leah Findlater. Evaluating Visual Representations for Topic Understanding and Their Effects on Manually Generated Labels. Transactions of the Association for Computational Linguistics, 2017. http://umiacs.umd.edu/~jbg/docs/2017_tacl_eval_tm_viz.pdf
Weiwei Yang, Jordan Boyd-Graber, and Philip Resnik. A Discriminative Topic Model using Document Network Structure. Association for Computational Linguistics, 2016. (28% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2016_acl_docblock.pdf
Forough Poursabzi-Sangdeh, Jordan Boyd-Graber, Leah Findlater, and Kevin Seppi. ALTO: Active Learning with Topic Overviews for Speeding Label Induction and Document Labeling. Association for Computational Linguistics, 2016. (28% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2016_acl_doclabel.pdf
Alison Smith, Tak Yeon Lee, Forough Poursabzi-Sangdeh, Jordan Boyd-Graber, Kevin Seppi, Niklas Elmqvist, and Leah Findlater. Human-Centered and Interactive: Expanding the Impact of Topic Models. CHI Human Centred Machine Learning Workshop, 2016.
Weiwei Yang, Jordan Boyd-Graber, and Philip Resnik. Birds of a Feather in the Same Nest: A Discriminative Topic Model using Block-based Priors. Mid-Atlantic Student Colloquium on Speech, Language, and Learning, 2016.
Viet-An Nguyen, Jordan Boyd-Graber, Philip Resnik, and Kristina Miler. Tea Party in the House: A Hierarchical Ideal Point Topic Model and Its Application to Republican Legislators in the 112th Congress. Association for Computational Linguistics, 2015. Accessible Abstract: In the mid 2010s, the Republican party in the United States diverged: mainstream conservatives split from the so-called "tea party" caucus. However, the primary statistical tool for analyzing political factions in legislative bodies, the ideal point model, fails to account for these changes (see the ideal point sketch below). This is because the schism is not fully reflected in voting patterns but rather in how politicians present themselves: we need to extend these models to capture not just how politicians vote but also how they frame particular issues. This paper proposes a new model that captures framing differences within a voting bloc to start explaining the new subcoalitions of the Republican caucus. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2015_acl_teaparty.pdf
Paul Felt, Eric Ringger, Jordan Boyd-Graber, and Kevin Seppi. Making the Most of Crowdsourced Document Annotations: Confused Supervised LDA. Conference on Computational Natural Language Learning, 2015. This paper received the best paper award at CoNLL. (30% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2015_conll_cslda.pdf
Weiwei Yang, Jordan Boyd-Graber, and Philip Resnik. Birds of a Feather Linked Together: A Discriminative Topic Model using Link-based Priors. Empirical Methods in Natural Language Processing, 2015. (28% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2015_emnlp_hinge_link.pdf
Yi Yang, Doug Downey, and Jordan Boyd-Graber. Efficient Methods for Incorporating Knowledge into Topic Models. Empirical Methods in Natural Language Processing, 2015. (24% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2015_emnlp_fast_priors.pdf
Forough Poursabzi-Sangdeh and Jordan Boyd-Graber. Speeding Document Annotation with Topic Models. NAACL Student Research Workshop, 2015.
Philip Resnik, William Armstrong, Leonardo Claudino, Thang Nguyen, Viet-An Nguyen, and Jordan Boyd-Graber. Beyond LDA: Exploring Supervised Topic Modeling for Depression-Related Language in Twitter. NAACL Workshop on Cognitive Modeling and Computational Linguistics, 2015.
Thang Nguyen, Jordan Boyd-Graber, Jeff Lund, Kevin Seppi, and Eric Ringger. Is your anchor going up or down? Fast and accurate supervised topic models. North American Association for Computational Linguistics, 2015. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2015_naacl_supervised_anchor.pdf
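For readers unfamiliar with the ideal point models that the Tea Party in the House abstract above says fall short, here is a minimal sketch of the standard one-dimensional model. All names and numbers are illustrative assumptions; the paper's hierarchical ideal point topic model is considerably richer:

    import math

    def p_yea(ideal_point, discrimination, difficulty):
        # P(legislator votes "yea" on a bill) = sigmoid(a * x - b), where x is
        # the legislator's position on a one-dimensional left-right axis and
        # (a, b) are per-bill discrimination and difficulty parameters.
        return 1.0 / (1.0 + math.exp(-(discrimination * ideal_point - difficulty)))

    # Two legislators with the same ideal point are indistinguishable to the
    # model, even if one is "tea party" and one is mainstream; that is why
    # the paper adds text (framing) on top of votes.
    print(p_yea(1.5, discrimination=2.0, difficulty=1.0))  # ~0.88 for both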
Naho Orita, Naomi Feldman, and Jordan Boyd-Graber. Quantifying the role of discourse topicality in speakers' choices of referring expressions. ACL Workshop on Cognitive Modeling and Computational Linguistics, 2014.
Alison Smith, Jason Chuang, Yuening Hu, Jordan Boyd-Graber, and Leah Findlater. Concurrent Visualization of Relationships between Words and Topics in Topic Models. ACL Workshop on Interactive Language Learning, Visualization, and Interfaces, 2014.
Thang Nguyen, Yuening Hu, and Jordan Boyd-Graber. Anchors Regularized: Adding Robustness and Extensibility to Scalable Topic-Modeling Algorithms. Association for Computational Linguistics, 2014. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2014_acl_anchor_reg.pdf
Yuening Hu, Ke Zhai, Vlad Eidelman, and Jordan Boyd-Graber. Polylingual Tree-Based Topic Models for Translation Domain Adaptation. Association for Computational Linguistics, 2014. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2014_acl_ptlda_mt.pdf
Viet-An Nguyen, Jordan Boyd-Graber, and Philip Resnik. Sometimes Average is Best: The Importance of Averaging for Prediction using MCMC Inference in Topic Modeling. Empirical Methods in Natural Language Processing, 2014. (30% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2014_emnlp_howto_gibbs.pdf
Jordan Boyd-Graber, David Mimno, and David Newman. Care and Feeding of Topic Models: Problems, Diagnostics, and Improvements. Handbook of Mixed Membership Models and Their Applications, 2014. http://umiacs.umd.edu/~jbg/docs/2014_book_chapter_care_and_feeding.pdf
Yuening Hu, Jordan Boyd-Graber, Brianna Satinoff, and Alison Smith. Interactive Topic Modeling. Machine Learning, 2014. http://umiacs.umd.edu/~jbg/docs/2014_mlj_itm.pdf
Jason Chuang, John D. Wilkerson, Rebecca Weiss, Dustin Tingley, Brandon M. Stewart, Margaret E. Roberts, Forough Poursabzi-Sangdeh, Justin Grimmer, Leah Findlater, Jordan Boyd-Graber, and Jeffrey Heer. Computer-Assisted Content Analysis: Topic Models for Exploring Multiple Subjective Interpretations. NIPS Workshop on Human-Propelled Machine Learning, 2014.
Viet-An Nguyen, Jordan Boyd-Graber, Philip Resnik, and Jonathan Chang. Learning a Concept Hierarchy from Multi-labeled Documents. Neural Information Processing Systems, 2014. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2014_nips_l2h.pdf
Ke Zhai and Jordan Boyd-Graber. Online Topic Models with Infinite Vocabulary. International Conference on Machine Learning, 2013. (20% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2013_icml_infvoc.pdf
Thang Nguyen, Yuening Hu, and Jordan Boyd-Graber. Evaluating Regularized Anchor Words. NIPS Workshop on Topic Models: Computation, Application, and Evaluation, 2013.
Yuening Hu, Ke Zhai, Vlad Eidelman, and Jordan Boyd-Graber. Topic Models for Translation Domain Adaptation. NIPS Workshop on Topic Models: Computation, Application, and Evaluation, 2013.
Viet-An Nguyen, Jordan Boyd-Graber, Jonathan Chang, and Philip Resnik. Tree-Based Label Dependency Topic Models. NIPS Workshop on Topic Models: Computation, Application, and Evaluation, 2013.
Viet-An Nguyen, Jordan Boyd-Graber, and Philip Resnik. Lexical and Hierarchical Topic Regression. Neural Information Processing Systems, 2013. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2013_shlda.pdf
Naho Orita, Rebecca McKeown, Naomi H. Feldman, Jeffrey Lidz, and Jordan Boyd-Graber. Discovering Pronoun Categories using Discourse Information. Proceedings of the Cognitive Science Society, 2013. http://umiacs.umd.edu/~jbg/docs/2013_cogsci_pronoun.pdf
Ke Zhai, Jordan Boyd-Graber, Nima Asadi, and Mohamad (Jude) Alkhouja. Mr. LDA: A Flexible Large Scale Topic Modeling Package using Variational Inference in MapReduce. ACM International Conference on World Wide Web, 2012. (12% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2012_www_mrlda.pdf
Yuening Hu and Jordan Boyd-Graber. Efficient Tree-Based Topic Modeling. Association for Computational Linguistics, 2012. (21% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/acl_2012_fttm.pdf
Vladimir Eidelman, Jordan Boyd-Graber, and Philip Resnik. Topic Models for Dynamic Translation Model Adaptation. Association for Computational Linguistics, 2012. For a more thorough evaluation and an exploration of more advanced topic models for machine translation, see: Yuening Hu, Ke Zhai, Vlad Eidelman, and Jordan Boyd-Graber. Polylingual Tree-Based Topic Models for Translation Domain Adaptation. Association for Computational Linguistics, 2014. (21% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/acl_2012_tm_for_mt.pdf
Yuening Hu and Jordan Boyd-Graber. Suggesting Constraints for Interactive Topic Modeling. ICML Workshop on Machine Learning in Human Computation and Crowdsourcing, 2012.
Ke Zhai and Jordan Boyd-Graber. Online Topic Model with Infinite Vocabulary. Mid-Atlantic Student Colloquium on Speech, Language, and Learning, 2012.
Yuening Hu, Jordan Boyd-Graber, and Brianna Satinoff. Interactive Topic Modeling. Association for Computational Linguistics, 2011. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/itm.pdf
Clay Templeton, Travis Brown, Sayan Bhattacharyya, and Jordan Boyd-Graber. Mining the Dispatch under Supervision: Using Casualty Counts to Guide Topics from the Richmond Daily Dispatch Corpus. Chicago Colloquium on Digital Humanities and Computer Science, 2011. http://umiacs.umd.edu/~jbg/docs/slda_civil_war.pdf
Jordan Boyd-Graber. Linguistic Extensions of Topic Models. Ph.D. thesis, Princeton University, 2010. http://umiacs.umd.edu/~jbg/docs/2010_jbg_thesis.pdf
Jordan Boyd-Graber and Philip Resnik. Holistic Sentiment Analysis Across Languages: Multilingual Supervised Latent Dirichlet Allocation. Empirical Methods in Natural Language Processing, 2010. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/jbg-mlslda-2010.pdf
Jonathan Chang, Jordan Boyd-Graber, and David M. Blei. Connections between the Lines: Augmenting Social Networks with Text. Knowledge Discovery and Data Mining, 2009. (9% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/kdd2009.pdf
Jonathan Chang, Jordan Boyd-Graber, Chong Wang, Sean Gerrish, and David M. Blei. Reading Tea Leaves: How Humans Interpret Topic Models. Neural Information Processing Systems, 2009. Jonathan Chang and I shared a NIPS student award honorable mention for this paper (5 out of 1105). Accessible Abstract: Topic models are a tool that historians and social scientists use to explore large text corpora. But how do you know if you have a good topic model? Before this paper, the consensus was to use held-out likelihood. This paper argues that such evaluations do not fit how people actually use topic models and proposes new human-centered metrics, such as the word intrusion task sketched below. The method inspired a rethinking of model evaluation and showed that the complexity of a model does not always correspond to what a user wants. (24% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/nips2009-rtl.pdf
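One of the human-centered evaluations Reading Tea Leaves proposes is the word intrusion task. As a hedged sketch of how one such judgment instance could be assembled (the helper function and word lists are invented for illustration):

    import random

    def word_intrusion_instance(topic_top_words, intruder, seed=0):
        # One judgment: five high-probability words from a topic plus one
        # low-probability "intruder." If annotators reliably spot the
        # intruder, the topic is judged interpretable.
        words = topic_top_words[:5] + [intruder]
        random.Random(seed).shuffle(words)
        return words, words.index(intruder)

    words, answer = word_intrusion_instance(
        ["ball", "team", "score", "coach", "season"], intruder="molecule")
    print(words, "-> intruder at index", answer)

Aggregating how often annotators pick the true intruder gives a per-topic interpretability score that, as the abstract notes, can disagree with held-out likelihood.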
Jordan Boyd-Graber and David M. Blei. Multilingual Topic Models for Unaligned Text. Uncertainty in Artificial Intelligence, 2009. For coverage of the current state of the art in cross-lingual topic models, see: Yuening Hu, Ke Zhai, Vlad Eidelman, and Jordan Boyd-Graber. Polylingual Tree-Based Topic Models for Translation Domain Adaptation. Association for Computational Linguistics, 2014. (31% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/uai2009.pdf
Jonathan Chang, Jordan Boyd-Graber, and David M. Blei. Discovering social networks from free text. 3rd Annual Machine Learning Symposium, 2008.
Jordan Boyd-Graber and David M. Blei. Multilingual Topic Models. NIPS Workshop on Unsupervised Latent Variable Models, 2008.
Jordan Boyd-Graber and David M. Blei. Syntactic Topic Models. Neural Information Processing Systems, 2008. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/nips2008.pdf
Jordan Boyd-Graber and David M. Blei. PUTOP: Turning Predominant Senses into a Topic Model for WSD. 4th International Workshop on Semantic Evaluations, 2007. http://umiacs.umd.edu/~jbg/docs/jbg-SEMEVAL07.pdf
Jordan Boyd-Graber, David M. Blei, and Xiaojin Zhu. A Topic Model for Word Sense Disambiguation. Empirical Methods in Natural Language Processing, 2007. (27% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/jbg-EMNLP07.pdf

Variational Inference
-------------------------

Pedro Rodriguez, Joe Barrow, Alexander Hoyle, John P. Lalor, Robin Jia, and Jordan Boyd-Graber. Evaluation Examples Are Not Equally Informative: How Should That Change NLP Leaderboards? Association for Computational Linguistics, 2021. Accessible Abstract: When can we call an AI "intelligent"? As with humans, a common approach is to ask it questions. The questions posed to modern machine learning methods are collected in leaderboards to monitor progress, but beyond ranking systems, leaderboards do little to help us understand our problems or our systems. This paper introduces probabilistic models inspired by psychometrics, called item response theory models (think year-end standardized tests; see the sketch below), to better understand how computers answer questions and whether we are asking the right ones. This lets researchers compare what kinds of questions systems can answer, compare human and machine ability, and discover problematic questions (e.g., questions with incorrect answer keys, vague wording, or "tricks"). (21% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2021_acl_leaderboard.pdf
Varun Kumar, Alison Smith, Leah Findlater, Kevin Seppi, and Jordan Boyd-Graber. Why Didn't You Listen to Me? Comparing User Control of Human-in-the-Loop Topic Models. Association for Computational Linguistics, 2019. (18% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2019_acl_control.pdf
Paul Felt, Eric Ringger, Kevin Seppi, and Jordan Boyd-Graber. Learning from Measurements in Crowdsourcing Models: Inferring Ground Truth from Diverse Annotation Types. International Conference on Computational Linguistics, 2018. (37% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2018_coling_measurements.pdf
Aaron Gerow, Yuening Hu, Jordan Boyd-Graber, David M. Blei, and James A. Evans. Measuring Discursive Influence Across Scholarship. Proceedings of the National Academy of Sciences, 2018.
You Lu, Jeff Lund, and Jordan Boyd-Graber. Why ADAGRAD Fails for Online Topic Modeling. Empirical Methods in Natural Language Processing, 2017. (18% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2017_emnlp_adagrad_olda.pdf
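To make the item response theory models in the leaderboard abstract above concrete, here is a minimal sketch of the standard two-parameter logistic model. Parameter names and values are illustrative assumptions; the paper's models are richer:

    import math

    def p_correct(skill, discrimination, difficulty):
        # Two-parameter logistic IRT: P(subject with ability theta answers
        # item j correctly) = sigmoid(a_j * (theta - b_j)).
        return 1.0 / (1.0 + math.exp(-discrimination * (skill - difficulty)))

    # A low-discrimination item barely separates weak from strong subjects,
    # so it contributes little to a ranking:
    print(p_correct(-1.0, discrimination=0.2, difficulty=-2.0))  # ~0.55
    print(p_correct(+1.0, discrimination=0.2, difficulty=-2.0))  # ~0.65

Fitting ability and item parameters jointly over a leaderboard's questions is what lets the paper identify which examples are informative and which are vague, mislabeled, or trick questions.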
Stephen H. Bach, Bert Huang, Jordan Boyd-Graber, and Lise Getoor. Paired-Dual Learning for Fast Training of Latent Variable Hinge-Loss MRFs. International Conference on Machine Learning, 2015. (20% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2015_icml_paired_dual.pdf
Yuening Hu, Ke Zhai, Vlad Eidelman, and Jordan Boyd-Graber. Polylingual Tree-Based Topic Models for Translation Domain Adaptation. Association for Computational Linguistics, 2014. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2014_acl_ptlda_mt.pdf
Ke Zhai, Jordan Boyd-Graber, and Shay B. Cohen. Hybrid Online Inference with Adaptor Grammars. NIPS Workshop on Advances in Variational Inference, 2014.
Ke Zhai, Jordan Boyd-Graber, and Shay B. Cohen. Online Adaptor Grammars with Hybrid Inference. Transactions of the Association for Computational Linguistics, 2014. http://umiacs.umd.edu/~jbg/docs/2014_tacl_ag_vb_online.pdf
Ke Zhai and Jordan Boyd-Graber. Online Topic Models with Infinite Vocabulary. International Conference on Machine Learning, 2013. (20% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2013_icml_infvoc.pdf
Jonathan Chang, Jordan Boyd-Graber, and David M. Blei. Connections between the Lines: Augmenting Social Networks with Text. Knowledge Discovery and Data Mining, 2009. (9% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/kdd2009.pdf
Jordan Boyd-Graber and David M. Blei. Syntactic Topic Models. Neural Information Processing Systems, 2008. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/nips2008.pdf