ASED
-------------------------
Denis Peskov, Benny Cheng, Ahmed Elgohary Ghoneim, Joe Barrow, Cristian Danescu-Niculescu-Mizil, and Jordan Boyd-Graber. It Takes Two to Lie: One to Lie and One to Listen. Association for Computational Linguistics, 2020.
Accessible Abstract: Machine learning techniques to detect deception in online communication require training and evaluation data. However, there is a dearth of such data because of uncertain gold labels and privacy concerns, so we create a new, large deception-centered dataset in the online game of Diplomacy. We gathered 17,289 messages from 12 games (each of which took over a month) involving 84 players, the majority of whom were unique users. This data was collected with a custom-made bot that allowed us to collect messages and annotations. The user pool was created from scratch: we varied participant demographics across gender, age, nationality, and past game experience. Our participants included the former president of the Diplomacy players' association, several top-ranked players in the world, a board game shop owner, and scientists. We create machine learning models to detect lies using linguistic, contextual, and power-dynamic features. Our best model had lie-detection accuracy similar to humans'. (25.4% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2020_acl_diplomacy.pdf

Adaptive Heads-up Displays for Simultaneous Interpretation
-------------------------
Chen Zhao, Chenyan Xiong, Xin Qian, and Jordan Boyd-Graber. Complex Factoid Question Answering with a Free-Text Knowledge Graph. ACM International Conference on World Wide Web, 2020. (19.2% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2020_www_delft.pdf

Wenyan Li, Alvin Grissom II, and Jordan Boyd-Graber. An Attentive Recurrent Model for Incremental Prediction of Sentence-final Verbs. Findings of EMNLP, 2020.
http://umiacs.umd.edu/~jbg/docs/2020_findings_verbs.pdf

Denis Peskov, Joe Barrow, Pedro Rodriguez, Graham Neubig, and Jordan Boyd-Graber. Mitigating Noisy Inputs for Question Answering. Conference of the International Speech Communication Association, 2019.
http://umiacs.umd.edu/~jbg/docs/2019_interspeech_asr

Craig Stewart, Nikolai Vogler, Junjie Hu, Jordan Boyd-Graber, and Graham Neubig. Automatic Estimation of Simultaneous Interpreter Performance. Association for Computational Linguistics, 2018. (24% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2018_acl_interpeval.pdf

Adobe
-------------------------
Michelle Yuan, Patrick Xia, Chandler May, Benjamin Van Durme, and Jordan Boyd-Graber. Adapting Coreference Resolution Models through Active Learning. Association for Computational Linguistics, 2022. (21% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2022_acl_alcoref.pdf

Fenfei Guo, Chen Zhang, Zhirui Zhang, Qixin He, Kejun Zhang, Jun Xie, and Jordan Boyd-Graber. Automatic Song Translation for Tonal Languages. Findings of the Association for Computational Linguistics, 2022. (31% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2022_acl_ast.pdf

BETTER
-------------------------
Yoshinari Fujinuma, Jordan Boyd-Graber, and Katharina Kann. How Does Multilingual Pretraining Affect Cross-Lingual Transferability? Association for Computational Linguistics, 2022. (21% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2022_acl_multilingbert.pdf

Wanrong He, Andrew Mao, and Jordan Boyd-Graber. Cheater's Bowl: Human vs. Computer Search Strategies for Open-Domain QA. Findings of Empirical Methods in Natural Language Processing, 2022.
Accessible Abstract: When the Covid pandemic hit, trivia games moved online. With them came cheating: people tried to quickly Google answers. This is bad for sportsmanship, but a good source of training data for teaching computers how to find answers. We built an interface to harvest this training data from trivia players, fed their queries into retrieval-based QA systems, and showed that these queries were better than the automatically generated queries used by the current state of the art.
http://umiacs.umd.edu/~jbg/docs/2022_emnlp_cheaters.pdf

Mozhi Zhang, Yoshinari Fujinuma, Michael J. Paul, and Jordan Boyd-Graber. Why Overfitting Isn't Always Bad: Retrofitting Cross-Lingual Word Embeddings to Dictionaries. Association for Computational Linguistics, 2020.
Accessible Abstract: Computers need to represent words in a computer-readable way. This work shows that, after doing fancy machine learning, slightly moving the representations of words in different languages closer to a small list of translations (like those from a dictionary) works better on downstream tasks (e.g., guessing the grammatical category of a word) but hurts when asking the algorithm for translations of unseen words. (17.6% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2020_acl_refine.pdf

Michelle Yuan, Hsuan-Tien Lin, and Jordan Boyd-Graber. Cold-start Active Learning through Self-Supervised Language Modeling. Empirical Methods in Natural Language Processing, 2020.
Accessible Abstract: Labeling data is a fundamental bottleneck in machine learning, especially for NLP, due to annotation cost and time. For medical text, obtaining labeled data is challenging because of privacy issues or a shortage of expertise. Thus, active learning can be employed to recognize the most relevant examples and then query labels from an oracle. However, developing a strategy for selecting examples to label is non-trivial. Active learning is difficult to use in a cold start: all examples confuse the model because it has not trained on enough data. Fortunately, modern NLP provides an additional source of information: pre-trained language models. In our paper, we propose an active learning strategy called ALPS that finds sentences that perplex the language model. We evaluate our approach on sentence classification datasets spanning different domains. Results show that ALPS is an efficient active learning strategy that is competitive with state-of-the-art approaches. (25% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2020_emnlp_alps.pdf
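
For readers curious how "sentences that perplex the language model" can be operationalized, here is a minimal sketch. It scores each unlabeled sentence by its masked-LM loss under an off-the-shelf BERT and labels the most surprising sentences first; the model choice, the 15% masking rate, and the greedy ranking are illustrative assumptions (the actual ALPS method clusters surprisal embeddings rather than ranking raw losses).

```python
# Minimal cold-start selection sketch in the spirit of ALPS (simplified).
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def surprisal(sentence: str) -> float:
    """Average masked-LM loss of a sentence: higher = more surprising."""
    enc = tokenizer(sentence, return_tensors="pt", truncation=True)
    labels = enc["input_ids"].clone()
    mask = torch.rand(labels.shape) < 0.15   # mask 15%, as in BERT pretraining
    mask[0, 1] = True                        # ensure at least one scored position
    labels[~mask] = -100                     # only score the masked positions
    inputs = enc["input_ids"].masked_fill(mask, tokenizer.mask_token_id)
    with torch.no_grad():
        out = model(input_ids=inputs,
                    attention_mask=enc["attention_mask"],
                    labels=labels)
    return out.loss.item()

pool = ["The patient presented with acute dyspnea.",
        "I like dogs.",
        "Mitochondrial haplogroups confound the association study."]
# Ask an annotator to label the sentences the model finds most perplexing.
print(sorted(pool, key=surprisal, reverse=True))
```
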
Michelle Yuan, Mozhi Zhang, Benjamin Van Durme, Leah Findlater, and Jordan Boyd-Graber. Interactive Refinement of Cross-Lingual Word Embeddings. Empirical Methods in Natural Language Processing, 2020.
Accessible Abstract: Language technologies sometimes need to be quickly deployed in low-resource languages. For example, in the 2010 Haiti earthquake, researchers used machine learning models to analyze social media and text messages to gain situational awareness. We introduce CLIME, an interactive system that can help in these scenarios: users see which task-related words the system thinks are similar and correct the model by pushing similar words together and dissimilar words apart. (25% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2020_emnlp_clime.pdf

CAREER
-------------------------
Pedro Rodriguez, Shi Feng, Mohit Iyyer, He He, and Jordan Boyd-Graber. Quizbowl: The Case for Incremental Question Answering. ArXiv, Preprint.
https://arxiv.org/abs/1904.04792

Shi Feng and Jordan Boyd-Graber. Learning to Explain Selectively: A Case Study on Question Answering. Empirical Methods in Natural Language Processing, 2022.
Accessible Abstract: Many AI methods are a black box: input goes in, predictions come out. While there are many AI explanation tools that you can add to these predictions, how do you know if they are any good? Our hypothesis is that if you put a human in front of an AI that's trying to answer questions, you can measure how good the underlying explanations are by how much the human's score goes up. This 2022 EMNLP publication not only measures which combinations of explanations are most effective for an individual; it uses bandit exploration to quickly figure out which set of explanations best helps a specific user.
http://umiacs.umd.edu/~jbg/docs/2022_emnlp_augment.pdf

HyoJung Han, Marine Carpuat, and Jordan Boyd-Graber. SimQA: Detecting Simultaneous MT Errors through Word-by-Word Question Answering. Empirical Methods in Natural Language Processing, 2022.
Accessible Abstract: Simultaneous interpretation (where a translation happens word by word before the source sentence is finished) is difficult to evaluate. We created a new evaluation framework based on the following scenario: imagine that you're thrown into a trivia game show where you don't know the language. Specifically, it's a game format where you can interrupt the question, revealed word by word, as soon as you know the answer. Our hypothesis is that a monolingual player (who doesn't speak the source language) will do better in the game with a better simultaneous translation system. In this 2022 EMNLP publication, we show that this evaluation is not only cheaper (you just need to translate the answer) but can also detect hallucinations and undertranslations better than existing evaluation methods.
http://umiacs.umd.edu/~jbg/docs/2022_emnlp_simqa.pdf

Wanrong He, Andrew Mao, and Jordan Boyd-Graber. Cheater's Bowl: Human vs. Computer Search Strategies for Open-Domain QA. Findings of Empirical Methods in Natural Language Processing, 2022.
Accessible Abstract: When the Covid pandemic hit, trivia games moved online. With them came cheating: people tried to quickly Google answers. This is bad for sportsmanship, but a good source of training data for teaching computers how to find answers. We built an interface to harvest this training data from trivia players, fed their queries into retrieval-based QA systems, and showed that these queries were better than the automatically generated queries used by the current state of the art.
http://umiacs.umd.edu/~jbg/docs/2022_emnlp_cheaters.pdf

Chenglei Si, Chen Zhao, Sewon Min, and Jordan Boyd-Graber. Re-Examining Calibration: The Case of Question Answering. Findings of Empirical Methods in Natural Language Processing, 2022.
Accessible Abstract: Calibration is an important problem in question answering: if a search engine or virtual assistant doesn't know the answer to a question, it should probably abstain from showing an answer (to save embarrassment, as when Google said a horse had six legs). This EMNLP Findings paper shows that existing metrics for testing QA calibration push calibrated confidence toward the average confidence. We propose an alternative method both for evaluating calibration and for generating better calibration by looking at how models change as they learn.
http://umiacs.umd.edu/~jbg/docs/2022_emnlp_calibration.pdf
Pedro Rodriguez, Joe Barrow, Alexander Hoyle, John P. Lalor, Robin Jia, and Jordan Boyd-Graber. Evaluation Examples Are Not Equally Informative: How Should That Change NLP Leaderboards? Association for Computational Linguistics, 2021.
Accessible Abstract: When can we call an AI "intelligent"? Just like with humans, a common approach is to ask it a bunch of questions. The questions posed to modern machine learning methods are collected in leaderboards to monitor progress, but beyond ranking approaches, this does not help us understand our problems or our systems very well. This paper introduces probabilistic models inspired by psychometric approaches called item response theory models (think year-end standardized tests) to better understand how computers can answer questions and whether we are asking the right questions. This allows researchers to better compare what kinds of questions systems can answer, better compare human and machine ability, and discover problematic questions (e.g., questions that have incorrect answer keys, are vague, or "trick" those trying to answer them). (21% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2021_acl_leaderboard.pdf
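
For readers unfamiliar with item response theory, a minimal sketch of the standard two-parameter logistic (2PL) model may help; this is the textbook formulation the paper's models draw on, not the paper's exact implementation.

```python
# Toy 2PL item response theory: each subject has a skill theta, each
# question a difficulty b and a discriminability a (standard 2PL names,
# assumed here for illustration).
import math

def p_correct(theta: float, a: float, b: float) -> float:
    """Chance a subject of skill theta answers this item correctly."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# A discriminating question (a=2.0) separates strong from weak subjects;
# a low-discrimination one (a=0.2) barely does, so it tells a leaderboard
# little about which system is actually better.
for theta in (-1.0, 0.0, 1.0):
    print(theta,
          round(p_correct(theta, a=2.0, b=0.0), 2),
          round(p_correct(theta, a=0.2, b=0.0), 2))
```
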
Chen Zhao, Chenyan Xiong, Jordan Boyd-Graber, and Hal Daume III. Distantly-Supervised Dense Retrieval Enables Open-Domain Question Answering without Evidence Annotation. Empirical Methods in Natural Language Processing, 2021.
Accessible Abstract: Answering questions sometimes requires tying multiple pieces of information together. Previous datasets have required annotators to explicitly build these reasoning chains (e.g., to answer "where do I know the cop from Die Hard from", you need to figure out that the actor's name is "Reginald VelJohnson" and then find out that he's best known as the dad on Family Matters). By exploring search queries that get to the right answer, we're able to answer these questions without expensive annotation. (26% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2021_emnlp_weak_dpr.pdf

Pedro Rodriguez and Jordan Boyd-Graber. Evaluation Paradigms in Question Answering. Empirical Methods in Natural Language Processing, 2021.
Accessible Abstract: Why do we answer questions? Sometimes it's to provide information, which has been the interpretation of the computer science community. But sometimes it's to probe or test intelligence. This paper argues that we should think more about that application of question answering and its connection to the foundations of artificial intelligence: the Turing Test. In addition to the long-standing Cranfield paradigm popularized by information retrieval, we propose an alternative "Manchester paradigm" closer to the Turing Test, trivia games, and education. (18% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2021_emnlp_paradigms.pdf

Maharshi Gor, Kellie Webster, and Jordan Boyd-Graber. Toward Deconfounding the Influence of Subject's Demographic Characteristics in Question Answering. Empirical Methods in Natural Language Processing, 2021.
Accessible Abstract: The data used to train computer question answering systems have three times as many men as women. This paper examines whether this is a problem for question answering accuracy. After a thorough investigation, we do not find evidence of serious accuracy discrepancies across demographic groups. However, an absence of evidence is not evidence of absence, and we would argue that we need more diverse datasets to better represent the world's population. (26% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2021_emnlp_qa_fairness.pdf

Chenglei Si, Chen Zhao, and Jordan Boyd-Graber. What's in a Name? Answer Equivalence For Open-Domain Question Answering. Empirical Methods in Natural Language Processing, 2021.
Accessible Abstract: Is Tim Cook the same person as Timothy Donald Cook? You might think so, but the way we train computers to answer questions would say they aren't. We show that keeping track of multiple names (and it's really simple) can create better question answering systems. Simply by adding alternate answers mined from knowledge bases, we can improve accuracy by 1-2 points on major QA datasets. (18% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2021_emnlp_answer_equiv.pdf
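
A minimal sketch of the answer-equivalence idea described above: give credit when a prediction matches any known alias of the gold answer, not just the canonical string. The tiny alias table here is an assumed stand-in for the knowledge-base mining in the paper.

```python
# Exact match against an alias set instead of a single gold string.
def normalize(s: str) -> str:
    return " ".join(s.lower().split())

# Stand-in for aliases mined from a knowledge base.
gold_aliases = {
    "tim cook": {"tim cook", "timothy cook", "timothy donald cook"},
}

def exact_match(prediction: str, gold: str) -> bool:
    aliases = gold_aliases.get(normalize(gold), {normalize(gold)})
    return normalize(prediction) in aliases

assert exact_match("Timothy Donald Cook", "Tim Cook")  # now counted correct
assert not exact_match("Steve Jobs", "Tim Cook")
```
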
Alexander Hoyle, Pranav Goel, Denis Peskov, Andrew Hian-Cheong, Jordan Boyd-Graber, and Philip Resnik. Is Automated Topic Model Evaluation Broken?: The Incoherence of Coherence. Neural Information Processing Systems, 2021.
Accessible Abstract: Topic models help historians, journalists, and analysts make sense of large text collections. But how do you know if you have a good one? The field has settled on using "automatic coherence", but this paper argues that maybe that isn't the right choice if you want to actually make real users happy. This paper builds on our 2009 paper showing that perplexity was not a good evaluation of interpretability for topic models; while the field adopted automatic topic coherence as a result of that 2009 paper, this paper argues that automatic topic coherence is not a good metric for neural topic models (even though it worked for probabilistic topic models). (26% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2021_neurips_incoherence.pdf

Julian Martin Eisenschlos, Bhuwan Dhingra, Jannis Bulian, Benjamin Börschinger, and Jordan Boyd-Graber. Fool Me Twice: Entailment from Wikipedia Gamification. North American Association for Computational Linguistics, 2021.
Accessible Abstract: Democracy and the free press depend on being able to recognize when facts online are true or not. For machine learning to help with this critical problem, it needs good data identifying which statements are backed up by trusted sources and which are not. This research creates a game people can play online to craft difficult claims that can train computers to spot disinformation online. (28% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2021_naacl_fm2.pdf

Chen Zhao, Chenyan Xiong, Hal Daume III, and Jordan Boyd-Graber. Multi-Step Reasoning Over Unstructured Text with Beam Dense Retrieval. North American Association for Computational Linguistics, 2021.
Accessible Abstract: For computers to answer complicated questions online, they often need to put together multiple pieces of information (Ronald Reagan was both governor of California and an actor in Bedtime for Bonzo). However, existing approaches use the links in Wikipedia to combine these clues. This research helps computers find connected information without using these explicit links. (23% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2021_naacl_multi_ance.pdf

Chen Zhao, Chenyan Xiong, Xin Qian, and Jordan Boyd-Graber. Complex Factoid Question Answering with a Free-Text Knowledge Graph. ACM International Conference on World Wide Web, 2020. (19.2% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2020_www_delft.pdf

Jordan Boyd-Graber and Benjamin Börschinger. What Question Answering can Learn from Trivia Nerds. Association for Computational Linguistics, 2020.
Accessible Abstract: This paper reflects on the similarities between trivia competitions and computer question answering research. Modern machine learning requires large, quality datasets. The central thesis of this article is that the same things that make trivia tournaments good (they're fun, fair, and consistently crown the best trivia players) can also improve question answering datasets. Concretely, we argue that question answering datasets should clearly specify what answers are requested, have systematic policies to deal with natural ambiguity and variation, have authors look at the data (and help others do the same), make sure questions separate the best from the rest, and ensure people can have fun. We draw on the authors' experience in the trivia community (including embarrassing episodes on Jeopardy!) to illustrate our arguments. (25.4% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2020_acl_trivia.pdf

Tianze Shi, Chen Zhao, Jordan Boyd-Graber, Hal Daume III, and Lillian Lee. On the Potential of Lexico-logical Alignments for Semantic Parsing to SQL Queries. Findings of EMNLP, 2020.
http://umiacs.umd.edu/~jbg/docs/2020_findings_qalign.pdf

Thomas Diggelmann, Jordan Boyd-Graber, Jannis Bulian, Massimiliano Ciaramita, and Markus Leippold. CLIMATE-FEVER: A Dataset for Verification of Real-World Climate Claims. NIPS Workshop on Tackling Climate Change with Machine Learning, 2020.
https://research.google/pubs/pub50541/

Eric Wallace, Shi Feng, and Jordan Boyd-Graber. Misleading Failures of Partial-input Baselines. Association for Computational Linguistics, 2019. (18% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2019_acl_flipside.pdf

Ahmed Elgohary Ghoneim, Denis Peskov, and Jordan Boyd-Graber. Can You Unpack That? Learning to Rewrite Questions-in-Context. Empirical Methods in Natural Language Processing, 2019.
http://umiacs.umd.edu/~jbg/docs/2019_emnlp_sequentialqa.pdf

Shi Feng and Jordan Boyd-Graber. What AI can do for me: Evaluating Machine Learning Interpretations in Cooperative Play. Intelligent User Interfaces, 2019. (25% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2019_iui_augment.pdf

Eric Wallace, Pedro Rodriguez, Shi Feng, Ikuya Yamada, and Jordan Boyd-Graber. Trick Me If You Can: Human-in-the-loop Generation of Adversarial Question Answering Examples. Transactions of the Association for Computational Linguistics, 2019.
http://umiacs.umd.edu/~jbg/docs/2019_tacl_trick.pdf

Eric Wallace and Jordan Boyd-Graber. Trick Me If You Can: Adversarial Writing of Trivia Challenge Questions. ACL Student Research Workshop, 2018.
http://aclweb.org/anthology/P18-3018

Shi Feng, Eric Wallace, and Jordan Boyd-Graber. Interpreting Neural Networks with Nearest Neighbors. EMNLP Workshop on BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, 2018.
http://aclweb.org/anthology/W18-5416

Ahmed Elgohary Ghoneim, Chen Zhao, and Jordan Boyd-Graber. Dataset and Baselines for Sequential Open-Domain Question Answering. Empirical Methods in Natural Language Processing, 2018. (23% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2018_emnlp_linked.pdf

Shi Feng, Eric Wallace, Alvin Grissom II, Pedro Rodriguez, Mohit Iyyer, and Jordan Boyd-Graber. Pathologies of Neural Models Make Interpretation Difficult. Empirical Methods in Natural Language Processing, 2018. (26% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2018_emnlp_rs.pdf

Mohit Iyyer, Varun Manjunatha, Jordan Boyd-Graber, and Larry Davis. Learning to Color from Language. North American Association for Computational Linguistics, 2018. (29% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2018_naacl_colorization.pdf

Jordan Boyd-Graber, Shi Feng, and Pedro Rodriguez. Human-Computer Question Answering: The Case for Quizbowl. The NIPS '17 Competition: Building Intelligent Systems, 2018.
http://umiacs.umd.edu/~jbg/docs/2018_nips_qbcomp.pdf

Closing the Loop
-------------------------
Pedro Rodriguez, Shi Feng, Mohit Iyyer, He He, and Jordan Boyd-Graber. Quizbowl: The Case for Incremental Question Answering. ArXiv, Preprint.
https://arxiv.org/abs/1904.04792

Alison Smith, Jordan Boyd-Graber, Ron Fan, Melissa Birchfield, Tongshuang Wu, Dan Weld, and Leah Findlater. No Explainability without Accountability: An Empirical Study of Explanations and Feedback in Interactive ML. Computer-Human Interaction, 2020. (24% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2020_chi_explanation.pdf

Alison Smith, Varun Kumar, Jordan Boyd-Graber, Kevin Seppi, and Leah Findlater. Digging into User Control: Perceptions of Adherence and Instability in Transparent Models. Intelligent User Interfaces, 2020. (23% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2020_iui_control.pdf

Fenfei Guo, Jordan Boyd-Graber, Mohit Iyyer, and Leah Findlater. Which Evaluations Uncover Sense Representations that Actually Make Sense? Language Resources and Evaluation Conference, 2020.
http://umiacs.umd.edu/~jbg/docs/2020_lrec_sense.pdf

Jeffrey Lund, Piper Armstrong, Wilson Fearn, Stephen Cowley, Courtni Byun, Jordan Boyd-Graber, and Kevin Seppi. Automatic and Human Evaluation of Local Topic Quality. Association for Computational Linguistics, 2019. (26% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2019_acl_local.pdf

Varun Kumar, Alison Smith, Leah Findlater, Kevin Seppi, and Jordan Boyd-Graber. Why Didn't You Listen to Me? Comparing User Control of Human-in-the-Loop Topic Models. Association for Computational Linguistics, 2019. (18% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2019_acl_control.pdf

Dasha Pruss, Yoshinari Fujinuma, Ashlynn Daughton, Michael Paul, Brad Arnot, Danielle Szafir, and Jordan Boyd-Graber. Zika discourse in the Americas: A multilingual topic analysis of Twitter. PLOS ONE, 2019.
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0216922

Eric Wallace, Pedro Rodriguez, Shi Feng, Ikuya Yamada, and Jordan Boyd-Graber. Trick Me If You Can: Human-in-the-loop Generation of Adversarial Question Answering Examples. Transactions of the Association for Computational Linguistics, 2019.
http://umiacs.umd.edu/~jbg/docs/2019_tacl_trick.pdf
Alison Smith, Varun Kumar, Jordan Boyd-Graber, Kevin Seppi, and Leah Findlater. User-Centered Design and Evaluation of a Human-in-the-Loop Topic Modeling System. Intelligent User Interfaces, 2018. Alison won a best student paper honorable mention (3 out of 300). (23% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2018_iui_itm.pdf

Paul Felt, Eric Ringger, Kevin Seppi, and Jordan Boyd-Graber. Learning from Measurements in Crowdsourcing Models: Inferring Ground Truth from Diverse Annotation Types. International Conference on Computational Linguistics, 2018. (37% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2018_coling_measurements.pdf

Jeff Lund, Connor Cook, Kevin Seppi, and Jordan Boyd-Graber. Tandem Anchoring: A Multiword Anchor Approach for Interactive Topic Modeling. Association for Computational Linguistics, 2017. (22% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2017_acl_multiword_anchors.pdf

Alison Smith, Varun Kumar, Jordan Boyd-Graber, Kevin Seppi, and Leah Findlater. Accounting for Input Uncertainty in Human-in-the-Loop Systems. CHI 2017 Designing for Uncertainty Workshop, 2017.
http://visualization.ischool.uw.edu/hci_uncertainty/papers/Paper11.pdf

You Lu, Jeff Lund, and Jordan Boyd-Graber. Why ADAGRAD Fails for Online Topic Modeling. Empirical Methods in Natural Language Processing, 2017. (18% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2017_emnlp_adagrad_olda.pdf

Tak Yeon Lee, Alison Smith, Kevin Seppi, Niklas Elmqvist, Jordan Boyd-Graber, and Leah Findlater. The Human Touch: How Non-expert Users Perceive, Interpret, and Fix Topic Models. International Journal of Human-Computer Studies, 2017.
http://umiacs.umd.edu/~jbg/docs/2017_ijhcs_human_touch.pdf

Jordan Boyd-Graber. Humans and Computers Working Together to Measure Machine Learning Interpretability. The Bridge, 2017.

Alison Smith, Tak Yeon Lee, Forough Poursabzi-Sangdeh, Jordan Boyd-Graber, Kevin Seppi, Niklas Elmqvist, and Leah Findlater. Evaluating Visual Representations for Topic Understanding and Their Effects on Manually Generated Labels. Transactions of the Association for Computational Linguistics, 2017.
http://umiacs.umd.edu/~jbg/docs/2017_tacl_eval_tm_viz.pdf

Forough Poursabzi-Sangdeh, Jordan Boyd-Graber, Leah Findlater, and Kevin Seppi. ALTO: Active Learning with Topic Overviews for Speeding Label Induction and Document Labeling. Association for Computational Linguistics, 2016. (28% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2016_acl_doclabel.pdf

Alison Smith, Tak Yeon Lee, Forough Poursabzi-Sangdeh, Jordan Boyd-Graber, Kevin Seppi, Niklas Elmqvist, and Leah Findlater. Human-Centered and Interactive: Expanding the Impact of Topic Models. CHI Human Centred Machine Learning Workshop, 2016.

Md Arafat Sultan, Jordan Boyd-Graber, and Tamara Sumner. Bayesian Supervised Domain Adaptation for Short Text Similarity. North American Association for Computational Linguistics, 2016. (24% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2016_naacl_sts.pdf

Viet-An Nguyen, Jordan Boyd-Graber, Philip Resnik, and Kristina Miler. Tea Party in the House: A Hierarchical Ideal Point Topic Model and Its Application to Republican Legislators in the 112th Congress. Association for Computational Linguistics, 2015.
Accessible Abstract: In the mid-2010s, the Republican party in the United States diverged: mainstream conservatives split from the so-called "tea party" caucus. However, the primary statistical tool for analyzing political factions in legislative bodies (the ideal point model) fails to account for these changes. This is because the schism is not fully reflected in voting patterns but rather in how politicians present themselves: thus we need to extend these models to capture not just how politicians vote but also how they frame particular issues. This paper proposes a new model to capture framing differences within a voting bloc to start explaining the new subcoalitions of the Republican caucus. (25% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2015_acl_teaparty.pdf

Paul Felt, Eric Ringger, Jordan Boyd-Graber, and Kevin Seppi. Making the Most of Crowdsourced Document Annotations: Confused Supervised LDA. Conference on Computational Natural Language Learning, 2015. This paper received the best paper award at CoNLL. (30% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2015_conll_cslda.pdf

Yi Yang, Doug Downey, and Jordan Boyd-Graber. Efficient Methods for Incorporating Knowledge into Topic Models. Empirical Methods in Natural Language Processing, 2015. (24% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2015_emnlp_fast_priors.pdf

Stephen H. Bach, Bert Huang, Jordan Boyd-Graber, and Lise Getoor. Paired-Dual Learning for Fast Training of Latent Variable Hinge-Loss MRFs. International Conference on Machine Learning, 2015. (20% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2015_icml_paired_dual.pdf

Thang Nguyen, Jordan Boyd-Graber, Jeff Lund, Kevin Seppi, and Eric Ringger. Is your anchor going up or down? Fast and accurate supervised topic models. North American Association for Computational Linguistics, 2015. (26% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2015_naacl_supervised_anchor.pdf

Alison Smith, Jason Chuang, Yuening Hu, Jordan Boyd-Graber, and Leah Findlater. Concurrent Visualization of Relationships between Words and Topics in Topic Models. ACL Workshop on Interactive Language Learning, Visualization, and Interfaces, 2014.

Yuening Hu, Jordan Boyd-Graber, Brianna Satinoff, and Alison Smith. Interactive Topic Modeling. Machine Learning, 2014.
http://umiacs.umd.edu/~jbg/docs/2014_mlj_itm.pdf

Yuening Hu, Jordan Boyd-Graber, and Brianna Satinoff. Interactive Topic Modeling. Association for Computational Linguistics, 2011. (25% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/itm.pdf

Cross-Language Bayesian Models for Web-Scale Text Analysis
-------------------------
Thang Nguyen, Yuening Hu, and Jordan Boyd-Graber. Anchors Regularized: Adding Robustness and Extensibility to Scalable Topic-Modeling Algorithms. Association for Computational Linguistics, 2014. (26% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2014_acl_anchor_reg.pdf

Mohit Iyyer, Peter Enns, Jordan Boyd-Graber, and Philip Resnik. Political Ideology Detection Using Recursive Neural Networks. Association for Computational Linguistics, 2014. (26% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2014_acl_rnn_ideology.pdf

Yuening Hu, Jordan Boyd-Graber, Brianna Satinoff, and Alison Smith. Interactive Topic Modeling. Machine Learning, 2014.
http://umiacs.umd.edu/~jbg/docs/2014_mlj_itm.pdf

Viet-An Nguyen, Jordan Boyd-Graber, Philip Resnik, Deborah Cai, Jennifer Midberry, and Yuanxin Wang. Modeling Topic Control to Detect Influence in Conversations using Nonparametric Topic Models. Machine Learning, 2014.
http://umiacs.umd.edu/~jbg/docs/2014_mlj_influencer.pdf

Ke Zhai, Jordan Boyd-Graber, and Shay B. Cohen. Hybrid Online Inference with Adaptor Grammars. NIPS Workshop on Advances in Variational Inference, 2014.
Ke Zhai, Jordan Boyd-Graber, and Shay B. Cohen. Online Adaptor Grammars with Hybrid Inference. Transactions of the Association for Computational Linguistics, 2014.
http://umiacs.umd.edu/~jbg/docs/2014_tacl_ag_vb_online.pdf

Jordan Boyd-Graber, Kimberly Glasgow, and Jackie Sauter Zajac. Spoiler Alert: Machine Learning Approaches to Detect Social Media Posts with Revelatory Information. ASIST 2013: The 76th Annual Meeting of the American Society for Information Science and Technology, 2013.
http://umiacs.umd.edu/~jbg/docs/2013_spoiler.pdf

Ke Zhai and Jordan Boyd-Graber. Online Topic Models with Infinite Vocabulary. International Conference on Machine Learning, 2013. (20% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2013_icml_infvoc.pdf

Viet-An Nguyen, Jordan Boyd-Graber, and Stephen Altschul. Dirichlet Mixtures, the Dirichlet Process, and the Structure of Protein Space. Journal of Computational Biology, 2013.
http://umiacs.umd.edu/~jbg/docs/2013_dp_protein.pdf

Yuening Hu, Jordan Boyd-Graber, Hal Daume III, and Z. Irene Ying. Binary to Bushy: Bayesian Hierarchical Clustering with the Beta Coalescent. Neural Information Processing Systems, 2013. (25% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2013_coalescent.pdf

Viet-An Nguyen, Jordan Boyd-Graber, and Philip Resnik. Lexical and Hierarchical Topic Regression. Neural Information Processing Systems, 2013. (25% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2013_shlda.pdf

Viet-An Nguyen, Yuening Hu, Jordan Boyd-Graber, and Philip Resnik. Argviz: Interactive Visualization of Topic Dynamics in Multi-party Conversations. North American Association for Computational Linguistics, 2013. (50% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2013_argviz.pdf

Naho Orita, Rebecca McKeown, Naomi H. Feldman, Jeffrey Lidz, and Jordan Boyd-Graber. Discovering Pronoun Categories using Discourse Information. Proceedings of the Cognitive Science Society, 2013.
http://umiacs.umd.edu/~jbg/docs/2013_cogsci_pronoun.pdf

Ke Zhai, Jordan Boyd-Graber, Nima Asadi, and Mohamad (Jude) Alkhouja. Mr. LDA: A Flexible Large Scale Topic Modeling Package using Variational Inference in MapReduce. ACM International Conference on World Wide Web, 2012. (12% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2012_www_mrlda.pdf

Yuening Hu and Jordan Boyd-Graber. Efficient Tree-Based Topic Modeling. Association for Computational Linguistics, 2012. (21% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/acl_2012_fttm.pdf

Vladimir Eidelman, Jordan Boyd-Graber, and Philip Resnik. Topic Models for Dynamic Translation Model Adaptation. Association for Computational Linguistics, 2012. For a more thorough evaluation and an exploration of more advanced topic models for machine translation, see: Yuening Hu, Ke Zhai, Vlad Eidelman, and Jordan Boyd-Graber. Polylingual Tree-Based Topic Models for Translation Domain Adaptation. Association for Computational Linguistics, 2014. (21% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/acl_2012_tm_for_mt.pdf

Viet-An Nguyen, Jordan Boyd-Graber, and Philip Resnik. SITS: A Hierarchical Nonparametric Model using Speaker Identity for Topic Segmentation in Multiparty Conversations. Association for Computational Linguistics, 2012. (19% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/acl_2012_sits.pdf

Yuening Hu, Ke Zhai, Sinead Williamson, and Jordan Boyd-Graber. Modeling Images using Transformed Indian Buffet Processes. International Conference on Machine Learning, 2012. (27% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/mtibp_icml_2012.pdf

DADC
-------------------------
HyoJung Han, Marine Carpuat, and Jordan Boyd-Graber. Automatic Explicitation to Bridge the Background Knowledge Gap in Translation and its Evaluation with Multilingual QA. Empirical Methods in Natural Language Processing, 2023.
Accessible Abstract: Sometimes when you are translating from one language to another, a literal translation is not enough; to actually understand what is being said, you need additional context. Professional translators know this, and the process they use to capture cultural differences between source and target audiences is called "explicitation". We introduce techniques for automatically generating explicitations, motivated by WikiExpl (a dataset collected from Wikipedia and annotated by human translators), and evaluate them. (23% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2023_emnlp_explicitation.pdf

Sander V Schulhoff, Jeremy Pinto, Anaum Khan, Louis-François Bouchard, Chenglei Si, Jordan Lee Boyd-Graber, Svetlina Anati, Valen Tagliabue, Anson Liu Kost, and Christopher R Carnahan. Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs Through a Global Prompt Hacking Competition. Empirical Methods in Natural Language Processing, 2023. This paper was selected as the Best Theme Paper at EMNLP 2023 (1 of 4909).
Accessible Abstract: As more AI services online are provided by prompted language models, we need to be aware of the models' weaknesses and the exploits against them. We present the HackAPrompt competition to help elicit a broad array of exploits that get around large language models. (23% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2023_emnlp_hackaprompt.pdf

Chenglei Si, Weijia Shi, Chen Zhao, Luke Zettlemoyer, and Jordan Lee Boyd-Graber. Getting MoRE out of Mixture of Language Model Reasoning Experts. Findings of Empirical Methods in Natural Language Processing, 2023.
Accessible Abstract: There are many kinds of questions a computer might be asked: a general knowledge question, a common sense question, or a math question. Each of these types of questions is best answered by a particular kind of expert. This paper investigates whether we can automatically detect what kind of expert is best suited to answer a question and route the question to the correct expert. (45% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2023_findings_more.pdf
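
A toy sketch of the routing idea described in the MoRE abstract. The keyword-based router and the one-line expert stubs are illustrative stand-ins; the paper selects among real specialized reasoning experts rather than hand-written rules.

```python
# Guess what kind of expert a question needs and send it there (toy version).
def math_expert(q: str) -> str: return "42"          # placeholder solver
def factoid_expert(q: str) -> str: return "Paris"    # placeholder retriever
def commonsense_expert(q: str) -> str: return "yes"  # placeholder reasoner

def route(question: str) -> str:
    q = question.lower()
    if any(tok in q for tok in ("sum", "how many", "percent")):
        return math_expert(question)
    if q.startswith(("is ", "can ", "would ")):
        return commonsense_expert(question)
    return factoid_expert(question)

print(route("How many legs does a horse have?"))   # routed to the math expert
print(route("What is the capital of France?"))     # routed to the factoid expert
```
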
Chenglei Si, Zhe Gan, Zhengyuan Yang, Shuohang Wang, Jianfeng Wang, Jordan Boyd-Graber, and Lijuan Wang. Prompting GPT-3 To Be Reliable. International Conference on Learning Representations, 2023.
http://umiacs.umd.edu/~jbg/docs/2023_iclr_reliable.pdf

Yoo Yeon Sung, Naeemul Hassan, and Jordan Boyd-Graber. Not all Fake News is Written: A Dataset and Analysis of Misleading Video Headlines. Empirical Methods in Natural Language Processing, 2023.
Accessible Abstract: Misinformation online is not all text-based. More information is being consumed in video form, and both social media companies and external monitors need to know when misleading videos are being shared online. We create a new dataset of misleading videos and describe what makes the problem so challenging. (23% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2023_emnlp_videoheadline.pdf

FACT
-------------------------
Chenglei Si, Navita Goyal, Tongshuang Wu, Chen Zhao, Shi Feng, Hal Daume III, and Jordan Lee Boyd-Graber. Large Language Models Help Humans Verify Truthfulness---Except When They Are Convincingly Wrong. North American Association for Computational Linguistics, 2024.

LORELEI
-------------------------
Daniel Peterson, Jordan Boyd-Graber, Martha Palmer, and Daisuke Kawahara. Leveraging VerbNet to build Corpus-Specific Verb Clusters. Proceedings of the Fifth Joint Conference on Lexical and Computational Semantics, 2016.
https://aclanthology.org/S16-2012/

Mozhi Zhang, Yoshinari Fujinuma, and Jordan Boyd-Graber. Exploiting Cross-Lingual Subword Similarities in Low-Resource Document Classification. Association for the Advancement of Artificial Intelligence, 2020. (20.6% Acceptance Rate)
https://arxiv.org/abs/1812.09617

Yoshinari Fujinuma, Michael Paul, and Jordan Boyd-Graber. A Resource-Free Evaluation Metric for Cross-Lingual Word Embeddings Based on Graph Modularity. Association for Computational Linguistics, 2019. (26% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2019_acl_modularity.pdf

Mozhi Zhang, Keyulu Xu, Ken-ichi Kawarabayashi, Stefanie Jegelka, and Jordan Boyd-Graber. Are Girls Neko or Shōjo? Cross-Lingual Alignment of Non-Isomorphic Embeddings with Iterative Normalization. Association for Computational Linguistics, 2019. (18.3% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2019_acl_clwe.pdf

Weiwei Yang, Jordan Boyd-Graber, and Philip Resnik. A Multilingual Topic Model for Learning Weighted Topic Links Across Incomparable Corpora. Empirical Methods in Natural Language Processing, 2019.
http://umiacs.umd.edu/~jbg/docs/2019_emnlp_mtm.pdf

Shi Feng and Jordan Boyd-Graber. What AI can do for me: Evaluating Machine Learning Interpretations in Cooperative Play. Intelligent User Interfaces, 2019. (25% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2019_iui_augment.pdf

Shi Feng, Eric Wallace, and Jordan Boyd-Graber. Interpreting Neural Networks with Nearest Neighbors. EMNLP Workshop on BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, 2018.
http://aclweb.org/anthology/W18-5416

Shi Feng, Eric Wallace, Alvin Grissom II, Pedro Rodriguez, Mohit Iyyer, and Jordan Boyd-Graber. Pathologies of Neural Models Make Interpretation Difficult. Empirical Methods in Natural Language Processing, 2018. (26% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2018_emnlp_rs.pdf

Michelle Yuan, Benjamin Van Durme, and Jordan Boyd-Graber. Multilingual Anchoring: Interactive Topic Modeling and Alignment Across Languages. Neural Information Processing Systems, 2018. (21% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2018_neurips_mtanchor.pdf

Shudong Hao, Michael J. Paul, and Jordan Boyd-Graber. Lessons from the Bible on Modern Topics: Multilingual Topic Model Evaluation on Low-Resource Languages. North American Association for Computational Linguistics, 2018. (35% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2018_naacl_mltm_eval.pdf

Weiwei Yang, Jordan Boyd-Graber, and Philip Resnik. Adapting Topic Models using Lexical Associations with Tree Priors. Empirical Methods in Natural Language Processing, 2017. (18% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2017_emnlp_tree_prior.pdf

Weiwei Yang, Jordan Boyd-Graber, and Philip Resnik. A Discriminative Topic Model using Document Network Structure. Association for Computational Linguistics, 2016. (28% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2016_acl_docblock.pdf

Weiwei Yang, Jordan Boyd-Graber, and Philip Resnik. Birds of a Feather in the Same Nest: A Discriminative Topic Model using Block-based Priors. Mid-Atlantic Student Colloquium on Speech, Language, and Learning, 2016.
Weiwei Yang, Jordan Boyd-Graber, and Philip Resnik. Birds of a Feather Linked Together: A Discriminative Topic Model using Link-based Priors. Empirical Methods in Natural Language Processing, 2015. (28% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2015_emnlp_hinge_link.pdf

NIST
-------------------------
Ishani Mondal, Shwetha S, Anandhavelu Natarajan, Aparna Garimella, Sambaran Bandyopadhyay, and Jordan Lee Boyd-Graber. Presentations by the People, for the People: Harnessing LLMs for Generating Persona-Aware Slides from Documents. European Association for Computational Linguistics, 2024. (21% Acceptance Rate)

Zongxia Li, Andrew Mao, Daniel Kofi Stephens, Pranav Goel, Emily Walpole, Juan Francisco Fung, Alden Dima, and Jordan Lee Boyd-Graber. TENOR: Topic Enabled Neural Organization and Recommendation: Evaluating Topic Models in Task Based Settings. European Association for Computational Linguistics, 2024. (21% Acceptance Rate)

Rosie
-------------------------
Neha Pundlik Srikanth, Rupak Sarkar, Heran Y. Mane, Elizabeth M. Aparicio, Quynh C. Nguyen, Rachel Rudinger, and Jordan Boyd-Graber. Large Language Models Help Humans Verify Truthfulness---Except When They Are Convincingly Wrong. North American Association for Computational Linguistics, 2024.

Quynh C. Nguyen, Elizabeth M. Aparicio, Michelle Jasczynski, Amara Channell Doig, Xiaohe Yue, Heran Mane, Neha Pundlik Srikanth, Francia Ximena Marin Gutierrez, Nataly Delcid, Xin He, and Jordan Boyd-Graber. Randomized Pilot of Rosie, a Health Education Question-and-Answer Chatbot for New Mothers. JMIR Formative Research, 2024.

Heran Y. Mane, Amara Channell Doig, Francia Ximena Marin Gutierrez, Michelle Jasczynski, Xiaohe Yue, Neha Pundlik Srikanth, Sourabh Mane, Abby Sun, Rachel Ann Moats, Pragat Patel, Xin He, Jordan Lee Boyd-Graber, Elizabeth M. Aparicio, and Quynh C. Nguyen. Practical Guidance for the Development of Rosie, a Health Education Question-and-Answer Chatbot for New Mothers. Journal of Public Health Management and Practice, 2023.
https://journals.lww.com/jphmp/fulltext/2023/09000/practical_guidance_for_the_development_of_rosie,_a.9.aspx

Scaling Insight
-------------------------
Aaron Gerow, Yuening Hu, Jordan Boyd-Graber, David M. Blei, and James A. Evans. Measuring Discursive Influence Across Scholarship. Proceedings of the National Academy of Sciences, 2018.

You Lu, Jeff Lund, and Jordan Boyd-Graber. Why ADAGRAD Fails for Online Topic Modeling. Empirical Methods in Natural Language Processing, 2017. (18% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2017_emnlp_adagrad_olda.pdf

Forough Poursabzi-Sangdeh, Jordan Boyd-Graber, Leah Findlater, and Kevin Seppi. ALTO: Active Learning with Topic Overviews for Speeding Label Induction and Document Labeling. Association for Computational Linguistics, 2016. (28% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2016_acl_doclabel.pdf

Evgeny Klochikhin and Jordan Boyd-Graber. Text Analysis. Big Data and Social Science Research: Theory and Practical Approaches, 2016.

Forough Poursabzi-Sangdeh and Jordan Boyd-Graber. Speeding Document Annotation with Topic Models. NAACL Student Research Workshop, 2015.

Thinking on Your Feet
-------------------------
Mohit Iyyer, Varun Manjunatha, Anupam Guha, Yogarshi Vyas, Jordan Boyd-Graber, Hal Daume III, and Larry Davis. The Amazing Mysteries of the Gutter: Drawing Inferences Between Panels in Comic Book Narratives. Computer Vision and Pattern Recognition, 2017. (30% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2017_cvpr_comics.pdf

Khanh Nguyen, Jordan Boyd-Graber, and Hal Daume III. Reinforcement Learning for Bandit Neural Machine Translation with Simulated Human Feedback. Empirical Methods in Natural Language Processing, 2017. (26% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2017_emnlp_bandit_mt.pdf

Alvin Grissom II, Naho Orita, and Jordan Boyd-Graber. Incremental Prediction of Sentence-final Verbs. Conference on Computational Natural Language Learning, 2016. (20% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2016_conll_verbpred.pdf

He He, Jordan Boyd-Graber, Kevin Kwok, and Hal Daume III. Opponent Modeling in Deep Reinforcement Learning. International Conference on Machine Learning, 2016. (24% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2016_icml_opponent.pdf

Anupam Guha, Mohit Iyyer, and Jordan Boyd-Graber. A Distorted Skull Lies in the Bottom Center: Identifying Paintings from Text Descriptions. NAACL Human-Computer Question Answering Workshop, 2016.
http://umiacs.umd.edu/~jbg/docs/2016_naacl_paintings.pdf

Mohit Iyyer, Anupam Guha, Snigdha Chaturvedi, Jordan Boyd-Graber, and Hal Daume III. Feuding Families and Former Friends: Unsupervised Learning for Dynamic Fictional Relationships. North American Association for Computational Linguistics, 2016. Best paper award (2 out of 1592). (24% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2016_naacl_relationships.pdf

He He, Jordan Boyd-Graber, and Hal Daume III. Interpretese vs. Translationese: The Uniqueness of Human Strategies in Simultaneous Interpretation. North American Association for Computational Linguistics, 2016. (29% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2016_naacl_interpretese.pdf

Mohit Iyyer, Varun Manjunatha, Jordan Boyd-Graber, and Hal Daume III. Deep Unordered Composition Rivals Syntactic Methods for Text Classification. Association for Computational Linguistics, 2015. (25% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2015_acl_dan.pdf

Vlad Niculae, Srijan Kumar, Jordan Boyd-Graber, and Cristian Danescu-Niculescu-Mizil. Linguistic Harbingers of Betrayal: A Case Study on an Online Strategy Game. Association for Computational Linguistics, 2015.
Accessible Abstract: This paper applies natural language processing techniques to understand relationships (and their dissolution) in the game of Diplomacy. This popular board game simulates Europe on the eve of World War I and forces players to work with each other to forge alliances and make plans together. However, the game's setup also encourages players to turn against each other. This paper analyzes whether we can predict these betrayals (we can!) and the linguistic and social phenomena (demands, politeness, and planning) that can predict when a betrayal will happen. (25% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2015_acl_diplomacy.pdf

He He, Alvin Grissom II, Jordan Boyd-Graber, and Hal Daume III. Syntax-based Rewriting for Simultaneous Machine Translation. Empirical Methods in Natural Language Processing, 2015. (24% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2015_emnlp_rewrite.pdf

Jordan Boyd-Graber, Mohit Iyyer, He He, and Hal Daume III. Interactive Incremental Question Answering. Neural Information Processing Systems, 2015. This won the best demonstration award at NIPS 2015.
Anupam Guha, Mohit Iyyer, Danny Bouman, and Jordan Boyd-Graber. Removing the Training Wheels: A Coreference Dataset that Entertains Humans and Challenges Computers. North American Association for Computational Linguistics, 2015. (26% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2015_naacl_qb_coref.pdf

Mohit Iyyer, Jordan Boyd-Graber, Leonardo Claudino, Richard Socher, and Hal Daume III. A Neural Network for Factoid Question Answering over Paragraphs. Empirical Methods in Natural Language Processing, 2014. Erratum: the partial derivatives of "C" and "J" with respect to the parameters should be switched in Equation 7. (26% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2014_emnlp_qb_rnn.pdf

Alvin Grissom II, He He, Jordan Boyd-Graber, John Morgan, and Hal Daume III. Don't Until the Final Verb Wait: Reinforcement Learning for Simultaneous Machine Translation. Empirical Methods in Natural Language Processing, 2014. (30% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/2014_emnlp_simtrans.pdf

Mohit Iyyer, Jordan Boyd-Graber, and Hal Daume III. Generating Sentences from Semantic Vector Space Representations. NIPS Workshop on Learning Semantics, 2014.

Jordan Boyd-Graber, Brianna Satinoff, He He, and Hal Daume III. Besting the Quiz Master: Crowdsourcing Incremental Classification Games. Empirical Methods in Natural Language Processing, 2012. (25% Acceptance Rate)
http://umiacs.umd.edu/~jbg/docs/qb_emnlp_2012.pdf