Ahmed Elgohary ------------------------- Denis Peskov, Benny Cheng, Ahmed Elgohary Ghoneim, Joe Barrow, Cristian Danescu-Niculescu-Mizil, and Jordan Boyd-Graber. It Takes Two to Lie: One to Lie and One to Listen. Association for Computational Linguistics, 2020. Accessible Abstract: Machine learning techniques to detect deception in online communications require training and evaluation data. However, there is a dearth of data either because of uncertain gold labels or privacy concerns; we create a new, large deception-centered dataset in the online game of Diplomacy. We gathered 17,289 messages from 12 games (each of which took over a month) involving 84 players, most of whom were unique users. This data was collected with a custom-made bot that allowed us to collect messages and annotations. The user pool was created from scratch: we varied participant demographics across gender, age, nationality, and past game experience. Our participants included the former president of the Diplomacy players' association, several top-ranked players in the world, a board game shop owner, and scientists. We created machine learning models to detect lies using linguistic, contextual, and power-dynamic features. Our best model detected lies about as accurately as humans. (25.4% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2020_acl_diplomacy.pdf Ahmed Elgohary Ghoneim, Denis Peskov, and Jordan Boyd-Graber. Can You Unpack That? Learning to Rewrite Questions-in-Context. Empirical Methods in Natural Language Processing, 2019. http://umiacs.umd.edu/~jbg/docs/2019_emnlp_sequentialqa.pdf Ahmed Elgohary Ghoneim, Chen Zhao, and Jordan Boyd-Graber. Dataset and Baselines for Sequential Open-Domain Question Answering. Empirical Methods in Natural Language Processing, 2018. (23% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2018_emnlp_linked.pdf Alexander Hoyle ------------------------- Pedro Rodriguez, Joe Barrow, Alexander Hoyle, John P. Lalor, Robin Jia, and Jordan Boyd-Graber. Evaluation Examples Are Not Equally Informative: How Should That Change NLP Leaderboards?. Association for Computational Linguistics, 2021. Accessible Abstract: When can we call an AI "intelligent"? Just as with humans, a common approach is to ask it a bunch of questions. The questions posed to modern machine learning methods are collected in leaderboards to monitor progress, but beyond ranking approaches, leaderboards do not help us understand our problems or our systems very well. This paper introduces probabilistic models inspired by a psychometric approach called item response theory (think year-end standardized tests) to better understand how computers can answer questions and whether we are asking the right questions. This allows researchers to better compare what kinds of questions systems can answer, better compare human and machine ability, and discover problematic questions (e.g., questions that have incorrect answer keys, are vague, or "trick" those trying to answer the questions). (21% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2021_acl_leaderboard.pdf Alexander Hoyle, Pranav Goel, Denis Peskov, Andrew Hian-Cheong, Jordan Boyd-Graber, and Philip Resnik. Is Automated Topic Model Evaluation Broken?: The Incoherence of Coherence. Neural Information Processing Systems, 2021. Accessible Abstract: Topic models help historians, journalists, and analysts make sense of large text collections. But how do you know if you have a good one?
The field has settled on using "Automatic Coherence", but this paper argues that maybe that isn't the right choice if you want to actually make real users happy. This paper builds on our 2009 paper, which showed that perplexity was not a good evaluation of interpretability for topic models; while the field adopted automatic topic coherence as a result of that 2009 paper, this paper argues that automatic topic coherence is not a good metric for neural topic models (even though it worked for probabilistic topic models). (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2021_neurips_incoherence.pdf Alison Smith ------------------------- Alison Smith, Jordan Boyd-Graber, Ron Fan, Melissa Birchfield, Tongshuang Wu, Dan Weld, and Leah Findlater. No Explainability without Accountability: An Empirical Study of Explanations and Feedback in Interactive ML. Computer-Human Interaction, 2020. (24% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2020_chi_explanation.pdf Alison Smith, Varun Kumar, Jordan Boyd-Graber, Kevin Seppi, and Leah Findlater. Digging into User Control: Perceptions of Adherence and Instability in Transparent Models. Intelligent User Interfaces, 2020. (23% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2020_iui_control.pdf Varun Kumar, Alison Smith, Leah Findlater, Kevin Seppi, and Jordan Boyd-Graber. Why Didn't You Listen to Me? Comparing User Control of Human-in-the-Loop Topic Models. Association for Computational Linguistics, 2019. (18% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2019_acl_control.pdf Alison Smith, Varun Kumar, Jordan Boyd-Graber, Kevin Seppi, and Leah Findlater. User-Centered Design and Evaluation of a Human-in-the-Loop Topic Modeling System. Intelligent User Interfaces, 2018. Alison won a best student paper honorable mention (3 out of 300) (23% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2018_iui_itm.pdf Alison Smith, Varun Kumar, Jordan Boyd-Graber, Kevin Seppi, and Leah Findlater. Accounting for Input Uncertainty in Human-in-the-Loop Systems. CHI 2017 Designing for Uncertainty Workshop, 2017. http://visualization.ischool.uw.edu/hci_uncertainty/papers/Paper11.pdf Tak Yeon Lee, Alison Smith, Kevin Seppi, Niklas Elmqvist, Jordan Boyd-Graber, and Leah Findlater. The Human Touch: How Non-expert Users Perceive, Interpret, and Fix Topic Models. International Journal of Human-Computer Studies, 2017. http://umiacs.umd.edu/~jbg/docs/2017_ijhcs_human_touch.pdf Alison Smith, Tak Yeon Lee, Forough Poursabzi-Sangdeh, Jordan Boyd-Graber, Kevin Seppi, Niklas Elmqvist, and Leah Findlater. Evaluating Visual Representations for Topic Understanding and Their Effects on Manually Generated Labels. Transactions of the Association for Computational Linguistics, 2017. http://umiacs.umd.edu/~jbg/docs/2017_tacl_eval_tm_viz.pdf Alison Smith, Tak Yeon Lee, Forough Poursabzi-Sangdeh, Jordan Boyd-Graber, Kevin Seppi, Niklas Elmqvist, and Leah Findlater. Human-Centered and Interactive: Expanding the Impact of Topic Models. CHI Human Centred Machine Learning Workshop, 2016. Alison Smith, Jason Chuang, Yuening Hu, Jordan Boyd-Graber, and Leah Findlater. Concurrent Visualization of Relationships between Words and Topics in Topic Models. ACL Workshop on Interactive Language Learning, Visualization, and Interfaces, 2014. Yuening Hu, Jordan Boyd-Graber, Brianna Satinoff, and Alison Smith. Interactive Topic Modeling. Machine Learning, 2014. 
http://umiacs.umd.edu/~jbg/docs/2014_mlj_itm.pdf Alvin Grissom II ------------------------- Wenyan Li, Alvin Grissom II, and Jordan Boyd-Graber. An Attentive Recurrent Model for Incremental Prediction of Sentence-final Verbs. Findings of EMNLP, 2020. http://umiacs.umd.edu/~jbg/docs/2020_findings_verbs.pdf Alvin Grissom II, Naho Orita, and Jordan Boyd-Graber. Incremental Prediction of Sentence-final Verbs. Conference on Computational Natural Language Learning, 2016. (20% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2016_conll_verbpred.pdf He He, Alvin Grissom II, Jordan Boyd-Graber, and Hal Daume III. Syntax-based Rewriting for Simultaneous Machine Translation. Empirical Methods in Natural Language Processing, 2015. (24% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2015_emnlp_rewrite.pdf Alvin Grissom II, He He, Jordan Boyd-Graber, John Morgan, and Hal Daume III. Don't Until the Final Verb Wait: Reinforcement Learning for Simultaneous Machine Translation. Empirical Methods in Natural Language Processing, 2014. (30% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2014_emnlp_simtrans.pdf Andrew Mao ------------------------- Zongxia Li, Andrew Mao, Daniel Kofi Stephens, Pranav Goel, Emily Walpole, Juan Francisco Fung, Alden Dima, and Jordan Lee Boyd-Graber. TENOR: Topic Enabled Neural Organization and Recommendation: Evaluating Topic Models in Task Based Settings. European Association for Computational Linguistics, 2024. (21% Acceptance Rate) Wanrong He, Andrew Mao, and Jordan Boyd-Graber. Cheater's Bowl: Human vs. Computer Search Strategies for Open-Domain QA. Findings of Empirical Methods in Natural Language Processing, 2022. Accessible Abstract: When the Covid pandemic hit, trivia games moved online. With it came cheating: people tried to quickly Google answers. This is bad for sportsmanship, but a good source of training data for helping teach computers how to find answers. We built an interface to harvest this training data from trivia players and fed the resulting queries into retrieval-based QA systems, showing that these queries were better than the automatically generated queries used by the current state of the art. http://umiacs.umd.edu/~jbg/docs/2022_emnlp_cheaters.pdf Anupam Guha ------------------------- Mohit Iyyer, Varun Manjunatha, Anupam Guha, Yogarshi Vyas, Jordan Boyd-Graber, Hal Daume III, and Larry Davis. The Amazing Mysteries of the Gutter: Drawing Inferences Between Panels in Comic Book Narratives. Computer Vision and Pattern Recognition, 2017. (30% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2017_cvpr_comics.pdf Anupam Guha, Mohit Iyyer, and Jordan Boyd-Graber. A Distorted Skull Lies in the Bottom Center: Identifying Paintings from Text Descriptions. NAACL Human-Computer Question Answering Workshop, 2016. http://umiacs.umd.edu/~jbg/docs/2016_naacl_paintings.pdf Mohit Iyyer, Anupam Guha, Snigdha Chaturvedi, Jordan Boyd-Graber, and Hal Daume III. Feuding Families and Former Friends: Unsupervised Learning for Dynamic Fictional Relationships. North American Association for Computational Linguistics, 2016. Best paper award (2 out of 1592) (24% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2016_naacl_relationships.pdf Anupam Guha, Mohit Iyyer, Danny Bouman, and Jordan Boyd-Graber. Removing the Training Wheels: A Coreference Dataset that Entertains Humans and Challenges Computers. North American Association for Computational Linguistics, 2015. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2015_naacl_qb_coref.pdf Aparicio, Elizabeth M. 
------------------------- Neha Pundlik Srikanth, Rupak Sarkar, Heran Y. Mane, Elizabeth M. Aparicio, Quynh C. Nguyen, Rachel Rudinger, and Jordan Boyd-Graber. Large Language Models Help Humans Verify Truthfulness---Except When They Are Convincingly Wrong. North American Association for Computational Linguistics, 2024. Heran Y. Mane, Amara Channell Doig, Francia Ximena Marin Gutierrez, Michelle Jasczynski, Xiaohe Yue, Neha Pundlik Srikanth, Sourabh Mane, Abby Sun, Rachel Ann Moats, Pragat Patel, Xin He, Jordan Lee Boyd-Graber, Elizabeth M. Aparicio, and Quynh C. Nguyen. Practical Guidance for the Development of Rosie, a Health Education Question-and-Answer Chatbot for New Mothers. Journal of Public Health Management and Practice, 2023. https://journals.lww.com/jphmp/fulltext/2023/09000/practical_guidance_for_the_development_of_rosie,_a.9.aspx Benjamin Börschinger ------------------------- Benjamin Börschinger, Jordan Boyd-Graber, Christian Buck, Jannis Bulian, Massimiliano Ciaramita, Michelle Chen Huebscher, Wojciech Gajewski, Yannic Kilcher, Rodrigo Nogueira, and Lierni Sestorain Saralegu. Meta Answering for Machine Reading. ArXiv, Preprint. https://arxiv.org/abs/1911.04156 Julian Martin Eisenschlos, Bhuwan Dhingra, Jannis Bulian, Benjamin Börschinger, and Jordan Boyd-Graber. Fool Me Twice: Entailment from Wikipedia Gamification. North American Association for Computational Linguistics, 2021. Accessible Abstract: Democracy and the free press depend on being able to recognize when facts online are true or not. For machine learning to help with this critical problem, it needs good data identifying which statements are backed up by trusted sources and which are not. This research creates a game people can play online to craft difficult claims that can train computers to spot disinformation online. (28% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2021_naacl_fm2.pdf Jordan Boyd-Graber and Benjamin Börschinger. What Question Answering can Learn from Trivia Nerds. Association for Computational Linguistics, 2020. Accessible Abstract: This paper reflects on the similarities between trivia competitions and computer question answering research. Modern machine learning requires large, quality datasets. The central thesis of this article is that the same things that make trivia tournaments good (they're fun, fair, and consistently crown the best trivia players) can also improve question answering datasets. Concretely, we argue that question answering datasets should clearly specify what answers are requested, have systematic policies to deal with natural ambiguity and variation, have authors look at the data (and help others do the same), make sure questions separate the best from the rest, and ensure people can have fun. We draw on the authors' experience in the trivia community (including embarrassing episodes on Jeopardy!) to illustrate our arguments. (25.4% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2020_acl_trivia.pdf Benjamin Van Durme ------------------------- Michelle Yuan, Mozhi Zhang, Benjamin Van Durme, Leah Findlater, and Jordan Boyd-Graber. Interactive Refinement of Cross-Lingual Word Embeddings. Empirical Methods in Natural Language Processing, 2020. Accessible Abstract: Language technologies sometimes need to be quickly deployed in low-resource languages. 
For example, in the 2010 Haiti earthquake, researchers used machine learning models to analyze social media and text messages to gain situational awareness. We introduce CLIME, an interactive system that can help in these scenarios: users see which task-related words the system thinks are similar and correct the model by pushing similar words together and dissimilar words apart. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2020_emnlp_clime.pdf Michelle Yuan, Benjamin Van Durme, and Jordan Boyd-Graber. Multilingual Anchoring: Interactive Topic Modeling and Alignment Across Languages. Neural Information Processing Systems, 2018. (21% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2018_neurips_mtanchor.pdf Brianna Satinoff ------------------------- Yuening Hu, Jordan Boyd-Graber, Brianna Satinoff, and Alison Smith. Interactive Topic Modeling. Machine Learning, 2014. http://umiacs.umd.edu/~jbg/docs/2014_mlj_itm.pdf Jordan Boyd-Graber, Brianna Satinoff, He He, and Hal Daume III. Besting the Quiz Master: Crowdsourcing Incremental Classification Games. Empirical Methods in Natural Language Processing, 2012. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/qb_emnlp_2012.pdf Yuening Hu, Jordan Boyd-Graber, and Brianna Satinoff. Interactive Topic Modeling. Association for Computational Linguistics, 2011. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/itm.pdf Brianna Satinoff and Jordan Boyd-Graber. Trivial Classification: What features do humans use for classification?. Workshop on Crowdsourcing Technologies for Language and Cognition Studies, 2011. Chen Zhao ------------------------- Chenglei Si, Navita Goyal, Tongshuang Wu, Chen Zhao, Shi Feng, Hal Daume III, and Jordan Lee Boyd-Graber. Large Language Models Help Humans Verify Truthfulness---Except When They Are Convincingly Wrong. North American Association for Computational Linguistics, 2024. Chenglei Si, Weijia Shi, Chen Zhao, Luke Zettlemoyer, and Jordan Lee Boyd-Graber. Getting MoRE out of Mixture of Language Model Reasoning Experts. Findings of Empirical Methods in Natural Language Processing, 2023. Accessible Abstract: There are many kinds of questions a computer might be asked: general knowledge questions, common sense questions, or math questions. Each of these types of questions can be answered by a particular kind of expert. This paper investigates if we can automatically detect what kind of expert is best suited to answer a question and route the question to the correct expert. (45% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2023_findings_more.pdf Chenglei Si, Chen Zhao, Sewon Min, and Jordan Boyd-Graber. Re-Examining Calibration: The Case of Question Answering. Findings of Empirical Methods in Natural Language Processing, 2022. Accessible Abstract: Calibration is an important problem in question answering: if a search engine or virtual assistant doesn't know the answer to a question, it should probably abstain from showing an answer (to save embarrassment, as when Google said a horse had six legs). This EMNLP Findings paper shows that existing metrics for testing how well a QA system is calibrated push calibrated confidence toward the average confidence. We propose an alternative method both for evaluation and for generating better calibration by looking at how models change as they learn. http://umiacs.umd.edu/~jbg/docs/2022_emnlp_calibration.pdf Chen Zhao, Chenyan Xiong, Jordan Boyd-Graber, and Hal Daume III. Distantly-Supervised Dense Retrieval Enables Open-Domain Question Answering without Evidence Annotation. 
Empirical Methods in Natural Language Processing, 2021. Accessible Abstract: Answering questions sometimes requires tying multiple pieces of information together. Previous datasets have required annotators to explicitly build these reasoning chains (e.g., to answer "where do I know the cop from Die Hard from", you need to figure out that the actor's name is "Reginald VelJohnson" and then find out that he's best known as the dad on Family Matters). By exploring search queries that get to the right answer, we're able to answer these questions without expensive annotation. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2021_emnlp_weak_dpr.pdf Chenglei Si, Chen Zhao, and Jordan Boyd-Graber. What's in a Name? Answer Equivalence For Open-Domain Question Answering. Empirical Methods in Natural Language Processing, 2021. Accessible Abstract: Is Tim Cook the same person as Timothy Donald Cook? You might think so, but the way we train computers to answer questions would say they aren't. We show that keeping track of multiple names (and it's really simple) can create better question answering systems. Simply by adding alternate answers mined from knowledge bases, we can improve accuracy 1-2 points on major QA datasets. (18% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2021_emnlp_answer_equiv.pdf Chen Zhao, Chenyan Xiong, Hal Daume III, and Jordan Boyd-Graber. Multi-Step Reasoning Over Unstructured Text with Beam Dense Retrieval. North American Association for Computational Linguistics, 2021. Accessible Abstract: For computers to answer complicated questions online, they often need to put together multiple pieces of information (Ronald Reagan was both governor of California and an actor in Bedtime for Bonzo). However, existing approaches use the links in Wikipedia to combine these clues. This research helps computers find connected information without using these explicit links. (23% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2021_naacl_multi_ance.pdf Chen Zhao, Chenyan Xiong, Xin Qian, and Jordan Boyd-Graber. Complex Factoid Question Answering with a Free-Text Knowledge Graph. ACM International Conference on World Wide Web, 2020. (19.2% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2020_www_delft.pdf Tianze Shi, Chen Zhao, Jordan Boyd-Graber, Hal Daume III, and Lillian Lee. On the Potential of Lexico-logical Alignments for Semantic Parsing to SQL Queries. Findings of EMNLP, 2020. http://umiacs.umd.edu/~jbg/docs/2020_findings_qalign.pdf Ahmed Elgohary Ghoneim, Chen Zhao, and Jordan Boyd-Graber. Dataset and Baselines for Sequential Open-Domain Question Answering. Empirical Methods in Natural Language Processing, 2018. (23% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2018_emnlp_linked.pdf Chenglei Si ------------------------- Chenglei Si, Navita Goyal, Tongshuang Wu, Chen Zhao, Shi Feng, Hal Daume III, and Jordan Lee Boyd-Graber. Large Language Models Help Humans Verify Truthfulness---Except When They Are Convincingly Wrong. North American Association for Computational Linguistics, 2024. Sander V Schulhoff, Jeremy Pinto, Anaum Khan, Louis-François Bouchard, Chenglei Si, Jordan Lee Boyd-Graber, Svetlina Anati, Valen Tagliabue, Anson Liu Kost, and Christopher R Carnahan. Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs Through a Global Prompt Hacking Competition. Empirical Methods in Natural Language Processing, 2023. 
This paper was selected as the Best Theme Paper at EMNLP 2023 (1 of 4909) Accessible Abstract: As more AI services online are provided by prompted language models, we need to be aware of the weaknesses and exploits of the models. We present the HackAPrompt competition to help elicit a broad array of exploits that get around large language models. (23% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2023_emnlp_hackaprompt.pdf Chenglei Si, Weijia Shi, Chen Zhao, Luke Zettlemoyer, and Jordan Lee Boyd-Graber. Getting MoRE out of Mixture of Language Model Reasoning Experts. Findings of Empirical Methods in Natural Language Processing, 2023. Accessible Abstract: There are many kinds of questions a computer might be asked: general knowledge questions, common sense questions, or math questions. Each of these types of questions can be answered by a particular kind of expert. This paper investigates if we can automatically detect what kind of expert is best suited to answer a question and route the question to the correct expert. (45% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2023_findings_more.pdf Chenglei Si, Zhe Gan, Zhengyuan Yang, Shuohang Wang, Jianfeng Wang, Jordan Boyd-Graber, and Lijuan Wang. Prompting GPT-3 To Be Reliable. International Conference on Learning Representations, 2023. http://umiacs.umd.edu/~jbg/docs/2023_iclr_reliable.pdf Chenglei Si, Chen Zhao, Sewon Min, and Jordan Boyd-Graber. Re-Examining Calibration: The Case of Question Answering. Findings of Empirical Methods in Natural Language Processing, 2022. Accessible Abstract: Calibration is an important problem in question answering: if a search engine or virtual assistant doesn't know the answer to a question, it should probably abstain from showing an answer (to save embarrassment, as when Google said a horse had six legs). This EMNLP Findings paper shows that existing metrics for testing how well a QA system is calibrated push calibrated confidence toward the average confidence. We propose an alternative method both for evaluation and for generating better calibration by looking at how models change as they learn. http://umiacs.umd.edu/~jbg/docs/2022_emnlp_calibration.pdf Chenglei Si, Chen Zhao, and Jordan Boyd-Graber. What's in a Name? Answer Equivalence For Open-Domain Question Answering. Empirical Methods in Natural Language Processing, 2021. Accessible Abstract: Is Tim Cook the same person as Timothy Donald Cook? You might think so, but the way we train computers to answer questions would say they aren't. We show that keeping track of multiple names (and it's really simple) can create better question answering systems. Simply by adding alternate answers mined from knowledge bases, we can improve accuracy 1-2 points on major QA datasets. (18% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2021_emnlp_answer_equiv.pdf Chenyan Xiong ------------------------- Chen Zhao, Chenyan Xiong, Jordan Boyd-Graber, and Hal Daume III. Distantly-Supervised Dense Retrieval Enables Open-Domain Question Answering without Evidence Annotation. Empirical Methods in Natural Language Processing, 2021. Accessible Abstract: Answering questions sometimes requires tying multiple pieces of information together. Previous datasets have required annotators to explicitly build these reasoning chains (e.g., to answer "where do I know the cop from Die Hard from", you need to figure out that the actor's name is "Reginald VelJohnson" and then find out that he's best known as the dad on Family Matters). 
By exploring search queries that get to the right answer, we're able to answer these questions without expensive annotation. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2021_emnlp_weak_dpr.pdf Chen Zhao, Chenyan Xiong, Hal Daume III, and Jordan Boyd-Graber. Multi-Step Reasoning Over Unstructured Text with Beam Dense Retrieval. North American Association for Computational Linguistics, 2021. Accessible Abstract: For computers to answer complicated questions online, they often need to put together multiple pieces of information (Ronald Reagan was both governor of California and an actor in Bedtime for Bonzo). However, existing approaches use the links in Wikipedia to combine these clues. This research helps computers find connected information without using these explicit links. (23% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2021_naacl_multi_ance.pdf Chen Zhao, Chenyan Xiong, Xin Qian, and Jordan Boyd-Graber. Complex Factoid Question Answering with a Free-Text Knowledge Graph. ACM International Conference on World Wide Web, 2020. (19.2% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2020_www_delft.pdf Christiane Fellbaum ------------------------- Sonya S. Nikolova, Jordan Boyd-Graber, and Christiane Fellbaum. Collecting Semantic Similarity Ratings to Connect Concepts in Assistive Communication Tools. Modeling, Learning and Processing of Text Technological Data Structures, 2011. http://umiacs.umd.edu/~jbg/docs/2011_book_chapter_evocation.pdf Sonya S. Nikolova, Jordan Boyd-Graber, Christiane Fellbaum, and Perry Cook. Better Vocabularies for Assistive Communication Aids: Connecting Terms using Semantic Networks and Untrained Annotators. ACM Conference on Computers and Accessibility, 2009. (31% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/evocation-viva.pdf Jordan Boyd-Graber, Christiane Fellbaum, Daniel Osherson, and Robert Schapire. Adding Dense, Weighted Connections to WordNet. Proceedings of the Global WordNet Conference, 2006. http://umiacs.umd.edu/~jbg/docs/jbg-jeju.pdf Clay Templeton ------------------------- Clay Templeton, Travis Brown, Sayan Battacharyya, and Jordan Boyd-Graber. Mining the Dispatch under Supervision: Using Casualty Counts to Guide Topics from the Richmond Daily Dispatch Corpus. Chicago Colloquium on Digital Humanities and Computer Science, 2011. http://umiacs.umd.edu/~jbg/docs/slda_civil_war.pdf Clay Templeton, Kenneth R. Fleischmann, and Jordan Boyd-Graber. Simulating Audiences: Automating Analysis of Values, Attitudes, and Sentiment. IEEE International Conference on Social Computing, 2011. (10% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/simulating_audiences.pdf Clay Templeton, Kenneth R. Fleischmann, and Jordan Boyd-Graber. Comparing Values and Sentiment Using Mechanical Turk. iConference, 2011. http://umiacs.umd.edu/~jbg/docs/iconference-2011-comparing.pdf Kenneth R. Fleischmann, Clay Templeton, and Jordan Boyd-Graber. Modeling Diverse Standpoints in Text Classification: Learning to Be Human by Modeling Human Values. iConference, 2011. http://umiacs.umd.edu/~jbg/docs/iconference-2011-learning.pdf Cristian Danescu-Niculescu-Mizil ------------------------- Denis Peskov, Benny Cheng, Ahmed Elgohary Ghoneim, Joe Barrow, Cristian Danescu-Niculescu-Mizil, and Jordan Boyd-Graber. It Takes Two to Lie: One to Lie and One to Listen. Association for Computational Linguistics, 2020. Accessible Abstract: Machine learning techniques to detect deception in online communications require training and evaluation data. 
However, there is a dearth of data either because of uncertain gold labels or privacy concerns; we create a new, large deception-centered dataset in the online game of Diplomacy. We gathered 17,289 messages from 12 games (each of which took over a month) involving 84 players, most of whom were unique users. This data was collected with a custom-made bot that allowed us to collect messages and annotations. The user pool was created from scratch: we varied participant demographics across gender, age, nationality, and past game experience. Our participants included the former president of the Diplomacy players' association, several top-ranked players in the world, a board game shop owner, and scientists. We created machine learning models to detect lies using linguistic, contextual, and power-dynamic features. Our best model detected lies about as accurately as humans. (25.4% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2020_acl_diplomacy.pdf Vlad Niculae, Srijan Kumar, Jordan Boyd-Graber, and Cristian Danescu-Niculescu-Mizil. Linguistic Harbingers of Betrayal: A Case Study on an Online Strategy Game. Association for Computational Linguistics, 2015. Accessible Abstract: This paper introduces the application of natural language processing techniques to understand the relationships (and their dissolution) in the game of Diplomacy. This popular board game simulates Europe on the eve of World War I and forces players to work with each other to forge alliances and make plans together. However, the game's setup also encourages players to turn against each other. This paper analyzes whether we can predict these betrayals (we can!) and the linguistic and social phenomena (demands, politeness, and planning) that can predict when a betrayal will happen. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2015_acl_diplomacy.pdf David M. Blei ------------------------- Aaron Gerow, Yuening Hu, Jordan Boyd-Graber, David M. Blei, and James A. Evans. Measuring Discursive Influence Across Scholarship. Proceedings of the National Academy of Sciences, 2018. Jonathan Chang, Jordan Boyd-Graber, and David M. Blei. Connections between the Lines: Augmenting Social Networks with Text. Knowledge Discovery and Data Mining, 2009. (9% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/kdd2009.pdf Jonathan Chang, Jordan Boyd-Graber, Chong Wang, Sean Gerrish, and David M. Blei. Reading Tea Leaves: How Humans Interpret Topic Models. Neural Information Processing Systems, 2009. Jonathan Chang and I shared a NIPS student award honorable mention for this paper (5 out of 1105) Accessible Abstract: Topic models are a tool that historians and social scientists use to explore large text corpora. But how do you know if you have a good topic model? Before this paper, the consensus was to use held-out likelihood to evaluate if you had a good model. This paper argues that this does not fit how people actually use topic models and proposes new human-centered metrics for evaluating topic models. This method inspired a rethinking of model evaluation and showed that the complexity of a model does not always correspond to what a user might want. (24% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/nips2009-rtl.pdf Jordan Boyd-Graber and David M. Blei. Multilingual Topic Models for Unaligned Text. Uncertainty in Artificial Intelligence, 2009. For coverage of the current state of the art in cross-lingual topic models, see: Yuening Hu, Ke Zhai, Vlad Eidelman, and Jordan Boyd-Graber. 
Polylingual Tree-Based Topic Models for Translation Domain Adaptation. Association for Computational Linguistics, 2014. (31% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/uai2009.pdf Jonathan Chang, Jordan Boyd-Graber, and David M. Blei. Discovering social networks from free text. 3rd Annual Machine Learning Symposium, 2008. Jordan Boyd-Graber and David M. Blei. Multilingual Topic Models. NIPS Workshop on Unsupervised Latent Variable Models, 2008. Jordan Boyd-Graber and David M. Blei. Syntactic Topic Models. Neural Information Processing Systems, 2008. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/nips2008.pdf Jordan Boyd-Graber and David M. Blei. PUTOP: Turning Predominant Senses into a Topic Model for WSD. 4th International Workshop on Semantic Evaluations, 2007. http://umiacs.umd.edu/~jbg/docs/jbg-SEMEVAL07.pdf Jordan Boyd-Graber, David M. Blei, and Xiaojin Zhu. A Topic Model for Word Sense Disambiguation. Empirical Methods in Natural Language Processing, 2007. (27% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/jbg-EMNLP07.pdf David Mimno ------------------------- Jordan Boyd-Graber, Yuening Hu, and David Mimno. Applications of Topic Models. 2017. http://www.nowpublishers.com/article/Details/INR-030 Jordan Boyd-Graber, David Mimno, and David Newman. Care and Feeding of Topic Models: Problems, Diagnostics, and Improvements. Handbook of Mixed Membership Models and Their Applications, 2014. http://umiacs.umd.edu/~jbg/docs/2014_book_chapter_care_and_feeding.pdf Denis Peskov ------------------------- Denis Peskov, Viktor Hangya, Jordan Boyd-Graber, and Alexander Fraser. Adapting Entities across Languages and Cultures. Findings of Empirical Methods in Natural Language Processing, 2021. Accessible Abstract: If you ask who Germany's "Christian Drosten" is, a simple answer is that he's their "Anthony Fauci". We create a system to automatically generate these adaptations, which can help improve cross-cultural understanding and create new training data for tasks like question answering. (37% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2021_emnlp_adaptation.pdf Alexander Hoyle, Pranav Goel, Denis Peskov, Andrew Hian-Cheong, Jordan Boyd-Graber, and Philip Resnik. Is Automated Topic Model Evaluation Broken?: The Incoherence of Coherence. Neural Information Processing Systems, 2021. Accessible Abstract: Topic models help historians, journalists, and analysts make sense of large text collections. But how do you know if you have a good one? The field has settled on using "Automatic Coherence", but this paper argues that maybe that isn't the right choice if you want to actually make real users happy. This paper builds on our 2009 paper, which showed that perplexity was not a good evaluation of interpretability for topic models; while the field adopted automatic topic coherence as a result of that 2009 paper, this paper argues that automatic topic coherence is not a good metric for neural topic models (even though it worked for probabilistic topic models). (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2021_neurips_incoherence.pdf Denis Peskov, Benny Cheng, Ahmed Elgohary Ghoneim, Joe Barrow, Cristian Danescu-Niculescu-Mizil, and Jordan Boyd-Graber. It Takes Two to Lie: One to Lie and One to Listen. Association for Computational Linguistics, 2020. Accessible Abstract: Machine learning techniques to detect deception in online communications require training and evaluation data. 
However, there is a dearth of data either because of uncertain gold labels or privacy concerns; we create a new, large deception-centered dataset in the online game of Diplomacy. We gathered 17,289 messages from 12 games (each of which took over a month) involving 84 players, most of whom were unique users. This data was collected with a custom-made bot that allowed us to collect messages and annotations. The user pool was created from scratch: we varied participant demographics across gender, age, nationality, and past game experience. Our participants included the former president of the Diplomacy players' association, several top-ranked players in the world, a board game shop owner, and scientists. We created machine learning models to detect lies using linguistic, contextual, and power-dynamic features. Our best model detected lies about as accurately as humans. (25.4% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2020_acl_diplomacy.pdf Denis Peskov, Joe Barrow, Pedro Rodriguez, Graham Neubig, and Jordan Boyd-Graber. Mitigating Noisy Inputs for Question Answering. Conference of the International Speech Communication Association, 2019. http://umiacs.umd.edu/~jbg/docs/2019_interspeech_asr Ahmed Elgohary Ghoneim, Denis Peskov, and Jordan Boyd-Graber. Can You Unpack That? Learning to Rewrite Questions-in-Context. Empirical Methods in Natural Language Processing, 2019. http://umiacs.umd.edu/~jbg/docs/2019_emnlp_sequentialqa.pdf Eric Ringger ------------------------- Paul Felt, Eric Ringger, Kevin Seppi, and Jordan Boyd-Graber. Learning from Measurements in Crowdsourcing Models: Inferring Ground Truth from Diverse Annotation Types. International Conference on Computational Linguistics, 2018. (37% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2018_coling_measurements.pdf Paul Felt, Eric Ringger, Jordan Boyd-Graber, and Kevin Seppi. Making the Most of Crowdsourced Document Annotations: Confused Supervised LDA. Conference on Computational Natural Language Learning, 2015. This paper received the best paper award at CoNLL (30% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2015_conll_cslda.pdf Thang Nguyen, Jordan Boyd-Graber, Jeff Lund, Kevin Seppi, and Eric Ringger. Is your anchor going up or down? Fast and accurate supervised topic models. North American Association for Computational Linguistics, 2015. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2015_naacl_supervised_anchor.pdf Eric Wallace ------------------------- Eric Wallace, Shi Feng, and Jordan Boyd-Graber. Misleading Failures of Partial-input Baselines. Association for Computational Linguistics, 2019. (18% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2019_acl_flipside.pdf Eric Wallace, Pedro Rodriguez, Shi Feng, Ikuya Yamada, and Jordan Boyd-Graber. Trick Me If You Can: Human-in-the-loop Generation of Adversarial Question Answering Examples. Transactions of the Association for Computational Linguistics, 2019. http://umiacs.umd.edu/~jbg/docs/2019_tacl_trick.pdf Eric Wallace and Jordan Boyd-Graber. Trick Me If You Can: Adversarial Writing of Trivia Challenge Questions. ACL Student Research Workshop, 2018. http://aclweb.org/anthology/P18-3018 Shi Feng, Eric Wallace, and Jordan Boyd-Graber. Interpreting Neural Networks with Nearest Neighbors. EMNLP Workshop on BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, 2018. 
http://aclweb.org/anthology/W18-5416 Shi Feng, Eric Wallace, Alvin Grissom II, Pedro Rodriguez, Mohit Iyyer, and Jordan Boyd-Graber. Pathologies of Neural Models Make Interpretation Difficult. Empirical Methods in Natural Language Processing, 2018. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2018_emnlp_rs.pdf Fenfei Guo ------------------------- Fenfei Guo, Chen Zhang, Zhirui Zhang, Qixin He, Kejun Zhang, Jun Xie, and Jordan Boyd-Graber. Automatic Song Translation for Tonal Languages. Findings of the Association for Computational Linguistics, 2022. (31% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2022_acl_ast.pdf Fenfei Guo, Jordan Boyd-Graber, Mohit Iyyer, and Leah Findlater. Which Evaluations Uncover Sense Representations that Actually Make Sense?. Language Resources and Evaluation Conference, 2020. http://umiacs.umd.edu/~jbg/docs/2020_lrec_sense.pdf Forough Poursabzi-Sangdeh ------------------------- Alison Smith, Tak Yeon Lee, Forough Poursabzi-Sangdeh, Jordan Boyd-Graber, Kevin Seppi, Niklas Elmqvist, and Leah Findlater. Evaluating Visual Representations for Topic Understanding and Their Effects on Manually Generated Labels. Transactions of the Association for Computational Linguistics, 2017. http://umiacs.umd.edu/~jbg/docs/2017_tacl_eval_tm_viz.pdf Forough Poursabzi-Sangdeh, Jordan Boyd-Graber, Leah Findlater, and Kevin Seppi. ALTO: Active Learning with Topic Overviews for Speeding Label Induction and Document Labeling. Association for Computational Linguistics, 2016. (28% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2016_acl_doclabel.pdf Alison Smith, Tak Yeon Lee, Forough Poursabzi-Sangdeh, Jordan Boyd-Graber, Kevin Seppi, Niklas Elmqvist, and Leah Findlater. Human-Centered and Interactive: Expanding the Impact of Topic Models. CHI Human Centred Machine Learning Workshop, 2016. Forough Poursabzi-Sangdeh and Jordan Boyd-Graber. Speeding Document Annotation with Topic Models. NAACL Student Research Workshop, 2015. Jason Chuang, John D. Wilkerson, Rebecca Weiss, Dustin Tingley, Brandon M. Stewart, Margaret E. Roberts, Forough Poursabzi-Sangdeh, Justin Grimmer, Leah Findlater, Jordan Boyd-Graber, and Jeffrey Heer. Computer-Assisted Content Analysis: Topic Models for Exploring Multiple Subjective Interpretations. NIPS Workshop on Human-Propelled Machine Learning, 2014. Graham Neubig ------------------------- Denis Peskov, Joe Barrow, Pedro Rodriguez, Graham Neubig, and Jordan Boyd-Graber. Mitigating Noisy Inputs for Question Answering. Conference of the International Speech Communication Association, 2019. http://umiacs.umd.edu/~jbg/docs/2019_interspeech_asr Craig Stewart, Nikolai Vogler, Junjie Hu, Jordan Boyd-Graber, and Graham Neubig. Automatic Estimation of Simultaneous Interpreter Performance. Association for Computational Linguistics, 2018. (24% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2018_acl_interpeval.pdf Hal Daume III ------------------------- Chen Zhao, Chenyan Xiong, Jordan Boyd-Graber, and Hal Daume III. Distantly-Supervised Dense Retrieval Enables Open-Domain Question Answering without Evidence Annotation. Empirical Methods in Natural Language Processing, 2021. Accessible Abstract: Answering questions sometimes requires tying multiple pieces of information together. 
Previous datasets have required annotators to explicitly build these reasoning chains (e.g., to answer "where do I know the cop from Die Hard from", you need to figure out that the actor's name is "Reginald VelJohnson" and then find out that he's best known as the dad on Family Matters). By exploring search queries that get to the right answer, we're able to answer these questions without expensive annotation. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2021_emnlp_weak_dpr.pdf Chen Zhao, Chenyan Xiong, Hal Daume III, and Jordan Boyd-Graber. Multi-Step Reasoning Over Unstructured Text with Beam Dense Retrieval. North American Association for Computational Linguistics, 2021. Accessible Abstract: For computers to answer complicated questions online, they often need to put together multiple pieces of information (Ronald Reagan was both governor of California and an actor in Bedtime for Bonzo). However, existing approaches use the links in Wikipedia to combine these clues. This research helps computers find connected information without using these explicit links. (23% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2021_naacl_multi_ance.pdf Tianze Shi, Chen Zhao, Jordan Boyd-Graber, Hal Daume III, and Lillian Lee. On the Potential of Lexico-logical Alignments for Semantic Parsing to SQL Queries. Findings of EMNLP, 2020. http://umiacs.umd.edu/~jbg/docs/2020_findings_qalign.pdf Mohit Iyyer, Varun Manjunatha, Anupam Guha, Yogarshi Vyas, Jordan Boyd-Graber, Hal Daume III, and Larry Davis. The Amazing Mysteries of the Gutter: Drawing Inferences Between Panels in Comic Book Narratives. Computer Vision and Pattern Recognition, 2017. (30% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2017_cvpr_comics.pdf Khanh Nguyen, Jordan Boyd-Graber, and Hal Daume III. Reinforcement Learning for Bandit Neural Machine Translation with Simulated Human Feedback. Empirical Methods in Natural Language Processing, 2017. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2017_emnlp_bandit_mt.pdf Hadi Amiri, Philip Resnik, Jordan Boyd-Graber, and Hal Daume III. Learning Text Pair Similarity with Context-sensitive Autoencoders. Association for Computational Linguistics, 2016. (28% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2016_acl_context_ae.pdf He He, Jordan Boyd-Graber, Kevin Kwok, and Hal Daume III. Opponent Modeling in Deep Reinforcement Learning. International Conference on Machine Learning, 2016. (24% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2016_icml_opponent.pdf Mohit Iyyer, Anupam Guha, Snigdha Chaturvedi, Jordan Boyd-Graber, and Hal Daume III. Feuding Families and Former Friends: Unsupervised Learning for Dynamic Fictional Relationships. North American Association for Computational Linguistics, 2016. Best paper award (2 out of 1592) (24% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2016_naacl_relationships.pdf He He, Jordan Boyd-Graber, and Hal Daume III. Interpretese vs. Translationese: The Uniqueness of Human Strategies in Simultaneous Interpretation. North American Association for Computational Linguistics, 2016. (29% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2016_naacl_interpretese.pdf Mohit Iyyer, Varun Manjunatha, Jordan Boyd-Graber, and Hal Daume III. Deep Unordered Composition Rivals Syntactic Methods for Text Classification. Association for Computational Linguistics, 2015. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2015_acl_dan.pdf He He, Alvin Grissom II, Jordan Boyd-Graber, and Hal Daume III. 
Syntax-based Rewriting for Simultaneous Machine Translation. Empirical Methods in Natural Language Processing, 2015. (24% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2015_emnlp_rewrite.pdf Jordan Boyd-Graber, Mohit Iyyer, He He, and Hal Daume III. Interactive Incremental Question Answering. Neural Information Processing Systems, 2015. This won the best demonstration award at NIPS 2015 Mohit Iyyer, Jordan Boyd-Graber, Leonardo Claudino, Richard Socher, and Hal Daume III. A Neural Network for Factoid Question Answering over Paragraphs. Empirical Methods in Natural Language Processing, 2014. The partial derivatives of "C" and "J" with respect to the parameters should be switched in Equation 7. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2014_emnlp_qb_rnn.pdf Alvin Grissom II, He He, Jordan Boyd-Graber, John Morgan, and Hal Daume III. Don't Until the Final Verb Wait: Reinforcement Learning for Simultaneous Machine Translation. Empirical Methods in Natural Language Processing, 2014. (30% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2014_emnlp_simtrans.pdf Mohit Iyyer, Jordan Boyd-Graber, and Hal Daume III. Generating Sentences from Semantic Vector Space Representations. NIPS Workshop on Learning Semantics, 2014. Yuening Hu, Jordan Boyd-Graber, Hal Daume III, and Z. Irene Ying. Binary to Bushy: Bayesian Hierarchical Clustering with the Beta Coalescent. Neural Information Processing Systems, 2013. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2013_coalescent.pdf Jordan Boyd-Graber, Brianna Satinoff, He He, and Hal Daume III. Besting the Quiz Master: Crowdsourcing Incremental Classification Games. Empirical Methods in Natural Language Processing, 2012. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/qb_emnlp_2012.pdf He He ------------------------- Pedro Rodriguez, Shi Feng, Mohit Iyyer, He He, and Jordan Boyd-Graber. Quizbowl: The Case for Incremental Question Answering. ArXiv, Preprint. https://arxiv.org/abs/1904.04792 He He, Jordan Boyd-Graber, Kevin Kwok, and Hal Daume III. Opponent Modeling in Deep Reinforcement Learning. International Conference on Machine Learning, 2016. (24% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2016_icml_opponent.pdf He He, Jordan Boyd-Graber, and Hal Daume III. Interpretese vs. Translationese: The Uniqueness of Human Strategies in Simultaneous Interpretation. North American Association for Computational Linguistics, 2016. (29% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2016_naacl_interpretese.pdf He He, Alvin Grissom II, Jordan Boyd-Graber, and Hal Daume III. Syntax-based Rewriting for Simultaneous Machine Translation. Empirical Methods in Natural Language Processing, 2015. (24% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2015_emnlp_rewrite.pdf Jordan Boyd-Graber, Mohit Iyyer, He He, and Hal Daume III. Interactive Incremental Question Answering. Neural Information Processing Systems, 2015. This won the best demonstration award at NIPS 2015 Alvin Grissom II, He He, Jordan Boyd-Graber, John Morgan, and Hal Daume III. Don't Until the Final Verb Wait: Reinforcement Learning for Simultaneous Machine Translation. Empirical Methods in Natural Language Processing, 2014. (30% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2014_emnlp_simtrans.pdf Jordan Boyd-Graber, Brianna Satinoff, He He, and Hal Daume III. Besting the Quiz Master: Crowdsourcing Incremental Classification Games. Empirical Methods in Natural Language Processing, 2012. 
(25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/qb_emnlp_2012.pdf HyoJung Han ------------------------- HyoJung Han, Marine Carpuat, and Jordan Boyd-Graber. Automatic Explicitation to Bridge the Background Knowledge Gap in Translation and its Evaluation with Multilingual QA. Empirical Methods in Natural Language Processing, 2023. Accessible Abstract: Sometimes when you are translating from one language to another, a literal translation is not enough. Sometimes to actually understand what is being said, you need additional context. Professional translators know this, and the process they use to help a listener, capturing cultural differences between source and target audiences, is called "explicitation". We introduce techniques for automatically generating explicitations, motivated by WikiExpl (a dataset collected from Wikipedia and annotated by human translators), and evaluate the generated explicitations. (23% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2023_emnlp_explicitation.pdf HyoJung Han, Marine Carpuat, and Jordan Boyd-Graber. SimQA: Detecting Simultaneous MT Errors through Word-by-Word Question Answering. Empirical Methods in Natural Language Processing, 2022. Accessible Abstract: Simultaneous interpretation (where a translation happens word by word before the source sentence is finished) is difficult to evaluate. We created a new evaluation framework based on the following scenario: imagine that you're thrown into a trivia gameshow where you don't know the language. Specifically, it's a game format where you interrupt the question word by word as soon as possible. Our hypothesis is that a monolingual player (who doesn't speak the source language) will be able to do better in the game with a better simultaneous translation system. In this 2022 EMNLP publication, we show that this evaluation is not only cheaper (you just need to translate the answer) but can also detect hallucinations and undertranslations better than existing evaluation methods. http://umiacs.umd.edu/~jbg/docs/2022_emnlp_simqa.pdf Jannis Bulian ------------------------- Benjamin Börschinger, Jordan Boyd-Graber, Christian Buck, Jannis Bulian, Massimiliano Ciaramita, Michelle Chen Huebscher, Wojciech Gajewski, Yannic Kilcher, Rodrigo Nogueira, and Lierni Sestorain Saralegu. Meta Answering for Machine Reading. ArXiv, Preprint. https://arxiv.org/abs/1911.04156 Julian Martin Eisenschlos, Bhuwan Dhingra, Jannis Bulian, Benjamin Börschinger, and Jordan Boyd-Graber. Fool Me Twice: Entailment from Wikipedia Gamification. North American Association for Computational Linguistics, 2021. Accessible Abstract: Democracy and the free press depend on being able to recognize when facts online are true or not. For machine learning to help with this critical problem, it needs good data identifying which statements are backed up by trusted sources and which are not. This research creates a game people can play online to craft difficult claims that can train computers to spot disinformation online. (28% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2021_naacl_fm2.pdf Jason Chuang ------------------------- Alison Smith, Jason Chuang, Yuening Hu, Jordan Boyd-Graber, and Leah Findlater. Concurrent Visualization of Relationships between Words and Topics in Topic Models. ACL Workshop on Interactive Language Learning, Visualization, and Interfaces, 2014. Jason Chuang, John D. Wilkerson, Rebecca Weiss, Dustin Tingley, Brandon M. Stewart, Margaret E. 
Roberts, Forough Poursabzi-Sangdeh, Justin Grimmer, Leah Findlater, Jordan Boyd-Graber, and Jeffrey Heer. Computer-Assisted Content Analysis: Topic Models for Exploring Multiple Subjective Interpretations. NIPS Workshop on Human-Propelled Machine Learning, 2014. Jeff Lund ------------------------- Jeff Lund, Connor Cook, Kevin Seppi, and Jordan Boyd-Graber. Tandem Anchoring: A Multiword Anchor Approach for Interactive Topic Modeling. Association for Computational Linguistics, 2017. (22% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2017_acl_multiword_anchors.pdf You Lu, Jeff Lund, and Jordan Boyd-Graber. Why ADAGRAD Fails for Online Topic Modeling. Empirical Methods in Natural Language Processing, 2017. (18% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2017_emnlp_adagrad_olda.pdf Thang Nguyen, Jordan Boyd-Graber, Jeff Lund, Kevin Seppi, and Eric Ringger. Is your anchor going up or down? Fast and accurate supervised topic models. North American Association for Computational Linguistics, 2015. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2015_naacl_supervised_anchor.pdf Joe Barrow ------------------------- Pedro Rodriguez, Joe Barrow, Alexander Hoyle, John P. Lalor, Robin Jia, and Jordan Boyd-Graber. Evaluation Examples Are Not Equally Informative: How Should That Change NLP Leaderboards?. Association for Computational Linguistics, 2021. Accessible Abstract: When can we call an AI "intelligent"? Just as with humans, a common approach is to ask it a bunch of questions. The questions posed to modern machine learning methods are collected in leaderboards to monitor progress, but beyond ranking approaches, leaderboards do not help us understand our problems or our systems very well. This paper introduces probabilistic models inspired by a psychometric approach called item response theory (think year-end standardized tests) to better understand how computers can answer questions and whether we are asking the right questions. This allows researchers to better compare what kinds of questions systems can answer, better compare human and machine ability, and discover problematic questions (e.g., questions that have incorrect answer keys, are vague, or "trick" those trying to answer the questions). (21% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2021_acl_leaderboard.pdf Denis Peskov, Benny Cheng, Ahmed Elgohary Ghoneim, Joe Barrow, Cristian Danescu-Niculescu-Mizil, and Jordan Boyd-Graber. It Takes Two to Lie: One to Lie and One to Listen. Association for Computational Linguistics, 2020. Accessible Abstract: Machine learning techniques to detect deception in online communications require training and evaluation data. However, there is a dearth of data either because of uncertain gold labels or privacy concerns; we create a new, large deception-centered dataset in the online game of Diplomacy. We gathered 17,289 messages from 12 games (each of which took over a month) involving 84 players, most of whom were unique users. This data was collected with a custom-made bot that allowed us to collect messages and annotations. The user pool was created from scratch: we varied participant demographics across gender, age, nationality, and past game experience. Our participants included the former president of the Diplomacy players' association, several top-ranked players in the world, a board game shop owner, and scientists. We created machine learning models to detect lies using linguistic, contextual, and power-dynamic features. 
Our best model detected lies about as accurately as humans. (25.4% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2020_acl_diplomacy.pdf Denis Peskov, Joe Barrow, Pedro Rodriguez, Graham Neubig, and Jordan Boyd-Graber. Mitigating Noisy Inputs for Question Answering. Conference of the International Speech Communication Association, 2019. http://umiacs.umd.edu/~jbg/docs/2019_interspeech_asr Jonathan Chang ------------------------- Viet-An Nguyen, Jordan Boyd-Graber, Philip Resnik, and Jonathan Chang. Learning a Concept Hierarchy from Multi-labeled Documents. Neural Information Processing Systems, 2014. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2014_nips_l2h.pdf Viet-An Nguyen, Jordan Boyd-Graber, Jonathan Chang, and Philip Resnik. Tree-Based Label Dependency Topic Models. NIPS Workshop on Topic Models: Computation, Application, and Evaluation, 2013. Jonathan Chang, Jordan Boyd-Graber, and David M. Blei. Connections between the Lines: Augmenting Social Networks with Text. Knowledge Discovery and Data Mining, 2009. (9% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/kdd2009.pdf Jonathan Chang, Jordan Boyd-Graber, Chong Wang, Sean Gerrish, and David M. Blei. Reading Tea Leaves: How Humans Interpret Topic Models. Neural Information Processing Systems, 2009. Jonathan Chang and I shared a NIPS student award honorable mention for this paper (5 out of 1105) Accessible Abstract: Topic models are a tool that historians and social scientists use to explore large text corpora. But how do you know if you have a good topic model? Before this paper, the consensus was to use held-out likelihood to evaluate if you had a good model. This paper argues that this does not fit how people actually use topic models and proposes new human-centered metrics for evaluating topic models. This method inspired a rethinking of model evaluation and showed that the complexity of a model does not always correspond to what a user might want. (24% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/nips2009-rtl.pdf Jonathan Chang, Jordan Boyd-Graber, and David M. Blei. Discovering social networks from free text. 3rd Annual Machine Learning Symposium, 2008. Jordan Lee Boyd-Graber ------------------------- Ishani Mondal, Shwetha S, Anandhavelu Natarajan, Aparna Garimella, Sambaran Bandyopadhyay, and Jordan Lee Boyd-Graber. Presentations by the People, for the People: Harnessing LLMs for Generating Persona-Aware Slides from Documents. European Association for Computational Linguistics, 2024. (21% Acceptance Rate) Zongxia Li, Andrew Mao, Daniel Kofi Stephens, Pranav Goel, Emily Walpole, Juan Francisco Fung, Alden Dima, and Jordan Lee Boyd-Graber. TENOR: Topic Enabled Neural Organization and Recommendation: Evaluating Topic Models in Task Based Settings. European Association for Computational Linguistics, 2024. (21% Acceptance Rate) Chenglei Si, Navita Goyal, Tongshuang Wu, Chen Zhao, Shi Feng, Hal Daume III, and Jordan Lee Boyd-Graber. Large Language Models Help Humans Verify Truthfulness---Except When They Are Convincingly Wrong. North American Association for Computational Linguistics, 2024. Sander V Schulhoff, Jeremy Pinto, Anaum Khan, Louis-François Bouchard, Chenglei Si, Jordan Lee Boyd-Graber, Svetlina Anati, Valen Tagliabue, Anson Liu Kost, and Christopher R Carnahan. Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs Through a Global Prompt Hacking Competition. Empirical Methods in Natural Language Processing, 2023. 
This paper was selected as the Best Theme Paper at EMNLP 2023 (1 of 4909) Accessible Abstract: As more AI services online are provided by prompted language models, we need to be aware of the weaknesses and exploits of the models. We present the HackAPrompt competition to help elicit a broad array of exploits that get around large language models. (23% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2023_emnlp_hackaprompt.pdf Chenglei Si, Weijia Shi, Chen Zhao, Luke Zettlemoyer, and Jordan Lee Boyd-Graber. Getting MoRE out of Mixture of Language Model Reasoning Experts. Findings of Empirical Methods in Natural Language Processing, 2023. Accessible Abstract: There are many ways for a computer to answer a question: a general knowledge question, a common sense question, or a math question. Each of these types of questions can be answered by a particular kind of expert. This paper investigates whether we can automatically detect what kind of expert is best suited to answer a question and route the question to the correct expert. (45% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2023_findings_more.pdf Ke Zhai ------------------------- Yuening Hu, Ke Zhai, Vlad Eidelman, and Jordan Boyd-Graber. Polylingual Tree-Based Topic Models for Translation Domain Adaptation. Association for Computational Linguistics, 2014. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2014_acl_ptlda_mt.pdf Ke Zhai, Jordan Boyd-Graber, and Shay B. Cohen. Hybrid Online Inference with Adaptor Grammars. NIPS Workshop on Advances in Variational Inference, 2014. Ke Zhai, Jordan Boyd-Graber, and Shay B. Cohen. Online Adaptor Grammars with Hybrid Inference. Transactions of the Association for Computational Linguistics, 2014. http://umiacs.umd.edu/~jbg/docs/2014_tacl_ag_vb_online.pdf Ke Zhai and Jordan Boyd-Graber. Online Topic Models with Infinite Vocabulary. International Conference on Machine Learning, 2013. (20% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2013_icml_infvoc.pdf Yuening Hu, Ke Zhai, Vlad Eidelman, and Jordan Boyd-Graber. Topic Models for Translation Domain Adaptation. NIPS Workshop on Topic Models: Computation, Application, and Evaluation, 2013. Ke Zhai, Jordan Boyd-Graber, Nima Asadi, and Mohamad (Jude) Alkhouja. Mr. LDA: A Flexible Large Scale Topic Modeling Package using Variational Inference in MapReduce. ACM International Conference on World Wide Web, 2012. (12% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2012_www_mrlda.pdf Yuening Hu, Ke Zhai, Sinead Williamson, and Jordan Boyd-Graber. Modeling Images using Transformed Indian Buffet Processes. International Conference on Machine Learning, 2012. (27% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/mtibp_icml_2012.pdf Ke Zhai and Jordan Boyd-Graber. Online Topic Model with Infinite Vocabulary. Mid-Atlantic Student Colloquium on Speech, Language, and Learning, 2012. Kenneth R. Fleischmann ------------------------- Clay Templeton, Kenneth R. Fleischmann, and Jordan Boyd-Graber. Simulating Audiences: Automating Analysis of Values, Attitudes, and Sentiment. IEEE International Conference on Social Computing, 2011. (10% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/simulating_audiences.pdf Clay Templeton, Kenneth R. Fleischmann, and Jordan Boyd-Graber. Comparing Values and Sentiment Using Mechanical Turk. iConference, 2011. http://umiacs.umd.edu/~jbg/docs/iconference-2011-comparing.pdf Kenneth R. Fleischmann, Clay Templeton, and Jordan Boyd-Graber. Modeling Diverse Standpoints in Text Classification: Learning to Be Human by Modeling Human Values.
iConference, 2011. http://umiacs.umd.edu/~jbg/docs/iconference-2011-learning.pdf Kevin Seppi ------------------------- Alison Smith, Varun Kumar, Jordan Boyd-Graber, Kevin Seppi, and Leah Findlater. Digging into User Control: Perceptions of Adherence and Instability in Transparent Models. Intelligent User Interfaces, 2020. (23% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2020_iui_control.pdf Jeffrey Lund, Piper Armstrong, Wilson Fearn, Stephen Cowley, Courtni Byun, Jordan Boyd-Graber, and Kevin Seppi. Automatic and Human Evaluation of Local Topic Quality. Association for Computational Linguistics, 2019. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2019_acl_local.pdf Varun Kumar, Alison Smith, Leah Findlater, Kevin Seppi, and Jordan Boyd-Graber. Why Didn't You Listen to Me? Comparing User Control of Human-in-the-Loop Topic Models. Association for Computational Linguistics, 2019. (18% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2019_acl_control.pdf Alison Smith, Varun Kumar, Jordan Boyd-Graber, Kevin Seppi, and Leah Findlater. User-Centered Design and Evaluation of a Human-in-the-Loop Topic Modeling System. Intelligent User Interfaces, 2018. Alison won a best student paper honorable mention (3 out of 300) (23% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2018_iui_itm.pdf Paul Felt, Eric Ringger, Kevin Seppi, and Jordan Boyd-Graber. Learning from Measurements in Crowdsourcing Models: Inferring Ground Truth from Diverse Annotation Types. International Conference on Computational Linguistics, 2018. (37% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2018_coling_measurements.pdf Jeff Lund, Connor Cook, Kevin Seppi, and Jordan Boyd-Graber. Tandem Anchoring: A Multiword Anchor Approach for Interactive Topic Modeling. Association for Computational Linguistics, 2017. (22% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2017_acl_multiword_anchors.pdf Alison Smith, Varun Kumar, Jordan Boyd-Graber, Kevin Seppi, and Leah Findlater. Accounting for Input Uncertainty in Human-in-the-Loop Systems. CHI 2017 Designing for Uncertainty Workshop, 2017. http://visualization.ischool.uw.edu/hci_uncertainty/papers/Paper11.pdf Tak Yeon Lee, Alison Smith, Kevin Seppi, Niklas Elmqvist, Jordan Boyd-Graber, and Leah Findlater. The Human Touch: How Non-expert Users Perceive, Interpret, and Fix Topic Models. International Journal of Human-Computer Studies, 2017. http://umiacs.umd.edu/~jbg/docs/2017_ijhcs_human_touch.pdf Alison Smith, Tak Yeon Lee, Forough Poursabzi-Sangdeh, Jordan Boyd-Graber, Kevin Seppi, Niklas Elmqvist, and Leah Findlater. Evaluating Visual Representations for Topic Understanding and Their Effects on Manually Generated Labels. Transactions of the Association for Computational Linguistics, 2017. http://umiacs.umd.edu/~jbg/docs/2017_tacl_eval_tm_viz.pdf Forough Poursabzi-Sangdeh, Jordan Boyd-Graber, Leah Findlater, and Kevin Seppi. ALTO: Active Learning with Topic Overviews for Speeding Label Induction and Document Labeling. Association for Computational Linguistics, 2016. (28% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2016_acl_doclabel.pdf Alison Smith, Tak Yeon Lee, Forough Poursabzi-Sangdeh, Jordan Boyd-Graber, Kevin Seppi, Niklas Elmqvist, and Leah Findlater. Human-Centered and Interactive: Expanding the Impact of Topic Models. CHI Human Centred Machine Learning Workshop, 2016. Paul Felt, Eric Ringger, Jordan Boyd-Graber, and Kevin Seppi. Making the Most of Crowdsourced Document Annotations: Confused Supervised LDA.
Conference on Computational Natural Language Learning, 2015. This paper received the best paper award at CoNLL (30% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2015_conll_cslda.pdf Thang Nguyen, Jordan Boyd-Graber, Jeff Lund, Kevin Seppi, and Eric Ringger. Is your anchor going up or down? Fast and accurate supervised topic models. North American Association for Computational Linguistics, 2015. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2015_naacl_supervised_anchor.pdf Kimberly Glasgow ------------------------- Kimberly Glasgow, Clay Fink, and Jordan Boyd-Graber. Our grief is unspeakable: Measuring the community impact of a tragedy. The International AAAI Conference on Weblogs and Social Media, 2014. (20% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2014_icwsm_grief.pdf Jordan Boyd-Graber, Kimberly Glasgow, and Jackie Sauter Zajac. Spoiler Alert: Machine Learning Approaches to Detect Social Media Posts with Revelatory Information. ASIST 2013: The 76th Annual Meeting of the American Society for Information Science and Technology, 2013. http://umiacs.umd.edu/~jbg/docs/2013_spoiler.pdf Larry Davis ------------------------- Mohit Iyyer, Varun Manjunatha, Jordan Boyd-Graber, and Larry Davis. Learning to Color from Language. North American Association for Computational Linguistics, 2018. (29% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2018_naacl_colorization.pdf Mohit Iyyer, Varun Manjunatha, Anupam Guha, Yogarshi Vyas, Jordan Boyd-Graber, Hal Daume III, and Larry Davis. The Amazing Mysteries of the Gutter: Drawing Inferences Between Panels in Comic Book Narratives. Computer Vision and Pattern Recognition, 2017. (30% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2017_cvpr_comics.pdf Leah Findlater ------------------------- Alison Smith, Jordan Boyd-Graber, Ron Fan, Melissa Birchfield, Tongshuang Wu, Dan Weld, and Leah Findlater. No Explainability without Accountability: An Empirical Study of Explanations and Feedback in Interactive ML. Computer-Human Interaction, 2020. (24% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2020_chi_explanation.pdf Michelle Yuan, Mozhi Zhang, Benjamin Van Durme, Leah Findlater, and Jordan Boyd-Graber. Interactive Refinement of Cross-Lingual Word Embeddings. Empirical Methods in Natural Language Processing, 2020. Accessible Abstract: Language technologies sometimes need to be quickly deployed in low-resource languages. For example, in the 2010 Haiti earthquake, researchers used machine learning models to analyze social media and text messages to gain situational awareness. We introduce CLIME, an interactive system that can help in these scenarios: users see which task-related words the system thinks are similar and correct the model, pushing similar words together and dissimilar words apart. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2020_emnlp_clime.pdf Alison Smith, Varun Kumar, Jordan Boyd-Graber, Kevin Seppi, and Leah Findlater. Digging into User Control: Perceptions of Adherence and Instability in Transparent Models. Intelligent User Interfaces, 2020. (23% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2020_iui_control.pdf Fenfei Guo, Jordan Boyd-Graber, Mohit Iyyer, and Leah Findlater. Which Evaluations Uncover Sense Representations that Actually Make Sense? Linguistic Resources and Evaluation Conference, 2020. http://umiacs.umd.edu/~jbg/docs/2020_lrec_sense.pdf Varun Kumar, Alison Smith, Leah Findlater, Kevin Seppi, and Jordan Boyd-Graber. Why Didn't You Listen to Me?
Comparing User Control of Human-in-the-Loop Topic Models. Association for Computational Linguistics, 2019. (18% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2019_acl_control.pdf Alison Smith, Varun Kumar, Jordan Boyd-Graber, Kevin Seppi, and Leah Findlater. User-Centered Design and Evaluation of a Human-in-the-Loop Topic Modeling System. Intelligent User Interfaces, 2018. Alison won a best student paper honorable mention (3 out of 300) (23% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2018_iui_itm.pdf Alison Smith, Varun Kumar, Jordan Boyd-Graber, Kevin Seppi, and Leah Findlater. Accounting for Input Uncertainty in Human-in-the-Loop Systems. CHI 2017 Designing for Uncertainty Workshop, 2017. http://visualization.ischool.uw.edu/hci_uncertainty/papers/Paper11.pdf Tak Yeon Lee, Alison Smith, Kevin Seppi, Niklas Elmqvist, Jordan Boyd-Graber, and Leah Findlater. The Human Touch: How Non-expert Users Perceive, Interpret, and Fix Topic Models. International Journal of Human-Computer Studies, 2017. http://umiacs.umd.edu/~jbg/docs/2017_ijhcs_human_touch.pdf Alison Smith, Tak Yeon Lee, Forough Poursabzi-Sangdeh, Jordan Boyd-Graber, Kevin Seppi, Niklas Elmqvist, and Leah Findlater. Evaluating Visual Representations for Topic Understanding and Their Effects on Manually Generated Labels. Transactions of the Association for Computational Linguistics, 2017. http://umiacs.umd.edu/~jbg/docs/2017_tacl_eval_tm_viz.pdf Forough Poursabzi-Sangdeh, Jordan Boyd-Graber, Leah Findlater, and Kevin Seppi. ALTO: Active Learning with Topic Overviews for Speeding Label Induction and Document Labeling. Association for Computational Linguistics, 2016. (28% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2016_acl_doclabel.pdf Alison Smith, Tak Yeon Lee, Forough Poursabzi-Sangdeh, Jordan Boyd-Graber, Kevin Seppi, Niklas Elmqvist, and Leah Findlater. Human-Centered and Interactive: Expanding the Impact of Topic Models. CHI Human Centred Machine Learning Workshop, 2016. Alison Smith, Jason Chuang, Yuening Hu, Jordan Boyd-Graber, and Leah Findlater. Concurrent Visualization of Relationships between Words and Topics in Topic Models. ACL Workshop on Interactive Language Learning, Visualization, and Interfaces, 2014. Jason Chuang, John D. Wilkerson, Rebecca Weiss, Dustin Tingley, Brandon M. Stewart, Margaret E. Roberts, Forough Poursabzi-Sangdeh, Justin Grimmer, Leah Findlater, Jordan Boyd-Graber, and Jeffrey Heer. Computer-Assisted Content Analysis: Topic Models for Exploring Multiple Subjective Interpretations. NIPS Workshop on Human-Propelled Machine Learning, 2014. Leonardo Claudino ------------------------- Philip Resnik, William Armstrong, Leonardo Claudino, Thang Nguyen, Viet-An Nguyen, and Jordan Boyd-Graber. Beyond LDA: Exploring Supervised Topic Modeling for Depression-Related Language in Twitter. NAACL Workshop on Cognitive Modeling and Computational Linguistics, 2015. Mohit Iyyer, Jordan Boyd-Graber, Leonardo Claudino, Richard Socher, and Hal Daume III. A Neural Network for Factoid Question Answering over Paragraphs. Empirical Methods in Natural Language Processing, 2014. The partial derivatives of "C" and "J" with respect to the parameters should be switched in Equation 7. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2014_emnlp_qb_rnn.pdf Heran Y. Mane ------------------------- Neha Pundlik Srikanth, Rupak Sarkar, Heran Y. Mane, Elizabeth M. Aparicio, Quynh C. Nguyen, Rachel Rudinger, and Jordan Boyd-Graber.
Large Language Models Help Humans Verify Truthfulness---Except When They Are Convincingly Wrong. North American Association for Computational Linguistics, 2024. Heran Y. Mane, Amara Channell Doig, Francia Ximena Marin Gutierrez, Michelle Jasczynski, Xiaohe Yue, Neha Pundlik Srikanth, Sourabh Mane, Abby Sun, Rachel Ann Moats, Pragat Patel, Xin He, Jordan Lee Boyd-Graber, Elizabeth M. Aparicio, and Quynh C. Nguyen. Practical Guidance for the Development of Rosie, a Health Education Question-and-Answer Chatbot for New Mothers. Journal of Public Health Management and Practice, 2023. https://journals.lww.com/jphmp/fulltext/2023/09000/practical_guidance_for_the_development_of_rosie,_a.9.aspx Marine Carpuat ------------------------- HyoJung Han, Marine Carpuat, and Jordan Boyd-Graber. Automatic Explicitation to Bridge the Background Knowledge Gap in Translation and its Evaluation with Multilingual QA. Empirical Methods in Natural Language Processing, 2023. Accessible Abstract: Sometimes when you are translating from one language to another, a literal translation is not enough. Sometimes to actually understand what is being said, you need additional context. Professional translators know this, and the process that they use to help a listener is called "explicitation": capturing cultural differences between source and target audiences. We introduce techniques for automatically generating explicitations, motivated by WikiExpl (a dataset collected from Wikipedia and annotated by human translators), and evaluate the explicitations with multilingual QA. (23% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2023_emnlp_explicitation.pdf HyoJung Han, Marine Carpuat, and Jordan Boyd-Graber. SimQA: Detecting Simultaneous MT Errors through Word-by-Word Question Answering. Empirical Methods in Natural Language Processing, 2022. Accessible Abstract: Simultaneous interpretation (where a translation happens word by word before the source sentence is finished) is difficult to evaluate. We created a new evaluation framework based on the following scenario: imagine that you're thrown into a trivia gameshow where you don't know the language. Specifically, it's a game format where you interrupt the question word by word as soon as possible. Our hypothesis is that a monolingual player (who doesn't speak the source language) will be able to do better in the game with a better simultaneous translation system. In this 2022 EMNLP publication, we show that this evaluation is not only cheaper (you just need to translate the answer) but can also detect hallucinations and undertranslations better than existing evaluation methods. http://umiacs.umd.edu/~jbg/docs/2022_emnlp_simqa.pdf Massimiliano Ciaramita ------------------------- Benjamin Börschinger, Jordan Boyd-Graber, Christian Buck, Jannis Bulian, Massimiliano Ciaramita, Michelle Chen Huebscher, Wojciech Gajewski, Yannic Kilcher, Rodrigo Nogueira, and Lierni Sestorain Saralegu. Meta Answering for Machine Reading. ArXiv, Preprint. https://arxiv.org/abs/1911.04156 Francesco Saverio Varini, Jordan Boyd-Graber, Massimiliano Ciaramita, and Markus Leippold. ClimaText: A Dataset for Climate Change Topic Detection. NeurIPS Workshop on Tackling Climate Change with Machine Learning, 2020. Michael J. Paul ------------------------- Mozhi Zhang, Yoshinari Fujinuma, Michael J. Paul, and Jordan Boyd-Graber. Why Overfitting Isn't Always Bad: Retrofitting Cross-Lingual Word Embeddings to Dictionaries.
Association for Computational Linguistics, 2020. Accessible Abstract: Computers need to represent words in a computer-readable way. This work shows that slightly moving the representations of words in different languages to be closer to a small list of translations (like those from a dictionary) after doing the fancy machine learning works better on downstream tasks (e.g., guessing the grammatical category of a word) but hurts when asking the algorithm for translations of unseen words. (17.6% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2020_acl_refine.pdf Shudong Hao, Michael J. Paul, and Jordan Boyd-Graber. Lessons from the Bible on Modern Topics: Multilingual Topic Model Evaluation on Low-Resource Languages. North American Association for Computational Linguistics, 2018. (35% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2018_naacl_mltm_eval.pdf Michael Paul ------------------------- Yoshinari Fujinuma, Michael Paul, and Jordan Boyd-Graber. A Resource-Free Evaluation Metric for Cross-Lingual Word Embeddings Based on Graph Modularity. Association for Computational Linguistics, 2019. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2019_acl_modularity.pdf Dasha Pruss, Yoshinari Fujinuma, Ashlynn Daughton, Michael Paul, Brad Arnot, Danielle Szafir, and Jordan Boyd-Graber. Zika discourse in the Americas: A multilingual topic analysis of Twitter. PlosOne, 2019. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0216922 Michelle Yuan ------------------------- Michelle Yuan, Patrick Xia, Chandler May, Benjamin Van Durme, and Jordan Boyd-Graber. Adapting Coreference Resolution Models through Active Learning. Association for Computational Linguistics, 2022. (21% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2022_acl_alcoref.pdf Michelle Yuan, Hsuan-Tien Lin, and Jordan Boyd-Graber. Cold-start Active Learning through Self-Supervised Language Modeling. Empirical Methods in Natural Language Processing, 2020. Accessible Abstract: Labeling data is a fundamental bottleneck in machine learning, especially for NLP, due to annotation cost and time. For medical text, obtaining labeled data is challenging because of privacy issues or a shortage of expertise. Thus, active learning can be employed to recognize the most relevant examples and then query labels from an oracle. However, developing a strategy for selecting examples to label is non-trivial. Active learning is difficult to use in cold-start settings: all examples confuse the model because it has not been trained on enough data. Fortunately, modern NLP provides an additional source of information: pre-trained language models. In our paper, we propose an active learning strategy called ALPS to find sentences that perplex the language model. We evaluate our approach on sentence classification datasets spanning different domains. Results show that ALPS is an efficient active learning strategy that is competitive with state-of-the-art approaches. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2020_emnlp_alps.pdf
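The cold-start strategy described above lends itself to a compact illustration. The snippet below is a minimal sketch, not the authors' released code: it scores unlabeled sentences by how much they perplex a pretrained masked language model and requests labels for the most surprising ones. The HuggingFace transformers and PyTorch dependencies, the bert-base-uncased checkpoint, and the 15% masking rate are illustrative assumptions, and the full ALPS method clusters these surprisal signals rather than taking a simple top-k.

```python
# Minimal sketch of surprisal-based cold-start selection (illustrative
# assumptions throughout; this is not the released ALPS implementation).
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

def surprisal(sentence: str, mask_prob: float = 0.15) -> float:
    """Average masked-LM loss on a randomly masked copy of the sentence."""
    inputs = tokenizer(sentence, return_tensors="pt")
    labels = inputs["input_ids"].clone()
    # Mask a random subset of non-special tokens, as in BERT pretraining.
    special = torch.tensor(
        tokenizer.get_special_tokens_mask(
            labels[0].tolist(), already_has_special_tokens=True
        ),
        dtype=torch.bool,
    )
    masked = (torch.rand(labels.shape) < mask_prob) & ~special
    if not masked.any():
        masked[0, 1] = True  # ensure at least one token is scored
    labels[~masked] = -100   # only masked positions contribute to the loss
    inputs["input_ids"] = inputs["input_ids"].masked_fill(
        masked, tokenizer.mask_token_id
    )
    with torch.no_grad():
        return model(**inputs, labels=labels).loss.item()

pool = ["The patient presented with acute dyspnea.", "I like cats."]
# Ask the oracle to label the sentences the language model finds hardest.
print(sorted(pool, key=surprisal, reverse=True))
```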
Michelle Yuan, Mozhi Zhang, Benjamin Van Durme, Leah Findlater, and Jordan Boyd-Graber. Interactive Refinement of Cross-Lingual Word Embeddings. Empirical Methods in Natural Language Processing, 2020. Accessible Abstract: Language technologies sometimes need to be quickly deployed in low-resource languages. For example, in the 2010 Haiti earthquake, researchers used machine learning models to analyze social media and text messages to gain situational awareness. We introduce CLIME, an interactive system that can help in these scenarios: users see which task-related words the system thinks are similar and correct the model, pushing similar words together and dissimilar words apart. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2020_emnlp_clime.pdf
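The push-and-pull feedback this abstract describes can be sketched in a few lines. Everything in the toy snippet below (the vocabulary, the random vectors, the learning rate, and the update rule itself) is an illustrative assumption rather than the released CLIME system: each piece of user feedback nudges a pair of embeddings toward or away from each other.

```python
# Toy sketch of embedding refinement from user feedback (invented for
# illustration; not the CLIME system itself).
import numpy as np

rng = np.random.default_rng(0)
emb = {w: rng.standard_normal(50) for w in ["flood", "deluge", "market"]}

def refine(a: str, b: str, similar: bool, lr: float = 0.1) -> None:
    """Nudge emb[a] and emb[b] toward (similar) or away from (dissimilar) each other."""
    direction = emb[b] - emb[a]
    sign = 1.0 if similar else -1.0
    emb[a] = emb[a] + sign * lr * direction  # pull together or push apart
    emb[b] = emb[b] - sign * lr * direction

refine("flood", "deluge", similar=True)    # user: these words match the task
refine("flood", "market", similar=False)   # user: these words do not
```

Repeated feedback of this kind gradually reshapes the neighborhoods the user sees, which is the behavior the abstract describes.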
Michelle Yuan, Benjamin Van Durme, and Jordan Boyd-Graber. Multilingual Anchoring: Interactive Topic Modeling and Alignment Across Languages. Neural Information Processing Systems, 2018. (21% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2018_neurips_mtanchor.pdf Mohit Iyyer ------------------------- Pedro Rodriguez, Shi Feng, Mohit Iyyer, He He, and Jordan Boyd-Graber. Quizbowl: The Case for Incremental Question Answering. ArXiv, Preprint. https://arxiv.org/abs/1904.04792 Fenfei Guo, Jordan Boyd-Graber, Mohit Iyyer, and Leah Findlater. Which Evaluations Uncover Sense Representations that Actually Make Sense? Linguistic Resources and Evaluation Conference, 2020. http://umiacs.umd.edu/~jbg/docs/2020_lrec_sense.pdf Shi Feng, Eric Wallace, Alvin Grissom II, Pedro Rodriguez, Mohit Iyyer, and Jordan Boyd-Graber. Pathologies of Neural Models Make Interpretation Difficult. Empirical Methods in Natural Language Processing, 2018. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2018_emnlp_rs.pdf Mohit Iyyer, Varun Manjunatha, Jordan Boyd-Graber, and Larry Davis. Learning to Color from Language. North American Association for Computational Linguistics, 2018. (29% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2018_naacl_colorization.pdf Mohit Iyyer, Varun Manjunatha, Anupam Guha, Yogarshi Vyas, Jordan Boyd-Graber, Hal Daume III, and Larry Davis. The Amazing Mysteries of the Gutter: Drawing Inferences Between Panels in Comic Book Narratives. Computer Vision and Pattern Recognition, 2017. (30% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2017_cvpr_comics.pdf Anupam Guha, Mohit Iyyer, and Jordan Boyd-Graber. A Distorted Skull Lies in the Bottom Center: Identifying Paintings from Text Descriptions. NAACL Human-Computer Question Answering Workshop, 2016. http://umiacs.umd.edu/~jbg/docs/2016_naacl_paintings.pdf Mohit Iyyer, Anupam Guha, Snigdha Chaturvedi, Jordan Boyd-Graber, and Hal Daume III. Feuding Families and Former Friends: Unsupervised Learning for Dynamic Fictional Relationships. North American Association for Computational Linguistics, 2016. Best paper award (2 out of 1592) (24% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2016_naacl_relationships.pdf Mohit Iyyer, Varun Manjunatha, Jordan Boyd-Graber, and Hal Daume III. Deep Unordered Composition Rivals Syntactic Methods for Text Classification. Association for Computational Linguistics, 2015. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2015_acl_dan.pdf Jordan Boyd-Graber, Mohit Iyyer, He He, and Hal Daume III. Interactive Incremental Question Answering. Neural Information Processing Systems, 2015. This won the best demonstration award at NIPS 2015. Anupam Guha, Mohit Iyyer, Danny Bouman, and Jordan Boyd-Graber. Removing the Training Wheels: A Coreference Dataset that Entertains Humans and Challenges Computers. North American Association for Computational Linguistics, 2015. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2015_naacl_qb_coref.pdf Mohit Iyyer, Peter Enns, Jordan Boyd-Graber, and Philip Resnik. Political Ideology Detection Using Recursive Neural Networks. Association for Computational Linguistics, 2014. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2014_acl_rnn_ideology.pdf Mohit Iyyer, Jordan Boyd-Graber, Leonardo Claudino, Richard Socher, and Hal Daume III. A Neural Network for Factoid Question Answering over Paragraphs. Empirical Methods in Natural Language Processing, 2014. The partial derivatives of "C" and "J" with respect to the parameters should be switched in Equation 7. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2014_emnlp_qb_rnn.pdf Mohit Iyyer, Jordan Boyd-Graber, and Hal Daume III. Generating Sentences from Semantic Vector Space Representations. NIPS Workshop on Learning Semantics, 2014. Mozhi Zhang ------------------------- Mozhi Zhang, Yoshinari Fujinuma, Michael J. Paul, and Jordan Boyd-Graber. Why Overfitting Isn't Always Bad: Retrofitting Cross-Lingual Word Embeddings to Dictionaries. Association for Computational Linguistics, 2020. Accessible Abstract: Computers need to represent words in a computer-readable way. This work shows that slightly moving the representations of words in different languages to be closer to a small list of translations (like those from a dictionary) after doing the fancy machine learning works better on downstream tasks (e.g., guessing the grammatical category of a word) but hurts when asking the algorithm for translations of unseen words. (17.6% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2020_acl_refine.pdf Mozhi Zhang, Yoshinari Fujinuma, and Jordan Boyd-Graber. Exploiting Cross-Lingual Subword Similarities in Low-Resource Document Classification. Association for the Advancement of Artificial Intelligence, 2020. (20.6% Acceptance Rate) https://arxiv.org/abs/1812.09617 Michelle Yuan, Mozhi Zhang, Benjamin Van Durme, Leah Findlater, and Jordan Boyd-Graber. Interactive Refinement of Cross-Lingual Word Embeddings. Empirical Methods in Natural Language Processing, 2020. Accessible Abstract: Language technologies sometimes need to be quickly deployed in low-resource languages. For example, in the 2010 Haiti earthquake, researchers used machine learning models to analyze social media and text messages to gain situational awareness. We introduce CLIME, an interactive system that can help in these scenarios: users see which task-related words the system thinks are similar and correct the model, pushing similar words together and dissimilar words apart. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2020_emnlp_clime.pdf Mozhi Zhang, Keyulu Xu, Ken-ichi Kawarabayashi, Stefanie Jegelka, and Jordan Boyd-Graber. Are Girls Neko or Shōjo? Cross-Lingual Alignment of Non-Isomorphic Embeddings with Iterative Normalization. Association for Computational Linguistics, 2019. (18.3% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2019_acl_clwe.pdf Mozhi Zhang, Yoshinari Fujinuma, and Jordan Boyd-Graber. Exploiting Cross-Lingual Subword Similarities in Low-Resource Document Classification. ACL Workshop on Deep Learning Approaches for Low-Resource Natural Language Processing, 2018. Naho Orita ------------------------- Alvin Grissom II, Naho Orita, and Jordan Boyd-Graber. Incremental Prediction of Sentence-final Verbs. Conference on Computational Natural Language Learning, 2016. (20% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2016_conll_verbpred.pdf Naho Orita, Naomi Feldman, and Jordan Boyd-Graber. Quantifying the role of discourse topicality in speakers' choices of referring expressions. ACL Workshop on Cognitive Modeling and Computational Linguistics, 2014. Naho Orita, Rebecca McKeown, Naomi H.
Feldman, Jeffrey Lidz, and Jordan Boyd-Graber. Discovering Pronoun Categories using Discourse Information. Proceedings of the Cognitive Science Society, 2013. http://umiacs.umd.edu/~jbg/docs/2013_cogsci_pronoun.pdf Quynh C. Nguyen ------------------------- Neha Pundlik Srikanth, Rupak Sarkar, Heran Y. Mane, Elizabeth M. Aparicio, Quynh C. Nguyen, Rachel Rudinger, and Jordan Boyd-Graber. Large Language Models Help Humans Verify Truthfulness---Except When They Are Convincingly Wrong. North American Association for Computational Linguistics, 2024. Heran Y. Mane, Amara Channell Doig, Francia Ximena Marin Gutierrez, Michelle Jasczynski, Xiaohe Yue, Neha Pundlik Srikanth, Sourabh Mane, Abby Sun, Rachel Ann Moats, Pragat Patel, Xin He, Jordan Lee Boyd-Graber, Elizabeth M. Aparicio, and Quynh C. Nguyen. Practical Guidance for the Development of Rosie, a Health Education Question-and-Answer Chatbot for New Mothers. Journal of Public Health Management and Practice, 2023. https://journals.lww.com/jphmp/fulltext/2023/09000/practical_guidance_for_the_development_of_rosie,_a.9.aspx Niklas Elmqvist ------------------------- Tak Yeon Lee, Alison Smith, Kevin Seppi, Niklas Elmqvist, Jordan Boyd-Graber, and Leah Findlater. The Human Touch: How Non-expert Users Perceive, Interpret, and Fix Topic Models. International Journal of Human-Computer Studies, 2017. http://umiacs.umd.edu/~jbg/docs/2017_ijhcs_human_touch.pdf Alison Smith, Tak Yeon Lee, Forough Poursabzi-Sangdeh, Jordan Boyd-Graber, Kevin Seppi, Niklas Elmqvist, and Leah Findlater. Evaluating Visual Representations for Topic Understanding and Their Effects on Manually Generated Labels. Transactions of the Association for Computational Linguistics, 2017. http://umiacs.umd.edu/~jbg/docs/2017_tacl_eval_tm_viz.pdf Alison Smith, Tak Yeon Lee, Forough Poursabzi-Sangdeh, Jordan Boyd-Graber, Kevin Seppi, Niklas Elmqvist, and Leah Findlater. Human-Centered and Interactive: Expanding the Impact of Topic Models. CHI Human Centred Machine Learning Workshop, 2016. Paul Felt ------------------------- Paul Felt, Eric Ringger, Kevin Seppi, and Jordan Boyd-Graber. Learning from Measurements in Crowdsourcing Models: Inferring Ground Truth from Diverse Annotation Types. International Conference on Computational Linguistics, 2018. (37% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2018_coling_measurements.pdf Paul Felt, Eric Ringger, Jordan Boyd-Graber, and Kevin Seppi. Making the Most of Crowdsourced Document Annotations: Confused Supervised LDA. Conference on Computational Natural Language Learning, 2015. This paper received the best paper award at CoNLL (30% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2015_conll_cslda.pdf Pedro Rodriguez ------------------------- Pedro Rodriguez, Shi Feng, Mohit Iyyer, He He, and Jordan Boyd-Graber. Quizbowl: The Case for Incremental Question Answering. ArXiv, Preprint. https://arxiv.org/abs/1904.04792 Pedro Rodriguez, Joe Barrow, Alexander Hoyle, John P. Lalor, Robin Jia, and Jordan Boyd-Graber. Evaluation Examples Are Not Equally Informative: How Should That Change NLP Leaderboards? Association for Computational Linguistics, 2021. Accessible Abstract: When can we call an AI "intelligent"? Just as with humans, a common approach is to ask it a bunch of questions.
These questions posed to modern machine learning methods are collected in metrics called leaderboards to monitor progress, but beyond ranking approaches, this does not help us better understand our problems or our systems very well. This paper introduces probabilistic models inspired by psychometric approaches called item response theory models (think year-end standardized tests) to better understand how computers can answer questions and whether we are asking the right questions. This allows researchers to better compare what kinds of questions systems can answer, better compare human and machine ability, and discover problematic questions (e.g., questions that have incorrect answer keys, are vague, or "trick" those trying to answer the questions). (21% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2021_acl_leaderboard.pdf Pedro Rodriguez and Jordan Boyd-Graber. Evaluation Paradigms in Question Answering. Empirical Methods in Natural Language Processing, 2021. Accessible Abstract: Why do we answer questions? Sometimes it's to provide information, which has been the interpretation of the computer science community. But sometimes it's to probe or test intelligence. This paper argues we should think more about that application of question answering and its connection to the foundations of artificial intelligence: The Turing Test. Thus, in addition to the long-standing Cranfield paradigm popularized by information retrieval, we propose an alternative "Manchester paradigm" closer to the Turing test, trivia games, and education. (18% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2021_emnlp_paradigms.pdf Denis Peskov, Joe Barrow, Pedro Rodriguez, Graham Neubig, and Jordan Boyd-Graber. Mitigating Noisy Inputs for Question Answering. Conference of the International Speech Communication Association, 2019. http://umiacs.umd.edu/~jbg/docs/2019_interspeech_asr Eric Wallace, Pedro Rodriguez, Shi Feng, Ikuya Yamada, and Jordan Boyd-Graber. Trick Me If You Can: Human-in-the-loop Generation of Adversarial Question Answering Examples. Transactions of the Association for Computational Linguistics, 2019. http://umiacs.umd.edu/~jbg/docs/2019_tacl_trick.pdf Shi Feng, Eric Wallace, Alvin Grissom II, Pedro Rodriguez, Mohit Iyyer, and Jordan Boyd-Graber. Pathologies of Neural Models Make Interpretation Difficult. Empirical Methods in Natural Language Processing, 2018. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2018_emnlp_rs.pdf Jordan Boyd-Graber, Shi Feng, and Pedro Rodriguez. Human-Computer Question Answering: The Case for Quizbowl. The NIPS '17 Competition: Building Intelligent Systems, 2018. http://umiacs.umd.edu/~jbg/docs/2018_nips_qbcomp.pdf Perry Cook ------------------------- Sonya S. Nikolova, Jordan Boyd-Graber, Christiane Fellbaum, and Perry Cook. Better Vocabularies for Assistive Communication Aids: Connecting Terms using Semantic Networks and Untrained Annotators. ACM Conference on Computers and Accessibility, 2009. (31% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/evocation-viva.pdf Xiaojuan Ma, Jordan Boyd-Graber, Sonya S. Nikolova, and Perry Cook. Speaking Through Pictures: Images vs. Icons. ACM Conference on Computers and Accessibility, 2009. (31% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/image_icon.pdf Sonya S. Nikolova, Jordan Boyd-Graber, and Perry Cook. The Design of ViVA: A Mixed-initiative Visual Vocabulary for Aphasia. Proceedings of the 27th international conference extended abstracts on Human factors in computing systems, 2009.
http://umiacs.umd.edu/~jbg/docs/viva.pdf Philip Resnik ------------------------- Alexander Hoyle, Pranav Goel, Denis Peskov, Andrew Hian-Cheong, Jordan Boyd-Graber, and Philip Resnik. Is Automated Topic Model Evaluation Broken?: The Incoherence of Coherence. Neural Information Processing Systems, 2021. Accessible Abstract: Topic models help historians, journalists, and analysts make sense of large text collections. But how do you know if you have a good one? The field has settled on using "Automatic Coherence", but this paper argues that maybe that isn't the right choice if you want to actually make real users happy. This paper builds on our 2009 paper that showed perplexity was not a good evaluation of interpretability for topic models; while the field adopted automatic topic coherence as a result of that 2009 paper, this paper argues that automatic topic coherence is not a good metric for neural topic models (even though it worked for probabilistic topic models). (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2021_neurips_incoherence.pdf Weiwei Yang, Jordan Boyd-Graber, and Philip Resnik. A Multilingual Topic Model for Learning Weighted Topic Links Across Incomparable Corpora. Empirical Methods in Natural Language Processing, 2019. http://umiacs.umd.edu/~jbg/docs/2019_emnlp_mtm.pdf Weiwei Yang, Jordan Boyd-Graber, and Philip Resnik. Adapting Topic Models using Lexical Associations with Tree Priors. Empirical Methods in Natural Language Processing, 2017. (18% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2017_emnlp_tree_prior.pdf Weiwei Yang, Jordan Boyd-Graber, and Philip Resnik. A Discriminative Topic Model using Document Network Structure. Association for Computational Linguistics, 2016. (28% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2016_acl_docblock.pdf Hadi Amiri, Philip Resnik, Jordan Boyd-Graber, and Hal Daume III. Learning Text Pair Similarity with Context-sensitive Autoencoders. Association for Computational Linguistics, 2016. (28% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2016_acl_context_ae.pdf Weiwei Yang, Jordan Boyd-Graber, and Philip Resnik. Birds of a Feather in the Same Nest: A Discriminative Topic Model using Block-based Priors. Mid-Atlantic Student Colloquium on Speech, Language, and Learning, 2016. Viet-An Nguyen, Jordan Boyd-Graber, Philip Resnik, and Kristina Miler. Tea Party in the House: A Hierarchical Ideal Point Topic Model and Its Application to Republican Legislators in the 112th Congress. Association for Computational Linguistics, 2015. Accessible Abstract: In the mid 2010s, the Republican party in the United States diverged: mainstream conservatives split from the so-called "tea party" caucus. However, ideal point models, the primary statistical tool for analyzing political factions in legislative bodies, fail to account for these changes. This is because the schism is not fully reflected in voting patterns but rather in how politicians present themselves: thus we need to extend these models to capture not just how politicians vote but also how they frame particular issues. This paper proposes a new model to capture framing differences within a voting bloc to start explaining the new subcoalitions of the Republican caucus. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2015_acl_teaparty.pdf
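For context on the ideal point models this abstract says the paper extends: the classical one-dimensional formulation places legislator i at a point x_i on a left-right spectrum and predicts a yes vote on bill j from bill-specific parameters a_j and b_j. The block below is the textbook form, shown only for background; the paper's hierarchical ideal point topic model layers topic- and framing-specific structure on top of something like this:

```latex
% Classical one-dimensional ideal point model (textbook background;
% the paper's hierarchical ideal point topic model extends this).
P(v_{ij} = \text{yes} \mid x_i, a_j, b_j) = \sigma(a_j x_i + b_j),
\qquad
\sigma(z) = \frac{1}{1 + e^{-z}}
```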
Weiwei Yang, Jordan Boyd-Graber, and Philip Resnik. Birds of a Feather Linked Together: A Discriminative Topic Model using Link-based Priors. Empirical Methods in Natural Language Processing, 2015. (28% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2015_emnlp_hinge_link.pdf Philip Resnik, William Armstrong, Leonardo Claudino, Thang Nguyen, Viet-An Nguyen, and Jordan Boyd-Graber. Beyond LDA: Exploring Supervised Topic Modeling for Depression-Related Language in Twitter. NAACL Workshop on Cognitive Modeling and Computational Linguistics, 2015. Mohit Iyyer, Peter Enns, Jordan Boyd-Graber, and Philip Resnik. Political Ideology Detection Using Recursive Neural Networks. Association for Computational Linguistics, 2014. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2014_acl_rnn_ideology.pdf Viet-An Nguyen, Jordan Boyd-Graber, and Philip Resnik. Sometimes Average is Best: The Importance of Averaging for Prediction using MCMC Inference in Topic Modeling. Empirical Methods in Natural Language Processing, 2014. (30% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2014_emnlp_howto_gibbs.pdf Viet-An Nguyen, Jordan Boyd-Graber, Philip Resnik, Deborah Cai, Jennifer Midberry, and Yuanxin Wang. Modeling Topic Control to Detect Influence in Conversations using Nonparametric Topic Models. Machine Learning, 2014. http://umiacs.umd.edu/~jbg/docs/2014_mlj_influencer.pdf Viet-An Nguyen, Jordan Boyd-Graber, Philip Resnik, and Jonathan Chang. Learning a Concept Hierarchy from Multi-labeled Documents. Neural Information Processing Systems, 2014. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2014_nips_l2h.pdf Viet-An Nguyen, Jordan Boyd-Graber, Jonathan Chang, and Philip Resnik. Tree-Based Label Dependency Topic Models. NIPS Workshop on Topic Models: Computation, Application, and Evaluation, 2013. Viet-An Nguyen, Jordan Boyd-Graber, and Philip Resnik. Lexical and Hierarchical Topic Regression. Neural Information Processing Systems, 2013. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2013_shlda.pdf Viet-An Nguyen, Yuening Hu, Jordan Boyd-Graber, and Philip Resnik. Argviz: Interactive Visualization of Topic Dynamics in Multi-party Conversations. North American Association for Computational Linguistics, 2013. (50% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2013_argviz.pdf Vladimir Eidelman, Jordan Boyd-Graber, and Philip Resnik. Topic Models for Dynamic Translation Model Adaptation. Association for Computational Linguistics, 2012. For a more thorough evaluation and an exploration of more advanced topic models for machine translation, see: Yuening Hu, Ke Zhai, Vlad Eidelman, and Jordan Boyd-Graber. Polylingual Tree-Based Topic Models for Translation Domain Adaptation. Association for Computational Linguistics, 2014. (21% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/acl_2012_tm_for_mt.pdf Viet-An Nguyen, Jordan Boyd-Graber, and Philip Resnik. SITS: A Hierarchical Nonparametric Model using Speaker Identity for Topic Segmentation in Multiparty Conversations. Association for Computational Linguistics, 2012. (19% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/acl_2012_sits.pdf Viet-An Nguyen, Jordan Boyd-Graber, and Philip Resnik. "I Want to Talk About, Again, My Record On Energy…": Modeling Topic Control in Conversations using Speaker-centric Nonparametric Topic Models. Mid-Atlantic Student Colloquium on Speech, Language, and Learning, 2012. Pranav Anand, Joseph King, Jordan Boyd-Graber, Earl Wagner, Craig Martell, Douglas W. Oard, and Philip Resnik. Believe Me: We Can Do This! The AAAI 2011 workshop on Computational Models of Natural Argument, 2011. http://umiacs.umd.edu/~jbg/docs/persuasion.pdf Nitin Madnani, Jordan Boyd-Graber, and Philip Resnik.
Measuring Transitivity Using Untrained Annotators. Creating Speech and Language Data With Amazon's Mechanical Turk, 2010. http://umiacs.umd.edu/~jbg/docs/madnani-boyd-graber-turk-workshop.pdf Jordan Boyd-Graber and Philip Resnik. Holistic Sentiment Analysis Across Languages: Multilingual Supervised Latent Dirichlet Allocation. Empirical Methods in Natural Language Processing, 2010. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/jbg-mlslda-2010.pdf Eric Hardisty, Jordan Boyd-Graber, and Philip Resnik. Modeling Perspective using Adaptor Grammars. Empirical Methods in Natural Language Processing, 2010. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/adapted_naive_bayes.pdf Pranav Goel ------------------------- Zongxia Li, Andrew Mao, Daniel Kofi Stephens, Pranav Goel, Emily Walpole, Juan Francisco Fung, Alden Dima, and Jordan Lee Boyd-Graber. TENOR: Topic Enabled Neural Organization and Recommendation: Evaluating Topic Models in Task Based Settings. European Association for Computational Linguistics, 2024. (21% Acceptance Rate) Alexander Hoyle, Pranav Goel, Denis Peskov, Andrew Hian-Cheong, Jordan Boyd-Graber, and Philip Resnik. Is Automated Topic Model Evaluation Broken?: The Incoherence of Coherence. Neural Information Processing Systems, 2021. Accessible Abstract: Topic models help historians, journalists, and analysts make sense of large text collections. But how do you know if you have a good one? The field has settled on using "Automatic Coherence", but this paper argues that maybe that isn't the right choice if you want to actually make real users happy. This paper builds on our 2009 paper that showed perplexity was not a good evaluation of interpretability for topic models; while the field adopted automatic topic coherence as a result of that 2009 paper, this paper argues that automatic topic coherence is not a good metric for neural topic models (even though it worked for probabilistic topic models). (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2021_neurips_incoherence.pdf Shay B. Cohen ------------------------- Ke Zhai, Jordan Boyd-Graber, and Shay B. Cohen. Hybrid Online Inference with Adaptor Grammars. NIPS Workshop on Advances in Variational Inference, 2014. Ke Zhai, Jordan Boyd-Graber, and Shay B. Cohen. Online Adaptor Grammars with Hybrid Inference. Transactions of the Association for Computational Linguistics, 2014. http://umiacs.umd.edu/~jbg/docs/2014_tacl_ag_vb_online.pdf Shi Feng ------------------------- Pedro Rodriguez, Shi Feng, Mohit Iyyer, He He, and Jordan Boyd-Graber. Quizbowl: The Case for Incremental Question Answering. ArXiv, Preprint. https://arxiv.org/abs/1904.04792 Chenglei Si, Navita Goyal, Tongshuang Wu, Chen Zhao, Shi Feng, Hal Daume III, and Jordan Lee Boyd-Graber. Large Language Models Help Humans Verify Truthfulness---Except When They Are Convincingly Wrong. North American Association for Computational Linguistics, 2024. Shi Feng and Jordan Boyd-Graber. Learning to Explain Selectively: A Case Study on Question Answering. Empirical Methods in Natural Language Processing, 2022. Accessible Abstract: Many AI methods are a black box: input goes in, predictions come out. While there are many AI explanation tools that you can add to these predictions, how do you know if they are any good? In this work presented at EMNLP, our hypothesis is that if you put a human in front of an AI that's trying to answer questions, you can measure how good the underlying explanations are by how much the human's score goes up.
This 2022 EMNLP publication does not just measure which combinations of explanations are most effective for an individual; we use bandit exploration to quickly figure out which set of explanations best helps a specific user. http://umiacs.umd.edu/~jbg/docs/2022_emnlp_augment.pdf Eric Wallace, Shi Feng, and Jordan Boyd-Graber. Misleading Failures of Partial-input Baselines. Association for Computational Linguistics, 2019. (18% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2019_acl_flipside.pdf Shi Feng and Jordan Boyd-Graber. What AI can do for me: Evaluating Machine Learning Interpretations in Cooperative Play. Intelligent User Interfaces, 2019. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2019_iui_augment.pdf Eric Wallace, Pedro Rodriguez, Shi Feng, Ikuya Yamada, and Jordan Boyd-Graber. Trick Me If You Can: Human-in-the-loop Generation of Adversarial Question Answering Examples. Transactions of the Association for Computational Linguistics, 2019. http://umiacs.umd.edu/~jbg/docs/2019_tacl_trick.pdf Shi Feng, Eric Wallace, and Jordan Boyd-Graber. Interpreting Neural Networks with Nearest Neighbors. EMNLP Workshop on BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, 2018. http://aclweb.org/anthology/W18-5416 Shi Feng, Eric Wallace, Alvin Grissom II, Pedro Rodriguez, Mohit Iyyer, and Jordan Boyd-Graber. Pathologies of Neural Models Make Interpretation Difficult. Empirical Methods in Natural Language Processing, 2018. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2018_emnlp_rs.pdf Jordan Boyd-Graber, Shi Feng, and Pedro Rodriguez. Human-Computer Question Answering: The Case for Quizbowl. The NIPS '17 Competition: Building Intelligent Systems, 2018. http://umiacs.umd.edu/~jbg/docs/2018_nips_qbcomp.pdf Sonya S. Nikolova ------------------------- Sonya S. Nikolova, Jordan Boyd-Graber, and Christiane Fellbaum. Collecting Semantic Similarity Ratings to Connect Concepts in Assistive Communication Tools. Modeling, Learning and Processing of Text Technological Data Structures, 2011. http://umiacs.umd.edu/~jbg/docs/2011_book_chapter_evocation.pdf Sonya S. Nikolova, Jordan Boyd-Graber, Christiane Fellbaum, and Perry Cook. Better Vocabularies for Assistive Communication Aids: Connecting Terms using Semantic Networks and Untrained Annotators. ACM Conference on Computers and Accessibility, 2009. (31% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/evocation-viva.pdf Xiaojuan Ma, Jordan Boyd-Graber, Sonya S. Nikolova, and Perry Cook. Speaking Through Pictures: Images vs. Icons. ACM Conference on Computers and Accessibility, 2009. (31% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/image_icon.pdf Sonya S. Nikolova, Jordan Boyd-Graber, and Perry Cook. The Design of ViVA: A Mixed-initiative Visual Vocabulary for Aphasia. Proceedings of the 27th international conference extended abstracts on Human factors in computing systems, 2009. http://umiacs.umd.edu/~jbg/docs/viva.pdf Jordan Boyd-Graber, Sonya S. Nikolova, Karyn A. Moffatt, Kenrick C. Kin, Joshua Y. Lee, Lester W. Mackey, Marilyn M. Tremaine, and Maria M. Klawe. Participatory design with proxies: Developing a desktop-PDA system to support people with aphasia. Computer-Human Interaction, 2006. (23% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/paper673-boyd-graber.pdf Neha Pundlik Srikanth ------------------------- Neha Pundlik Srikanth, Rupak Sarkar, Heran Y. Mane, Elizabeth M. Aparicio, Quynh C. Nguyen, Rachel Rudinger, and Jordan Boyd-Graber.
Large Language Models Help Humans Verify Truthfulness---Except When They Are Convincingly Wrong. North American Association for Computational Linguistics, 2024. Heran Y. Mane, Amara Channell Doig, Francia Ximena Marin Gutierrez, Michelle Jasczynski, Xiaohe Yue, Neha Pundlik Srikanth, Sourabh Mane, Abby Sun, Rachel Ann Moats, Pragat Patel, Xin He, Jordan Lee Boyd-Graber, Elizabeth M. Aparicio, and Quynh C. Nguyen. Practical Guidance for the Development of Rosie, a Health Education Question-and-Answer Chatbot for New Mothers. Journal of Public Health Management and Practice, 2023. https://journals.lww.com/jphmp/fulltext/2023/09000/practical_guidance_for_the_development_of_rosie,_a.9.aspx Tak Yeon Lee ------------------------- Tak Yeon Lee, Alison Smith, Kevin Seppi, Niklas Elmqvist, Jordan Boyd-Graber, and Leah Findlater. The Human Touch: How Non-expert Users Perceive, Interpret, and Fix Topic Models. International Journal of Human-Computer Studies, 2017. http://umiacs.umd.edu/~jbg/docs/2017_ijhcs_human_touch.pdf Alison Smith, Tak Yeon Lee, Forough Poursabzi-Sangdeh, Jordan Boyd-Graber, Kevin Seppi, Niklas Elmqvist, and Leah Findlater. Evaluating Visual Representations for Topic Understanding and Their Effects on Manually Generated Labels. Transactions of the Association for Computational Linguistics, 2017. http://umiacs.umd.edu/~jbg/docs/2017_tacl_eval_tm_viz.pdf Alison Smith, Tak Yeon Lee, Forough Poursabzi-Sangdeh, Jordan Boyd-Graber, Kevin Seppi, Niklas Elmqvist, and Leah Findlater. Human-Centered and Interactive: Expanding the Impact of Topic Models. CHI Human Centred Machine Learning Workshop, 2016. Thang Nguyen ------------------------- Philip Resnik, William Armstrong, Leonardo Claudino, Thang Nguyen, Viet-An Nguyen, and Jordan Boyd-Graber. Beyond LDA: Exploring Supervised Topic Modeling for Depression-Related Language in Twitter. NAACL Workshop on Cognitive Modeling and Computational Linguistics, 2015. Thang Nguyen, Jordan Boyd-Graber, Jeff Lund, Kevin Seppi, and Eric Ringger. Is your anchor going up or down? Fast and accurate supervised topic models. North American Association for Computational Linguistics, 2015. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2015_naacl_supervised_anchor.pdf Thang Nguyen, Yuening Hu, and Jordan Boyd-Graber. Anchors Regularized: Adding Robustness and Extensibility to Scalable Topic-Modeling Algorithms. Association for Computational Linguistics, 2014. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2014_acl_anchor_reg.pdf Thang Nguyen, Yuening Hu, and Jordan Boyd-Graber. Evaluating Regularized Anchor Words. NIPS Workshop on Topic Models: Computation, Application, and Evaluation, 2013. Tongshuang Wu ------------------------- Chenglei Si, Navita Goyal, Tongshuang Wu, Chen Zhao, Shi Feng, Hal Daume III, and Jordan Lee Boyd-Graber. Large Language Models Help Humans Verify Truthfulness---Except When They Are Convincingly Wrong. North American Association for Computational Linguistics, 2024. Alison Smith, Jordan Boyd-Graber, Ron Fan, Melissa Birchfield, Tongshuang Wu, Dan Weld, and Leah Findlater. No Explainability without Accountability: An Empirical Study of Explanations and Feedback in Interactive ML. Computer-Human Interaction, 2020. (24% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2020_chi_explanation.pdf Varun Kumar ------------------------- Alison Smith, Varun Kumar, Jordan Boyd-Graber, Kevin Seppi, and Leah Findlater.
Digging into User Control: Perceptions of Adherence and Instability in Transparent Models. Intelligent User Interfaces, 2020. (23% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2020_iui_control.pdf Varun Kumar, Alison Smith, Leah Findlater, Kevin Seppi, and Jordan Boyd-Graber. Why Didn't You Listen to Me? Comparing User Control of Human-in-the-Loop Topic Models. Association for Computational Linguistics, 2019. (18% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2019_acl_control.pdf Alison Smith, Varun Kumar, Jordan Boyd-Graber, Kevin Seppi, and Leah Findlater. User-Centered Design and Evaluation of a Human-in-the-Loop Topic Modeling System. Intelligent User Interfaces, 2018. Alison won a best student paper honorable mention (3 out of 300) (23% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2018_iui_itm.pdf Alison Smith, Varun Kumar, Jordan Boyd-Graber, Kevin Seppi, and Leah Findlater. Accounting for Input Uncertainty in Human-in-the-Loop Systems. CHI 2017 Designing for Uncertainty Workshop, 2017. http://visualization.ischool.uw.edu/hci_uncertainty/papers/Paper11.pdf Varun Manjunatha ------------------------- Mohit Iyyer, Varun Manjunatha, Jordan Boyd-Graber, and Larry Davis. Learning to Color from Language. North American Association for Computational Linguistics, 2018. (29% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2018_naacl_colorization.pdf Mohit Iyyer, Varun Manjunatha, Anupam Guha, Yogarshi Vyas, Jordan Boyd-Graber, Hal Daume III, and Larry Davis. The Amazing Mysteries of the Gutter: Drawing Inferences Between Panels in Comic Book Narratives. Computer Vision and Pattern Recognition, 2017. (30% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2017_cvpr_comics.pdf Mohit Iyyer, Varun Manjunatha, Jordan Boyd-Graber, and Hal Daume III. Deep Unordered Composition Rivals Syntactic Methods for Text Classification. Association for Computational Linguistics, 2015. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2015_acl_dan.pdf Viet-An Nguyen ------------------------- Viet-An Nguyen, Jordan Boyd-Graber, Philip Resnik, and Kristina Miler. Tea Party in the House: A Hierarchical Ideal Point Topic Model and Its Application to Republican Legislators in the 112th Congress. Association for Computational Linguistics, 2015. Accessible Abstract: In the mid 2010s, the Republican party in the United States diverged: mainstream conservatives split from the so-called "tea party" caucus. However, ideal point models, the primary statistical tool for analyzing political factions in legislative bodies, fail to account for these changes. This is because the schism is not fully reflected in voting patterns but rather in how politicians present themselves: thus we need to extend these models to capture not just how politicians vote but also how they frame particular issues. This paper proposes a new model to capture framing differences within a voting bloc to start explaining the new subcoalitions of the Republican caucus. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2015_acl_teaparty.pdf Philip Resnik, William Armstrong, Leonardo Claudino, Thang Nguyen, Viet-An Nguyen, and Jordan Boyd-Graber. Beyond LDA: Exploring Supervised Topic Modeling for Depression-Related Language in Twitter. NAACL Workshop on Cognitive Modeling and Computational Linguistics, 2015. Viet-An Nguyen, Jordan Boyd-Graber, and Philip Resnik. Sometimes Average is Best: The Importance of Averaging for Prediction using MCMC Inference in Topic Modeling.
Empirical Methods in Natural Language Processing, 2014. (30% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2014_emnlp_howto_gibbs.pdf Viet-An Nguyen, Jordan Boyd-Graber, Philip Resnik, Deborah Cai, Jennifer Midberry, and Yuanxin Wang. Modeling Topic Control to Detect Influence in Conversations using Nonparametric Topic Models. Machine Learning, 2014. http://umiacs.umd.edu/~jbg/docs/2014_mlj_influencer.pdf Viet-An Nguyen, Jordan Boyd-Graber, Philip Resnik, and Jonathan Chang. Learning a Concept Hierarchy from Multi-labeled Documents. Neural Information Processing Systems, 2014. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2014_nips_l2h.pdf Viet-An Nguyen, Jordan Boyd-Graber, and Stephen Altschul. Dirichlet Mixtures, the Dirichlet Process, and the Structure of Protein Space. Journal of Computational Biology, 2013. http://umiacs.umd.edu/~jbg/docs/2013_dp_protein.pdf Viet-An Nguyen, Jordan Boyd-Graber, Jonathan Chang, and Philip Resnik. Tree-Based Label Dependency Topic Models. NIPS Workshop on Topic Models: Computation, Application, and Evaluation, 2013. Viet-An Nguyen, Jordan Boyd-Graber, and Philip Resnik. Lexical and Hierarchical Topic Regression. Neural Information Processing Systems, 2013. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2013_shlda.pdf Viet-An Nguyen, Yuening Hu, Jordan Boyd-Graber, and Philip Resnik. Argviz: Interactive Visualization of Topic Dynamics in Multi-party Conversations. North American Association for Computational Linguistics, 2013. (50% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2013_argviz.pdf Viet-An Nguyen, Jordan Boyd-Graber, and Philip Resnik. SITS: A Hierarchical Nonparametric Model using Speaker Identity for Topic Segmentation in Multiparty Conversations. Association for Computational Linguistics, 2012. (19% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/acl_2012_sits.pdf Viet-An Nguyen, Jordan Boyd-Graber, and Philip Resnik. "I Want to Talk About, Again, My Record On Energy…": Modeling Topic Control in Conversations using Speaker-centric Nonparametric Topic Models. Mid-Atlantic Student Colloquium on Speech, Language, and Learning, 2012. Weiwei Yang ------------------------- Weiwei Yang, Jordan Boyd-Graber, and Philip Resnik. A Multilingual Topic Model for Learning Weighted Topic Links Across Incomparable Corpora. Empirical Methods in Natural Language Processing, 2019. http://umiacs.umd.edu/~jbg/docs/2019_emnlp_mtm.pdf Weiwei Yang, Jordan Boyd-Graber, and Philip Resnik. Adapting Topic Models using Lexical Associations with Tree Priors. Empirical Methods in Natural Language Processing, 2017. (18% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2017_emnlp_tree_prior.pdf Weiwei Yang, Jordan Boyd-Graber, and Philip Resnik. A Discriminative Topic Model using Document Network Structure. Association for Computational Linguistics, 2016. (28% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2016_acl_docblock.pdf Weiwei Yang, Jordan Boyd-Graber, and Philip Resnik. Birds of a Feather in the Same Nest: A Discriminative Topic Model using Block-based Priors. Mid-Atlantic Student Colloquium on Speech, Language, and Learning, 2016. Weiwei Yang, Jordan Boyd-Graber, and Philip Resnik. Birds of a Feather Linked Together: A Discriminative Topic Model using Link-based Priors. Empirical Methods in Natural Language Processing, 2015. (28% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2015_emnlp_hinge_link.pdf Yoshinari Fujinuma ------------------------- Yoshinari Fujinuma, Jordan Boyd-Graber, and Katharina Kann.
Yoshinari Fujinuma, Jordan Boyd-Graber, and Katharina Kann. How Does Multilingual Pretraining Affect Cross-Lingual Transferability?. Association for Computational Linguistics, 2022. (21% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2022_acl_multilingbert.pdf
Mozhi Zhang, Yoshinari Fujinuma, Michael J. Paul, and Jordan Boyd-Graber. Why Overfitting Isn't Always Bad: Retrofitting Cross-Lingual Word Embeddings to Dictionaries. Association for Computational Linguistics, 2020. Accessible Abstract: Computers need to represent words in a machine-readable way. This work shows that slightly moving these representations for words in different languages closer to a small list of translations (like those from a bilingual dictionary), after the fancy machine learning is done, works better on downstream tasks (e.g., guessing the grammatical category of a word) but hurts when asking the algorithm for translations of unseen words; a schematic of this retrofitting step follows this entry. (17.6% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2020_acl_refine.pdf
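To make the retrofitting step concrete, here is a minimal sketch in the spirit of the classic retrofitting recipe; the function name, the alpha/beta anchoring weights, and the update rule are illustrative assumptions, not the paper's exact procedure. Each word vector is pulled toward the running average of its dictionary translations while an anchor term keeps it near its original position.

import numpy as np

def retrofit_to_dictionary(emb, pairs, alpha=1.0, beta=1.0, iters=10):
    # emb:   {word: np.ndarray}, already-trained cross-lingual embeddings
    # pairs: [(src_word, tgt_word)] seed translation dictionary
    # NOTE: a sketch of the classic retrofitting update, not the paper's exact method
    new = {w: v.copy() for w, v in emb.items()}
    neighbors = {}
    for s, t in pairs:
        if s in emb and t in emb:
            neighbors.setdefault(s, []).append(t)
            neighbors.setdefault(t, []).append(s)
    for _ in range(iters):
        for w, ns in neighbors.items():
            # closed-form coordinate update: anchor to the original vector,
            # average in the current vectors of the word's translations
            new[w] = (alpha * emb[w] + beta * sum(new[n] for n in ns)) \
                     / (alpha + beta * len(ns))
    return new

# toy usage: pull English "dog" and Spanish "perro" toward each other
emb = {"dog": np.array([1.0, 0.0]), "perro": np.array([0.0, 1.0])}
print(retrofit_to_dictionary(emb, [("dog", "perro")])["dog"])

Raising beta (or iters) fits the seed dictionary more tightly, the deliberate "overfitting" of the title: per the abstract, that helps downstream tasks such as guessing grammatical categories but hurts translation of words outside the dictionary.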
Mozhi Zhang, Yoshinari Fujinuma, and Jordan Boyd-Graber. Exploiting Cross-Lingual Subword Similarities in Low-Resource Document Classification. Association for the Advancement of Artificial Intelligence, 2020. (20.6% Acceptance Rate) https://arxiv.org/abs/1812.09617
Yoshinari Fujinuma, Michael Paul, and Jordan Boyd-Graber. A Resource-Free Evaluation Metric for Cross-Lingual Word Embeddings Based on Graph Modularity. Association for Computational Linguistics, 2019. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2019_acl_modularity.pdf
Dasha Pruss, Yoshinari Fujinuma, Ashlynn Daughton, Michael Paul, Brad Arnot, Danielle Szafir, and Jordan Boyd-Graber. Zika discourse in the Americas: A multilingual topic analysis of Twitter. PLOS ONE, 2019. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0216922
Mozhi Zhang, Yoshinari Fujinuma, and Jordan Boyd-Graber. Exploiting Cross-Lingual Subword Similarities in Low-Resource Document Classification. ACL Workshop on Deep Learning Approaches for Low-Resource Natural Language Processing, 2018.

Yuening Hu
-------------------------
Aaron Gerow, Yuening Hu, Jordan Boyd-Graber, David M. Blei, and James A. Evans. Measuring Discursive Influence Across Scholarship. Proceedings of the National Academy of Sciences, 2018.
Jordan Boyd-Graber, Yuening Hu, and David Mimno. Applications of Topic Models. 2017. http://www.nowpublishers.com/article/Details/INR-030
Alison Smith, Jason Chuang, Yuening Hu, Jordan Boyd-Graber, and Leah Findlater. Concurrent Visualization of Relationships between Words and Topics in Topic Models. ACL Workshop on Interactive Language Learning, Visualization, and Interfaces, 2014.
Thang Nguyen, Yuening Hu, and Jordan Boyd-Graber. Anchors Regularized: Adding Robustness and Extensibility to Scalable Topic-Modeling Algorithms. Association for Computational Linguistics, 2014. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2014_acl_anchor_reg.pdf
Yuening Hu, Ke Zhai, Vlad Eidelman, and Jordan Boyd-Graber. Polylingual Tree-Based Topic Models for Translation Domain Adaptation. Association for Computational Linguistics, 2014. (26% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2014_acl_ptlda_mt.pdf
Yuening Hu, Jordan Boyd-Graber, Brianna Satinoff, and Alison Smith. Interactive Topic Modeling. Machine Learning, 2014. http://umiacs.umd.edu/~jbg/docs/2014_mlj_itm.pdf
Thang Nguyen, Yuening Hu, and Jordan Boyd-Graber. Evaluating Regularized Anchor Words. NIPS Workshop on Topic Models: Computation, Application, and Evaluation, 2013.
Yuening Hu, Ke Zhai, Vlad Eidelman, and Jordan Boyd-Graber. Topic Models for Translation Domain Adaptation. NIPS Workshop on Topic Models: Computation, Application, and Evaluation, 2013.
Yuening Hu, Jordan Boyd-Graber, Hal Daume III, and Z. Irene Ying. Binary to Bushy: Bayesian Hierarchical Clustering with the Beta Coalescent. Neural Information Processing Systems, 2013. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2013_coalescent.pdf
Viet-An Nguyen, Yuening Hu, Jordan Boyd-Graber, and Philip Resnik. Argviz: Interactive Visualization of Topic Dynamics in Multi-party Conversations. North American Association for Computational Linguistics, 2013. (50% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/2013_argviz.pdf
Yuening Hu and Jordan Boyd-Graber. Efficient Tree-Based Topic Modeling. Association for Computational Linguistics, 2012. (21% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/acl_2012_fttm.pdf
Yuening Hu and Jordan Boyd-Graber. Suggesting Constraints for Interactive Topic Modeling. ICML Workshop on Machine Learning in Human Computation and Crowdsourcing, 2012.
Yuening Hu, Ke Zhai, Sinead Williamson, and Jordan Boyd-Graber. Modeling Images using Transformed Indian Buffet Processes. International Conference on Machine Learning, 2012. (27% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/mtibp_icml_2012.pdf
Yuening Hu and Jordan Boyd-Graber. Bayesian Hierarchical Clustering with Beta Coalescents. Mid-Atlantic Student Colloquium on Speech, Language, and Learning, 2012.
Yuening Hu, Jordan Boyd-Graber, and Brianna Satinoff. Interactive Topic Modeling. Association for Computational Linguistics, 2011. (25% Acceptance Rate) http://umiacs.umd.edu/~jbg/docs/itm.pdf