I am a full professor in the University of Maryland Computer Science Department (tenure home), Institute of Advanced Computer Studies, iSchool, and Language Science Center.

My research focuses on making machine learning more useful, more interpretable, and able to learn from and interact with humans. This helps users sift through decades of documents; discover when individuals lie, reframe, or change the topic in a conversation; or compete against humans in games that are based in natural language.

Book a meeting with me (collaborators and UMD students).

Recent Publications

  • Neha Punklik Srikanth, Rupak Sarkar, Heran Y. Mane, Elizabeth M. Aparicio, Quynh C. Nguyen, Rachel Rudinger, and Jordan Boyd-Graber. Pregnant Questions: The Importance of Pragmatic Awareness in Maternal Health Question Answering. North American Association for Computational Linguistics, 2024. [Code and Data] [Bibtex]
  • Chenglei Si, Navita Goyal, Tongshuang Wu, Chen Zhao, Shi Feng, Hal Daumé III, and Jordan Boyd-Graber. Large Language Models Help Humans Verify Truthfulness---Except When They Are Convincingly Wrong. North American Association for Computational Linguistics, 2024. [Bibtex]
  • Alvin Grissom II, Jo Shoemaker, Benjamin Goldman, Ruikang Shi, Craig Stewart, C. Anton Rytting, Leah Findlater, Wenyan Li, and Jordan Boyd-Graber. Rapidly Piloting Real-time Linguistic Assistance for Simultaneous Interpreters with Untrained Bilingual Surrogates. Language Resources and Evaluation Conference, 2024. [Bibtex]
  • Quynh C. Nguyen, Elizabeth M. Aparicio, Michelle Jasczynski, Amara Channell Doig, Xiaohe Yue, Heran Mane, Neha Punklik Srikanth, Francia Ximena Marin Gutierrez, Nataly Delcid, Xin He, and Jordan Boyd-Graber. Randomized Pilot of Rosie, a Health Education Question-and-Answer Chatbot for New Mothers. JMIR Formative Research, 2024. [Bibtex]
  • Ishani Mondal, Zongxia Li, Yufang Hou, Anandhavelu Natarajan, Aparna Garimella, Sambaran Bandyopadhyay, and Jordan Boyd-Graber. SciDoc2Diagrammer-MAF: Towards Generation of Scientific Diagrams from Documents guided by Multi-Aspect Feedback Refinement. Findings of the Empirical Methods in Natural Language Processing, 2024. [Bibtex]
  • Zongxia Li, Ishani Mondal, Huy Nghiem, Yijun Liang, and Jordan Boyd-Graber. PEDANTS (Precise Evaluations of Diverse Answer Nominee Text for Skinflints): Use Evaluation Metrics Wisely---Efficient Evaluation Analysis and Benchmarking for Open-Domain Question Answering. Findings of the Empirical Methods in Natural Language Processing, 2024. [Bibtex]
  • Xiyang Wu, Tianrui Guan, Dianqi Li, Shuaiyi Huang, Xiaoyu Liu, Xijun Wang, Ruiqi Xian, Abhinav Shrivastava, Furong Huang, Jordan Boyd-Graber, Tianyi Zhou, and Dinesh Manocha. AUTOHALLUSION: Automatic Generation of Hallucination Benchmarks for Vision-Language Models. Findings of the Empirical Methods in Natural Language Processing, 2024. [Bibtex]
  • Zongxia Li, Andrew Mao, Daniel Kofi Stephens, Pranav Goel, Emily Walpole, Juan Francisco Fung, Alden Dima, and Jordan Lee Boyd-Graber. TENOR: Topic Enabled Neural Organization and Recommendation: Evaluating Topic Models in Task Based Settings. European Association for Computational Linguistics, 2024. [Bibtex]
  • Ishani Mondal, Shwetha S, Anandhavelu Natarajan, Aparna Garimella, Sambaran Bandyopadhyay, and Jordan Boyd-Graber. Presentations by the People, for the People: Harnessing LLMs for Generating Persona-Aware Slides from Documents. European Association for Computational Linguistics, 2024. [Bibtex]
  • Tasnim Kabir, Yoo Yeon Sung, Saptarashmi Bandyopadhyay, Hao Zou, Abhranil Chandra, and Jordan Lee Boyd-Graber. You Make me Feel like a Natural Question: Training QA Systems on Transformed Trivia Questions. Empirical Methods in Natural Language Processing, 2024. [ArXiv] [Research Talk] [Bibtex]
    Accessible Abstract: Many of the questions for training AIs how to answer questions come from the queries users type into search engines (like Google's Natural Questions). Is there a cheaper---perhaps even better---way? We propose a "naturalization" technique to turn high-quality, rigorously edited trivia questions into examples that resemble Natural Questions. Training on our naturalized questions and testing on natural questions comes close to the results from training on Natural Questions directly, and we can improve results on MMLU (a standard modern evaluation set) by using our data.
  • Matthew Shu, Nishant Balepur, Shi Feng, and Jordan Boyd-Graber. KARL: Knowledge-Aware Retrieval and Representations aid Retention and Learning in Students. Empirical Methods in Natural Language Processing, 2024. [Code and Data] [Research Talk] [Bibtex]
    Accessible Abstract: Flashcard software helps students study by figuring out which flashcards to show students and when. However, current systems do not use the content (the actual text of the flashcards) to make these predictions. This paper introduces KARL, a new flashcard scheduler that uses language models to encode the text of flashcards. We host KARL in our own flashcard app for 500+ learners and show that students using KARL learn more efficiently than when they use traditional systems that only know, for example, that a student studied Flashcard #24601 on Monday and got it wrong.
  • Maharshi Gor, Hal Daumé III, Tianyi Zhou, and Jordan Boyd-Graber. Do great minds think alike? Investigating Human-AI Complementarity in Question Answering with CAIMIRA. Empirical Methods in Natural Language Processing, 2024. [Talk] [Code] [Data] [Bibtex]
    Accessible Abstract: CAIMIRA discovers the skills that humans and AIs use to answer questions. By scraping websites where trivia nerds answer really difficult questions and posing those questions to AI models like GPT-4 and LLaMA-3-70B, we find that while humans excel at knowledge-based abductive reasoning, AI outperforms humans on fact-based historical recall. This research suggests future challenges should focus on more complex reasoning and nuanced language tasks to better align AI development with human cognitive strengths.
  • Nishant Balepur, Matthew Shu, Alexander Hoyle, Alison Robey, Shi Feng, Seraphina Goldfarb-Tarrant, and Jordan Boyd-Graber. A SMART Mnemonic Sounds like "Glue Tonic": Mixing LLMs with Student Feedback to Make Mnemonic Learning Stick. Empirical Methods in Natural Language Processing, 2024. [Code and Data] [Research Talk] [Bibtex]
    Accessible Abstract: Learning vocabulary (e.g., benevolent) can be tedious, but using mnemonics (e.g., benevolent sounds like "benefits," and a kind boss gives benefits) makes it more engaging and effective. This paper introduces SMART, a large language model trained to produce mnemonics based on feedback from flashcard learners. Students struggle to predict which mnemonics will help them most. Still, by training SMART on both student preferences and learning outcomes, we can generate mnemonics as effectively as GPT-4, but at a much lower cost.
  • Wichayaporn Wongkamjan, Feng Gu, Yanze Wang, Ulf Hermjakob, Jonathan May, Brandon M. Stewart, Jonathan K. Kummerfeld, Denis Peskoff, and Jordan Lee Boyd-Graber. More Victories, Less Cooperation: Assessing Cicero's Diplomacy Play. Association for Computational Linguistics, 2024. [Bibtex]
    Accessible Abstract: Meta's recent AI, Cicero, grabbed headlines with its ability to beat humans at the game of Diplomacy: notable because players must not only make the right moves but also negotiate with each other in natural language. This paper investigates why it wins so many games, measuring its ability to persuade and trick other players. While Cicero wins just about every game, this is because of superhuman strategy, not superhuman communication, suggesting there is still further room for developing Diplomacy-playing AIs.
  • Yoo Yeon Sung, Eve Fleisig, Ishani Mondal, and Jordan Lee Boyd-Graber. ADVSCORE: A Metric for the Evaluation and Creation of Adversarial Benchmarks. ArXiv, Preprint. [Bibtex]
  • Benjamin Börschinger, Jordan Boyd-Graber, Christian Buck, Jannis Bulian, Massimiliano Ciaramita, Michelle Chen Huebscher, Wojciech Gajewski, Yannic Kilcher, Rodrigo Nogueira, and Lierni Sestorain Saralegu. Meta Answering for Machine Reading. ArXiv, Preprint. [Preprint] [Bibtex]
  • Pedro Rodriguez, Shi Feng, Mohit Iyyer, He He, and Jordan Boyd-Graber. Quizbowl: The Case for Incremental Question Answering. ArXiv, Preprint. [Webpage] [Bibtex]