I am an associate professor in the University of Maryland Computer Science Department (my tenure home), the Institute for Advanced Computer Studies, the iSchool, and the Language Science Center. Previously, I was an assistant professor in the University of Colorado's Department of Computer Science (tenure granted in 2017). I was a graduate student at Princeton with David Blei.

My research focuses on making machine learning more useful, more interpretable, and able to learn from and interact with humans. This helps users sift through decades of documents; discover when individuals lie, reframe, or change the topic in a conversation; or compete against humans in games based on natural language.

Sign up for an appointment

Recent Publications

  • Fenfei Guo, Chen Zhang, Zhirui Zhang, Qixin He, Kejun Zhang, Jun Xie, and Jordan Boyd-Graber. Automatic Song Translation for Tonal Languages. Findings of the Association for Computational Linguistics, 2022. [Translation Examples (with sound)] [Code] [Bibtex]
  • Chenglei Si, Chen Zhao, Sewon Min, and Jordan Boyd-Graber. Re-Examining Calibration: The Case of Question Answering. Findings of Empirical Methods in Natural Language Processing, 2022. [Code] [Research Talk] [Bibtex]
    Accessible Abstract: Calibration is an important problem in question answering: if a search engine or virtual assistant doesn't know the answer to a question, it should probably abstain from showing an answer (to save embarrassment, as when Google said a horse had six legs). This EMNLP Findings paper shows that existing metrics for testing how well a QA system is calibrated push calibrated confidence toward the average confidence. We propose an alternative method both for evaluation and for generating better calibration by looking at how models change as they learn.
  • Wanrong He, Andrew Mao, and Jordan Boyd-Graber. Cheater's Bowl: Human vs. Computer Search Strategies for Open-Domain QA. Findings of Empirical Methods in Natural Language Processing, 2022. [Code] [Data] [Research Talk] [Bibtex]
    Accessible Abstract: When the Covid pandemic hit, trivia games moved online. With them came cheating: people tried to quickly Google answers. This is bad for sportsmanship, but it is a good source of training data for teaching computers how to find answers. We built an interface to harvest this training data from trivia players and fed their queries into retrieval-based QA systems, showing that these human queries were better than the automatically generated queries used by the current state of the art.
  • Peter Jansen and Jordan Boyd-Graber. Picard understanding Darmok: A Dataset and Model for Metaphor-Rich Translation in a Constructed Language. Figurative Language Workshop 2022 @EMNLP, 2022. [Code and Data] [Research Talk] [Bibtex]
  • HyoJung Han, Marine Carpuat, and Jordan Boyd-Graber. SimQA: Detecting Simultaneous MT Errors through Word-by-Word Question Answering. Empirical Methods in Natural Language Processing, 2022. [Code] [Research Talk] [Bibtex]
    Accessible Abstract: Simultaneous interpretation (where a translation happens word by word before the source sentence is finished) is difficult to evaluate. We created a new evaluation framework based on the following scenario: imagine that you're thrown into a trivia game show where you don't know the language. Specifically, it's a game format where the question is revealed word by word and you interrupt it as soon as you know the answer. Our hypothesis is that a monolingual player (who doesn't speak the source language) will do better in the game with a better simultaneous translation system. In this 2022 EMNLP publication, we show that this evaluation is not only cheaper (you just need to translate the answer) but can also detect hallucinations and undertranslations better than existing evaluation methods.
  • Shi Feng and Jordan Boyd-Graber. Learning to Explain Selectively: A Case Study on Question Answering. Empirical Methods in Natural Language Processing, 2022. [Research Teaser] [Code and Data] [Bibtex]
    Accessible Abstract: Many AI methods are a black box: input goes in, predictions come out. While there are many explanation tools that you can add to these predictions, how do you know if they are any good? Our hypothesis is that if you put a human in front of an AI that's trying to answer questions, you can measure how good the underlying explanations are by how much the human's score goes up. This 2022 EMNLP publication not only measures which combinations of explanations are most effective for an individual; we also use bandit exploration to quickly figure out which set of explanations best helps a specific user.
  • Yoshinari Fujinuma, Jordan Boyd-Graber, and Katharina Kann. How Does Multilingual Pretraining Affect Cross-Lingual Transferability?. Association for Computational Linguistics, 2022. [Code] [Bibtex]
  • Michelle Yuan, Patrick Xia, Chandler May, Benjamin Van Durme, and Jordan Boyd-Graber. Adapting Coreference Resolution Models through Active Learning. Association for Computational Linguistics, 2022. [Code] [Bibtex]
  • Benjamin Börschinger, Jordan Boyd-Graber, Christian Buck, Jannis Bulian, Massimiliano Ciaramita, Michelle Chen Huebscher, Wojciech Gajewski, Yannic Kilcher, Rodrigo Nogueira, and Lierni Sestorain Saralegu. Meta Answering for Machine Reading. ArXiv, Preprint. [Preprint] [Bibtex]
  • Pedro Rodriguez, Shi Feng, Mohit Iyyer, He He, and Jordan Boyd-Graber. Quizbowl: The Case for Incremental Question Answering. ArXiv, Preprint. [Webpage] [Bibtex]