You enter a
dark forest. Standing in front of you is:
A professor named Hal Daumé III (he/him).
He wields appointments in
Computer Science, where he is a
Volpi-Cupal Professor, as well as
Language Science at
UMD,
where he leads TRAILS, the Institute for Trustworthy AI in Law & Society
(in Fall 2025 he's teaching a grad seminar on AI Agents; past courses: You and I, and Generative AI (S24), Trustworthy ML (F23), AI (S23),
Human-AI Interaction (F22),
and Just ML (F21)); he was formerly also a Senior Principal Researcher at Microsoft Research NYC.
He and his wonderful advisees
like to study
questions related to how to get machines to become more adept at
human language (and artificial intelligence tasks more broadly),
by developing models and algorithms that allow them
to learn from data. (Keywords: natural language processing and machine
learning.)
The two major questions that really drive their research these days are:
(1) how can we get computers to learn
through natural interaction with people/users?
and (2) how can we do this in a way that minimizes harms
in the learned models?
He's discussed interactive learning informally on the Talking Machines podcast
and more technically in recent talks,
and has discussed fairness/bias in broad terms in a (now somewhat outdated) blog post.
He is the author of the online textbook A Course in Machine Learning,
which is fully open source.
Hal is super fortunate to be a member of, and have awesome colleagues in, the Computational
Linguistics and Information Processing Lab (which he formerly
directed),
the Human-Computer Interaction Lab,
and the Center for Machine Learning.
If you want to contact him, email is your best bet; you can
also find him at @haldaume3
on Twitter. Or, in person, in his office
(IRB 4134).
If you're a prospective grad student or grad applicant, please read
his FAQ, which answers some common questions.
If you're thinking of inviting him for a talk or event, please ensure
that the event is organized in an inclusive manner (inclusion rider).
More generally, if you are organizing a conference, workshop or other
event, you may wish to read the NeurIPS D&I survey
results (joint with Katherine Heller),
Humberto Corona's collection of resources/advice,
or two blog posts on this topic.
I acknowledge that I live and work on the ancestral and unceded lands of the Piscataway People, who were among the first in the Western Hemisphere to encounter European colonists, as well as the lands of the Lenape and Nacotchtank people.
Recent Publications:
Beyond Blanket Masking: Examining Granularity for Privacy Protection in Images Captured by Blind and Low Vision Users
Jeffri Murrugarra-Llerena, Haoran Niu, Suzanne Barber, Hal Daumé III, Yang Trista Cao and Paola Cascante-Bonilla
COLM, 2025
[Abstract] [BibTeX]
As visual assistant systems powered by visual language models (VLMs) become more prevalent, concerns over user privacy have grown, particularly for blind and low vision users who may unknowingly capture personal private information in their images. Existing privacy protection methods rely on coarse-grained segmentation, which uniformly masks entire private objects, often at the cost of usability. In this work, we propose FiG-Priv, a fine-grained privacy protection framework that selectively masks only high-risk private information while preserving low-risk information. Our approach integrates fine-grained segmentation with a data-driven risk scoring mechanism. We evaluate our framework using the BIV-Priv-Seg dataset and show that FiG-Priv preserves +26% of image content, enhancing the ability of VLMs to provide useful responses by 11% and identify the image content by 45%, while ensuring privacy protection.
@inproceedings{daume25masking,
title = {Beyond Blanket Masking: Examining Granularity for Privacy Protection in
Images Captured by Blind and Low Vision Users},
author = {Jeffri Murrugarra-Llerena and Haoran Niu and Suzanne Barber and
Daum\'e, III, Hal and Yang Trista Cao and Paola
Cascante-Bonilla},
booktitle = {COLM},
year = {2025},
url = {http://hal3.name/docs/#daume25masking},
}
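To make the fine-grained idea concrete, here is a minimal Python sketch (not the paper's code; the Region fields, labels, and threshold are invented for illustration): score each fine-grained segment and mask only the high-risk ones, so low-risk content stays visible to the VLM.

from dataclasses import dataclass

import numpy as np

@dataclass
class Region:
    mask: np.ndarray  # boolean pixel mask (H x W) for one fine-grained segment
    label: str        # e.g. "card_number" vs. "card_body" (invented labels)
    risk: float       # data-driven risk score in [0, 1]

def fine_grained_mask(image: np.ndarray, regions: list[Region],
                      risk_threshold: float = 0.7) -> np.ndarray:
    # Black out only segments whose risk exceeds the threshold, preserving
    # low-risk content so a VLM can still describe the image usefully.
    out = image.copy()
    for region in regions:
        if region.risk >= risk_threshold:
            out[region.mask] = 0
    return out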
Language Models Predict Empathy Gaps Between Social In-groups and Out-groups
Yu Hou, Hal Daumé III and Rachel Rudinger
Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2025
[Abstract] [BibTeX]
Studies of human psychology have demonstrated that people are more motivated to extend empathy to in-group members than out-group members (Cikara et al., 2011). In this study, we investigate how this aspect of intergroup relations in humans is replicated by LLMs in an emotion intensity prediction task. In this task, the LLM is given a short description of an experience a person had that caused them to feel a particular emotion; the LLM is then prompted to predict the intensity of the emotion the person experienced on a numerical scale. By manipulating the group identities assigned to the LLM's persona (the "perceiver") and the person in the narrative (the "experiencer"), we measure how predicted emotion intensities differ between in-group and out-group settings. We observe that LLMs assign higher emotion intensity scores to in-group members than out-group members. This pattern holds across all three types of social groupings we tested: race/ethnicity, nationality, and religion. We perform an in-depth analysis on Llama-3.1-8B, the model that exhibited the strongest intergroup bias among those tested.
@inproceedings{daume25empathygap,
title = {Language Models Predict Empathy Gaps Between Social In-groups and
Out-groups},
author = {Yu Hou and Daum\'e, III, Hal and Rachel Rudinger},
booktitle = {Proceedings of the Conference of the North American Chapter of the
Association for Computational Linguistics (NAACL)},
year = {2025},
url = {http://hal3.name/docs/#daume25empathygap},
}
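A rough sketch of the probe described in the abstract (the prompt wording, group list, and query_llm callable are my assumptions, not the paper's materials):

from itertools import product

GROUPS = ["Christian", "Muslim", "Jewish"]  # one illustrative grouping
NARRATIVE = "I felt joy when my community center reopened."

def build_prompt(perceiver: str, experiencer: str) -> str:
    return (f"You are a {perceiver} person. A {experiencer} person says: "
            f'"{NARRATIVE}" On a scale of 1 (none) to 10 (extreme), how '
            f"intense is their emotion? Answer with a single number.")

def empathy_gap(query_llm) -> float:
    # Mean predicted intensity over in-group pairs minus out-group pairs;
    # a positive gap mirrors the bias reported above.
    in_group, out_group = [], []
    for perceiver, experiencer in product(GROUPS, GROUPS):
        score = float(query_llm(build_prompt(perceiver, experiencer)))
        (in_group if perceiver == experiencer else out_group).append(score)
    return sum(in_group) / len(in_group) - sum(out_group) / len(out_group)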
Natural Language Inference Improves Compositionality in Vision-Language Models
Paola Cascante-Bonilla, Yu Hou, Yang Trista Cao, Hal Daumé III and Rachel Rudinger
ICLR, 2025
[Abstract] [BibTeX]
Compositional reasoning in Vision-Language Models (VLMs) remains challenging as these models often struggle to relate objects, attributes, and spatial relationships. Recent methods aim to address these limitations by relying on the semantics of the textual description, using Large Language Models (LLMs) to break them down into subsets of questions and answers. However, these methods primarily operate on the surface level, failing to incorporate deeper lexical understanding while introducing incorrect assumptions generated by the LLM. In response to these issues, we present Caption Expansion with Contradictions and Entailments (CECE), a principled approach that leverages Natural Language Inference (NLI) to generate entailments and contradictions from a given premise. CECE produces lexically diverse sentences while maintaining their core meaning. Through extensive experiments, we show that CECE enhances interpretability and reduces overreliance on biased or superficial features. By balancing CECE along with the original premise, we achieve significant improvements over previous methods without requiring additional fine-tuning, producing state-of-the-art results on benchmarks that score agreement with human judgments for image-text alignment, and achieving an increase in performance on Winoground of +19.2% (group score) and +12.9% on EqBen (group score) over the best prior work (finetuned with targeted data).
@inproceedings{daume25nli,
title = {Natural Language Inference Improves Compositionality in Vision-Language
Models},
author = {Paola Cascante-Bonilla and Yu Hou and Yang Trista Cao and Daum\'e, III,
Hal and Rachel Rudinger},
booktitle = {ICLR},
year = {2025},
url = {http://hal3.name/docs/#daume25nli},
}
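A compact sketch of the CECE recipe (generate and vlm_score are placeholder callables, and the additive combination below is my own guess at an aggregation rule, not the paper's exact formulation):

def cece_score(image, premise: str, generate, vlm_score, k: int = 5) -> float:
    # Expand the premise into lexically diverse entailments and contradictions.
    entailments = generate(f"List {k} sentences entailed by: {premise}")
    contradictions = generate(f"List {k} sentences contradicting: {premise}")

    def avg(sentences):
        return sum(vlm_score(image, s) for s in sentences) / len(sentences)

    # A well-aligned image should agree with the premise and its entailments
    # and disagree with the contradictions.
    return vlm_score(image, premise) + avg(entailments) - avg(contradictions)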
Which Demographic Features Are Relevant for Individual Fairness Evaluation of U.S. Recidivism Risk Assessment Tools?
Tin Nguyen, Jiannan Xu, Phuong-Anh Nguyen-Le, Jonathan Lazar, Donald Braman, Hal Daumé III and Zubin Jelveh
ICAIL, 2025
[Abstract] [BibTeX]
Despite its constitutional relevance, the technical "individual fairness" criterion has not been operationalized in U.S. state or federal statutes/regulations. We conduct a human subjects experiment to address this gap, evaluating which demographic features are relevant for individual fairness evaluation of recidivism risk assessment (RRA) tools. Our analyses conclude that the individual similarity function should consider age and sex, but it should ignore race.
@inproceedings{daume25risk,
title = {Which Demographic Features Are Relevant for Individual Fairness
Evaluation of U.S. Recidivism Risk Assessment Tools?},
author = {Tin Nguyen and Jiannan Xu and Phuong-Anh Nguyen-Le and Jonathan Lazar
and Donald Braman and Daum\'e, III, Hal and Zubin Jelveh},
booktitle = {ICAIL},
year = {2025},
url = {http://hal3.name/docs/#daume25risk},
}
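Read operationally, that conclusion suggests a similarity function along these lines (a toy Python sketch; the field names, age tolerance, and rule are my illustration, not the paper's instrument):

from dataclasses import dataclass

@dataclass
class Defendant:
    age: int
    sex: str
    race: str  # present in the record, but deliberately unused below

def similar(a: Defendant, b: Defendant, age_tolerance: int = 5) -> bool:
    # "Similar" individuals should receive similar risk scores: match on
    # sex and closeness in age, ignoring race per the finding above.
    return a.sex == b.sex and abs(a.age - b.age) <= age_tolerance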
Effort-aware Fairness: Incorporating a Philosophy-informed, Human-centered Notion of Effort into Algorithmic Fairness Metrics
Tin Nguyen, Jiannan Xu, Zora Che, Phuong-Anh Nguyen-Le, Rushil Dandamudi, Donald Braman, Furong Huang, Hal Daumé III and Zubin Jelveh
AIES, 2025
[Abstract] [BibTeX]
Although popularized AI fairness metrics, e.g., demographic parity, have uncovered bias in AI-assisted decision-making outcomes, they do not consider how much effort one has spent to get to where one is today in the input feature space. However, the notion of effort is important in how Philosophy and humans understand fairness. We propose a philosophy-informed approach to conceptualize and evaluate Effort-aware Fairness (EaF), grounded in the concept of Force, which represents the temporal trajectory of predictive features coupled with inertia. Besides theoretical formulation, our empirical contributions include: (1) a pre-registered human subjects experiment, which shows that for both stages of the (individual) fairness evaluation process, people consider the temporal trajectory of a predictive feature more than its aggregate value; (2) pipelines to compute Effort-aware Individual/Group Fairness in the criminal justice and personal finance contexts. Our work may enable AI model auditors to uncover and potentially correct unfair decisions against individuals who have spent significant efforts to improve but are still stuck with systemic disadvantages outside their control.
@inproceedings{daume25eaf,
title = {Effort-aware Fairness: Incorporating a Philosophy-informed,
Human-centered Notion of Effort into Algorithmic Fairness
Metrics},
author = {Tin Nguyen and Jiannan Xu and Zora Che and Phuong-Anh Nguyen-Le and
Rushil Dandamudi and Donald Braman and Furong Huang and Daum\'e,
III, Hal and Zubin Jelveh},
booktitle = {AIES},
year = {2025},
url = {http://hal3.name/docs/#daume25eaf},
}
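One way to picture the trajectory-over-aggregate point: summarize a feature's history by its slope rather than its latest value. This least-squares sketch is a stand-in of mine for the paper's Force formulation, not its actual pipeline:

import numpy as np

def effort(feature_history: list[float]) -> float:
    # Slope of a least-squares line through the feature's time series;
    # a positive slope indicates improvement over time.
    t = np.arange(len(feature_history), dtype=float)
    slope, _intercept = np.polyfit(t, np.asarray(feature_history), deg=1)
    return float(slope)

# Same final value, very different effort:
# effort([550, 600, 650]) > 0 > effort([750, 700, 650])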
More papers please!
Recent Talks:
AI UK: Doing better in data science – from algorithmic fairness to diversity
Anjali Mazumder, Shakir Mohamed, Danielle Belgrave, Maria De-Arteaga, and Hal Daumé III
The Alan Turing Institute AI UK Roadmap, March 2021
[Video]
Coded Bias Panel Discussion at the University of Maryland
Margrét Bjarnadóttir, Nicol Turner Lee, Deborah Raji, Adam Wenchel, and Hal Daumé III (moderator)
March, 2021
[Video]
Responsible AI Systems and Experiences
Abolfazl Asudeh (moderator), Hal Daumé III, Golnoosh Farnadi, Bernease Herman, Bill Howe (moderator), Yuval Moskovitch, Katie Shilton, and Jenn Wortman Vaughan
Panel at VLDB 2021
[Video]
Tech Ethics in a Changing World
Catherine Bannister, Mary Lacity, Cindy Moehring, and Hal Daumé III
Northwest Arkansas Tech Summit, 2021
[Video]
Language (Technology) Is Power: Exploring the Inherent Complexity of NLP Systems
Hal Daumé III and Sam Charrington (host)
TWIML AI Podcast, 2020
[Video]
More talks please!
Contact information:
email: me AT hal3 DOT name
skype: haldaume3
phone: 301-405-1073
twitter: haldaume3
office: IRB 4150
github: hal3
I can't reply to all
prospective students' emails; please
read this before emailing me.
credits: design and font inspired by Seth Able's LoRD, some images converted to ANSI using ManyTools, original drawing of me by anonymous.
last updated on twenty two october, two thousand twenty five.