You enter a
dark forest. Standing in front of you is:
A professor named Hal Daumé III (he/him).
He wields appointments in
Computer Science where he is a
Volpi-Cupal Professor, as well as
Language Science at
UMD
where he leads TRAILS, the Institute for Trustworthy AI in Law & Society
(in Fall 2025 he's teaching a grad seminar on AI Agents (past: You and I, and Generative AI (S24), Trustworthy ML (F23), AI (S23),
Human-AI Interaction (F22),
Just ML (F21)); he was formerly also a Senior Principal Researcher at Microsoft Research NYC.
He and his wonderful advisees
like to study
questions related to how to get machines to become more adept at
human language (and artificial intelligence tasks more broadly),
by developing models and algorithms that allow them
to learn from data. (Keywords: natural language processing and machine
learning.)
The two major questions that really drive their research these days are:
(1) how can we get computers to learn
through natural interaction with people/users?
and (2) how can we do this in a way that minimizes harms
in the learned models?
He's discussed interactive learning informally in a Talking Machines Podcast
and more technically in recent talks;
and has discussed fairness/bias in broad terms in a (now somewhat outdated) blog post.
He is the author of the online textbook A Course in Machine Learning,
which is fully open source.
Hal is super fortunate to be a member of, and have awesome colleagues in, the Computational
Linguistics and Information Processing Lab (which he formerly
directed),
the Human-Computer Interaction Lab,
and the Center for Machine Learning.
If you want to contact him, email is your best bet; you can
also find him as @haldaume3
on Twitter. Or, in person, in his office
(IRB 4134).
If you're a prospective grad student or grad applicant, please read
his FAQ to answer some common questions.
If you're thinking of inviting him for a talk or event, please ensure
that the event is organized in an inclusive manner (inclusion rider).
More generally, if you are organizing a conference, workshop or other
event, you may wish to read the NeurIPS D&I survey
results (joint with Katherine Heller),
Humberto Corona's collection of resources/advice,
or two blog posts on this topic.
I acknowledge that I live and work on the ancestral and unceded lands of the Piscataway People, who were among the first in the Western Hemisphere to encounter European colonists, as well as the lands of the Lenape and Nacotchtank people.
Recent Publications:
Say It My Way: Exploring Control in Conversational Visual Question Answering with Blind Users
Farnaz Zamiri Zeraati, Yang Cao, Yuehan Qiao, Hal Daumé III and Hernisa Kacorri
CHI, 2026
[Abstract] [BibTeX]
Prompting and steering techniques are well established in general-purpose generative AI, yet assistive visual question answering (VQA) tools for blind users still follow rigid interaction patterns with limited opportunities for customization. User control can be helpful when system responses are misaligned with their goals and contexts, a gap that becomes especially consequential for blind users that may rely on these systems for access. We invite 11 blind users to customize their interactions with a real-world conversational VQA system. Drawing on 418 interactions, reflections, and post-study interviews, we analyze prompting-based techniques participants adopted, including those introduced in the study and those developed independently in real-world settings. VQA interactions were often lengthy: participants averaged 3 turns, sometimes up to 21, with input text typically tenfold shorter than the responses they …
@inproceedings{daume26sayit,
title = {Say It My Way: Exploring Control in Conversational Visual Question
Answering with Blind Users},
author = {Farnaz Zamiri Zeraati and Yang Cao and Yuehan Qiao and Daum\'e, III,
Hal and Hernisa Kacorri},
booktitle = {CHI},
year = {2026},
url = {http://hal3.name/docs/#daume26sayit},
}
Surveilling Suitability: How AI Hiring Interviews Impact Job Seekers with Disabilities
Vaishnav Kameswaran, Valentina Hong, Jazmin Clark, Yu Hou, Hal Daumé III and Katie Shilton
CHI, 2026
[Abstract] [BibTeX]
AI hiring interviews, asynchronous video recording platforms that use AI to assess candidate suitability, are increasingly used by employers to streamline hiring processes. These platforms often promise to standardize assessments and mitigate subjective biases in hiring decisions. Yet, little is known about how these technologies are perceived and experienced by people with disabilities, a group historically underrepresented in the workforce and particularly vulnerable to injustices perpetuated by technology. To address this gap, we conducted focus groups and semi-structured interviews with 19 people with disabilities. We found that people with disabilities perceive and experience discrimination by AI hiring interviews that: 1) center normative characteristics, 2) exacerbate information asymmetries, 3) undermine autonomy, and 4) intrude on privacy. We use the analytical frame of surveillance to interrogate the role of AI in reconfiguring social relations between job seekers and employers. We discuss implications of our work for design and policy.
@inproceedings{daume26surveilling,
title = {Surveilling Suitability: How AI Hiring Interviews Impact Job Seekers
with Disabilities},
author = {Vaishnav Kameswaran and Valentina Hong and Jazmin Clark and Yu Hou and
Daum\'e, III, Hal and Katie Shilton},
booktitle = {CHI},
year = {2026},
url = {http://hal3.name/docs/#daume26surveilling},
}
SMARTER: A Data-efficient Framework to Improve Toxicity Detection with Explanation via Self-augmenting Large Language Models
Huy Nghiem, Advik Sachdeva and Hal Daumé III
Conference of the Association for Computational Linguistics (ACL), 2026
[Abstract] [BibTeX]
To address the proliferation of toxic content on social media, we introduce SMARTER, a data-efficient two-stage framework for explainable content moderation using Large Language Models (LLMs). In Stage 1, we leverage LLMs' own outputs to generate synthetic explanations for both correct and incorrect labels, enabling alignment via preference optimization with minimal human supervision. In Stage 2, we refine explanation quality through cross-model training, allowing weaker models to align stylistically and semantically with stronger ones. Experiments on three benchmark tasks -- HateXplain, Latent Hate, and Implicit Hate -- demonstrate that SMARTER enables LLMs to achieve up to a 13% macro-F1 improvement over standard few-shot baselines while using only a fraction of the full training data. Our framework offers a scalable strategy for low-resource settings by harnessing LLMs' self-improving capabilities for both classification and explanation.
@inproceedings{daume26smarter,
title = {SMARTER: A Data-efficient Framework to Improve Toxicity Detection with
Explanation via Self-augmenting Large Language Models},
author = {Huy Nghiem and Advik Sachdeva and Daum\'e, III, Hal},
booktitle = {Proceedings of the Conference of the Association for Computational
Linguistics (ACL)},
year = {2026},
url = {http://hal3.name/docs/#daume26smarter},
}
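To give a concrete picture of what Stage 1's preference data might look like, here is a minimal illustrative sketch in Python; it is an assumption based on the abstract, not the paper's released code. The idea it shows: pair a model-generated explanation of the correct label ("chosen") against an explanation of an incorrect label ("rejected"), the format that preference-optimization methods such as DPO typically consume. The class name and example strings are hypothetical.

# Sketch only: one plausible shape for Stage-1 preference data (assumption,
# not the SMARTER codebase).
from dataclasses import dataclass

@dataclass
class PreferencePair:
    prompt: str     # the post plus the classification-and-explain instruction
    chosen: str     # LLM explanation supporting the gold label
    rejected: str   # LLM explanation supporting an incorrect label

pair = PreferencePair(
    prompt="Classify the following post as toxic or non-toxic and explain: '...'",
    chosen="Toxic: the post attacks a group with a slur and calls for exclusion.",
    rejected="Non-toxic: the post is merely descriptive and targets no one.",
)
# A list of such pairs could then be fed to an off-the-shelf preference
# optimization trainer with minimal human labeling.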
Steering Safely or Off a Cliff? Rethinking Specificity and Robustness in Inference-Time Interventions
Navita Goyal and Hal Daumé III
EACL, 2026
[Abstract] [BibTeX]
Model steering, which involves intervening on hidden representations at inference time, has emerged as a lightweight alternative to fine-tuning for precisely controlling large language models. While steering efficacy has been widely studied, evaluations of specificity—whether interventions alter only the intended property—remain limited, especially for potential degradation in behaviors related to the target one. We propose a framework that distinguishes three dimensions of specificity: general (preserving fluency and unrelated abilities), control (preserving related control properties), and robustness (preserving control properties under distribution shifts). We use overrefusal steering as a safety-critical case study and show that while steering consistently reduces overrefusal without harming general abilities and often preserves refusal on harmful queries, it fails on robustness: interventions substantially increase jailbreak vulnerability, even when safety is explicitly controlled. Our work provides the first systematic evaluation of specificity robustness in model steering, showing that standard efficacy and specificity checks are insufficient. Without robustness evaluation, steering methods that appear safe in-distribution may in fact compromise model safety.
@inproceedings{daume26steering,
title = {Steering Safely or Off a Cliff? Rethinking Specificity and Robustness in
Inference-Time Interventions},
author = {Navita Goyal and Daum\'e, III, Hal},
booktitle = {EACL},
year = {2026},
url = {http://hal3.name/docs/#daume26steering},
}
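To make "intervening on hidden representations at inference time" concrete, here is a minimal, generic activation-steering sketch in PyTorch. This is not the paper's implementation; the model, layer index, and steering vector are placeholders, and a real steering direction would be estimated from data rather than sampled at random.

# Illustrative sketch of inference-time activation steering: add a fixed
# direction to one transformer layer's hidden states via a forward hook.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

layer_idx = 6  # which layer to intervene on (assumption)
steering_vec = torch.randn(model.config.hidden_size) * 0.01  # stand-in for a learned direction

def add_steering(module, inputs, output):
    # GPT-2 blocks return a tuple; the first element is the hidden states.
    hidden = output[0] + steering_vec.to(output[0].dtype)
    return (hidden,) + output[1:]

handle = model.transformer.h[layer_idx].register_forward_hook(add_steering)
ids = tok("Can you help me reset my password?", return_tensors="pt")
out = model.generate(**ids, max_new_tokens=30)
print(tok.decode(out[0], skip_special_tokens=True))
handle.remove()  # restore the unsteered model

Evaluating such an intervention only on its target behavior is exactly the gap the paper points at: the same edit should also be checked for related control properties and for robustness under distribution shift.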
Can You Make It Sound Like You? Post-Editing LLM-Generated Text for Personal Style
Connor Baumler, Calvin Bao, Huy Nghiem, Xinchen Yang, Marine Carpuat and Hal Daumé III
Conference of the Association for Computational Linguistics (ACL), 2026
[Abstract] [BibTeX]
Despite the growing use of large language models (LLMs) for writing tasks, users may hesitate to rely on LLMs when personal style is important. Post-editing LLM-generated drafts or translations is a common collaborative writing strategy, but it remains unclear whether users can effectively reshape LLM-generated text to reflect their personal style. We conduct a pre-registered online study (n = 81) in which participants post-edit LLM-generated drafts for writing tasks where personal style matters to them. Using embedding-based style similarity metrics, we find that post-editing increases stylistic similarity to participants’ unassisted writing and reduces similarity to fully LLM-generated output. However, post-edited text remains stylistically closer to LLM text than to participants’ unassisted control text, and it exhibits reduced stylistic diversity compared to unassisted human text. We find a gap between perceived stylistic authenticity and model-measured stylistic similarity, with post-edited text often perceived as representative of participants’ personal style despite retaining detectable LLM stylistic traces.
@inproceedings{daume26postedit,
title = {Can You Make It Sound Like You? Post-Editing LLM-Generated Text for
Personal Style},
author = {Connor Baumler and Calvin Bao and Huy Nghiem and Xinchen Yang and
Marine Carpuat and Daum\'e, III, Hal},
booktitle = {Proceedings of the Conference of the Association for Computational
Linguistics (ACL)},
year = {2026},
url = {http://hal3.name/docs/#daume26postedit},
}
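As a rough illustration of the kind of embedding-based style similarity the abstract mentions (the paper's actual metric may differ), one could compare sentence embeddings with cosine similarity; in the sketch below a general-purpose sentence-transformers model stands in for a dedicated style embedding model, and the example strings are hypothetical.

# Sketch only: cosine similarity between sentence embeddings as a stand-in
# for a style similarity metric.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model

def style_similarity(text_a: str, text_b: str) -> float:
    emb = model.encode([text_a, text_b], normalize_embeddings=True)
    return float(np.dot(emb[0], emb[1]))  # cosine similarity of unit vectors

human_draft = "honestly i think the plan's fine, we just need to start earlier."
post_edited = "Honestly, I think the plan is fine; we simply need to begin earlier."
print(style_similarity(human_draft, post_edited))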
More papers please!
Recent Talks:
AI UK: Doing better in data science – from algorithmic fairness to diversity
Anjali Mazumder, Shakir Mohamed, Danielle Belgrave, Maria De-Arteaga, and Hal Daumé III
The Alan Turing Institute AI UK Roadmap, March 2021
[Video]
Coded Bias Panel Discussion at the University of Maryland
Margrét Bjarnadóttir, Nicol Turner Lee, Deborah Raji, Adam Wenchel, and Hal Daumé III (moderator)
March, 2021
[Video]
Responsible AI Systems and Experiences
Abolfazl Asudeh (moderator), Hal Daumé III, Golnoosh Farnadi, Bernease Herman, Bill Howe (moderator), Yuval Moskovitch, Katie Shilton, and Jenn Wortman Vaughan
Panel at VLDB 2021
[Video]
Tech Ethics in a Changing World
Catherine Bannister, Mary Lacity, Cindy Moehring, and Hal Daumé III
Northwest Arkansas Tech Summit, 2021
[Video]
Language (Technology) Is Power: Exploring the Inherent Complexity of NLP Systems
Hal Daumé III and Sam Charrington (host)
TWIML AI Podcast, 2020
[Video]
More talks please!
Contact information:
email: me AT hal3 DOT name
skype: haldaume3
phone: 301-405-1073
twitter: haldaume3
office: IRB 4150
github: hal3
I can't reply to all
prospective students' emails; please
read this before emailing me.
credits: design and font inspired by Seth Able's LoRD, some images converted to ANSI using ManyTools, original drawing of me by anonymous.
last updated on eleven may, two thousand twenty six.