You enter a
dark forest. Standing in front of you is:
A professor named Hal Daumé III (he/him).
He wields appointments in
Computer Science where he is a
Volpi-Cupal Professor, as well as
Language Science at
UMD
where he leads TRAILS, the Institute for Trustworthy AI in Law & Society
(in Fall 2023 he's teaching Trustworthy ML;
past: AI (S23),
Human-AI Interaction (F22),
Just ML (F21)); he is also a Senior Principal Researcher in the machine learning and fairness
groups at Microsoft Research NYC.
He and his wonderful advisees
like to study
questions related to how to get machines to become more adept at
human language (and artificial intelligence tasks more broadly),
by developing models and algorithms that allow them
to learn from data. (Keywords: natural language processing and machine
learning.)
The two major questions that really drive their research these days are:
(1) how can we get computers to learn
through natural interaction with people/users?
and (2) how can we do this in a way that minimizes harms
in the learned models?
He's discussed interactive learning informally on the Talking Machines podcast
and more technically in recent talks,
and has discussed fairness/bias in broad terms in a (now somewhat outdated) blog post.
He is the author of the online textbook A Course in Machine Learning,
which is fully open source.
Hal is super fortunate to be a member of, and have awesome colleagues in, the Computational
Linguistics and Information Processing Lab (which he formerly
directed),
the Human-Computer Interaction Lab,
and the Center for Machine Learning.
If you want to contact him, email is your best bet; you can
also find him at @haldaume3
on Twitter. Or, in person, in his office
(IRB 4150).
If you're a prospective grad student or grad applicant, please read
his FAQ, which answers some common questions.
If you're thinking of inviting him for a talk or event, please ensure
that the event is organized in an inclusive manner (inclusion rider).
More generally, if you are organizing a conference, workshop or other
event, you may wish to read the NeurIPS D&I survey
results (joint with Katherine Heller),
Humberto Corona's collection of resources/advice,
or two blog posts on this topic.
I acknowledge that I live and work on the ancestral and unceded lands of the Piscataway People, who were among the first in the Western Hemisphere to encounter European colonists, as well as the lands of the Lenape and Nacotchtank people.
Recent Publications:
It Takes Two to Tango: Navigating Conceptualizations of NLP Tasks and Measurements of Performance
Arjun Subramonian, Xingdi Yuan, Hal Daumé III and Su Lin Blodgett
Conference of the Association for Computational Linguistics (ACL), 2023
[Abstract] [BibTeX]
Progress in NLP is increasingly measured through benchmarks; hence, contextualizing progress requires understanding when and why practitioners may disagree about the validity of benchmarks. We develop a taxonomy of disagreement, drawing on tools from measurement modeling, and distinguish between two types of disagreement: 1) how tasks are conceptualized and 2) how measurements of model performance are operationalized. To provide evidence for our taxonomy, we conduct a meta-analysis of relevant literature to understand how NLP tasks are conceptualized, as well as a survey of practitioners about their impressions of different factors that affect benchmark validity. Our meta-analysis and survey across eight tasks, ranging from coreference resolution to question answering, uncover that tasks are generally not clearly and consistently conceptualized and benchmarks suffer from operationalization disagreements. These findings support our proposed taxonomy of disagreement. Finally, based on our taxonomy, we present a framework for constructing benchmarks and documenting their limitations.
@inproceedings{daume23conceptualizations,
title = {It Takes Two to Tango: Navigating Conceptualizations of NLP Tasks and
Measurements of Performance},
author = {Arjun Subramonian and Xingdi Yuan and Daum\'e, III, Hal and Su Lin
Blodgett},
booktitle = {Proceedings of the Conference of the Association for Computational
Linguistics (ACL)},
year = {2023},
url = {http://hal3.name/docs/#daume23conceptualizations},
}
FairPrism: Evaluating Fairness-Related Harms in Text Generation
Eve Fleisig, Aubrie Amstutz, Chad Atalla, Su Lin Blodgett, Hal Daumé III, Alexandra Olteanu, Emily Sheng, Dan Vann and Hanna Wallach
Conference of the Association for Computational Linguistics (ACL), 2023
[Abstract] [BibTeX]
It is critical to measure and mitigate fairness-related harms caused by AI text generation systems, including stereotyping and demeaning harms. To that end, we introduce FairPrism, a dataset of 5,000 examples of AI-generated English text with detailed human annotations covering a diverse set of harms relating to gender and sexuality. FairPrism aims to address several limitations of existing datasets for measuring and mitigating fairness-related harms, including improved transparency, clearer specification of dataset coverage, and accounting for annotator disagreement and harms that are context-dependent. FairPrism’s annotations include the extent of stereotyping and demeaning harms, the demographic groups targeted, and appropriateness for different applications. The annotations also include specific harms that occur in interactive contexts and harms that raise normative concerns when the “speaker” is an AI system. Due to its precision and granularity, FairPrism can be used to diagnose (1) the types of fairness-related harms that AI text generation systems cause, and (2) the potential limitations of mitigation methods, both of which we illustrate through case studies. Finally, the process we followed to develop FairPrism offers a recipe for building improved datasets for measuring and mitigating harms caused by AI systems.
@inproceedings{daume23fairprism,
title = {FairPrism: Evaluating Fairness-Related Harms in Text Generation},
author = {Eve Fleisig and Aubrie Amstutz and Chad Atalla and Su Lin Blodgett and
Daum\'e, III, Hal and Alexandra Olteanu and Emily Sheng and Dan
Vann and Hanna Wallach},
booktitle = {Proceedings of the Conference of the Association for Computational
Linguistics (ACL)},
year = {2023},
url = {http://hal3.name/docs/#daume23fairprism},
}
Evaluating the Social Impact of Generative AI Systems in Systems and Society
Irene Solaiman, Zeerak Talat, William Agnew, Lama Ahmad, Dylan Baker, Su Lin Blodgett, Hal Daumé III, Jesse Dodge, Ellie Evans, Sara Hooker, Yacine Jernite, Alexandra Sasha Luccioni, Alberto Lusoli, Margaret Mitchell, Jessica Newman, Marie-Therese Png, Andrew Strait and Apostol Vassilev
Preprint, 2023
[Abstract] [BibTeX]
Generative AI systems across modalities, ranging from text, image, audio, and video, have broad social impacts, but there exists no official standard for means of evaluating those impacts and which impacts should be evaluated. We move toward a standard approach in evaluating a generative AI system for any modality, in two overarching categories: what is able to be evaluated in a base system that has no predetermined application and what is able to be evaluated in society. We describe specific social impact categories and how to approach and conduct evaluations in the base technical system, then in people and society. Our framework for a base system defines seven categories of social impact: bias, stereotypes, and representational harms; cultural values and sensitive content; disparate performance; privacy and data protection; financial costs; environmental costs; and data and content moderation labor costs. Suggested methods for evaluation apply to all modalities and analyses of the limitations of existing evaluations serve as a starting point for necessary investment in future evaluations. We offer five overarching categories for what is able to be evaluated in society, each with their own subcategories: trustworthiness and autonomy; inequality, marginalization, and violence; concentration of authority; labor and creativity; and ecosystem and environment. Each subcategory includes recommendations for mitigating harm. We are concurrently crafting an evaluation repository for the AI research community to contribute existing evaluations along the given categories. This version will be updated following a CRAFT session at ACM FAccT 2023.
@inproceedings{daume23impact,
title = {Evaluating the Social Impact of Generative AI Systems in Systems and
Society},
author = {Irene Solaiman and Zeerak Talat and William Agnew and Lama Ahmad and
Dylan Baker and Su Lin Blodgett and Daum\'e, III, Hal and Jesse
Dodge and Ellie Evans and Sara Hooker and Yacine Jernite and
Alexandra Sasha Luccioni and Alberto Lusoli and Margaret Mitchell
and Jessica Newman and Marie-Therese Png and Andrew Strait and
Apostol Vassilev},
booktitle = {Preprint},
year = {2023},
url = {http://hal3.name/docs/#daume23impact},
}
Which Examples Should be Multiply Annotated? Active Learning When Annotators May Disagree
Connor Baumler, Anna Sotnikova and Hal Daumé III
Conference of the Association for Computational Linguistics (ACL), 2023
[Abstract] [BibTeX]
Linguistic annotations, especially for controversial topics like hate speech detection, are frequently contested due to annotator backgrounds and positionalities. In such situations, preserving this disagreement through the machine learning pipeline can be important for downstream use cases. However, capturing disagreement can increase annotation time and expense. Fortunately, for many tasks, not all examples are equally controversial; we develop an active learning approach, Disagreement Aware Active Learning (DAAL), that concentrates annotations on examples where model entropy and annotator entropy are the most different. Because we cannot know the true entropy of annotations on unlabeled examples, we estimate a model that predicts annotator entropy trained using very few multiply-labeled examples. We find that traditional uncertainty-based active learning underperforms simple passive learning on tasks with high levels of disagreement, but that our active learning approach is able to successfully improve on passive learning, reducing the number of annotations required by at least 24% on average across several datasets.
@inproceedings{daume23daal,
title = {Which Examples Should be Multiply Annotated? Active Learning When
Annotators May Disagree},
author = {Connor Baumler and Anna Sotnikova and Daum\'e, III, Hal},
booktitle = {Proceedings of the Conference of the Association for Computational
Linguistics (ACL)},
year = {2023},
url = {http://hal3.name/docs/#daume23daal},
}
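For the curious, here is a minimal, hypothetical sketch (Python/NumPy; not the paper's code) of the DAAL selection rule described in the abstract above: pick the unlabeled examples where the model's predictive entropy and the estimated annotator entropy differ most. The function names, array shapes, and the small annotator-entropy regressor are illustrative assumptions.

import numpy as np

def entropy(p, eps=1e-12):
    # Shannon entropy of each row of a probability matrix.
    p = np.clip(p, eps, 1.0)
    return -np.sum(p * np.log(p), axis=-1)

def daal_select(model_probs, predicted_annotator_entropy, k):
    # model_probs: (n, c) model class probabilities on the unlabeled pool.
    # predicted_annotator_entropy: (n,) estimates from a small regressor
    #   trained on the few examples that already have multiple labels.
    # Returns the indices of the k examples where model entropy and
    # estimated annotator entropy disagree the most.
    gap = np.abs(entropy(model_probs) - predicted_annotator_entropy)
    return np.argsort(-gap)[:k]

In a full active-learning loop, the selected examples would be sent for multiple annotations, the classifier and the entropy regressor retrained, and the selection repeated.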
Factual or Contextual? Disentangling Error Types in Entity Description Generation
Navita Goyal, Ani Nenkova and Hal Daumé III
Conference of the Association for Computational Linguistics (ACL), 2023
[Abstract] [BibTeX]
In the task of entity description generation, given a context and a specified entity, a model must describe that entity correctly and in a contextually-relevant way. In this task, as well as broader language generation tasks, the generation of a nonfactual description (factual error) versus an incongruous description (contextual error) is fundamentally different, yet often conflated. We develop an evaluation paradigm that enables us to disentangle these two types of errors in naturally occurring textual contexts. We find that factuality and congruity are often at odds, and that models specifically struggle with accurate descriptions of entities that are less familiar to people. This shortcoming of language models raises concerns around the trustworthiness of such models, since factual errors on less well-known entities are exactly those that a human reader will not recognize.
@inproceedings{daume23factual,
title = {Factual or Contextual? Disentangling Error Types in Entity Description
Generation},
author = {Navita Goyal and Ani Nenkova and Daum\'e, III, Hal},
booktitle = {Proceedings of the Conference of the Association for Computational
Linguistics (ACL)},
year = {2023},
url = {http://hal3.name/docs/#daume23factual},
}
More papers please!
Recent Talks:
AI UK: Doing better in data science – from algorithmic fairness to diversity
Anjali Mazumder, Shakir Mohamed, Danielle Belgrave, Maria De-Arteaga, and Hal Daumé III
The Alan Turing Institute AI UK Roadmap, March 2021
[Video]
Coded Bias Panel Discussion at the University of Maryland
Margrét Bjarnadóttir, Nicol Turner Lee, Deborah Raji, Adam Wenchel, and Hal Daumé III (moderator)
March, 2021
[Video]
Responsible AI Systems and Experiences
Abolfazl Asudeh (moderator), Hal Daumé III, Golnoosh Farnadi, Bernease Herman, Bill Howe (moderator), Yuval Moskovitch, Katie Shilton, and Jenn Wortman Vaughan
Panel at VLDB 2021
[Video]
Tech Ethics in a Changing World
Catherine Bannister, Mary Lacity, Cindy Moehring, and Hal Daumé III
Northwest Arkansas Tech Summit, 2021
[Video]
Language (Technology) Is Power: Exploring the Inherent Complexity of NLP Systems
Hal Daumé III and Sam Charrington (host)
TWIML AI Podcast, 2020
[Video]
More talks please!
Contact information:
email: me AT hal3 DOT name
skype: haldaume3
phone: 301-405-1073
twitter: haldaume3
office: IRB 4150
github: hal3
I can't reply to all
email from prospective students; please
read this before emailing me.
credits: design and font inspired by Seth Able's LoRD, some images converted to ANSI using ManyTools, original drawing of me by anonymous.
last updated on three july, two thousand twenty three.