You enter a
dark forest. Standing in front of you is:
A professor named Hal Daumé III (he/him).
He wields appointments in
Computer Science where he is a
Perotto Professor, as well as
Language Science at
UMD (
in Spring 2023 he's teaching AI;
in Fall 2022 he taught Human-AI Interaction and
in Fall 2021, Just
Machine Learning); he is also a Senior Principal Researcher in the machine learning and fairness
groups at Microsoft Research NYC.
He and his wonderful advisees
like to study
questions related to how to get machines to become more adept at
human language (and artificial intelligence tasks more broadly),
by developing models and algorithms that allow them
to learn from data. (Keywords: natural language processing and machine
learning.)
The two major questions that really drive their research these days are:
(1) how can we get computers to learn
through natural interaction with people/users?
and (2) how can we do this in a way that minimizes harms
in the learned models?
He's discussed interactive learning informally in a Talking Machines Podcast
and more technically in recent talks;
and has discussed fairness/bias in broad terms in a (now somewhat outdated) blog post.
He is the author of the online textbook A Course in Machine Learning,
which is fully open source.
Hal is super fortunate to be a member of, and have awesome colleagues in, the Computational
Linguistics and Information Processing Lab (which he formerly
directed),
the Human-Computer Interaction Lab,
and the Center for Machine Learning.
If you want to contact him, email is your best bet; you can
also find him as @haldaume3
on Twitter. Or, in person, in his office
(IRB 4150).
If you're a prospective grad student or grad applicant, please read
his FAQ to answer some common questions.
If you're thinking of inviting him for a talk or event, please ensure
that the event is organized in an inclusive manner (inclusion rider).
More generally, if you are organizing a conference, workshop or other
event, you may wish to read the NeurIPS D&I survey
results (joint with Katherine Heller),
Humberto Corona's collection of resources/advice,
or two blog posts on this topic.
I acknowledge that I live and work on the ancestral and unceded lands of the Piscataway People, who were among the first in the Western Hemisphere to encounter European colonists, as well as the lands of the Lenape and Nacotchtank people.
Recent Publications:
Promoting Fairness in Learned Models by Learning to Active Learn under Parity Constraints
Amr Sharaf and Hal Daumé III
FAccT, 2022
Machine learning models can have consequential effects when used to automate decisions, and disparities between groups of people in the error rates of those decisions can lead to harms suffered more by some groups than others. Past algorithmic approaches aim to enforce parity across groups given a fixed set of training data; instead, we ask: what if we can gather more data to mitigate disparities? We develop a meta-learning algorithm for parity-constrained active learning that learns a policy to decide which labels to query so as to maximize accuracy subject to parity constraints. To optimize the active learning policy, our proposed algorithm formulates the parity-constrained active learning task as a bi-level optimization problem. The inner level corresponds to training a classifier on a subset of labeled examples. The outer level corresponds to updating the selection policy choosing this subset to achieve a desired fairness and accuracy behavior on the trained classifier. To solve this constrained bi-level optimization problem, we employ the Forward-Backward Splitting optimization method. Empirically, across several parity metrics and classification tasks, our approach outperforms alternatives by a large margin.
@inproceedings{daume22panda,
title = {Promoting Fairness in Learned Models by Learning to Active Learn under
Parity Constraints},
author = {Amr Sharaf and Daum\'e, III, Hal},
booktitle = {FAccT},
year = {2022},
url = {http://hal3.name/docs/#daume22panda},
}
Toward Gender-Inclusive Coreference Resolution: An Analysis of Gender and Bias Throughout the Machine Learning Lifecycle
Yang Trista Cao and Hal Daumé III
Computational Linguistics, 2022
Correctly resolving textual mentions of people fundamentally entails making inferences about those people. Such inferences raise the risk of systematic biases in coreference resolution systems, including biases that can harm binary and non-binary trans and cis stakeholders. To better understand such biases, we foreground nuanced conceptualizations of gender from sociology and sociolinguistics, and investigate where in the machine learning pipeline such biases can enter a coreference resolution system. We inspect many existing data sets for trans-exclusionary biases, and develop two new data sets for interrogating bias in both crowd annotations and in existing coreference resolution systems. Through these studies, conducted on English text, we confirm that without acknowledging and building systems that recognize the complexity of gender, we will build systems that fail for: quality of service …
@article{daume22coref,
title = {Toward Gender-Inclusive Coreference Resolution: An Analysis of Gender
and Bias Throughout the Machine Learning Lifecycle},
author = {Yang Trista Cao and Daum\'e, III, Hal},
journal = {Computational Linguistics},
year = {2022},
url = {http://hal3.name/docs/#daume22coref},
}
Heterogeneous Supervised Topic Models
Dhanya Sridhar, Hal Daumé III and David Blei
TACL, 2022
Researchers in the social sciences are often interested in the relationship between text and an outcome of interest, where the goal is to both uncover latent patterns in the text and predict outcomes for unseen texts. To this end, this paper develops the heterogeneous supervised topic models (HSTM), a probabilistic approach to text analysis and prediction. HSTMs posit a joint model of text and outcomes to find heterogeneous patterns that help with both text analysis and prediction. The main benefit of HSTMs is that they capture heterogeneity in the relationship between text and the outcome across latent topics. To fit HSTMs, we develop a variational inference algorithm based on the auto-encoding variational Bayes framework. We study the performance of HSTMs on eight datasets and find that they consistently outperform related methods, including fine-tuned black box models. Finally, we apply HSTMs to analyze news articles labeled with pro- or anti-tone. We find evidence of differing language used to signal a pro- and anti-tone.
@article{daume22hstm,
title = {Heterogeneous Supervised Topic Models},
author = {Dhanya Sridhar and Daum\'e, III, Hal and David Blei},
journal = {TACL},
year = {2022},
url = {http://hal3.name/docs/#daume22hstm},
}
What's Different between Visual Question Answering for Machine "Understanding" Versus for Accessibility?
Yang Trista Cao, Kyle Seelman, Kyungjun Lee and Hal Daumé III
AACL-IJCNLP, 2022
🏆 Best Theme Paper
In visual question answering (VQA), a machine must answer a question given an associated image. Recently, accessibility researchers have explored whether VQA can be deployed in a real-world setting where users with visual impairments learn about their environment by capturing their visual surroundings and asking questions. However, most of the existing benchmarking datasets for VQA focus on machine “understanding” and it remains unclear how progress on those datasets corresponds to improvements in this real-world use case. We aim to answer this question by evaluating discrepancies between machine “understanding” datasets (VQA-v2) and accessibility datasets (VizWiz) by evaluating a variety of VQA models. Based on our findings, we discuss opportunities and challenges in VQA for accessibility and suggest directions for future work.
@inproceedings{daume22vqa,
title = {What's Different between Visual Question Answering for Machine
"Understanding" Versus for Accessibility?},
author = {Yang Trista Cao and Kyle Seelman and Kyungjun Lee and Daum\'e, III,
Hal},
booktitle = {AACL-IJCNLP},
year = {2022},
url = {http://hal3.name/docs/#daume22vqa},
}
A framework for learning to request rich and contextually useful information from humans
Khanh Nguyen, Yonatan Bisk and Hal Daumé III
International Conference on Machine Learning (ICML), 2022
When deployed, AI agents will encounter problems that are beyond their autonomous problem-solving capabilities. Leveraging human assistance can help agents overcome their inherent limitations and robustly cope with unfamiliar situations. We present a general interactive framework that enables an agent to request and interpret rich, contextually useful information from an assistant that has knowledge about the task and the environment. We demonstrate the practicality of our framework on a simulated human-assisted navigation problem. Aided with an assistance-requesting policy learned by our method, a navigation agent achieves up to a 7× improvement in success rate on tasks that take place in previously unseen environments, compared to fully autonomous behavior. We show that the agent can take advantage of different types of information depending on the context, and analyze the benefits and challenges of learning the assistance-requesting policy when the assistant can recursively decompose tasks into subtasks.
@inproceedings{daume22request,
title = {A framework for learning to request rich and contextually useful
information from humans},
author = {Khanh Nguyen and Yonatan Bisk and Daum\'e, III, Hal},
booktitle = {Proceedings of the International Conference on Machine Learning
(ICML)},
year = {2022},
url = {http://hal3.name/docs/#daume22request},
}
More papers please!
Recent Talks:
AI UK: Doing better in data science – from algorithmic fairness to diversity
Anjali Mazumder, Shakir Mohamed, Danielle Belgrave, Maria De-Arteaga, and Hal Daumé III
The Alan Turing Institute AI UK Roadmap, March 2021
[Video]
Coded Bias Panel Discussion at the University of Maryland
Margrét Bjarnadóttir, Nicol Turner Lee, Deborah Raji, Adam Wenchel, and Hal Daumé III (moderator)
March, 2021
[Video]
Responsible AI Systems and Experiences
Abolfazl Asudeh (moderator), Hal Daumé III, Golnoosh Farnadi, Bernease Herman, Bill Howe (moderator), Yuval Moskovitch, Katie Shilton, and Jenn Wortman Vaughan
Panel at VLDB 2021
[Video]
Tech Ethics in a Changing World
Catherine Bannister, Mary Lacity, Cindy Moehring, and Hal Daumé III
Northwest Arkansas Tech Summit, 2021
[Video]
Language (Technology) Is Power: Exploring the Inherent Complexity of NLP Systems
Hal Daumé III and Sam Charrington (host)
TWIML AI Podcast, 2020
[Video]
More talks please!
Contact information:
email: me AT hal3 DOT name
skype: haldaume3
phone: 301-405-1073
twitter: haldaume3
office: IRB 4150
github: hal3
I can't reply to every email from
prospective students; please
read this before emailing me.
credits: design and font inspired by Seth Able's LoRD, some images converted to ANSI using ManyTools, original drawing of me by anonymous.
last updated on twenty three february, two thousand twenty three.