You enter a
dark forest. Standing in front of you is:
A professor named Hal Daumé III (he/him).
He wields appointments in
Computer Science where he is a
Volpi-Cupal Professor, as well as
Language Science at
UMD
where he leads TRAILS, the Institute for Trustworthy AI in Law & Society
(in Fall 2023 he's teaching Trustworthy ML;
past: AI (S23),
Human-AI Interaction (F22),
Just ML (F21)); he is also a Senior Principal Researcher in the machine
learning and fairness groups at Microsoft Research NYC.
He and his wonderful advisees
like to study
questions related to how to get machines to become more adept at
human language (and artificial intelligence tasks more broadly),
by developing models and algorithms that allow them
to learn from data. (Keywords: natural language processing and machine
learning.)
The two major questions that really drive their research these days are:
(1) how can we get computers to learn
through natural interaction with people/users?
and (2) how can we do this in a way that minimizes harms
in the learned models?
He's discussed interactive learning informally on the Talking Machines podcast
and more technically in recent talks,
and has discussed fairness/bias in broad terms in a (now somewhat outdated) blog post.
He is the author of the online textbook A Course in Machine Learning,
which is fully open source.
Hal is super fortunate to be a member of, and have awesome colleagues in, the Computational
Linguistics and Information Processing Lab (which he formerly
directed),
the Human-Computer Interaction Lab,
and the Center for Machine Learning.
If you want to contact him, email is your best bet; you can
also find him at @haldaume3
on Twitter. Or, in person, in his office
(IRB 4150).
If you're a prospective grad student or grad applicant, please read
his FAQ, which answers some common questions.
If you're thinking of inviting him for a talk or event, please ensure
that the event is organized in an inclusive manner (inclusion rider).
More generally, if you are organizing a conference, workshop or other
event, you may wish to read the NeurIPS D&I survey
results (joint with Katherine Heller),
Humberto Corona's collection of resources/advice,
or two blog posts on this topic.
I acknowledge that I live and work on the ancestral and unceded lands of the Piscataway People, who were among the first in the Western Hemisphere to encounter European colonists, as well as the lands of the Lenape and Nacotchtank people.
Recent Publications:
ASL Citizen: A Community-Sourced Dataset for Advancing Isolated Sign Language Recognition
Aashaka Desai, Lauren Berger, Fyodor O. Minakov, Vanessa Milan, Chinmay Singh, Kriston Pumphrey, Richard E. Ladner, Hal Daumé III, Alex X. Lu, Naomi Caselli and Danielle Bragg
preprint, 2023
[Abstract] [BibTeX]
Sign languages are used as a primary language by approximately 70 million D/deaf people world-wide. However, most communication technologies operate in spoken and written languages, creating inequities in access. To help tackle this problem, we release ASL Citizen, the largest Isolated Sign Language Recognition (ISLR) dataset to date, collected with consent and containing 83,912 videos for 2,731 distinct signs filmed by 52 signers in a variety of environments. We propose that this dataset be used for sign language dictionary retrieval for American Sign Language (ASL), where a user demonstrates a sign to their own webcam with the aim of retrieving matching signs from a dictionary. We show that training supervised machine learning classifiers with our dataset greatly advances the state-of-the-art on metrics relevant for dictionary retrieval, achieving, for instance, 62% accuracy and a recall-at-10 of 90%, evaluated entirely on videos of users who are not present in the training or validation sets.
@inproceedings{daume23aslcitizen,
title = {ASL Citizen: A Community-Sourced Dataset for Advancing Isolated Sign
Language Recognition},
author = {Aashaka Desai and Lauren Berger and Fyodor O. Minakov and Vanessa Milan
and Chinmay Singh and Kriston Pumphrey and Richard E. Ladner and
Daum\'e, III, Hal and Alex X. Lu and Naomi Caselli and Danielle
Bragg},
booktitle = {preprint},
year = {2023},
url = {http://hal3.name/docs/#daume23aslcitizen},
}
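(Curious what the recall-at-10 number in the abstract above means in practice? Here is a minimal sketch, not the paper's code, of how that retrieval metric could be computed from a matrix of query-to-dictionary similarity scores; the array shapes and names are illustrative assumptions.)

# Hypothetical sketch of recall-at-k for sign dictionary retrieval.
# Not from the ASL Citizen codebase; shapes and names are assumptions.
import numpy as np

def recall_at_k(scores, true_sign_ids, k=10):
    """scores: (num_queries, num_dictionary_signs) similarity matrix.
    true_sign_ids: (num_queries,) index of the correct sign per query.
    Returns the fraction of queries whose correct sign is in the top k."""
    # Indices of the k highest-scoring dictionary entries per query.
    top_k = np.argsort(-scores, axis=1)[:, :k]
    hits = (top_k == true_sign_ids[:, None]).any(axis=1)
    return hits.mean()

# Toy usage: 5 queries over a 100-sign dictionary.
rng = np.random.default_rng(0)
scores = rng.normal(size=(5, 100))
labels = rng.integers(0, 100, size=5)
print(recall_at_k(scores, labels, k=10))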
Promoting Fairness in Learned Models by Learning to Active Learn under Parity Constraints
Amr Sharaf and Hal Daumé III
FAccT, 2022
[Abstract] [BibTeX]
Machine learning models can have consequential effects when used to automate decisions, and disparities between groups of people in the error rates of those decisions can lead to harms suffered more by some groups than others. Past algorithmic approaches aim to enforce parity across groups given a fixed set of training data; instead, we ask: what if we can gather more data to mitigate disparities? We develop a meta-learning algorithm for parity-constrained active learning that learns a policy to decide which labels to query so as to maximize accuracy subject to parity constraints. To optimize the active learning policy, our proposed algorithm formulates the parity-constrained active learning task as a bi-level optimization problem. The inner level corresponds to training a classifier on a subset of labeled examples. The outer level corresponds to updating the selection policy choosing this subset to achieve a desired fairness and accuracy behavior on the trained classifier. To solve this constrained bi-level optimization problem, we employ the Forward-Backward Splitting optimization method. Empirically, across several parity metrics and classification tasks, our approach outperforms alternatives by a large margin.
@inproceedings{daume22panda,
title = {Promoting Fairness in Learned Models by Learning to Active Learn under
Parity Constraints},
author = {Amr Sharaf and Daum\'e, III, Hal},
booktitle = {FAccT},
year = {2022},
url = {http://hal3.name/docs/#daume22panda},
}
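(The Forward-Backward Splitting method named in the abstract above is a general proximal-gradient scheme. As a reminder of the basic one-level update it builds on, here is a minimal sketch on a toy lasso-style problem; this is not the paper's bi-level active-learning algorithm, and all names here are illustrative.)

# Generic Forward-Backward Splitting (proximal gradient) on a toy
# objective: 0.5*||Ax - b||^2 + lam*||x||_1.
import numpy as np

def soft_threshold(x, t):
    # Proximal operator of t*||.||_1 (the "backward" step).
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def fbs(A, b, lam=0.1, step=None, iters=500):
    if step is None:
        step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1/L for the smooth part
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)                          # forward (gradient) step
        x = soft_threshold(x - step * grad, step * lam)   # backward (prox) step
    return x

# Toy usage: recover a sparse vector from noisy linear measurements.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 20))
x_true = np.zeros(20); x_true[:3] = [2.0, -1.5, 1.0]
b = A @ x_true + 0.01 * rng.normal(size=50)
print(np.round(fbs(A, b), 2))

The forward step handles the smooth loss with a plain gradient, and the backward step handles the nonsmooth term exactly through its prox; the paper applies this splitting idea to its constrained bi-level problem rather than to a lasso.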
Toward Gender-Inclusive Coreference Resolution: An Analysis of Gender and Bias Throughout the Machine Learning Lifecycle
Yang Trista Cao and Hal Daumé III
Computational Linguistics, 2022
[Abstract] [BibTeX]
Correctly resolving textual mentions of people fundamentally entails making inferences about those people. Such inferences raise the risk of systematic biases in coreference resolution systems, including biases that can harm binary and non-binary trans and cis stakeholders. To better understand such biases, we foreground nuanced conceptualizations of gender from sociology and sociolinguistics, and investigate where in the machine learning pipeline such biases can enter a coreference resolution system. We inspect many existing data sets for trans-exclusionary biases, and develop two new data sets for interrogating bias in both crowd annotations and in existing coreference resolution systems. Through these studies, conducted on English text, we confirm that without acknowledging and building systems that recognize the complexity of gender, we will build systems that fail for: quality of service …
@article{daume22coref,
title = {Toward Gender-Inclusive Coreference Resolution: An Analysis of Gender
and Bias Throughout the Machine Learning Lifecycle},
author = {Yang Trista Cao and Daum\'e, III, Hal},
journal = {Computational Linguistics},
year = {2022},
url = {http://hal3.name/docs/#daume22coref},
}
Heterogeneous Supervised Topic Models
Dhanya Sridhar, Hal Daumé III and David Blei
TACL, 2022
[Abstract] [BibTeX]
Researchers in the social sciences are often interested in the relationship between text and an outcome of interest, where the goal is to both uncover latent patterns in the text and predict outcomes for unseen texts. To this end, this paper develops the heterogeneous supervised topic models (HSTM), a probabilistic approach to text analysis and prediction. HSTMs posit a joint model of text and outcomes to find heterogeneous patterns that help with both text analysis and prediction. The main benefit of HSTMs is that they capture heterogeneity in the relationship between text and the outcome across latent topics. To fit HSTMs, we develop a variational inference algorithm based on the auto-encoding variational Bayes framework. We study the performance of HSTMs on eight datasets and find that they consistently outperform related methods, including fine-tuned black box models. Finally, we apply HSTMs to analyze news articles labeled with pro- or anti-tone. We find evidence of differing language used to signal a pro- and anti-tone.
@article{daume22hstm,
title = {Heterogeneous Supervised Topic Models},
author = {Dhanya Sridhar and Daum\'e, III, Hal and David Blei},
journal = {TACL},
year = {2022},
url = {http://hal3.name/docs/#daume22hstm},
}
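(For reference, the auto-encoding variational Bayes framework mentioned in the abstract above maximizes an evidence lower bound. In generic notation, not the paper's exact parameterization of text x, outcome y, and latent topics z:)

\mathcal{L}(\theta, \phi) = \mathbb{E}_{q_\phi(z \mid x)}\big[ \log p_\theta(x, y \mid z) \big] - \mathrm{KL}\big( q_\phi(z \mid x) \,\|\, p(z) \big) \le \log p_\theta(x, y)

Maximizing this bound jointly fits the generative parameters \theta and the inference network \phi.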
What's Different between Visual Question Answering for Machine "Understanding" Versus for Accessibility?
Yang Trista Cao, Kyle Seelman, Kyungjun Lee and Hal Daumé III
AACL-IJCNLP, 2022
🏆 Best Theme Paper
[Abstract] [BibTeX]
In visual question answering (VQA), a machine must answer a question given an associated image. Recently, accessibility researchers have explored whether VQA can be deployed in a real-world setting where users with visual impairments learn about their environment by capturing their visual surroundings and asking questions. However, most of the existing benchmarking datasets for VQA focus on machine “understanding” and it remains unclear how progress on those datasets corresponds to improvements in this real-world use case. We aim to answer this question by evaluating discrepancies between machine “understanding” datasets (VQA-v2) and accessibility datasets (VizWiz) by evaluating a variety of VQA models. Based on our findings, we discuss opportunities and challenges in VQA for accessibility and suggest directions for future work.
@inproceedings{daume22vqa,
title = {What's Different between Visual Question Answering for Machine
"Understanding" Versus for Accessibility?},
author = {Yang Trista Cao and Kyle Seelman and Kyungjun Lee and Daum\'e, III,
Hal},
booktitle = {AACL-IJCNLP},
year = {2022},
url = {http://hal3.name/docs/#daume22vqa},
}
More papers please!
Recent Talks:
AI UK: Doing better in data science – from algorithmic fairness to diversity
Anjali Mazumder, Shakir Mohamed, Danielle Belgrave, Maria De-Arteaga, and Hal Daumé III
The Alan Turing Institute AI UK Roadmap, March 2021
[Video]
Coded Bias Panel Discussion at the University of Maryland
Margrét Bjarnadóttir, Nicol Turner Lee, Deborah Raji, Adam Wenchel, and Hal Daumé III (moderator)
March, 2021
[Video]
Responsible AI Systems and Experiences
Abolfazl Asudeh (moderator), Hal Daumé III, Golnoosh Farnadi, Bernease Herman, Bill Howe (moderator), Yuval Moskovitch, Katie Shilton, and Jenn Wortman Vaughan
Panel at VLDB 2021
[Video]
Tech Ethics in a Changing World
Catherine Bannister, Mary Lacity, Cindy Moehring, and Hal Daumé III
Northwest Arkansas Tech Summit, 2021
[Video]
Language (Technology) Is Power: Exploring the Inherent Complexity of NLP Systems
Hal Daumé III and Sam Charrington (host)
TWIML AI Podcast, 2020
[Video]
More talks please!
Contact information:
email: me AT hal3 DOT name
skype: haldaume3
phone: 301-405-1073
twitter: haldaume3
office: IRB 4150
github: hal3
I can't reply to every
prospective student's email; please
read this before emailing me.
credits: design and font inspired by Seth Able's LoRD, some images converted to ANSI using ManyTools, original drawing of me by anonymous.
last updated on seven may, two thousand twenty three.