You enter a
dark forest. Standing in front of you is:
A professor named Hal Daumé III (he/him).
He wields appointments in
Computer Science where he is a
Volpi-Cupal Professor, as well as
Language Science at
UMD
where he leads TRAILS, the Institute for Trustworthy AI in Law & Society
(in Spring 2024 he's teaching a gen-ed course You and I, and Generative AI;
past: Trustworthy ML (F23), AI (S23),
Human-AI Interaction (F22),
Just ML (F21)); he is also a Senior Principal Researcher in the machine learning and fairness
groups at Microsoft Research NYC.
He and his wonderful advisees
like to study
questions related to how to get machines to become more adept at
human language (and artificial intelligence tasks more broadly),
by developing models and algorithms that allow them
to learn from data. (Keywords: natural language processing and machine
learning.)
The two major questions that really drive their research these days are:
(1) how can we get computers to learn
through natural interaction with people/users?
and (2) how can we do this in a way that minimizes harms
in the learned models?
He's discussed interactive learning informally in a Talking Machines Podcast
and more technically in recent talks;
and has discussed fairness/bias in broad terms in a (now somewhat outdated) blog post.
He is the author of the online textbook A Course in Machine Learning,
which is fully open source.
Hal is super fortunate to be a member of, and have awesome colleagues in, the Computational
Linguistics and Information Processing Lab (which he formerly
directed),
the Human-Computer Interaction Lab,
and the Center for Machine Learning.
If you want to contact him, email is your best bet; you can
also find him as @haldaume3
on Twitter. Or, in person, in his office
(IRB 4134).
If you're a prospective grad student or grad applicant, please read
his FAQ to answer some common questions.
If you're thinking of inviting him for a talk or event, please ensure
that the event is organized in an inclusive manner (inclusion rider).
More generally, if you are organizing a conference, workshop or other
event, you may wish to read the NeurIPS D&I survey
results (joint with Katherine Heller),
Humberto Corona's collection of resources/advice,
or two blog posts on this topic.
I acknowledge that I live and work on the ancestral and unceded lands of the Piscataway People, who were among the first in the Western Hemisphere to encounter European colonists, as well as the lands of the Lenape and Nacotchtank people.
Recent Publications:
Large Language Models Help Humans Verify Truthfulness -- Except When They Are Convincingly Wrong
Chenglei Si, Navita Goyal, Sherry Tongshuang Wu, Chen Zhao, Shi Feng, Hal Daumé III and Jordan Boyd-Graber
NAACL, 2024
Large Language Models (LLMs) are increasingly used for accessing information on the web. Their truthfulness and factuality are thus of great interest. To help users make the right decisions about the information they're getting, LLMs should not only provide but also help users fact-check information. In this paper, we conduct experiments with 80 crowdworkers in total to compare language models with search engines (information retrieval systems) at facilitating fact-checking by human users. We prompt LLMs to validate a given claim and provide corresponding explanations. Users reading LLM explanations are significantly more efficient than using search engines with similar accuracy. However, they tend to over-rely on the LLMs when the explanation is wrong. To reduce over-reliance on LLMs, we ask LLMs to provide contrastive information - explain both why the claim is true and false, and then we present both sides of the explanation to users. This contrastive explanation mitigates users' over-reliance on LLMs, but cannot significantly outperform search engines. However, showing both search engine results and LLM explanations offers no complementary benefits as compared to search engines alone. Taken together, natural language explanations by LLMs may not be a reliable replacement for reading the retrieved passages yet, especially in high-stakes settings where over-relying on wrong AI explanations could lead to critical consequences.
@inproceedings{daume24truthfulness,
title = {Large Language Models Help Humans Verify Truthfulness -- Except When
They Are Convincingly Wrong},
author = {Chenglei Si and Navita Goyal and Sherry Tongshuang Wu and Chen Zhao and
Shi Feng and Daum\'e, III, Hal and Jordan Boyd-Graber},
booktitle = {NAACL},
year = {2024},
url = {http://hal3.name/docs/#daume24truthfulness},
}
How do Authors' Perceptions of their Papers Compare with Co-authors' Perceptions and Peer-review Decisions?
Charvi Rastogi, Ivan Stelmakh, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, Jennifer Wortman Vaughan, Zhenyu Xue, Hal Daumé III, Emma Pierson and Nihar B. Shah
PLOS One, 2024
How do author perceptions match up to the outcomes of the peer-review process and perceptions of others? In a top-tier computer science conference (NeurIPS 2021) with more than 23,000 submitting authors and 9,000 submitted papers, we survey the authors on three questions: (i) their predicted probability of acceptance for each of their papers, (ii) their perceived ranking of their own papers based on scientific contribution, and (iii) the change in their perception about their own papers after seeing the reviews. The salient results are: (1) Authors have roughly a three-fold overestimate of the acceptance probability of their papers: The median prediction is 70% for an approximately 25% acceptance rate. (2) Female authors exhibit a marginally higher (statistically significant) miscalibration than male authors; predictions of authors invited to serve as meta-reviewers or reviewers are similarly calibrated, but better than authors who were not invited to review. (3) Authors' relative ranking of scientific contribution of two submissions they made generally agrees (93%) with their predicted acceptance probabilities, but there is a notable 7% of responses where authors think their better paper will face a worse outcome. (4) The author-provided rankings disagreed with the peer-review decisions about a third of the time; when co-authors ranked their jointly authored papers, co-authors disagreed at a similar rate -- about a third of the time. (5) At least 30% of respondents of both accepted and rejected papers said that their perception of their own paper improved after the review process. The stakeholders in peer review should take these findings into account in setting their expectations from peer review.
@article{daume24perceptions,
title = {How do Authors' Perceptions of their Papers Compare with Co-authors'
Perceptions and Peer-review Decisions?},
author = {Charvi Rastogi and Ivan Stelmakh and Alina Beygelzimer and Yann N.
Dauphin and Percy Liang and Jennifer Wortman Vaughan and Zhenyu
Xue and Daum\'e, III, Hal and Emma Pierson and Nihar B. Shah},
journal = {PLOS One},
year = {2024},
url = {http://hal3.name/docs/#daume24perceptions},
}
A Randomized Controlled Trial on Anonymizing Reviewers to Each Other in Peer Review Discussions
Charvi Rastogi, Xiangchen Song, Zhijing Jin, Ivan Stelmakh, Hal Daumé III, Kun Zhang and Nihar B. Shah
PLOS One, 2024
Peer review often involves reviewers submitting their independent reviews, followed by a discussion among reviewers of each paper. A question among policymakers is whether the reviewers of a paper should be anonymous to each other during the discussion. We shed light on this by conducting a randomized controlled trial at the UAI 2022 conference. We randomly split the reviewers and papers into two conditions--one with anonymous discussions and the other with non-anonymous discussions, and conduct an anonymous survey of all reviewers, to address the following questions: 1. Do reviewers discuss more in one of the conditions? Marginally more in anonymous (n = 2281, p = 0.051). 2. Does seniority have more influence on final decisions when non-anonymous? Yes, the decisions are closer to senior reviewers' scores in the non-anonymous condition than in anonymous (n = 484, p = 0.04). 3. Are reviewers more polite in one of the conditions? No significant difference in politeness of reviewers' text-based responses (n = 1125, p = 0.72). 4. Do reviewers' self-reported experiences differ across the two conditions? No significant difference for each of the five questions asked (n = 132 and p > 0.3). 5. Do reviewers prefer one condition over the other? Yes, there is a weak preference for anonymous discussions (n = 159 and Cohen's d= 0.25). 6. What do reviewers consider important to make policy on anonymity among reviewers? Reviewers' feeling of safety in expressing their opinions was rated most important, while polite communication among reviewers was rated least important (n = 159). 7. Have reviewers experienced dishonest behavior due to non-anonymity in discussions? Yes, roughly 7% of respondents answered affirmatively (n = 167). Overall, this experiment reveals evidence supporting an anonymous discussion setup in the peer-review process, in terms of the evaluation criteria considered.
@article{daume24anonreview,
title = {A Randomized Controlled Trial on Anonymizing Reviewers to Each Other in
Peer Review Discussions},
author = {Charvi Rastogi and Xiangchen Song and Zhijing Jin and Ivan Stelmakh and
Daum\'e, III, Hal and Kun Zhang and Nihar B. Shah},
journal = {PLOS One},
year = {2024},
url = {http://hal3.name/docs/#daume24anonreview},
}
Multilingual large language models leak human stereotypes across language boundaries
Yang Trista Cao, Anna Sotnikova, Jieyu Zhao, Linda X. Zou, Rachel Rudinger and Hal Daumé III
Preprint, 2024
Multilingual large language models have been increasingly popular for their proficiency in processing and generating text across various languages. Previous research has shown that the presence of stereotypes and biases in monolingual large language models can be attributed to the nature of their training data, which is collected from humans and reflects societal biases. Multilingual language models undergo the same training procedure as monolingual ones, albeit with training data sourced from various languages. This raises the question: do stereotypes present in one social context leak across languages within the model? In our work, we first define the term "stereotype leakage" and propose a framework for its measurement. With this framework, we investigate how stereotypical associations leak across four languages: English, Russian, Chinese, and Hindi. To quantify the stereotype leakage, we employ an approach from social psychology, measuring stereotypes via group-trait associations. We evaluate human stereotypes and stereotypical associations manifested in multilingual large language models such as mBERT, mT5, and GPT-3.5. Our findings show a noticeable leakage of positive, negative, and non-polar associations across all languages. Notably, Hindi within multilingual models appears to be the most susceptible to influence from other languages, while Chinese is the least. Additionally, GPT-3.5 exhibits a better alignment with human scores than other models. WARNING: This paper contains model outputs which could be offensive in nature.
@misc{daume24leakage,
title = {Multilingual large language models leak human stereotypes across
language boundaries},
author = {Yang Trista Cao and Anna Sotnikova and Jieyu Zhao and Linda X. Zou and
Rachel Rudinger and Daum\'e, III, Hal},
howpublished = {Preprint},
year = {2024},
url = {http://hal3.name/docs/#daume24leakage},
}
Premier-TACO is a Few-Shot Policy Learner: Pretraining Multitask Representation via Temporal Action-Driven Contrastive Loss
Ruijie Zheng, Yongyuan Liang, Xiyao Wang, Shuang Ma, Hal Daumé III, Huazhe Xu, John Langford, Praveen Palanisamy, Kalyan Shankar Basu and Furong Huang
International Conference on Machine Learning (ICML), 2024
We present Premier-TACO, a multitask feature representation learning approach designed to improve few-shot policy learning efficiency in sequential decision-making tasks. Premier-TACO leverages a subset of multitask offline datasets for pretraining a general feature representation, which captures critical environmental dynamics and is fine-tuned using minimal expert demonstrations. It advances the temporal action contrastive learning (TACO) objective, known for state-of-the-art results in visual control tasks, by incorporating a novel negative example sampling strategy. This strategy is crucial in significantly boosting TACO's computational efficiency, making large-scale multitask offline pretraining feasible. Our extensive empirical evaluation in a diverse set of continuous control benchmarks including Deepmind Control Suite, MetaWorld, and LIBERO demonstrates Premier-TACO's effectiveness in pretraining visual representations, significantly enhancing few-shot imitation learning of novel tasks. Our code, pretraining data, as well as pretrained model checkpoints will be released at this https URL. Our project webpage is at this https URL.
@inproceedings{daume24premier,
title = {Premier-TACO is a Few-Shot Policy Learner: Pretraining Multitask
Representation via Temporal Action-Driven Contrastive Loss},
author = {Ruijie Zheng and Yongyuan Liang and Xiyao Wang and Shuang Ma and
Daum\'e, III, Hal and Huazhe Xu and John Langford and Praveen
Palanisamy and Kalyan Shankar Basu and Furong Huang},
booktitle = {Proceedings of the International Conference on Machine Learning
(ICML)},
year = {2024},
url = {http://hal3.name/docs/#daume24premier},
}
Recent Talks:
AI UK: Doing better in data science – from algorithmic fairness to diversity
Anjali Mazumder, Shakir Mohamed, Danielle Belgrave, Maria De-Arteaga, and Hal Daumé III
The Alan Turing Institute AI UK Roadmap, March 2021
[Video]
Coded Bias Panel Discussion at the University of Maryland
Margrét Bjarnadóttir, Nicol Turner Lee, Deborah Raji, Adam Wenchel, and Hal Daumé III (moderator)
March, 2021
[Video]
Responsible AI Systems and Experiences
Abolfazl Asudeh (moderator), Hal Daumé III, Golnoosh Farnadi, Bernease Herman, Bill Howe (moderator), Yuval Moskovitch, Katie Shilton, and Jenn Wortman Vaughan
Panel at VLDB 2021
[Video]
Tech Ethics in a Changing World
Catherine Bannister, Mary Lacity, Cindy Moehring, and Hal Daumé III
Northwest Arkansas Tech Summit, 2021
[Video]
Language (Technology) Is Power: Exploring the Inherent Complexity of NLP Systems
Hal Daumé III and Sam Charrington (host)
TWIML AI Podcast, 2020
[Video]
Contact information:
email: me AT hal3 DOT name
skype: haldaume3
phone: 301-405-1073
twitter: haldaume3
office: IRB 4150
github: hal3
I can't reply to all
prospective students' emails; please
read this before emailing me.
credits: design and font inspired by Seth Able's LoRD, some images converted to ANSI using ManyTools, original drawing of me by anonymous.
last updated on twelve may, two thousand twenty four.