Natural Language Processing
Understanding the Impacts of Language Technologies’ Performance Disparities on African American Language Speakers
Jay Cunningham, Su Lin Blodgett, Michael Madaio, Hal Daumé III, Christina Harrington and Hanna Wallach
Conference of the Association for Computational Linguistics (ACL), 2024
[Abstract] [BibTeX]
This paper examines the experiences of African American Language (AAL) speakers when using language technologies. Previous work has used quantitative methods to uncover performance disparities between AAL speakers and White Mainstream English speakers when using language technologies, but has not sought to understand the impacts of these performance disparities on AAL speakers. Through interviews with 19 AAL speakers, we focus on understanding such impacts in a contextualized and human-centered manner. We find that AAL speakers often undertake invisible labor of adapting their speech patterns to successfully use language technologies, and they make connections between failures of language technologies for AAL speakers and a lack of inclusion of AAL speakers in language technology design processes and datasets. Our findings suggest that NLP researchers and practitioners should invest in developing contextualized and human-centered evaluations of language technologies that seek to understand the impacts of performance disparities on speakers of underrepresented languages and language varieties.
@inproceedings{daume24aal,
title = {Understanding the Impacts of Language Technologies’ Performance
Disparities on African American Language Speakers},
author = {Jay Cunningham and Su Lin Blodgett and Michael Madaio and Daum\'e, III,
Hal and Christina Harrington and Hanna Wallach},
booktitle = {Proceedings of the Conference of the Association for Computational
Linguistics (ACL)},
year = {2024},
url = {http://hal3.name/docs/#daume24aal},
}
What Else Do I Need to Know? The Effect of Background Information on Users' Reliance on QA Systems
Navita Goyal, Eleftheria Briakou, Amanda Liu, Connor Baumler, Claire Bonial, Jeffrey Micher, Clare R. Voss, Marine Carpuat and Hal Daumé III
EMNLP, 2023
[Abstract] [BibTeX]
NLP systems have shown impressive performance at answering questions by retrieving relevant context. However, with increasingly large models, it is impossible and often undesirable to constrain models' knowledge or reasoning to only the retrieved context. This leads to a mismatch between the information that the models access to derive the answer and the information that is available to the user to assess the model's predicted answer. In this work, we study how users interact with QA systems in the absence of sufficient information to assess their predictions. Further, we ask whether adding the requisite background helps mitigate users' over-reliance on predictions. Our study reveals that users rely on model predictions even in the absence of sufficient information needed to assess the model's correctness. Providing the relevant background, however, helps users better catch model errors, reducing over-reliance on incorrect predictions. On the flip side, background information also increases users' confidence in their accurate as well as inaccurate judgments. Our work highlights that supporting users' verification of QA predictions is an important, yet challenging, problem.
@inproceedings{daume23background,
title = {What Else Do I Need to Know? The Effect of Background Information on
Users' Reliance on QA Systems},
author = {Navita Goyal and Eleftheria Briakou and Amanda Liu and Connor Baumler
and Claire Bonial and Jeffrey Micher and Clare R. Voss and Marine
Carpuat and Daum\'e, III, Hal},
booktitle = {EMNLP},
year = {2023},
url = {http://hal3.name/docs/#daume23background},
}
Hallucination Detection for Grounded Instruction Generation
Lingjun Zhao, Khanh Nguyen and Hal Daumé III
EMNLP (Findings), 2023
[Abstract] [BibTeX]
We investigate the problem of generating instructions to guide humans to navigate in simulated residential environments. A major issue with current models is hallucination: they generate references to actions or objects that are inconsistent with what a human follower would perform or encounter along the described path. We develop a model that detects these hallucinated references by adopting a model pre-trained on a large corpus of image-text pairs, and fine-tuning it with a contrastive loss that separates correct instructions from instructions containing synthesized hallucinations. Our final model outperforms several baselines, including using word probability estimated by the instruction-generation model, and supervised models based on LSTM and Transformer.
@inproceedings{daume23hallucination,
title = {Hallucination Detection for Grounded Instruction Generation},
author = {Lingjun Zhao and Khanh Nguyen and Daum\'e, III, Hal},
booktitle = {EMNLP (Findings)},
year = {2023},
url = {http://hal3.name/docs/#daume23hallucination},
}
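The contrastive objective described above is easy to picture with a short sketch. The following is a minimal PyTorch illustration, not the authors' implementation: the scoring model and data are placeholders, and only the hinge-style separation of correct from hallucinated instructions is shown.

# Minimal sketch (not the paper's released code) of a contrastive loss that
# pushes a model's score for a correct instruction above its score for an
# instruction containing a synthesized hallucination.
import torch
import torch.nn.functional as F

def contrastive_hallucination_loss(score_correct, score_hallucinated, margin=1.0):
    # Hinge-style contrastive loss: require a gap of at least `margin`
    # between the correct and hallucinated instruction scores.
    return F.relu(margin - (score_correct - score_hallucinated)).mean()

# Toy usage with random scores standing in for the fine-tuned
# image-text model's compatibility scores.
correct = torch.randn(8, requires_grad=True)
hallucinated = torch.randn(8)
loss = contrastive_hallucination_loss(correct, hallucinated)
loss.backward()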
ASL Citizen: A Community-Sourced Dataset for Advancing Isolated Sign Language Recognition
Aashaka Desai, Lauren Berger, Fyodor O. Minakov, Vanessa Milan, Chinmay Singh, Kriston Pumphrey, Richard E. Ladner, Hal Daumé III, Alex X. Lu, Naomi Caselli and Danielle Bragg
NeurIPS (Data & Benchmarks track), 2023
[Abstract] [BibTeX]
Sign languages are used as a primary language by approximately 70 million D/deaf people world-wide. However, most communication technologies operate in spoken and written languages, creating inequities in access. To help tackle this problem, we release ASL Citizen, the largest Isolated Sign Language Recognition (ISLR) dataset to date, collected with consent and containing 83,912 videos for 2,731 distinct signs filmed by 52 signers in a variety of environments. We propose that this dataset be used for sign language dictionary retrieval for American Sign Language (ASL), where a user demonstrates a sign to their own webcam with the aim of retrieving matching signs from a dictionary. We show that training supervised machine learning classifiers with our dataset greatly advances the state-of-the-art on metrics relevant for dictionary retrieval, achieving, for instance, 62% accuracy and a recall-at-10 of 90%, evaluated entirely on videos of users who are not present in the training or validation sets.
@inproceedings{daume23aslcitizen,
title = {ASL Citizen: A Community-Sourced Dataset for Advancing Isolated Sign
Language Recognition},
author = {Aashaka Desai and Lauren Berger and Fyodor O. Minakov and Vanessa Milan
and Chinmay Singh and Kriston Pumphrey and Richard E. Ladner and
Daum\'e, III, Hal and Alex X. Lu and Naomi Caselli and Danielle
Bragg},
booktitle = {NeurIPS (Data \& Benchmarks track)},
year = {2023},
url = {http://hal3.name/docs/#daume23aslcitizen},
}
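For concreteness, the dictionary-retrieval metrics quoted above (accuracy and recall-at-10) can be computed as below. This is an illustrative sketch under assumed inputs (a score matrix from any classifier), not the dataset's official evaluation code.

# Illustrative metric computation; `scores` and `gold` are assumed inputs.
import numpy as np

def retrieval_metrics(scores, gold, k=10):
    # scores: (n_queries, n_signs) classifier scores per query video
    # gold:   (n_queries,) index of the correct sign for each query
    ranking = np.argsort(-scores, axis=1)                  # best-first sign indices
    top1 = ranking[:, 0] == gold                           # accuracy
    topk = (ranking[:, :k] == gold[:, None]).any(axis=1)   # recall-at-k
    return top1.mean(), topk.mean()

scores = np.random.rand(100, 2731)   # 2,731 distinct signs, as in ASL Citizen
gold = np.random.randint(0, 2731, size=100)
accuracy, recall_at_10 = retrieval_metrics(scores, gold)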
A Rose by Any Other Name would not Smell as Sweet: Social Bias in Name Mistranslations
Sandra Sandoval, Jieyu Zhao, Marine Carpuat and Hal Daumé III
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
[Abstract] [BibTeX]
We ask the question: Are there widespread disparities in machine translations of names across race/ethnicity, and gender? We hypothesize that the translation quality of names and surrounding context will be lower for names associated with US racial and ethnic minorities due to these systems’ tendencies to standardize language to predominant language patterns. We develop a dataset of names that are strongly demographically aligned and propose a translation evaluation procedure based on round-trip translation. We analyze the effect of name demographics on translation quality using generalized linear mixed effects models and find that the ability of translation systems to correctly translate female-associated names is significantly lower than for male-associated names. This effect is particularly pronounced for female-associated names that are also associated with racial (Black) and ethnic (Hispanic) minorities. This disparity in translation quality between social groups for something as personal as someone’s name has significant implications for people’s professional, personal and cultural identities, self-worth and ease of communication. Our findings suggest that more MT research is needed to improve the translation of names and to provide high-quality service for users regardless of gender, race, and ethnicity.
@inproceedings{daume23rose,
title = {A Rose by Any Other Name would not Smell as Sweet: Social Bias in Name
Mistranslations},
author = {Sandra Sandoval and Jieyu Zhao and Marine Carpuat and Daum\'e, III,
Hal},
booktitle = {Proceedings of the Conference on Empirical Methods in Natural
Language Processing (EMNLP)},
year = {2023},
url = {http://hal3.name/docs/#daume23rose},
}
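The round-trip evaluation idea is simple enough to sketch. The snippet below is a hedged illustration: `translate` is a hypothetical stand-in for any MT system (the paper's actual procedure also scores the surrounding context and fits mixed-effects models), and the name check is deliberately crude.

# Hypothetical round-trip check: translate English -> pivot -> English and
# test whether the name survives. `translate` is a placeholder, not a real API.
def translate(text: str, src: str, tgt: str) -> str:
    raise NotImplementedError("plug in any MT system here")

def name_survives_round_trip(sentence: str, name: str, pivot: str = "es") -> bool:
    forward = translate(sentence, src="en", tgt=pivot)
    back = translate(forward, src=pivot, tgt="en")
    return name.lower() in back.lower()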
It Takes Two to Tango: Navigating Conceptualizations of NLP Tasks and Measurements of Performance
Arjun Subramonian, Xingdi Yuan, Hal Daumé III and Su Lin Blodgett
Conference of the Association for Computational Linguistics (ACL), 2023
[Abstract] [BibTeX]
Progress in NLP is increasingly measured through benchmarks; hence, contextualizing progress requires understanding when and why practitioners may disagree about the validity of benchmarks. We develop a taxonomy of disagreement, drawing on tools from measurement modeling, and distinguish between two types of disagreement: 1) how tasks are conceptualized and 2) how measurements of model performance are operationalized. To provide evidence for our taxonomy, we conduct a meta-analysis of relevant literature to understand how NLP tasks are conceptualized, as well as a survey of practitioners about their impressions of different factors that affect benchmark validity. Our meta-analysis and survey across eight tasks, ranging from coreference resolution to question answering, uncover that tasks are generally not clearly and consistently conceptualized and benchmarks suffer from operationalization disagreements. These findings support our proposed taxonomy of disagreement. Finally, based on our taxonomy, we present a framework for constructing benchmarks and documenting their limitations.
@inproceedings{daume23conceptualizations,
title = {It Takes Two to Tango: Navigating Conceptualizations of NLP Tasks and
Measurements of Performance},
author = {Arjun Subramonian and Xingdi Yuan and Daum\'e, III, Hal and Su Lin
Blodgett},
booktitle = {Proceedings of the Conference of the Association for Computational
Linguistics (ACL)},
year = {2023},
url = {http://hal3.name/docs/#daume23conceptualizations},
}
FairPrism: Evaluating Fairness-Related Harms in Text Generation
Eve Fleisig, Aubrie Amstutz, Chad Atalla, Su Lin Blodgett, Hal Daumé III, Alexandra Olteanu, Emily Sheng, Dan Vann and Hanna Wallach
Conference of the Association for Computational Linguistics (ACL), 2023
[Abstract] [BibTeX]
It is critical to measure and mitigate fairness-related harms caused by AI text generation systems, including stereotyping and demeaning harms. To that end, we introduce FairPrism, a dataset of 5,000 examples of AI-generated English text with detailed human annotations covering a diverse set of harms relating to gender and sexuality. FairPrism aims to address several limitations of existing datasets for measuring and mitigating fairness-related harms, including improved transparency, clearer specification of dataset coverage, and accounting for annotator disagreement and harms that are context-dependent. FairPrism’s annotations include the extent of stereotyping and demeaning harms, the demographic groups targeted, and appropriateness for different applications. The annotations also include specific harms that occur in interactive contexts and harms that raise normative concerns when the “speaker” is an AI system. Due to its precision and granularity, FairPrism can be used to diagnose (1) the types of fairness-related harms that AI text generation systems cause, and (2) the potential limitations of mitigation methods, both of which we illustrate through case studies. Finally, the process we followed to develop FairPrism offers a recipe for building improved datasets for measuring and mitigating harms caused by AI systems.
@inproceedings{daume23fairprism,
title = {FairPrism: Evaluating Fairness-Related Harms in Text Generation},
author = {Eve Fleisig and Aubrie Amstutz and Chad Atalla and Su Lin Blodgett and
Daum\'e, III, Hal and Alexandra Olteanu and Emily Sheng and Dan
Vann and Hanna Wallach},
booktitle = {Proceedings of the Conference of the Association for Computational
Linguistics (ACL)},
year = {2023},
url = {http://hal3.name/docs/#daume23fairprism},
}
Which Examples Should be Multiply Annotated? Active Learning When Annotators May Disagree
Connor Baumler, Anna Sotnikova and Hal Daumé III
Conference of the Association for Computational Linguistics (ACL), 2023
[Abstract] [BibTeX]
Linguistic annotations, especially for controversial topics like hate speech detection, are frequently contested due to annotator backgrounds and positionalities. In such situations, preserving this disagreement through the machine learning pipeline can be important for downstream use cases. However, capturing disagreement can increase annotation time and expense. Fortunately, for many tasks, not all examples are equally controversial; we develop an active learning approach, Disagreement Aware Active Learning (DAAL), that concentrates annotations on examples where model entropy and annotator entropy are the most different. Because we cannot know the true entropy of annotations on unlabeled examples, we estimate a model that predicts annotator entropy trained using very few multiply-labeled examples. We find that traditional uncertainty-based active learning underperforms simple passive learning on tasks with high levels of disagreement, but that our active learning approach is able to successfully improve on passive learning, reducing the number of annotations required by at least 24% on average across several datasets.
@inproceedings{daume23daal,
title = {Which Examples Should be Multiply Annotated? Active Learning When
Annotators May Disagree},
author = {Connor Baumler and Anna Sotnikova and Daum\'e, III, Hal},
booktitle = {Proceedings of the Conference of the Association for Computational
Linguistics (ACL)},
year = {2023},
url = {http://hal3.name/docs/#daume23daal},
}
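A minimal sketch of the selection rule described in the abstract, assuming we already have the model's predictive distribution and a second model's predicted annotator label distribution for each unlabeled example; this is not the released DAAL implementation.

# Sketch: pick examples where model entropy and predicted annotator
# entropy disagree most. Inputs are assumed (n, n_classes) distributions.
import numpy as np

def entropy(p, eps=1e-12):
    p = np.clip(p, eps, 1.0)
    return -(p * np.log(p)).sum(axis=-1)

def select_for_annotation(model_probs, predicted_annotator_probs, budget):
    gap = np.abs(entropy(model_probs) - entropy(predicted_annotator_probs))
    return np.argsort(-gap)[:budget]   # indices of the largest-gap examples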
Factual or Contextual? Disentangling Error Types in Entity Description Generation
Navita Goyal, Ani Nenkova and Hal Daumé III
Conference of the Association for Computational Linguistics (ACL), 2023
[Abstract] [BibTeX]
In the task of entity description generation, given a context and a specified entity, a model must describe that entity correctly and in a contextually-relevant way. In this task, as well as broader language generation tasks, the generation of a nonfactual description (factual error) versus an incongruous description (contextual error) is fundamentally different, yet often conflated. We develop an evaluation paradigm that enables us to disentangle these two types of errors in naturally occurring textual contexts. We find that factuality and congruity are often at odds, and that models specifically struggle with accurate descriptions of entities that are less familiar to people. This shortcoming of language models raises concerns around the trustworthiness of such models, since factual errors on less well-known entities are exactly those that a human reader will not recognize.
@inproceedings{daume23factual,
title = {Factual or Contextual? Disentangling Error Types in Entity Description
Generation},
author = {Navita Goyal and Ani Nenkova and Daum\'e, III, Hal},
booktitle = {Proceedings of the Conference of the Association for Computational
Linguistics (ACL)},
year = {2023},
url = {http://hal3.name/docs/#daume23factual},
}
Define, Evaluate, and Improve Task-Oriented Cognitive Capabilities for Instruction Generation
Lingjun Zhao, Khanh Nguyen and Hal Daumé III
ACL, 2023
[Abstract] [BibTeX]
Recent work studies the cognitive capabilities of language models through psychological tests designed for humans. While these studies are helpful for understanding the general capabilities of these models, there is no guarantee that a model possessing sufficient capabilities to pass those tests would actually use those capabilities in performing real-life tasks. In this work, we formulate task-oriented cognitive capabilities, which are human-like cognitive capabilities that language models leverage to perform tasks. These capabilities are (i) the ability to quickly generate good candidate utterances (the search capability) and (ii) the ability to predict how a listener interprets those utterances and choose the most appropriate one (the pragmatic capability). We design an evaluation scheme for comparing these capabilities of a language model with those of a human. Applying this scheme to examine various models in a navigation instruction generation problem, we find that their pragmatic capability is severely lacking. This insight leads us to augment them with better models of the listener and obtain a significant boost of 11% in success rate in guiding real humans. Our work advocates for having a principled procedure for aligning language models with humans that involves (i) formulating task-oriented capabilities, (ii) devising a method to quantify their deficiency, and (iii) iteratively improving them.
@inproceedings{daume23cognitive,
title = {Define, Evaluate, and Improve Task-Oriented Cognitive Capabilities for
Instruction Generation},
author = {Lingjun Zhao and Khanh Nguyen and Daum\'e, III, Hal},
booktitle = {ACL},
year = {2023},
url = {http://hal3.name/docs/#daume23cognitive},
}
Toward Gender-Inclusive Coreference Resolution: An Analysis of Gender and Bias Throughout the Machine Learning Lifecycle
Yang Trista Cao and Hal Daumé III
Computational Linguistics, 2022
[Abstract] [BibTeX]
Correctly resolving textual mentions of people fundamentally entails making inferences about those people. Such inferences raise the risk of systematic biases in coreference resolution systems, including biases that can harm binary and non-binary trans and cis stakeholders. To better understand such biases, we foreground nuanced conceptualizations of gender from sociology and sociolinguistics, and investigate where in the machine learning pipeline such biases can enter a coreference resolution system. We inspect many existing data sets for trans-exclusionary biases, and develop two new data sets for interrogating bias in both crowd annotations and in existing coreference resolution systems. Through these studies, conducted on English text, we confirm that without acknowledging and building systems that recognize the complexity of gender, we will build systems that fail for: quality of service …
@article{daume22coref,
title = {Toward Gender-Inclusive Coreference Resolution: An Analysis of Gender
and Bias Throughout the Machine Learning Lifecycle},
author = {Yang Trista Cao and Daum\'e, III, Hal},
journal = {Computational Linguistics},
year = {2022},
url = {http://hal3.name/docs/#daume22coref},
}
Heterogeneous Supervised Topic Models
Dhanya Sridhar, Hal Daumé III and David Blei
TACL, 2022
[Abstract] [BibTeX]
Researchers in the social sciences are often interested in the relationship between text and an outcome of interest, where the goal is to both uncover latent patterns in the text and predict outcomes for unseen texts. To this end, this paper develops the heterogeneous supervised topic models (HSTM), a probabilistic approach to text analysis and prediction. HSTMs posit a joint model of text and outcomes to find heterogeneous patterns that help with both text analysis and prediction. The main benefit of HSTMs is that they capture heterogeneity in the relationship between text and the outcome across latent topics. To fit HSTMs, we develop a variational inference algorithm based on the auto-encoding variational Bayes framework. We study the performance of HSTMs on eight datasets and find that they consistently outperform related methods, including fine-tuned black box models. Finally, we apply HSTMs to analyze news articles labeled with pro- or anti-tone. We find evidence of differing language used to signal a pro- and anti-tone.
@article{daume22hstm,
title = {Heterogeneous Supervised Topic Models},
author = {Dhanya Sridhar and Daum\'e, III, Hal and David Blei},
journal = {TACL},
year = {2022},
url = {http://hal3.name/docs/#daume22hstm},
}
What's Different between Visual Question Answering for Machine "Understanding" Versus for Accessibility?
Yang Trista Cao, Kyle Seelman, Kyungjun Lee and Hal Daumé III
AACL-IJCNLP, 2022
🏆 Best Theme Paper
[Abstract] [BibTeX]
In visual question answering (VQA), a machine must answer a question given an associated image. Recently, accessibility researchers have explored whether VQA can be deployed in a real-world setting where users with visual impairments learn about their environment by capturing their visual surroundings and asking questions. However, most of the existing benchmarking datasets for VQA focus on machine “understanding” and it remains unclear how progress on those datasets corresponds to improvements in this real-world use case. We aim to answer this question by evaluating discrepancies between machine “understanding” datasets (VQA-v2) and accessibility datasets (VizWiz) by evaluating a variety of VQA models. Based on our findings, we discuss opportunities and challenges in VQA for accessibility and suggest directions for future work.
@inproceedings{daume22vqa,
title = {What's Different between Visual Question Answering for Machine
"Understanding" Versus for Accessibility?},
author = {Yang Trista Cao and Kyle Seelman and Kyungjun Lee and Daum\'e, III,
Hal},
booktitle = {AACL-IJCNLP},
year = {2022},
url = {http://hal3.name/docs/#daume22vqa},
}
Theory-Grounded Measurement of U.S. Social Stereotypes in English Language Models
Yang Trista Cao, Anna Sotnikova, Hal Daumé III, Rachel Rudinger and Linda Zou
Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2022
[Abstract] [BibTeX]
NLP models trained on text have been shown to reproduce human stereotypes, which can magnify harms to marginalized groups when systems are deployed at scale. We adapt the Agency-Belief-Communion (ABC) stereotype model of Koch et al. (2016) from social psychology as a framework for the systematic study and discovery of stereotypic group-trait associations in language models (LMs). We introduce the sensitivity test (SeT) for measuring stereotypical associations from language models. To evaluate SeT and other measures using the ABC model, we collect group-trait judgments from U.S.-based subjects to compare with English LM stereotypes. Finally, we extend this framework to measure LM stereotyping of intersectional identities.
@inproceedings{daume22stereotypes,
title = {Theory-Grounded Measurement of U.S. Social Stereotypes in English
Language Models},
author = {Yang Trista Cao and Anna Sotnikova and Daum\'e, III, Hal and Rachel
Rudinger and Linda Zou},
booktitle = {Proceedings of the Conference of the North American Chapter of the
Association for Computational Linguistics (NAACL)},
year = {2022},
url = {http://hal3.name/docs/#daume22stereotypes},
}
Deconstructing NLG Evaluation: Evaluation Practices, Assumptions, and Their Implications
Kaitlyn Zhou, Su Lin Blodgett, Adam Trischler, Hal Daumé III, Kaheer Suleman and Alexandra Olteanu
Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2022
[Abstract] [BibTeX]
There are many ways to express similar things in text, which makes evaluating natural language generation (NLG) systems difficult. Compounding this difficulty is the need to assess varying quality criteria depending on the deployment setting. While the landscape of NLG evaluation has been well-mapped, practitioners' goals, assumptions, and constraints -- which inform decisions about what, when, and how to evaluate -- are often partially or implicitly stated, or not stated at all. Combining a formative semi-structured interview study of NLG practitioners (N=18) with a survey study of a broader sample of practitioners (N=61), we surface goals, community practices, assumptions, and constraints that shape NLG evaluations, examining their implications and how they embody ethical considerations.
@inproceedings{daume22nlg,
title = {Deconstructing NLG Evaluation: Evaluation Practices, Assumptions, and
Their Implications},
author = {Kaitlyn Zhou and Su Lin Blodgett and Adam Trischler and Daum\'e, III,
Hal and Kaheer Suleman and Alexandra Olteanu},
booktitle = {Proceedings of the Conference of the North American Chapter of the
Association for Computational Linguistics (NAACL)},
year = {2022},
url = {http://hal3.name/docs/#daume22nlg},
}
Spoken language interaction with robots: Recommendations for future research
Matthew Marge, Carol Espy-Wilson, Nigel G Ward, Abeer Alwan, Yoav Artzi, Mohit Bansal, Gil Blankenship, Joyce Chai, Hal Daumé III, Debadeepta Dey, Mary Harper, Thomas Howard, Casey Kennington, Ivana Kruijff-Korbayová, Dinesh Manocha, Cynthia Matuszek, Ross Mead, Raymond Mooney, Roger K Moore, Mari Ostendorf, Heather Pon-Barry, Alexander I Rudnicky, Matthias Scheutz, Robert St Amant, Tong Sun, Stefanie Tellex, David Traum and Zhou Yu
Computer Speech and Language, 2022
[Abstract] [BibTeX]
With robotics rapidly advancing, more effective human–robot interaction is increasingly needed to realize the full potential of robots for society. While spoken language must be part of the solution, our ability to provide spoken language interaction capabilities is still very limited. In this article, based on the report of an interdisciplinary workshop convened by the National Science Foundation, we identify key scientific and engineering advances needed to enable effective spoken language interaction with robotics. We make 25 recommendations, involving eight general themes: putting human needs first, better modeling the social and interactive aspects of language, improving robustness, creating new methods for rapid adaptation, better integrating speech and language with other communication modalities, giving speech and language components access to rich representations of the robot’s current knowledge and state, making all components operate in real time, and improving research infrastructure and resources. Research and development that prioritizes these topics will, we believe, provide a solid foundation for the creation of speech-capable robots that are easy and effective for humans to work with.
@article{daume22spoken,
title = {Spoken language interaction with robots: Recommendations for future
research},
author = {Matthew Marge and Carol Espy-Wilson and Nigel G Ward and Abeer Alwan
and Yoav Artzi and Mohit Bansal and Gil Blankenship and Joyce
Chai and Daum\'e, III, Hal and Debadeepta Dey and Mary Harper and
Thomas Howard and Casey Kennington and Ivana Kruijff-Korbayová
and Dinesh Manocha and Cynthia Matuszek and Ross Mead and Raymond
Mooney and Roger K Moore and Mari Ostendorf and Heather Pon-Barry
and Alexander I Rudnicky and Matthias Scheutz and Robert St Amant
and Tong Sun and Stefanie Tellex and David Traum and Zhou Yu},
journal = {Computer Speech and Language},
year = {2022},
url = {http://hal3.name/docs/#daume22spoken},
}
Multi-Step Reasoning Over Unstructured Text with Beam Dense Retrieval
Chen Zhao, Chenyan Xiong, Jordan Boyd-Graber and Hal Daumé III
NAACL (short), 2021
[Abstract] [BibTeX]
Complex question answering often requires finding a reasoning chain that consists of multiple evidence pieces. Current approaches incorporate the strengths of structured knowledge and unstructured text, assuming text corpora are semi-structured. Building on dense retrieval methods, we propose a new multi-step retrieval approach (BeamDR) that iteratively forms an evidence chain through beam search in dense representations. When evaluated on multi-hop question answering, BeamDR is competitive with state-of-the-art systems, without using any semi-structured information. Through query composition in dense space, BeamDR captures the implicit relationships between evidence in the reasoning chain. The code is available at https://github.com/henryzhao5852/BeamDR.
@inproceedings{daume21beamdr,
title = {Multi-Step Reasoning Over Unstructured Text with Beam Dense Retrieval},
author = {Chen Zhao and Chenyan Xiong and Jordan Boyd-Graber and Daum\'e, III,
Hal},
booktitle = {NAACL (short)},
year = {2021},
url = {http://hal3.name/docs/#daume21beamdr},
}
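The iterative chain-forming idea can be sketched as beam search over dense vectors. Everything below is illustrative: vector addition stands in for the paper's learned query composition, and the corpus is random; see the linked repository for the real implementation.

# Schematic beam search over dense retrieval; names and the composition
# operator are illustrative, not the BeamDR API.
import numpy as np

def beam_dense_retrieval(query_vec, corpus_vecs, steps=2, beam=4):
    # Each beam entry is (composed query, evidence chain, cumulative score).
    beams = [(query_vec, [], 0.0)]
    for _ in range(steps):
        candidates = []
        for q, chain, score in beams:
            sims = corpus_vecs @ q                     # dot-product relevance
            for idx in np.argsort(-sims)[:beam]:
                candidates.append((q + corpus_vecs[idx],   # compose query with evidence
                                   chain + [int(idx)],
                                   score + float(sims[idx])))
        beams = sorted(candidates, key=lambda c: -c[2])[:beam]
    return [(chain, score) for _, chain, score in beams]

corpus = np.random.randn(1000, 128)
chains = beam_dense_retrieval(np.random.randn(128), corpus)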
Distantly-Supervised Evidence Retrieval Enables Question Answering without Annotated Evidence Pieces
Chen Zhao, Chenyan Xiong, Jordan Boyd-Graber and Hal Daumé III
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
[BibTeX]
@inproceedings{daume21distqa,
title = {Distantly-Supervised Evidence Retrieval Enables Question Answering
without Annotated Evidence Pieces},
author = {Chen Zhao and Chenyan Xiong and Jordan Boyd-Graber and Daum\'e, III,
Hal},
booktitle = {Proceedings of the Conference on Empirical Methods in Natural
Language Processing (EMNLP)},
year = {2021},
url = {http://hal3.name/docs/#daume21distqa},
}
Analyzing Stereotypes in Generative Text Inference Tasks
Anna Sotnikova, Yang Trista Cao, Hal Daumé III and Rachel Rudinger
ACL (findings), 2021
[Abstract] [BibTeX]
Stereotypes are inferences drawn about people based on their demographic attributes, which may result in harms to users when a system is deployed. In generative language-inference tasks, given a premise, a model produces plausible hypotheses that follow either logically (natural language inference) or commonsensically (commonsense inference). Such tasks are therefore a fruitful setting in which to explore the degree to which NLP systems encode stereotypes. In our work, we study how stereotypes manifest when the potential targets of stereotypes are situated in real-life, neutral contexts. We collect human judgments on the presence of stereotypes in generated inferences, and compare how perceptions of stereotypes vary due to annotator positionality.
@inproceedings{daume21stereotypes,
title = {Analyzing Stereotypes in Generative Text Inference Tasks},
author = {Anna Sotnikova and Yang Trista Cao and Hal {Daum\'e III} and Rachel
Rudinger},
booktitle = {ACL (findings)},
year = {2021},
url = {http://hal3.name/docs/#daume21stereotypes},
}
Meta-learning for Few-Shot NMT Adaptation
Amr Sharaf, Hany Hassan and Hal Daumé III
WNGT@ACL, 2020
🏆 Best Paper Award
[Abstract] [BibTeX]
We present META-MT, a meta-learning approach to adapt Neural Machine Translation (NMT) systems in a few-shot setting. META-MT provides a new approach to make NMT models easily adaptable to many target domains with a minimal amount of in-domain data. We frame the adaptation of NMT systems as a meta-learning problem, where we learn to adapt to new unseen domains based on simulated offline meta-training domain adaptation tasks. We evaluate the proposed meta-learning strategy on ten domains with general large-scale NMT systems. We show that META-MT significantly outperforms classical domain adaptation when very few in-domain examples are available. Our experiments show that META-MT can outperform classical fine-tuning by up to 2.5 BLEU points after seeing only 4,000 translated words (300 parallel sentences).
@inproceedings{daume20nmtadapt,
title = {Meta-learning for Few-Shot NMT Adaptation},
author = {Amr Sharaf and Hany Hassan and Daum\'e, III, Hal},
booktitle = {WNGT@ACL},
year = {2020},
url = {http://hal3.name/docs/#daume20nmtadapt},
}
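As a rough illustration of the meta-learning framing (not META-MT itself), a MAML-style loop adapts a copy of the parameters on each simulated domain's support set, then updates the shared initialization using the query set. The `model.loss(batch, params=...)` API below is invented for the sketch.

# MAML-style meta-training sketch; `model`, `domains`, and the
# `loss(batch, params=...)` API are hypothetical placeholders.
import torch

def meta_step(model, domains, meta_opt, inner_lr=1e-3):
    meta_opt.zero_grad()
    for support_batch, query_batch in domains:
        # Inner loop: one gradient step on the domain's few-shot support set.
        fast = [p.clone() for p in model.parameters()]
        inner_loss = model.loss(support_batch, params=fast)
        grads = torch.autograd.grad(inner_loss, fast, create_graph=True)
        fast = [p - inner_lr * g for p, g in zip(fast, grads)]
        # Outer loop: evaluate adapted parameters on the query set and
        # backpropagate into the shared initialization.
        model.loss(query_batch, params=fast).backward()
    meta_opt.step()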
On the Potential of Lexico-logical Alignments for Semantic Parsing to SQL Queries
Tianze Shi, Chen Zhao, Jordan Boyd-Graber, Hal Daumé III and Lillian Lee
Findings of EMNLP, 2020
[Abstract] [BibTeX]
Large-scale semantic parsing datasets annotated with logical forms have enabled major advances in supervised approaches. But can richer supervision help even more? To explore the utility of fine-grained, lexical-level supervision, we introduce Squall, a dataset that enriches 11,276 WikiTableQuestions English-language questions with manually created SQL equivalents plus alignments between SQL and question fragments. Our annotation enables new training possibilities for encoder-decoder models, including approaches from machine translation previously precluded by the absence of alignments. We propose and test two methods: (1) supervised attention; (2) adopting an auxiliary objective of disambiguating references in the input queries to table columns. In 5-fold cross validation, these strategies improve over strong baselines by 4.4% execution accuracy. Oracle experiments suggest that annotated alignments can support further accuracy gains of up to 23.9%.
@inproceedings{daume20alignments,
title = {On the Potential of Lexico-logical Alignments for Semantic Parsing to
SQL Queries},
author = {Tianze Shi and Chen Zhao and Jordan Boyd-Graber and Hal {Daum\'e III}
and Lillian Lee},
booktitle = {Findings of EMNLP},
year = {2020},
url = {http://hal3.name/docs/#daume20alignments},
code = {https://www.github.com/tzshi/squall},
}
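The first of the two proposed methods, supervised attention, amounts to adding a loss that pulls decoder attention toward the annotated alignments. The sketch below shows one such loss in PyTorch under assumed tensor shapes; the released code linked above is the authoritative version.

# Supervised-attention loss sketch; tensor shapes are assumptions.
import torch

def supervised_attention_loss(attn, gold_align, eps=1e-12):
    # attn:       (batch, sql_len, question_len) decoder attention weights
    # gold_align: same shape, 1 where a SQL token aligns to a question token
    norm = gold_align.sum(-1, keepdim=True).clamp(min=1)
    gold = gold_align / norm                           # gold attention distribution
    ce = -(gold * attn.clamp(min=eps).log()).sum(-1)   # per-position cross-entropy
    has_align = gold_align.sum(-1).gt(0).float()       # supervise only aligned positions
    return (ce * has_align).sum() / has_align.sum().clamp(min=1)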
Global Voices: Crossing Borders in Automatic News Summarization
Khanh Nguyen and Hal Daumé III
EMNLP Summarization Workshop, 2019
[Abstract] [BibTeX]
We construct Global Voices, a multilingual dataset for evaluating cross-lingual summarization methods. We extract social-network descriptions of Global Voices news articles to cheaply collect evaluation data for into-English and from-English summarization in 15 languages. In particular, for the into-English summarization task, we crowd-source a high-quality evaluation dataset based on guidelines that emphasize accuracy, coverage, and understandability. To ensure the quality of this dataset, we collect human ratings to filter out bad summaries, and conduct a human survey, which shows that the remaining summaries are preferred over the social-network summaries. We study the effect of translation quality in cross-lingual summarization, comparing a translate-then-summarize approach with several baselines. Our results highlight the limitations of the ROUGE metric that are overlooked in monolingual summarization.
@inproceedings{daume19global,
title = {Global Voices: Crossing Borders in Automatic News Summarization},
author = {Khanh Nguyen and Daum\'e, III, Hal},
booktitle = {EMNLP Summarization Workshop},
year = {2019},
url = {http://hal3.name/docs/#daume19global},
}
Help, Anna! Visual Navigation with Natural Multimodal Assistance via Retrospective Curiosity-Encouraging Imitation Learning
Khanh Nguyen and Hal Daumé III
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
[Abstract] [BibTeX] [Code/Data]
Mobile agents that can leverage help from humans can potentially accomplish more complex tasks than they could entirely on their own. We develop "Help, Anna!" (HANNA), an interactive photo-realistic simulator in which an agent fulfills object-finding tasks by requesting and interpreting natural language-and-vision assistance. An agent solving tasks in a HANNA environment can leverage simulated human assistants, called ANNA (Automatic Natural Navigation Assistants), which, upon request, provide natural language and visual instructions to direct the agent towards the goals. To address the HANNA problem, we develop a memory-augmented neural agent that hierarchically models multiple levels of decision-making, and an imitation learning algorithm that teaches the agent to avoid repeating past mistakes while simultaneously predicting its own chances of making future progress. Empirically, our approach is able to ask for help more effectively than competitive baselines and, thus, attains a higher task success rate on both previously seen and previously unseen environments. We publicly release code and data at https://github.com/khanhptnk/hanna.
@inproceedings{daume19hanna,
title = {Help, Anna! Visual Navigation with Natural Multimodal Assistance via
Retrospective Curiosity-Encouraging Imitation Learning},
author = {Khanh Nguyen and Daum\'e, III, Hal},
booktitle = {Proceedings of the Conference on Empirical Methods in Natural
Language Processing (EMNLP)},
year = {2019},
url = {http://hal3.name/docs/#daume19hanna},
link = {https://github.com/khanhptnk/hanna},
}
Comparing and Developing Tools to Measure the Readability of Domain-Specific Texts
Elissa Redmiles, Lisa Maszkiewicz, Emily Hwang, Dhruv Kuchhal, Everest Liu, Miraida Morales, Denis Peskov, Sudha Rao, Rock Stevens, Kristina Gligorić, Sean Kross, Michelle L. Mazurek and Hal Daumé III
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
[Abstract] [BibTeX] [Code/Data]
The readability of a digital text can influence people's ability to learn new things about a range of topics from digital resources (e.g., Wikipedia, WebMD). Readability also impacts search rankings, and is used to evaluate the performance of NLP systems. Despite this, we lack a thorough understanding of how to validly measure readability at scale, especially for domain-specific texts. In this work, we present a comparison of the validity of well-known readability measures and introduce a novel approach, Smart Cloze, which is designed to address shortcomings of existing measures. We compare these approaches across four different corpora: crowdworker-generated stories, Wikipedia articles, security and privacy advice, and health information. On these corpora, we evaluate the convergent and content validity of each measure, and detail tradeoffs in score precision, domain-specificity, and participant burden. These results provide a foundation for more accurate readability measurements and better evaluation of new natural-language-processing systems and tools.
@inproceedings{daume19readability,
title = {Comparing and Developing Tools to Measure the Readability of
Domain-Specific Texts},
author = {Elissa Redmiles and Lisa Maszkiewicz and Emily Hwang and Dhruv Kuchhal
and Everest Liu and Miraida Morales and Denis Peskov and Sudha Rao
and Rock Stevens and Kristina Gligori\'c and Sean Kross and
Michelle L. Mazurek and Daum\'e, III, Hal},
booktitle = {Proceedings of the Conference on Empirical Methods in Natural
Language Processing (EMNLP)},
year = {2019},
link = {http://github.com/SP2-MC2/Readability-Resources},
url = {http://hal3.name/docs/#daume19readability},
}
Interpretable Engagement Models for MOOCs using Hinge-loss Markov Random Fields
Arti Ramesh, Dan Goldwasser, Bert Huang, Hal Daumé III and Lise Getoor
IEEE Transactions on Learning Technologies, 2019
[Abstract] [BibTeX]
Maintaining and cultivating student engagement is critical for learning. Understanding factors affecting student engagement can help in designing better courses and improving student retention. The large number of participants in massive open online courses (MOOCs) and data collected from their interactions on the MOOC open up avenues for studying student engagement at scale. In this work, we develop an interpretable statistical relational learning model for understanding student engagement in online courses using a complex combination of behavioral, linguistic, structural, and temporal cues. We show how to abstract student engagement types of active, passive, and disengagement as meaningful latent variables using logical rules in our model connecting student behavioral signals with student success in MOOCs. We demonstrate that the latent formulation for engagement helps in predicting two measures of student success: performance, their final grade in the course, and survival, their continued presence in the course till the end, across seven MOOCs. Further, in order to initiate better instructor interventions, we need to be able to predict student success early in the course. We demonstrate that we can predict student success early in the course reliably using the latent model. We also demonstrate the utility of our models in predicting student success in new courses, by training our models on one course and testing on another course. We show that the latent abstractions are helpful in predicting student success and engagement reliably in new MOOCs that haven’t yet gathered student interaction data. We then perform a closer quantitative analysis of different features derived from student interactions on the MOOC and identify student activities that are good indicators of student success at different points in the course. Through a qualitative analysis of the latent engagement variable values, we demonstrate their utility in understanding students’ engagement levels at various points in the course and movement of students across different types of engagement.
@article{daume19moocs,
title = {Interpretable Engagement Models for {MOOC}s using Hinge-loss Markov
Random Fields},
author = {Arti Ramesh and Dan Goldwasser and Bert Huang and Hal {Daum\'e III} and
Lise Getoor},
journal = {IEEE Transactions on Learning Technologies},
year = {2019},
url = {http://hal3.name/docs/#daume19moocs},
}
Content Selection in Deep Learning Models of Summarization
Chris Kedzie, Kathleen McKeown and Hal Daumé III
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2018
[Abstract] [BibTeX] [Code/Data]
We carry out experiments with deep learning models of summarization across the domains of news, personal stories, meetings, and medical articles in order to understand how content selection is performed. We find that many sophisticated features of state-of-the-art extractive summarizers do not improve performance over simpler models. These results suggest that it is easier to create a summarizer for a new domain than previous work suggests, and bring into question the benefit of deep learning models for summarization in those domains that do have massive datasets (i.e., news). At the same time, they suggest important questions for new research in summarization; namely, new forms of sentence representations or external knowledge sources are needed that are better suited to the summarization task.
@inproceedings{daume18summarization,
title = {Content Selection in Deep Learning Models of Summarization},
author = {Chris Kedzie and Kathleen {McKeown} and Daum\'e, III, Hal},
booktitle = {Proceedings of the Conference on Empirical Methods in Natural
Language Processing (EMNLP)},
year = {2018},
link = {https://github.com/kedz/nnsum/tree/emnlp18-release},
url = {http://hal3.name/docs/#daume18summarization},
}
Unsupervised Learning of Evolving Relationships Between Literary Characters
Snigdha Chaturvedi, Mohit Iyyer and Hal Daumé III
National Conference on Artificial Intelligence (AAAI), 2017
[Abstract] [BibTeX]
Understanding inter-character relationships is fundamental for understanding character intentions and goals in a narrative. This paper addresses unsupervised modeling of relationships between characters. We model relationships as a dynamic phenomenon, represented as evolving sequences of latent states empirically learned from data. Unlike most previous work, our approach is completely unsupervised. This enables data-driven inference of inter-character relationship types beyond simple sentiment polarities, by incorporating lexical and semantic representations, and leveraging large quantities of raw text. We present three models based on rich sets of linguistic features that capture various cues about relationships. We compare these models with existing techniques and also demonstrate that relationship categories learned by our model are semantically coherent.
@inproceedings{daume17evolve,
title = {Unsupervised Learning of Evolving Relationships Between Literary
Characters},
author = {Snigdha Chaturvedi and Mohit Iyyer and Hal {Daum\'e III}},
booktitle = {Proceedings of the National Conference on Artificial Intelligence
(AAAI)},
year = {2017},
url = {http://hal3.name/docs/#daume17evolve},
}
Feuding Families and Former Friends: Unsupervised Learning for Dynamic Fictional Relationships
Mohit Iyyer, Anupam Guha, Snigdha Chaturvedi, Jordan Boyd-Graber and Hal Daumé III
Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2016
🏆 Best Paper Award
[Abstract] [BibTeX]
Understanding how a fictional relationship between two characters changes over time (e.g., from best friends to sworn enemies) is a key challenge in digital humanities scholarship. We present a novel unsupervised neural network for this task that incorporates dictionary learning to generate interpretable, accurate relationship trajectories. While previous work on characterizing literary relationships relies on plot summaries annotated with predefined labels, our model jointly learns a set of global relationship descriptors as well as a trajectory over these descriptors for each relationship in a dataset of raw text from novels. We find that our model learns descriptors of events (e.g., marriage or murder) as well as interpersonal states (love, sadness). Our model outperforms topic model baselines on two crowdsourced tasks, and we also find interesting correlations to annotations in an existing dataset.
@inproceedings{daume16feuding,
title = {Feuding Families and Former Friends: Unsupervised Learning for Dynamic
Fictional Relationships},
author = {Mohit Iyyer and Anupam Guha and Snigdha Chaturvedi and Jordan
Boyd-Graber and Hal {Daum\'e III}},
booktitle = {Proceedings of the Conference of the North American Chapter of the
Association for Computational Linguistics (NAACL)},
year = {2016},
url = {http://hal3.name/docs/#daume16feuding},
}
The UMD CLPsych 2016 Shared Task System: Text Representation for Predicting Triage of Forum Posts about Mental Health
Meir Friedenberg, Hadi Amiri, Hal Daumé III and Philip Resnik
Workshop on CL for Clinical Psychology, 2016
[Abstract] [BibTeX]
We report on a multiclass classifier for triage of mental health forum posts as part of the CLPsych 2016 shared task. We investigate a number of document representations, including topic models and representation learning to represent posts in semantic space, including context- and emotion-sensitive feature representations of posts.
@inproceedings{daume16clpsych,
title = {The UMD CLPsych 2016 Shared Task System: Text Representation for
Predicting Triage of Forum Posts about Mental Health},
author = {Meir Friedenberg and Hadi Amiri and Hal {Daum\'e III} and Philip
Resnik},
booktitle = {Workshop on CL for Clinical Psychology},
year = {2016},
url = {http://hal3.name/docs/#daume16clpsych},
}
Modeling Evolving Relationships Between Characters in Literary Novels
Snigdha Chaturvedi, Shashank Srivastava, Hal Daumé III and Chris Dyer
National Conference on Artificial Intelligence (AAAI), 2016
[Abstract] [BibTeX]
Studying characters plays a vital role in computationally representing and interpreting narratives. Unlike previous work, which has focused on inferring character roles, we focus on the problem of modeling their relationships. Rather than assuming a fixed relationship for a character pair, we hypothesize that relationships temporally evolve with the progress of the narrative, and formulate the problem of relationship modeling as a structured prediction problem. We propose a semi-supervised framework to learn relationship sequences from fully as well as partially labeled data. We present a Markovian model capable of accumulating historical beliefs about the relationship and status changes. We use a set of rich linguistic and semantically motivated features that incorporate world knowledge to investigate the textual content of narrative. We empirically demonstrate that such a framework outperforms competitive baselines.
@inproceedings{daume16literary,
title = {Modeling Evolving Relationships Between Characters in Literary Novels},
author = {Snigdha Chaturvedi and Shashank Srivastava and Hal {Daum\'e III} and
Chris Dyer},
booktitle = {Proceedings of the National Conference on Artificial Intelligence
(AAAI)},
year = {2016},
url = {http://hal3.name/docs/#daume16literary},
}
Learning Text Pair Similarity with Context-sensitive Autoencoders
Hadi Amiri, Philip Resnik, Jordan Boyd-Graber and Hal Daumé III
Conference of the Association for Computational Linguistics (ACL), 2016
[Abstract] [BibTeX]
We present a pairwise context-sensitive Autoencoder for computing text pair similarity. Our model encodes input text into context-sensitive representations and uses them to compute similarity between text pairs. Our model outperforms the state-of-the-art models in two semantic retrieval tasks and a contextual word similarity task. For retrieval, our unsupervised approach that merely ranks inputs with respect to the cosine similarity between their hidden representations shows comparable performance with the state-of-the-art supervised models and in some cases outperforms them.
@inproceedings{daume16autoencode,
title = {Learning Text Pair Similarity with Context-sensitive Autoencoders},
author = {Hadi Amiri and Philip Resnik and Jordan Boyd-Graber and Hal {Daum\'e
III}},
booktitle = {Proceedings of the Conference of the Association for Computational
Linguistics (ACL)},
year = {2016},
url = {http://hal3.name/docs/#daume16autoencode},
}
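The unsupervised retrieval variant mentioned above reduces to ranking candidates by cosine similarity of hidden representations. The sketch below assumes vectors from any encoder; it is not the paper's autoencoder.

# Cosine-similarity ranking over assumed encoder outputs.
import numpy as np

def rank_by_cosine(query_vec, candidate_vecs):
    q = query_vec / np.linalg.norm(query_vec)
    c = candidate_vecs / np.linalg.norm(candidate_vecs, axis=1, keepdims=True)
    sims = c @ q
    return np.argsort(-sims), sims     # best-first candidate order

order, sims = rank_by_cosine(np.random.randn(64), np.random.randn(50, 64))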
Predicting the impact of scientific concepts using full-text features
Kathy McKeown, Hal Daumé III, Snigdha Chaturvedi, John Paparrizos, Kapil Thadani, Pablo Barrio, Or Biran, Suvarna Bothe, Michael Collins, Kenneth R Fleischmann, Luis Gravano, Rahul Jha, Ben King, Kevin McInerney, Taesun Moon, Arvind Neelakantan, Diarmuid O'Seaghdha, Dragomir Radev, Clay Templeton and Simone Teufel
JASIST, 2016
[Abstract] [BibTeX]
New scientific concepts, interpreted broadly, are continuously introduced in the literature, but relatively few concepts have a long-term impact on society. The identification of such concepts is a challenging prediction task that would help multiple parties—including researchers and the general public—focus their attention within the vast scientific literature. In this paper we present a system that predicts the future impact of a scientific concept, represented as a technical term, based on the information available from recently published research articles. We analyze the usefulness of rich features derived from the full text of the articles through a variety of approaches, including rhetorical sentence analysis, information extraction, and time-series analysis. The results from two large-scale experiments with 3.8 million full-text articles and 48 million metadata records support the conclusion that full-text features are significantly more useful for prediction than metadata-only features and that the most accurate predictions result from combining the metadata and full-text features. Surprisingly, these results hold even when the metadata features are available for a much larger number of documents than are available for the full-text features.
@article{daume16impact,
title = {Predicting the impact of scientific concepts using full-text
            features},
author = {Kathy McKeown and Hal {Daum\'e III} and Snigdha Chaturvedi and John
Paparrizos and Kapil Thadani and Pablo Barrio and Or Biran and
Suvarna Bothe and Michael Collins and Kenneth R Fleischmann and
Luis Gravano and Rahul Jha and Ben King and Kevin McInerney and
Taesun Moon and Arvind Neelakantan and Diarmuid O'Seaghdha and
Dragomir Radev and Clay Templeton and Simone Teufel},
journal = {JASIST},
year = {2016},
url = {http://hal3.name/docs/#daume16impact},
}
Interpretese vs. Translationese: The Uniqueness of Human Strategies in Simultaneous Interpretation
He He, Jordan Boyd-Graber and Hal Daumé III
Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2016
[Abstract] [BibTeX]
Computational approaches to simultaneous interpretation are stymied by how little we know about the tactics human interpreters use. We produce a parallel corpus of translated and simultaneously interpreted text and study differences between them through a computational approach. Our analysis reveals that human interpreters regularly apply several effective tactics to reduce translation latency, including sentence segmentation and passivization. In addition to these unique, clever strategies, we show that limited human memory also causes other idiosyncratic properties of human interpretation such as generalization and omission of source content.
@inproceedings{daume16interpretese,
title = {Interpretese vs. Translationese: The Uniqueness of Human Strategies in
Simultaneous Interpretation},
author = {He He and Jordan Boyd-Graber and Hal {Daum\'e III}},
booktitle = {Proceedings of the Conference of the North American Chapter of the
Association for Computational Linguistics (NAACL)},
year = {2016},
url = {http://hal3.name/docs/#daume16interpretese},
}
A Framework for Discriminative Rule Selection in Hierarchical Moses
Fabienne Braune, Alexander Fraser, Hal Daumé III and Aleš Tamchyna
WMT, 2016
[Abstract] [BibTeX]
Training discriminative rule selection models is usually expensive because of the very large size of the hierarchical grammar. Previous approaches reduced the training costs either by (i) using models that are local to the source side of the rules or (ii) by heavily pruning out negative samples. Moreover, all previous evaluations were performed on small scale translation tasks, containing at most 250,000 sentence pairs. We propose two contributions to discriminative rule selection. First, we test previous approaches on two French-English translation tasks in domains for which only limited resources are available and show that they fail to improve translation quality. To improve on such tasks, we propose a rule selection model that is (i) global with rich label-dependent features (ii) trained with all available negative samples. Our global model yields significant improvements, up to 1 BLEU point, over previously proposed rule selection models. Second, we successfully scale rule selection models to large translation tasks but have so far failed to produce significant improvements in BLEU on these tasks.
@inproceedings{daume16moses,
title = {A Framework for Discriminative Rule Selection in Hierarchical Moses},
author = {Fabienne Braune and Alexander Fraser and Hal {Daum\'e III} and Ale\v{s}
Tamchyna},
booktitle = {WMT},
year = {2016},
url = {http://hal3.name/docs/#daume16moses},
}
Short Text Representation for Detecting Churn in Microblogs
Hadi Amiri and Hal Daumé III
National Conference on Artificial Intelligence (AAAI), 2016
[Abstract] [BibTeX]
Churn happens when a customer leaves a brand or stops using its services. Brands reduce their churn rates by identifying and retaining potential churners through customer retention campaigns. In this paper, we consider the problem of classifying micro-posts as churny or non-churny with respect to a given brand. Motivated by the recent success of recurrent neural networks (RNNs) in word representation, we propose to utilize RNNs to learn micro-post and churn indicator representations. We show that such representations improve the performance of churn detection in microblogs and lead to more accurate ranking of churny contents. Furthermore, in this research we show that state-of-the-art sentiment analysis approaches fail to identify churny contents. Experiments on Twitter data about three telco brands show the utility of our approach for this task.
@inproceedings{daume16churn,
title = {Short Text Representation for Detecting Churn in Microblogs},
author = {Hadi Amiri and Hal {Daum\'e III}},
booktitle = {Proceedings of the National Conference on Artificial Intelligence
(AAAI)},
year = {2016},
url = {http://hal3.name/docs/#daume16churn},
}
Ask, and Shall You Receive? Understanding Desire Fulfillment in Natural Language Text
Snigdha Chaturvedi, Dan Goldwasser and Hal Daumé III
National Conference on Artificial Intelligence (AAAI), 2016
[Abstract] [BibTeX]
The ability to comprehend wishes or desires and their fulfillment is important to Natural Language Understanding. This paper introduces the task of identifying if a desire expressed by a subject in a given short piece of text was fulfilled. We propose various unstructured and structured models that capture fulfillment cues such as the subject's emotional state and actions. Our experiments with two different datasets demonstrate the importance of understanding the narrative and discourse structure to address this task.
@inproceedings{daume16ask,
title = {Ask, and Shall You Receive? Understanding Desire Fulfillment in Natural
Language Text},
author = {Snigdha Chaturvedi and Dan Goldwasser and Hal {Daum\'e III}},
booktitle = {Proceedings of the National Conference on Artificial Intelligence
(AAAI)},
year = {2016},
url = {http://hal3.name/docs/#daume16ask},
}
Why discourse affects speakers’ choice of referring expressions
Naho Orita, Eliana Vornov, Naomi H Feldman and Hal Daumé III
Conference of the Association for Computational Linguistics (ACL), 2015
[Abstract] [BibTeX]
We propose a language production model that uses dynamic discourse information to account for speakers’ choices of referring expressions. Our model extends previous rational speech act models (Frank and Goodman, 2012) to more naturally distributed linguistic data, instead of assuming a controlled experimental setting. Simulations show a close match between speakers’ utterances and model predictions, indicating that speakers’ behavior can be modeled in a principled way by considering the probabilities of referents in the discourse and the information conveyed by each word.
@inproceedings{daume15referring,
title = {Why discourse affects speakers’ choice of referring expressions},
author = {Naho Orita and Eliana Vornov and Naomi H Feldman and Daum\'e, III,
Hal},
booktitle = {Proceedings of the Conference of the Association for Computational
Linguistics (ACL)},
year = {2015},
url = {http://hal3.name/docs/#daume15referring},
}
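The rational speech act recursion at the heart of this model is compact enough to sketch in a few lines of numpy. Everything below (the lexicon, costs, and discourse prior) is a toy assumption; the paper's contribution is estimating such quantities from naturally distributed data rather than a controlled setting:

import numpy as np

# Rows are referents, columns are candidate expressions;
# lexicon[r, u] = 1 if expression u is literally true of referent r.
lexicon = np.array([[1.0, 1.0],   # referent 0: "it" and "the dog" both apply
                    [1.0, 0.0]])  # referent 1: only "it" applies
cost = np.array([0.1, 1.0])       # assumed production cost per expression
prior = np.array([0.8, 0.2])      # assumed discourse salience of each referent

def normalize(m, axis):
    return m / m.sum(axis=axis, keepdims=True)

L0 = normalize(lexicon * prior[:, None], axis=0)   # literal listener P(r | u)
utility = np.log(L0 + 1e-12) - cost[None, :]
S1 = normalize(np.exp(utility), axis=1)            # pragmatic speaker P(u | r)
print(S1)  # each row: the speaker's distribution over referring expressions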
Deep unordered composition rivals syntactic methods for text classification
Mohit Iyyer, Varun Manjunatha, Jordan Boyd-Graber and Hal Daumé III
Conference of the Association for Computational Linguistics (ACL), 2015
[Abstract] [BibTeX]
Many existing deep learning models for natural language processing tasks focus on learning the compositionality of their inputs, which requires many expensive computations. We present a simple deep neural network that competes with and, in some cases, outperforms such models on sentiment analysis and factoid question answering tasks while taking only a fraction of the training time. While our model is syntactically-ignorant, we show significant improvements over previous bag-of-words models by deepening our network and applying a novel variant of dropout. Moreover, our model performs better than syntactic models on datasets with high syntactic variance. We show that our model makes similar errors to syntactically-aware models, indicating that for the tasks we consider, nonlinearly transforming the input is more important than tailoring a network to incorporate word order and syntax.
@inproceedings{daume15dan,
title = {Deep unordered composition rivals syntactic methods for text
classification},
author = {Mohit Iyyer and Varun Manjunatha and Jordan Boyd-Graber and Hal
{Daum\'e III}},
booktitle = {Proceedings of the Conference of the Association for Computational
Linguistics (ACL)},
year = {2015},
url = {http://hal3.name/docs/#daume15dan},
}
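Because the architecture is so simple, the forward pass fits in a short numpy sketch: drop whole words, average the survivors, and apply a few nonlinear layers. Dimensions and initialization below are illustrative, not the paper's configuration:

import numpy as np

rng = np.random.default_rng(0)

def dan_forward(word_vecs, layers, p_drop=0.3, train=True):
    if train:                                    # word-level dropout variant
        keep = rng.random(len(word_vecs)) > p_drop
        if keep.any():
            word_vecs = word_vecs[keep]
    h = word_vecs.mean(axis=0)                   # order-insensitive average
    for W, b in layers:                          # "deep" part of the network
        h = np.tanh(W @ h + b)
    return h                                     # feed to a softmax classifier

embeddings = rng.normal(size=(6, 50))            # one 6-word sentence
layers = [(0.1 * rng.normal(size=(50, 50)), np.zeros(50)) for _ in range(2)]
print(dan_forward(embeddings, layers).shape)     # (50,)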
Syntax-based Rewriting for Simultaneous Machine Translation
He He, Alvin Grissom II, Jordan Boyd-Graber and Hal Daumé III
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2015
[Abstract] [BibTeX]
Divergent word order between languages causes delay in simultaneous machine translation. We present a sentence rewriting method that generates more monotonic translations to improve the speed-accuracy tradeoff. We design grammaticality- and meaning-preserving syntactic transformation rules that operate on constituent parse trees. We apply the rules to reference translations to make their word order closer to the source language word order. On Japanese-English translation (two languages with substantially different structure), incorporating the rewritten, more monotonic reference translation into a phrase-based machine translation system enables better translations faster than a baseline system that only uses gold reference translations.
@inproceedings{daume15rewrite,
title = {Syntax-based Rewriting for Simultaneous Machine Translation},
author = {He He and Alvin Grissom II and Jordan Boyd-Graber and Hal {Daum\'e
III}},
booktitle = {Proceedings of the Conference on Empirical Methods in Natural
Language Processing (EMNLP)},
year = {2015},
url = {http://hal3.name/docs/#daume15rewrite},
}
Understanding MOOC Discussion Forums using seeded LDA
Arti Ramesh, Dan Goldwasser, Bert Huang, Hal Daumé III and Lise Getoor
Workshop on Innovative Use of NLP for Building Educational Applications, 2014
[Abstract] [BibTeX]
Discussion forums serve as a platform for student discussions in massive open online courses (MOOCs). Analyzing content in these forums can uncover useful information for improving student retention and help in initiating instructor intervention. In this work, we explore the use of topic models, particularly seeded topic models toward this goal. We demonstrate that features derived from topic analysis help in predicting student survival.
@inproceedings{daume14seededmooc,
title = {Understanding {MOOC} Discussion Forums using seeded {LDA}},
author = {Arti Ramesh and Dan Goldwasser and Bert Huang and Hal {Daum\'e III} and
Lise Getoor},
booktitle = {Workshop on Innovative Use of NLP for Building Educational
Applications},
year = {2014},
url = {http://hal3.name/docs/#daume14seededmooc},
}
Don't Until the Final Verb Wait: Reinforcement Learning for Simultaneous Machine Translation
Alvin Grissom II, Jordan Boyd-Graber, He He, John Morgan and Hal Daumé III
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014
[BibTeX]
@inproceedings{daume14simultaneousmt,
title = {Don't Until the Final Verb Wait: Reinforcement Learning for Simultaneous
Machine Translation},
author = {Alvin Grissom II and Jordan Boyd-Graber and He He and John Morgan and
Hal {Daum\'e III}},
booktitle = {Proceedings of the Conference on Empirical Methods in Natural
Language Processing (EMNLP)},
year = {2014},
url = {http://hal3.name/docs/#daume14simultaneousmt},
}
Integrating a Discriminative Classifier into Phrase-based and Hierarchical Decoding
Aleš Tamchyna, Fabienne Braune, Alexander Fraser, Marine Carpuat, Hal Daumé III and Chris Quirk
The Prague Bulletin of Mathematical Linguistics, 2014
[Abstract] [BibTeX]
Current state-of-the-art statistical machine translation (SMT) relies on simple feature functions which make independence assumptions at the level of phrases or hierarchical rules. However, it is well-known that discriminative models can benefit from rich features extracted from the source sentence context outside of the applied phrase or hierarchical rule, which is available at decoding time. We present a framework for the open-source decoder Moses that allows discriminative models over source context to easily be trained on a large number of examples and then be included as feature functions in decoding.
@article{daume14vwmoses,
title = {Integrating a Discriminative Classifier into Phrase-based and
Hierarchical Decoding},
author = {Ale\v{s} Tamchyna and Fabienne Braune and Alexander Fraser and Marine
Carpuat and Hal {Daum\'e III} and Chris Quirk},
journal = {The Prague Bulletin of Mathematical Linguistics},
year = {2014},
url = {http://hal3.name/docs/#daume14vwmoses},
}
Predicting Instructor Intervention in MOOC Forums
Snigdha Chaturvedi, Dan Goldwasser and Hal Daumé III
Conference of the Association for Computational Linguistics (ACL), 2014
[Abstract] [BibTeX]
Instructor intervention in student discussion forums is a vital component in Massive Open Online Courses (MOOCs), where personalized interaction is limited. This paper introduces the problem of predicting instructor interventions in MOOC forums. We propose several prediction models designed to capture unique aspects of MOOCs, combining course information, forum structure and posts content. Our models abstract contents of individual posts of threads using latent categories, learned jointly with the binary intervention prediction problem. Experiments over data from two Coursera MOOCs demonstrate that incorporating the structure of threads into the learning problem leads to better predictive performance.
@inproceedings{daume14moocintervention,
title = {Predicting Instructor Intervention in MOOC Forums},
author = {Snigdha Chaturvedi and Dan Goldwasser and Hal {Daum\'e III}},
booktitle = {Proceedings of the Conference of the Association for Computational
Linguistics (ACL)},
year = {2014},
url = {http://hal3.name/docs/#daume14moocintervention},
}
"I Object!" Modeling Latent Pragmatic Effects in Courtroom Dialogues
Dan Goldwasser and Hal Daumé III
Conference of the European Association for Computational Linguistics (EACL), 2014
[BibTeX]
@inproceedings{daume14iobject,
title = {``I Object!'' Modeling Latent Pragmatic Effects in Courtroom Dialogues},
author = {Dan Goldwasser and Hal {Daum\'e III}},
booktitle = {Proceedings of the Conference of the European Association for
Computational Linguistics (EACL)},
year = {2014},
url = {http://hal3.name/docs/#daume14iobject},
}
Modeling Syntactic and Semantic Structures in Hierarchical Phrase-based Translation
Junhui Li, Philip Resnik and Hal Daumé III
Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2013
[Abstract] [BibTeX]
Incorporating semantic structure into a linguistics-free translation model is challenging, since semantic structures are closely tied to syntax. In this paper, we propose a two-level approach to exploiting predicate-argument structure reordering in a hierarchical phrase-based translation model. First, we introduce linguistically motivated constraints into a hierarchical model, guiding translation phrase choices in favor of those that respect syntactic boundaries. Second, based on such translation phrases, we propose a predicate-argument structure reordering model that predicts reordering not only between an argument and its predicate, but also between two arguments. Experiments on Chinese-to-English translation demonstrate that both advances significantly improve translation accuracy.
@inproceedings{daume13semanticmt,
title = {Modeling Syntactic and Semantic Structures in Hierarchical Phrase-based
Translation},
author = {Junhui Li and Philip Resnik and Hal {Daum\'e III}},
booktitle = {Proceedings of the Conference of the North American Chapter of the
Association for Computational Linguistics (NAACL)},
year = {2013},
url = {http://hal3.name/docs/#daume13semanticmt},
}
Discriminatively Enhanced Topic Models
Snigdha Chaturvedi, Hal Daumé III and Taesun Moon
International Conference on Data Mining (ICDM), 2013
[BibTeX]
@inproceedings{daume13detm,
title = {Discriminatively Enhanced Topic Models},
author = {Snigdha Chaturvedi and Hal {Daum\'e III} and Taesun Moon},
booktitle = {International Conference on Data Mining (ICDM)},
year = {2013},
url = {http://hal3.name/docs/#daume13detm},
}
Measuring Machine Translation Errors in New Domains
Ann Irvine, John Morgan, Marine Carpuat, Hal Daumé III and Dragos Munteanu
Transactions of the Association for Computational Linguistics (TACL), 2013
[Abstract] [BibTeX]
We develop two techniques for analyzing the effect of porting a machine translation system to a new domain. One is a macro-level analysis that measures how domain shift affects corpus-level evaluation; the second is a micro-level analysis for word-level errors. We apply these methods to understand what happens when a Parliament-trained phrase-based machine translation system is applied in four very different domains: news, medical texts, scientific articles and movie subtitles. We present quantitative and qualitative experiments that highlight opportunities for future research in domain adaptation for machine translation.
@article{daume13mterrors,
title = {Measuring Machine Translation Errors in New Domains},
author = {Ann Irvine and John Morgan and Marine Carpuat and Hal {Daum\'e III} and
Dragos Munteanu},
journal = {Transactions of the Association for Computational Linguistics (TACL)},
year = {2013},
url = {http://hal3.name/docs/#daume13mterrors},
}
Improving performance of natural language processing part-of-speech tagging on clinical narratives through domain adaptation
Jeffrey Ferraro, Hal Daumé III, Scott DuVall, Wendy Chapman, Henk Harkema and Peter Haug
Journal of the American Medical Informatics Association, 2013
[Abstract] [BibTeX]
Natural language processing (NLP) tasks are commonly decomposed into subtasks, chained together to form processing pipelines. The residual error produced in these subtasks propagates, adversely affecting the end objectives. Limited availability of annotated clinical data remains a barrier to reaching state-of-the-art operating characteristics using statistically based NLP tools in the clinical domain. Here we explore the unique linguistic constructions of clinical texts and demonstrate the loss in operating characteristics when out-of-the-box part-of-speech (POS) tagging tools are applied to the clinical domain. We test a domain adaptation approach integrating a novel lexical-generation probability rule used in a transformation-based learner to boost POS performance on clinical narratives.
@article{daume13clinical,
title = {Improving performance of natural language processing part-of-speech
tagging on clinical narratives through domain adaptation},
author = {Jeffrey Ferraro and Hal {Daum\'e III} and Scott DuVall and Wendy
Chapman and Henk Harkema and Peter Haug},
journal = {Journal of the American Medical Informatics Association},
year = {2013},
url = {http://hal3.name/docs/#daume13clinical},
}
SenseSpotting: Never let your parallel data tie you to an old domain
Marine Carpuat, Hal Daumé III, Katharine Henry, Ann Irvine, Jagadeesh Jagarlamudi and Rachel Rudinger
Conference of the Association for Computational Linguistics (ACL), 2013
[Abstract] [BibTeX]
Words often gain new senses in new domains. Being able to automatically identify, from a corpus of monolingual text, which word tokens are being used in a previously unseen sense has applications to machine translation and other tasks sensitive to lexical semantics. We define a task, SENSESPOTTING, in which we build systems to spot tokens that have new senses in new domain text. Instead of difficult and expensive annotation, we build a gold standard by leveraging cheaply available parallel corpora, targeting our approach to the problem of domain adaptation for machine translation. Our system is able to achieve F-measures of as much as 80%, when applied to word types it has never seen before. Our approach is based on a large set of novel features that capture varied aspects of how words change when used in new domains.
@inproceedings{daume13sensespotting,
title = {{SenseSpotting}: Never let your parallel data tie you to an old domain},
author = {Marine Carpuat and Hal {Daum\'e III} and Katharine Henry and Ann Irvine
and Jagadeesh Jagarlamudi and Rachel Rudinger},
booktitle = {Proceedings of the Conference of the Association for Computational
Linguistics (ACL)},
year = {2013},
url = {http://hal3.name/docs/#daume13sensespotting},
}
A Computational Model for Plot Units
Amit Goyal, Ellen Riloff and Hal Daumé III
Computational Intelligence Journal, 2013
[Abstract] [BibTeX]
This research revisits plot units, which were developed in the 1980s as a conceptual knowledge structure to represent the affect states of and emotional tensions between characters in narrative stories. We present a fully automated system, called AESOP, that generates plot unit representations for narrative texts. AESOP performs four steps: affect state recognition, character identification, affect state projection, and link creation. We also identify a type of knowledge that seems to be missing from existing lexical resources: verbs that impart positive or negative polarity onto their patients (e.g., “eat” imparts negative polarity because being eaten is bad, whereas “fed” imparts positive polarity because being fed is good). We develop two techniques to automatically harvest these “patient polarity verbs” (PPVs) from a Web corpus, and show that the PPVs improve affect state recognition. Finally, we evaluate AESOP’s performance on a set of fables, and present several analyses to shed light on the capabilities and limitations of current natural language processing technology for plot unit generation.
@article{daume13plotunits,
author = {Amit Goyal and Ellen Riloff and Hal {Daum\'e III}},
title = {A Computational Model for Plot Units},
journal = {Computational Intelligence Journal},
year = {2013},
volume = {29},
number = {3},
url = {http://hal3.name/docs/#daume13plotunits}
}
Monolingual Marginal Matching for Translation Model Adaptation
Ann Irvine, Chris Quirk and Hal Daumé III
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2013
[BibTeX]
@inproceedings{daume13mm,
title = {Monolingual Marginal Matching for Translation Model Adaptation},
author = {Ann Irvine and Chris Quirk and Hal {Daum\'e III}},
booktitle = {Proceedings of the Conference on Empirical Methods in Natural
Language Processing (EMNLP)},
year = {2013},
url = {http://hal3.name/docs/#daume13mm},
}
Dynamic Feature Selection for Dependency Parsing
He He, Hal Daumé III and Jason Eisner
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2013
[Abstract] [BibTeX] [Slides] [Video]
Feature computation and exhaustive search have significantly restricted the speed of graph-based dependency parsing. We propose a faster framework of dynamic feature selection, where features are added sequentially as needed, edges are pruned early, and decisions are made online for each sentence. We model this as a sequential decision-making problem and solve it by imitation learning techniques. We test our method on 7 languages. Our dynamic parser can achieve accuracies comparable or even superior to parsers using a full set of features, while computing fewer than 30% of the feature templates.
@inproceedings{daume13depfeat,
title = {Dynamic Feature Selection for Dependency Parsing},
author = {He He and Hal {Daum\'e III} and Jason Eisner},
booktitle = {Proceedings of the Conference on Empirical Methods in Natural
Language Processing (EMNLP)},
year = {2013},
url = {http://hal3.name/docs/#daume13depfeat},
}
Midge: Generating Image Descriptions From Computer Vision Detections
Margaret Mitchell, Jesse Dodge, Amit Goyal, Kota Yamaguchi, Karl Stratos, Xufeng Han, Alyssa Mensch, Alexander C. Berg, Tamara L. Berg and Hal Daumé III
European Chapter of the Association for Computational Linguistics (EACL), 2012
🏆 Test of Time Award (2022)
[Abstract] [BibTeX]
This paper introduces a novel generation system that composes humanlike descriptions of images from computer vision detections. By leveraging syntactically informed word co-occurrence statistics, the generator filters and constrains the noisy detections output from a vision system to generate syntactic trees that detail what the computer vision system sees. Results show that the generation system outperforms state-of-the-art systems, automatically generating some of the most natural image descriptions to date.
@InProceedings{daume12midge,
author = {Margaret Mitchell and Jesse Dodge and Amit Goyal and Kota Yamaguchi and
Karl Stratos and Xufeng Han and Alyssa Mensch and Alexander C.
Berg and Tamara L. Berg and Hal {Daum\'e III}},
title = {Midge: Generating Image Descriptions From Computer Vision Detections},
booktitle = {European Chapter of the Association for Computational Linguistics
(EACL)},
year = {2012},
url = {http://hal3.name/docs/#daume12midge}
}
Incorporating Lexical Priors into Topic Models
Jagadeesh Jagarlamudi, Hal Daumé III and Raghavendra Udupa
Conference on European Chapter of the Association for Computational Linguistics (EACL), 2012
[Abstract] [BibTeX]
Topic models have great potential for helping users understand document corpora. This potential is stymied by their purely unsupervised nature, which often leads to topics that are neither entirely meaningful nor effective in extrinsic tasks (Chang et al., 2009). We propose a simple and effective way to guide topic models to learn topics of specific interest to a user. We achieve this by providing sets of seed words that a user believes are representative of the underlying topics in a corpus. Our model uses these seeds to improve both topic-word distributions (by biasing topics to produce appropriate seed words) and document-topic distributions (by biasing documents to select topics related to the seed words they contain). Extrinsic evaluation on a document clustering task reveals a significant improvement when using seed information, even over other models that use seed information naïvely.
@inproceedings{daume12seeded,
title = {Incorporating Lexical Priors into Topic Models},
author = {Jagadeesh Jagarlamudi and Hal {Daum\'e III} and Raghavendra Udupa},
booktitle = {Proceedings of the Conference on European Chapter of the Association
for Computational Linguistics (EACL)},
year = {2012},
address = {Avignon, France},
url = {http://hal3.name/docs/#daume12seeded}
}
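One simple way to realize the topic-word half of this seeding (a sketch, not necessarily the paper's exact estimator) is an asymmetric Dirichlet prior that gives seed words extra pseudo-counts in their topic; the document-topic biasing half is omitted here:

import numpy as np

def seeded_topic_prior(vocab, seed_sets, base=0.01, boost=1.0):
    # eta[k, w] is the Dirichlet pseudo-count of word w under topic k;
    # seed words get a larger pseudo-count, biasing topic k toward them.
    word_id = {w: i for i, w in enumerate(vocab)}
    eta = np.full((len(seed_sets), len(vocab)), base)
    for k, seeds in enumerate(seed_sets):
        for w in seeds:
            eta[k, word_id[w]] += boost
    return eta

vocab = ["gene", "dna", "ball", "team", "play"]
eta = seeded_topic_prior(vocab, [{"gene", "dna"}, {"ball", "team"}])
print(eta)  # pass as the topic-word prior to a Gibbs sampler or VB for LDA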
Detecting Visual Text
Jesse Dodge, Amit Goyal, Xufeng Han, Alyssa Mensch, Margaret Mitchell, Karl Stratos, Kota Yamaguchi, Yejin Choi, Hal Daumé III, Alexander C. Berg and Tamara L. Berg
North American Chapter of the Association for Computational Linguistics (NAACL), 2012
[Abstract] [BibTeX]
When people describe a scene, they often include information that is not visually apparent; sometimes based on background knowledge, sometimes to tell a story. We aim to separate visual text—descriptions of what is being seen—from non-visual text in natural images and their descriptions. To do so, we first concretely define what it means to be visual, annotate visual text and then develop algorithms to automatically classify noun phrases as visual or non-visual. We find that using text alone, we are able to achieve high accuracies at this task, and that incorporating features derived from computer vision algorithms improves performance. Finally, we show that we can reliably mine visual nouns and adjectives from large corpora and that we can use these effectively in the classification task.
@InProceedings{daume12desctext,
author = {Jesse Dodge and Amit Goyal and Xufeng Han and Alyssa Mensch and
Margaret Mitchell and Karl Stratos and Kota Yamaguchi and Yejin
Choi and Hal {Daum\'e III} and Alexander C. Berg and Tamara L.
Berg},
title = {Detecting Visual Text},
booktitle = {North American Chapter of the Association for Computational
Linguistics (NAACL)},
year = {2012},
url = {http://hal3.name/docs/#daume12desctext}
}
Towards a Watson That Sees: Language-Guided Action Recognition for Robots
Ching Lik Teo, Yezhou Yang, Hal Daumé III, Cornelia Fermüller and Yiannis Aloimonos
ICRA, 2012
[Abstract] [BibTeX]
For robots of the future to interact seamlessly with humans, they must be able to reason about their surroundings and take actions that are appropriate to the situation. Such reasoning is only possible when the robot has knowledge of how the world functions, which must either be learned or hardcoded. In this paper, we propose an approach that exploits language as an important resource of high-level knowledge that a robot can use, akin to IBM’s Watson in Jeopardy!. In particular, we show how language can be leveraged to reduce the ambiguity that arises from recognizing actions involving hand-tools from video data. Starting from the premise that tools and actions are intrinsically linked, with one explaining the existence of the other, we trained a language model over a large corpus of English newswire text so that we can extract this relationship directly. This model is then used as a prior to select the best tool and action that explains the video. We formalize the approach in the context of 1) an unsupervised recognition and 2) a supervised classification scenario, using an EM formulation for the former and integrating language features for the latter. Results are validated over a new hand-tool action dataset, and comparisons with state-of-the-art STIP features show significantly improved results when language is used. In addition, we discuss the implications of these results and how they provide a framework for integrating language into vision in other robotic applications.
@inproceedings{daume12watson,
title = {Towards a Watson That Sees: Language-Guided Action Recognition for
Robots},
author = {Ching Lik Teo and Yezhou Yang and Hal {Daum\'e III} and Cornelia
Ferm\"uller and Yiannis Aloimonos},
booktitle = {ICRA},
year = {2012},
url = {http://hal3.name/docs/#daume12watson},
}
Fast Large-Scale Approximate Graph Construction for NLP
Amit Goyal, Hal Daumé III and Raul Guerra
Empirical Methods in Natural Language Processing (EMNLP), 2012
[BibTeX]
@InProceedings{daume12flag,
author = {Amit Goyal and Hal {Daum\'e III} and Raul Guerra},
title = {Fast Large-Scale Approximate Graph Construction for {NLP}},
booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
year = {2012},
url = {http://hal3.name/docs/#daume12flag}
}
Low-dimensional Discriminative Reranking
Jagadeesh Jagarlamudi and Hal Daumé III
Conference on North American Chapter of the Association for Computational Linguistics, 2012
[Abstract] [BibTeX]
The accuracy of many natural language processing tasks can be improved by a reranking step, which involves selecting a single output from a list of candidate outputs generated by a baseline system. We propose a novel family of reranking algorithms based on learning separate low-dimensional embeddings of the task’s input and output spaces. This embedding is learned in such a way that prediction becomes a low-dimensional nearest-neighbor search, which can be done efficiently. A key quality of our approach is that feature engineering can be done separately on the input and output spaces; the relationship between inputs and outputs is learned automatically. Experiments on a part-of-speech tagging task in four languages show significant improvements over a baseline decoder and existing reranking approaches.
@inproceedings{daume12lowdim,
title = {Low-dimensional Discriminative Reranking},
author = {Jagadeesh Jagarlamudi and Hal {Daum\'e III}},
booktitle = {Proceedings of the Conference on North American Chapter of the
Association for Computational Linguistics},
year = {2012},
address = {Montreal, Canada},
url = {http://hal3.name/docs/#daume12lowdim}
}
Besting the quiz master: crowdsourcing incremental classification games
Jordan Boyd-Graber, Brianna Satinoff, He He and Hal Daumé III
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2012
[Abstract] [BibTeX]
Cost-sensitive classification, where the features used in machine learning tasks have a cost, has been explored as a means of balancing knowledge against the expense of incrementally obtaining new features. We introduce a setting where humans engage in classification with incrementally revealed features: the collegiate trivia circuit. By providing the community with a web-based system to practice, we collected tens of thousands of implicit word-by-word ratings of how useful features are for eliciting correct answers. Observing humans’ classification process, we improve the performance of a state-of-the-art classifier. We also use the dataset to evaluate a system to compete in the incremental classification task through a reduction of reinforcement learning to classification. Our system learns when to answer a question, performing better than baselines and most human players.
@inproceedings{daume12quiz,
title = {Besting the quiz master: crowdsourcing incremental classification
games},
author = {Jordan Boyd-Graber and Brianna Satinoff and He He and Hal {Daum\'e
III}},
booktitle = {Proceedings of the Conference on Empirical Methods in Natural
Language Processing (EMNLP)},
year = {2012},
url = {http://hal3.name/docs/#daume12quiz},
}
Understanding and Predicting Importance in Images
Karl Stratos, Aneesh Sood, Alyssa Mensch, Xufeng Han, Margaret Mitchell, Kota Yamaguchi, Jesse Dodge, Amit Goyal, Hal Daumé III, Alexander C. Berg and Tamara L. Berg
Computer Vision and Pattern Recognition (CVPR), 2012
[Abstract] [BibTeX]
What do people care about in an image? To drive computational visual recognition toward more human-centric outputs, we need a better understanding of how people perceive and judge the importance of content in images. In this paper, we explore how a number of factors relate to human perception of importance. Proposed factors fall into 3 broad types: 1) factors related to composition, e.g. size, location, 2) factors related to semantics, e.g. category of object or scene, and 3) contextual factors related to the likelihood of attribute-object, or object-scene pairs. We explore these factors using what people describe as a proxy for importance. Finally, we build models to predict what will be described about an image given either known image content, or image content estimated automatically by recognition systems.
@InProceedings{daume12importance,
author = {Karl Stratos and Aneesh Sood and Alyssa Mensch and Xufeng Han and
Margaret Mitchell and Kota Yamaguchi and Jesse Dodge and Amit
Goyal and Hal {Daum\'e III} and Alexander C. Berg and Tamara L.
Berg},
title = {Understanding and Predicting Importance in Images},
booktitle = {Computer Vision and Pattern Recognition (CVPR)},
year = {2012},
url = {http://hal3.name/docs/#daume12importance}
}
Learned Prioritization for Trading Off Accuracy and Speed
Jiarong Jiang, Adam Teichert, Hal Daumé III and Jason Eisner
Advances in Neural Information Processing Systems (NeurIPS), 2012
[Abstract] [BibTeX]
Users want inference to be both fast and accurate, but quality often comes at the cost of speed. The field has experimented with approximate inference algorithms that make different speed-accuracy tradeoffs (for particular problems and datasets). We aim to explore this space automatically, focusing here on the case of agenda-based syntactic parsing [12]. Unfortunately, off-the-shelf reinforcement learning techniques fail to learn good policies: the state space is simply too large to explore naively. An attempt to counteract this by applying imitation learning algorithms also fails: the “teacher” follows a far better policy than anything in our learner’s policy space, free of the speed-accuracy tradeoff that arises when oracle information is unavailable, and thus largely insensitive to the known reward function. We propose a hybrid reinforcement/apprenticeship learning algorithm that learns to speed up an initial policy, trading off accuracy for speed according to various settings of a speed term in the loss function.
@inproceedings{daume12prioritization,
title = {Learned Prioritization for Trading Off Accuracy and Speed},
author = {Jiarong Jiang and Adam Teichert and Hal {Daum\'e III} and Jason
Eisner},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
year = {2012},
url = {http://hal3.name/docs/#daume12prioritization},
}
Sketch Algorithms for Estimating Point Queries in NLP
Amit Goyal, Hal Daumé III and Graham Cormode
Empirical Methods in Natural Language Processing (EMNLP), 2012
[BibTeX]
@InProceedings{daume12pointquery,
author = {Amit Goyal and Hal {Daum\'e III} and Graham Cormode},
title = {Sketch Algorithms for Estimating Point Queries in {NLP}},
booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
year = {2012},
url = {http://hal3.name/docs/#daume12pointquery}
}
Domain Adaptation for Machine Translation by Mining Unseen Words
Hal Daumé III and Jagadeesh Jagarlamudi
Association for Computational Linguistics, 2011
[Abstract] [BibTeX]
We show that unseen words account for a large part of the translation error when moving to new domains. Using an extension of a recent approach to mining translations from comparable corpora (Haghighi et al., 2008), we are able to find translations for otherwise OOV terms. We show several approaches to integrating such translations into a phrase-based translation system, yielding consistent improvements in translation quality (between 0.5 and 1.5 BLEU points) on four domains and two language pairs.
@InProceedings{daume11lexicaladapt,
author = {Hal {Daum\'e III} and Jagadeesh Jagarlamudi},
title = {Domain Adaptation for Machine Translation by Mining Unseen Words},
booktitle = {Association for Computational Linguistics},
year = {2011},
address = {Portland, OR},
url = {http://hal3.name/docs/#daume11lexicaladapt}
}
Generating Semantic Orientation Lexicon using Large Data and Thesaurus
Amit Goyal and Hal Daumé III
ACL Workshop on Computational Approaches to Subjectivity and Sentiment Analysis (WASSA), 2011
[BibTeX]
@InProceedings{daume11wassa,
author = {Amit Goyal and Hal {Daum\'e III}},
title = {Generating Semantic Orientation Lexicon using Large Data and Thesaurus},
booktitle = {Proceedings of ACL Workshop on Computational Approaches to
Subjectivity and Sentiment Analysis (WASSA)},
year = {2011},
address = {Portland, OR},
url = {http://hal3.name/docs/#daume11wassa}
}
A Corpus-Guided Framework for Robotic Visual Perception
Ching L. Teo, Yezhou Yang, Hal Daumé III, Cornelia Fermüller and Yiannis Aloimonos
AAAI Workshop on Language-Action Tools for Cognitive Artificial Agents, 2011
[Abstract] [BibTeX]
We present a framework that produces sentence-level summarizations of videos containing complex human activities that can be implemented as part of the Robot Perception Control Unit (RPCU). This is done via: 1) detection of pertinent objects in the scene: tools and direct-objects, 2) predicting actions guided by a large lexical corpus and 3) generating the most likely sentence description of the video given the detections. We pursue an active object detection approach by focusing on regions of high optical flow. Next, an iterative EM strategy, guided by language, is used to predict the possible actions. Finally, we model the sentence generation process as an HMM optimization problem, combining visual detections and a trained language model to produce a readable description of the video. Experimental results validate our approach, and we discuss the implications of our approach to the RPCU in future applications.
@inproceedings{daume11robotic,
title = {A Corpus-Guided Framework for Robotic Visual Perception},
author = {Ching L. Teo and Yezhou Yang and Hal {Daum\'e III} and Cornelia
Ferm\"uller and Yiannis Aloimonos},
booktitle = {AAAI Workshop on Language-Action Tools for Cognitive Artificial
Agents},
year = {2011},
url = {http://hal3.name/docs/#daume11robotic},
}
Approximate Scalable Bounded Space Sketch for Large Data NLP
Amit Goyal and Hal Daumé III
Empirical Methods in Natural Language Processing (EMNLP), 2011
[BibTeX]
@InProceedings{daume11sketch,
author = {Amit Goyal and Hal {Daum\'e III}},
title = {Approximate Scalable Bounded Space Sketch for Large Data {NLP}},
booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
year = {2011},
address = {Edinburgh, Scotland},
url = {http://hal3.name/docs/#daume11sketch}
}
Lossy Conservative Update (LCU) sketch: Succinct approximate count storage
Amit Goyal and Hal Daumé III
Conference on Artificial Intelligence (AAAI), 2011
[BibTeX]
@InProceedings{daume11lcu,
author = {Amit Goyal and Hal {Daum\'e III}},
title = {Lossy Conservative Update ({LCU}) sketch: Succinct approximate count
storage},
booktitle = {Conference on Artificial Intelligence (AAAI)},
year = {2011},
address = {Portland, OR},
url = {http://hal3.name/docs/#daume11lcu}
}
From Bilingual Dictionaries to Interlingual Document Representations
Jagadeesh Jagarlamudi, Hal Daumé III and Raghavendra Udupa
Association for Computational Linguistics (ACL), 2011
[Abstract] [BibTeX]
Mapping documents into an interlingual representation can help bridge the language barrier of a cross-lingual corpus. Previous approaches use aligned documents as training data to learn an interlingual representation, making them sensitive to the domain of the training data. In this paper, we learn an interlingual representation in an unsupervised manner using only a bilingual dictionary. We first use the bilingual dictionary to find candidate document alignments and then use them to find an interlingual representation. Since the candidate alignments are noisy, we develop a robust learning algorithm to learn the interlingual representation. We show that bilingual dictionaries generalize to different domains better: our approach gives better performance than either a word by word translation method or Canonical Correlation Analysis (CCA) trained on a different domain.
@InProceedings{daume11interlingual,
author = {Jagadeesh Jagarlamudi and Hal {Daum\'e III} and Raghavendra Udupa},
title = {From Bilingual Dictionaries to Interlingual Document Representations},
booktitle = {Association for Computational Linguistics (ACL)},
year = {2011},
address = {Portland, OR},
url = {http://hal3.name/docs/#daume11interlingual}
}
Speed-Accuracy Tradeoffs in Nondeterministic Inference Algorithms
Jason Eisner and Hal Daumé III
COST: NeurIPS 2011 Workshop on Computational Trade-offs in Statistical Learning, 2011
[Abstract] [BibTeX]
Statistical learning has led to great advances in building models that achieve high accuracy. However, test-time inference in these models can be slow, for example in structured prediction problems. This is frequently addressed by using test-time heuristics to guide and prune the search for a good structured output. In this high-level paper, we ask: Could we explicitly train such heuristics to trade off accuracy and efficiency? And how does this relate to existing learning problems?
@InProceedings{daume11tradeoffs,
author = {Jason Eisner and Hal {Daum\'e III}},
title = {Speed-Accuracy Tradeoffs in Nondeterministic Inference Algorithms},
booktitle = {Proceedings of COST: NeurIPS 2011 Workshop on Computational
Trade-offs in Statistical Learning},
year = {2011},
address = {Sierra Nevada, Spain},
url = {http://hal3.name/docs/#daume11tradeoffs}
}
Improving Bilingual Projections via Sparse Covariance Matrices
Jagadeesh Jagarlamudi, Raghavendra Udupa, Hal Daumé III and Abhijit Bhole
Empirical Methods in Natural Language Processing (EMNLP), 2011
[BibTeX]
@InProceedings{daume11sparse,
author = {Jagadeesh Jagarlamudi and Raghavendra Udupa and Hal {Daum\'e III} and
Abhijit Bhole},
title = {Improving Bilingual Projections via Sparse Covariance Matrices},
booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
year = {2011},
address = {Edinburgh, Scotland},
url = {http://hal3.name/docs/#daume11sparse}
}
Corpus-Guided Sentence Generation of Natural Images
Yezhou Yang, Ching Lik Teo, Hal Daumé III and Yiannis Aloimonos
Empirical Methods in Natural Language Processing (EMNLP), 2011
[Abstract] [BibTeX]
We propose a sentence generation strategy that describes images by predicting the most likely nouns, verbs, scenes and prepositions that make up the core sentence structure. The inputs are initial noisy estimates of the objects and scenes detected in the image using state-of-the-art trained detectors. As predicting actions from still images directly is unreliable, we use a language model trained from the English Gigaword corpus to obtain their estimates, together with probabilities of co-located nouns, scenes and prepositions. We use these estimates as parameters of an HMM that models the sentence generation process, with hidden nodes as sentence components and image detections as the emissions. Experimental results show that our strategy of combining vision and language produces readable and descriptive sentences compared to naive strategies that use vision alone.
@InProceedings{daume11generation,
author = {Yezhou Yang and Ching Lik Teo and Hal {Daum\'e III} and Yiannis
Aloimonos},
title = {Corpus-Guided Sentence Generation of Natural Images},
booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
year = {2011},
address = {Edinburgh, Scotland},
url = {http://hal3.name/docs/#daume11generation}
}
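The HMM optimization step amounts to a Viterbi search over sentence components with detection scores as emissions. A generic sketch follows, with all probabilities invented for illustration:

import numpy as np

def viterbi(log_start, log_trans, log_emit):
    # Best hidden path (sentence components) given per-slot emission
    # scores (noisy detection likelihoods); standard dynamic program.
    T, K = log_emit.shape
    score = log_start + log_emit[0]
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_trans       # (prev state, next state)
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_emit[t]
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

rng = np.random.default_rng(1)
K, T = 3, 4                                     # 3 candidate fillers, 4 slots
path = viterbi(np.log(np.full(K, 1.0 / K)),
               np.log(rng.dirichlet(np.ones(K), size=K)),
               np.log(rng.dirichlet(np.ones(K), size=T)))
print(path)                                     # most likely filler sequence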
Sketch Techniques for Scaling Distributional Similarity to the Web
Amit Goyal, Jagadeesh Jagarlamudi, Hal Daumé III and Suresh Venkatasubramanian
GEometrical Models of Natural Language Semantics Workshop (GEMS) at ACL, 2010
[Abstract] [BibTeX]
In this paper, we propose a memory-, space-, and time-efficient framework to scale distributional similarity to the web. We exploit sketch techniques, especially the Count-Min sketch, which approximates the frequency of an item in the corpus without explicitly storing the item itself. These methods use hashing to deal with massive amounts of streaming text. We store all item counts computed from 90 GB of web data in just 2 billion counters (8 GB main memory) of the CM sketch. Our method returns the semantic similarity between word pairs in O(K) time and can compute similarity between any word pairs that are stored in the sketch. In our experiments, we show that our framework is as effective as using the exact counts.
@InProceedings{daume10distsim,
author = {Amit Goyal and Jagadeesh Jagarlamudi and Hal {Daum\'e III} and Suresh
Venkatasubramanian},
title = {Sketch Techniques for Scaling Distributional Similarity to the Web},
booktitle = {GEometrical Models of Natural Language Semantics Workshop (GEMS) at
ACL},
year = {2010},
address = {Uppsala, Sweden},
url = {http://hal3.name/docs/#daume10distsim}
}
Extracting Multilingual Topics from Unaligned Corpora
Jagadeesh Jagarlamudi and Hal Daumé III
European Conference on Information Retrieval (ECIR), 2010
[Abstract] [BibTeX]
Topic models have been studied extensively in the context of monolingual corpora. Though there are some attempts to mine topical structure from cross-lingual corpora, they require clues about document alignments. In this paper we present a generative model called JointLDA which uses a bilingual dictionary to mine multilingual topics from an unaligned corpus. Experiments conducted on different data sets confirm our conjecture that jointly modeling the cross-lingual corpora offers several advantages compared to individual monolingual models. Since the JointLDA model merges related topics in different languages into a single multilingual topic: a) it can fit the data with relatively fewer topics; b) it has the ability to predict related words from a language different than that of the given document. In fact, it has better predictive power than the bag-of-words translation model, leaving the possibility for JointLDA to be preferred over the bag-of-words model for cross-lingual IR applications. We also found that the monolingual models learnt while optimizing the cross-lingual corpora are more effective than the corresponding LDA models.
@InProceedings{daume10multilingual,
author = {Jagadeesh Jagarlamudi and Hal {Daum\'e III}},
title = {Extracting Multilingual Topics from Unaligned Corpora},
booktitle = {Proceedings of the European Conference on Information Retrieval
(ECIR)},
year = {2010},
address = {Milton Keynes, United Kingdom},
url = {http://hal3.name/docs/#daume10multilingual}
}
Sketching Techniques for Large Scale NLP
Amit Goyal, Jagadeesh Jagarlamudi, Hal Daumé III and Suresh Venkatasubramanian
HLT/NAACL Workshop on the Web as a Corpus (WAC), 2010
[Abstract] [BibTeX]
In this paper, we address the challenges posed by large amounts of text data by exploiting the power of hashing in the context of streaming data. We explore sketch techniques, especially the Count-Min sketch, which approximates the frequency of a word pair in the corpus without explicitly storing the word pairs themselves. We use the idea of a conservative update with the Count-Min sketch to reduce the average relative error of its approximate counts by a factor of two. We show that it is possible to store all word and word-pair counts computed from 37 GB of web data in just 2 billion counters (8 GB RAM). The number of these counters is up to 30 times less than the stream size, which is a big memory and space gain. In Semantic Orientation experiments, the PMI scores computed from 2 billion counters are as effective as exact PMI scores.
@InProceedings{daume10sketch,
author = {Amit Goyal and Jagadeesh Jagarlamudi and Hal {Daum\'e III} and Suresh
Venkatasubramanian},
title = {Sketching Techniques for Large Scale {NLP}},
booktitle = {Proceedings of HLT/NAACL Workshop on the Web as a Corpus (WAC)},
year = {2010},
address = {Los Angeles, CA},
url = {http://hal3.name/docs/#daume10sketch}
}
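The data structure behind both of these sketch papers is compact enough to show in full. Below is a minimal Count-Min sketch with the conservative update described above (only raise the counters that sit at the current minimum), using Python's built-in hash for brevity:

import random

class CountMinCU:
    def __init__(self, width=2**20, depth=3, seed=0):
        rnd = random.Random(seed)
        self.width = width
        self.seeds = [rnd.getrandbits(64) for _ in range(depth)]
        self.tables = [[0] * width for _ in range(depth)]

    def _cells(self, item):
        return [(t, hash((s, item)) % self.width)
                for t, s in zip(self.tables, self.seeds)]

    def update(self, item, count=1):
        cells = self._cells(item)
        new = min(t[i] for t, i in cells) + count  # conservative target
        for t, i in cells:
            t[i] = max(t[i], new)                  # never raise past the target

    def query(self, item):                         # estimate = row-wise minimum
        return min(t[i] for t, i in self._cells(item))

cm = CountMinCU(width=1024, depth=4)
for w in ["the", "the", "cat"]:
    cm.update(w)
print(cm.query("the"), cm.query("cat"))            # approximate counts: 2 1

Word and word-pair counts stored this way are enough to recover association scores such as PMI without ever materializing the pairs themselves.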
Kernelized Sorting for Natural Language Processing
Jagadeesh Jagarlamudi, Seth Juarez and Hal Daumé III
Conference on Artificial Intelligence (AAAI), 2010
[Abstract] [BibTeX]
Kernelized sorting is an approach for matching objects from two sources (or domains) that does not require any prior notion of similarity between objects across the two sources. Unfortunately, this technique is highly sensitive to initialization and high dimensional data. We present variants of kernelized sorting to increase its robustness and performance on several Natural Language Processing (NLP) tasks: document matching from parallel and comparable corpora, machine transliteration and even image processing. Empirically we show that, on these tasks, a semi-supervised variant of kernelized sorting outperforms matching canonical correlation analysis.
@InProceedings{daume10sorting,
author = {Jagadeesh Jagarlamudi and Seth Juarez and Hal {Daum\'e III}},
title = {Kernelized Sorting for Natural Language Processing},
booktitle = {Proceedings of the Conference on Artificial Intelligence (AAAI)},
year = {2010},
address = {Atlanta, Georgia},
url = {http://hal3.name/docs/#daume10sorting}
}
Automatically Producing Plot Unit Representations for Narrative Text
Amit Goyal, Ellen Riloff and Hal Daumé III
Empirical Methods in Natural Language Processing (EMNLP), 2010
[BibTeX]
@InProceedings{daume10plotunits-emnlp,
author = {Amit Goyal and Ellen Riloff and Hal {Daum\'e III}},
title = {Automatically Producing Plot Unit Representations for Narrative Text},
booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
year = {2010},
address = {Boston, MA},
url = {http://hal3.name/docs/#daume10plotunits-emnlp}
}
Toward Plot Units: Automatic Affect State Analysis
Amit Goyal, Ellen Riloff, Hal Daumé III and Nathan Gilbert
HLT/NAACL Workshop on Computational Approaches to Analysis and Generation of Emotion in Text (CAET), 2010
[Abstract] [BibTeX]
We present a system called AESOP that automatically produces affect states associated with characters in a story. This research represents a first step toward the automatic generation of plot unit structures from text. AESOP incorporates several existing sentiment analysis tools and lexicons to evaluate the effectiveness of current sentiment technology on this task. AESOP also includes two novel components: a method for acquiring patient polarity verbs, which impart negative affect on their patients, and affect projection rules to propagate affect tags from surrounding words onto the characters in the story. We evaluate AESOP on a small collection of fables.
@InProceedings{daume10plotunits,
author = {Amit Goyal and Ellen Riloff and Hal {Daum\'e III} and Nathan Gilbert},
title = {Toward Plot Units: Automatic Affect State Analysis},
booktitle = {Proceedings of HLT/NAACL Workshop on Computational Approaches to
Analysis and Generation of Emotion in Text (CAET)},
year = {2010},
address = {Los Angeles, CA},
url = {http://hal3.name/docs/#daume10plotunits}
}
Unsupervised Search-based Structured Prediction
Hal Daumé III
International Conference on Machine Learning (ICML), 2009
[Abstract] [BibTeX]
We describe an adaptation and application of a search-based structured prediction algorithm "Searn" to unsupervised learning problems. We show that it is possible to reduce unsupervised learning to supervised learning and demonstrate a high-quality unsupervised shift-reduce parsing model. We additionally show a close connection between unsupervised Searn and expectation maximization. Finally, we demonstrate the efficacy of a semi-supervised extension. The key idea that enables this is an application of the predict-self idea for unsupervised learning.
@InProceedings{daume09unsearn,
author = {Hal {Daum\'e III}},
title = {Unsupervised Search-based Structured Prediction},
booktitle = {International Conference on Machine Learning (ICML)},
year = {2009},
address = {Montreal, Canada},
url = {http://hal3.name/docs/#daume09unsearn}
}
Non-Parametric Bayesian Areal Linguistics
Hal Daumé III
North American Chapter of the Association for Computational Linguistics (NAACL), 2009
[Abstract] [BibTeX]
We describe a statistical model over linguistic areas and phylogeny. Our model recovers known areas and identifies a plausible hierarchy of areal features. The use of areas improves genetic reconstruction of languages both qualitatively and quantitatively according to a variety of metrics. We model linguistic areas by a Pitman-Yor process and linguistic phylogeny by Kingman's coalescent.
@InProceedings{daume09areal,
author = {Hal {Daum\'e III}},
title = {Non-Parametric {B}ayesian Areal Linguistics},
booktitle = {North American Chapter of the Association for Computational
Linguistics (NAACL)},
year = {2009},
address = {Boulder, CO},
url = {http://hal3.name/docs/#daume09areal}
}
Markov Random Topic Fields
Hal Daumé III
Association for Computational Linguistics (ACL), 2009
[Abstract] [BibTeX]
Most approaches to topic modeling assume an independence between documents that is frequently violated. We present a topic model that makes use of one or more user-specified graphs describing relationships between documents. These graphs are encoded in the form of a Markov random field over topics and serve to encourage related documents to have similar topic structures. Experiments show upwards of a 10% improvement in modeling performance.
@InProceedings{daume09mrtf,
author = {Hal {Daum\'e III}},
title = {Markov Random Topic Fields},
booktitle = {Association for Computational Linguistics (ACL)},
year = {2009},
address = {Singapore},
url = {http://hal3.name/docs/#daume09mrtf}
}
Streaming for Large Scale NLP: Language Modeling
Amit Goyal, Hal Daumé III and Suresh Venkatasubramanian
North American Chapter of the Association for Computational Linguistics (NAACL), 2009
[Abstract] [BibTeX]
In this paper, we explore a streaming algorithm paradigm to handle large amounts of data for NLP problems. We present an efficient low-memory method for constructing high-order approximate n-gram frequency counts. The method is based on a deterministic streaming algorithm which efficiently computes approximate frequency counts over a stream of data while employing a small memory footprint. We show that this method easily scales to billion-word monolingual corpora using a conventional (4 GB RAM) desktop machine. Statistical machine translation experimental results corroborate that the resulting high-n approximate small language model is as effective as models obtained from other count pruning methods.
@InProceedings{daume09streaming,
author = {Amit Goyal and Hal {Daum\'e III} and Suresh Venkatasubramanian},
title = {Streaming for Large Scale {NLP}: Language Modeling},
booktitle = {North American Chapter of the Association for Computational
Linguistics (NAACL)},
year = {2009},
address = {Boulder, CO},
url = {http://hal3.name/docs/#daume09streaming}
}
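As an illustration of deterministic streaming counts in bounded memory, here is lossy counting (Manku and Motwani), one standard algorithm of this kind; it is shown over single items for brevity and is not necessarily the exact variant used in the paper:

from math import ceil

def lossy_count(stream, epsilon=1e-3):
    # Approximate frequencies with at most epsilon*N undercount,
    # in O(1/epsilon) space; surviving counts are lower bounds.
    counts, deltas = {}, {}
    width = ceil(1 / epsilon)
    for n, item in enumerate(stream, start=1):
        bucket = ceil(n / width)
        if item in counts:
            counts[item] += 1
        else:
            counts[item], deltas[item] = 1, bucket - 1
        if n % width == 0:                  # bucket boundary: prune rare items
            doomed = [k for k in counts if counts[k] + deltas[k] <= bucket]
            for k in doomed:
                del counts[k], deltas[k]
    return counts

freqs = lossy_count(iter("abracadabra" * 100), epsilon=0.05)
print(sorted(freqs.items()))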
Semi-supervised or Semi-unsupervised?
Hal Daumé III
Unpublished, 2009
[BibTeX]
@Misc{daume09sslnlp,
author = {Hal {Daum\'e III}},
title = {Semi-supervised or Semi-unsupervised?},
howpublished = {Invited paper: NAACL-HLT Workshop on Semi-supervised Learning in
NLP (SSLNLP)},
year = {2009},
address = {Boulder, CO},
url = {http://hal3.name/docs/#daume09sslnlp}
}
Search-based Structured Prediction
Hal Daumé III, John Langford and Daniel Marcu
Machine Learning Journal (MLJ), 2009
[Abstract] [BibTeX]
We present Searn, an algorithm for integrating search and learning to solve complex structured prediction problems such as those that occur in natural language, speech, computational biology, and vision. Searn is a meta-algorithm that transforms these complex problems into simple classification problems to which any binary classifier may be applied. Unlike current algorithms for structured learning that require decomposition of both the loss function and the feature functions over the predicted structure, Searn is able to learn prediction functions for any loss function and any class of features. Moreover, Searn comes with a strong, natural theoretical guarantee: good performance on the derived classification problems implies good performance on the structured prediction problem.
@article{daume09searn,
author = {Hal {Daum\'e III} and John Langford and Daniel Marcu},
title = {Search-based Structured Prediction},
year = {2009},
journal = {Machine Learning Journal (MLJ)},
url = {http://hal3.name/docs/#daume09searn}
}
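A heavily simplified rendering of the Searn training loop for sequence labeling follows. Under Hamming loss the minimum-cost action at any state is the gold tag, so the cost-sensitive step below degenerates to supervised examples collected while rolling in with the current policy mixture; the full algorithm estimates per-action costs by rollouts. Data, features, and hyperparameters are toy assumptions:

import random
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

random.seed(0)

def feats(words, t, prev_tag):
    # state features: current word plus the previously *predicted* tag
    return {"w=" + words[t]: 1.0, "prev=" + prev_tag: 1.0}

def searn_train(data, n_iter=3, beta=0.7):
    vec, clf, p_oracle = DictVectorizer(), None, 1.0
    for _ in range(n_iter):
        X, y = [], []
        for words, gold in data:
            prev = "<s>"
            for t in range(len(words)):
                f = feats(words, t, prev)
                X.append(f)
                y.append(gold[t])            # min-cost action under Hamming loss
                if clf is not None and random.random() > p_oracle:
                    prev = clf.predict(vec.transform([f]))[0]  # learned roll-in
                else:
                    prev = gold[t]                             # oracle roll-in
        clf = LogisticRegression(max_iter=200).fit(vec.fit_transform(X), y)
        p_oracle *= 1 - beta                 # interpolate away from the oracle
    return vec, clf

data = [("the cat sat".split(), ["DT", "NN", "VB"]),
        ("a dog ran".split(), ["DT", "NN", "VB"])]
vec, clf = searn_train(data)
words, prev, out = "the dog sat".split(), "<s>", []
for t in range(len(words)):
    prev = clf.predict(vec.transform([feats(words, t, prev)]))[0]
    out.append(prev)
print(out)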
Unsupervised Part of Speech Tagging Without a Lexicon
Adam R. Teichert and Hal Daumé III
NeurIPS Workshop on Grammar Induction, Representation of Language and Language Learning (GIRLLL), 2009
[BibTeX]
@InProceedings{daume09typpos,
author = {Adam R. Teichert and Hal {Daum\'e III}},
title = {Unsupervised Part of Speech Tagging Without a Lexicon},
booktitle = {NeurIPS Workshop on Grammar Induction, Representation of Language
and Language Learning (GIRLLL)},
year = {2009},
address = {Vancouver, Canada},
url = {http://hal3.name/docs/#daume09typpos}
}
Name Translation in Statistical Machine Translation: Learning When to Transliterate
Ulf Hermjakob, Kevin Knight and Hal Daumé III
Conference of the Association for Computational Linguistics (ACL), 2008
[Abstract] [BibTeX]
We present a method to transliterate names in the framework of end-to-end statistical machine translation. The system is trained to learn when to transliterate. For Arabic to English MT, we developed and trained a transliterator on a bitext of 7 million sentences and Google's English terabyte ngrams and achieved better name translation accuracy than 3 out of 4 professional translators. The paper also includes a discussion of challenges in name translation evaluation.
@InProceedings{daume08transliterate,
author = {Ulf Hermjakob and Kevin Knight and Hal {Daum\'e III}},
title = {Name Translation in Statistical Machine Translation: Learning When to
Transliterate},
booktitle = {Conference of the Association for Computational Linguistics (ACL)},
year = {2008},
address = {Columbus, OH},
url = {http://hal3.name/docs/#daume08transliterate}
}
Cross-Task Knowledge-Constrained Self Training
Hal Daumé III
Empirical Methods in Natural Language Processing (EMNLP), 2008
[Abstract] [BibTeX]
We present an algorithmic framework for learning multiple related tasks. Our framework exploits a form of prior knowledge that relates the output spaces of these tasks. We present PAC learning results that analyze the conditions under which such learning is possible. We present results on learning a shallow parser and named-entity recognition system that exploits our framework, showing consistent improvements over baseline methods.
@InProceedings{daume08hints,
author = {Hal {Daum\'e III}},
title = {Cross-Task Knowledge-Constrained Self Training},
booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
year = {2008},
address = {Honolulu, Hawaii},
url = {http://hal3.name/docs/#daume08hints}
}
Structure Compilation: Trading Structure for Features
Percy Liang, Hal Daumé III and Dan Klein
International Conference on Machine Learning (ICML), 2008
[Abstract] [BibTeX]
Structured models often achieve excellent performance but can be slow at test time. We investigate structure compilation, where we replace structure with features, which are often computationally simpler but unfortunately statistically more complex. We analyze this tradeoff theoretically and empirically on three natural language processing tasks. We also introduce a simple method to transfer predictive power from structure to features via unlabeled data, while incurring a minimal statistical penalty.
@InProceedings{daume08flat,
author = {Percy Liang and Hal {Daum\'e III} and Dan Klein},
title = {Structure Compilation: Trading Structure for Features},
booktitle = {International Conference on Machine Learning (ICML)},
year = {2008},
address = {Helsinki, Finland},
url = {http://hal3.name/docs/#daume08flat}
}
Frustratingly Easy Domain Adaptation
Hal Daumé III
Conference of the Association for Computational Linguistics (ACL), 2007
🏆 Test of Time Award Nomination (2017)
[Abstract] [BibTeX]
We describe an approach to domain adaptation that is appropriate exactly in the case when one has enough "target" data to do slightly better than just using only "source" data. Our approach is incredibly simple, easy to implement as a preprocessing step (10 lines of Perl!) and outperforms state-of-the-art approaches on a range of datasets. The technique comes with several simple theoretical guarantees. Moreover, it is trivially extended to a multi-domain adaptation problem, where one has data from a variety of different domains.
@InProceedings{daume07easyadapt,
author = {Hal {Daum\'e III}},
title = {Frustratingly Easy Domain Adaptation},
booktitle = {Conference of the Association for Computational Linguistics (ACL)},
year = {2007},
address = {Prague, Czech Republic},
url = {http://hal3.name/docs/#daume07easyadapt}
}
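The "10 lines of Perl" is the feature-augmentation trick itself, which is just as short in Python: each feature is duplicated into a shared copy and a domain-specific copy, and an ordinary linear learner then works out, per feature, whether behavior transfers across domains. A minimal rendering of the idea, with illustrative feature names:

def augment(features, domain):
    # EasyAdapt: one shared copy plus one domain-specific copy per feature.
    out = {}
    for name, value in features.items():
        out["shared:" + name] = value
        out[domain + ":" + name] = value
    return out

x = {"word=bank": 1.0, "prev=the": 1.0}
print(augment(x, "source"))
# {'shared:word=bank': 1.0, 'source:word=bank': 1.0,
#  'shared:prev=the': 1.0, 'source:prev=the': 1.0}

Training any standard classifier on the augmented source and target data together is essentially the entire method.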
A Bayesian Model for Discovering Typological Implications
Hal Daumé III and Lyle Campbell
Conference of the Association for Computational Linguistics (ACL), 2007
[Abstract] [BibTeX]
A standard form of analysis for linguistic typology is the universal implication. These implications state facts about the range of extant languages, such as "if objects come after verbs, then adjectives come after nouns." Such implications are typically discovered by painstaking hand analysis over a small sample of languages. We propose a computational model for assisting in this process. Our model is able to discover well-known implications as well as novel implications that deserve further study. Moreover, through a careful application of hierarchical analysis, we are able to cope with the well-known sampling problem: languages are not independent.
@InProceedings{daume07implication,
author = {Hal {Daum\'e III} and Lyle Campbell},
title = {A {B}ayesian Model for Discovering Typological Implications},
booktitle = {Conference of the Association for Computational Linguistics (ACL)},
year = {2007},
address = {Prague, Czech Republic},
url = {http://hal3.name/docs/#daume07implication}
}
Practical Structured Learning Techniques for Natural Language Processing
Hal Daumé III
Ph.D. Thesis, 2006
[BibTeX]
@PhdThesis{daume06thesis,
author = {Hal {Daum\'e III}},
title = {Practical Structured Learning Techniques for Natural Language
Processing},
school = {University of Southern California},
year = {2006},
address = {Los Angeles, CA},
month = {August},
url = {http://hal3.name/docs/#daume06thesis}
}
Bayesian Query-Focused Summarization
Hal Daumé III and Daniel Marcu
Conference of the Association for Computational Linguistics (ACL), 2006
[Abstract] [BibTeX]
We present BayeSum (for "Bayesian summarization"), a model for sentence extraction in query-focused summarization. BayeSum leverages the common case in which multiple documents are relevant to a single query. Using these documents as reinforcement for query terms, BayeSum is not afflicted by the paucity of information in short queries. We show that approximate inference in BayeSum is possible on large data sets and results in a state-of-the-art summarization system. Furthermore, we show how BayeSum can be understood as a justified query expansion technique in the language modeling for IR framework.
@InProceedings{daume06bqfs,
author = {Hal {Daum\'e III} and Daniel Marcu},
title = {Bayesian Query-Focused Summarization},
booktitle = {Proceedings of the Conference of the Association for Computational
Linguistics (ACL)},
year = {2006},
address = {Sydney, Australia},
url = {http://hal3.name/docs/#daume06bqfs}
}
Domain Adaptation for Statistical Classifiers
Hal Daumé III and Daniel Marcu
Journal of Artificial Intelligence Research (JAIR), 2006
[Abstract] [BibTeX]
The most basic assumption used in statistical learning theory is that training data and test data are drawn from the same underlying distribution. Unfortunately, in many applications, the "in-domain" test data is drawn from a distribution that is related, but not identical, to the "out-of-domain" distribution of the training data. We consider the common case in which labeled out-of-domain data is plentiful, but labeled in-domain data is scarce. We introduce a statistical formulation of this problem in terms of a simple mixture model and present an instantiation of this framework to maximum entropy classifiers and their linear chain counterparts. We present efficient inference algorithms for this special case based on the technique of conditional expectation maximization. Our experimental results show that our approach leads to improved performance on three real world tasks on four different data sets from the natural language processing domain.
@article{daume06megam,
author = {Hal {Daum\'e III} and Daniel Marcu},
title = {Domain Adaptation for Statistical Classifiers},
journal = {Journal of Artificial Intelligence Research (JAIR)},
year = {2006},
volume = {26},
pages = {101--126},
url = {http://hal3.name/docs/#daume06megam}
}
Searn in Practice
Hal Daumé III, John Langford and Daniel Marcu
Unpublished, 2006
[Abstract] [BibTeX]
We recently introduced an algorithm, Searn, for solving hard structured prediction problems. This algorithm enjoys many nice properties: efficiency, wide applicability, theoretical justification and simplicity. However, because the original paper packed in a great deal of information, it may not be clear just how simple the technique is. This report is designed to showcase how Searn can be applied to a wide variety of problems and what really goes on behind the scenes. We make use of three example problems, ranging from simple to complex: (1) sequence labeling, (2) parsing and (3) machine translation. (These were chosen to be as widely understandable, especially in the NLP community, as possible.) In the end, we come back to discuss Searn for general problems.
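In the same spirit as the report, here is a heavily simplified Searn loop for sequence labeling (every helper is a hypothetical stand-in; the report gives the real details): generate states by running the current policy, cost each action by completing the output and measuring the loss, train a cost-sensitive classifier, and mix it into the policy.

import random

def interpolate(old_policy, new_policy, beta=0.3):
    # Stochastic mixture: follow the new classifier with probability beta.
    return lambda s: new_policy(s) if random.random() < beta else old_policy(s)

def searn(examples, start, train_cost_sensitive, completion_loss, iters=5):
    # `start(x)` builds an initial search state; `completion_loss(state,
    # policy, y)` finishes the output with `policy` and scores it against y.
    policy = lambda state: state.oracle_action()  # begin with the reference
    for _ in range(iters):
        cs_data = []
        for x, y in examples:
            state = start(x)
            while not state.done():
                costs = {a: completion_loss(state.apply(a), policy, y)
                         for a in state.actions()}
                cs_data.append((state.features(), costs))
                state = state.apply(policy(state))
        policy = interpolate(policy, train_cost_sensitive(cs_data))
    return policy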
@unpublished{daume06searn-practice,
author = {Hal {Daum\'e III} and John Langford and Daniel Marcu},
title = {Searn in Practice},
year = {2006},
url = {http://hal3.name/docs/#daume06searn-practice}
}
Induction of Word and Phrase Alignments for Automatic Document Summarization
Hal Daumé III and Daniel Marcu
Computational Linguistics (CL), 2005
[Abstract] [BibTeX]
Current research in automatic single document summarization is dominated by two effective, yet naïve approaches: summarization by sentence extraction, and headline generation via bag-of-words models. While successful in some tasks, neither of these models is able to adequately capture the large set of linguistic devices utilized by humans when they produce summaries. One possible explanation for the widespread use of these models is that good techniques have been developed to extract appropriate training data for them from existing document/abstract and document/headline corpora. We believe that future progress in automatic summarization will be driven both by the development of more sophisticated, linguistically informed models, as well as a more effective leveraging of document/abstract corpora. In order to open the doors to simultaneously achieving both of these goals, we have developed techniques for automatically producing word-to-word and phrase-to-phrase alignments between documents and their human-written abstracts. These alignments make explicit the correspondences that exist in such document/abstract pairs, and create a potentially rich data source from which complex summarization algorithms may learn. This paper describes experiments we have carried out to analyze the ability of humans to perform such alignments, and based on these analyses, we describe experiments for creating them automatically. Our model for the alignment task is based on an extension of the standard hidden Markov model, and learns to create alignments in a completely unsupervised fashion. We describe our model in detail and present experimental results that show that our model is able to learn to reliably identify word- and phrase-level alignments in a corpus of document/abstract pairs.
@Article{daume05alignments,
author = {Hal {Daum\'e III} and Daniel Marcu},
title = {Induction of Word and Phrase Alignments for Automatic Document
Summarization},
journal = {Computational Linguistics (CL)},
year = {2005},
month = {December},
volume = {31},
number = {4},
pages = {505--530},
url = {http://hal3.name/docs/#daume05alignments}
}
A Large-Scale Exploration of Effective Global Features for a Joint Entity Detection and Tracking Model
Hal Daumé III and Daniel Marcu
Joint Conference on Human Language Technology and Empirical Methods in Natural Language Processing (HLT/EMNLP), 2005
[Abstract] [BibTeX]
Entity detection and tracking (EDT) is the task of identifying textual mentions of real-world entities in documents, extending the named entity detection and coreference resolution task by considering mentions other than names (pronouns, definite descriptions, etc.). Like NE tagging and coreference resolution, most solutions to the EDT task separate out the mention detection aspect from the coreference aspect. By doing so, these solutions are limited to using only local features for learning. In contrast, by modeling both aspects of the EDT task simultaneously, we are able to learn using highly complex, non-local features. We develop a new joint EDT model and explore the utility of many features, demonstrating their effectiveness on this task.
@InProceedings{daume05coref,
author = {Hal {Daum\'e III} and Daniel Marcu},
title = {A Large-Scale Exploration of Effective Global Features for a Joint
Entity Detection and Tracking Model},
booktitle = {Joint Conference on Human Language Technology and Empirical Methods
in Natural Language Processing (HLT/EMNLP)},
year = {2005},
address = {Vancouver, Canada},
url = {http://hal3.name/docs/#daume05coref}
}
Bayesian Summarization at DUC and a Suggestion for Extrinsic Evaluation
Hal Daumé III and Daniel Marcu
Document Understanding Conference (DUC), 2005
[Abstract] [BibTeX]
We describe our entry into the Document Understanding Conference competition for evaluating query-focused multi-document summarization systems. Our system is based on a Bayesian Query-Focused Summarization model, similar to the system we entered into the MSE competition. This paper begins by describing the (few) differences between our DUC system and our MSE system and describes our placement in the competition. The remainder of this paper argues in favor of performing extrinsic evaluation of summarization systems, and suggests a method for doing so.
@InProceedings{daume05duc,
author = {Hal {Daum\'e III} and Daniel Marcu},
title = {Bayesian Summarization at DUC and a Suggestion for Extrinsic
Evaluation},
booktitle = {Proceedings of the Document Understanding Conference (DUC)},
year = {2005},
address = {Vancouver, B.C., Canada},
month = {October 9--10},
url = {http://hal3.name/docs/#daume05duc}
}
Bayesian Multi-Document Summarization at MSE
Hal Daumé III and Daniel Marcu
Workshop on Multilingual Summarization Evaluation (MSE), 2005
[Abstract] [BibTeX]
We describe our entry into the Multilingual Summarization Evaluation (MSE) competition for evaluating generic multi-document summarization systems, where documents are drawn both from English data and English translations of Arabic data. Our system is based on a Bayesian Query-Focused Summarization model, adapted to the generic, multi-document setting and tuned against the ROUGE evaluation metric. In the human pyramid-based evaluation, our system scored an average of 0.530, approximately 8% better than the next best system, which scored 0.489. In the automatic evaluation, our system scored 0.157 (behind four other sites) with the skip-bigram evaluation, and 0.131 (behind two other sites) with the standard bigram evaluation.
@InProceedings{daume05mse,
author = {Hal {Daum\'e III} and Daniel Marcu},
title = {Bayesian Multi-Document Summarization at MSE},
booktitle = {Proceedings of the Workshop on Multilingual Summarization Evaluation
(MSE)},
year = {2005},
address = {Ann Arbor, MI},
month = {June 29},
url = {http://hal3.name/docs/#daume05mse}
}
NP Bracketing by Maximum Entropy Tagging and SVM Reranking
Hal Daumé III and Daniel Marcu
Empirical Methods in Natural Language Processing, 2004
[Abstract] [BibTeX]
We perform Noun Phrase Bracketing by using a local, maximum entropy-based tagging model, which produces bracketing hypotheses. These hypotheses are subsequently fed into a reranking framework based on support vector machines. We solve the problem of hierarchical structure in our tagging model by modeling underspecified tags, which are fully determined only at decoding time. The tagging model performs comparably to competing approaches and the subsequent reranking increases our system's performance from an f-score of 81.7 to 86.1, surpassing the best reported result to date of 83.8.
@InProceedings{daume04bracketing,
author = {Hal {Daum\'e III} and Daniel Marcu},
title = {NP Bracketing by Maximum Entropy Tagging and {SVM} Reranking},
booktitle = {Empirical Methods in Natural Language Processing},
year = {2004},
address = {Barcelona, Spain},
url = {http://hal3.name/docs/#daume04bracketing}
}
A Tree-Position Kernel for Document Compression
Hal Daumé III and Daniel Marcu
Fourth Document Understanding Conference (DUC), 2004
[Abstract] [BibTeX]
We describe our entry into the DUC 2004 automatic document summarization competition. We competed only in the single document, headline generation task. Our system is based on a novel kernel dubbed the tree-position kernel, combined with two other well-known kernels. Our system performs well on white-box evaluations, but does very poorly in the overall DUC evaluation. However, the latter result is offset by the fact that baseline systems consistently outperform well-engineered systems.
@InProceedings{daume04treeposition,
author = {Hal {Daum\'e III} and Daniel Marcu},
title = {A Tree-Position Kernel for Document Compression},
booktitle = {Proceedings of the Fourth Document Understanding Conference (DUC)},
year = {2004},
address = {Boston, MA},
month = {May 6 -- 7},
url = {http://hal3.name/docs/#daume04treeposition}
}
A Phrase-Based HMM Approach to Document/Abstract Alignment
Hal Daumé III and Daniel Marcu
Empirical Methods in Natural Language Processing (EMNLP), 2004
[Abstract] [BibTeX]
We describe a model for creating word-to-word and phrase-to-phrase alignments between documents and their human-written abstracts. Such alignments are critical for the development of statistical summarization systems that can be trained on large corpora of document/abstract pairs. Our model, which is based on a novel Phrase-Based HMM, outperforms both the Cut & Paste alignment model (Jing, 2002) and models developed in the context of machine translation (Brown et al., 1993).
@InProceedings{daume04pbhmm,
author = {Hal {Daum\'e III} and Daniel Marcu},
title = {A Phrase-Based {HMM} Approach to Document/Abstract Alignment},
booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
year = {2004},
address = {Barcelona, Spain},
url = {http://hal3.name/docs/#daume04pbhmm}
}
Web Search Intent Induction via Automatic Query Reformulation
Hal Daumé III and Eric Brill
North American Chapter of the Association for Computational Linguistics (NAACL), 2004
[Abstract] [BibTeX]
We present a computationally efficient method for automatic grouping of web search results based on reformulating the original query to alternative queries the user may have intended. The method requires no data other than query logs and the standard inverted indices used by most search engines. Our method outperforms standard web search in the task of enabling users to quickly find relevant documents for informational queries.
@InProceedings{daume04intents,
author = {Hal {Daum\'e III} and Eric Brill},
title = {Web Search Intent Induction via Automatic Query Reformulation},
booktitle = {North American Chapter of the Association for Computational
Linguistics (NAACL)},
year = {2004},
address = {Boston, MA},
url = {http://hal3.name/docs/#daume04intents}
}
Generic Sentence Fusion is an Ill-Defined Summarization Task
Hal Daumé III and Daniel Marcu
Text Summarization Branches Out Workshop at ACL (TextSum), 2004
[Abstract] [BibTeX]
We report on a series of human evaluations of the task of sentence fusion. In this task, a human is given two sentences and asked to produce a single coherent sentence that contains only the important information from the original two. Thus, this is a highly constrained summarization task. Our investigations show that even at this restricted level, there is no measurable agreement between humans regarding what information should be considered important. We further investigate the ability of separate evaluators to assess summaries, and find a similarly disturbing lack of agreement.
@InProceedings{daume04fusion,
author = {Hal {Daum\'e III} and Daniel Marcu},
title = {Generic Sentence Fusion is an Ill-Defined Summarization Task},
booktitle = {Proceedings of the Text Summarization Branches Out Workshop at ACL
(TextSum)},
year = {2004},
address = {Barcelona, Spain},
month = {July 25 -- 26},
url = {http://hal3.name/docs/#daume04fusion}
}
A Noisy-Channel Model for Document Compression
Hal Daumé III and Daniel Marcu
40th Annual Meeting of the Association for Computational Linguistics (ACL), 2002
[Abstract] [BibTeX]
We present a document compression system that uses a hierarchical noisy-channel model of text production. Our compression system first automatically derives the syntactic structure of each sentence and the overall discourse structure of the text given as input. The system then uses a statistical hierarchical model of text production in order to drop non-important syntactic and discourse constituents so as to generate coherent, grammatical document compressions of arbitrary length. The system outperforms both a baseline and a sentence-based compression system that operates by simplifying sequentially all sentences in a text. Our results support the claim that discourse knowledge plays an important role in document summarization.
@InProceedings{daume02noisy,
author = {Hal {Daum\'e III} and Daniel Marcu},
title = {A Noisy-Channel Model for Document Compression},
booktitle = {Proceedings of the 40th Annual Meeting of the Association for
Computational Linguistics (ACL)},
year = {2002},
month = {July 6 -- 12},
address = {Philadelphia, PA},
pages = {449--456},
url = {http://hal3.name/docs/#daume02noisy}
}
GLEANS: A Generator of Logical Extracts and Abstracts for Nice Summaries
Hal Daumé III, Abdessamad Echihabi, Daniel Marcu, Dragos Stefan Munteanu and Radu Soricut
Second Document Understanding Conference (DUC), 2002
[Abstract] [BibTeX]
We briefly describe GLEANS, a summarization system that uses four novel techniques for summarizing document collections. (i) GLEANS first maps all documents in a collection into a canonical, database-like representation that makes explicit the main entities and relations in a document collection. (ii) GLEANS also classifies each document collection into one of four categories: collections about a single person, single events, multiple events, and natural disasters. (iii) For each type of document collection, GLEANS also generates from scratch, using predefined templates, the first two sentences in the abstract. (iv) The rest of the summary is then generated by extracting from the database sentences that conform to a set of predefined schemas and by presenting them in an order that reflects coherence constraints specific to each collection category.
@InProceedings{daume02gleans,
author = {Hal {Daum\'e III} and Abdessamad Echihabi and Daniel Marcu and Dragos
Stefan Munteanu and Radu Soricut},
title = {{GLEANS}: A Generator of Logical Extracts and Abstracts for Nice
Summaries},
booktitle = {Proceedings of the Second Document Understanding Conference (DUC)},
year = {2002},
address = {Philadelphia, PA},
month = {July 11 -- 12},
pages = {9--14},
url = {http://hal3.name/docs/#daume-gleans}
}
The Importance of Lexicalized Syntax Models for Natural Language Generation Tasks
Hal Daumé III, Kevin Knight, Irene Langkilde-Geary, Daniel Marcu and Kenji Yamada
2002 International Conference on Natural Language Generation (INLG), 2002
[Abstract] [BibTeX]
The parsing community has long recognized the importance of lexicalized models of syntax. By contrast, these models do not appear to have had an impact on the statistical NLG community. To demonstrate their importance to NLG, we show that a lexicalized model of syntax improves the performance of a statistical text compression system, and present results suggesting it would also improve the performance of an MT application and a pure natural language generation system.
@InProceedings{daume02lexicalized,
author = {Hal {Daum\'e III} and Kevin Knight and Irene {Langkilde-Geary} and
Daniel Marcu and Kenji Yamada},
title = {The Importance of Lexicalized Syntax Models for Natural Language
Generation Tasks},
booktitle = {Proceedings of the 2002 International Conference on Natural Language
Generation (INLG)},
year = {2002},
address = {Harriman, NY},
month = {July 1 -- 3},
pages = {9--16},
url = {http://hal3.name/docs/#daume-lexicalized}
}
A Phrase-Based HMM
Hal Daumé III
Unpublished, 2002
[BibTeX]
@Unpublished{daume02pbhmm,
author = {Hal {Daum\'e III}},
title = {A Phrase-Based {HMM}},
note = {Available at \url{http://www.isi.edu/~hdaume/docs/daume02pbhmm.ps}},
month = {December},
year = {2002}
}
Integrated Information Management: An Interactive, Extensible Architecture for Information Retrieval
Eric Nyberg and Hal Daumé III
2001 Human Language Technology Conference (HLT), 2001
[Abstract] [BibTeX]
Most current IR research is focused on specific technologies, such as filtering, classification, entity extraction, question answering, etc. There is relatively little research on merging multiple technologies into sophisticated applications, due in part to the high cost of integrating independently-developed text processing modules. In this paper, we present the Integrated Information Management (IIM) architecture for component-based development of IR applications. The IIM architecture is general enough to model different types of IR tasks, beyond indexing and retrieval.
@InProceedings{daume01iim,
author = {Eric Nyberg and Hal {Daum\'e III}},
title = {Integrated Information Management: An Interactive, Extensible
Architecture for Information Retrieval},
booktitle = {Proceedings of the 2001 Human Language Technology Conference (HLT)},
year = {2001},
address = {San Diego, CA},
month = {March 18 -- 21},
url = {http://hal3.name/docs/#daume-iim}
}
Machine Learning
Progressively Efficient Learning
Ruijie Zheng, Khanh Nguyen, Hal Daumé III, Furong Huang and Karthik Narasimhan
Preprint, 2023
[Abstract] [BibTeX]
Assistant AI agents should be capable of rapidly acquiring novel skills and adapting to new user preferences. Traditional frameworks like imitation learning and reinforcement learning do not facilitate this capability because they support only low-level, inefficient forms of communication. In contrast, humans communicate with progressive efficiency by defining and sharing abstract intentions. To reproduce a similar capability in AI agents, we develop a novel learning framework named Communication-Efficient Interactive Learning (CEIL). By equipping a learning agent with an abstract, dynamic language and an intrinsic motivation to learn with minimal communication effort, CEIL leads to the emergence of a human-like pattern in which the learner and the teacher communicate with progressively greater efficiency by exchanging increasingly more abstract intentions. CEIL demonstrates impressive performance and communication efficiency on a 2D MineCraft domain featuring long-horizon decision-making tasks. Agents trained with CEIL quickly master new tasks, outperforming non-hierarchical and hierarchical imitation learning by up to 50% and 20% in absolute success rate, respectively, given the same number of interactions with the teacher. Notably, the framework performs robustly with teachers modeled after human pragmatic communication behavior.
@inproceedings{daume23ceil,
title = {Progressively Efficient Learning},
author = {Ruijie Zheng and Khanh Nguyen and Daum\'e, III, Hal and Furong Huang
and Karthik Narasimhan},
booktitle = {Preprint},
year = {2023},
url = {http://hal3.name/docs/#daume23ceil},
}
Promoting Fairness in Learned Models by Learning to Active Learn under Parity Constraints
Amr Sharaf and Hal Daumé III
FAccT, 2022
[Abstract] [BibTeX]
Machine learning models can have consequential effects when used to automate decisions, and disparities between groups of people in the error rates of those decisions can lead to harms suffered more by some groups than others. Past algorithmic approaches aim to enforce parity across groups given a fixed set of training data; instead, we ask: what if we can gather more data to mitigate disparities? We develop a meta-learning algorithm for parity-constrained active learning that learns a policy to decide which labels to query so as to maximize accuracy subject to parity constraints. To optimize the active learning policy, our proposed algorithm formulates the parity-constrained active learning task as a bi-level optimization problem. The inner level corresponds to training a classifier on a subset of labeled examples. The outer level corresponds to updating the selection policy choosing this subset to achieve a desired fairness and accuracy behavior on the trained classifier. To solve this constrained bi-level optimization problem, we employ the Forward-Backward Splitting optimization method. Empirically, across several parity metrics and classification tasks, our approach outperforms alternatives by a large margin.
@inproceedings{daume22panda,
title = {Promoting Fairness in Learned Models by Learning to Active Learn under
Parity Constraints},
author = {Amr Sharaf and Daum\'e, III, Hal},
booktitle = {FAccT},
year = {2022},
url = {http://hal3.name/docs/#daume22panda},
}
A framework for learning to request rich and contextually useful information from humans
Khanh Nguyen, Yonatan Bisk and Hal Daumé III
International Conference on Machine Learning (ICML), 2022
[Abstract] [BibTeX]
When deployed, AI agents will encounter problems that are beyond their autonomous problem-solving capabilities. Leveraging human assistance can help agents overcome their inherent limitations and robustly cope with unfamiliar situations. We present a general interactive framework that enables an agent to request and interpret rich, contextually useful information from an assistant that has knowledge about the task and the environment. We demonstrate the practicality of our framework on a simulated human-assisted navigation problem. Aided with an assistance-requesting policy learned by our method, a navigation agent achieves up to a 7× improvement in success rate on tasks that take place in previously unseen environments, compared to fully autonomous behavior. We show that the agent can take advantage of different types of information depending on the context, and analyze the benefits and challenges of learning the assistance-requesting policy when the assistant can recursively decompose tasks into subtasks.
@inproceedings{daume22request,
title = {A framework for learning to request rich and contextually useful
information from humans},
author = {Khanh Nguyen and Yonatan Bisk and Daum\'e, III, Hal},
booktitle = {Proceedings of the International Conference on Machine Learning
(ICML)},
year = {2022},
url = {http://hal3.name/docs/#daume22request},
}
Learning When and What to Ask: a Hierarchical Reinforcement Learning Framework
Khanh Nguyen, Yonatan Bisk and Hal Daumé III
International Conference on Machine Learning (ICML), 2022
[Abstract] [BibTeX]
Reliable AI agents should be mindful of the limits of their knowledge and consult humans when sensing that they do not have sufficient knowledge to make sound decisions. We formulate a hierarchical reinforcement learning framework for learning to decide when to request additional information from humans and what type of information would be helpful to request. Our framework extends partially-observed Markov decision processes (POMDPs) by allowing an agent to interact with an assistant to leverage their knowledge in accomplishing tasks. Results on a simulated human-assisted navigation problem demonstrate the effectiveness of our framework: aided with an interaction policy learned by our method, a navigation policy achieves up to a 7x improvement in task success rate compared to performing tasks only by itself. The interaction policy is also efficient: on average, only a quarter of all actions taken during a task execution are requests for information. We analyze benefits and challenges of learning with a hierarchical policy structure and suggest directions for future work.
@inproceedings{daume22ask,
title = {Learning When and What to Ask: a Hierarchical Reinforcement Learning
Framework},
author = {Khanh Nguyen and Yonatan Bisk and Daum\'e, III, Hal},
booktitle = {Proceedings of the International Conference on Machine Learning
(ICML)},
year = {2022},
url = {http://hal3.name/docs/#daume22ask},
}
A Novice-Reviewer Experiment to Address Scarcity of Qualified Reviewers in Large Conferences
Ivan Stelmakh, Nihar B. Shah, Aarti Singh and Hal Daumé III
AAAI, 2021
[Abstract] [BibTeX]
Conference peer review constitutes a human-computation process whose importance cannot be overstated: not only does it identify the best submissions for acceptance, but, ultimately, it impacts the future of the whole research area by promoting some ideas and restraining others. A surge in the number of submissions received by leading AI conferences has challenged the sustainability of the review process by increasing the burden on the pool of qualified reviewers, which is growing at a much slower rate. In this work, we consider the problem of reviewer recruiting with a focus on the scarcity of qualified reviewers in large conferences. Specifically, we design a procedure for (i) recruiting reviewers from the population not typically covered by major conferences and (ii) guiding them through the reviewing pipeline. In conjunction with ICML 2020 — a large, top-tier machine learning conference — we recruit a small set of reviewers through our procedure and compare their performance with the general population of ICML reviewers. Our experiment reveals that a combination of the recruiting and guiding mechanisms allows for a principled enhancement of the reviewer pool and results in reviews of superior quality compared to the conventional pool of reviews as evaluated by senior members of the program committee (meta-reviewers).
@inproceedings{daume21novice,
title = {A Novice-Reviewer Experiment to Address Scarcity of Qualified Reviewers
in Large Conferences},
author = {Ivan Stelmakh and Nihar B. Shah and Aarti Singh and Daum\'e, III, Hal},
booktitle = {AAAI},
year = {2021},
url = {http://hal3.name/docs/#daume21novice},
}
Prior and Prejudice: The Novice Reviewers' Bias against Resubmissions in Conference Peer Review
Ivan Stelmakh, Nihar B. Shah, Aarti Singh and Hal Daumé III
CSCW, 2021
[Abstract] [BibTeX]
Modern machine learning and computer science conferences are experiencing a surge in the number of submissions that challenges the quality of peer review as the number of competent reviewers is growing at a much slower rate. To curb this trend and reduce the burden on reviewers, several conferences have started encouraging or even requiring authors to declare the previous submission history of their papers. Such initiatives have been met with skepticism among authors, who raise the concern about a potential bias in reviewers’ recommendations induced by this information. In this work, we investigate whether reviewers exhibit a bias caused by the knowledge that the submission under review was previously rejected at a similar venue, focusing on a population of novice reviewers who constitute a large fraction of the reviewer pool in leading machine learning and computer science conferences. We design and conduct a randomized controlled trial closely replicating the relevant components of the peer-review pipeline with 133 reviewers (master’s, junior PhD students, and recent graduates of top US universities) writing reviews for 19 papers. The analysis reveals that reviewers indeed become negatively biased when they receive a signal about a paper being a resubmission, giving almost 1 point lower overall score on a 10-point Likert item (∆ = −0.78, 95% CI = [−1.30, −0.24]) than reviewers who do not receive such a signal. Looking at specific criteria scores (originality, quality, clarity and significance), we observe that novice reviewers tend to underrate quality the most.
@inproceedings{daume21resubmit,
title = {Prior and Prejudice: The Novice Reviewers' Bias against Resubmissions in
Conference Peer Review},
author = {Ivan Stelmakh and Nihar B. Shah and Aarti Singh and Daum\'e, III, Hal},
booktitle = {CSCW},
year = {2021},
url = {http://hal3.name/docs/#daume21resubmit},
}
Supporting human flourishing by ensuring human involvement in AI-infused systems
Joel Chan, Hal Daumé III, John P. Dickerson, Hernisa Kacorri and Ben Shneiderman
HCAI Workshop at NeurIPS 2021, 2021
[Abstract] [BibTeX]
Researchers, developers, business leaders, policy makers and others are expanding the technology-centered scope of Artificial Intelligence (AI) to include Human-Centered AI (HCAI) ways of thinking. This expansion from an algorithm-focused view to a human-centered perspective can shape the future of technology so as to better serve human needs. Educators, designers, software engineers, product managers, evaluators, and government agency staffers can build on AI-infused technologies to design products and services that make life better for the users. By switching the scope from technology-centered to human-centered, we can build AI-infused tools that enable people to better care for each other, build sustainable communities, and restore the environment.
@inproceedings{daume21flourishing,
title = {Supporting human flourishing by ensuring human involvement in AI-infused
systems},
author = {Joel Chan and Daum\'e, III, Hal and John P. Dickerson and Hernisa
Kacorri and Ben Shneiderman},
booktitle = {HCAI Workshop at NeurIPS 2021},
year = {2021},
url = {http://hal3.name/docs/#daume21flourishing},
}
Responsible Computing During COVID-19 and Beyond
Solon Barocas, Asia J. Biega, Margarita Boyarskaya, Kate Crawford, Hal Daumé III, Miroslav Dudík, Benjamin Fish, Mary L. Gray, Brent Hecht, Alexandra Olteanu, Forough Poursabzi-Sangdeh, Luke Stark, Jennifer Wortman Vaughan, Hanna Wallach and Marion Zepf
CACM, 2021
[Abstract] [BibTeX]
The COVID-19 pandemic has both created and exacerbated a series of cascading and interrelated crises whose impacts continue to reverberate. From the immediate effects on people's health to the pressures on healthcare systems and mass unemployment, millions of people are suffering. For many of us who work in the digital technology industry, our first impulse when faced with crises such as these may be to devise technological solutions to what we perceive as the most urgent problems. Although the desire to put our expertise to good use is laudable, technological solutions that fail to consider broader social, political, and economic contexts can have unintended consequences, undermining their efficacy and even harming the very communities that they are intended to help. To ensure our contributions achieve their intended results without causing inadvertent harm, we must think carefully about which projects we work on, how we should go about working on them, and with whom such work should be done. In this column, we offer a series of guidelines for navigating these choices. As current and former members of the Fairness, Accountability, Transparency, and Ethics (FATE) group at Microsoft Research, we have been working actively on the ethical and societal impacts of technologies such as artificial intelligence since 2016. While we originally developed these guidelines to help our colleagues at Microsoft respond to the first wave of the pandemic in the spring of 2020, we believe they are general enough that their value extends beyond Microsoft and beyond projects focused on the COVID-19 pandemic.
@inproceedings{daume21covid,
title = {Responsible Computing During COVID-19 and Beyond},
author = {Solon Barocas and Asia J. Biega and Margarita Boyarskaya and Kate
Crawford and Hal {Daum\'e III} and Miroslav Dud\'ik and Benjamin
Fish and Mary L. Gray and Brent Hecht and Alexandra Olteanu and
Forough Poursabzi-Sangdeh and Luke Stark and Jennifer Wortman
Vaughan and Hanna Wallach and Marion Zepf},
booktitle = {CACM},
year = {2021},
url = {http://hal3.name/docs/#daume21covid},
}
Help, Anna! Visual Navigation with Natural Multimodal Assistance via Retrospective Curiosity-Encouraging Imitation Learning
Khanh Nguyen and Hal Daumé III
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
[Abstract] [BibTeX] [Code/Data]
Mobile agents that can leverage help from humans can potentially accomplish more complex tasks than they could entirely on their own. We develop "Help, Anna!" (HANNA), an interactive photo-realistic simulator in which an agent fulfills object-finding tasks by requesting and interpreting natural language-and-vision assistance. An agent solving tasks in a HANNA environment can leverage simulated human assistants, called ANNA (Automatic Natural Navigation Assistants), which, upon request, provide natural language and visual instructions to direct the agent towards the goals. To address the HANNA problem, we develop a memory-augmented neural agent that hierarchically models multiple levels of decision-making, and an imitation learning algorithm that teaches the agent to avoid repeating past mistakes while simultaneously predicting its own chances of making future progress. Empirically, our approach is able to ask for help more effectively than competitive baselines and, thus, attains higher task success rate on both previously seen and previously unseen environments. We publicly release code and data (see the Code/Data link).
@inproceedings{daume19hanna,
title = {Help, Anna! Visual Navigation with Natural Multimodal Assistance via
Retrospective Curiosity-Encouraging Imitation Learning},
author = {Khanh Nguyen and Daum\'e, III, Hal},
booktitle = {Proceedings of the Conference on Empirical Methods in Natural
Language Processing (EMNLP)},
year = {2019},
url = {http://hal3.name/docs/#daume19hanna},
link = {https://github.com/khanhptnk/hanna},
}
Meta-Learning for Contextual Bandit Exploration
Amr Sharaf and Hal Daumé III
arXiv, 2019
[Abstract] [BibTeX] [Code/Data]
We describe MELEE, a meta-learning algorithm for learning a good exploration policy in the interactive contextual bandit setting. Here, an algorithm must take actions based on contexts, and learn based only on a reward signal from the action taken, thereby generating an exploration/exploitation trade-off. MELEE addresses this trade-off by learning a good exploration strategy for offline tasks based on synthetic data, on which it can simulate the contextual bandit setting. Based on these simulations, MELEE uses an imitation learning strategy to learn a good exploration policy that can then be applied to true contextual bandit tasks at test time. We compare MELEE to seven strong baseline contextual bandit algorithms on a set of three hundred real-world datasets, on which it outperforms alternatives in most settings, especially when differences in rewards are large. Finally, we demonstrate the importance of having a rich feature representation for learning how to explore.
@inproceedings{daume19melee,
title = {Meta-Learning for Contextual Bandit Exploration},
author = {Amr Sharaf and Hal {Daum\'e III}},
booktitle = {arxiv},
year = {2019},
link =
{https://www.dropbox.com/sh/dc3v8po5cbu8zaw/AACu1f_4c4wIZxD1e7W0KVZ0a?dl=0},
url = {http://hal3.name/docs/#daume19melee},
}
Reinforcement Learning with Convex Constraints
Sobhan Miryoosefi, Kianté Brantley, Hal Daumé III, Miroslav Dudík and Robert Schapire
NeurIPS, 2019
[Abstract] [BibTeX]
In standard reinforcement learning (RL), a learning agent seeks to optimize the overall reward. However, many key aspects of a desired behavior are more naturally expressed as constraints. For instance, the designer may want to limit the use of unsafe actions, increase the diversity of trajectories to enable exploration, or approximate expert trajectories when rewards are sparse. In this paper, we propose an algorithmic scheme that can handle a wide class of constraints in RL tasks, specifically, any constraints that require expected values of some vector measurements (such as the use of an action) to lie in a convex set. This captures previously studied constraints (such as safety and proximity to an expert), but also enables new classes of constraints (such as diversity). Our approach comes with rigorous theoretical guarantees and only relies on the ability to approximately solve standard RL tasks. As a result, it can be easily adapted to work with any model-free or model-based RL algorithm. In our experiments, we show that it matches previous algorithms that enforce safety via constraints, but can also enforce new properties that these algorithms cannot incorporate, such as diversity.
@inproceedings{daume19convexrl,
title = {Reinforcement Learning with Convex Constraints},
author = {Sobhan Miryoosefi and Kiant\'e Brantley and Hal {Daum\'e III} and
Miroslav Dud\'ik and Robert Schapire},
booktitle = {NeurIPS},
year = {2019},
url = {http://hal3.name/docs/#daume19convexrl},
}
Warm-starting Contextual Bandits: Robustly Combining Supervised and Bandit Feedback
Chicheng Zhang, Alekh Agarwal, Hal Daumé III, John Langford and Sahand N Negahban
ICML, 2019
[Abstract] [BibTeX]
We investigate the feasibility of learning from both fully-labeled supervised data and contextual bandit data. We specifically consider settings in which the underlying learning signal may be different between these two data sources. Theoretically, we state and prove no-regret algorithms for learning that is robust to divergences between the two sources. Empirically, we evaluate some of these algorithms on a large selection of datasets, showing that our approaches are feasible, and helpful in practice.
@inproceedings{daume19awesome,
title = {Warm-starting Contextual Bandits: Robustly Combining Supervised and
Bandit Feedback},
author = {Chicheng Zhang and Alekh Agarwal and Hal {Daum\'e III} and John
Langford and Sahand N Negahban},
booktitle = {ICML},
year = {2019},
url = {http://hal3.name/docs/#daume19awesome},
}
Non-Monotonic Sequential Text Generation
Sean Welleck, Kianté Brantley, Hal Daumé III and Kyunghyun Cho
ICML, 2019
[Abstract] [BibTeX]
Standard sequential generation methods assume a pre-specified generation order, such as text generation methods which generate words from left to right. In this work, we propose a framework for training models of text generation that operate in non-monotonic orders; the model directly learns good orders, without any additional annotation. Our framework operates by generating a word at an arbitrary position, and then recursively generating words to its left and then words to its right, yielding a binary tree. Learning is framed as imitation learning, including a coaching method which moves from imitating an oracle to reinforcing the policy's own preferences. Experimental results demonstrate that using the proposed method, it is possible to learn policies which generate text without pre-specifying a generation order, while achieving competitive performance with conventional left-to-right generation.
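The tree-to-text correspondence is easy to make concrete: if generation produces a word and then recursively fills in the material to its left and right, the final sentence is the in-order traversal of the resulting binary tree. A small illustration (hypothetical node structure, not the paper's code):

class Node:
    def __init__(self, word, left=None, right=None):
        self.word, self.left, self.right = word, left, right

def in_order(node):
    # Flatten a non-monotonic generation tree into the final sentence.
    if node is None:
        return []
    return in_order(node.left) + [node.word] + in_order(node.right)

# Generating "are" first, then "there" to its left, then "four" and
# "directions" to its right, yields the sentence in left-to-right order:
tree = Node("are", Node("there"), Node("four", None, Node("directions")))
print(" ".join(in_order(tree)))   # there are four directions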
@inproceedings{daume19nonmon,
title = {Non-Monotonic Sequential Text Generation},
author = {Sean Welleck and Kiant\'e Brantley and Hal {Daum\'e III} and Kyunghyun
Cho},
booktitle = {ICML},
year = {2019},
url = {http://hal3.name/docs/#daume19nonmon},
}
Improving fairness in machine learning systems: What do industry practitioners need?
Kenneth Holstein, Jennifer Wortman Vaughan, Hal Daumé III, Miroslav Dudík and Hanna Wallach
CHI, 2019
[Abstract] [BibTeX]
The potential for machine learning (ML) systems to amplify social inequities and unfairness is receiving increasing popular and academic attention. A surge of recent work has focused on the development of algorithmic tools to assess and mitigate such unfairness. If these tools are to have a positive impact on industry practice, however, it is crucial that their design be informed by an understanding of real-world needs. Through 35 semi-structured interviews and an anonymous survey of 267 ML practitioners, we conduct the first systematic investigation of commercial product teams' challenges and needs for support in developing fairer ML systems. We identify areas of alignment and disconnect between the challenges faced by industry practitioners and solutions proposed in the fair ML research literature. Based on these findings, we highlight directions for future ML and HCI research that will better address industry practitioners' needs.
@inproceedings{daume19fairness,
title = {Improving fairness in machine learning systems: What do industry
practitioners need?},
author = {Kenneth Holstein and Jennifer Wortman Vaughan and Hal {Daum\'e III} and
Miroslav Dud\'ik and Hanna Wallach},
booktitle = {CHI},
year = {2019},
url = {http://hal3.name/docs/#daume19fairness},
}
NeurIPS 2018 Demographics and Inclusion Survey: Summary of Responses
Hal Daumé III and Katherine Heller
NeurIPS (not a normal paper), 2018
[Abstract] [BibTeX] [Code/Data]
We report the results of a survey conducted from August–October 2018 on demographics & inclusion in the NeurIPS community. At the time of analysis, 2375 people had participated; the range of responses is vast. Here, we attempt to capture the key themes, with pointers to where more information can be found. Such a summary runs the risk of ignoring concerns of some members; we encourage all interested to read the full report. The below concerns are listed arbitrarily; there is no implied priority. At the NeurIPS 2018 conference, during the lunch period on Tuesday, there will be a moderated and guided townhall; one goal is to develop action items to improve the level of respect and inclusion at the conference. Thank you to all participants.
@inproceedings{daume18neuripsdi,
title = {NeurIPS 2018 Demographics and Inclusion Survey: Summary of Responses},
author = {Daum\'e, III, Hal and Katherine Heller},
booktitle = {NeurIPS (not a normal paper)},
year = {2018},
link = {https://github.com/hal3/neurips2018survey},
url = {http://hal3.name/docs/#daume18neuripsdi},
}
When Does Machine Learning FAIL? Generalized Transferability for Evasion and Poisoning Attacks
Octavian Suciu, Radu Mărginean, Yiĝitcan Kaya, Hal Daumé III and Tudor Dumitraş
USENIX, 2018
[Abstract] [BibTeX]
Recent results suggest that attacks against supervised machine learning systems are quite effective, while defenses are easily bypassed by new attacks. However, the specifications for machine learning systems currently lack precise adversary definitions, and the existing attacks make diverse, potentially unrealistic assumptions about the strength of the adversary who launches them. We propose the FAIL attacker model, which describes the adversary's knowledge and control along four dimensions. Our model allows us to consider a wide range of weaker adversaries who have limited control and incomplete knowledge of the features, learning algorithms and training instances utilized. To evaluate the utility of the FAIL model, we consider the problem of conducting targeted poisoning attacks in a realistic setting: the crafted poison samples must have clean labels, must be individually and collectively inconspicuous, and must exhibit a generalized form of transferability, defined by the FAIL model. By taking these constraints into account, we design StingRay, a targeted poisoning attack that is practical against 4 machine learning applications, which use 3 different learning algorithms, and can bypass 2 existing defenses. Conversely, we show that a prior evasion attack is less effective under generalized transferability. Such attack evaluations, under the FAIL adversary model, may also suggest promising directions for future defenses.
@inproceedings{daume18poisoning,
title = {When Does Machine Learning {FAIL}? Generalized Transferability for
Evasion and Poisoning Attacks},
author = {Octavian Suciu and Radu M\u{a}rginean and Yi\u{g}itcan Kaya and Hal
{Daum\'e III} and Tudor Dumitra\c{s}},
booktitle = {USENIX},
year = {2018},
url = {http://hal3.name/docs/#daume18poisoning},
}
Active Learning for Cost-Sensitive Classification
Akshay Krishnamurthy, Alekh Agarwal, Tzu-Kuo Huang, Hal Daumé III and John Langford
International Conference on Machine Learning (ICML), 2017
[Abstract] [BibTeX]
We design an active learning algorithm for cost-sensitive multiclass classification: problems where different errors have different costs. Our algorithm, COAL, makes predictions by regressing on each label’s cost and predicting the smallest. On a new example, it uses a set of regressors that perform well on past data to estimate possible costs for each label. It queries only the labels that could be the best, ignoring the sure losers. We prove COAL can be efficiently implemented for any regression family that admits squared loss optimization; it also enjoys strong guarantees with respect to predictive performance and labeling effort. We empirically compare COAL to passive learning, showing significant improvements in labeling effort and test cost.
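The "query only the labels that could be the best" rule is easy to sketch (hypothetical regressor ensemble interface; not the authors' implementation): a label is a sure loser when even its most optimistic cost estimate cannot beat some other label's most pessimistic one.

def labels_worth_querying(regressors, x, labels):
    # Cost range per label over an ensemble of well-performing regressors.
    lo = {y: min(r.predict(x, y) for r in regressors) for y in labels}
    hi = {y: max(r.predict(x, y) for r in regressors) for y in labels}
    best_pessimistic = min(hi.values())
    # Keep only labels whose optimistic cost could still be the smallest.
    return [y for y in labels if lo[y] <= best_pessimistic]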
@inproceedings{daume17coal,
title = {Active Learning for Cost-Sensitive Classification},
author = {Akshay Krishnamurthy and Alekh Agarwal and Tzu-Kuo Huang and Hal
{Daum\'e III} and John Langford},
booktitle = {Proceedings of the International Conference on Machine Learning
(ICML)},
year = {2017},
url = {http://hal3.name/docs/#daume17coal},
}
Logarithmic time one-against-some
Hal Daumé III, Nikos Karampatziakis, John Langford and Paul Mineiro
International Conference on Machine Learning (ICML), 2017
[Abstract] [BibTeX]
We create a new online reduction of multiclass classification to binary classification for which training and prediction time scale logarithmically with the number of classes. We show that several simple techniques give rise to an algorithm that can compete with one-against-all in both space and predictive power while offering exponential improvements in speed when the number of classes is large.
@inproceedings{daume17oas,
title = {Logarithmic time one-against-some},
author = {Hal {Daum\'e III} and Nikos Karampatziakis and John Langford and Paul
Mineiro},
booktitle = {Proceedings of the International Conference on Machine Learning
(ICML)},
year = {2017},
url = {http://hal3.name/docs/#daume17oas},
}
A Credit Assignment Compiler for Joint Prediction
Kai-Wei Chang, He He, Stéphane Ross, Hal Daumé III and John Langford
Advances in Neural Information Processing Systems (NeurIPS), 2016
[Abstract] [BibTeX]
Many machine learning applications involve jointly predicting multiple mutually dependent output variables. Learning to search is a family of methods where the complex decision problem is cast into a sequence of decisions via a search space. Although these methods have shown promise both in theory and in practice, implementing them has been burdensomely awkward. In this paper, we show the search space can be defined by an arbitrary imperative program, turning learning to search into a credit assignment compiler. Altogether with the algorithmic improvements for the compiler, we radically reduce the complexity of programming and the running time. We demonstrate the feasibility of our approach on multiple joint prediction tasks. In all cases, we obtain accuracies as high as alternative approaches, at drastically reduced execution and programming time.
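The "search space as imperative program" idea can be made concrete with a sketch (the `predict` callback below is a hypothetical stand-in modeled loosely on the kind of library interface the paper describes, not its actual API):

def label_sequence(sentence, gold_tags, predict):
    # This ordinary loop *defines* the search space: each call to
    # `predict` is one decision point. At training time the compiler
    # turns the oracle into per-decision costs; at test time it simply
    # returns the learned prediction.
    tags = []
    for i, word in enumerate(sentence):
        feats = {"word": word, "prev": tags[-1] if tags else "<s>"}
        tags.append(predict(feats, oracle=gold_tags[i]))
    return tags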
@inproceedings{daume16compiler,
title = {A Credit Assignment Compiler for Joint Prediction},
author = {Kai-Wei Chang and He He and St\'ephane Ross and Hal {Daum\'e III} and
John Langford},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
year = {2016},
url = {http://hal3.name/docs/#daume16compiler},
}
Opponent Modeling in Deep Reinforcement Learning
He He, Jordan Boyd-Graber, Kevin Kwok and Hal Daumé III
International Conference on Machine Learning (ICML), 2016
[Abstract] [BibTeX]
Opponent modeling is necessary in multi-agent settings where secondary agents with competing goals also adapt their strategies, yet it remains challenging because strategies interact with each other and change. Most previous work focuses on developing probabilistic models or parameterized strategies for specific applications. Inspired by the recent success of deep reinforcement learning, we present neural-based models that jointly learn a policy and the behavior of opponents. Instead of explicitly predicting the opponent’s action, we encode observation of the opponents into a deep Q-Network (DQN); however, we retain explicit modeling (if desired) using multitasking. By using a Mixture-of-Experts architecture, our model automatically discovers different strategy patterns of opponents without extra supervision. We evaluate our models on a simulated soccer game and a popular trivia game, showing superior performance over DQN and its variants.
@inproceedings{daume16opponent,
title = {Opponent Modeling in Deep Reinforcement Learning},
author = {He He and Jordan Boyd-Graber and Kevin Kwok and Hal {Daum\'e III}},
booktitle = {Proceedings of the International Conference on Machine Learning
(ICML)},
year = {2016},
url = {http://hal3.name/docs/#daume16opponent},
}
Learning Reductions that Really Work
Alina Beygelzimer, Hal Daumé III, John Langford and Paul Mineiro
IEEE Proceedings, 2015
[Abstract] [BibTeX]
We provide a summary of the mathematical and computational techniques that have enabled learning reductions to effectively address a wide class of problems, and show that this approach to solving machine learning problems can be broadly useful.
@inproceedings{daume15reductions,
title = {Learning Reductions that Really Work},
author = {Alina Beygelzimer and Hal {Daum\'e III} and John Langford and Paul
Mineiro},
booktitle = {IEEE Proceedings},
year = {2015},
url = {http://hal3.name/docs/#daume15reductions},
}
On Correcting inputs: Inverse Optimization for Online Structured Prediction
Hal Daumé III, Samir Khuller, Manish Purohit and Gregory Sanders
FSTTCS, 2015
[Abstract] [BibTeX]
Algorithm designers typically assume that the input data is correct, and then proceed to find “optimal” or “sub-optimal” solutions using this input data. However this assumption of correct data does not always hold in practice, especially in the context of online learning systems where the objective is to learn appropriate feature weights given some training samples. Such scenarios necessitate the study of inverse optimization problems where one is given an input instance as well as a desired output and the task is to adjust the input data so that the given output is indeed optimal. Motivated by learning structured prediction models, in this paper we consider inverse optimization with a margin, i.e., we require the given output to be better than all other feasible outputs by a desired margin. We consider such inverse optimization problems for maximum weight matroid basis, matroid intersection, perfect matchings, minimum cost maximum flows, and shortest paths and derive the first known results for such problems with a non-zero margin. The effectiveness of these algorithmic approaches to online learning for structured prediction is also discussed.
@inproceedings{daume15inverse,
title = {On Correcting inputs: Inverse Optimization for Online Structured
Prediction},
author = {Hal {Daum\'e III} and Samir Khuller and Manish Purohit and Gregory
Sanders},
booktitle = {FSTTCS},
year = {2015},
url = {http://hal3.name/docs/#daume15inverse},
}
Deep unordered composition rivals syntactic methods for text classification
Mohit Iyyer, Varun Manjunatha, Jordan Boyd-Graber and Hal Daumé III
Conference of the Association for Computational Linguistics (ACL), 2015
[Abstract] [BibTeX]
Many existing deep learning models for natural language processing tasks focus on learning the compositionality of their inputs, which requires many expensive computations. We present a simple deep neural network that competes with and, in some cases, outperforms such models on sentiment analysis and factoid question answering tasks while taking only a fraction of the training time. While our model is syntactically-ignorant, we show significant improvements over previous bag-of-words models by deepening our network and applying a novel variant of dropout. Moreover, our model performs better than syntactic models on datasets with high syntactic variance. We show that our model makes similar errors to syntactically-aware models, indicating that for the tasks we consider, nonlinearly transforming the input is more important than tailoring a network to incorporate word order and syntax.
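A minimal NumPy sketch of a deep averaging network (randomly initialized and untrained; the dimensions, class count, and word-dropout rate are illustrative choices, not the paper's configuration):

import numpy as np

rng = np.random.default_rng(0)
emb = rng.normal(size=(5000, 300))             # toy embedding table
W1 = rng.normal(size=(300, 300))
W2 = rng.normal(size=(300, 2))                 # e.g., binary sentiment

def dan_forward(word_ids, p_drop=0.3, train=True):
    # Deep averaging network: average embeddings, then feed-forward layers.
    ids = np.asarray(word_ids)
    if train:                                  # word dropout: drop whole tokens
        keep = rng.random(len(ids)) > p_drop
        ids = ids[keep] if keep.any() else ids
    h = emb[ids].mean(axis=0)                  # order-insensitive composition
    h = np.tanh(h @ W1)                        # nonlinear hidden layer
    return h @ W2                              # class scores

print(dan_forward([10, 42, 7]))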
@inproceedings{daume15dan,
title = {Deep unordered composition rivals syntactic methods for text
classification},
author = {Mohit Iyyer and Varun Manjunatha and Jordan Boyd-Graber and Hal
{Daum\'e III}},
booktitle = {Proceedings of the Conference of the Association for Computational
Linguistics (ACL)},
year = {2015},
url = {http://hal3.name/docs/#daume15dan},
}
Learning to search better than your teacher
Kai-Wei Chang, Akshay Krishnamurthy, Alekh Agarwal, Hal Daumé III and John Langford
International Conference on Machine Learning (ICML), 2015
[Abstract] [BibTeX]
Methods for learning to search for structured prediction typically imitate a reference policy, with existing theoretical guarantees demonstrating low regret compared to that reference. This is unsatisfactory in many applications where the reference policy is suboptimal and the goal of learning is to improve upon it. Can learning to search work even when the reference is poor? We provide a new learning to search algorithm, LOLS, which does well relative to the reference policy, but additionally guarantees low regret compared to deviations from the learned policy: a local-optimality guarantee. Consequently, LOLS can improve upon the reference policy, unlike previous algorithms. This enables us to develop structured contextual bandits, a partial information structured prediction setting with many potential applications.
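The roll-in/roll-out structure of LOLS can be sketched schematically. In the skeleton below, env, the two policies, and the cost-sensitive learner are hypothetical stand-ins for the paper's formal objects; the essential point is that roll-in follows the learned policy while roll-outs mix the reference and learned policies, which is what yields guarantees relative to both.

import random

def lols_episode(env, learned, reference, learner, beta=0.5):
    # Roll in with the learned policy, recording the visited states.
    state, rollin = env.reset(), []
    while not env.done(state):
        rollin.append(state)
        state = env.step(state, learned(state))
    # At each roll-in state, try every one-step deviation...
    for dev_state in rollin:
        costs = {}
        # ...and complete the trajectory with a mixture policy.
        rollout = reference if random.random() < beta else learned
        for a in env.actions(dev_state):
            s = env.step(dev_state, a)
            while not env.done(s):
                s = env.step(s, rollout(s))
            costs[a] = env.loss(s)
        learner.update(dev_state, costs)  # cost-sensitive classification step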
@inproceedings{daume15lols,
title = {Learning to search better than your teacher},
author = {Kai-Wei Chang and Akshay Krishnamurthy and Alekh Agarwal and Hal
{Daum\'e III} and John Langford},
booktitle = {Proceedings of the International Conference on Machine Learning
(ICML)},
year = {2015},
url = {http://hal3.name/docs/#daume15lols},
}
Learning to search in branch and bound algorithms
He He, Hal Daumé III and Jason M. Eisner
NeurIPS, 2014
[Abstract] [BibTeX]
Branch-and-bound is a widely used method in combinatorial optimization, including mixed integer programming, structured prediction and MAP inference. While most work has been focused on developing problem-specific techniques, little is known about how to systematically design the node searching strategy on a branch-and-bound tree. We address the key challenge of learning an adaptive node searching order for any class of problem solvable by branch-and-bound. Our strategies are learned by imitation learning. We apply our algorithm to linear programming based branch-and-bound for solving mixed integer programs (MIP). We compare our method with one of the fastest open-source solvers, SCIP; and a very efficient commercial solver, Gurobi. We demonstrate that our approach achieves better solutions faster on four MIP libraries.
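A best-first branch-and-bound loop makes clear where a learned ranking function slots in. The sketch below is a generic skeleton, not the paper's system: branch, bound, is_feasible, and the learned score are assumed interfaces, and the learned scorer simply replaces a hand-designed node-selection heuristic.

import heapq

def branch_and_bound(root, branch, bound, is_feasible, score):
    # Minimization: keep the best feasible solution found so far.
    best, best_val = None, float("inf")
    tie = 0                                 # tiebreaker so nodes never compare
    frontier = [(-score(root), tie, root)]  # highest learned score pops first
    while frontier:
        _, _, node = heapq.heappop(frontier)
        if bound(node) >= best_val:         # prune: cannot beat the incumbent
            continue
        if is_feasible(node):               # leaf: bound equals its true value
            best, best_val = node, bound(node)
            continue
        for child in branch(node):
            tie += 1
            heapq.heappush(frontier, (-score(child), tie, child))
    return best, best_val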
@inproceedings{daume14ltsbb,
title = {Learning to search in branch and bound algorithms},
author = {He He and Hal {Daum\'e III} and Jason M. Eisner},
booktitle = {NeurIPS},
year = {2014},
url = {http://hal3.name/docs/#daume14ltsbb},
}
Efficient programmable learning to search
Kai-Wei Chang, Hal Daumé III, John Langford and Stéphane Ross
NeurIPS, 2014
[Abstract] [BibTeX]
We improve "learning to search" approaches to structured prediction in two ways. First, we show that the search space can be defined by an arbitrary imperative program, reducing the number of lines of code required to develop new structured prediction tasks by orders of magnitude. Second, we make structured prediction orders of magnitude faster through various algorithmic improvements.
@inproceedings{daume14lts,
title = {Efficient programmable learning to search},
author = {Kai-Wei Chang and Hal {Daum\'e III} and John Langford and St\'ephane
Ross},
booktitle = {NeurIPS},
year = {2014},
url = {http://hal3.name/docs/#daume14lts},
}
Learning Latent Engagement Patterns of Students in Online Courses
Arti Ramesh, Dan Goldwasser, Bert Huang, Hal Daumé III and Lise Getoor
National Conference on Artificial Intelligence (AAAI), 2014
[Abstract] [BibTeX]
Maintaining and cultivating student engagement is critical for learning. Understanding factors affecting student engagement will help in designing better courses and improving student retention. The large number of participants in massive open online courses (MOOCs) and data collected from their interaction with the MOOC open up avenues for studying student engagement at scale. In this work, we develop a framework for modeling and understanding student engagement in online courses based on student behavioral cues. Our first contribution is the abstraction of student engagement using latent representations. We use that abstraction in a probabilistic model to connect student behavior with course completion. We demonstrate that the latent formulation for engagement helps in predicting student survival across three MOOCs. Next, in order to initiate better instructor interventions, we need to be able to predict student survival early in the course. We demonstrate that we can predict student survival early in the course reliably using the latent model. Finally, we perform a closer quantitative analysis of user interaction with the MOOC and identify student activities that are good indicators for survival at different points in the course.
@inproceedings{daume14mooclearner,
title = {Learning Latent Engagement Patterns of Students in Online Courses},
author = {Arti Ramesh and Dan Goldwasser and Bert Huang and Hal {Daum\'e III} and
Lise Getoor},
booktitle = {Proceedings of the National Conference on Artificial Intelligence
(AAAI)},
year = {2014},
url = {http://hal3.name/docs/#daume14mooclearner},
}
Uncovering Hidden Engagement Patterns for Predicting Learner Performance in MOOCs
Arti Ramesh, Dan Goldwasser, Bert Huang, Hal Daumé III and Lise Getoor
Learning at Scale, 2014
[Abstract] [BibTeX]
Maintaining and cultivating student engagement is a prerequisite for MOOCs to have broad educational impact. Understanding student engagement as a course progresses helps characterize student learning patterns and can aid in minimizing dropout rates, initiating instructor intervention. In this paper, we construct a probabilistic model connecting student behavior and class performance, formulating student engagement types as latent variables. We show that our model identifies course success indicators that can be used by instructors to initiate interventions and assist students.
@inproceedings{daume14moocengagement,
title = {Uncovering Hidden Engagement Patterns for Predicting Learner Performance
in MOOCs},
author = {Arti Ramesh and Dan Goldwasser and Bert Huang and Hal {Daum\'e III} and
Lise Getoor},
booktitle = {Learning at Scale},
year = {2014},
url = {http://hal3.name/docs/#daume14moocengagement},
}
A Neural Network for Factoid Question Answering over Paragraphs
Mohit Iyyer, Jordan Boyd-Graber, Leonardo Claudino, Richard Socher and Hal Daumé III
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014
[Abstract] [BibTeX]
Text classification methods for tasks like factoid question answering typically use manually defined string matching rules or bag of words representations. These methods are ineffective when question text contains very few individual words (e.g., named entities) that are indicative of the answer. We introduce a recursive neural network (RNN) model that can reason over such input by modeling textual compositionality. We apply our model, QANTA, to a dataset of questions from a trivia competition called quiz bowl. Unlike previous RNN models, QANTA learns word and phrase-level representations that combine across sentences to reason about entities. The model outperforms multiple baselines and, when combined with information retrieval methods, rivals the best human players.
@inproceedings{daume14deepqa,
title = {A Neural Network for Factoid Question Answering over Paragraphs},
author = {Mohit Iyyer and Jordan Boyd-Graber and Leonardo Claudino and Richard
Socher and Hal {Daum\'e III}},
booktitle = {Proceedings of the Conference on Empirical Methods in Natural
Language Processing (EMNLP)},
year = {2014},
url = {http://hal3.name/docs/#daume14deepqa},
}
Binary to Bushy: Bayesian Hierarchical Clustering with the Beta Coalescent
Yuening Hu, Jordan Boyd-Graber, Hal Daumé III and Z. Irene Ying
Advances in Neural Information Processing Systems (NeurIPS), 2013
[Abstract] [BibTeX]
Discovering hierarchical regularities in data is a key problem in interacting with large datasets, modeling cognition, and encoding knowledge. A previous Bayesian solution—Kingman’s coalescent—provides a probabilistic model for data represented as a binary tree. Unfortunately, this is inappropriate for data better described by bushier trees. We generalize an existing belief propagation framework of Kingman’s coalescent to the beta coalescent, which models a wider range of tree structures. Because of the complex combinatorial search over possible structures, we develop new sampling schemes using sequential Monte Carlo and Dirichlet process mixture models, which render inference efficient and tractable. We present results on synthetic and real data that show the beta coalescent outperforms Kingman’s coalescent and is qualitatively better at capturing data in bushy hierarchies.
@inproceedings{daume13bushy,
title = {Binary to Bushy: Bayesian Hierarchical Clustering with the Beta
Coalescent},
author = {Yuening Hu and Jordan Boyd-Graber and Hal {Daum\'e III} and Z. Irene
Ying},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
year = {2013},
url = {http://hal3.name/docs/#daume13bushy},
}
Discriminatively Enhanced Topic Models
Snigdha Chaturvedi, Hal Daumé III and Taesun Moon
International Conference on Data Mining (ICDM), 2013
[BibTeX]
@inproceedings{daume13detm,
title = {Discriminatively Enhanced Topic Models},
author = {Snigdha Chaturvedi and Hal {Daum\'e III} and Taesun Moon},
booktitle = {International Conference on Data Mining (ICDM)},
year = {2013},
url = {http://hal3.name/docs/#daume13detm},
}
Prioritized Asynchronous Belief Propagation
Jiarong Jiang, Taesun Moon, Hal Daumé III and Jason Eisner
ICML Workshop on Inferning, 2013
[Abstract] [BibTeX]
Message scheduling has been shown to be very effective in belief propagation (BP) algorithms. However, most existing scheduling algorithms use fixed heuristics regardless of the structure of the graphs or properties of the distribution. On the other hand, designing different scheduling heuristics for every graph structure is not feasible. In this paper, we propose a reinforcement learning based message scheduling framework (RLBP) to learn the heuristics automatically, which generalizes to any graph structure and distribution. In the experiments, we show that the learned problem-specific heuristics largely outperform other baselines in speed.
@inproceedings{daume13pabp,
title = {Prioritized Asynchronous Belief Propagation},
author = {Jiarong Jiang and Taesun Moon and Hal {Daum\'e III} and Jason Eisner},
booktitle = {ICML Workshop on Inferning},
year = {2013},
url = {http://hal3.name/docs/#daume13pabp},
}
Predicting Dialogue Outcomes over Structured Latent Representations
Dan Goldwasser and Hal Daumé III
NeurIPS Workshop on Output Representation Learning, 2013
[BibTeX]
@inproceedings{daume13dialogoutcomes,
title = {Predicting Dialogue Outcomes over Structured Latent Representations},
author = {Dan Goldwasser and Hal {Daum\'e III}},
booktitle = {NeurIPS Workshop on Output Representation Learning},
year = {2013},
url = {http://hal3.name/docs/#daume13dialogoutcomes},
}
A Topical Graph Kernel for Link Prediction in Labeled Graphs
Snigdha Chaturvedi, Hal Daumé III, Taesun Moon and Shashank Srivastava
ICML workshop on Mining and Learning with Graphs (MLG), 2013
[Abstract] [BibTeX]
This paper proposes a solution to the problem of link prediction in labeled graphs with additional text information associated with the nodes. By fitting a topic model on the text corpus and applying some further processing, we compute the topics of interest to a node. We propose a walk based graph kernel which incorporates the node’s interest and thus represents structural as well as textual information. We then make predictions about the existence of unseen links using a kernelized SVM. Our experiments with an author citation network show that our method is effective and significantly outperforms a network-oriented approach.
@inproceedings{daume13graphkernel,
title = {A Topical Graph Kernel for Link Prediction in Labeled Graphs},
author = {Snigdha Chaturvedi and Hal {Daum\'e III} and Taesun Moon and Shashank
Srivastava},
booktitle = {ICML workshop on Mining and Learning with Graphs (MLG)},
year = {2013},
url = {http://hal3.name/docs/#daume13graphkernel},
}
Kernel Regression for Head-Related Transfer Function Interpolation and Spectral Extrema Extraction
Yuancheng Luo, Dmitry N. Zotkin, Hal Daumé III and Ramani Duraiswami
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013
[Abstract] [BibTeX]
Head-Related Transfer Function (HRTF) representation and interpolation is an important problem in spatial audio. We present a kernel regression method based on Gaussian process (GP) modeling of the joint spatial-frequency relationship between HRTF measurements and obtain a smooth non-linear representation based on data measured over both arbitrary and structured spherical measurement grids. This representation is further extended to the problem of extracting spectral extrema (notches and peaks). We perform HRTF interpolation and spectral extrema extraction using freely available CIPIC HRTF data. Experimental results are shown.
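The regression machinery itself is standard. Here is a generic Gaussian-process (kernel) regression sketch in the spirit of the paper, with a squared-exponential kernel over (azimuth, elevation, frequency) inputs; the kernel choice, hyperparameters, and toy data are illustrative rather than the paper's setup.

import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel between two sets of input points.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_predict(X_train, y_train, X_test, noise=1e-3):
    # GP posterior mean: k(X*, X) (K + noise I)^{-1} y
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    return rbf_kernel(X_test, X_train) @ np.linalg.solve(K, y_train)

# toy usage: smooth interpolation over (azimuth, elevation, log-frequency)
rng = np.random.default_rng(0)
X = rng.uniform(size=(50, 3))
y = np.sin(X @ np.array([3.0, 1.0, 2.0]))
print(gp_predict(X, y, rng.uniform(size=(5, 3))))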
@inproceedings{daume13hrtf,
title = {Kernel Regression for Head-Related Transfer Function Interpolation and
Spectral Extrema Extraction},
author = {Yuancheng Luo and Dmitry N. Zotkin and Hal {Daum\'e III} and Ramani
Duraiswami},
booktitle = {Proceedings of the IEEE International Conference on Acoustics,
Speech and Signal Processing (ICASSP)},
year = {2013},
url = {http://hal3.name/docs/#daume13hrtf},
}
Predictable Dual-View Hashing
Mohammad Rastegari, Jonghyun Choi, Shobeir Fakhraei, Hal Daumé III and Larry S. Davis
International Conference on Machine Learning (ICML), 2013
[Abstract] [BibTeX]
We propose a Predictable Dual-View Hashing (PDH) algorithm which embeds proximity of data samples in the original spaces. We create a cross-view hamming space with the ability to compare information from previously incomparable domains with a notion of ‘predictability’. By performing comparative experimental analysis on two large datasets, PASCAL-Sentence and SUN-Attribute, we demonstrate the superiority of our method to the state-of-the-art dual-view binary code learning algorithms.
@inproceedings{daume13dvh,
title = {Predictable Dual-View Hashing},
author = {Mohammad Rastegari and Jonghyun Choi and Shobeir Fakhraei and Hal
{Daum\'{e} III} and Larry S. Davis},
booktitle = {Proceedings of the International Conference on Machine Learning
(ICML)},
year = {2013},
url = {http://hal3.name/docs/#daume13dvh},
}
Dynamic Feature Selection for Dependency Parsing
He He, Hal Daumé III and Jason Eisner
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2013
[Abstract] [BibTeX] [Slides] [Video]
Feature computation and exhaustive search have significantly restricted the speed of graph-based dependency parsing. We propose a faster framework of dynamic feature selection, where features are added sequentially as needed, edges are pruned early, and decisions are made online for each sentence. We model this as a sequential decision-making problem and solve it by imitation learning techniques. We test our method on 7 languages. Our dynamic parser can achieve accuracies comparable or even superior to parsers using a full set of features, while computing fewer than 30% of the feature templates.
@inproceedings{daume13depfeat,
title = {Dynamic Feature Selection for Dependency Parsing},
author = {He He and Hal {Daum\'e III} and Jason Eisner},
booktitle = {Proceedings of the Conference on Empirical Methods in Natural
Language Processing (EMNLP)},
year = {2013},
url = {http://hal3.name/docs/#daume13depfeat},
}
Flexible Modeling of Latent Task Structures in Multitask Learning
Alexandre Passos, Piyush Rai, Jacques Wainer and Hal Daumé III
International Conference on Machine Learning (ICML), 2012
[Abstract] [BibTeX]
Multitask learning algorithms are typically designed assuming some fixed, a priori known latent structure shared by all the tasks. However, it is usually unclear what type of latent task structure is the most appropriate for a given multitask learning problem. Ideally, the "right" latent task structure should be learned in a data-driven manner. We present a flexible, nonparametric Bayesian model that posits a mixture of factor analyzers structure on the tasks. The nonparametric aspect makes the model expressive enough to subsume many existing models of latent task structures (e.g., mean-regularized tasks, clustered tasks, low-rank or linear/non-linear subspace assumption on tasks, etc.). Moreover, it can also learn more general task structures, addressing the shortcomings of such models. We present a variational inference algorithm for our model. Experimental results on synthetic and real-world datasets, on both regression and classification problems, demonstrate the effectiveness of the proposed method.
@InProceedings{daume12flexiblemtl,
author = {Alexandre Passos and Piyush Rai and Jacques Wainer and Hal {Daum\'e
III}},
title = {Flexible Modeling of Latent Task Structures in Multitask Learning},
booktitle = {International Conference on Machine Learning (ICML)},
year = {2012},
address = {Edinburgh, Scotland},
url = {http://hal3.name/docs/#daume12flexiblemtl}
}
Protocols for Learning Classifiers on Distributed Data
Hal Daumé III, Jeff Phillips, Avishek Saha and Suresh Venkatasubramanian
Workshop on Artificial Intelligence and Statistics (AI-Stats), 2012
[Abstract] [BibTeX]
We consider the problem of learning classifiers for labeled data that has been distributed across several nodes. Our goal is to find a single classifier, with small approximation error, across all datasets while minimizing the communication between nodes. This setting models real-world communication bottlenecks in the processing of massive distributed datasets. We present several very general sampling-based solutions as well as two-way protocols which have a provable exponential speed-up over any one-way protocol. We focus on core problems for noise-less data distributed across two or more nodes. The techniques we introduce are reminiscent of active learning, but rather than actively probing labels, nodes actively communicate with each other, each node simultaneously learning important data from another node.
@inproceedings{daume12protocols,
title = {Protocols for Learning Classifiers on Distributed Data},
author = {Hal {Daum\'e III} and Jeff Phillips and Avishek Saha and Suresh
Venkatasubramanian},
booktitle = {Proceedings of the Workshop on Artificial Intelligence and
Statistics (AI-Stats)},
year = {2012},
url = {http://hal3.name/docs/#daume12protocols},
}
Simultaneously Leveraging Output and Task Structures for Multiple-Output Regression
Piyush Rai, Abhishek Kumar and Hal Daumé III
Advances in Neural Information Processing Systems (NeurIPS), 2012
[Abstract] [BibTeX]
Multiple-output regression models require estimating multiple parameters, one for each output. Structural regularization is usually employed to improve parameter estimation in such models. In this paper, we present a multiple-output regression model that leverages the covariance structure of the latent model parameters as well as the conditional covariance structure of the observed outputs. This is in contrast with existing methods that usually take into account only one of these structures. More importantly, unlike some of the other existing methods, none of these structures need be known a priori in our model, and are learned from the data. Several previously proposed structural regularization based multiple-output regression models turn out to be special cases of our model. Moreover, in addition to being a rich model for multiple-output regression, our model can also be used in estimating the graphical model structure of a set of variables (multivariate outputs) conditioned on another set of variables (inputs). Experimental results on both synthetic and real datasets demonstrate the effectiveness of our method.
@inproceedings{daume12tasks,
title = {Simultaneously Leveraging Output and Task Structures for Multiple-Output
Regression},
author = {Piyush Rai and Abhishek Kumar and Hal {Daum\'e III}},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
year = {2012},
url = {http://hal3.name/docs/#daume12tasks},
}
Efficient Protocols for Distributed Classification and Optimization
Hal Daumé III, Jeff M. Phillips, Avishek Saha and Suresh Venkatasubramanian
ALT, 2012
[Abstract] [BibTeX]
In distributed learning, the goal is to perform a learning task over data distributed across multiple nodes with minimal (expensive) communication. Prior work (Daume III et al., 2012) proposes a general model that bounds the communication required for learning classifiers while allowing for ε training error on linearly separable data adversarially distributed across nodes. In this work, we develop key improvements and extensions to this basic model. Our first result is a two-party multiplicative-weight-update based protocol that uses O(d² log(1/ε)) words of communication to classify distributed data in arbitrary dimension d, ε-optimally. This readily extends to classification over k nodes with O(kd² log(1/ε)) words of communication. Our proposed protocol is simple to implement and is considerably more efficient than the baselines we compare against, as demonstrated by our empirical results. In addition, we illustrate general algorithm design paradigms for doing efficient learning over distributed data. We show how to solve fixed-dimensional and high-dimensional linear programming efficiently in a distributed setting where constraints may be distributed across nodes. Since many learning problems can be viewed as convex optimization problems where constraints are generated by individual points, this models many typical distributed learning scenarios. Our techniques make use of a novel connection from multipass streaming, as well as adapting the multiplicative-weight-update framework more generally to a distributed setting. As a consequence, our methods extend to the wide range of problems solvable using these techniques.
@inproceedings{daume12distributed,
title = {Efficient Protocols for Distributed Classification and Optimization},
author = {Hal {Daum\'e III} and Jeff M. Phillips and Avishek Saha and Suresh
Venkatasubramanian},
booktitle = {ALT},
year = {2012},
url = {http://hal3.name/docs/#daume12distributed},
}
Cost-sensitive Dynamic Feature Selection
He He, Hal Daumé III and Jason Eisner
ICML 2012 Workshop on Interactions between Inference and Learning (Inferning), 2012
[Abstract] [BibTeX]
We present an instance-specific, test-time dynamic feature selection algorithm, which sequentially chooses features given the values of already selected features and stops to make a prediction according to a user-specified speed-accuracy trade-off. We apply imitation learning techniques to address the problem of learning and inference jointly in a simple multiclass classification setting. Our feature selection method treats the given solver (e.g. a classifier trained with a full set of features) as a black box and does not place any constraint on it. Experimental results show that using a dynamic instance-specific feature set can significantly improve accuracy at a low cost.
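To make the sequential setting concrete, here is a minimal sketch in which a hand-coded margin-based stopping rule stands in for the imitation-learned policy; the feature groups, weights, and threshold are all illustrative.

import numpy as np

def dynamic_predict(x_groups, weight_groups, margin_threshold=1.0):
    # Acquire feature groups one at a time, cheapest first; stop as soon as
    # the score margin between the top two classes clears the threshold.
    scores = np.zeros(weight_groups[0].shape[1])
    used = 0
    for xg, Wg in zip(x_groups, weight_groups):
        scores = scores + xg @ Wg
        used += 1
        top2 = np.sort(scores)[-2:]
        if top2[1] - top2[0] >= margin_threshold:  # confident enough: predict
            break
    return int(np.argmax(scores)), used

# toy usage: 3 feature groups of width 4, 5 classes
rng = np.random.default_rng(1)
x_groups = [rng.normal(size=4) for _ in range(3)]
weight_groups = [rng.normal(size=(4, 5)) for _ in range(3)]
print(dynamic_predict(x_groups, weight_groups))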
@inproceedings{daume12dynafeat,
title = {Cost-sensitive Dynamic Feature Selection},
author = {He He and Hal {Daum\'e III} and Jason Eisner},
booktitle = {ICML 2012 Workshop on Interactions between Inference and Learning
(Inferning)},
year = {2012},
address = {Edinburgh, Scotland},
url = {http://hal3.name/docs/#daume12dynafeat}
}
A Binary Classification Framework for Two-Stage Multiple Kernel Learning
Abhishek Kumar, Alexandru Niculescu-Mizil, Koray Kavukcuoglu and Hal Daumé III
International Conference on Machine Learning (ICML), 2012
[Abstract] [BibTeX]
With the advent of kernel methods, automating the task of specifying a suitable kernel has become increasingly important. In this context, the Multiple Kernel Learning (MKL) problem of finding a combination of prespecified base kernels that is suitable for the task at hand has received significant attention from researchers. In this paper we show that Multiple Kernel Learning can be framed as a standard binary classification problem with additional constraints that ensure the positive definiteness of the learned kernel. Framing MKL in this way has the distinct advantage that it makes it easy to leverage the extensive research in binary classification to develop better performing and more scalable MKL algorithms that are conceptually simpler, and, arguably, more accessible to practitioners. Experiments on nine data sets from different domains show that, despite its simplicity, the proposed technique compares favorably with current leading MKL approaches.
@InProceedings{daume12binarymkl,
author = {Abhishek Kumar and Alexandru Niculescu-Mizil and Koray Kavukcuoglu and
Hal {Daum\'e III}},
title = {A Binary Classification Framework for Two-Stage Multiple Kernel
Learning},
booktitle = {International Conference on Machine Learning (ICML)},
year = {2012},
url = {http://hal3.name/docs/#daume12binarymkl}
}
Learning Task Grouping and Overlap in Multi-task Learning
Abhishek Kumar and Hal Daumé III
International Conference on Machine Learning (ICML), 2012
[Abstract] [BibTeX]
In the paradigm of multi-task learning, multiple related prediction tasks are learned jointly, sharing information across the tasks. We propose a framework for multi-task learning that enables one to selectively share the information across the tasks. We assume that each task parameter vector is a linear combination of a finite number of underlying basis tasks. The coefficients of the linear combination are sparse in nature and the overlap in the sparsity patterns of two tasks controls the amount of sharing between them. Our model is based on the assumption that task parameters within a group lie in a low dimensional subspace but allows the tasks in different groups to overlap with each other in one or more bases. Experimental results on four datasets show that our approach outperforms competing methods.
@InProceedings{daume12gomtl,
author = {Abhishek Kumar and Hal {Daum\'e III}},
title = {Learning Task Grouping and Overlap in Multi-task Learning},
booktitle = {International Conference on Machine Learning (ICML)},
year = {2012},
url = {http://hal3.name/docs/#daume12gomtl}
}
Low-dimensional Discriminative Reranking
Jagadeesh Jagarlamudi and Hal Daumé III
Conference on North American Chapter of the Association for Computational Linguistics, 2012
[Abstract] [BibTeX]
The accuracy of many natural language processing tasks can be improved by a reranking step, which involves selecting a single output from a list of candidate outputs generated by a baseline system. We propose a novel family of reranking algorithms based on learning separate low-dimensional embeddings of the task’s input and output spaces. This embedding is learned in such a way that prediction becomes a low-dimensional nearest-neighbor search, which can be done computationally efficiently. A key quality of our approach is that feature engineering can be done separately on the input and output spaces; the relationship between inputs and outputs is learned automatically. Experiments on a part-of-speech tagging task in four languages show significant improvements over a baseline decoder and existing reranking approaches.
@inproceedings{daume12lowdim,
title = {Low-dimensional Discriminative Reranking},
author = {Jagadeesh Jagarlamudi and Hal {Daum\'e III}},
booktitle = {Proceedings of the Conference on North American Chapter of the
Association for Computational Linguistics},
year = {2012},
address = {Montreal, Canada},
url = {http://hal3.name/docs/#daume12lowdim}
}
Besting the quiz master: crowdsourcing incremental classification games
Jordan Boyd-Graber, Brianna Satinoff, He He and Hal Daumé III
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2012
[Abstract] [BibTeX]
Cost-sensitive classification, where the features used in machine learning tasks have a cost, has been explored as a means of balancing knowledge against the expense of incrementally obtaining new features. We introduce a setting where humans engage in classification with incrementally revealed features: the collegiate trivia circuit. By providing the community with a web-based system to practice, we collected tens of thousands of implicit word-by-word ratings of how useful features are for eliciting correct answers. Observing humans’ classification process, we improve the performance of a state-of-the-art classifier. We also use the dataset to evaluate a system to compete in the incremental classification task through a reduction of reinforcement learning to classification. Our system learns when to answer a question, performing better than baselines and most human players.
@inproceedings{daume12quiz,
title = {Besting the quiz master: crowdsourcing incremental classification
games},
author = {Jordan Boyd-Graber and Brianna Satinoff and He He and Hal {Daum\'e
III}},
booktitle = {Proceedings of the Conference on Empirical Methods in Natural
Language Processing (EMNLP)},
year = {2012},
url = {http://hal3.name/docs/#daume12quiz},
}
Learned Prioritization for Trading Off Accuracy and Speed
Jiarong Jiang, Adam Teichert, Hal Daumé III and Jason Eisner
Advances in Neural Information Processing Systems (NeurIPS), 2012
[Abstract] [BibTeX]
Users want inference to be both fast and accurate, but quality often comes at the cost of speed. The field has experimented with approximate inference algorithms that make different speed-accuracy tradeoffs (for particular problems and datasets). We aim to explore this space automatically, focusing here on the case of agenda-based syntactic parsing [12]. Unfortunately, off-the-shelf reinforcement learning techniques fail to learn good policies: the state space is simply too large to explore naively. An attempt to counteract this by applying imitation learning algorithms also fails: the “teacher” follows a far better policy than anything in our learner’s policy space, free of the speed-accuracy tradeoff that arises when oracle information is unavailable, and thus largely insensitive to the known reward function. We propose a hybrid reinforcement/apprenticeship learning algorithm that learns to speed up an initial policy, trading off accuracy for speed according to various settings of a speed term in the loss function.
@inproceedings{daume12prioritization,
title = {Learned Prioritization for Trading Off Accuracy and Speed},
author = {Jiarong Jiang and Adam Teichert and Hal {Daum\'e III} and Jason
Eisner},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
year = {2012},
url = {http://hal3.name/docs/#daume12prioritization},
}
Using Classifier Cascades for Scalable E-Mail Classification
Jay Pujara, Hal Daumé III and Lise Getoor
CEAS, 2011
🏆 Best Paper Award
[Abstract] [BibTeX]
In many real-world scenarios, we must make judgments in the presence of computational constraints. One common computational constraint arises when the features used to make a judgment each have differing acquisition costs, but there is a fixed total budget for a set of judgments. Particularly when there are a large number of classifications that must be made in real time, an intelligent strategy for optimizing accuracy versus computational costs is essential. E-mail classification is an area where accurate and timely results require such a trade-off. We identify two scenarios where intelligent feature acquisition can improve classifier performance.
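The cascade idea reduces to a few lines. This skeleton is a hypothetical two-stage instance (the models, method names, thresholds, and feature tiers are stand-ins, not the paper's system): a classifier over cheap features handles confidently classified mail, and only ambiguous messages pay for expensive feature acquisition.

def cascade_classify(email, cheap_model, costly_model, lo=0.1, hi=0.9):
    # Stage 1: score with inexpensive features (e.g. headers only).
    p = cheap_model.predict_spam_prob(email)
    if p <= lo:
        return "ham"   # confident: stop here and save the feature cost
    if p >= hi:
        return "spam"
    # Stage 2: only ambiguous messages incur the costly features.
    return "spam" if costly_model.predict_spam_prob(email) >= 0.5 else "ham"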
@inproceedings{daume11cascades,
title = {Using Classifier Cascades for Scalable E-Mail Classification},
author = {Jay Pujara and Hal {Daum\'e III} and Lise Getoor},
booktitle = {CEAS},
year = {2011},
url = {http://hal3.name/docs/#daume11cascades},
}
Multiple Hash Functions for Learning
Amit Goyal, Piyush Rai and Hal Daumé III
NeurIPS Big Learning Workshop, 2011
[Abstract] [BibTeX]
In this paper, we explore the idea of feature-hashing in learning problems. We first evaluate some hashing strategies on the basis of their efficacy on classification problems. We then explore the following trade-off: Given a fixed budget (say K) for the hashed feature vector, should one use a single hash function that gives a hashed vector of size K, or use multiple hash functions to come up with smaller representations (say 3 hash functions, each giving a representation of size K/3)? In particular, for the latter setting, how should the different hashed representations be combined? We propose online learning algorithms for this setting using multiple Perceptrons (one for each hashed representation), and explore a number of Perceptron update and prediction schemes. Experimental results demonstrate that our update schemes give better classification accuracies than the case when a single hashed feature vector is used to train the model.
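A sketch of the budget split described above (the hashing scheme and the score-summing combination are illustrative; the paper explores several update and prediction schemes): with budget K and three hash functions, each representation gets K/3 dimensions and its own perceptron, and prediction combines the per-representation scores.

import hashlib
import numpy as np

def hashed_vector(tokens, dim, seed):
    # Feature hashing: one hash function maps tokens into a dim-sized vector.
    v = np.zeros(dim)
    for t in tokens:
        h = int(hashlib.md5(f"{seed}:{t}".encode()).hexdigest(), 16)
        v[h % dim] += 1.0
    return v

def combined_score(tokens, perceptrons, dim_each):
    # One perceptron per hashed representation; sum their scores.
    return sum(w @ hashed_vector(tokens, dim_each, seed)
               for seed, w in enumerate(perceptrons))

# budget K = 12 split across 3 hash functions of size K/3 = 4 each
K, n_hashes = 12, 3
perceptrons = [np.zeros(K // n_hashes) for _ in range(n_hashes)]
print(combined_score(["free", "money", "now"], perceptrons, K // n_hashes))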
@InProceedings{daume11multihash,
author = {Amit Goyal and Piyush Rai and Hal {Daum\'e III}},
title = {Multiple Hash Functions for Learning},
booktitle = {NeurIPS Big Learning Workshop},
year = {2011},
address = {Sierra Nevada, Spain},
url = {http://hal3.name/docs/#daume11multihash}
}
A Co-training Approach for Multiview Spectral Clustering
Abhishek Kumar and Hal Daumé III
International Conference on Machine Learning (ICML), 2011
[BibTeX]
@InProceedings{daume11cospec,
author = {Abhishek Kumar and Hal {Daum\'e III}},
title = {A Co-training Approach for Multiview Spectral Clustering},
booktitle = {International Conference on Machine Learning (ICML)},
year = {2011},
address = {Bellevue, WA},
url = {http://hal3.name/docs/#daume11cospec}
}
Leveraging Social Bookmarks from Partially Tagged Corpus for Improved Webpage Clustering
Anusua Trivedi, Piyush Rai, Hal Daumé III and Scott L. DuVall
ACM Transactions on Intelligent Systems and Technology, 2011
[BibTeX]
@article{daume11social,
author = {Anusua Trivedi and Piyush Rai and Hal {Daum\'e III} and Scott L.
DuVall},
title = {Leveraging Social Bookmarks from Partially Tagged Corpus for Improved
Webpage Clustering},
journal = {ACM Transactions on Intelligent Systems and Technology},
year = {2011},
url = {http://hal3.name/docs/#daume11social}
}
Message-Passing for Approximate MAP Inference with Latent Variables
Jiarong Jiang, Piyush Rai and Hal Daumé III
Conference on Neural Information Processing Systems (NeurIPS), 2011
[BibTeX]
@InProceedings{daume11mapmarg,
author = {Jiarong Jiang and Piyush Rai and Hal {Daum\'e III}},
title = {Message-Passing for Approximate MAP Inference with Latent Variables},
booktitle = {Proceedings of the Conference on Neural Information Processing
Systems (NeurIPS)},
year = {2011},
address = {Granada, Spain},
url = {http://hal3.name/docs/#daume11mapmarg}
}
Beam Search based MAP Estimates for the Indian Buffet Process
Piyush Rai and Hal Daumé III
International Conference on Machine Learning (ICML), 2011
[BibTeX]
@InProceedings{daume11ibpsearch,
author = {Piyush Rai and Hal {Daum\'e III}},
title = {Beam Search based MAP Estimates for the Indian Buffet Process},
booktitle = {International Conference on Machine Learning (ICML)},
year = {2011},
address = {Bellevue, WA},
url = {http://hal3.name/docs/#daume11ibpsearch}
}
Speed-Accuracy Tradeoffs in Nondeterministic Inference Algorithms
Jason Eisner and Hal Daumé III
COST: NeurIPS 2011 Workshop on Computational Trade-offs in Statistical Learning, 2011
[Abstract] [BibTeX]
Statistical learning has led to great advances in building models that achieve high accuracy. However, test-time inference in these models can be slow, for example in structured prediction problems. This is frequently addressed by using test-time heuristics to guide and prune the search for a good structured output. In this high-level paper, we ask: Could we explicitly train such heuristics to trade off accuracy and efficiency? And how does this relate to existing learning problems?
@InProceedings{daume11tradeoffs,
author = {Jason Eisner and Hal {Daum\'e III}},
title = {Speed-Accuracy Tradeoffs in Nondeterministic Inference Algorithms},
booktitle = {Proceedings of COST: NeurIPS 2011 Workshop on Computational
Trade-offs in Statistical Learning},
year = {2011},
address = {Sierra Nevada, Spain},
url = {http://hal3.name/docs/#daume11tradeoffs}
}
Generative Kernels for Exponential Families
Arvind Agarwal and Hal Daumé III
Conference on Artificial Intelligence and Statistics (AI-Stats), 2011
[Abstract] [BibTeX]
In this paper, we propose a family of kernels for the data distributions belonging to the exponential family. We call these kernels generative kernels because they take into account the generative process of the data. Our proposed method considers the geometry of the data distribution to build a set of efficient closed-form kernels best suited for that distribution. We compare our generative kernels on multinomial data and observe improved empirical performance across the board. Moreover, our generative kernels perform significantly better when the training size is small, an important property of generative models.
@InProceedings{daume11genkern,
author = {Arvind Agarwal and Hal {Daum\'e III}},
title = {Generative Kernels for Exponential Families},
booktitle = {Conference on Artificial Intelligence and Statistics (AI-Stats)},
year = {2011},
address = {Ft. Lauderdale, FL},
url = {http://hal3.name/docs/#daume11genkern}
}
Co-regularized Multi-view Spectral Clustering
Abhishek Kumar, Piyush Rai and Hal Daumé III
Conference on Neural Information Processing Systems (NeurIPS), 2011
[BibTeX]
@InProceedings{daume11spectral,
author = {Abhishek Kumar and Piyush Rai and Hal {Daum\'e III}},
title = {Co-regularized Multi-view Spectral Clustering},
booktitle = {Proceedings of the Conference on Neural Information Processing
Systems (NeurIPS)},
year = {2011},
address = {Granada, Spain},
url = {http://hal3.name/docs/#daume11spectral}
}
Online Learning of Multiple Tasks and Their Relationships
Avishek Saha, Piyush Rai, Hal Daumé III and Suresh Venkatasubramanian
Conference on Artificial Intelligence and Statistics (AI-Stats), 2011
[Abstract] [BibTeX]
We propose an Online MultiTask Learning (OMTL) framework which simultaneously learns the task weight vectors as well as the task relatedness adaptively from the data. Our work is in contrast with prior work on online multitask learning which assumes fixed task relatedness a priori. Furthermore, whereas prior work in such settings assumes only positively correlated tasks, our framework can capture negative correlations as well. Our proposed framework learns the task relationship matrix by framing the objective function as a Bregman divergence minimization problem for positive definite matrices. Subsequently, we exploit this adaptively learned task-relationship matrix to select the most informative samples in an online multitask active learning setting. Experimental results on a number of real-world datasets and comparisons with numerous baselines establish the efficacy of our proposed approach.
@InProceedings{daume11olmt,
author = {Avishek Saha and Piyush Rai and Hal {Daum\'e III} and Suresh
Venkatasubramanian},
title = {Online Learning of Multiple Tasks and Their Relationships},
booktitle = {Conference on Artificial Intelligence and Statistics (AI-Stats)},
year = {2011},
address = {Ft. Lauderdale, FL},
url = {http://hal3.name/docs/#daume11olmt}
}
Co-regularized Spectral Clustering with Multiple Kernels
Abhishek Kumar, Piyush Rai and Hal Daumé III
NeurIPS Workshop on New Directions in Multiple Kernel Learning, 2010
[Abstract] [BibTeX]
We propose a co-regularization based multiview spectral clustering algorithm which enforces the clusterings across multiple views to agree with each other. Since each view can be used to define a similarity graph over the data, our algorithm can also be considered as learning with multiple similarity graphs, or equivalently with multiple kernels. We propose an objective function that implicitly combines two (or more) kernels, and leads to an improved clustering performance. Experimental comparisons with a number of baselines on several datasets establish the efficacy of our proposed approach.
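A compact sketch of the co-regularization loop (the update rule, the lam weight, and the final concatenate-then-cluster step are illustrative simplifications of the paper's objective): each view's spectral embedding is recomputed after nudging its similarity matrix toward the other views' current embeddings.

import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def coregularized_spectral(similarities, k, lam=0.05, iters=5):
    def top_eigvecs(S):
        d = S.sum(axis=1)
        L = S / np.sqrt(np.outer(d, d))   # normalized affinity
        return eigh(L)[1][:, -k:]         # top-k eigenvectors
    Us = [top_eigvecs(S) for S in similarities]
    for _ in range(iters):
        for v, S in enumerate(similarities):
            # encourage view v's embedding to agree with the other views
            agree = sum(U @ U.T for w, U in enumerate(Us) if w != v)
            Us[v] = top_eigvecs(S + lam * agree)
    return KMeans(n_clusters=k, n_init=10).fit_predict(np.hstack(Us))

# toy usage: two noisy views of the same points, Gaussian similarities
rng = np.random.default_rng(0)
X1 = rng.normal(size=(30, 5))
X2 = X1 + 0.1 * rng.normal(size=(30, 5))
def sim(X):
    return np.exp(-((X[:, None] - X[None]) ** 2).sum(-1))
print(coregularized_spectral([sim(X1), sim(X2)], k=3))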
@InProceedings{daume10spectral,
author = {Abhishek Kumar and Piyush Rai and Hal {Daum\'e III}},
title = {Co-regularized Spectral Clustering with Multiple Kernels},
booktitle = {NeurIPS Workshop on New Directions in Multiple Kernel Learning},
year = {2010},
address = {Whistler, Canada},
url = {http://hal3.name/docs/#daume10spectral}
}
Domain Adaptation meets Active Learning
Piyush Rai, Avishek Saha, Hal Daumé III and Suresh Venkatasubramanian
HLT/NAACL Workshop on Active Learning for NLP (ALNLP), 2010
[Abstract] [BibTeX]
In this work, we show how active learning in some (target) domain can leverage information from a different but related (source) domain. We present an algorithm that harnesses the source domain data to learn the best possible initializer hypothesis for doing active learning in the target domain, resulting in improved label complexity. We also present a variant of this algorithm which additionally uses the domain divergence information to selectively query the most informative points in the target domain, leading to further reductions in label complexity. Experimental results on a variety of datasets establish the efficacy of the proposed methods.
@InProceedings{daume10daal,
author = {Piyush Rai and Avishek Saha and Hal {Daum\'e III} and Suresh
Venkatasubramanian},
title = {Domain Adaptation meets Active Learning},
booktitle = {Proceedings of HLT/NAACL Workshop on Active Learning for NLP
(ALNLP)},
year = {2010},
address = {Los Angeles, CA},
url = {http://hal3.name/docs/#daume10daal}
}
Extracting Multilingual Topics from Unaligned Corpora
Jagadeesh Jagarlamudi and Hal Daumé III
European Conference on Information Retrieval (ECIR), 2010
[Abstract] [BibTeX]
Topic models have been studied extensively in the context of monolingual corpora. Though there are some attempts to mine topical structure from cross-lingual corpora, they require clues about document alignments. In this paper we present a generative model called JointLDA which uses a bilingual dictionary to mine multilingual topics from an unaligned corpus. Experiments conducted on different data sets confirm our conjecture that jointly modeling the cross-lingual corpora offers several advantages compared to individual monolingual models. Since the JointLDA model merges related topics in different languages into a single multilingual topic: a) it can fit the data with relatively fewer topics. b) it has the ability to predict related words from a language different than that of the given document. In fact it has better predictive power compared to the bag-of-words based translation model, leaving the possibility for JointLDA to be preferred over the bag-of-words model for cross-lingual IR applications. We also found that the monolingual models learnt while optimizing the cross-lingual corpora are more effective than the corresponding LDA models.
@InProceedings{daume10multilingual,
author = {Jagadeesh Jagarlamudi and Hal {Daum\'e III}},
title = {Extracting Multilingual Topics from Unaligned Corpora},
booktitle = {Proceedings of the European Conference on Information Retrieval
(ECIR)},
year = {2010},
address = {Milton Keynes, United Kingdom},
url = {http://hal3.name/docs/#daume10multilingual}
}
Multiview Clustering with Incomplete Views
Piyush Rai, Anusua Trivedi, Hal Daumé III and Scott L. DuVall
NeurIPS Workshop on Machine Learning for Social Computing, 2010
[Abstract] [BibTeX]
Multiview clustering algorithms allow leveraging information from multiple views of the data and therefore lead to improved clustering. A number of kernel based multiview clustering algorithms work by using the kernel matrices defined on the different views of the data. However, these algorithms assume availability of features from all the views of each example, i.e., assume that the kernel matrix for each view is complete. We present an approach that allows these algorithms to be applicable even when only one (the primary) view is complete and the auxiliary views are incomplete (i.e., features from these views are available only for some of the examples). Taking the kernel CCA based multiview clustering as an example, we apply our method on webpage clustering with multiple views of the data where one view is the page-text and other view is the social tags assigned to the webpage. We consider the case when the tags are available only for a small subset of the webpages which means that the tag view is incomplete. Experimental results establish the effectiveness of the proposed method.
@InProceedings{daume10mvincomplete,
author = {Piyush Rai and Anusua Trivedi and Hal {Daum\'e III} and Scott L.
DuVall},
title = {Multiview Clustering with Incomplete Views},
booktitle = {NeurIPS Workshop on Machine Learning for Social Computing},
year = {2010},
address = {Whistler, Canada},
url = {http://hal3.name/docs/#daume10mvincomplete}
}
Multitask Learning via Mixture of Linear Subspaces
Piyush Rai and Hal Daumé III
NeurIPS Workshop on Transfer Learning by Learning Rich Generative Models, 2010
[Abstract] [BibTeX]
We propose a probabilistic generative model for multitask learning that exploits the cluster structure of the task parameters, and additionally imposes a low-rank constraint on the set of task parameters within each cluster. This leads to a sharing of statistical strengths of multiple tasks at two levels: (1) via cluster assumption, and (2) via a subspace assumption within each cluster. Our work brings in the benefits of both these aspects of task relationship, each of which has been addressed only individually in prior work. We assume a mixture of linear subspaces model on the latent task parameters that can capture both these aspects simultaneously. Furthermore, the mixture of subspaces assumption can model the fact that the task parameters could potentially live on a non-linear manifold instead of a linear subspace which is a restriction of earlier work on multitask learning based on the linear subspace assumption.
@InProceedings{daume10mtlmls,
author = {Piyush Rai and Hal {Daum\'e III}},
title = {Multitask Learning via Mixture of Linear Subspaces},
booktitle = {NeurIPS Workshop on Transfer Learning by Learning Rich Generative
Models},
year = {2010},
address = {Whistler, Canada},
url = {http://hal3.name/docs/#daume10mtlmls}
}
A Co-regularization Based Semi-supervised Domain Adaptation
Abhishek Kumar, Avishek Saha and Hal Daumé III
Conference on Neural Information Processing Systems (NeurIPS), 2010
[Abstract] [BibTeX]
This paper presents a co-regularization based approach to semi-supervised domain adaptation. Our proposed approach (EA++) builds on the notion of augmented space (introduced in EASYADAPT (EA) [1]) and harnesses unlabeled data in target domain to further enable the transfer of information from source to target. This semi-supervised approach to domain adaptation is extremely simple to implement and can be applied as a pre-processing step to any supervised learner. Our theoretical analysis (in terms of Rademacher complexity) of EA and EA++ shows that the hypothesis class of EA++ has lower complexity (compared to EA) and hence results in tighter generalization bounds. Experimental results on sentiment analysis tasks reinforce our theoretical findings and demonstrate the efficacy of the proposed method when compared to EA as well as a few other baseline approaches.
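The augmented space at the heart of EA is just feature copying, which is short enough to show directly; this is a minimal sketch of the EA augmentation only (the EA++ co-regularization term built from unlabeled target points is omitted).

import numpy as np

def easyadapt(x, domain):
    # EASYADAPT: a shared copy plus a domain-specific copy of the features,
    # so one linear model learns shared and per-domain weights jointly.
    zeros = np.zeros_like(x)
    if domain == "source":
        return np.concatenate([x, x, zeros])
    return np.concatenate([x, zeros, x])   # target

# usage: augment every example, then train any off-the-shelf learner
x = np.array([1.0, 0.0, 2.0])
print(easyadapt(x, "source"), easyadapt(x, "target"))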
@InProceedings{daume10coreg,
author = {Abhishek Kumar and Avishek Saha and Hal {Daum\'e III}},
title = {A Co-regularization Based Semi-supervised Domain Adaptation},
booktitle = {Proceedings of the Conference on Neural Information Processing
Systems (NeurIPS)},
year = {2010},
address = {Vancouver, Canada},
url = {http://hal3.name/docs/#daume10coreg}
}
A geometric view of conjugate priors
Arvind Agarwal and Hal Daumé III
Machine Learning Journal (MLJ), 2010
[Abstract] [BibTeX]
In Bayesian machine learning, conjugate priors are popular, mostly due to mathematical convenience. In this paper, we show that there are deeper reasons for choosing a conjugate prior. Specifically, we formulate the conjugate prior in the form of Bregman divergence and show that it is the inherent geometry of conjugate priors that makes them appropriate and intuitive. This geometric interpretation allows one to view the hyperparameters of conjugate priors as the effective sample points, thus providing additional intuition. We use this geometric understanding of conjugate priors to derive the hyperparameters and expression of the prior used to couple the generative and discriminative components of a hybrid model for semi-supervised learning.
@article{daume10conjugate,
author = {Arvind Agarwal and Hal {Daum\'e III}},
title = {A geometric view of conjugate priors},
year = {2010},
journal = {Machine Learning Journal (MLJ)},
volume = {81},
number = {1},
url = {http://hal3.name/docs/#daume10conjugate}
}
Kernelized Sorting for Natural Language Processing
Jagadeesh Jagarlamudi, Seth Juarez and Hal Daumé III
Conference on Artificial Intelligence (AAAI), 2010
[Abstract] [BibTeX]
Kernelized sorting is an approach for matching objects from two sources (or domains) that does not require any prior notion of similarity between objects across the two sources. Unfortunately, this technique is highly sensitive to initialization and high dimensional data. We present variants of kernelized sorting to increase its robustness and performance on several Natural Language Processing (NLP) tasks: document matching from parallel and comparable corpora, machine transliteration and even image processing. Empirically we show that, on these tasks, a semi-supervised variant of kernelized sorting outperforms matching canonical correlation analysis.
@InProceedings{daume10sorting,
author = {Jagadeesh Jagarlamudi and Seth Juarez and Hal {Daum\'e III}},
title = {Kernelized Sorting for Natural Language Processing},
booktitle = {Proceedings of the Conference on Artificial Intelligence (AAAI)},
year = {2010},
address = {Atlanta, Georgia},
url = {http://hal3.name/docs/#daume10sorting}
}
Exploiting Tag and Word Correlations for Improved Webpage Clustering
Anusua Trivedi, Piyush Rai, Scott L. DuVall and Hal Daumé III
CIKM Workshop on Search and Mining User-generated Contents (SMUC), 2010
[Abstract] [BibTeX]
Automatic clustering of webpages helps a number of information retrieval tasks, such as improving user interfaces, collection clustering, introducing diversity in search results, etc. Typically, webpage clustering algorithms only use features extracted from the page-text. However, the advent of social-bookmarking websites, such as StumbleUpon and Delicious, has led to a huge amount of user-generated content such as the tag information that is associated with the webpages. In this paper, we present a subspace based feature extraction approach which leverages tag information to complement the page-contents of a webpage to extract highly discriminative features, with the goal of improved clustering performance. In our approach, we consider page-text and tags as two separate views of the data, and learn a shared subspace that maximizes the correlation between the two views. Any clustering algorithm can then be applied in this subspace. We compare our subspace based approach with a number of baselines that use tag information in various other ways, and show that the subspace based approach leads to improved performance on the webpage clustering task. Although our results here are on the webpage clustering task, the same approach can be used for webpage classification as well. In the end, we also suggest possible future work for leveraging tag information in webpage clustering, especially when tag information is present not for all, but only for a small number of webpages.
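The two-view subspace step can be sketched with off-the-shelf tools. Below, CCA plays the role of the shared-subspace learner; the feature matrices are synthetic placeholders, and the paper's exact subspace method and dimensionality may differ.

import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.cluster import KMeans

# hypothetical inputs: one row per webpage; page-text and social-tag features
rng = np.random.default_rng(0)
X_text = rng.normal(size=(100, 40))
X_tags = rng.normal(size=(100, 15))

# learn a shared subspace maximizing text/tag correlation, then cluster in it
cca = CCA(n_components=5).fit(X_text, X_tags)
Z_text, Z_tags = cca.transform(X_text, X_tags)
labels = KMeans(n_clusters=4, n_init=10).fit_predict(
    np.hstack([Z_text, Z_tags]))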
@inproceedings{daume10clustering,
author = {Anusua Trivedi and Piyush Rai and Scott L. DuVall and Hal {Daum\'e
III}},
title = {Exploiting Tag and Word Correlations for Improved Webpage Clustering},
booktitle = {Proceedings of {CIKM} Workshop on Search and Mining User-generated
Contents (SMUC)},
year = {2010},
address = {Toronto, Canada},
url = {http://hal3.name/docs/#daume10clustering},
}
Learning Multiple Tasks using Manifold Regularization
Arvind Agarwal, Samuel Gerber and Hal Daumé III
Conference on Neural Information Processing Systems (NeurIPS), 2010
[Abstract] [BibTeX]
We present a novel method for multitask learning (MTL) based on manifold regularization. We assume that all task parameters lie on a manifold, which generalizes the assumption made in the existing literature, i.e., that task parameters share a common linear subspace. The proposed method uses the projection distance from the manifold to regularize the task parameters. The manifold structure and the task parameters are learned using an alternating optimization framework. When the manifold structure is fixed, our method decomposes into learning independent tasks, making it appealing for learning new tasks. An approximation of the manifold regularization scheme is presented that preserves the convexity of the single task learning problem, and makes the proposed MTL framework efficient and easy to implement. We show the efficacy of our method on several datasets.
@InProceedings{daume10manifold,
author = {Arvind Agarwal and Samuel Gerber and Hal {Daum\'e III}},
title = {Learning Multiple Tasks using Manifold Regularization},
booktitle = {Proceedings of the Conference on Neural Information Processing
Systems (NeurIPS)},
year = {2010},
address = {Vancouver, Canada},
url = {http://hal3.name/docs/#daume10manifold}
}
Infinite Predictor Subspace Models for Multitask Learning
Piyush Rai and Hal Daumé III
Conference on Artificial Intelligence and Statistics (AI-Stats), 2010
[Abstract] [BibTeX]
Given several related learning tasks, we propose a nonparametric Bayesian model that captures task relatedness by assuming that the task parameters (i.e., predictors) share a latent subspace. More specifically, the intrinsic dimensionality of the task subspace is not assumed to be known a priori. We use an infinite latent feature model to automatically infer this number (depending on and limited by only the number of tasks). Furthermore, our approach is applicable when the underlying task parameter subspace is inherently sparse, drawing parallels with l1 regularization and LASSO-style models. We also propose an augmented model which can make use of (labeled, and additionally unlabeled if available) inputs to assist learning this subspace, leading to further improvements in the performance. Experimental results demonstrate the efficacy of both the proposed approaches, especially when the number of examples per task is small. Finally, we discuss an extension of the proposed framework where a nonparametric mixture of linear subspaces can be used to learn a nonlinear manifold over the task parameters, and also deal with the issue of negative transfer from unrelated tasks.
@InProceedings{daume10subspace,
author = {Piyush Rai and Hal {Daum\'e III}},
title = {Infinite Predictor Subspace Models for Multitask Learning},
booktitle = {Proceedings of the Conference on Artificial Intelligence and
Statistics (AI-Stats)},
year = {2010},
address = {Sardinia, Italy},
url = {http://hal3.name/docs/#daume10subspace}
}
Active Online Multitask Learning
Avishek Saha, Piyush Rai, Hal Daumé III and Suresh Venkatasubramanian
ICML 2010 Workshop on Budgeted Learning (Budget), 2010
[Abstract] [BibTeX]
In this paper, we propose an online multitask learning framework where the weight vectors are updated in an adaptive fashion based on inter-task relatedness. Our work is in contrast with earlier work on online multitask learning (Cavallanti et al., 2008), in which the authors use a fixed interaction matrix of tasks to derive (fixed) update rules for all the tasks. In this work, we propose to update this interaction matrix itself in an adaptive fashion so that the weight vector updates are no longer fixed but are instead adaptive. Our framework can be extended to an active learning setting where the informativeness of an incoming instance across all the tasks can be evaluated using this adaptive interaction matrix. Empirical results on standardized datasets show improved performance in terms of accuracy, label complexity and number of mistakes made.
@inproceedings{daume10aoml,
author = {Avishek Saha and Piyush Rai and Hal {Daum\'e III} and Suresh
Venkatasubramanian},
title = {Active Online Multitask Learning},
booktitle = {ICML 2010 Workshop on Budgeted Learning (Budget)},
year = {2010},
address = {Haifa, Israel},
}
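A minimal sketch of the idea, assuming a perceptron-style learner; the adaptation rule for the interaction matrix below is invented for illustration and is not the paper's update:

import numpy as np

class AdaptiveOnlineMTL:
    def __init__(self, n_tasks, dim, lr=0.1):
        self.W = np.zeros((n_tasks, dim))                 # one weight vector per task
        self.A = np.full((n_tasks, n_tasks), 1.0 / n_tasks)
        np.fill_diagonal(self.A, 1.0)                     # tasks trust themselves most
        self.lr = lr

    def predict(self, task, x):
        return 1 if self.W[task] @ x >= 0 else -1

    def update(self, task, x, y):
        if self.predict(task, x) != y:
            # propagate the correction to every task, weighted by relatedness
            self.W += np.outer(self.A[task], y * x)
            # adapt relatedness: move weight toward tasks whose predictors
            # already agree with this example (a stand-in rule)
            agree = (np.sign(self.W @ x) == y)
            self.A[task] = np.clip(self.A[task] + self.lr * (agree - 0.5), 0.0, 1.0)

The same matrix A can then score how informative an incoming instance is across tasks, which is the hook to the active learning extension the abstract mentions.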
Frustratingly Easy Semi-Supervised Domain Adaptation
Hal Daumé III, Abhishek Kumar and Avishek Saha
Workshop on Domain Adaptation for NLP, 2010
[Abstract] [BibTeX]
In this work, we propose a semi-supervised extension to a well-known supervised domain adaptation approach (EA) (Daume III, 2007). Our proposed approach (EA++) builds on the notion of augmented space (introduced in EA) and harnesses unlabeled data in the target domain to ameliorate the transfer of information from source to target. This semi-supervised approach to domain adaptation is extremely simple to implement, and can be applied as a pre-processing step to any supervised learner. Experimental results on sequential labeling tasks demonstrate the efficacy of the proposed method.
@inproceedings{daume10easyss,
title = {Frustratingly Easy Semi-Supervised Domain Adaptation},
author = {Hal {Daum\'e III} and Abhishek Kumar and Avishek Saha},
booktitle = {Workshop on Domain Adaptation for NLP},
year = {2010},
url = {http://hal3.name/docs/#daume10easyss},
}
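The augmented-space view makes the method short enough to sketch. The source and target encodings below are EA's (Daume III, 2007); the <0, x, -x> encoding of unlabeled target points with a zero pseudo-label is one way to realize the co-regularization idea under a squared loss, and should be read as an illustration rather than the paper's exact construction:

import numpy as np

def augment(x, kind):
    # EA maps source examples to <x, x, 0> and target examples to <x, 0, x>.
    # Encoding an unlabeled target point as <0, x, -x> with target value 0
    # makes a squared loss penalize (w_src . x - w_tgt . x)^2, i.e., it pushes
    # the source and target blocks to agree on unlabeled target data.
    z = np.zeros_like(x)
    if kind == "source":
        return np.concatenate([x, x, z])
    if kind == "target":
        return np.concatenate([x, z, x])
    if kind == "unlabeled-target":
        return np.concatenate([z, x, -x])
    raise ValueError(kind)

Training is then a plain regularized linear fit on the union of the augmented labeled rows and the zero-labeled unlabeled rows, which is why the method works as a pre-processing step for any supervised learner.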
Exponential Family Hybrid Semi-Supervised Learning
Arvind Agarwal and Hal Daumé III
International Joint Conference on Artificial Intelligence (IJCAI), 2009
🏆 Best Paper Award
[Abstract] [BibTeX]
We present an approach to semi-supervised learning based on an exponential family characterization. Our approach generalizes previous work on coupled priors for hybrid generative/discriminative models. Our model is more flexible and natural than previous approaches. Experimental results on several data sets show that our approach also performs better in practice.
@InProceedings{daume09hybrid,
author = {Arvind Agarwal and Hal {Daum\'e III}},
title = {Exponential Family Hybrid Semi-Supervised Learning},
booktitle = {International Joint Conference on Artificial Intelligence (IJCAI)},
year = {2009},
address = {Pasadena, CA},
url = {http://hal3.name/docs/#daume09hybrid},
}
Fast Search for Infinite Latent Feature Models
Piyush Rai and Hal Daumé III
NeurIPS Workshop on Non-parametric Bayes (NP-Bayes), 2009
[Abstract] [BibTeX]
We propose several search-based alternatives for inference in Indian Buffet Process (IBP) based models. We consider the case when we only want a maximum a posteriori (MAP) estimate of the latent feature assignment matrix. If true posterior samples are required, these MAP estimates can also serve as intelligent initializers for MCMC-based algorithms. Another advantage of the proposed methods is that they can process one observation at a time, making it possible to do inference in an online setting. Experimental evidence suggests that these algorithms can give us computational benefits of an order of magnitude over Gibbs sampling (or its sequential variant, the particle filter) traditionally used in IBP-based models.
@InProceedings{daume09ibpsearch,
author = {Piyush Rai and Hal {Daum\'e III}},
title = {Fast Search for Infinite Latent Feature Models},
booktitle = {Proceedings of NeurIPS Workshop on Non-parametric Bayes (NP-Bayes)},
year = {2009},
address = {Vancouver, Canada},
url = {http://hal3.name/docs/#daume09ibpsearch}
}
Bayesian Multitask Learning with Latent Hierarchies
Hal Daumé III
Conference on Uncertainty in Artificial Intelligence (UAI), 2009
[Abstract] [BibTeX]
We learn multiple hypotheses for related tasks under a latent hierarchical relationship between tasks. We exploit the intuition that for domain adaptation, we wish to share classifier structure, but for multitask learning, we wish to share covariance structure. Our hierarchical model is seen to subsume several previously proposed multitask learning models and performs well on three distinct real-world data sets.
@InProceedings{daume09hiermtl,
author = {Hal {Daum\'e III}},
title = {Bayesian Multitask Learning with Latent Hierarchies},
booktitle = {Conference on Uncertainty in Artificial Intelligence (UAI)},
year = {2009},
address = {Montreal, Canada},
url = {http://hal3.name/docs/#daume09hiermtl}
}
Unsupervised Search-based Structured Prediction
Hal Daumé III
International Conference on Machine Learning (ICML), 2009
[Abstract] [BibTeX]
We describe an adaptation and application of a search-based structured prediction algorithm "Searn" to unsupervised learning problems. We show that it is possible to reduce unsupervised learning to supervised learning and demonstrate a high-quality unsupervised shift-reduce parsing model. We additionally show a close connection between unsupervised Searn and expectation maximization. Finally, we demonstrate the efficacy of a semi-supervised extension. The key idea that enables this is an application of the predict-self idea for unsupervised learning.
@InProceedings{daume09unsearn,
author = {Hal {Daum\'e III}},
title = {Unsupervised Search-based Structured Prediction},
booktitle = {International Conference on Machine Learning (ICML)},
year = {2009},
address = {Montreal, Canada},
url = {http://hal3.name/docs/#daume09unsearn}
}
Multi-Label Prediction via Sparse Infinite CCA
Piyush Rai and Hal Daumé III
Conference on Neural Information Processing Systems (NeurIPS), 2009
[Abstract] [BibTeX]
Canonical Correlation Analysis (CCA) is a useful technique for modeling dependencies between two (or more) sets of variables. Building upon the recently suggested probabilistic interpretation of CCA, we propose a nonparametric, fully Bayesian framework that can automatically select the number of correlation components, and effectively capture the sparsity underlying the projections. In addition, given (partially) labeled data, our algorithm can also be used as a (semi)supervised dimensionality reduction technique, and can be applied to learn useful predictive features in the context of learning a set of related tasks. Experimental results demonstrate the efficacy of the proposed approach for both CCA as a stand-alone problem, and when applied to multi-label prediction.
@InProceedings{daume09cca,
author = {Piyush Rai and Hal {Daum\'e III}},
title = {Multi-Label Prediction via Sparse Infinite {CCA}},
booktitle = {Proceedings of the Conference on Neural Information Processing
Systems (NeurIPS)},
year = {2009},
address = {Vancouver, Canada},
url = {http://hal3.name/docs/#daume09cca}
}
Streamed Learning: One-Pass SVMs
Piyush Rai, Hal Daumé III and Suresh Venkatasubramanian
International Joint Conference on Artificial Intelligence (IJCAI), 2009
[Abstract] [BibTeX]
We present a streaming model for large-scale classification (in the context of the l2-SVM) by leveraging connections between learning and computational geometry. The streaming model imposes the constraint that only a single pass over the data is allowed. The l2-SVM is known to have an equivalent formulation in terms of the minimum enclosing ball (MEB) problem, and an efficient algorithm based on the idea of core sets exists (CVM) [Tsang et al., 2005]. CVM learns a (1+ε)-approximate MEB for a set of points and yields an approximate solution to the corresponding SVM instance. However, CVM works in batch mode, requiring multiple passes over the data. This paper presents a single-pass SVM which is based on the minimum enclosing ball of streaming data. We show that the MEB updates for the streaming case can be easily adapted to learn the SVM weight vector in a way similar to using online stochastic gradient updates. Our algorithm performs polylogarithmic computation at each example, and requires very small and constant storage. Experimental results show that, even in such restrictive settings, we can learn efficiently in just one pass and get accuracies comparable to other state-of-the-art SVM solvers (batch and online). We also give an analysis of the algorithm, and discuss some open issues and possible extensions.
@InProceedings{daume09onepass,
author = {Piyush Rai and Hal {Daum\'e III} and Suresh Venkatasubramanian},
title = {Streamed Learning: One-Pass {SVM}s},
booktitle = {International Joint Conference on Artificial Intelligence (IJCAI)},
year = {2009},
address = {Pasadena, CA},
url = {http://hal3.name/docs/#daume09onepass}
}
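The geometric core is simple to sketch. Below is the classic greedy streaming update for the minimum enclosing ball, a simplification of the paper's algorithm (which works in the kernel-induced feature space and recovers an SVM weight vector from the ball):

import numpy as np

class StreamingMEB:
    # Each point is seen exactly once and either ignored (already inside
    # the ball) or used to grow the ball minimally: constant storage,
    # constant work per point.
    def __init__(self, first_point):
        self.c = np.asarray(first_point, dtype=float)
        self.r = 0.0

    def add(self, p):
        p = np.asarray(p, dtype=float)
        dist = np.linalg.norm(p - self.c)
        if dist > self.r:                      # p falls outside the ball
            new_r = 0.5 * (self.r + dist)      # smallest ball covering both
            self.c += (p - self.c) * ((new_r - self.r) / dist)
            self.r = new_r

This greedy rule is a well-known constant-factor approximation to the true MEB; the paper's contribution is connecting such single-pass ball maintenance to learning the SVM weight vector.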
A Bayesian Statistics Approach to Multiscale Coarse Graining
Pu Liu, Qiang Shi, Hal Daumé III and Gregory Voth
Journal of Chemical Physics (J.ChPhys), 2009
[Abstract] [BibTeX]
Coarse-grained (CG) modeling provides a promising way to investigate many important physical and biological phenomena over large spatial and temporal scales. The multiscale coarse-graining (MS-CG) method has been proven to be a thermodynamically consistent way to systematically derive a CG model from atomistic force information, as shown in a variety of systems, ranging from simple liquids to proteins embedded in lipid bilayers. In the present work, Bayes' theorem, an advanced statistical tool widely used in signal processing and pattern recognition, is adopted to further improve the MS-CG force field obtained from the CG modeling. This approach can regularize the linear equation resulting from the underlying force-matching methodology, therefore substantially improving the quality of the MS-CG force field, especially for the regions with limited sampling. Moreover, this Bayesian approach can naturally provide an error estimation for each force field parameter, from which one can know to what extent the results can be trusted. The robustness and accuracy of the Bayesian MS-CG algorithm is demonstrated for three different systems, including simple liquid methanol, polyalanine peptide solvated in explicit water, and a much more complicated peptide assembly with 32 NNQQNY hexapeptides.
@Article{daume09graining,
author = {Pu Liu and Qiang Shi and Hal {Daum\'e III} and Gregory Voth},
title = {A Bayesian Statistics Approach to Multiscale Coarse Graining},
journal = {Journal of Chemical Physics (J.ChPhys)},
year = {2009},
volume = {129},
number = {21},
pages = {214114},
month = {December},
}
Markov Random Topic Fields
Hal Daumé III
Association for Computational Linguistics (ACL), 2009
[Abstract] [BibTeX]
Most approaches to topic modeling assume an independence between documents that is frequently violated. We present a topic model that makes use of one or more user-specified graphs describing relationships between documents. These graphs are encoded in the form of a Markov random field over topics and serve to encourage related documents to have similar topic structures. Experiments show upwards of a 10% improvement in modeling performance.
@InProceedings{daume09mrtf,
author = {Hal {Daum\'e III}},
title = {Markov Random Topic Fields},
booktitle = {Association for Computational Linguistics (ACL)},
year = {2009},
address = {Singapore},
url = {http://hal3.name/docs/#daume09mrtf}
}
Semi-supervised or Semi-unsupervised?
Hal Daumé III
Unpublished, 2009
[BibTeX]
@Misc{daume09sslnlp,
author = {Hal {Daum\'e III}},
title = {Semi-supervised or Semi-unsupervised?},
howpublished = {Invited paper: NAACL-HLT Workshop on Semi-supervised Learning in
NLP (SSLNLP)},
year = {2009},
address = {Boulder, CO},
url = {http://hal3.name/docs/#daume09sslnlp}
}
Multitask Learning using Nonparametrically Learned Predictor Subspaces
Piyush Rai and Hal Daumé III
NeurIPS Workshop on Learning from Multiple Sources, 2009
[Abstract] [BibTeX]
Given several related learning tasks, we propose a nonparametric Bayesian learning model that captures task relatedness by assuming that the task parameters (i.e., weight vectors) share a latent subspace. More specifically, the intrinsic dimensionality of this subspace is not assumed to be known a priori. We use an infinite latent feature model - the Indian Buffet Process - to automatically infer this number. We also propose extensions of this model where the subspace learning can incorporate (labeled, and additionally unlabeled if available) examples, or the task parameters share a mixture of subspaces, instead of sharing a single subspace. The latter property can allow learning nonlinear manifold structure underlying the task parameters, and can also help in preventing negative transfer from outlier tasks.
@InProceedings{daume09subspacemtl,
author = {Piyush Rai and Hal {Daum\'e III}},
title = {Multitask Learning using Nonparametrically Learned Predictor Subspaces},
booktitle = {NeurIPS Workshop on Learning from Multiple Sources},
year = {2009},
address = {Whistler, Canada},
url = {http://hal3.name/docs/#daume09subspacemtl}
}
Factor Regression Combining Heterogeneous Sources of Information
Amrish Kapoor, Piyush Rai and Hal Daumé III
NeurIPS Workshop on Learning From Multiple Sources with Applications to Robotics (LMS), 2009
[Abstract] [BibTeX]
We present a non-parametric Bayesian factor regression model that combines two heterogeneous sources of information: gene expression arrays and text from their corresponding PubMed abstracts. Our model approximates a pLSI-style model and results in improved regression accuracy. We apply this model to gene-expression data analysis, but it is extendable to other problems exhibiting a similar heterogeneous multiplicity in sources of information, like financial analysis, weather prediction and others.
@InProceedings{daume09hetero,
author = {Amrish Kapoor and Piyush Rai and Hal {Daum\'e III}},
title = {Factor Regression Combining Heterogeneous Sources of Information},
booktitle = {Proceedings of NeurIPS Workshop on Learning From Multiple Sources
with Applications to Robotics (LMS)},
year = {2009},
address = {Vancouver, Canada},
url = {http://hal3.name/docs/#daume09hetero}
}
Search-based Structured Prediction
Hal Daumé III, John Langford and Daniel Marcu
Machine Learning Journal (MLJ), 2009
[Abstract] [BibTeX]
We present Searn, an algorithm for integrating search and learning to solve complex structured prediction problems such as those that occur in natural language, speech, computational biology, and vision. Searn is a meta-algorithm that transforms these complex problems into simple classification problems to which any binary classifier may be applied. Unlike current algorithms for structured learning that require decomposition of both the loss function and the feature functions over the predicted structure, Searn is able to learn prediction functions for any loss function and any class of features. Moreover, Searn comes with a strong, natural theoretical guarantee: good performance on the derived classification problems implies good performance on the structured prediction problem.
@article{daume09searn,
author = {Hal {Daum\'e III} and John Langford and Daniel Marcu},
title = {Search-based Structured Prediction},
year = {2009},
journal = {Machine Learning Journal (MLJ)},
url = {http://hal3.name/docs/#daume09searn}
}
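The Searn loop itself is compact. Below is a toy, self-contained rendering for sequence labeling: the feature map, the table-based cost-sensitive learner, and the binary label set are placeholders, but the roll-in / cost-sensitive-example / policy-interpolation structure is the meta-algorithm the abstract describes:

import random
from collections import defaultdict

LABELS = (0, 1)

def features(x, state, t):
    # toy features: current token plus the previously predicted label
    return (x[t], state[-1] if state else -1)

def completion_loss(x, y, state, action, policy):
    # Searn's cost estimate: take `action`, let the current policy finish
    # the sequence, and measure the Hamming loss of the completion
    s = state + [action]
    for t in range(len(s), len(x)):
        s.append(policy(x, y, s, t))
    return sum(a != b for a, b in zip(s, y))

def learn_cost_sensitive(data):
    # deliberately naive cost-sensitive learner: average the observed cost
    # of each action per feature tuple, then predict the cheapest action
    table = defaultdict(lambda: [[0.0, 0] for _ in LABELS])
    for f, costs in data:
        for a, c in enumerate(costs):
            table[f][a][0] += c
            table[f][a][1] += 1
    return lambda f: min(LABELS, key=lambda a: (table[f][a][0] / table[f][a][1]
                                                if table[f][a][1] else 0.0))

def searn(examples, iters=5, beta=0.5):
    policy = lambda x, y, s, t: y[t]            # start from the optimal policy
    h = None
    for _ in range(iters):
        data = []
        for x, y in examples:
            state = []
            for t in range(len(x)):
                costs = [completion_loss(x, y, state, a, policy) for a in LABELS]
                data.append((features(x, state, t), costs))
                state.append(policy(x, y, state, t))        # roll in
        h = learn_cost_sensitive(data)
        old, clf = policy, h                    # stochastically interpolate
        policy = (lambda x, y, s, t, old=old, clf=clf:
                  clf(features(x, s, t)) if random.random() < beta
                  else old(x, y, s, t))
    def decode(x):                              # test time: classifier only
        s = []
        for t in range(len(x)):
            s.append(h(features(x, s, t)))
        return s
    return decode

For instance, searn([([3, 1, 4, 1, 5], [1, 1, 0, 1, 1])]) should recover the parity labeling of that toy sequence purely from completion costs.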
HBC: Hierarchical Bayes Compiler
Hal Daumé III
Workshop on Bayesian Inference, 2008
[Abstract] [BibTeX]
These goals distinguish HBC from other Bayesian modeling software, such as Bugs (or WinBugs [3]). In particular, our primary goal is that models created in HBC can be used directly, rather than only as a first-pass test. Moreover, we aim for scalability with respect to data size. Finally, since the goal of HBC is to compile hierarchical models into standard programming languages (like C), these models can easily be used as part of a larger system. This last point is in the spirit of the dynamic programming language Dyna [2].
@inproceedings{daume08hbc,
title = {{HBC}: Hierarchical Bayes Compiler},
author = {Hal {Daum\'e III}},
booktitle = {Workshop on Bayesian Inference},
year = {2008},
url = {http://hal3.name/docs/#daume08hbc},
}
Perceptron-based Coherence Predictors
Devyani Ghosh, John Carter and Hal Daumé III
2nd Workshop on Chip Multiprocessor Memory Systems and Interconnects (ICSA), 2008
[Abstract] [BibTeX]
Coherence misses in shared-memory multiprocessors account for a substantial fraction of execution time in many important workloads. Just as branch predictors reduce the performance impact of branches, coherence predictors can reduce the performance impact of coherence misses. Two-level pattern-based coherence predictors have offered a general prediction method to trigger appropriate coherence actions. This paper presents the design and evaluation of a perceptron-based coherence predictor that extends a conventional directory-based write-invalidate protocol to predict when to push updates to remote nodes. When predicted correctly, the update eliminates a coherence miss on the remote node. We also present a simple mechanism for predicting to which nodes we should push updates. We evaluate our perceptron-based update predictor on a variety of SPLASH-2 and PARSEC benchmarks. Simulation indicates that the update predictor eliminates an average of 30% of coherence misses. Our simple consumer prediction mechanism sent very few useless updates: most of the updates it pushed were consumed (i.e., eliminated misses).
@InProceedings{daume08coherence,
author = {Devyani Ghosh and John Carter and Hal {Daum\'e III}},
title = {Perceptron-based Coherence Predictors},
booktitle = {Proceedings of the 2nd Workshop on Chip Multiprocessor Memory
Systems and Interconnects (ICSA)},
year = {2008},
address = {Beijing, China},
url = {http://hal3.name/docs/#daume08coherence}
}
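By analogy with perceptron branch predictors, the predictor can be sketched as a perceptron over a history of recent per-block outcomes. The feature encoding, threshold, and training signal below are assumptions for illustration, not the paper's design:

import numpy as np

class PerceptronUpdatePredictor:
    def __init__(self, history_len=8, theta=2.0):
        self.w = np.zeros(history_len + 1)      # +1 for a bias term
        self.theta = theta                      # confidence margin for training

    def _phi(self, history):
        # history: +1/-1 per recent write-back, +1 meaning the remote node
        # actually consumed the pushed update (an assumed encoding)
        return np.concatenate(([1.0], np.asarray(history, dtype=float)))

    def predict_push(self, history):
        return float(self.w @ self._phi(history)) >= 0.0

    def train(self, history, consumed):
        y = 1.0 if consumed else -1.0
        out = float(self.w @ self._phi(history))
        # perceptron-style correction on mistakes or low-confidence outputs
        if np.sign(out) != y or abs(out) < self.theta:
            self.w += y * self._phi(history)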
Cross-Task Knowledge-Constrained Self Training
Hal Daumé III
Empirical Methods in Natural Language Processing (EMNLP), 2008
[Abstract] [BibTeX]
We present an algorithmic framework for learning multiple related tasks. Our framework exploits a form of prior knowledge that relates the output spaces of these tasks. We present PAC learning results that analyze the conditions under which such learning is possible. We present results on learning a shallow parser and named-entity recognition system that exploits our framework, showing consistent improvements over baseline methods.
@InProceedings{daume08hints,
author = {Hal {Daum\'e III}},
title = {Cross-Task Knowledge-Constrained Self Training},
booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
year = {2008},
address = {Honolulu, Hawaii},
url = {http://hal3.name/docs/#daume08hints}
}
The Infinite Hierarchical Factor Regression Model
Piyush Rai and Hal Daumé III
Conference on Neural Information Processing Systems (NeurIPS), 2008
[BibTeX]
@InProceedings{daume08ihfrm,
author = {Piyush Rai and Hal {Daum\'e III}},
title = {The Infinite Hierarchical Factor Regression Model},
booktitle = {Proceedings of the Conference on Neural Information Processing
Systems (NeurIPS)},
year = {2008},
address = {Vancouver, Canada},
url = {http://hal3.name/docs/#daume08ihfrm}
}
Structure Compilation: Trading Structure for Features
Percy Liang, Hal Daumé III and Dan Klein
International Conference on Machine Learning (ICML), 2008
[Abstract] [BibTeX]
Structured models often achieve excellent performance but can be slow at test time. We investigate structure compilation, where we replace structure with features, which are often computationally simpler but unfortunately statistically more complex. We analyze this tradeoff theoretically and empirically on three natural language processing tasks. We also introduce a simple method to transfer predictive power from structure to features via unlabeled data, while incurring a minimal statistical penalty.
@InProceedings{daume08flat,
author = {Percy Liang and Hal {Daum\'e III} and Dan Klein},
title = {Structure Compilation: Trading Structure for Features},
booktitle = {International Conference on Machine Learning (ICML)},
year = {2008},
address = {Helsinki, Finland},
url = {http://hal3.name/docs/#daume08flat}
}
Frustratingly Easy Domain Adaptation
Hal Daumé III
Conference of the Association for Computational Linguistics (ACL), 2007
🏆 Test of Time Award Nomination (2017)
[Abstract] [BibTeX]
We describe an approach to domain adaptation that is appropriate exactly in the case when one has enough "target" data to do slightly better than just using only "source" data. Our approach is incredibly simple, easy to implement as a preprocessing step (10 lines of Perl!) and outperforms state-of-the-art approaches on a range of datasets. The technique comes with several simple theoretical guarantees. Moreover, it is trivially extended to a multi-domain adaptation problem, where one has data from a variety of different domains.
@InProceedings{daume07easyadapt,
author = {Hal {Daum\'e III}},
title = {Frustratingly Easy Domain Adaptation},
booktitle = {Conference of the Association for Computational Linguistics (ACL)},
year = {2007},
address = {Prague, Czech Republic},
url = {http://hal3.name/docs/#daume07easyadapt},
}
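The preprocessing step is short enough to show end to end. Below is a Python rendering of the augmentation (the abstract's "10 lines of Perl"), with a synthetic-data usage example; the choice of scikit-learn's LogisticRegression as the downstream learner is arbitrary, since any supervised learner works on the augmented features:

import numpy as np
from sklearn.linear_model import LogisticRegression

def ea(x, domain):
    # triple every feature vector: shared / source-only / target-only blocks
    z = np.zeros_like(x)
    return np.hstack([x, x, z]) if domain == "source" else np.hstack([x, z, x])

rng = np.random.default_rng(0)
Xs, ys = rng.normal(size=(200, 5)), rng.integers(0, 2, 200)   # plentiful source
Xt, yt = rng.normal(size=(20, 5)), rng.integers(0, 2, 20)     # scarce target

X_aug = np.vstack([np.stack([ea(x, "source") for x in Xs]),
                   np.stack([ea(x, "target") for x in Xt])])
y_aug = np.concatenate([ys, yt])

clf = LogisticRegression(max_iter=1000).fit(X_aug, y_aug)
pred = clf.predict(np.stack([ea(x, "target") for x in Xt]))   # target encoding at test time

A linear learner on the augmented space effectively splits each feature's weight into a shared part and a domain-specific part, which is the whole trick.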
Fast search for Dirichlet process mixture models
Hal Daumé III
Eleventh International Conference on Artificial Intelligence and Statistics (AIStats), 2007
[Abstract] [BibTeX]
Dirichlet process (DP) mixture models provide a flexible Bayesian framework for density estimation. Unfortunately, their flexibility comes at a cost: inference in DP mixture models is computationally expensive, even when conjugate distributions are used. In the common case when one seeks only a maximum a posteriori assignment of data points to clusters, we show that search algorithms provide a practical alternative to expensive MCMC and variational techniques. When a true posterior sample is desired, the solution found by search can serve as a good initializer for MCMC. Experimental results show that using these techniques it is possible to apply DP mixture models to very large data sets.
@InProceedings{daume07astar-dp,
author = {Hal {Daum\'e III}},
title = {Fast search for Dirichlet process mixture models},
booktitle = {Proceedings of the Eleventh International Conference on Artificial
Intelligence and Statistics (AIStats)},
year = {2007},
address = {San Juan, Puerto Rico},
url = {http://hal3.name/docs/#daume07astar-dp}
}
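A minimal member of the search family the abstract describes: greedy, one pass, keeping a single partial clustering (the paper's beam and A*-style variants keep more candidates). The spherical-Gaussian scoring and the sigma/sigma0 values are illustrative assumptions:

import numpy as np

def greedy_dp_map(X, alpha=1.0, sigma=1.0, sigma0=3.0):
    # Each point joins whichever existing cluster (score ~ log n_k plus
    # log-likelihood at the cluster mean, following the CRP prior) or a
    # fresh cluster (score ~ log alpha plus a broad prior term) looks
    # best locally.
    clusters, z = [], []                        # each cluster: [sum, count]
    for x in X:
        scores = [np.log(n) - np.sum((x - s / n) ** 2) / (2 * sigma**2)
                  for s, n in clusters]
        scores.append(np.log(alpha) - np.sum(x**2) / (2 * sigma0**2))
        k = int(np.argmax(scores))
        if k == len(clusters):
            clusters.append([x.astype(float).copy(), 1])
        else:
            clusters[k][0] += x
            clusters[k][1] += 1
        z.append(k)
    return np.array(z)

Because each point is scored once and cluster statistics are running sums, the procedure is naturally online, matching the single-pass flavor of the search methods studied in the paper.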
Bayesian Agglomerative Clustering with Coalescents
Yee Whye Teh, Hal Daumé III and Daniel Roy
Conference on Neural Information Processing Systems (NeurIPS), 2007
[Abstract] [BibTeX]
We introduce a new Bayesian model for hierarchical clustering based on a prior over trees called Kingman's coalescent. We develop novel greedy and sequential Monte Carlo inferences which operate in a bottom-up agglomerative fashion. We show experimentally the superiority of our algorithms over others, and demonstrate our approach in document clustering and phylolinguistics.
@InProceedings{daume07coalescent,
author = {Yee Whye Teh and Hal {Daum\'e III} and Daniel Roy},
title = {Bayesian Agglomerative Clustering with Coalescents},
booktitle = {Proceedings of the Conference on Neural Information Processing
Systems (NeurIPS)},
year = {2007},
address = {Vancouver, Canada},
url = {http://hal3.name/docs/#daume07coalescent}
}
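The bottom-up greedy construction can be sketched generically. Note that the merge score below is plain Euclidean distance between subtree means, standing in for the coalescent likelihood the paper actually optimizes, which is a substantial simplification:

import numpy as np

def greedy_agglomerate(points):
    # repeatedly merge the closest pair of subtrees; each subtree is
    # summarized by (mean, size, nested-tuple topology)
    trees = [(np.asarray(p, float), 1, i) for i, p in enumerate(points)]
    merges = []
    while len(trees) > 1:
        i, j = min(((a, b) for a in range(len(trees))
                    for b in range(a + 1, len(trees))),
                   key=lambda ab: np.linalg.norm(trees[ab[0]][0] - trees[ab[1]][0]))
        (m1, n1, t1), (m2, n2, t2) = trees[i], trees[j]
        merges.append((t1, t2))
        merged = ((n1 * m1 + n2 * m2) / (n1 + n2), n1 + n2, (t1, t2))
        trees = [t for k, t in enumerate(trees) if k not in (i, j)] + [merged]
    return trees[0][2], merges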
Practical Structured Learning Techniques for Natural Language Processing
Hal Daumé III
Ph.D. Thesis, 2006
[BibTeX]
@PhdThesis{daume06thesis,
author = {Hal {Daum\'e III}},
title = {Practical Structured Learning Techniques for Natural Language
Processing},
school = {University of Southern California},
year = {2006},
address = {Los Angeles, CA},
month = {August},
url = {http://hal3.name/docs/#daume06thesis}
}
Domain Adaptation for Statistical Classifiers
Hal Daumé III and Daniel Marcu
Journal of Artificial Intelligence Research (JAIR), 2006
[Abstract] [BibTeX]
The most basic assumption used in statistical learning theory is that training data and test data are drawn from the same underlying distribution. Unfortunately, in many applications, the "in-domain" test data is drawn from a distribution that is related, but not identical, to the "out-of-domain" distribution of the training data. We consider the common case in which labeled out-of-domain data is plentiful, but labeled in-domain data is scarce. We introduce a statistical formulation of this problem in terms of a simple mixture model and present an instantiation of this framework to maximum entropy classifiers and their linear chain counterparts. We present efficient inference algorithms for this special case based on the technique of conditional expectation maximization. Our experimental results show that our approach leads to improved performance on three real world tasks on four different data sets from the natural language processing domain.
@article{daume06megam,
author = {Hal {Daum\'e III} and Daniel Marcu},
title = {Domain Adaptation for Statistical Classifiers},
journal = {Journal of Artificial Intelligence Research (JAIR)},
year = {2006},
volume = {26},
pages = {101--126},
url = {http://hal3.name/docs/#daume06megam}
}
Searn in Practice
Hal Daumé III, John Langford and Daniel Marcu
Unpublished, 2006
[Abstract] [BibTeX]
We recently introduced an algorithm, Searn, for solving hard structured prediction problems. This algorithm enjoys many nice properties: efficiency, wide applicability, theoretical justification and simplicity. However, owing to the amount of information packed into the original paper, it may not be clear how simple the technique is. This report is designed to showcase how Searn can be applied to a wide variety of problems and what really goes on behind the scenes. We will make use of three example problems, ranging from simple to complex. These are: (1) sequence labeling, (2) parsing and (3) machine translation. (These were chosen to be as widely understandable, especially in the NLP community, as possible.) In the end, we will come back to discuss Searn for general problems.
@unpublished{daume06searn-practice,
author = {Hal {Daum\'e III} and John Langford and Daniel Marcu},
title = {Searn in Practice},
year = {2006},
url = {http://hal3.name/docs/#daume06searn-practice}
}
A Bayesian Model for Supervised Clustering with the Dirichlet Process Prior
Hal Daumé III and Daniel Marcu
Journal of Machine Learning Research (JMLR), 2005
[Abstract] [BibTeX]
We develop a Bayesian framework for tackling the supervised clustering problem, the generic problem encountered in problems such as reference matching, coreference resolution, identity uncertainty and record linkage. Our clustering model is based on the non-parametric Dirichlet process prior, which enables us to define distributions over the countably infinite sets that naturally arise in this problem. We add supervision to our model by positing the existence of a set of unobserved random variables (we call these "reference types") that are generic across all clusters. Inference in our framework, which requires integrating over infinitely many parameters, is solved using Markov chain Monte Carlo techniques. We present algorithms for both conjugate and non-conjugate priors. We present a simple -- but general -- parameterization of our model based on a Gaussian assumption. We evaluate this model on one artificial task and three real-world tasks, comparing it against both unsupervised and state-of-the-art supervised algorithms. Our results show that our model is able to outperform other models for this task across a variety of performance metrics.
@Article{daume05dpscm,
author = {Hal {Daum\'e III} and Daniel Marcu},
title = {A {B}ayesian Model for Supervised Clustering with the {D}irichlet
Process Prior},
journal = {Journal of Machine Learning Research (JMLR)},
year = {2005},
month = {September},
volume = {6},
pages = {1551--1577},
url = {http://hal3.name/docs/#daume05dpscm}
}
Search-Based Structured Prediction as Classification
Hal Daumé III, John Langford and Daniel Marcu
NeurIPS Workshop on Advances in Structured Learning for Text and Speech Processing (ASLTSP), 2005
[Abstract] [BibTeX]
Solutions to computationally hard problems often require that search be used. Integrating search into the learning phase has been previously proposed in an ad-hoc manner (Daume & Marcu, 2005). In this paper, we show that structured prediction can be mapped into a search setting using language from reinforcement learning, and known techniques for reinforcement learning (Langford et al., 2005) can give formal performance bounds on the structured prediction task.
@InProceedings{daume05search,
author = {Hal {Daum\'e III} and John Langford and Daniel Marcu},
title = {Search-Based Structured Prediction as Classification},
booktitle = {NeurIPS Workshop on Advances in Structured Learning for Text and
Speech Processing (ASLTSP)},
year = {2005},
address = {Whistler, Canada},
url = {http://hal3.name/docs/#daume05search}
}
Learning as Search Optimization: Approximate Large Margin Methods for Structured Prediction
Errata: There are some technical errors in this paper; see On Learning Linear Ranking Functions for Beam Search by Xu and Fern, ICML 2007, at http://web.engr.oregonstate.edu/~afern/papers/beam-icml07.pdf for more details.
Hal Daumé III and Daniel Marcu
International Conference on Machine Learning (ICML), 2005
[Abstract] [BibTeX]
Mappings to structured output spaces (strings, trees, partitions, etc.) are typically learned using extensions of classification algorithms to simple graphical structures (e.g., linear chains) in which search and parameter estimation can be performed exactly. Unfortunately, in many complex problems, it is rare that exact search or parameter estimation is tractable. Instead of learning exact models and searching via heuristic means, we embrace this difficulty and treat the structured output problem in terms of approximate search. We present a framework for learning as search optimization, and two parameter updates with convergence theorems and bounds. Empirical evidence shows that our integrated approach to learning and decoding can outperform exact models at smaller computational cost.
@InProceedings{daume05laso,
author = {Hal {Daum\'e III} and Daniel Marcu},
title = {Learning as Search Optimization: Approximate Large Margin Methods for
Structured Prediction},
booktitle = {International Conference on Machine Learning (ICML)},
year = {2005},
address = {Bonn, Germany},
errata = {There are some technical errors in this paper; see On Learning Linear
Ranking Functions for Beam Search by Xu and Fern, ICML 2007, at
http://web.engr.oregonstate.edu/~afern/papers/beam-icml07.pdf for
more details.},
url = {http://hal3.name/docs/#daume05laso}
}
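A sketch of the training-time interaction between beam search and the perceptron-style update: when the gold prefix falls out of the beam, update toward it and resume search from it. The phi (features of a partial hypothesis) and expand (successor actions) interfaces are assumed for illustration:

import numpy as np

def laso_train_example(x, gold, w, phi, expand, beam_width=5):
    # w: weight vector; phi(x, h) -> feature vector of partial hypothesis h;
    # expand(x, h) -> iterable of next actions. (The full algorithm also
    # updates at the end if gold is not ranked first; omitted here.)
    beam = [()]
    for t in range(len(gold)):
        beam = sorted((h + (a,) for h in beam for a in expand(x, h)),
                      key=lambda h: -float(w @ phi(x, h)))[:beam_width]
        gold_prefix = tuple(gold[:t + 1])
        if gold_prefix not in beam:            # search error: update, restart
            w = w + phi(x, gold_prefix) - sum(phi(x, h) for h in beam) / len(beam)
            beam = [gold_prefix]
    return w

Tying the update to search errors, rather than to a full exact decode, is exactly the "learning as search optimization" framing of the abstract.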
From Zero to Reproducing Kernel Hilbert Spaces in Twelve Pages or Less
Hal Daumé III
Unpublished, 2004
[BibTeX]
@Unpublished{daume04rkhs,
author = {Hal {Daum\'e III}},
title = {From Zero to Reproducing Kernel Hilbert Spaces in Twelve Pages or Less},
note = {Available at \url{http://www.isi.edu/~hdaume/docs/daume04rkhs.ps}},
month = {February},
year = {2004}
}
A Tree-Position Kernel for Document Compression
Hal Daumé III and Daniel Marcu
Fourth Document Understanding Conference (DUC), 2004
[Abstract] [BibTeX]
We describe our entry into the DUC 2004 automatic document summarization competition. We competed only in the single document, headline generation task. Our system is based on a novel kernel dubbed the tree position kernel, combined with two other well-known kernels. Our system performs well on white-box evaluations, but does very poorly in the overall DUC evaluation. However, the latter results are offset by the fact that baseline systems consistently outperform well-engineered systems.
@InProceedings{daume04treeposition,
author = {Hal {Daum\'e III} and Daniel Marcu},
title = {A Tree-Position Kernel for Document Compression},
booktitle = {Proceedings of the Fourth Document Understanding Conference (DUC)},
year = {2004},
address = {Boston, MA},
month = {May 6 -- 7},
url = {http://hal3.name/docs/#daume04treeposition}
}
Carefully Approximated Bayes Factors for Feature Selection in MaxEnt Models
Hal Daumé III
Unpublished, 2004
[BibTeX]
@Unpublished{daume04abffs,
author = {Hal {Daum\'e III}},
title = {Carefully Approximated {Bayes} Factors for Feature Selection in MaxEnt
Models},
note = {Available at \url{http://www.isi.edu/~hdaume/docs/daume04abffs.ps}},
month = {November},
year = {2004}
}
Supervised clustering with the Dirichlet process
Hal Daumé III and Daniel Marcu
NeurIPS Workshop on Learning With Structured Outputs (LwSO), 2004
[Abstract] [BibTeX]
The task of learning to partition data into similar sets occurs frequently in many disciplines. We construct a Bayesian model for learning to partition from labeled data. Our model is based on the nonparametric Dirichlet process prior. Experimental results show that our model is able to outperform existing solutions on real world datasets.
@InProceedings{daume04scm,
author = {Hal {Daum\'e III} and Daniel Marcu},
title = {Supervised clustering with the Dirichlet process},
booktitle = {NeurIPS Workshop on Learning With Structured Outputs (LwSO)},
year = {2004},
address = {Whistler, Canada},
url = {http://hal3.name/docs/#daume04scm}
}
Notes on CG and LM-BFGS Optimization of Logistic Regression
Hal Daumé III
Unpublished, 2004
[BibTeX]
@Unpublished{daume04cg-bfgs,
author = {Hal {Daum\'e III}},
title = {Notes on {CG} and {LM-BFGS} Optimization of Logistic Regression},
note = {Paper available at \url{http://hal3.name/docs/#daume04cg-bfgs},
implementation available at \url{http://hal3.name/megam/}},
month = {August},
year = {2004}
}
Bayesian Methods
Flexible Modeling of Latent Task Structures in Multitask Learning
Alexandre Passos, Piyush Rai, Jacques Wainer and Hal Daumé III
International Conference on Machine Learning (ICML), 2012
[Abstract] [BibTeX]
Multitask learning algorithms are typically designed assuming some fixed, a priori known latent structure shared by all the tasks. However, it is usually unclear what type of latent task structure is the most appropriate for a given multitask learning problem. Ideally, the "right" latent task structure should be learned in a data-driven manner. We present a flexible, nonparametric Bayesian model that posits a mixture of factor analyzers structure on the tasks. The nonparametric aspect makes the model expressive enough to subsume many existing models of latent task structures (e.g., mean-regularized tasks, clustered tasks, low-rank or linear/non-linear subspace assumption on tasks, etc.). Moreover, it can also learn more general task structures, addressing the shortcomings of such models. We present a variational inference algorithm for our model. Experimental results on synthetic and real-world datasets, on both regression and classification problems, demonstrate the effectiveness of the proposed method.
@InProceedings{daume12flexiblemtl,
author = {Alexandre Passos and Piyush Rai and Jacques Wainer and Hal {Daum\'e
III}},
title = {Flexible Modeling of Latent Task Structures in Multitask Learning},
booktitle = {International Conference on Machine Learning (ICML)},
year = {2012},
address = {Edinburgh, Scotland},
url = {http://hal3.name/docs/#daume12flexiblemtl}
}
Computational methods are invaluable for typology, but the models must match the questions: Commentary on Dunn et al. (2011)
Roger Levy and Hal Daumé III
Journal of Linguistic Typology, 2011
[BibTeX]
@Misc{daume11dunn,
author = {Roger Levy and Hal {Daum\'e III}},
title = {Computational methods are invaluable for typology, but the models must
match the questions: Commentary on Dunn et al. (2011)},
howpublished = {Journal of Linguistic Typology},
year = {2011},
url = {http://hal3.name/docs/#daume11dunn}
}
Beam Search based MAP Estimates for the Indian Buffet Process
Piyush Rai and Hal Daumé III
International Conference on Machine Learning (ICML), 2011
[BibTeX]
@InProceedings{daume11ibpsearch,
author = {Piyush Rai and Hal {Daum\'e III}},
title = {Beam Search based {MAP} Estimates for the Indian Buffet Process},
booktitle = {International Conference on Machine Learning (ICML)},
year = {2011},
address = {Bellevue, WA},
url = {http://hal3.name/docs/#daume11ibpsearch}
}
Multiview Clustering with Incomplete Views
Piyush Rai, Anusua Trivedi, Hal Daumé III and Scott L. DuVall
NeurIPS Workshop on Machine Learning for Social Computing, 2010
[Abstract] [BibTeX]
Multiview clustering algorithms allow leveraging information from multiple views of the data and therefore lead to improved clustering. A number of kernel-based multiview clustering algorithms work by using the kernel matrices defined on the different views of the data. However, these algorithms assume availability of features from all the views of each example, i.e., assume that the kernel matrix for each view is complete. We present an approach that allows these algorithms to be applicable even when only one (the primary) view is complete and the auxiliary views are incomplete (i.e., features from these views are available only for some of the examples). Taking kernel-CCA-based multiview clustering as an example, we apply our method to webpage clustering with multiple views of the data, where one view is the page text and the other view is the social tags assigned to the webpage. We consider the case when the tags are available only for a small subset of the webpages, which means that the tag view is incomplete. Experimental results establish the effectiveness of the proposed method.
@InProceedings{daume10mvincomplete,
author = {Piyush Rai and Anusua Trivedi and Hal {Daum\'e III} and Scott L.
DuVall},
title = {Multiview Clustering with Incomplete Views},
booktitle = {NeurIPS Workshop on Machine Learning for Social Computing},
year = {2010},
address = {Whistler, Canada},
url = {http://hal3.name/docs/#daume10mvincomplete}
}
Multitask Learning via Mixture of Linear Subspaces
Piyush Rai and Hal Daumé III
NeurIPS Workshop on Transfer Learning by Learning Rich Generative Models, 2010
[Abstract] [BibTeX]
We propose a probabilistic generative model for multitask learning that exploits the cluster structure of the task parameters, and additionally imposes a low-rank constraint on the set of task parameters within each cluster. This leads to a sharing of statistical strengths of multiple tasks at two levels: (1) via the cluster assumption, and (2) via a subspace assumption within each cluster. Our work brings in the benefits of both these aspects of task relationship, each of which has been addressed only individually in prior work. We assume a mixture of linear subspaces model on the latent task parameters that can capture both these aspects simultaneously. Furthermore, the mixture of subspaces assumption can model the fact that the task parameters could potentially live on a non-linear manifold instead of a linear subspace, which was a restriction of earlier work on multitask learning based on the linear subspace assumption.
@InProceedings{daume10mtlmls,
author = {Piyush Rai and Hal {Daum\'e III}},
title = {Multitask Learning via Mixture of Linear Subspaces},
booktitle = {NeurIPS Workshop on Transfer Learning by Learning Rich Generative
Models},
year = {2010},
address = {Whistler, Canada},
url = {http://hal3.name/docs/#daume10mtlmls}
}
A geometric view of conjugate priors
Arvind Agarwal and Hal Daumé III
Machine Learning Journal (MLJ), 2010
[Abstract] [BibTeX]
In Bayesian machine learning, conjugate priors are popular, mostly due to mathematical convenience. In this paper, we show that there are deeper reasons for choosing a conjugate prior. Specifically, we formulate the conjugate prior in the form of Bregman divergence and show that it is the inherent geometry of conjugate priors that makes them appropriate and intuitive. This geometric interpretation allows one to view the hyperparameters of conjugate priors as the effective sample points, thus providing additional intuition. We use this geometric understanding of conjugate priors to derive the hyperparameters and expression of the prior used to couple the generative and discriminative components of a hybrid model for semi-supervised learning.
@article{daume10conjugate,
author = {Arvind Agarwal and Hal {Daum\'e III}},
title = {A geometric view of conjugate priors},
year = {2010},
journal = {Machine Learning Journal (MLJ)},
volume = {81},
number = {1},
url = {http://hal3.name/docs/#daume10conjugate}
}
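For reference, the exponential-family setting the abstract works in: with likelihood and conjugate prior written as below, the hyperparameters (n_0, tau_0) behave as n_0 pseudo-observations whose sufficient statistics average to tau_0, which is the "effective sample points" reading:

p(x \mid \theta) = \exp\big( \langle \phi(x), \theta \rangle - A(\theta) \big),
\qquad
p(\theta \mid n_0, \tau_0) \propto \exp\big( n_0 \langle \tau_0, \theta \rangle - n_0 A(\theta) \big)

The posterior after n observations updates (n_0, \tau_0) to (n_0 + n, (n_0 \tau_0 + \sum_i \phi(x_i)) / (n_0 + n)), so prior pseudo-data and real data enter on exactly the same footing.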
Infinite Predictor Subspace Models for Multitask Learning
Piyush Rai and Hal Daumé III
Conference on Artificial Intelligence and Statistics (AI-Stats), 2010
[Abstract] [BibTeX]
Given several related learning tasks, we propose a nonparametric Bayesian model that captures task relatedness by assuming that the task parameters (i.e., predictors) share a latent subspace. More specifically, the intrinsic dimensionality of the task subspace is not assumed to be known a priori. We use an infinite latent feature model to automatically infer this number (depending on and limited by only the number of tasks). Furthermore, our approach is applicable when the underlying task parameter subspace is inherently sparse, drawing parallels with l1 regularization and LASSO-style models. We also propose an augmented model which can make use of (labeled, and additionally unlabeled if available) inputs to assist learning this subspace, leading to further improvements in performance. Experimental results demonstrate the efficacy of both the proposed approaches, especially when the number of examples per task is small. Finally, we discuss an extension of the proposed framework where a nonparametric mixture of linear subspaces can be used to learn a nonlinear manifold over the task parameters, and also deal with the issue of negative transfer from unrelated tasks.
@InProceedings{daume10subspace,
author = {Piyush Rai and Hal {Daum\'e III}},
title = {Infinite Predictor Subspace Models for Multitask Learning},
booktitle = {Proceedings of the Conference on Artificial Intelligence and
Statistics (AI-Stats)},
year = {2010},
address = {Sardinia, Italy},
url = {http://hal3.name/docs/#daume10subspace}
}
Fast Search for Infinite Latent Feature Models
Piyush Rai and Hal Daumé III
NeurIPS Workshop on Non-parametric Bayes (NP-Bayes), 2009
[Abstract] [BibTeX]
We propose several search-based alternatives for inference in Indian Buffet Process (IBP) based models. We consider the case when we only want a maximum a posteriori (MAP) estimate of the latent feature assignment matrix. If true posterior samples are required, these MAP estimates can also serve as intelligent initializers for MCMC-based algorithms. Another advantage of the proposed methods is that they can process one observation at a time, making it possible to do inference in an online setting. Experimental evidence suggests that these algorithms can give us computational benefits of an order of magnitude over Gibbs sampling (or its sequential variant, the particle filter) traditionally used in IBP-based models.
@InProceedings{daume09ibpsearch,
author = {Piyush Rai and Hal {Daum\'e III}},
title = {Fast Search for Infinite Latent Feature Models},
booktitle = {Proceedings of NeurIPS Workshop on Non-parametric Bayes (NP-Bayes)},
year = {2009},
address = {Vancouver, Canada},
url = {http://hal3.name/docs/#daume09ibpsearch}
}
Bayesian Multitask Learning with Latent Hierarchies
Hal Daumé III
Conference on Uncertainty in Artificial Intelligence (UAI), 2009
[Abstract] [BibTeX]
We learn multiple hypotheses for related tasks under a latent hierarchical relationship between tasks. We exploit the intuition that for domain adaptation, we wish to share classifier structure, but for multitask learning, we wish to share covariance structure. Our hierarchical model is seen to subsume several previously proposed multitask learning models and performs well on three distinct real-world data sets.
@InProceedings{daume09hiermtl,
author = {Hal {Daum\'e III}},
title = {Bayesian Multitask Learning with Latent Hierarchies},
booktitle = {Conference on Uncertainty in Artificial Intelligence (UAI)},
year = {2009},
address = {Montreal, Canada},
url = {http://hal3.name/docs/#daume09hiermtl}
}
Non-Parametric Bayesian Areal Linguistics
Hal Daumé III
North American Chapter of the Association for Computational Linguistics (NAACL), 2009
[Abstract] [BibTeX]
We describe a statistical model over linguistic areas and phylogeny. Our model recovers known areas and identifies a plausible hierarchy of areal features. The use of areas improves genetic reconstruction of languages both qualitatively and quantitatively according to a variety of metrics. We model linguistic areas by a Pitman-Yor process and linguistic phylogeny by Kingman's coalescent.
@InProceedings{daume09areal,
author = {Hal {Daum\'e III}},
title = {Non-Parametric {B}ayesian Areal Linguistics},
booktitle = {North American Chapter of the Association for Computational
Linguistics (NAACL)},
year = {2009},
address = {Boulder, CO},
url = {http://hal3.name/docs/#daume09areal}
}
Multi-Label Prediction via Sparse Infinite CCA
Piyush Rai and Hal Daumé III
Conference on Neural Information Processing Systems (NeurIPS), 2009
[Abstract] [BibTeX]
Canonical Correlation Analysis (CCA) is a useful technique for modeling dependencies between two (or more) sets of variables. Building upon the recently suggested probabilistic interpretation of CCA, we propose a nonparametric, fully Bayesian framework that can automatically select the number of correlation components, and effectively capture the sparsity underlying the projections. In addition, given (partially) labeled data, our algorithm can also be used as a (semi)supervised dimensionality reduction technique, and can be applied to learn useful predictive features in the context of learning a set of related tasks. Experimental results demonstrate the efficacy of the proposed approach for both CCA as a stand-alone problem, and when applied to multi-label prediction.
@InProceedings{daume09cca,
author = {Piyush Rai and Hal {Daum\'e III}},
title = {Multi-Label Prediction via Sparse Infinite {CCA}},
booktitle = {Proceedings of the Conference on Neural Information Processing
Systems (NeurIPS)},
year = {2009},
address = {Vancouver, Canada},
url = {http://hal3.name/docs/#daume09cca}
}
A Bayesian Statistics Approach to Multiscale Coarse Graining
Pu Liu, Qiang Shi, Hal Daumé III and Gregory Voth
Journal of Chemical Physics (J.ChPhys), 2009
[Abstract] [BibTeX]
Coarse-grained (CG) modeling provides a promising way to investigate many important physical and biological phenomena over large spatial and temporal scales. The multiscale coarse-graining (MS-CG) method has been proven to be a thermodynamically consistent way to systematically derive a CG model from atomistic force information, as shown in a variety of systems, ranging from simple liquids to proteins embedded in lipid bilayers. In the present work, Bayes' theorem, an advanced statistical tool widely used in signal processing and pattern recognition, is adopted to further improve the MS-CG force field obtained from the CG modeling. This approach can regularize the linear equation resulting from the underlying force-matching methodology, therefore substantially improving the quality of the MS-CG force field, especially for the regions with limited sampling. Moreover, this Bayesian approach can naturally provide an error estimation for each force field parameter, from which one can know to what extent the results can be trusted. The robustness and accuracy of the Bayesian MS-CG algorithm is demonstrated for three different systems, including simple liquid methanol, polyalanine peptide solvated in explicit water, and a much more complicated peptide assembly with 32 NNQQNY hexapeptides.
@Article{daume09graining,
author = {Pu Liu and Qiang Shi and Hal {Daum\'e III} and Gregory Voth},
title = {A Bayesian Statistics Approach to Multiscale Coarse Graining},
journal = {Journal of Chemical Physics (J.ChPhys)},
year = {2009},
volume = {129},
number = {21},
pages = {214114},
month = {December},
}
Markov Random Topic Fields
Hal Daumé III
Association for Computational Linguistics (ACL), 2009
[Abstract] [BibTeX]
Most approaches to topic modeling assume an independence between documents that is frequently violated. We present a topic model that makes use of one or more user-specified graphs describing relationships between documents. These graphs are encoded in the form of a Markov random field over topics and serve to encourage related documents to have similar topic structures. Experiments show upwards of a 10% improvement in modeling performance.
@InProceedings{daume09mrtf,
author = {Hal {Daum\'e III}},
title = {Markov Random Topic Fields},
booktitle = {Association for Computational Linguistics (ACL)},
year = {2009},
address = {Singapore},
url = {http://hal3.name/docs/#daume09mrtf}
}
Multitask Learning using Nonparametrically Learned Predictor Subspaces
Piyush Rai and Hal Daumé III
NeurIPS Workshop on Learning from Multiple Sources, 2009
[Abstract] [BibTeX]
Given several related learning tasks, we propose a nonparametric Bayesian learning model that captures task relatedness by assuming that the task parameters (i.e., weight vectors) share a latent subspace. More specifically, the intrinsic dimensionality of this subspace is not assumed to be known a priori. We use an infinite latent feature model - the Indian Buffet Process - to automatically infer this number. We also propose extensions of this model where the subspace learning can incorporate (labeled, and additionally unlabeled if available) examples, or the task parameters share a mixture of subspaces, instead of sharing a single subspace. The latter property can allow learning nonlinear manifold structure underlying the task parameters, and can also help in preventing negative transfer from outlier tasks.
@InProceedings{daume09subspacemtl,
author = {Piyush Rai and Hal {Daum\'e III}},
title = {Multitask Learning using Nonparametrically Learned Predictor Subspaces},
booktitle = {NeurIPS Workshop on Learning from Multiple Sources},
year = {2009},
address = {Whistler, Canada},
url = {http://hal3.name/docs/#daume09subspacemtl}
}
Factor Regression Combining Heterogeneous Sources of Information
Amrish Kapoor, Piyush Rai and Hal Daumé III
NeurIPS Workshop on Learning From Multiple Sources with Applications to Robotics (LMS), 2009
[Abstract] [BibTeX]
We present a non-parametric Bayesian factor regression model that combines two heterogeneous sources of information: gene expression arrays and text from their corresponding PubMed abstracts. Our model approximates a pLSI-style model and results in improved regression accuracy. We apply this model to gene-expression data analysis, but it is extendable to other problems exhibiting a similar heterogeneous multiplicity in sources of information, like financial analysis, weather prediction and others.
@InProceedings{daume09hetero,
author = {Amrish Kapoor and Piyush Rai and Hal {Daum\'e III}},
title = {Factor Regression Combining Heterogeneous Sources of Information},
booktitle = {Proceedings of NeurIPS Workshop on Learning From Multiple Sources
with Applications to Robotics (LMS)},
year = {2009},
address = {Vancouver, Canada},
url = {http://hal3.name/docs/#daume09hetero}
}
Unsupervised Part of Speech Tagging Without a Lexicon
Adam R. Teichert and Hal Daumé III
NeurIPS Workshop on Grammar Induction, Representation of Language and Language Learning (GIRLLL), 2009
[BibTeX]
@InProceedings{daume09typpos,
author = {Adam R. Teichert and Hal {Daum\'e III}},
title = {Unsupervised Part of Speech Tagging Without a Lexicon},
booktitle = {NeurIPS Workshop on Grammar Induction, Representation of Language
and Language Learning (GIRLLL)},
year = {2009},
address = {Vancouver, Canada},
url = {http://hal3.name/docs/#daume09typpos}
}
The Infinite Hierarchical Factor Regression Model
Piyush Rai and Hal Daumé III
Conference on Neural Information Processing Systems (NeurIPS), 2008
[BibTeX]
@InProceedings{daume08ihfrm,
author = {Piyush Rai and Hal {Daum\'e III}},
title = {The Infinite Hierarchical Factor Regression Model},
booktitle = {Proceedings of the Conference on Neural Information Processing
Systems (NeurIPS)},
year = {2008},
address = {Vancouver, Canada},
url = {http://hal3.name/docs/#daume08ihfrm}
}
Fast search for Dirichlet process mixture models
Hal Daumé III
Eleventh International Conference on Artificial Intelligence and Statistics (AIStats), 2007
[Abstract] [BibTeX]
Dirichlet process (DP) mixture models provide a flexible Bayesian framework for density estimation. Unfortunately, their flexibility comes at a cost: inference in DP mixture models is computationally expensive, even when conjugate distributions are used. In the common case when one seeks only a maximum a posteriori assignment of data points to clusters, we show that search algorithms provide a practical alternative to expensive MCMC and variational techniques. When a true posterior sample is desired, the solution found by search can serve as a good initializer for MCMC. Experimental results show that using these techniques it is possible to apply DP mixture models to very large data sets.
@InProceedings{daume07astar-dp,
author = {Hal {Daum\'e III}},
title = {Fast search for Dirichlet process mixture models},
booktitle = {Proceedings of the Eleventh International Conference on Artificial
Intelligence and Statistics (AIStats)},
year = {2007},
address = {San Juan, Puerto Rico},
url = {http://hal3.name/docs/#daume07astar-dp}
}
Bayesian Agglomerative Clustering with Coalescents
Yee Whye Teh, Hal Daumé III and Daniel Roy
Conference on Neural Information Processing Systems (NeurIPS), 2007
[Abstract] [BibTeX]
We introduce a new Bayesian model for hierarchical clustering based on a prior over trees called Kingman's coalescent. We develop novel greedy and sequential Monte Carlo inferences which operate in a bottom-up agglomerative fashion. We show experimentally the superiority of our algorithms over others, and demonstrate our approach in document clustering and phylolinguistics.
@InProceedings{daume07coalescent,
author = {Yee Whye Teh and Hal {Daum\'e III} and Daniel Roy},
title = {Bayesian Agglomerative Clustering with Coalescents},
booktitle = {Proceedings of the Conference on Neural Information Processing
Systems (NeurIPS)},
year = {2007},
address = {Vancouver, Canada},
url = {http://hal3.name/docs/#daume07coalescent}
}
A Bayesian Model for Discovering Typological Implications
Hal Daumé III and Lyle Campbell
Conference of the Association for Computational Linguistics (ACL), 2007
[Abstract] [BibTeX]
A standard form of analysis for linguistic typology is the universal implication. These implications state facts about the range of extant languages, such as "if objects come after verbs, then adjectives come after nouns." Such implications are typically discovered by painstaking hand analysis over a small sample of languages. We propose a computational model for assisting at this process. Our model is able to discover both well-known implications as well as some novel implications that deserve further study. Moreover, through a careful application of hierarchical analysis, we are able to cope with the well-known sampling problem: languages are not independent.
@InProceedings{daume07implication,
author = {Hal {Daum\'e III} and Lyle Campbell},
title = {A {B}ayesian Model for Discovering Typological Implications},
booktitle = {Conference of the Association for Computational Linguistics (ACL)},
year = {2007},
address = {Prague, Czech Republic},
url = {http://hal3.name/docs/#daume07implication}
}
Bayesian Query-Focused Summarization
Hal Daumé III and Daniel Marcu
Conference of the Association for Computational Linguistics (ACL), 2006
[Abstract] [BibTeX]
We present BayeSum (for "Bayesian summarization"), a model for sentence extraction in query-focused summarization. BayeSum leverages the common case in which multiple documents are relevant to a single query. Using these documents as reinforcement for query terms, BayeSum is not afflicted by the paucity of information in short queries. We show that approximate inference in BayeSum is possible on large data sets and results in a state-of-the-art summarization system. Furthermore, we show how BayeSum can be understood as a justified query expansion technique in the language modeling for IR framework.
@InProceedings{daume06bqfs,
author = {Hal {Daum\'e III} and Daniel Marcu},
title = {Bayesian Query-Focused Summarization},
booktitle = {Proceedings of the Conference of the Association for Computational
Linguistics (ACL)},
year = {2006},
address = {Sydney, Australia},
url = {http://hal3.name/docs/#daume06bqfs}
}
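In the language-modeling-for-IR reading mentioned at the end of the abstract, the pool of query-relevant documents reinforces the short query when scoring sentences. A minimal Jelinek-Mercer sketch of that reading (not the paper's Bayesian inference; names and data layout are hypothetical):
import math
from collections import Counter

def score_sentences(query, docs, lam=0.5):
    # docs: list of documents; each document is a list of sentences;
    # each sentence is a list of tokens. query: list of tokens.
    pool = Counter(w for d in docs for s in d for w in s)
    pool_total = sum(pool.values()) or 1
    scored = []
    for d in docs:
        for sent in d:
            counts, n = Counter(sent), len(sent) or 1
            # query likelihood under the sentence model, smoothed by the
            # relevant-document pool (add-one keeps the log finite)
            score = sum(math.log(lam * counts[w] / n +
                                 (1 - lam) * (pool[w] + 1) /
                                 (pool_total + len(pool)))
                        for w in query)
            scored.append((score, sent))
    return sorted(scored, key=lambda p: p[0], reverse=True)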
A Bayesian Model for Supervised Clustering with the Dirichlet Process Prior
Hal Daumé III and Daniel Marcu
Journal of Machine Learning Research (JMLR), 2005
[Abstract] [BibTeX]
We develop a Bayesian framework for tackling the supervised clustering problem, the generic problem encountered in tasks such as reference matching, coreference resolution, identity uncertainty and record linkage. Our clustering model is based on the non-parametric Dirichlet process prior, which enables us to define distributions over the countably infinite sets that naturally arise in this problem. We add supervision to our model by positing the existence of a set of unobserved random variables (we call these "reference types") that are generic across all clusters. Inference in our framework, which requires integrating over infinitely many parameters, is performed using Markov chain Monte Carlo techniques. We present algorithms for both conjugate and non-conjugate priors. We present a simple -- but general -- parameterization of our model based on a Gaussian assumption. We evaluate this model on one artificial task and three real-world tasks, comparing it against both unsupervised and state-of-the-art supervised algorithms. Our results show that our model is able to outperform other models for this task across a variety of performance metrics.
@Article{daume05dpscm,
author = {Hal {Daum\'e III} and Daniel Marcu},
title = {A {B}ayesian Model for Supervised Clustering with the {D}irichlet
Process Prior},
journal = {Journal of Machine Learning Research (JMLR)},
year = {2005},
month = {September},
volume = {6},
pages = {1551--1577},
url = {http://hal3.name/docs/#daume05dpscm}
}
Supervised clustering with the Dirichlet process
Hal Daumé III and Daniel Marcu
NeurIPS Workshop on Learning With Structured Outputs (LwSO), 2004
[Abstract] [BibTeX]
The task of learning to partition data into similar sets occurs frequently in many disciplines. We construct a Bayesian model for learning to partition from labeled data. Our model is based on the nonparametric Dirichlet process prior. Experimental results show that our model is able to outperform existing solutions on real world datasets.
@InProceedings{daume04scm,
author = {Hal {Daum\'e III} and Daniel Marcu},
title = {Supervised clustering with the Dirichlet process},
booktitle = {NeurIPS Workshop on Learning With Structured Outputs (LwSO)},
year = {2004},
address = {Whistler, Canada},
url = {http://hal3.name/docs/#daume04scm}
}
Structured Prediction
Unsupervised Search-based Structured Prediction
Hal Daumé III
International Conference on Machine Learning (ICML), 2009
[Abstract] [BibTeX]
We describe an adaptation and application of a search-based structured prediction algorithm "Searn" to unsupervised learning problems. We show that it is possible to reduce unsupervised learning to supervised learning and demonstrate a high-quality unsupervised shift-reduce parsing model. We additionally show a close connection between unsupervised Searn and expectation maximization. Finally, we demonstrate the efficacy of a semi-supervised extension. The key idea that enables this is an application of the predict-self idea for unsupervised learning.
@InProceedings{daume09unsearn,
author = {Hal {Daum\'e III}},
title = {Unsupervised Search-based Structured Prediction},
booktitle = {International Conference on Machine Learning (ICML)},
year = {2009},
address = {Montreal, Canada},
url = {http://hal3.name/docs/#daume09unsearn}
}
Search-based Structured Prediction
Hal Daumé III, John Langford and Daniel Marcu
Machine Learning Journal (MLJ), 2009
[Abstract] [BibTeX]
We present Searn, an algorithm for integrating search and learning to solve complex structured prediction problems such as those that occur in natural language, speech, computational biology, and vision. Searn is a meta-algorithm that transforms these complex problems into simple classification problems to which any binary classifier may be applied. Unlike current algorithms for structured learning that require decomposition of both the loss function and the feature functions over the predicted structure, Searn is able to learn prediction functions for any loss function and any class of features. Moreover, Searn comes with a strong, natural theoretical guarantee: good performance on the derived classification problems implies good performance on the structured prediction problem.
@article{daume09searn,
author = {Hal {Daum\'e III} and John Langford and Daniel Marcu},
title = {Search-based Structured Prediction},
year = {2009},
journal = {Machine Learning Journal (MLJ)},
url = {http://hal3.name/docs/#daume09searn}
}
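The core of Searn is a short loop: roll in with the current policy to visit states, attach a cost to every action by rolling out to completion, train a cost-sensitive classifier on those examples, and interpolate it with the old policy. The skeleton below shows that loop with the task-specific pieces (states, actions, roll-out loss, classifier training) left as hypothetical stubs; it is a sketch of the idea, not the paper's pseudocode.
import random

def searn_train(examples, initial_policy, train_classifier, beta=0.3, iters=5):
    # examples: iterable of (x, y); train_classifier maps a list of
    # (state, action->cost) pairs to a new policy h(state) -> action.
    policy = initial_policy
    for _ in range(iters):
        cs_examples = []
        for x, y in examples:
            state = x.start_state()
            while not state.is_final():
                # roll-out: complete the structure with the current policy
                # after taking action a; record the resulting loss vs. y
                costs = {a: state.completion_loss(a, policy, y)
                         for a in state.actions()}
                cs_examples.append((state, costs))
                state = state.step(policy(state))      # roll-in
        h = train_classifier(cs_examples)
        old = policy
        # stochastic interpolation: follow the new classifier w.p. beta
        policy = lambda s, old=old, h=h: h(s) if random.random() < beta else old(s)
    return policy
The slow interpolation away from the initial policy is what underwrites the guarantee quoted above: low cost-sensitive classification error on the induced examples translates into low structured loss.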
Cross-Task Knowledge-Constrained Self Training
Hal Daumé III
Empirical Methods in Natural Language Processing (EMNLP), 2008
[Abstract] [BibTeX]
We present an algorithmic framework for learning multiple related tasks. Our framework exploits a form of prior knowledge that relates the output spaces of these tasks. We present PAC learning results that analyze the conditions under which such learning is possible. We present results on learning a shallow parser and named-entity recognition system that exploits our framework, showing consistent improvements over baseline methods.
@InProceedings{daume08hints,
author = {Hal {Daum\'e III}},
title = {Cross-Task Knowledge-Constrained Self Training},
booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
year = {2008},
address = {Honolulu, Hawaii},
url = {http://hal3.name/docs/#daume08hints}
}
Structure Compilation: Trading Structure for Features
Percy Liang, Hal Daumé III and Dan Klein
International Conference on Machine Learning (ICML), 2008
[Abstract] [BibTeX]
Structured models often achieve excellent performance but can be slow at test time. We investigate structure compilation, where we replace structure with features, which are often computationally simpler but unfortunately statistically more complex. We analyze this tradeoff theoretically and empirically on three natural language processing tasks. We also introduce a simple method to transfer predictive power from structure to features via unlabeled data, while incurring a minimal statistical penalty.
@InProceedings{daume08flat,
author = {Percy Liang and Hal {Daum\'e III} and Dan Klein},
title = {Structure Compilation: Trading Structure for Features},
booktitle = {International Conference on Machine Learning (ICML)},
year = {2008},
address = {Helsinki, Finland},
url = {http://hal3.name/docs/#daume08flat}
}
Practical Structured Learning Techniques for Natural Language Processing
Hal Daumé III
Ph.D. Thesis, 2006
[BibTeX]
@PhdThesis{daume06thesis,
author = {Hal {Daum\'e III}},
title = {Practical Structured Learning Techniques for Natural Language
Processing},
school = {University of Southern California},
year = {2006},
address = {Los Angeles, CA},
month = {August},
url = {http://hal3.name/docs/#daume06thesis}
}
Searn in Practice
Hal Daumé III, John Langford and Daniel Marcu
Unpublished, 2006
[Abstract] [BibTeX]
We recently introduced an algorithm, Searn, for solving hard structured prediction problems. This algorithm enjoys many nice properties: efficiency, wide applicability, theoretical justification and simplicity. However, in the effort to fit a lot of information into the original paper, it may not have been clear how simple the technique is. This report is designed to showcase how Searn can be applied to a wide variety of problems and what really goes on behind the scenes. We will make use of three example problems, ranging from simple to complex. These are: (1) sequence labeling, (2) parsing and (3) machine translation. (These were chosen to be as widely understandable, especially in the NLP community, as possible.) In the end, we will come back to discuss Searn for general problems.
@unpublished{daume06searn-practice,
author = {Hal {Daum\'e III} and John Langford and Daniel Marcu},
title = {Searn in Practice},
year = {2006},
url = {http://hal3.name/docs/#daume06searn-practice}
}
Search-Based Structured Prediction as Classification
Hal Daumé III, John Langford and Daniel Marcu
NeurIPS Workshop on Advances in Structured Learning for Text and Speech Processing (ASLTSP), 2005
[Abstract] [BibTeX]
Solutions to computationally hard problems often require that search be used. Integrating search into the learning phase has been previously proposed in an ad-hoc manner (Daume & Marcu, 2005). In this paper, we show that structured prediction can be mapped into a search setting using language from reinforcement learning, and known techniques for reinforcement learning (Langford et al., 2005) can give formal performance bounds on the structured prediction task.
@InProceedings{daume05search,
author = {Hal {Daum\'e III} and John Langford and Daniel Marcu},
title = {Search-Based Structured Prediction as Classification},
booktitle = {NeurIPS Workshop on Advances in Structured Learning for Text and
Speech Processing (ASLTSP)},
year = {2005},
address = {Whistler, Canada},
url = {http://hal3.name/docs/#daume05search}
}
Learning as Search Optimization: Approximate Large Margin Methods for Structured Prediction
Errata: There are some technical errors in this paper; see On Learning Linear Ranking Functions for Beam Search by Xu and Fern, ICML 2007, at http://web.engr.oregonstate.edu/~afern/papers/beam-icml07.pdf for more details.
Hal Daumé III and Daniel Marcu
International Conference on Machine Learning (ICML), 2005
[Abstract] [BibTeX]
Mappings to structured output spaces (strings, trees, partitions, etc.) are typically learned using extensions of classification algorithms to simple graphical structures (e.g., linear chains) in which search and parameter estimation can be performed exactly. Unfortunately, in many complex problems, it is rare that exact search or parameter estimation is tractable. Instead of learning exact models and searching via heuristic means, we embrace this difficulty and treat the structured output problem in terms of approximate search. We present a framework for learning as search optimization, and two parameter updates with convergence theorems and bounds. Empirical evidence shows that our integrated approach to learning and decoding can outperform exact models at smaller computational cost.
@InProceedings{daume05laso,
author = {Hal {Daum\'e III} and Daniel Marcu},
title = {Learning as Search Optimization: Approximate Large Margin Methods for
Structured Prediction},
booktitle = {International Conference on Machine Learning (ICML)},
year = {2005},
address = {Bonn, Germany},
errata = {There are some technical errors in this paper; see On Learning Linear
Ranking Functions for Beam Search by Xu and Fern, ICML 2007, at
http://web.engr.oregonstate.edu/~afern/papers/beam-icml07.pdf for
more details.},
url = {http://hal3.name/docs/#daume05laso}
}
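A sketch of the LaSO control flow with the perceptron-style update: search and learning are interleaved, and a weight update fires exactly when every correct ("y-good") partial hypothesis falls out of the beam, after which search resumes from the correct nodes. Hypothesis expansion, goodness checking, and features are hypothetical stubs, and given the errata above the update's formal guarantees should be treated with care.
import numpy as np

def laso_train_step(w, example, beam_width=5):
    beam = [example.start()]
    while beam and not all(h.is_final() for h in beam):
        candidates = [c for h in beam for c in h.expand()]
        good = [c for c in candidates if c.is_good(example)]
        if not good:            # sketch assumes good expansions exist
            break
        candidates.sort(key=lambda c: float(w @ c.features()), reverse=True)
        beam = candidates[:beam_width]
        if not any(c.is_good(example) for c in beam):
            # search error: update toward good hypotheses, away from the beam
            w = w + np.mean([g.features() for g in good], axis=0) \
                  - np.mean([b.features() for b in beam], axis=0)
            beam = good[:beam_width]   # resume search from the good set
    return w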
NP Bracketing by Maximum Entropy Tagging and SVM Reranking
Hal Daumé III and Daniel Marcu
Empirical Methods in Natural Language Processing (EMNLP), 2004
[Abstract] [BibTeX]
We perform Noun Phrase Bracketing by using a local, maximum entropy-based tagging model, which produces bracketing hypotheses. These hypotheses are subsequently fed into a reranking framework based on support vector machines. We solve the problem of hierarchical structure in our tagging model by modeling underspecified tags, which are fully determined only at decoding time. The tagging model performs comparably to competing approaches and the subsequent reranking increases our system's performance from an f-score of 81.7 to 86.1, surpassing the best reported results to date of 83.8.
@InProceedings{daume04bracketing,
author = {Hal {Daum\'e III} and Daniel Marcu},
title = {NP Bracketing by Maximum Entropy Tagging and {SVM} Reranking},
booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
year = {2004},
address = {Barcelona, Spain},
url = {http://hal3.name/docs/#daume04bracketing}
}
Domain Adaptation/Multitask Learning
Learning Task Grouping and Overlap in Multi-task Learning
Abhishek Kumar and Hal Daumé III
International Conference on Machine Learning (ICML), 2012
[Abstract] [BibTeX]
In the paradigm of multi-task learning, multiple related prediction tasks are learned jointly, sharing information across the tasks. We propose a framework for multi-task learning that enables one to selectively share the information across the tasks. We assume that each task parameter vector is a linear combination of a finite number of underlying basis tasks. The coefficients of the linear combination are sparse in nature, and the overlap in the sparsity patterns of two tasks controls the amount of sharing between them. Our model is based on the assumption that task parameters within a group lie in a low dimensional subspace but allows the tasks in different groups to overlap with each other in one or more bases. Experimental results on four datasets show that our approach outperforms competing methods.
@InProceedings{daume12gomtl,
author = {Abhishek Kumar and Hal {Daum\'e III}},
title = {Learning Task Grouping and Overlap in Multi-task Learning},
booktitle = {International Conference on Machine Learning (ICML)},
year = {2012},
url = {http://hal3.name/docs/#daume12gomtl}
}
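A compact sketch of the decomposition described above: task parameter vectors are sparse linear combinations of a small set of latent basis tasks, so two tasks share information exactly when their sparsity patterns overlap in some basis. In the paper the factorization is learned jointly with the predictors from data; this hypothetical sketch instead factors a given parameter matrix by alternating least squares with a proximal (soft-threshold) step for the l1 penalty.
import numpy as np

def gomtl_factor(W, k, lam=0.1, iters=200, step=0.01):
    # W: d x T matrix of task parameter vectors; L: d x k basis tasks;
    # S: k x T sparse combination coefficients, so that W ~= L @ S.
    d, T = W.shape
    rng = np.random.default_rng(0)
    L = rng.standard_normal((d, k))
    S = rng.standard_normal((k, T))
    for _ in range(iters):
        L = W @ S.T @ np.linalg.pinv(S @ S.T)        # least squares in L
        G = L.T @ (L @ S - W)                        # gradient in S
        S = S - step * G
        S = np.sign(S) * np.maximum(np.abs(S) - step * lam, 0.0)  # shrink
    return L, S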
Online Learning of Multiple Tasks and Their Relationships
Avishek Saha, Piyush Rai, Hal Daumé III and Suresh Venkatasubramanian
Conference on Artificial Intelligence and Statistics (AI-Stats), 2011
[Abstract] [BibTeX]
We propose an Online MultiTask Learning (OMTL) framework which simultaneously learns the task weight vectors as well as the task relatedness adaptively from the data. Our work is in contrast with prior work on online multitask learning, which assumes fixed task relatedness a priori. Furthermore, whereas prior work in such settings assumes only positively correlated tasks, our framework can capture negative correlations as well. Our proposed framework learns the task relationship matrix by framing the objective function as a Bregman divergence minimization problem for positive definite matrices. Subsequently, we exploit this adaptively learned task-relationship matrix to select the most informative samples in an online multitask active learning setting. Experimental results on a number of real-world datasets and comparisons with numerous baselines establish the efficacy of our proposed approach.
@InProceedings{daume11olmt,
author = {Avishek Saha and Piyush Rai and Hal {Daum\'e III} and Suresh
Venkatasubramanian},
title = {Online Learning of Multiple Tasks and Their Relationships},
booktitle = {Conference on Artificial Intelligence and Statistics (AI-Stats)},
year = {2011},
address = {Ft. Lauderdale, FL},
url = {http://hal3.name/docs/#daume11olmt}
}
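The flavor of the update, sketched as a multitask perceptron: a mistake on one task nudges every task's weights in proportion to an interaction matrix. The paper's contribution is to learn that matrix online via Bregman-divergence minimization; in this hypothetical sketch it is held fixed at a mild uniform relatedness purely for illustration.
import numpy as np

def omtl_perceptron(stream, T, d, eta=0.1):
    # stream: iterable of (task_id, x, y) with y in {-1, +1}
    W = np.zeros((T, d))
    A = np.eye(T) + 0.1 * np.ones((T, T))    # fixed relatedness (assumption)
    for t, x, y in stream:
        if y * (W[t] @ x) <= 0:               # mistake on task t
            W += eta * y * np.outer(A[:, t], x)   # co-update related tasks
    return W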
Domain Adaptation meets Active Learning
Piyush Rai, Avishek Saha, Hal Daumé III and Suresh Venkatasubramanian
HLT/NAACL Workshop on Active Learning for NLP (ALNLP), 2010
[Abstract] [BibTeX]
In this work, we show how active learning in some (target) domain can leverage information from a different but related (source) domain. We present an algorithm that harnesses the source domain data to learn the best possible initializer hypothesis for doing active learning in the target domain, resulting in improved label complexity. We also present a variant of this algorithm which additionally uses the domain divergence information to selectively query the most informative points in the target domain, leading to further reductions in label complexity. Experimental results on a variety of datasets establish the efficacy of the proposed methods.
@InProceedings{daume10daal,
author = {Piyush Rai and Avishek Saha and Hal {Daum\'e III} and Suresh
Venkatasubramanian},
title = {Domain Adaptation meets Active Learning},
booktitle = {Proceedings of HLT/NAACL Workshop on Active Learning for NLP
(ALNLP)},
year = {2010},
address = {Los Angeles, CA},
url = {http://hal3.name/docs/#daume10daal}
}
A Co-regularization Based Semi-supervised Domain Adaptation
Abhishek Kumar, Avishek Saha and Hal Daumé III
Conference on Neural Information Processing Systems (NeurIPS), 2010
[Abstract] [BibTeX]
This paper presents a co-regularization based approach to semi-supervised domain adaptation. Our proposed approach (EA++) builds on the notion of augmented space (introduced in EASYADAPT (EA) [1]) and harnesses unlabeled data in the target domain to further enable the transfer of information from source to target. This semi-supervised approach to domain adaptation is extremely simple to implement and can be applied as a pre-processing step to any supervised learner. Our theoretical analysis (in terms of Rademacher complexity) of EA and EA++ shows that the hypothesis class of EA++ has lower complexity (compared to EA) and hence results in tighter generalization bounds. Experimental results on sentiment analysis tasks reinforce our theoretical findings and demonstrate the efficacy of the proposed method when compared to EA as well as a few other baseline approaches.
@InProceedings{daume10coreg,
author = {Abhishek Kumar and Avishek Saha and Hal {Daum\'e III}},
title = {A Co-regularization Based Semi-supervised Domain Adaptation},
booktitle = {Proceedings of the Conference on Neural Information Processing
Systems (NeurIPS)},
year = {2010},
address = {Vancouver, Canada},
url = {http://hal3.name/docs/#daume10coreg}
}
Learning Multiple Tasks using Manifold Regularization
Arvind Agarwal, Samuel Gerber and Hal Daumé III
Conference on Neural Information Processing Systems (NeurIPS), 2010
[Abstract] [BibTeX]
We present a novel method for multitask learning (MTL) based on manifold regularization. We assume that all task parameters lie on a manifold, which generalizes the assumption made in the existing literature, i.e., that task parameters share a common linear subspace. The proposed method uses the projection distance from the manifold to regularize the task parameters. The manifold structure and the task parameters are learned using an alternating optimization framework. When the manifold structure is fixed, our method decomposes into learning independent tasks, making it appealing for learning new tasks. An approximation of the manifold regularization scheme is presented that preserves the convexity of the single task learning problem, and makes the proposed MTL framework efficient and easy to implement. We show the efficacy of our method on several datasets.
@InProceedings{daume10manifold,
author = {Arvind Agarwal and Samuel Gerber and Hal {Daum\'e III}},
title = {Learning Multiple Tasks using Manifold Regularization},
booktitle = {Proceedings of the Conference on Neural Information Processing
Systems (NeurIPS)},
year = {2010},
address = {Vancouver, Canada},
url = {http://hal3.name/docs/#daume10manifold}
}
Infinite Predictor Subspace Models for Multitask Learning
Piyush Rai and Hal Daumé III
Conference on Artificial Intelligence and Statistics (AI-Stats), 2010
[Abstract] [BibTeX]
Given several related learning tasks, we propose a nonparametric Bayesian model that captures task relatedness by assuming that the task parameters (i.e., predictors) share a latent subspace. More specifically, the intrinsic dimensionality of the task subspace is not assumed to be known a priori. We use an infinite latent feature model to automatically infer this number (depending on and limited by only the number of tasks). Furthermore, our approach is applicable when the underlying task parameter subspace is inherently sparse, drawing parallels with l1 regularization and LASSO-style models. We also propose an augmented model which can make use of (labeled, and additionally unlabeled if available) inputs to assist learning this subspace, leading to further improvements in the performance. Experimental results demonstrate the efficacy of both the proposed approaches, especially when the number of examples per task is small. Finally, we discuss an extension of the proposed framework where a nonparametric mixture of linear subspaces can be used to learn a nonlinear manifold over the task parameters, and also deal with the issue of negative transfer from unrelated tasks.
@InProceedings{daume10subspace,
author = {Piyush Rai and Hal {Daum\'e III}},
title = {Infinite Predictor Subspace Models for Multitask Learning},
booktitle = {Proceedings of the Conference on Artificial Intelligence and
Statistics (AI-Stats)},
year = {2010},
address = {Sardinia, Italy},
url = {http://hal3.name/docs/#daume10subspace}
}
Bayesian Multitask Learning with Latent Hierarchies
Hal Daumé III
Conference on Uncertainty in Artificial Intelligence (UAI), 2009
[Abstract] [BibTeX]
We learn multiple hypotheses for related tasks under a latent hierarchical relationship between tasks. We exploit the intuition that for domain adaptation, we wish to share classifier structure, but for multitask learning, we wish to share covariance structure. Our hierarchical model is seen to subsume several previously proposed multitask learning models and performs well on three distinct real-world data sets.
@InProceedings{daume09hiermtl,
author = {Hal {Daum\'e III}},
title = {Bayesian Multitask Learning with Latent Hierarchies},
booktitle = {Conference on Uncertainty in Artificial Intelligence (UAI)},
year = {2009},
address = {Montreal, Canada},
url = {http://hal3.name/docs/#daume09hiermtl}
}
Cross-Task Knowledge-Constrained Self Training
Hal Daumé III
Empirical Methods in Natural Language Processing (EMNLP), 2008
[Abstract] [BibTeX]
We present an algorithmic framework for learning multiple related tasks. Our framework exploits a form of prior knowledge that relates the output spaces of these tasks. We present PAC learning results that analyze the conditions under which such learning is possible. We present results on learning a shallow parser and named-entity recognition system that exploits our framework, showing consistent improvements over baseline methods.
@InProceedings{daume08hints,
author = {Hal {Daum\'e III}},
title = {Cross-Task Knowledge-Constrained Self Training},
booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
year = {2008},
address = {Honolulu, Hawaii},
url = {http://hal3.name/docs/#daume08hints}
}
Frustratingly Easy Domain Adaptation
Hal Daumé III
Conference of the Association for Computational Linguistics (ACL), 2007
🏆 Test of Time Award Nomination (2017)
[Abstract] [BibTeX]
We describe an approach to domain adaptation that is appropriate exactly in the case when one has enough "target" data to do slightly better than just using only "source" data. Our approach is incredibly simple, easy to implement as a preprocessing step (10 lines of Perl!) and outperforms state-of-the-art approaches on a range of datasets. The technique comes with several simple theoretical guarantees. Moreover, it is trivially extended to a multi-domain adaptation problem, where one has data from a variety of different domains.
@InProceedings{daume07easyadapt,
author = {Hal {Daum\'e III}},
title = {Frustratingly Easy Domain Adaptation},
booktitle = {Conference of the Association for Computational Linguistics (ACL)},
year = {2007},
address = {Prague, Czech Republic},
url = {http://hal3.name/docs/#daume07easyadapt}
}
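The augmentation itself really is a few lines in any language (the paper uses Perl); here is a small Python sketch for the two-domain case, with hypothetical names. Each input x becomes (x_shared, x_source, x_target), zeroing the copy belonging to the other domain, after which any off-the-shelf supervised learner is trained on the augmented features.
import numpy as np

def easy_adapt(X, domains):
    # X: n x d feature matrix; domains: length-n list of "source"/"target"
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    out = np.zeros((n, 3 * d))
    out[:, :d] = X                           # shared copy
    for i, dom in enumerate(domains):
        if dom == "source":
            out[i, d:2*d] = X[i]             # source-only copy
        else:
            out[i, 2*d:] = X[i]              # target-only copy
    return out
The learner can then put weight on the shared block for behavior common to both domains and on the domain-specific blocks for the rest; the multi-domain extension simply adds one block per domain.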
Domain Adaptation for Statistical Classifiers
Hal Daumé III and Daniel Marcu
Journal of Artificial Intelligence Research (JAIR), 2006
[Abstract] [BibTeX]
The most basic assumption used in statistical learning theory is that training data and test data are drawn from the same underlying distribution. Unfortunately, in many applications, the "in-domain" test data is drawn from a distribution that is related, but not identical, to the "out-of-domain" distribution of the training data. We consider the common case in which labeled out-of-domain data is plentiful, but labeled in-domain data is scarce. We introduce a statistical formulation of this problem in terms of a simple mixture model and present an instantiation of this framework to maximum entropy classifiers and their linear chain counterparts. We present efficient inference algorithms for this special case based on the technique of conditional expectation maximization. Our experimental results show that our approach leads to improved performance on three real world tasks on four different data sets from the natural language processing domain.
@article{daume06megam,
author = {Hal {Daum\'e III} and Daniel Marcu},
title = {Domain Adaptation for Statistical Classifiers},
journal = {Journal of Artificial Intelligence Research (JAIR)},
year = {2006},
volume = {26},
pages = {101--126},
url = {http://hal3.name/docs/#daume06megam}
}
Document Summarization
Bayesian Query-Focused Summarization
Hal Daumé III and Daniel Marcu
Conference of the Association for Computational Linguistics (ACL), 2006
[Abstract] [BibTeX]
We present BayeSum (for "Bayesian summarization"), a model for sentence extraction in query-focused summarization. BayeSum leverages the common case in which multiple documents are relevant to a single query. Using these documents as reinforcement for query terms, BayeSum is not afflicted by the paucity of information in short queries. We show that approximate inference in BayeSum is possible on large data sets and results in a state-of-the-art summarization system. Furthermore, we show how BayeSum can be understood as a justified query expansion technique in the language modeling for IR framework.
@InProceedings{daume06bqfs,
author = {Hal {Daum\'e III} and Daniel Marcu},
title = {Bayesian Query-Focused Summarization},
booktitle = {Proceedings of the Conference of the Association for Computational
Linguistics (ACL)},
year = {2006},
address = {Sydney, Australia},
url = {http://hal3.name/docs/#daume06bqfs}
}
Induction of Word and Phrase Alignments for Automatic Document Summarization
Hal Daumé III and Daniel Marcu
Computational Linguistics (CL), 2005
[Abstract] [BibTeX]
Current research in automatic single document summarization is dominated by two effective, yet naïve approaches: summarization by sentence extraction, and headline generation via bag-of-words models. While successful in some tasks, neither of these models is able to adequately capture the large set of linguistic devices utilized by humans when they produce summaries. One possible explanation for the widespread use of these models is that good techniques have been developed to extract appropriate training data for them from existing document/abstract and document/headline corpora. We believe that future progress in automatic summarization will be driven both by the development of more sophisticated, linguistically informed models, as well as a more effective leveraging of document/abstract corpora. In order to open the doors to simultaneously achieving both of these goals, we have developed techniques for automatically producing word-to-word and phrase-to-phrase alignments between documents and their human-written abstracts. These alignments make explicit the correspondences that exist in such document/abstract pairs, and create a potentially rich data source from which complex summarization algorithms may learn. This paper describes experiments we have carried out to analyze the ability of humans to perform such alignments, and based on these analyses, we describe experiments for creating them automatically. Our model for the alignment task is based on an extension of the standard hidden Markov model, and learns to create alignments in a completely unsupervised fashion. We describe our model in detail and present experimental results that show that our model is able to learn to reliably identify word- and phrase-level alignments in a corpus of document/abstract pairs.
@Article{daume05alignments,
author = {Hal {Daum\'e III} and Daniel Marcu},
title = {Induction of Word and Phrase Alignments for Automatic Document
Summarization},
journal = {Computational Linguistics (CL)},
year = {2005},
month = {December},
volume = {31},
number = {4},
pages = {505--530},
url = {http://hal3.name/docs/#daume05alignments}
}
Bayesian Summarization at DUC and a Suggestion for Extrinsic Evaluation
Hal Daumé III and Daniel Marcu
Document Understanding Conference (DUC), 2005
[Abstract] [BibTeX]
We describe our entry into the Document Understanding Conference competition for evaluating query-focused multi-document summarization systems. Our system is based on a Bayesian Query-Focused Summarization model, similar to the system we entered into the MSE competition. This paper begins by describing the (few) differences between our DUC system and our MSE system and describes our placement in the competition. The remainder of this paper argues in favor of performing extrinsic evaluation of summarization systems, and suggests a method for doing so.
@InProceedings{daume05duc,
author = {Hal {Daum\'e III} and Daniel Marcu},
title = {Bayesian Summarization at DUC and a Suggestion for Extrinsic
Evaluation},
booktitle = {Proceedings of the Document Understanding Conference (DUC)},
year = {2005},
address = {Vancouver, B.C., Canada},
month = {October 9--10},
url = {http://hal3.name/docs/#daume05duc}
}
Bayesian Multi-Document Summarization at MSE
Hal Daumé III and Daniel Marcu
Workshop on Multilingual Summarization Evaluation (MSE), 2005
[Abstract] [BibTeX]
We describe our entry into the Multilingual Summarization Evaluation (MSE) competition for evaluating generic multi-document summarization systems, where documents are drawn both from English data and English translations of Arabic data. Our system is based on a Bayesian Query-Focused Summarization model, adapted to the generic, multi-document setting and tuned against the ROUGE evaluation metric. In the human pyramid-based evaluation, our system scored an average of 0.530, approximately 8% better than the next best system, which scored 0.489. In the automatic evaluation, our system scored 0.157 (behind four other sites) with the skip-bigram evaluation, and 0.131 (behind two other sites) with the standard bigram evaluation.
@InProceedings{daume05mse,
author = {Hal {Daum\'e III} and Daniel Marcu},
title = {Bayesian Multi-Document Summarization at MSE},
booktitle = {Proceedings of the Workshop on Multilingual Summarization Evaluation
(MSE)},
year = {2005},
address = {Ann Arbor, MI},
month = {June 29},
url = {http://hal3.name/docs/#daume05mse}
}
Linguistics
Computational methods are invaluable for typology, but the models must match the questions: Commentary on Dunn et al. (2011)
Roger Levy and Hal Daumé III
Unpublished, 2011
[BibTeX]
@Misc{daume11dunn,
author = {Roger Levy and Hal {Daum\'e III}},
title = {Computational methods are invaluable for typology, but the models must
match the questions: Commentary on Dunn et al. (2011)},
howpublished = {Journal of Linguistic Typology},
year = {2011},
url = {http://hal3.name/docs/#daume11dunn}
}
Non-Parametric Bayesian Areal Linguistics
Hal Daumé III
North American Chapter of the Association for Computational Linguistics (NAACL), 2009
[Abstract] [BibTeX]
We describe a statistical model over linguistic areas and phylogeny. Our model recovers known areas and identifies a plausible hierarchy of areal features. The use of areas improves genetic reconstruction of languages both qualitatively and quantitatively according to a variety of metrics. We model linguistic areas by a Pitman-Yor process and linguistic phylogeny by Kingman's coalescent.
@InProceedings{daume09areal,
author = {Hal {Daum\'e III}},
title = {Non-Parametric {B}ayesian Areal Linguistics},
booktitle = {North American Chapter of the Association for Computational
Linguistics (NAACL)},
year = {2009},
address = {Boulder, CO},
url = {http://hal3.name/docs/#daume09areal}
}
A Bayesian Model for Discovering Typological Implications
Hal Daumé III and Lyle Campbell
Conference of the Association for Computational Linguistics (ACL), 2007
[Abstract] [BibTeX]
A standard form of analysis for linguistic typology is the universal implication. These implications state facts about the range of extant languages, such as "if objects come after verbs, then adjectives come after nouns." Such implications are typically discovered by painstaking hand analysis over a small sample of languages. We propose a computational model for assisting in this process. Our model is able to discover both well-known implications and novel implications that deserve further study. Moreover, through a careful application of hierarchical analysis, we are able to cope with the well-known sampling problem: languages are not independent.
@InProceedings{daume07implication,
author = {Hal {Daum\'e III} and Lyle Campbell},
title = {A {B}ayesian Model for Discovering Typological Implications},
booktitle = {Conference of the Association for Computational Linguistics (ACL)},
year = {2007},
address = {Prague, Czech Republic},
url = {http://hal3.name/docs/#daume07implication}
}
Asymmetry of Coordination
Hal Daumé III
Unpublished, 2001
[Abstract] [BibTeX]
The standard syntactic analysis of coordination gives equal value to both conjoined elements, and treats both elements equivalently. Nonetheless, in many languages (even English), coordination is much more than simply taking two constituents of the same type (or possibly not) and putting a conjunction between them, yielding a ternary branching node. In this paper I begin with an analysis of coordination in general, present cross-linguistic arguments in its favor, and finally discuss how this structure can account for otherwise unexplained raising data.
@Unpublished{daume01coordination,
author = {Hal {Daum\'e III}},
title = {Asymmetry of Coordination},
note = {Available at
\url{http://www.isi.edu/~hdaume/docs/daume01coordination.ps}},
month = {December},
year = {2001},
url = {http://hal3.name/docs/#daume01coordination}
}
Programming Languages
Yet Another Haskell Tutorial
Hal Daumé III
Unpublished, 2002
[BibTeX]
@Unpublished{daume02yaht,
author = {Hal {Daum\'e III}},
title = {Yet Another Haskell Tutorial},
note = {Available at \url{http://hal3.name/docs/#daume02yaht/}},
year = {2002}
}
credits: design and font inspired by Seth Able's LoRD, some images converted to ANSI using ManyTools, original drawing of me by anonymous.
last updated on nineteen november, two thousand twenty four.