Overview
Project Team
Publications
Software
Project funded by the National Science Foundation (IIS-2403436)
PI: Jordan Boyd-Graber,
Overview
Tens of millions of Americans interact with AI tools to find information, answer questions, or get help solving problems. One key drawback of these systems is a lack of personalization: because modern AI systems do not know whom they are talking to, they can only give generic answers to user questions. But the answer to the question “why is the sky blue?” should be different if the person asking is a college student or a young child. This project aims to enable an AI model to provide more appropriate responses to users depending on their unique backgrounds, experiences, and needs. It will first gather a diverse dataset to characterize what kinds of responses different people prefer. The project will then use these data to develop AI systems that can tailor their answers to individual users, as well as evaluate how well those systems personalize their responses. To achieve this personalization, the AI systems will learn to explicitly represent the kind of person they are talking to, based on that person's background or previous interactions, and then use this representation to generate an appropriate response. This project will result in AIs that can provide personalized, specific responses based on the person asking the question, as well as resources that will help others personalize AIs. These resources will include datasets of personalized questions and answers; interfaces and visualizations to understand why an AI provides certain responses over others; interviews and discussions with community members to understand their needs; and code and models that will allow others to build, train, and deploy personalized AI systems.
While large language models (LLMs) trained on massive datasets have shown impressive performance on a variety of tasks, they still exhibit biases and struggle to be equally useful for everyone. Although initially pre-trained on a language modeling objective, most LLMs are further fine-tuned to align their outputs with human preferences. However, existing techniques assume a “one size fits all” approach, ignoring diversity in user needs. This project will first construct probes to detect cases where models fail to adapt to the diverse needs of different users. Then, this project will develop Personalized Feedback for Diverse Populations (PFDP) to identify when models should be sensitive to the unique needs, knowledge, and background of users by examining the training trajectory of models and comparing models' answers to human preferences. PFDP will enable the development of models that can detect examples that are difficult for computers but not for humans, explain why such disparities in difficulty exist, and represent users’ needs and preferences within the model. To correct those shortcomings in the data, we focus on data curation: we propose techniques that, with a human in the loop, automatically create adversarial prompt and response pairs, including new examples that ask questions about under-represented groups or require targeted responses. Finally, with these new data, we develop techniques that allow modern architectures to make the most of these difficult (but few) examples. These techniques will allow fine-tuning LLMs with a small curated subset of data, producing models that are robust to variations in prompts and generate acceptable answers for a diverse population of users.
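A central step in PFDP is noticing where a model's own preferences diverge from the preferences a human expressed. The minimal sketch below, assuming a HuggingFace causal LM and a toy preference pair (the model name, example data, and threshold are placeholders, not the project's code), flags pairs where the model assigns higher likelihood to the response the human rejected; such disagreements are natural candidates for the human-in-the-loop curation described above.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder; substitute the LLM being audited
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).eval()

def sequence_logprob(prompt: str, response: str) -> float:
    """Sum of token log-probabilities the model assigns to `response` given `prompt`
    (tokenization boundary effects are ignored in this sketch)."""
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + " " + response, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)  # prediction for each next token
    targets = full_ids[0, 1:]
    token_scores = log_probs[torch.arange(targets.numel()), targets]
    return token_scores[prompt_len - 1:].sum().item()       # keep only response tokens

# Hypothetical preference pair: (prompt, response the user chose, response the user rejected).
pairs = [
    ("Why is the sky blue? (asked by a young child)",
     "Sunlight bounces around in the air, and the blue light bounces the most!",
     "Rayleigh scattering attenuates shorter wavelengths proportionally to the inverse fourth power of wavelength."),
]

for prompt, chosen, rejected in pairs:
    margin = sequence_logprob(prompt, chosen) - sequence_logprob(prompt, rejected)
    if margin < 0:  # the model "prefers" the answer the human rejected
        print(f"Disagreement (margin={margin:.2f}): {prompt!r} -> route to human curation")
```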
Project Team
- Jordan Boyd-Graber, Assistant Professor, Computer Science (Maryland)
- Alvin Grissom II, Associate Professor, Haverford College
- Robin Jia, Assistant Professor, University of Southern California
- John P. Lalor, Assistant Professor, University of Notre Dame
- Swabha Swayamdipta, Assistant Professor, University of Southern California
- Nishant Balepur, PhD Student, University of Maryland
- Maharshi Gor, PhD Student, University of Maryland
- John Kanu, PhD Student, University of Maryland
- Yoo Yeon Sung, PhD Student, University of Maryland
- Ryan Cook, PhD Student, University of Notre Dame
Publications
@inproceedings{Cook:Lalor:Abbasi-2025,
  Title = {No Simple Answer to Data Complexity: An Examination of Instance-Level Complexity Metrics for Classification Tasks},
  Author = {Ryan A Cook and John P Lalor and Ahmed Abbasi},
  Booktitle = {Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics},
  Year = {2025},
  Location = {Albuquerque},
  Url = {http://cs.umd.edu/~jbg//docs/2025_naacl_answercomplexity.pdf},
}
Accessible Abstract: Instance-level complexity scores can be used for tasks such as filtering out noisy observations and subsampling informative examples. However, there exists a diverse taxonomy of complexity metrics that can be used for a classification task, making metric selection itself difficult. We examine the relationship between these metrics and find that simply storing training loss provides similar complexity rankings as other more computationally intensive techniques. Metric similarity allows us to subsample data with higher aggregate complexity along several metrics using a single a priori available meta-feature.
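A minimal sketch of the paper's cheapest signal, assuming a generic PyTorch classifier and a data loader that also yields example ids (the model, loader, and the one-quarter retention rate are illustrative choices, not the paper's setup): record each example's loss during training, then rank instances by how hard they stayed.

```python
import torch
from torch import nn
from collections import defaultdict

def train_and_rank(model, loader, epochs=3, lr=1e-3):
    """`loader` yields (example_ids, inputs, labels); returns example ids sorted hardest-first."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss(reduction="none")  # keep one loss value per example
    history = defaultdict(list)
    for _ in range(epochs):
        for ids, x, y in loader:
            losses = loss_fn(model(x), y)
            for i, l in zip(ids.tolist(), losses.detach().tolist()):
                history[i].append(l)                 # the stored training-loss signal
            losses.mean().backward()
            opt.step()
            opt.zero_grad()
    # Higher mean training loss ~ higher instance-level complexity.
    return sorted(history, key=lambda i: -sum(history[i]) / len(history[i]))

# e.g. keep the hardest quarter of the data as a complexity-weighted subsample:
# hard_ids = train_and_rank(model, loader)[: num_examples // 4]
```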
@inproceedings{Balepur:Gu:Ravichander:Feng:Boyd-Graber:Rudinger-2025,
  Title = {Reverse Question Answering: Can an LLM Write a Question so Hard (or Bad) that it Can't Answer?},
  Author = {Nishant Balepur and Feng Gu and Abhilasha Ravichander and Shi Feng and Jordan Boyd-Graber and Rachel Rudinger},
  Booktitle = {Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics},
  Year = {2025},
  Location = {Albuquerque},
  Url = {http://cs.umd.edu/~jbg//docs/2025_naacl_reverseqa.pdf},
}
Accessible Abstract: Language models like ChatGPT are pretty good at answering questions (e.g. "What is 12 * 12?"), but we show they can surprisingly struggle when asked to do the reverse task: generating questions for answers (e.g. "Give me a question with the answer 144"). We study when these errors happen, what might be causing them, and how they can be addressed.
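A sketch of the round-trip probe the abstract describes: have the model write a question for a target answer, then check whether the same model can answer its own question. The `llm` helper is a hypothetical stand-in for any text-in, text-out model call, and exact-match consistency is a deliberately crude check.

```python
def llm(prompt: str) -> str:
    """Placeholder for a call to whatever language model is being probed."""
    raise NotImplementedError("plug in a model call here")

def reverse_qa_probe(answer: str) -> dict:
    """Generate a question for `answer`, then test whether the model recovers `answer`."""
    question = llm(f"Write a trivia question whose answer is exactly: {answer}")
    round_trip = llm(f"Answer with a single value only.\nQuestion: {question}")
    return {
        "answer": answer,
        "generated_question": question,
        "round_trip_answer": round_trip,
        "consistent": round_trip.strip().lower() == answer.strip().lower(),
    }

# reverse_qa_probe("144") surfaces the failure mode studied in the paper: a model
# that writes a question it then cannot answer correctly.
```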
@inproceedings{Balepur:Rudinger:Boyd-Graber-2025,
  Title = {Which of These Best Describes Multiple Choice Evaluation with LLMs? A) Forced B) Flawed C) Fixable D) All of the Above},
  Author = {Nishant Balepur and Rachel Rudinger and Jordan Lee Boyd-Graber},
  Booktitle = {Association for Computational Linguistics},
  Location = {Vienna, Austria},
  Year = {2025},
  Url = {http://cs.umd.edu/~jbg//docs/2025_acl_mcqa_bad.pdf},
}
Accessible Abstract: Most people dislike taking multiple-choice tests, so why are they the default way we evaluate NLP systems? This position paper argues that, despite its simplicity and popularity, multiple-choice evaluation is flawed, both in its format and the datasets it relies on. Drawing from educational testing theory, we propose practical fixes for these issues, helping us build evaluations that better test knowledge and reflect how humans use NLP systems.
@inproceedings{Balepur:Padmakumar:Yang:Feng:Rudinger:Boyd-Graber-2025,
  Title = {Whose Boat Does it Float? Improving Personalization in Preference Tuning via Inferred User Personas},
  Author = {Nishant Balepur and Vishakh Padmakumar and Fumeng Yang and Shi Feng and Rachel Rudinger and Jordan Lee Boyd-Graber},
  Booktitle = {Association for Computational Linguistics},
  Location = {Vienna, Austria},
  Year = {2025},
  Url = {http://cs.umd.edu/~jbg//docs/2025_acl_boat.pdf},
}
Accessible Abstract: Language models are optimized to learn which responses you prefer, but they don't learn why you preferred a particular response. This limits their ability to tailor responses to personalized requests (e.g., "What should I eat for dinner? I'm vegetarian"), so we introduce a simple fix: have models infer personas that explain why users could prefer responses. We show that training on these inferred personas leads to responses that are significantly more personalized for user needs.
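A sketch of the general recipe (not the paper's released pipeline): infer a one-sentence persona that would explain the observed preference, then prepend it to the prompt so that standard preference tuning conditions on the why, not just the which. The `llm` helper and field names are placeholders.

```python
def llm(prompt: str) -> str:
    """Placeholder for a call to the model used to infer personas."""
    raise NotImplementedError("plug in a model call here")

def add_inferred_persona(example: dict) -> dict:
    """`example` has keys 'prompt', 'chosen', 'rejected' from a preference dataset."""
    persona = llm(
        "A user asked: {prompt}\n"
        "They preferred response A over response B.\n"
        "A: {chosen}\nB: {rejected}\n"
        "In one sentence, describe the kind of user who would prefer A.".format(**example)
    )
    return {
        "prompt": f"User persona: {persona}\n\n{example['prompt']}",  # persona-conditioned prompt
        "chosen": example["chosen"],
        "rejected": example["rejected"],
    }
```

The returned tuples drop straight into any standard preference-tuning objective (e.g., DPO), so the model is trained on an explicit hypothesis about the user rather than an unexplained preference label.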
@inproceedings{Balepur:Shu:Hoyle:Robey:Feng:Goldfarb-Tarrant:Boyd-Graber-2024,
  Title = {A SMART Mnemonic Sounds like "Glue Tonic": Mixing LLMs with Student Feedback to Make Mnemonic Learning Stick},
  Author = {Nishant Balepur and Matthew Shu and Alexander Hoyle and Alison Robey and Shi Feng and Seraphina Goldfarb-Tarrant and Jordan Boyd-Graber},
  Booktitle = {Empirical Methods in Natural Language Processing},
  Year = {2024},
  Location = {Miami},
  Url = {http://cs.umd.edu/~jbg//docs/2024_emnlp_mnemonic.pdf},
}
Accessible Abstract: Learning vocabulary (e.g., benevolent) can be tedious, but using mnemonics (e.g., benevolent sounds like "benefits," and a kind boss gives benefits) makes it more engaging and effective. This paper introduces SMART, a large language model trained to produce mnemonics based on feedback from flashcard learners. Students struggle to predict which mnemonics will help them most. Still, by training SMART on both student preferences and learning outcomes, we can generate mnemonics as effectively as GPT-4, but at a much lower cost.
@inproceedings{Gor:Daume-III:Boyd-Graber-2024,
  Title = {Do great minds think alike? Investigating Human-AI Complementarity in Question Answering with CAIMIRA},
  Author = {Maharshi Gor and Hal {Daum\'{e} III} and Tianyi Zhou and Jordan Boyd-Graber},
  Booktitle = {Empirical Methods in Natural Language Processing},
  Year = {2024},
  Location = {Miami},
  Url = {http://cs.umd.edu/~jbg//docs/2024_emnlp_caimira.pdf},
}
Accessible Abstract: CAIMIRA discovers the skills that humans and AIs use to answer questions. By scraping websites where trivia nerds answer really difficult questions and posing those questions to AI models like GPT-4 and LLaMA-3-70B, we find that while humans excel in knowledge-based abductive reasoning, AI outperforms them on fact-based historical recall. This research suggests future challenges should focus on more complex reasoning and nuanced language tasks to better align AI development with human cognitive strengths.
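For readers who want the flavor of the underlying model, here is a hedged sketch of the item-response-theory idea CAIMIRA builds on: each agent (human or AI) gets a latent skill vector, each question gets skill-relevance weights and a difficulty, and correctness follows a logistic link. The dimensions and training details below are illustrative, not the paper's exact specification.

```python
import torch
from torch import nn

class LatentSkillModel(nn.Module):
    def __init__(self, n_agents: int, n_questions: int, n_skills: int = 5):
        super().__init__()
        self.skill = nn.Embedding(n_agents, n_skills)         # per-agent skill vector
        self.relevance = nn.Embedding(n_questions, n_skills)  # which skills a question needs
        self.difficulty = nn.Embedding(n_questions, 1)

    def forward(self, agent_ids, question_ids):
        match = (self.skill(agent_ids) * self.relevance(question_ids)).sum(-1)
        return match - self.difficulty(question_ids).squeeze(-1)  # logit of answering correctly

# Fit with binary cross-entropy on (agent, question, correct) records; inspecting the
# learned relevance vectors is what surfaces skill clusters like abductive reasoning
# versus fact-based recall.
model = LatentSkillModel(n_agents=100, n_questions=500)
loss_fn = nn.BCEWithLogitsLoss()
# loss = loss_fn(model(agent_ids, question_ids), correct.float())
```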
@inproceedings{Kabir:Sung:Bandyopadhyay:Zou:Chandra:Boyd-Graber-2024,
  Title = {You Make me Feel like a Natural Question: Training QA Systems on Transformed Trivia Questions},
  Author = {Tasnim Kabir and Yoo Yeon Sung and Saptarashmi Bandyopadhyay and Hao Zou and Abhranil Chandra and Jordan Lee Boyd-Graber},
  Booktitle = {Empirical Methods in Natural Language Processing},
  Location = {Miami},
  Year = {2024},
  Url = {http://cs.umd.edu/~jbg//docs/2024_emnlp_natural.pdf},
}
Accessible Abstract: Many of the questions used to train AIs to answer questions come from the queries users type into search engines (like Google's Natural Questions). Is there a cheaper---perhaps even better---way? We propose a "naturalization" technique to turn high-quality, rigorously edited trivia questions into examples that resemble Natural Questions. Training on our naturalized questions and testing on Natural Questions comes close to the results of training on Natural Questions itself, and we can improve results on MMLU (a standard modern evaluation set) by using our data.
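To make the transformation concrete, here is a hypothetical few-shot prompt for the naturalization step; the example rewrite and the `llm` helper are illustrative, not the dataset's actual construction prompt.

```python
def llm(prompt: str) -> str:
    """Placeholder for a call to the model doing the rewriting."""
    raise NotImplementedError("plug in a model call here")

FEW_SHOT = """Rewrite each trivia question as a short, search-style natural question.
Trivia: This "Sun King" built the palace of Versailles and reigned for 72 years. Name him.
Natural: who was the sun king of france
Trivia: {trivia}
Natural:"""

def naturalize(trivia_question: str) -> str:
    """Turn a rigorously edited trivia question into a Natural Questions-style query."""
    return llm(FEW_SHOT.format(trivia=trivia_question)).strip()
```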
@article{Wu:Guan:Li:Huang:Liu:Wang:Xian:Shrivastava:Huang:Boyd-Graber:Zhou:Manocha-2024,
  Title = {AUTOHALLUSION: Automatic Generation of Hallucination Benchmarks for Vision-Language Models},
  Author = {Xiyang Wu and Tianrui Guan and Dianqi Li and Shuaiyi Huang and Xiaoyu Liu and Xijun Wang and Ruiqi Xian and Abhinav Shrivastava and Furong Huang and Jordan Boyd-Graber and Tianyi Zhou and Dinesh Manocha},
  Journal = {Findings of the Empirical Methods in Natural Language Processing},
  Year = {2024},
  Location = {Miami},
  Url = {https://arxiv.org/abs/2406.10900},
}

@article{Li:Mondal:Nghiem:Liang:Boyd-Graber-2024,
  Title = {PEDANTS (Precise Evaluations of Diverse Answer Nominee Text for Skinflints): Use Evaluation Metrics Wisely---Efficient Evaluation Analysis and Benchmarking for Open-Domain Question Answering},
  Author = {Zongxia Li and Ishani Mondal and Huy Nghiem and Yijun Liang and Jordan Boyd-Graber},
  Journal = {Findings of the Empirical Methods in Natural Language Processing},
  Location = {Miami},
  Year = {2024},
  Url = {https://arxiv.org/abs/2402.11161},
}

@article{Staff-2024,
  Author = {Maryland Today Staff},
  Year = {2024},
  Title = {At New AI Institute’s Celebration, a Question of ‘Who’s at the Table’},
  Journal = {Maryland Today},
  Url = {https://today.umd.edu/at-new-ai-institutes-celebration-a-question-of-whos-at-the-table},
}
This work is supported by the National Science Foundation. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the researchers and do not necessarily reflect the views of the National Science Foundation.