FAQ for Prospective Advisees

I am lucky enough to receive fairly frequent e-mail from folks who express an interest in working with me. If you are such a person, thank you for your interest! However, before contacting me, please read all the information below. And then, if you decide to write, please tell me that you have read my FAQ; otherwise my first response, if I send one, will be to send you here.

Dear Sir... or Dear Professor...
Will you have any vacancies for Ph.D. students next year?
What kind of student are you looking for?
Can I come do a short internship or summer internship with you?
I'm a UMD student. Would you consider me as a research assistant?
Can I work with you on a Ph.D. in computational linguistics or natural language processing?
An important note for Computer Science Ph.D. applicants
Are my qualifications appropriate? What can I do to improve them?
How can I start getting up to speed on NLP via self-study?
What are my chances of being admitted?
Which department should I apply to?
Can I do a postdoc with you?
Can I come to UMD as a visiting scholar?

I hope this FAQ is helpful. For prospective Ph.D. students, I'd also like to recommend this nice article on choosing a Ph.D. program in Computer Science, which contains a lot of information applicable to other disciplines, as well. For a much more general and comprehensive discussion of graduate school and life in academia, I recommend Cynthia Verba, Scholarly Pursuits: A Guide to Professional Development During the Graduate Years, which is old but still quite useful.

Dear Sir... or Dear Professor...
I recognize that some people come from a culture where these modes of address are the natural way to approach a professor. However -- and I'm sorry to be strict about this -- I will immediately delete any e-mail inquiries that are not addressed to me personally (that is, explicitly using my name), since those often indicate that I am part of a bulk e-mail.
Will you have any vacancies for Ph.D. students next year?
Please do not send me e-mail to ask this question or any of the related questions listed below. If you're considering applying for grad school at UMD, that's wonderful, and I appreciate your interest in working with me. However, there are too many factors involved for me to know the answer to this question at the time you would be writing, and please trust me that e-mailing me just to tell me that you're applying will make absolutely no difference.
That said, if you apply you are more than welcome to mention me in your application as a potential advisor and to talk about how your research vision fits with my own interests. Though, CS applicants, please make sure to read this important note.
Related questions and answers:
What kind of student are you looking for?
I'm interested in prospective Linguistics or CS students who have strong computational skills and a deep interest in human language, and who care about the problems they work on.
In general, looking at my recent publications will help you get a sense of what I do and how I approach things. At the present time, I am most interested in students who are interested in one of the following problem areas. If you decide to apply, don't just tell me you're interested in one or more of these areas; tell me what interests you about them, how you are thinking of approaching the topic, and of course tell me about any relevant research experience.
- Computational psycholinguistics and neurolinguistics. Although language processing has long been a subject I'm interested in, I've also been looking in recent years at connections, and the lack of connection, between computational linguistics and modern computational neuroscience. Take a look at the questions that I asked in my 2019 seminar on topics in the computational cognitive neuroscience of language, at my 2022 seminar on computational cognitive neuroscience and AI, and particularly at the discussion of my current work on these topics. If what you see there relates to questions you're interested in, tell me what those are and how you're thinking of approaching them.
- Computational approaches to mental health. I am working on the development of computational methods in connection with mental health conditions, particularly suicide risk, schizophrenia, and depression. I am especially interested in better prediction for early detection or monitoring, effective use of both data and domain knowledge, integration of machine and human capabilities, and improving our scientific understanding of the conditions themselves. I am not interested simply in throwing off-the-shelf machine learning methods at mental health data, so if this is an area that interests you, tell me more about your research vision.
- Computational political science. I am particularly interested in how language influences and/or reflects our perspectives and decision making. Most of my work on that topic has tended to involve political actors (e.g. presidential candidates, Congress, members of the Supreme Court), with some attention to public attitudes and responses. But I'm increasingly interested in better understanding the way public perceptions are influenced, or even constructed, by language, as well as the language-oriented and psychological factors that make people susceptible to being manipulated by misleading or false information. If you have a vision of research along these lines, let me know what it is.
If you are from a low-income, first-generation, or traditionally underrepresented group in academia, or in computer science and/or linguistics in particular, I invite (indeed, encourage) you to let me know, if that information would not otherwise be apparent to me.
Can I come do a short internship or summer internship with you?
Currently the answer is no, so please don't write to ask. I will change the answer here if that changes.
I'm a UMD student. Would you consider me as a research assistant?
- UMD undergraduate:
  I'm happy to consider the possibility. I've had good experiences mentoring undergrad researchers, usually people with a strong computational background. I've sent undergrads on to excellent industry positions (e.g. at BBN) and grad schools (e.g. JHU, MIT, UMichigan), and I'm generally happy to talk. Send me your resume and let me know how your capabilities and interests connect to my current research priorities (see above).
- UMD graduate student in CS, Linguistics, or the iSchool:
  I'm happy to consider the possibility. Please send me e-mail with information about your experience so far (including who you've been working with, and on what), and how your capabilities and interests connect to my current research priorities (see above).
- UMD graduate student in another department (e.g. Psychology):
  I'm happy to talk about collaboration with grad students in other departments, as long as their advisors are engaged or at least on board. That's different from advising: I generally do not take on research advisees whose main focus is outside of computational linguistics. If you want to do work that is aligned with my current research interests, and you have already taken relevant coursework and are seriously interested in changing fields, please send me e-mail with information about your background, the kind of research you're interested in, and why you're interested in making the change.
Can I work with you on a Ph.D. in computational linguistics or natural language processing?
The only way to get an answer to this question is to apply to a Ph.D. program here. And before reading further, please see this question and how I answered it.
In general, students who work with me have a strong background in computer science, as well as exposure to natural language processing, linguistics, or both. The ideal student for me in the Linguistics Department tends to be either a strong computer scientist who is so in love with language that they want the degree in Linguistics rather than CS, or someone coming from the cognitive science side (e.g. Linguistics, Psychology) who already has significant experience with computational work, or at minimum some degree of comfort with programming (typically Python).
An important thing to note: UMD does not offer degrees specifically in computational linguistics or NLP. Students interested in these topics typically enroll in the Ph.D. program in Linguistics, in Computer Science, or at the iSchool (i.e. College of Information Studies). (See also the Neural and Cognitive Science program.) Each of those programs offers information for prospective students on its Web page. I have advised or co-advised students from all of them.
Regardless of department, the strongest applicants will have an established track record in computational linguistics research, e.g. co-authorship on papers published at first-tier conferences. That said, it is often the case that people from low-income, first-generation, or traditionally underrepresented groups have not had as many opportunities for these experiences, so if you are a member of one of those groups, I invite (indeed, encourage) you to let me know, if that information would not otherwise be apparent to me.
Please see "What kind of student are you looking for?" above for information about research areas that I'm interested in.
An important note for Computer Science applicants:
The CS department here will often admit strong students regardless of whether or not they are matched in advance with a specific advisor. For many (even most) students in CS, the first year (or sometimes two years) involves a "shopping around" process where students get to know potential advisors and vice versa. So getting admitted to CS at University of Maryland isn't the same as getting admitted to work with a particular advisor, even if that advisor was mentioned in your application. If you are admitted to CS and want to work with me, it's essential to contact me directly before you make your acceptance decision, so that we can talk about it.
Are my qualifications appropriate? What can I do to improve them?
Regarding qualifications, as I noted above, I am most interested in students who have both a strong computational background and a deep interest in human language. I can potentially co-advise students who have just one or the other, depending on who the other co-advisor is.
I am sometimes asked by undergraduates interested in computational linguistics what courses they should take, prior to grad school applications, to better prepare themselves. First of all, bravo! That's a good question to be asking, especially if you're asking it early enough to do something about it. In order of priority, I think the areas below are probably where you should focus on the computational side.
(Note also that I'm a really big fan of the 3Blue1Brown videos on YouTube, and I very highly recommend you look there for material related to some of these topics. The sequences on linear algebra, calculus, and neural networks are particularly good and I recommend them even if you're already familiar with the material.)
- Programming. Programming is more a skill than a body of knowledge you can learn from a book, so the sooner you start practicing the better. In terms of specific programming languages, these days Python is dominant in NLP and machine learning, and the popular Natural Language Toolkit and its associated book are a good way to get exposed to both programming basics and NLP concepts, although for NLP packages spaCy would be my go-to when it comes to something industry-strength. R is very popular for statistical modeling, particularly in the social sciences, and there are R wrappers around some useful Python packages. Perl has fallen out of favor, but it can still be very good for down-and-dirty text processing and it's particularly worth learning as a value-add if you already are fluent in major *nix command-line tools like awk and sed. Java is still popular, though in my opinion it requires too much overhead to learn compared to the value it provides, except perhaps in those industry settings where the company requires it. Scala seems to be an up-and-comer for people who like Java. The basics of UIs in JavaScript can be very useful, especially if you actually use jQuery to simplify your life. I'd say C and C++ probably shouldn't be your first choice.
- Basic probability and statistics. These should be part of everyone's education. How else can you understand what's going on in the economy, make informed decisions about medical treatments, or choose the right time to bail out on Deal or No Deal? An alternative to taking a course: spend some time going through The Cartoon Guide to Statistics. It's important to be able to understand basic probability concepts like conditional probability and independence, and basic statistical concepts like mean, variance, and the like.
- Data structures and algorithms. This is the theoretical counterpart to the practical knowledge you acquire when you learn to program and then practice a bunch. If you do a lot of programming you may learn a lot of what you need to know about data structures and algorithms on the fly. But this kind of course will also help you understand key ideas like computational complexity.
- Basic calculus and linear algebra. To understand a lot of the work in computational modeling, some familiarity with these subjects is needed. That said, it's not like you need all of either subject, by a long shot. A very nice resource is Hal Daumé's Math for Machine Learning.
- Theory of grammars and automata -- particularly finite-state automata, finite-state transducers, and context-free grammars. I'm particularly fond of how these are introduced in Lewis and Papadimitriou, Elements of the Theory of Computation, though note that it's a rather rigorous and mathematical treatment of the subject.
- Machine learning. It may seem odd that I'm listing this last, when so much of computational linguistics these days is driven by machine learning and particularly when deep learning approaches seem to be taking over the universe. But even if you're going to apply deep learning as a primary tool, it's essential that you understand the foundations behind it rather than applying it blindly.
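As a small illustration of the probability item in the list above, here is a minimal Python sketch of conditional probability and an independence check, both computed from counts. The toy word pairs and the function names are invented for illustration; they are not drawn from any real corpus or library.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical toy data: (previous word, next word) pairs, invented for illustration.
pairs = (
    [("the", "dog")] * 3 + [("the", "cat")] * 1
    + [("a", "dog")] * 1 + [("a", "cat")] * 3
)

joint = Counter(pairs)   # counts of each (previous, next) pair
total = len(pairs)       # total number of observed pairs


def p_prev(word):
    """Marginal probability P(previous = word)."""
    return Fraction(sum(c for (p, _), c in joint.items() if p == word), total)


def p_next(word):
    """Marginal probability P(next = word)."""
    return Fraction(sum(c for (_, n), c in joint.items() if n == word), total)


def p_next_given_prev(nxt, prev):
    """Conditional probability P(next = nxt | previous = prev) = P(prev, nxt) / P(prev)."""
    return Fraction(joint[(prev, nxt)], total) / p_prev(prev)


def outcomes_independent(prev, nxt):
    """True if P(prev, nxt) equals P(prev) * P(nxt) for this particular pair of outcomes."""
    return Fraction(joint[(prev, nxt)], total) == p_prev(prev) * p_next(nxt)


print(p_next("dog"))                      # how likely "dog" is overall
print(p_next_given_prev("dog", "the"))    # how likely "dog" is right after "the"
print(outcomes_independent("the", "dog"))
```

In this toy data, knowing the previous word changes the probability of the next one, which is exactly what it means for the two not to be independent.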
On the other side, if you've got computational background but you're new to language, some valuable courses to have include:
- Introductory syntax. Even if you're not going to do work specifically related to syntax, one of the most important defining properties of human language is the fact that it conveys structured information in a sequential form.
- Experimental methods (e.g. in a psychology course). This overlaps with the recommendation above to get some background in probability and statistics, but it's even better if you can get some background on human subject experimentation.
- Formal logic. This subject provides the foundation for understanding key elements of natural language semantics.
What are my chances of being admitted?
Please do not ask me this question. From year to year, it is very difficult to say far in advance whether or not I will even be taking new students, and unfortunately I cannot tell you your chances of being admitted. This depends not only on your background, but on many interacting details of the admissions process within each department, including the other applicants that year. So please do not ask me about your chances of being admitted or of my advising you. Thanks.
Which department should I apply to?
This depends on your background and your goals. These programs are all quite competitive, so you'll want to apply to a department where your background provides a good chance of being accepted. Also think about your goals. If you're thinking about eventually going into academia, what kind of department do you imagine yourself applying to? During your grad school experience, you'll be taking a variety of required and elective courses outside your research area -- do you want to be in a department where required coursework explores things like semantics and phonology, or one where you'll be spending time on things like databases and recursion theory?
If this is a question you're asking, it probably couldn't hurt you to look at this nice article on choosing a Ph.D. program in Computer Science, which contains a lot of information applicable to other disciplines, as well.
Can I do a postdoc with you?
Generally if I have an open postdoc position, I'll advertise on the CORPORA mailing list. On the other hand, I'm always happy to hear from people who are finishing up their Ph.D. if their research interests are closely aligned with mine. Send me some mail and tell me about yourself. Please include a current c.v.
Can I come to UMD as a visiting scholar?
Please do not e-mail me with this question. It does not matter if you are self-funded. I am unlikely to host visitors if I do not personally know them or their advisor. The best way to approach this is by asking your advisor to introduce us.
How can I start getting up to speed on NLP via self-study?
If you're looking for traditional textbooks, the unquestionable standard is Jurafsky and Martin, now in its 3rd edition. As of this writing it's publicly available for free on the Web as a pre-publication draft. There are lots of people who have done online lectures based on this book; it's not hard to find slide decks for the chapters done by various academics, via web search.
If you want to watch lectures, I think Chris Manning's CS224N Natural Language Processing with Deep Learning is brilliant. Note that lectures get updated periodically so it's worth looking for the most up-to-date lecture playlist for the course on YouTube by searching on the course title.
Having said the above, though, if you're interested in getting up to speed on NLP what I really recommend is structuring a hands-on effort around the Natural Language Toolkit. This is a Python package that covers traditional NLP, but even more important, the package is tightly linked to an NLTK book that introduces the basic concepts with Python code that you can just copy/paste and then play around with. It's also worth noting that the NLTK package/book combination is a good fit for someone who programs in some other language but needs an introduction to Python focused on NLP. In general one would not typically use NLTK in a serious production setting -- for that my go-to recommendation would be spaCy -- but in terms of a great pedagogical combination of bare-minimum explanation and associated code, NLTK is the way to go if you want to get ramped up on the ideas, especially if you're interested in doing that in a very hands-on way. You could then use Jurafsky and Martin as a textbook reference to look in more detail when NLTK's explanations are a little too sparse; conversely, once you understand the basics through NLTK, getting up to speed on doing things in spaCy will be a lot easier. Note that NLTK is probably lighter than one might want on the most recent deep learning approaches, but if you've got Jurafsky and Martin for the core ideas and Manning's lectures for detailed and more up-to-date discussion then I think you've got pretty much everything you need.
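To give a flavor of the kind of small, hands-on exercise described above, here is a minimal sketch that tokenizes a short text and counts word and bigram frequencies. So that it runs without installing anything, it uses only the Python standard library rather than NLTK itself; the crude regular-expression tokenizer and the sample text are simplifying assumptions, and real toolkits handle far more cases.

```python
import re
from collections import Counter

# Invented sample text, for illustration only.
text = "The cat sat on the mat. The dog sat on the log."


def tokenize(s):
    """Lowercase the text and pull out alphabetic word tokens.

    This regex is a deliberately crude stand-in for a real tokenizer:
    it keeps simple contractions together and drops punctuation and digits.
    """
    return re.findall(r"[a-z]+(?:'[a-z]+)?", s.lower())


tokens = tokenize(text)

# Unigram frequencies: how often each word occurs.
freq = Counter(tokens)

# Bigram frequencies: how often each adjacent pair of words occurs.
bigrams = Counter(zip(tokens, tokens[1:]))

print(freq.most_common(3))
print(bigrams.most_common(2))
```

A natural next step would be to swap in a real tokenizer from NLTK or spaCy and compare what changes.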