Frequently Asked Questions
This is really intimidating. Do you expect all students to have read this before asking you a question?
No! I've put this here for a couple of reasons:
- I get a lot of e-mail. I want to get less e-mail (or at least better e-mail), so I'm hoping that if people find the answer here, they'll send me less e-mail.
- I get a lot of e-mail. I want to answer my e-mail quickly, but sometimes I don't. I'm hoping that if people find the information they need here, they won't have to wait for me to answer.
- Some students are shy about asking questions; if they can find an answer here that they were too afraid to ask, then everyone is better off.
- I want to make sure I treat students fairly. If I put policies/expectations out here publicly, it gives me less leeway for my latent biases to impact students.
I want to work with you, and I'm currently a grad student at the University of Maryland. How do I do that?
The matching process at UMD between professor and student typically happens during the first year (not before). This allows the student to get comfortable and set up within the university, adjust to the region, and figure out how much time they have to devote to research. It also lets you get a sense for the research going on and what the personalities of the various professors/groups are.
If you want to work specifically with me, then take computational linguistics or machine learning (because of the huge number of students who want to work with me, this is non-negotiable; it will teach you what you need to know and will give me a good sense of your abilities). Once you've done that, send me an e-mail with a high-level view of what sorts of things you're interested in and your courseload for when we'd be working together (I don't want to work with students who have no time for research). There is also more day-to-day stuff that you should know about using computers, like git, the command line, debugging, and profiling.
Then schedule a meeting, and we'll figure out a project to work on together to show that you're able to work independently on a self-contained project. After we finish the project, we can discuss longer-term arrangements.
I want to work with you, and I'm currently an undergrad student at the University of Maryland. How do I do that?
First, take either the undergraduate natural language processing course or the undergraduate machine learning course. In other words, you first need to learn how to program and learn some specialized skills. But try to take these courses as quickly as you possibly can!
Then, send me an e-mail. I will send you a challenge problem to complete in about a week. If you don't have a week to work on a challenging problem, then wait to e-mail me when you have some time to work on such a problem (warning: if I'm having a particularly busy week, I may not get back to you quickly, as putting together a challenge problem takes some time too).
You'll need to sign up to work with me as an independent study. I'll also ask you to have a fairly light course load the semester you work with me, as undergraduate students have a tendency to take on too much.
I want to work with you, and I'm not currently a student at Maryland. How do I work with you?
Then you should apply to be a student at Maryland. The best way would be to apply to computer science and mention me specifically in your application. After you submit an application, please drop me an e-mail (put GRADAPP-20XX in the subject) with your CV letting me know you applied. I may not reply, but it's still very useful!
Please also consider applying to the MPI-SWS joint program!
I also have students who are now faculty; if you want to work on the sorts of things that I work on but want an energetic, knowledgeable advisor instead of a has-been, consider applying to work with them: Mohit Iyyer and He He.
To help me quickly search for such e-mails (and to show that you've done your homework by reading this FAQ), please put GRADAPP-20XX in the subject line, where XX is the year you hope to enroll.
See the openings page.
Are PhD students at Maryland funded?
The University of Maryland, like all top American universities, makes a commitment to fund PhD students so long as they're making adequate progress. This includes supporting tuition, a stipend, and health insurance. This is typically through a combination of research assistant positions and teaching assistant positions (my students typically TA once or twice).
Can I mention you in my statement of purpose?
You don't need to specifically ask my permission to list my name in a statement of purpose so long as our interests are a good match. If you think they are, go ahead! (However, it may not help you to list me if I'm not taking students in a particular year.)
Do you have any postdocs available?
I'm fairly junior, so I'm trying to fund students right now. I'm fairly good about keeping my webpage updated, so see the openings page.
I want to work with you, and I'm currently a student at Colorado
I moved to the University of Maryland in August 2017. I will no longer be advising new students at the University of Colorado.
Can I work with you as an intern?
Unfortunately, it's very hard to evaluate the quality of candidates without a formal system (e.g., as we have for university admissions). As a result, it is my policy only to work with people directly recommended to me by a professor or researcher with whom I already have a relationship.
Will I get admitted? Why was I rejected?
I will not answer this sort of question. Don't even bother asking. I cannot give opinions on whether you will get accepted without seeing a full application. There are numerous venues where you can get uninformed opinions about your chances. In any given year, which students I accept depends on funding amounts, match between project and students, and who says yes or no and when. It's a very stochastic process, and I wish we had a more logical system.
Where should I do my PhD?
The most important thing is finding a PhD program where you will be happy. Hopefully that will be UMD; students often put too much weight on rankings. Don't ignore rankings, but pay attention to the people involved and your fit with the group.
If you absolutely must look at rankings, I think CSRankings.org has the least bad rankings available.
You asked me to do a virtual interview after I applied for a PhD position. What does that mean and what should I do to prepare?
First, it means that you really stood out in the pool of applicants! I typically only interview five to ten applicants a year to select the candidates I will eventually invite to attend.
In many ways, it's a sanity check. If you say in your application that you're really good at X and you want to do Y, I'll ask about those things in a little more detail to better understand your background and your skills. I'll also ask about what you want out of a PhD program.
This really is a two-way conversation, however. We're going to work with each other for N years, and we both need to be sure that we can stand each other and work well together. So it's important for candidates to ask whatever questions they're concerned about too.
Can you give me a letter of reference?
I typically only write letters for students whose committee I've served on, whom I've worked on a research project with, or who did very well in my class. Unless you are my direct advisee, you must ask me before giving my name out.
When you ask, please send a list of bulleted points that answers the following questions: how we know each other (e.g. took class X, received grade Y, completed project on Z), what research we have worked on (what the project was about, where it was published, your role in the project), what you're applying to, when you will send me all of the rec requests, and when the deadline is. Good rec letters contain details, and the more details you can provide that I can then surround with context, the better your letter will be.
For example, if I relied on my memory to write a letter of recommendation, I would be able to say something like "Susan took my class and did great, she did a project on music stuff". That's not as good as "Susan took my class Fall 2015, earned an A, and presented a final project on distinguishing musical styles automatically given the waveform of a song. Their group used a variety of techniques (support vector machines, convolutional neural nets, and k-nearest neighbors) to decrease the error rate of a strong baseline from 0.4 to 0.2". Obviously the second one is better, but I can't recall all of the details myself. Your bullet points will help me recall details and put your work into context.
I do have some rules about writing letters for grad school, though. These are non-negotiable given the large volume of letters I have to send out.
- I must get all of the requests at once. I don't want them to trickle in; I need to be able to submit all my letters for you in one sitting.
- I must have at least three weeks' warning (e.g., if the first deadline is December 1, I need to know by November 10 that you want me to write a letter). I cannot write a good letter if I don't have time to prepare.
- I must have at least a one-week window to submit your letters. So you need to have sent out ALL recommendation requests a week before the deadline. I must have the request and all of the information needed to submit the letter.
I will get really annoyed if you only give me a day to submit a letter. I will get more annoyed if it's Saturday night. I will get even more annoyed if it's Easter. If you do this, you risk my not only not sending that letter but also refusing to send any more letters for you.
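If it helps to plan backward from your deadlines, the date arithmetic in the rules above can be sketched as follows. This is only an illustration: the function name, the dictionary keys, and the example dates are hypothetical, and it assumes both windows are measured from your earliest deadline.

```python
from datetime import date, timedelta

# Lead times taken from the rules above.
ADVANCE_NOTICE = timedelta(weeks=3)     # warning needed before the first deadline
SUBMISSION_WINDOW = timedelta(weeks=1)  # time needed to submit all the letters


def letter_timeline(deadlines):
    """Return the latest acceptable dates, counted back from the earliest deadline.

    Assumption: both windows are measured from the first deadline.
    """
    first_deadline = min(deadlines)
    return {
        "ask_by": first_deadline - ADVANCE_NOTICE,
        "all_requests_sent_by": first_deadline - SUBMISSION_WINDOW,
    }


# Hypothetical application deadlines (the year is made up for illustration).
timeline = letter_timeline([date(2024, 12, 15), date(2024, 12, 1)])
print(timeline["ask_by"])                # three weeks before December 1
print(timeline["all_requests_sent_by"])  # one week before December 1
```

With a first deadline of December 1, the "ask by" date comes out as November 10, matching the example in the rules.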
What are your expectations / preferences in terms of what a student should know?
I personally like C++ and Python, but the culture here leans to Java, which I've been using more and more (and likely will continue to). I prefer writing tests to debugging, but debugging is a necessary evil. I do like reinventing the wheel somewhat to keep things self-contained and consistent, but I contribute the result to things like NLTK so that other people don't have to do the same. I also like using style checkers and the like to keep myself organized. (Though I say this, you can get a more honest picture of my coding style by looking at what I've actually written.)
Students who want to work with me should
- have basic knowledge of Python, C++, or Java (e.g. be able to write a dynamic program in that language),
- understand probability (Bayes' rule, conditional probabilities, smoothing),
- compile LaTeX documents using BibTeX, and
- use version control software (e.g. git or svn).
These are the bare minimum requirements. If you do not meet these requirements, please take some classes to acquire these skills (preferably mine!) before asking to collaborate on research.
You should already code in some language pretty well, and conforming to my coding style will increase the probability that I'll be more hands-on in helping you code and debug, but if you want to program in LISP or Prolog, that's perfectly fine too, as long as it works for you.
Being comfortable with probability is probably the more important requirement. You'll likely have to deal with messy probability distributions, take expectations, derive conditional distributions given a joint distribution, implement dynamic programming to sample from PCFG grammars, do Taylor approximations, do some optimizations, etc. This shouldn't be taken as a laundry list of things you should know (it's great if you do) but just as a heads up of the kinds of things you might run into; part of a graduate education (life, for that matter) is learning new stuff. There will be many opportunities to learn: from classes, your peers, and reading group.
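To give a flavor of the simplest item on that list, here is a minimal sketch of deriving a conditional distribution from a joint distribution. The toy distribution over (topic, word) pairs is made up for illustration, and the helper names `marginal` and `conditional` are hypothetical; real problems will be messier.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical toy joint distribution P(topic, word); the numbers are invented.
joint = {
    ("sports", "ball"): Fraction(3, 10),
    ("sports", "vote"): Fraction(1, 10),
    ("politics", "ball"): Fraction(1, 10),
    ("politics", "vote"): Fraction(5, 10),
}


def marginal(joint, axis):
    """Sum out the other variable to get the marginal of the variable at `axis`."""
    totals = defaultdict(Fraction)
    for outcome, prob in joint.items():
        totals[outcome[axis]] += prob
    return dict(totals)


def conditional(joint, given_axis, given_value):
    """P(other | given) = P(other, given) / P(given)."""
    evidence = marginal(joint, given_axis)[given_value]
    other_axis = 1 - given_axis
    return {
        outcome[other_axis]: prob / evidence
        for outcome, prob in joint.items()
        if outcome[given_axis] == given_value
    }


# P(topic | word = "ball"), computed directly from the joint.
p_topic_given_ball = conditional(joint, given_axis=1, given_value="ball")

# The same quantity via Bayes' rule: P(t | w) = P(w | t) P(t) / P(w).
p_topic = marginal(joint, 0)
p_word = marginal(joint, 1)
p_ball_given_sports = conditional(joint, given_axis=0, given_value="sports")["ball"]
bayes_sports = p_ball_given_sports * p_topic["sports"] / p_word["ball"]
print(p_topic_given_ball["sports"], bayes_sports)
```

Exact fractions are used so the two routes can be compared without floating-point noise.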
I think attending (and contributing to) a reading group or two is critical for learning about a field and being a good scholar; it's fun and not a chore at all. But I want to be up front in saying that any student of mine should be an active participant in a reading group or two: don't just show up; you need to present papers and be involved in the discussion of every paper. If you didn't understand a paper, ask smart questions until you do. If you did understand a paper well, answer other people's questions.
Reading groups are also important for being able to "look smart" when you're interviewing. You'll need to be able to connect your work to what other people do. A reading group lets you know how your thesis connects to other research topics and talk intelligently about them. Unfortunately, this can't be done quickly; it requires dedication over many years to learn about the breadth of research that folks explore. So while you might feel like skipping reading group once is a good decision to get more work done, it's ultimately a bad decision because you need to consistently go to understand a broad range of topics.
There are also some other skills that I expect students to have. If you don't feel comfortable with these things, you should work on learning how to do them well. You should have learned how to do these during your undergraduate program.
- Write professional e-mails
- Use a proxy server, VPN, etc. to access articles from the library
- Make a webpage
- Interact with a *nix system using tmux / screen for a persistent SSH session
- Edit a LaTeX document
- Use version control
How do you interact with students?
I like to have a group meeting every other week with students I'm working with (broadly construed), and one-on-one meetings as needed with students. I use Google calendar to set up my appointments, so students can grab a meeting whenever they need to. I expect students working with me full time to meet with me on average once a week (sometimes much more, such as before a paper deadline, and sometimes less). I use this online system so that my meetings are contiguous and that students always know when I'm available (and I can change things without e-mail). Students should sign up for a meeting at least 24 hours in advance. It's okay to schedule meetings outside of that time, but that should be the exception (I try to maximize the amount of contiguous time I have to research, write, and think).
In addition, everyone in my group (me included) sends a weekly e-mail to everybody saying:
- What they worked on that week
- What they plan to work on next week
- Anything that's holding them up or blocking their progress
Anyone who is working for me full time or who is my direct advisee must send me such an e-mail (with the subject [Snippet YYYY MM DD]) sometime between Friday evening and noon Eastern time Monday. I find that this is very helpful because I sometimes ask myself (or have funding agencies ask me) what I (and my students) did in a particular time period. These e-mails really help me figure that out without bugging other people. It also helps me stay productive by setting realistic goals; I use this weekly todo list to populate my daily todo list.
So what makes a good goal? You should have "Big Picture Goals" that carry over from week to week; these are often at the level of something you want to make happen this year or semester. Every week you should do something that brings you closer to achieving those big goals. Within a week, your goals should be SMART. Don't have vague goals like "write code" or "continue reading". It should be obvious whether or not you succeeded in your goal (specific and measurable), the goal should fit in with the big picture (relevant), and it should be doable in a week (time-bound and attainable).
It's okay to send it earlier than Monday; the weekend is fine too. If you're a day late, that's less good, but better than not sending it at all. However, keep the Monday date in the subject so I can search for it. (Or just reply to the first Snippet that gets sent; no need to wait for me, it's okay to start the chain.)
The snippet should be sent to both the project you're working on and the group e-mail list.
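The subject-line convention described above can be sketched in a few lines. This is a minimal illustration, not an official tool: the function name `snippet_subject` is hypothetical, and the handling of each weekday is an assumption (a snippet sent on Tuesday is treated as one day late and keeps the previous Monday's date, while any other day maps to the upcoming Monday).

```python
from datetime import date, timedelta


def snippet_subject(sent_on):
    """Build a '[Snippet YYYY MM DD]' subject that carries the Monday date."""
    weekday = sent_on.weekday()  # Monday is 0, Sunday is 6
    if weekday == 0:
        monday = sent_on                        # sent on the Monday itself
    elif weekday == 1:
        monday = sent_on - timedelta(days=1)    # a day late: keep Monday's date
    else:
        monday = sent_on + timedelta(days=7 - weekday)  # upcoming Monday
    return f"[Snippet {monday:%Y %m %d}]"


# A snippet sent on a Friday, over the weekend, or a day late
# all end up with the same searchable subject.
print(snippet_subject(date(2024, 1, 5)))  # sent on a Friday
print(snippet_subject(date(2024, 1, 9)))  # sent on a Tuesday, a day late
```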
How should I decide the right mode of communication?
The choice of communication mode should be sensitive to the person you're talking to: their schedule, how long it will take them to respond to your communication, and how focused they are on what you're asking about. Beyond the recipient, there's also the urgency of the communication.
Schedule: Some communication is synchronous while other communication is not. Instant messages, phone calls, and in-person visits are synchronous, while e-mail is asynchronous. Thanks to having a young daughter at home, I have three blocks of time when I work: before she wakes up, when she's at school, and after she goes to bed (I don't always work in all three slots, but this is when it's possible for me to do any work that requires concentration). However, I don't like synchronous communication outside of normal working hours as a rule; I use this as an opportunity to catch up on my asynchronous conversations (although there are exceptions, which we'll get to in a bit).
What it Takes to Reply: Chats are good for really quick pieces of information: "what was the name of that Python plotting package again?", "who does reimbursements for iSchool?". However, some questions that can be posed easily are not answered easily (e.g., "Can you send a rec letter in Italian to the Vatican focusing on my Latin NLP?"); e-mail is a better way of keeping track of these requests (I have better integration with my todo lists). It's okay to send occasional (no more than one per week) reminders over chat or e-mail.
Focus: If you know I'm making a big push on the thing that you want to talk about right now, chat may be a better avenue for talking about it if it might change what/how I'm working on something. If I'm working intently on something else, I would probably prefer an asynchronous communication so I can maintain my focus and get your message a little later. I try to let people know my plans (which often change) in my weekly snippet.
Urgency: Of course, sometimes there are exceptions. Sometimes you're having an emergency, I've dropped the ball on something, or there's leftover food from Krazy Kabob. In such cases, the above rules go out the window. If a student sends me an instant message on a holiday or 9pm saying "can we talk?", I'll naturally do whatever I can to talk to them ASAP. However, please do not be the boy who cried wolf: if you just needed clarification on a logistic regression, that should have been an e-mail.
This is also true when urgency is created by leaving something to the last second. I often have many submissions for conferences, camera readies, etc. If you wait until the last minute, you're stealing time from other students who have been responsible in managing their work.
Outside of that, I prefer face-to-face communication (when I'm not sitting down at my computer being productive) or e-mail as a communication mechanism. Instant messages are also sometimes okay for quick questions, but never send an e-mail and then ask via IM "Did you see my e-mail?"
Do I have to be in lab?
The below answer is obviously not in effect during a pandemic.
One of the great things about academia is the ability to have a flexible schedule, working when and where you want. However, there are limits to this. On days where we have meetings, it's best if you come in person to those meetings. Within reason, it's okay to join remotely some of the time, but the norm should be to attend in person.
Beyond meetings, it's also good to work a full day at least once a week. It's important to have a place where you can work productively in the lab, be a part of the lab community, and to absorb the lab culture and its tacit knowledge. Don't just appear on campus for meetings and then disappear.
Finally, when we have a big paper deadline, you absolutely must make every effort to be available and responsive in the leadup to the paper deadline. If a collaborator is not physically present in lab with you (e.g., there's a global pandemic), then figure out a way to keep everyone in the loop: Slack channel, daily video standups, etc. I work very hard to make sure I'm able to do this (e.g., flying in relatives to help with my daughter), and I expect you to do the same. The most important part of your job is publishing papers, and while there are good electronic tools to facilitate collaboration, they are not a replacement for in-person communication. Particularly for inexperienced students, being around older students working on papers is a very valuable experience. You don't know what you don't know, and you can get valuable information from being in the same room as other people working on papers.
I'm working with undergrads / high school students. How do I lead a research project?
First, you need to set the research agenda. Create a roadmap that lays out all of the things that need to be done and pointers and references to help them along. The sorts of things that should be included in this roadmap are: references, sketches of models, pointers to where data / tools live, etc. This document should be very complete; try to anticipate the questions that will come up (i.e., preempt frustration).
In communications, make sure you keep me (faculty member) in the loop. I'm ultimately responsible for what happens, so I should know what's going on. Ideally, I'll leave the two of you to do good research on your own, but I like to see a good project develop. I also want to be able to step in if problems are developing.
It's important to keep the faculty in the loop from the start because eventually you'll need to add them, and you don't want it to look like somebody is getting in trouble! If they're always involved, there isn't a problem.
It's important that the students are constantly able to make progress. If they're waiting on you for something, you're losing momentum on the project and increasing the likelihood that they'll abandon the project (the biggest risk of working with undergrads). Make sure the project is structured in small bite-sized pieces that can be tackled linearly.
Make sure that the lines of communication are open. Junior students should be making regular updates (sending snippets; make sure to explain this concept to them). If a student disappears for a while, make sure you follow up and see what's going on.
Ideally, leading a research project should be a low bandwidth, low latency interaction. You need to be prompt and responsive, but if things are working well, junior researchers will be doing most of the work. You need to make sure that they have the tools and information to make progress.
I need you to do something (look over a draft, send an e-mail, etc.). How should I best make sure that happens?
The most important thing is to make sure it's on my radar. If you have an important deadline, make sure it appears in your snippet that you send me weekly. I will make sure I budget my time to ensure that it gets taken care of. Give me as much warning as possible. I get grumpy if I have to rearrange my schedule for you at the last minute.
It's fine (and helpful) for you to remind me. However, I'd like to make the following caveats. Unless the deadline is hours away, the best way is over e-mail; not phone or IM. It's less intrusive and I have systems for dealing with tasks that arrive over e-mail. The frequency of the reminder is also important. No more than once every five days, I would suggest.
Finally, make it as easy as possible for me to do what you need me to do. Have your reminder e-mail reference all of the material I need to do the task. If I'm reviewing a paper, remind me where in the repository it lives and send me a compiled PDF. If I need to write a letter, provide the background material and the contact information in one place.
How important are classes once I'm a PhD student?
One very frequent problem I see is that first-year PhD students want to do very well in their classes and think of research as a hobby.
For RAs, it is very much a job. Your professor has secured funding for PhD students to do research and to produce results. If you fail to produce, it makes the professor look bad to his funders, and the professor will not want to pay you to do research in the future (i.e., like a job, you can get fired).
Grades are not important whatsoever, so long as you're not getting kicked out of the program. You should use classes to become a better researcher, but if you're chasing after an A when a B would suffice and your research suffers, that's detrimental to yourself, your professor, and to science.
If you're not an RA (on fellowship or TA), then doing research is often a tryout for an RA. Unless you're 100% sure you'll have fellowship funding your entire time as a PhD student, you should make sure your professor would take you on as an RA in a heartbeat if needed.
How often should I be publishing?
You should always have an idea that you're actively working on for a paper. Publishing 1-2 papers a year is a good average (however, this does not mean that you'll always have a publication every year). Under normal circumstances, I expect students to have one publication at least submitted before the end of their second year, two by their proposal, and three by their defense (it's of course fine to have more, but don't prioritize quantity over quality).
If you haven't submitted a strong paper in two consecutive years as a first author in a top venue (regardless of whether it is accepted, which can be unfortunately unfair/unpredictable), that's a huge problem, and you're unlikely to get an RA in the future.
I'm submitting a paper we talked about, can I add you as an author?
I should not be surprised by a paper. If I'm going to be an author, I want to: 1) see a draft with the "big picture" at least two weeks before the deadline and 2) see a nearly complete draft at least a week before the deadline. (I reserve the right to still say no to papers even if you follow these rules, e.g., if I'm on vacation.)
For students working directly with me in my group, this is less of an issue: I know what's going on and can judge whether we can submit (a collaborative discussion). But for students who come to me to discuss an idea, vanish for two months, and then suddenly appear and want me to be a coauthor, this can be pretty annoying. My likely response is "no": I will not be a coauthor, and I will not contribute to the paper. If you wait until the last minute, the paper likely won't be any good, and I have other papers with authors who were responsible and played by the rules.
You can still choose to submit, but do not list me as an author.
Can I work on projects that don't involve you?
First, there's a question of funding. If you're funded on a fellowship, TA, or self-funded, then you just need to make sure that I'm happy to continue advising you (i.e., making good progress toward your degree, fulfilling the requirements of the research group like sending snippets). It's fine to take a break and explore your interests, but don't ignore your thesis.
However, if you're funded on a grant, you need to be doing work that's consistent with the goals of the grant. Maintaining these relationships is necessary for me (and future students) to have funding. If your only publication in three years has a majority of authors not working on the grant, that will also look suspicious.
This isn't to say that you can never work on a project that doesn't involve me. For example, many students need a week or two to wrap up their internship projects. This is totally fine, and it's not reasonable (or appropriate) for me to get involved. However, if you're still working with your internship advisors six months afterward and it's interfering with your grant-funded work, then I either need to be involved or you need to give it up. At the very least there needs to be a frank conversation between me and the internship host (it's not fair for you to have to manage these conflicting relationships/priorities).
Why is it important to cite related work? Can't you just add the citations for me?
I often ask students to cite papers when we're working on a draft. Sometimes a citation will be very trivial to add (e.g., at the end of a sentence), and students may rightly wonder why I don't just add it myself. Am I really that lazy?
Sometimes I am so rushed that it is indeed partly time pressure that prevents me from citing something myself. But often I say this because I want you to read the paper. It may not be a paper you're familiar with. If I just cite it, then you don't learn the material in the paper (and since this is a paper you're writing, you should know about that material).
Sometimes I'll be deliberately vague ("you should cite Eisner/Dreyer here"). Again, this could be me being lazy, but sometimes multiple papers could be relevant, and I'm not sure which is the best paper to cite in this circumstance. Moreover, particularly when an author (or group of authors) has written a number of papers on a topic, you should be aware of the whole trajectory (and there could be follow-on papers I may not know about).
Cool. So this means I can just ignore your citations until I get to the related work section (which I'll save for last)?
NO! Knowing about previous work could impact all aspects of the paper. You might find out about a dataset, evaluation, or framing of the problem that could help you write other sections of the paper. Science is about standing on the shoulders of giants: if you don't know what has come before, how can you improve on it?
This has become an increasingly vexing issue in the age of deep learning; students believe that neural networks are magic and that any technique that doesn't have a hidden layer and a nonlinearity isn't worth their attention. I can confidently say that this is not the case (at least in 2018), and older or non-neural papers are still worth reading, even if your model is neural.
I did a websearch and found some webpages. Should I just cite them?
Peer-reviewed publications are typically built on other peer-reviewed publications. Citing unvetted arXiv publications or random webpages is not as authoritative as citing a published book or article. So you'll need to look at journals and books and start tracing the references backward until you start finding primary sources. This could be microfiche of newspaper articles, old journal publications, or dusty books in a library.
This sounds like a lot of work, but there are people whose job it is to help you do this: reference librarians. Talk to them, figure out how to do these sorts of bibliographic searches. The first time you talk to them you'll learn all sorts of tricks (including many that don't require physically going to a library). Once you learn the tools available to you, you'll be in far better shape to do a good literature review.
However, sometimes, the best resource is a webpage. But this should be your last resort. If your citations are peppered with old journal and newspaper articles and you've proved that you can do your legwork, readers will trust you that a website is the correct citation for a fact and not just the lazy way out.
I didn't become a computer scientist to spend time with books. If I can't find it online without leaving my chair, it's worthless. Why should I do this?
Getting a PhD is about learning how to do research. Research in computer science is not just hacking code and running experiments. To do research correctly, you must enter a conversation that can span decades or centuries. It is hubris to imagine that everything relevant on a subject is contained in arXiv articles and that a few searches on the Internet reveal everything there is to say on a topic.
To do that kind of research, you need to understand, write about, and cite the relevant related research. While much is online, a good quantity of older material is trapped behind paywalls or on paper (computational linguistics is indeed better than many other fields in this regard, but not all of your research will be strictly within computational linguistics). This is a consequence of a complicated combination of copyright law, history, and inertia. You will look foolish if you claim novelty where someone has done it before or if you don't understand what other people have written about the subject.
Many students come in with the goal of being a professor. A professor needs to be facile with many sources of information and in possession of an understanding of the broad sweep of a field's intellectual history. Call me old fashioned, but I don't know of a way to do this without sometimes setting foot in a library (that said, if there's a way to do it without setting foot in a library, let me know, because I'm lazy too).
You asked for a PDF/A version of a document, how do I make that and how do I make sure that my files are compliant to start with?
If you already have a PDF, you can convert it online.
But make sure your files are compliant without conversion.
I need to ask ISS for an extension of my student visa. What information do I need to give you?
You need to give me a draft of a letter (including your UID) explaining why you need an extension (written as I would write it), the date you plan to complete the program, the date of your defense, the date of thesis submission, and all of the courses you will be registered in until your graduation (with the number of units).
Academia and Research
Is topic modeling dead? Should we all be doing deep learning?
Deep learning should be part of any modern researcher's toolkit. However, I do not think that this means that we should completely abandon topic models. Topic models are still very useful for use cases where interpretability is important. You'll still see many researchers in digital humanities using topic models, for instance, because they care about telling a good story and understanding their data.
As topic models become more of a utility, I think we'll see less of the "topic model of the week" that we saw in 2005-2010. I think the important questions are how to incorporate topic models into real-world workflows and how to measure whether topic models help users with those tasks. At the risk of self-promotion, I think a good example of that is Forough's paper on how topic models help people annotate data more effectively.
One place where we will see less activity is topic modeling as a feature for downstream models, which was quite popular for a while. Here, word embeddings have completely taken over. They obviously do a better job, but perhaps the interpretability of topic models was a nice side effect that we're missing out on.
For a more complete overview of where I think topic modeling has been, what it's been useful for, and where it's going, Yuening, David and I have a new book on Applications of Topic Models.
How should I collect/store data?
Google spreadsheets are good in most cases during collection. But once we're done, they should be stored in a way that's long-term readable (e.g., JSON/CSV) and deposited with a library.
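As a rough illustration of what "long-term readable" can look like, here is a minimal Python sketch that writes the same rows out as both CSV and JSON using only the standard library. The column names and example rows are made up for illustration; it assumes the spreadsheet has already been read into a list of dictionaries.

```python
import csv
import io
import json


def export_rows(rows, fieldnames):
    """Serialize a list of dict rows to CSV text and JSON text."""
    csv_buffer = io.StringIO()
    writer = csv.DictWriter(csv_buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    # ensure_ascii=False keeps non-English text readable in the stored file.
    json_text = json.dumps(rows, ensure_ascii=False, indent=2, sort_keys=True)
    return csv_buffer.getvalue(), json_text


# Hypothetical rows standing in for a downloaded spreadsheet.
rows = [
    {"id": "1", "question": "Capital of Colorado?", "answer": "Denver"},
    {"id": "2", "question": "Author of Faust?", "answer": "Goethe"},
]
csv_text, json_text = export_rows(rows, ["id", "question", "answer"])
print(csv_text)
```

Writing both strings to plain UTF-8 files gives you something that can be opened decades from now without any particular vendor's software.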
You cited me. Can you write a letter for me for immigration purposes?
I have done a number of these letters and most are quite easy and painless. However, I've had a few bad experiences writing these letters, and I'd like to save myself some pain and frustration.
First, I will only write these letters if I know who you are. Just citing a paper (of which there are many authors) is not enough of a relationship. It is much more helpful if we've met somewhere or if your contribution to the paper is clear (a letter of introduction from your advisor may be useful).
Second, you will need to help me out by sending me material to help me write my letter. This information must be accurate and well-written. I have had people send me information with obvious errors (confusing me with another letter writer; confusing dates; or not understanding the difference between models, datasets, and inferred parameters). If you send me materials with enough errors, I will ask you to find another letter writer (sorry!).
You can get a sense of what I consider to be good writing from my style page. Your material should obey my stylistic conventions and not have useless words. I will obviously edit it myself, but make it as easy as possible for me to do that first. I highly recommend that you have a non-lawyer native English speaker review your material before sending it to me. If you send me materials that are not well written, I will ask you to find another letter writer (sorry!).
Often, these letters will need to be reviewed by a lawyer. I do not want to be exchanging marked-up Word or WordPerfect documents. I am only willing to work with Google Docs for document review. If you and your lawyer are not willing to work with this mode of collaboration, please ask someone else (sorry!).
So, even after a conditional yes, I reserve the right to say no if the material or the collaboration don't meet these conditions.
You're part of an iSchool? What's that?
It's fun. Unlike computer science, which can sometimes ignore humans, iSchools care about the intersection of information, technology, and society. It's a good fit for me because I'm interested in computational social science and human-in-the-loop machine learning.
I'm trying to use your code, but I'm having trouble. How should I get help?
E-mail all of the people who worked on the paper associated with the code with
- a minimal (simple as possible) example that can replicate your problem;
- the inputs that replicate your problem (again, this should be as simple as possible; sending multi-megabyte files is usually not minimal);
- exactly what you did (the exact command line used);
- what you expected to see;
- what you got instead (include error messages and any output); and
- what versions of various resources you're using (NLTK, Java, gcc, boost, protocol buffers, etc.).
This information is necessary for us to help you with your problem. The simpler it is to replicate your problem, the faster you will get a response. More complicated setups take longer for us to try out and debug. If your example is simple enough, we can often see the problem ourselves without running code.
Each e-mail should be self-contained. All the information to reproduce the bug should be in one place. This helps us quickly reproduce the bug, and it also ensures that you've not tweaked anything that might prevent us from isolating the issue.
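One way to make a report self-contained is to generate the boring parts automatically. The following is only a sketch, assuming a Python project: the `build_report` helper, the command line, and the package list are hypothetical, and you would need a different approach for versions of non-Python tools such as Java or gcc.

```python
import platform
import sys
from importlib import metadata


def package_version(name):
    """Return the installed version of a Python package, if any."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def build_report(command, expected, actual, packages):
    """Assemble the pieces of a bug report into one block of text."""
    lines = [
        "Command: " + command,
        "Expected: " + expected,
        "Actual: " + actual,
        "Python: " + sys.version.split()[0],
        "Platform: " + platform.platform(),
    ]
    for name in packages:
        lines.append(f"{name}: {package_version(name)}")
    return "\n".join(lines)


# Hypothetical command and messages, for illustration only.
report = build_report(
    command="python train.py --input tiny.txt",
    expected="a model file written to disk",
    actual="KeyError: 'vocab' (full traceback attached)",
    packages=["nltk", "numpy"],
)
print(report)
```

Paste the output into the e-mail along with the minimal input file, so everything needed to reproduce the problem is in one place.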
I have a question about a paper you wrote. What's the best way to ask it?
First, it's great that you are interested in the paper. Thanks so much. E-mail all of the people (not just the first author) who worked on the paper. If you want to increase the probability of a quick reply (or avoid needing one), please:
- Ask a self-contained question. Assume that we've forgotten section numbers, page numbers, etc. associated with the paper.
- Link to the version of the paper that you're reading (try to find the most current one); often mistakes / questions are resolved in later versions. E.g., workshop papers sometimes become conference papers, conference papers sometimes become journal articles.
- See if the answer is in the relevant student's thesis.
What are your pronoun preferences?
For myself, he/him/his and they/them/their are both fine. I prefer the latter for academic writing and the former for day-to-day communication.
If I use the wrong pronoun for you, please let me know ASAP.
What's up with your name? Why is it hyphenated? What should I call you? Why is your UMD username "ying"?
My parents' last names are Boyd and Graber. When I was born they hyphenated (why people whose nicknames were "Toni the Body" and "Little Grabber" would do so is beyond me; my nickname is obvious). As a result, I am deeply, personally, against hyphenating names. Don't do it. It's not a sustainable practice, and it leads to all sorts of problems. People think my last name is just "Boyd" or "Graber", web forms don't think I have a valid name, and there's only about a forty percent chance someone will get my name right after one telling.
Most people call me Jordan, which is just fine by me. I also answer to JBG.
Our family calls itself the "Ying"s (wife's name). That's why my UMD username is ying (and why "Ying" is listed on Testudo). My wife, who got to UMD after me, is zying.
I'm a TA or grader for one of your courses; what do I need to know?
- First, make sure that we have a meeting before the semester starts.
- Attend at least a class or two to get a feel of what's going on.
- As each assignment is posted, look it over to make sure I haven't done anything stupid (e.g., a confusing problem); it will make your life easier.
- Once assignments arrive, create an ontology of all of the mistakes that people have made (do this before you start "grading"); this will allow you to fairly and consistently deduct points.
- Using that ontology, create a template that you can use to provide feedback to students (e.g. by copy/paste or deleting). This allows you to explain each mistake in detail without having to retype the same thing over and over again. It also ensures that you give consistent feedback for each mistake people make.
- Post a synopsis of the mistakes that people made and how to correct them.
- Never give a grade without explaining why people got the grade they did.
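The ontology-plus-template idea can be as simple as a lookup table. Here is a minimal Python sketch; the mistake codes, explanations, and point values are invented for illustration, and a real course will have its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mistake:
    code: str
    explanation: str
    points: int  # points deducted for this mistake


# Hypothetical ontology built after reading through all submissions.
ONTOLOGY = {
    m.code: m
    for m in [
        Mistake("NO_BASELINE", "No baseline was reported, so the result cannot be interpreted.", 2),
        Mistake("OFF_BY_ONE", "The loop skips the last example; check the range bounds.", 1),
    ]
}


def feedback(max_points, mistake_codes):
    """Return (score, feedback text) for one submission."""
    found = [ONTOLOGY[code] for code in mistake_codes]
    score = max(0, max_points - sum(m.points for m in found))
    lines = [f"Score: {score}/{max_points}"]
    for m in found:
        lines.append(f"- (-{m.points}) {m.explanation}")
    if not found:
        lines.append("- No deductions.")
    return score, "\n".join(lines)


score, text = feedback(10, ["NO_BASELINE", "OFF_BY_ONE"])
print(text)
```

Because every deduction comes from the same table, two students who make the same mistake lose the same points and read the same explanation.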
Why did you leave the University of Maryland and then why did you go back?
I came to Colorado (where I was born) to be close to my family (especially my dad, who had some health scares) and to start a family of our own. By 2016, all went according to plan: my dad was doing much better and we had a healthy daughter. However, there were four big reasons I left Colorado: TA allocation, my wife's career, tenure, and lab space.
Who gets what TAs seems like a really minor thing. How did that affect whether you stayed at Colorado?
While other faculty in my research area taught twenty person courses or forty person courses with a TA, I was assigned a hundred person class without a TA. (More detail: This was in August 2016; in February 2016 I had a meeting where I thought we had agreed on a single TA for my 100-person class. I even sent an e-mail Jim Comey-style afterward to confirm this agreement, but only in August 2016 was I told that no such agreement was reached.)
How does your wife's career fit into all of this?
My wife was laid off while five months pregnant; afterward, we hoped that the university could help. There were open positions at the college that were a good fit, but she was rejected without an interview (they ended up not hiring any of the people they did interview and relisted the job without talking to my wife). I repeatedly asked for an explanation/discussion on the subject, but didn't get anywhere.
Were you really concerned about tenure? You had a bunch of publications, a CAREER award, the Karen Spärck Jones award, "best of" awards at NAACL, CoNLL, and NIPS. Why don't you just shut up about tenure?
I was worried about tenure. Not because I didn't think I had done okay research-wise (still a little worried about this until it was over, to be honest), but because Colorado wouldn't let me go up for tenure in the first place (I wanted it over with and I was worried Colorado was setting me up to fail on teaching to string it out three to four years, which seemed like a possibility given the lack of certainty and promises that didn't pan out).
The TA allocation issue (above) wouldn't be so bad if it also wasn't tied into whether or not I would get tenure. When I came to Colorado, the offer letter treated me like a fresh out of PhD assistant professor (despite four strong years at Maryland…other faculty from Maryland who came to Colorado CS that year did get credit toward tenure despite less time on the tenure track). Thus, I was told that I wouldn't be able to get tenure until 2021, eleven years after I started on the tenure-track (there was another non-negotiable delay because I got teaching relief when my daughter was born). After some negotiation, it seemed that there might be some wiggle room there, but only after I taught that low-level undergrad course.
However, I was stuck with a giant, new (i.e., taught for the very first time by me or anyone else) course with no TA (see above). In addition to being a lot of work, this course would determine if I would be able to go up for tenure "early" (early for Colorado, anyway: the very earliest I could get tenure at Colorado was three years after I would have gotten tenure at Maryland had I stayed). At the time, I felt like I would have a very difficult time getting tenure at Colorado.
But you announced that Colorado gave you tenure in 2017. Why wasn't that the end of it?
The same week I learned that the College of Engineering would finally get my tenure case rolling (thanks to an offer from UMD), the department moved my lab/office from a space I was perfectly happy with to a windowless basement (which I was very unhappy with). There was no discussion or consultation before the changes were announced. Also, my wife was still unhappy with career prospects in Colorado.
I came to the realization I am not the bureaucratic bare-knuckled brawler required to make it on my own at a place like Colorado, and I didn't have the kind of support a crappy negotiator like me would need to survive. Even after getting an offer from Maryland, Colorado did not write anything down on paper (i.e., no retention offer) to resolve the three outstanding issues that prevented me from being happy at Colorado: TA allocation, office/lab with natural light, and a two-body solution. While various verbal assurances were made, nothing was ever written down. In the end, it became clear we would need to find our own solution and leave Colorado; I had gotten verbal assurances from Colorado that didn't pan out before.
Why put this academic dirty laundry on your webpage?
I don't want people to make the same mistakes I did. One of my closest friends was contemplating a pre-tenure move and was about to sign a standard assistant professor contract. I doubt it would have turned out as badly as it did for me, but if I can prevent that from happening to someone else, it's worth it.
Also, I think there's value in sharing when career plans and tenure don't work out perfectly. It helps people understand the reality better than social media's rosy sample bias.
How did this make you feel?
I answered this question on Quora, which caused a bit of a stir at Colorado.
My response was fairly typical for these sorts of events, since in many ways I was grieving for what seemed to me to be the end of my academic career. First I was in denial, thinking that everything was okay and that I would get a TA for my class despite warning signs to the contrary; when there was clear evidence of a problem I thought that all of my precautions (follow-up memos, having a mentor in the room when I met with the department chair) would save me. I also put off actually meeting with the chair because I assumed that things would just work out without difficult conversations.
Once it was clear that I didn't have a TA and that my tenure plan was derailed, I was angry. I remember one evening when I was with my sleeping daughter buying more formula at King Soopers. I had just gotten an e-mail from the chair that said that they didn't believe that I was ever offered a TA (after my attempts to clear up the confusion failed, and things like my after-meeting memos were ignored). Another faculty member saw me and wanted to chat about research. I was both angry and too humiliated to share what was happening to me. I had a barely coherent conversation with them and then hurried back home to sulk and rage alone until my daughter woke up.
Then I tried bargaining to make sure that tenure would still work out for me. I asked if, given the circumstances, the actual course evaluations from this specific class could be excluded from the tenure evaluation. The chair said they passed on the request to the dean but the dean didn't reply, which the chair explained, "I never heard back [...] That only happens when the question was inappropriate". This was depressing because I wasn't important enough for the department to make a strong case for me and not important enough for the college to even respond to.
Then I got depressed. I kept going through the motions of research and teaching, but didn't do a particularly good job of either. My wife eventually got things moving with her own job search, which helped get me out of my rut.
After I accepted that I wasn't going to get tenure at Colorado without a fight and more tribulations, I started to hope that I might be able to get tenure elsewhere, which is how I eventually got out of the mess.
Name names. Who were your enemies at Colorado?
At times it felt like there was a grand conspiracy to prevent me from getting tenure (sure, we'll move up your earliest tenure date, but only if you pass the gauntlet teaching the largest course we offer without a TA! Survived that course? To the dungeon (basement) with you(r lab)!). I don't think that was actually the case. Balls were dropped, things got delayed, and people were super busy. I still have nightmares about it, and I sometimes wonder whether, if I had said the right magical incantation or been a little more assertive (e.g., shouting "written retention offer" at the top of my lungs), I might have been able to stay close to my extended family and avoid uprooting my immediate family. I think I did the best I could at the time, though, and I also believe the people around me were doing the best they could under the (unusual) circumstances. It didn't help that there was a change of leadership in both the department and college. While I still hold a grudge against the institution (which stings as a native Coloradan), I don't blame any of the people (whom I'm not mentioning by name).
There is one person I will mention by name, however: Martha Palmer. Martha is completely blameless in all of this. She is one of the finest, most honorable people I have ever met. As my mentor, she did everything in her power to help both me and my wife. One of the best things to come out of my time in Colorado was to get to know her better. Colorado is lucky to have someone like her there.
Why did you go back to Maryland?
Thankfully, in late 2016, my wife got a great job offer from UMD and six months later I was very happy to be able to follow her (there were some stressful months in between, though!). I was very excited to be returning to the great research environment with supportive senior faculty across computer science (tenure home), UMIACS, language science, and the iSchool (each chipped in for my position).
Outside of working hours, there's a lot I will miss about Colorado (where I was born and paradise on Earth!), and I am hoping for lots of opportunities to spend time there and maintain the professional connections I've made to the great faculty at Colorado. It's too bad we couldn't advance our careers in Colorado.
I had described going to Colorado as "returning home". However, Maryland made me a postdoc offer when nobody else would in 2009 (eventually two other fantastic postdoc offers made this a very difficult decision, but Maryland's was the first in a particularly brutal hiring season and thus remains psychologically significant), hired me as faculty when nobody else would in 2010, and then finally gave me a path to tenure when nobody else would. While Colorado is the place where I was born and where I have the most relatives, Maryland is my academic home, and I'm glad to return.
Why didn't you consider other places?
When I moved to Colorado, I only applied to schools in Colorado. When I left Colorado, I only got an offer from Maryland. I didn't apply to too many places because I had students on the market and didn't want to compete with them (thankfully they all got tenure track jobs!), but I got turned down from three schools in Illinois (where my wife's family is), two West Coast schools, two East Coast schools, and one school in Europe.
In hindsight, I was probably too vague in my cover letters, and I deferred discussing my tenure trauma to personal e-mails to specific faculty I knew. Several of those faculty were on sabbatical or were (unbeknownst to me) changing affiliation themselves, so the schools didn't know how motivated I was.
Apart from researchers, was there nothing you liked about Colorado the institution? Would you ever consider going back?
There was a lot that I did like. I liked using Concur for reimbursements (this seems minor, but reimbursements are a major source of stress for me, and I hate Maryland's systems with a vengeance), and the centralized grant administration did seem like a good way to handle things. I loved that RTD worked well with campus and that I could get all public transportation for free.
With the right leadership (at the department, college, and university level), fair TA allocations, livable grad student stipends, and better lab space, I would consider going back. However, having been cast out of paradise, I don't think I will be able to return.
I'm a crowdworker who did one of your tasks and I need help / don't feel things are fair / etc.
First, I'm sorry that you didn't have a good experience. We want everyone to be treated fairly, to know what is going on, and hopefully also to have fun. However, because we potentially have hundreds of people working on a single task, we are not able to have the level of personal communication that we would like to with all of our crowdworkers.
Talk to the right people in the right order. Please do not contact various people at the university before talking to us. Make sure to talk to the relevant grad student first; if after a week you don't get a reply, it's okay to ping the relevant professor.
Make it clear that you understand the task. Many of our tasks qualify workers by making sure they get the right answer on very easy questions. It is possible that your work was rejected for getting those wrong. It could be that our answer key is wrong and that your answer was correct. If so, please make it as easy as possible to understand what is going on.
We make mistakes. But we correct them when we do. Help us recognize the mistake faster, and we can help you faster. Provide context and rationale for why you think something is a mistake.
Be Patient. We don't work 24/7, and sometimes there are things that prevent us from giving you a quick reply. Please do not contact us every eight hours (although it's fine to touch back after a couple of days).
Be Professional. I have had students crying in my office because of the abuse that they've gotten from crowdworkers. Please do not do that. I also do not like crowdworkers digging up my phone number and calling me at 9PM on the weekend. This causes us to be defensive and less willing to be helpful.
I'm a recruiter for Amabooksoft Finance. When can we schedule a call?
Having tried out corporate research during my sabbatical, I can safely say that it's not for me (at least full time). The only thing that could possibly tempt me is a lab with good funding, the ability to teach/collaborate with a nearby university, excuses to visit German-speaking Europe, and freedom to publish. So unless these things are on offer, the answer is probably no.
What might be tempting for me would be: a new NLP lab in Colorado focused on publications with academic freedom, joining a publication-focused lab with academic freedom in German-speaking Europe, or a publication-focused fully remote position.
I'm interested in more creative ways of connecting up, though. Sponsor my YouTube channel (over 500k views ... where thousands of people learn about machine learning and NLP), sponsor a competition that we run for high school and college students, or let us take some old GPUs off your hands.
Oh, but this company/contractor/pet shop is different. We have a VISION. When can we grab coffee?
I'm sorry, I get a lot of these e-mails, and I'm fairly happy with my research portfolio, compensation, and flexibility. It would be unfair to my students to spend too much of my time talking to recruiters (and I've been burned in the past). If you're really serious, make it clear to me that you understand what I care about in terms of research freedom, geography, topics, and flexibility via a personal e-mail pitch.
We have a tenure-track opening at Oxbridge State, would you be interested?
Only if there's a simultaneous two-body hire and a good academic research environment (particularly with strong PhD students) that can compete with the University of Maryland in the areas of machine learning and NLP.
I have an innovative solution to cloud computing / class management / office chairs. When can we chat about how you can start as a customer?
I'm almost certainly the wrong person to talk to. In many cases, the University of Maryland has strict rules about procurement that prevent me from actually starting a contract with anybody. Moreover, I prefer development on open platforms that prevent lock-in with a single vendor. Most of the innovation that's relevant to our research comes from within the community; if your company isn't part of that ecosystem already, it's unlikely to be a good fit.
Moreover, we academics are pretty poor. We need to scrounge for cloud compute credits and our projects are often fairly bursty in development and usage.
What's your Erdös-Bacon number?
My Bacon number is 3 if TV appearances are permissible links: I appeared on Jeopardy! with Alex Trebek, who had a cameo in Short Cuts (1993) with Fred Ward, who was in Tremors (1990) with Kevin Bacon. Thus, my Erdös-Bacon number is 5, higher than Feynman and Carl Sagan, tied with Bernard Chazelle, and lower than Natalie Portman and Noam Chomsky.
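Sprichst du Deutsch? Darf ich mit dir auf Deutsch reden / e-mailen? [In English: Do you speak German? May I talk to you or e-mail you in German?]
Zuhause sprechen wir Deutsch, und meine Familie stammt aus deutschsprachigen Gebieten. Mein Deutsch ist nicht perfekt, aber ich übe gern. Wenn ich wenig Zeit habe, werde ich vielleicht deutsche E-Mails auf Englisch beantworten (sonst antworte ich nie!), aber bitte schreiben Sie weiter auf Deutsch! [In English: We speak German at home, and my family comes from German-speaking regions. My German is not perfect, but I like to practice. When I'm short on time, I may answer German e-mails in English (otherwise I'd never answer!), but please keep writing in German!]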
You seem to understand Chinese. Can I talk to you in Chinese and send you e-mail?
I'm okay at understanding spoken Chinese, and I can understand basic things if spoken slowly (I cannot understand scientific topics). However, if you talk to me in Chinese and I think you can understand English, I will likely reply in English because I'm very uncertain of my Chinese. Also, please do not send me Hanzi e-mails (particularly PDFs, which I can't easily put through Google Translate); the limit of my Chinese reading skills is finding 红烧茄子 and 小麦啤酒 on a menu. I'm otherwise nearly completely illiterate.