Why I Stopped Working on Machine Translation

Although I'm happy to engage with students who are working on machine translation problems, I'm no longer actively working on MT myself. Since I occasionally find myself explaining why, I thought I'd write a little about it as food for thought.

I have strong feelings about the overall enterprise of MT (not neural MT per se, but the whole big picture of how people are approaching the problem). After spending a decade leading a superb statistical MT research team (which included David Chiang's invention of hierarchical phrase-based translation and a generation of stellar graduates like Chris Dyer, Vlad Eidelman, and Adam Lopez), I came to the conclusion that the vast majority of the research, driven largely by the funders and their evaluation paradigms, is simply working on the wrong problem. As far as I can tell, the majority of MT work (for useful overviews, see https://www.aclweb.org/anthology/W17-3204.pdf and https://www.aclweb.org/anthology/W19-5301.pdf) still operates within the narrow box of trying to obtain a fully automatic, single-best, minimally contextualized sentence in the target language given a sentence in the source language, regardless of why a translation is being sought in the first place. In other words, the mainstream pursuit of MT as fully automatic high quality machine translation (FAHQMT) of sentences has not substantively changed since the advent of computerized methods for MT in the 1950s.

In contrast, my personal view (which is certainly open to debate!) is that both the scientific questions (to the extent they're still asked at all, which is a different conversation) and the real-world value of communicating across the language barrier would be better served by starting from the position that the goal of the enterprise is translation, defined in terms of input that includes all the available contextual information, and defined with the goal of obtaining a representation of meaning for downstream consumption that is (a) available when you need it, (b) cost effective to obtain, and (c) good enough for the specific use case. Presumably, for some use cases the best option will, in fact, be fully automatic, context-agnostic, single-best translation of an input sentence into an output sentence. But in my view that should define a sometimes-appropriate subset of solutions to the problem, not the problem itself.

This view motivates much more serious consideration of language and task context, something very different from the current overwhelmingly technology-driven approach. A big picture consistent with my view would, for example, take things like visualized output and human-in-the-loop translation (e.g. https://www.aclweb.org/anthology/N10-1078.pdf, https://www.cs.umd.edu/~changhu/publications/gi2010-hu-v2.pdf) much more seriously, rather than treating them as curiosities outside the mainstream. And since we will never have sufficient parallel data for supervised training (using paradigms like the current ones, and even with transfer learning) to achieve even minimally decent translation beyond a tiny fraction of the world's languages, the criteria I propose would trigger a serious move toward identifying widely useful inductive biases that generalize across languages, or at least across wide categories of languages --- in other words, a move back toward mainstream work guided in part by the recognition that computational modeling of human language is not co-extensive with applying machine learning architectures to text, in that the former necessarily includes some degree of insight about human language from a scientific perspective.

Just to be clear, I'm not saying there isn't insightful, interesting work going on within the narrower confines of how MT is currently defined, including neural machine translation. Some of my best friends do research in machine translation. :) Doing work in neural MT has lots of advantages, not just because of the huge momentum in the field but because of what it connects to: studying neural MT and doing research in it gives you the opportunity to encounter methods and problems that are widely applicable across NLP more generally. Indeed, the connection between the problem of translation and practically everything else in NLP is why I decided MT would be a good sandbox to start playing in in the first place. That said, the actual mainstream practice of MT research --- the things that get valued by the funders and the program committees --- is hugely dominated by the narrow problem formulation and the evaluation paradigms associated with it. (See Church and Hestness 2019 for a must-read discussion of evaluation, for MT or anything else.)

As a result, despite having built a very successful MT group, I found myself struggling to work on what I consider to be the real problems (see my position on the "goal of the enterprise", above) within an ecosystem where very different aspects of the enterprise have to be kept central if you want to succeed. I'm not a shy person, and on many occasions I took the opportunity to push the people who define the agenda to support a broader view. But after taking a fairly serious shot at it, I found that I was not able to have the kind of impact I wanted to have, and I got really tired of spending only a fraction of my (and my students') time on the problems I really cared about. And life is way too short to spend the majority of your time on stuff you don't care about. So I got out.

Now, if you're reading this, it's possibly because you are interested in doing work on machine translation, most likely neural MT. If that's the case, I'd give the following advice, which applies to MT or any other research area you're considering:

Start by deciding what you care about. Is it the idea of building stuff people in the real world will use, like the industry MT systems? Is it applying your technological skills to particular kinds of problems, like improving technology for under-represented populations? Is it getting a better understanding of how human language works? Et cetera. You don't need to pick just one thing, but you definitely should think about what matters to you, and then you should look at the field of research you're thinking about joining and the extent to which it does, or does not, spend its time on what you care about. Not its potential in that regard, but the actual time spent in the day-to-day life of the researcher.
Ask how the smaller problems are related to the larger problems. The old joke is that getting a PhD involves learning more and more about less and less until finally you know everything about nothing. Another funny but true angle on PhD research is presented visually here. A useful trick when you're thinking about what to do is to ask the question: "Then what?" That is, suppose for a moment that the smaller thing I've been focusing on (because that's the nature of things: you make progress toward the bigger things in smaller steps) has been incredibly successful --- now what? How does it get used in service of what I care about? (See previous bullet.) How does it contribute toward the next thing to work on, for me or for other people? (See next bullet.)
Drive the technology, don't let the technology drive you. Let's face it: it's incredibly easy to get seduced into just building stuff. There's some problem or task, and you see how you could do it, and these days there's probably already a really good starting point (existing datasets, packages, toolkits), so, hey, you dive in; you get some code written; you make some progress; it kind-of works; you figure out what you need to do next to improve it; you do it; now it kind-of works better; you see a cool new technological tweak in some arXiv paper that would fit in well, so you try adding it in... etc. This kind of thing is incredibly familiar to most computational people I know.
But here's the thing: (just) building stuff isn't research. In fact, (just) trying different ways to build stuff isn't research either.
Take a second and read that again. It's important. Sometimes (often, for us computational types) research involves building stuff, and conversely, sometimes building (better) stuff involves research. Unfortunately, though, these days people frequently get so caught up in being driven by the technology and building things that they forget there's a difference between the two. But remember: they are not the same thing.
What makes work research is that it contributes to generalizable knowledge. The existence of a new artifact in the world is not the same as new knowledge, even an artifact that's really good at solving a really important problem. If there's a wide stream and a way to cross it is badly needed, and you try a bunch of ways to build a bridge to get across it, and a bunch of ways fail but finally one of them works... well, you've built something, you've solved a problem, and you might in fact have made the world a better place --- but you haven't necessarily contributed anything to human knowledge. The key question is, was anything learned in the process of getting that bridge built that can be communicated to and used by others, beyond the knowledge that this particular bridge did best at this problem for this particular stream? It doesn't have to be a ton of new, generalizable knowledge, but it needs to be something, or it's not a research contribution. (There's a nice discussion of this in the context of human subjects research here, but the same notion of a contribution to generalizable knowledge is the gold standard for what research means pretty much universally.)
Go back to the previous paragraph, replace "bridge" with "system", replace "crossing the stream" with whatever your task/evaluation is about, and replace "did best at this problem" with "improved the state of the art"; now re-read it. And once you're done with that, whether you choose to do MT or anything else, get out there and do some research! (Unless you prefer to build stuff, of course. That's totally ok too, and not any less valuable or important in the world; the point here is not to fall into the trap of thinking that the two are the same thing.)

I've gone further afield than I intended, from a bit of explanation of personal research history and choices related to MT to a rather strident general commentary on the state of NLP. So I'll stop --- but first, let me observe that what I've written here, albeit possibly a bit more extreme in its expression, is not unique to me: other experienced NLP researchers are talking this way about the state of the field and trying to figure out what should be done about it. I don't know who agrees or disagrees, nor whether a majority of people would agree with this view, but there are enough of us out there that if you're in or planning to be in a PhD program involving NLP, I strongly recommend you find out what your potential advisors think.