Why I Stopped Working on Machine Translation

Although I'm happy to engage with students who are working on machine translation problems, I'm no longer actively working on MT myself. I've found myself explaining why on occasion, and I thought I'd write a little bit about it as food for thought.

I have strong feelings about the overall enterprise of MT (not neural MT per se, but the whole big picture of how people are approaching the problem). After spending a decade leading a superb statistical MT research team (which included David Chiang's invention of hierarchical phrase-based translation and a generation of stellar graduates like Chris Dyer, Vlad Eidelman, and Adam Lopez), I came to the conclusion that the vast majority of the research, as driven largely by the funders and their evaluation paradigms, is simply working on the wrong problem. As far as I can tell, the majority of MT work (for useful overviews see https://www.aclweb.org/anthology/W17-3204.pdf and https://www.aclweb.org/anthology/W19-5301.pdf) still basically operates within a narrow box: trying to obtain a fully automatic, single-best, minimally contextualized sentence in the target language given a sentence in the source language, regardless of the reason a translation is being sought in the first place. Which is to say that the mainstream pursuit of MT as fully automatic high quality machine translation (FAHQMT) of sentences has not really changed substantively since the advent of computerized methods for MT in the 1950s.

In contrast, my personal view (which is certainly open to debate!) is that both the scientific questions (to the extent they're still asked at all, which is a different conversation) and the real-world value related to communication across the language barrier would be better served by starting from the position that the goal of the enterprise is translation, defined in terms of input that includes all the available contextual information, and with the goal of obtaining a representation of meaning for downstream consumption that is (a) available when you need it, (b) cost effective to obtain, and (c) good enough for the specific use case. Presumably, for some use cases the best option will in fact be fully automatic, context-agnostic, single-best translation of an output sentence from an input sentence. But in my view that should define a sometimes-appropriate subset of solutions to the problem, not the problem itself.

This view motivates much more serious consideration of language and task context, something very different from the current, overwhelmingly technology-driven approach. A big picture consistent with my view would, for example, take things like visualized output and human-in-the-loop translation (e.g. https://www.aclweb.org/anthology/N10-1078.pdf, https://www.cs.umd.edu/~changhu/publications/gi2010-hu-v2.pdf) much more seriously, rather than leaving them largely as curiosities outside the mainstream. And since we will never have sufficient parallel data for supervised training (using paradigms like the current ones, even with transfer learning) for even minimally decent translation beyond a tiny fraction of the world's languages, the criteria I propose would trigger a serious move toward identifying widely useful inductive biases that generalize across languages, or at least across wide categories of languages --- in other words, a move back toward mainstream work guided in part by the recognition that computational modeling of human language is not co-extensive with applying machine learning architectures to text, since the former necessarily includes some degree of insight about human language from a scientific perspective.

Just to be clear, I'm not saying there isn't insightful, interesting work going on within the narrower confines of how MT is currently defined, including neural machine translation. Some of my best friends do research in machine translation. :) Working on neural MT has lots of advantages, not just because of the huge momentum in the field but because of what it connects to: studying neural MT and doing research in it creates opportunities to encounter methods and problems that apply to NLP much more generally. Indeed, the connection between the problem of translation and practically everything else in NLP is why I decided MT would be a good sandbox to start playing in in the first place. That said, the actual mainstream practice of MT research --- the things that get valued by the funders and the program committees --- is hugely dominated by the narrow problem formulation and the evaluation paradigms associated with it. (See Church and Hestness 2019 for a must-read discussion of evaluation, for MT or anything else.)

As a result, despite having built a very successful MT group, I found myself struggling to work on what I consider to be the real problems (see my position on the "goal of the enterprise", above) within an ecosystem where other, very different aspects of the enterprise have to be kept central if you want to succeed. I'm not a shy person, and on many occasions I took the opportunity to try to push the people who define the agenda to support a broader view. But after taking a fairly serious shot at it, I found that I was not able to have the kind of impact I want to have, and I got really tired of spending only a fraction of my (and my students') time on the problems I really cared about. And life is way too short to spend the majority of your time on stuff you don't care about. So I got out.

Now, if you're reading this, it's possibly because you are interested in doing work on machine translation, most likely neural MT. If that's the case, I'd give the following advice, which applies to MT or any other research area you're considering:

I've gone further afield than I intended, from a bit of explanation of personal research history and choices related to MT to a rather strident general commentary on the state of NLP. So I'll stop --- but first, let me observe that what I've written here, albeit possibly a bit more extreme in its expression, is not unique to me: other experienced NLP researchers are talking this way about the state of the field and trying to figure out what should be done about it. I don't know who agrees or disagrees, nor whether a majority of people would share this view, but there are enough of us out there that if you're in, or planning to be in, a PhD program involving NLP, I strongly recommend you see what your potential advisors think.