Advanced Seminar in Computational Linguistics: Computational Social Science
Back in 2009, Mark Liberman argued on Language Log that "corpus based social science" was poised to go mainstream, despite a general historical tendency toward "linguistic anemia" in the social sciences. As computational linguists, we have known for the past several decades that language use in corpora can serve as a useful proxy for world knowledge. With so much of people's lives going online, it was time to start imagining technology that would exploit large-scale language use as evidence for people's individual properties, behaviors, and social interactions, as well.
There is now no doubt that Mark was right. A September 9, 2013 Time magazine article entitled "What Twitter Says to Linguists" mentions that "upwards of 150 Twitter-based studies have come out in 2013 so far", and has a nice quote from Jacob Eisenstein, whose work is featured in the article: "Language is really a window into people's sense of personal identity".
Now, "social science" is an unmanageably huge topic. The Wikipedia page on Social Science includes disciplines ranging from archaeology to social work. In this seminar, we'll narrow the field somewhat. I'm particularly interested in the idea of perspective, in the Oxford Dictionary sense of "a particular attitude toward or way of regarding something; a point of view". This seems to me to be a focal issue in the social sciences: our perspectives help to define who we are, and they influence the decisions we make and the nature of the other individuals and groups with which we consider ourselves associated. Within computational linguistics, sentiment analysis is a familiar subset of perspective emphasizing the way that language communicates positive, neutral, and negative polarity, but there are many other ways of looking at perspective that are to my mind equally (and more!) interesting.
With that in mind, I'm going to organize the content primarily, although perhaps not exclusively, around two main themes.
This seminar will mainly involve readings and in-class discussion, helped along by participation in discussions on Piazza. Grades will be based on participation (30%), which includes leading class discussions, and on a term paper/project (70%). I hope to encourage hands-on projects that involve real problems, aiming for papers suitable for submission to appropriate conferences.
Philip Resnik, Professor
Department of Linguistics and Institute for Advanced Computer Studies
Department of Linguistics
1401 Marie Mount Hall
University of Maryland
College Park, MD 20742 USA
UMIACS phone: (301) 405-6760
Linguistics phone: (301) 405-8903
Fax: (301) 314-2644 / (301) 405-7104
http://umiacs.umd.edu/~resnik
E-mail: resnik AT umd _DOT.GOES.HERE_ edu