Readings


This is the schedule for Advanced Seminar in Computational Linguistics: Computational Social Science, Fall 2015.

THIS SCHEDULE IS A WORK IN PROGRESS!
In addition, some topic areas may take longer than expected, so keep an eye on the class mailing list or e-mail me for "official" dates.

Also note that some links point to pay-for-access publishers, but the links are accessible for free from UMD IP addresses.



Sep 2.


Sep 9.

Big picture questions about computational social science: scientific and social tensions

Before delving into particular methods for computational social science, let's start with a few of the big picture questions. There are two fundamental tensions that arise repeatedly when people talk about computational social science. The first is a scientific tension connected with the idea of using analysis of large, naturally occurring datasets to do science, as contrasted with more traditional theoretical and experimental methods. The second is a social tension connected with the idea that machine learning and statistical modeling can be misleading or, worse, can further institutionalize existing societal biases in the automated infrastructure of the future.

I've organized this week's readings into two groups: readings that we can use to structure the discussion and readings that will inform the discussion. There are a lot of readings listed in the latter group, but none of them have any technical complexity (not even the emotional contagion paper) and most of them are quite short, so please at least look them over.

Readings to structure the discussion

Readings that should also inform the discussion


Sep 16.

Ideal point models and the Supreme Court

The Supreme Court is a fascinating domain of study for computational social science. From a social perspective, Supreme Court decisions literally have the power to shape the future of society. From a scientific perspective, there is a long tradition of research in political science asking fundamental questions about the role of the Supreme Court and the nature of its decision making -- as just one example, do amicus ("friend of the court") briefs actually influence justices' decisions?

With regard to computational social science, the Supreme Court is a great area of study. There are large, available sources of data that include voluminous language of various types (e.g. merits briefs, amicus briefs, majority and minority opinions, oral arguments), along with tons of metadata (vote data, age, party of the appointing president, etc.).

And from the perspective of this seminar, the Supreme Court is a fascinating area of study because in this setting the connection between language and mental state is paramount. What can language tell us about the underlying opinion or ideology of justices or the people arguing the case? To what extent do we see linguistic evidence of influence or persuasion? What approach do opposing sides take to framing the same issue? How might power relationships be reflected in language use?

Appetizers

Main course

Some additional notes on ideal point models

For those who are helped a lot by understanding the intuitive basis for the model, the first section of Sim et al. (2015) has a nice summary of previous ideas (see next week's readings). See also Section 1 of Bafumi et al. http://www.stat.columbia.edu/~gelman/research/published/171.pdf, which has a nice explanation of what this kind of model is doing.

For those who want to see the mathematical discussion in a little more detail, Clinton et al. http://politics.as.nyu.edu/docs/IO/4756/jackman_nemp.pdf (Section 3) fleshes out the discussion in Martin and Quinn Section 3.1 where they formalize the justice's decision process. See also Clinton et al. (2004), http://www.cs.princeton.edu/courses/archive/fall09/cos597A/papers/ClintonJackmanRivers2004.pdf.

Ideal point models in political science are related to item response theory (IRT), which is discussed in the educational assessment literature: in politics, the probability of a yes/no vote as a function of ideological position (the ideal point) is analogous to the probability of giving a correct answer on a test as a function of your ability. There is a nice discussion of IRT at Partchev (2004), https://www.metheval.uni-jena.de/irt/VisualIRT.pdf; see in particular the 2PL model (Section 5). Slides 22-23 at http://jonathantemplin.com/files/irt/irt11icpsr/irt11icpsr_lecture14.pdf derive the form of the model we're looking at from the 2PL IRT model.
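
To make the analogy concrete, here is a minimal sketch, not taken from any of the readings, of how a 2PL item response function can be rewritten as a one-dimensional ideal point vote probability. It assumes a logistic link throughout (some ideal point papers use a probit link instead), and the function and parameter names (two_pl, ideal_point_vote, theta, a, b, x, beta, alpha) are hypothetical choices for illustration only.

```python
import math


def two_pl(theta, a, b):
    """2PL IRT: probability of a correct answer.

    theta: test-taker ability
    a:     item discrimination
    b:     item difficulty
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def ideal_point_vote(x, beta, alpha):
    """One-dimensional ideal point model with a logistic link (an assumption).

    x:     voter's ideal point (plays the role of ability)
    beta:  how sharply the vote separates voters (plays the role of discrimination)
    alpha: vote-specific offset (discrimination times difficulty)
    """
    return 1.0 / (1.0 + math.exp(-(beta * x - alpha)))


# The two forms agree when beta = a and alpha = a * b,
# because a * (theta - b) = a * theta - a * b.
theta, a, b = 0.5, 1.2, -0.3
p_irt = two_pl(theta, a, b)
p_vote = ideal_point_vote(theta, beta=a, alpha=a * b)
print(p_irt, p_vote)
```

Under this reparameterization, a voter whose ideal point equals b (the "difficulty" of the vote) is equally likely to vote yes or no, just as a test-taker whose ability equals the item difficulty has a 50% chance of answering correctly.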

Data resources

We have access to pretty much anything one could ask for with regard to the Supreme Court: judge-case-level metadata, case-level metadata, processed opinion content, merits briefs, amicus briefs, transcripts of oral arguments. Here are a few useful links.

More generally, there are lots of really interesting sources out there for code and data. One nice compendium I've found is from Bicoastal Datafest: analyzing money's influence on politics, which includes a nice list of well defined project ideas as well as pointers to projects that were done, along with great lists of data and tools.


Sep 23.

More on ideal point models and the Supreme Court

This week we will continue with last week's topic, the Supreme Court.

Sep 30.

Agenda setting and framing

Two central concepts in the study of communications, particularly political communication, are agenda setting and framing. Roughly speaking, agenda setting is about what content gets brought to public attention, traditionally with a focus on political elites and the media as communicators; the classic quote from Cohen (1963) reads that the press "may not be successful much of the time in telling people what to think, but it is stunningly successful in telling its readers what to think about. The world will look different to different people depending on the map that is drawn for them by writers, editors, and publishers of the paper they read."

Framing, on the other hand, is not about what gets talked about but how; Entman (1993) writes that "framing essentially involves selection and salience. To frame is to select some aspects of a perceived reality and make them more salient in a communicating text, in such a way as to promote a particular problem definition, causal interpretation, moral evaluation, and/or treatment recommendation for the item described."

There is a truly extensive literature on both of these topics -- enough for an entire course in itself. This week we will get familiar with one widely cited discussion, Scheufele's article considering these concepts from the perspective of cognitive effects. Then we'll look at two computational papers related to these concepts. Related to agenda setting, we discuss Leskovec et al., which was a very innovative development in understanding how "memes" get on the radar in news and blogs. Related to framing, we cover Nguyen et al., who develop a model extending the idea of using topic models to define ideal point dimensions (cf. Lauderdale and Clark, last week) to a hierarchical topic model inspired by the treatment of framing as second level agenda setting.

Readings

Also of interest:


Oct 7.

Agendas and influence in debates and other conversations

Also of interest


Oct 14.

Ideological bias

Readings


Oct 21.

Bridge: All Politics is Local Psychological


Oct 28.

Personality

One of the better studied applications of language analysis to psychology is the assessment of personality.

Also of interest


Nov 4.

Social media analysis: depression

Mental health problems are among the most pressing challenges we face. The numbers in the U.S. alone are staggering: to cite just a few, between 1996 and 2011, annual expenditures on mental disorders rose from $35.2B to $113B, some 25 million American adults will have an episode of major depression this year, suicide is the third leading cause of death for people between 10 and 24 years old, and 89.3 million Americans live in federally-designated Mental Health Professional Shortage Areas.

The papers this week look at the ability to identify depression based on people's behavior in social media. Issues to watch for: How is ground truth being defined? What signals are being included as potential features? How does the experimental setting relate to what one would actually need to accomplish for this to be useful in the real world?

Also of interest:


Nov 11.

Clinical assessments of patient language

Last week we focused on social media analysis and what it can tell us about people's mental state, particularly with respect to depression. This week we shift to the analysis of language collected in clinical settings, such as therapist-patient sessions, cognitive assessment tasks, and patient interviews.

Clinician-patient interactions

Psychosis


Nov 18.

More on clinical assessments of patient language

Given the interest from last week, we'll talk more about clinical assessments.

More on psychosis

Cognitive assessments


Dec 2.

Interpersonal relationships

Also of interest; in particular look over the first two if you have time.