This is the schedule for Advanced Seminar in Computational Linguistics: Computational Social Science, Fall 2015.
THIS SCHEDULE IS A WORK IN PROGRESS!
In addition, some topic areas may take longer than expected, so keep an eye on the class mailing list or e-mail me for "official" dates.
Also note that some links point to pay-for-access publishers, but the links are accessible for free from UMD IP addresses.
I've organized this week's readings into two groups: readings that we can use to structure the discussion, and readings that will inform the discussion. There are a lot of readings in the latter group, but none of them is technically complex (not even the emotional contagion paper) and most of them are quite short, so please at least look them over.
Readings to structure the discussion
Readings that should also inform the discussion
With regard to computational social science, the Supreme Court is a great area of study. There are large, publicly available sources of data that include voluminous language of various types (e.g., merits briefs, amicus briefs, majority and minority opinions, oral arguments), along with tons of metadata (vote data, age, party of the appointing president, etc.).
And from the perspective of this seminar, the Supreme Court is a fascinating area of study because in this setting the connection between language and mental state is paramount. What can language tell us about the underlying opinion or ideology of justices or the people arguing the case? To what extent do we see linguistic evidence of influence or persuasion? What approach do opposing sides take to framing the same issue? How might power relationships be reflected in language use?
Some additional notes on ideal point models
For those who find it helpful to understand the intuition behind the model, the first section of Sim et al. (2015) has a nice summary of previous ideas (see next week's readings). See also Section 1 of Bafumi et al. http://www.stat.columbia.edu/~gelman/research/published/171.pdf, which has a nice explanation of what this kind of model is doing.
For those who want to see the mathematical discussion in a little more detail, Clinton et al. http://politics.as.nyu.edu/docs/IO/4756/jackman_nemp.pdf (Section 3) fleshes out the discussion in Martin and Quinn Section 3.1 where they formalize the justice's decision process. See also Clinton et al. (2004), http://www.cs.princeton.edu/courses/archive/fall09/cos597A/papers/ClintonJackmanRivers2004.pdf.
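As a rough companion to the Clinton et al. discussion, here is one standard way to write the random-utility derivation behind these models. The notation is my own shorthand for this sketch (not copied from either paper): justice or legislator i has ideal point x_i, the "yes" and "no" outcomes of case j sit at positions zeta_j and psi_j, utilities are quadratic in distance plus noise, and the noise difference is assumed standard normal, which gives a probit model that is linear in x_i.

```latex
\begin{align}
U_i(\zeta_j) &= -\lVert x_i - \zeta_j \rVert^2 + \eta_{ij},
\qquad
U_i(\psi_j) = -\lVert x_i - \psi_j \rVert^2 + \nu_{ij} \\
y^*_{ij} &= U_i(\zeta_j) - U_i(\psi_j)
  = 2(\zeta_j - \psi_j)^\top x_i
    - \left(\zeta_j^\top \zeta_j - \psi_j^\top \psi_j\right)
    + (\eta_{ij} - \nu_{ij}) \\
&= \beta_j^\top x_i - \alpha_j + \varepsilon_{ij},
\qquad \beta_j = 2(\zeta_j - \psi_j),
\quad \alpha_j = \zeta_j^\top \zeta_j - \psi_j^\top \psi_j \\
\Pr(y_{ij} = 1) &= \Pr(y^*_{ij} > 0) = \Phi\!\left(\beta_j^\top x_i - \alpha_j\right)
\quad \text{if } \varepsilon_{ij} \sim \mathcal{N}(0, 1)
\end{align}
```

The squared terms x_i^T x_i cancel in the difference, which is why the vote probability ends up depending on the ideal point only through a case-specific intercept and slope.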
Ideal point models in political science are related to item response theory (IRT), which is discussed in the educational assessment literature: the probability of a "yes" vote as a function of a voter's ideal point, in politics, is analogous to the probability of giving a correct answer on a test as a function of the test-taker's ability. There is a nice discussion of IRT at Partchev (2004), https://www.metheval.uni-jena.de/irt/VisualIRT.pdf; see in particular the 2PL model (Section 5). Slides 22-23 at http://jonathantemplin.com/files/irt/irt11icpsr/irt11icpsr_lecture14.pdf derive the form of the model we're looking at from the 2PL IRT model.
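To make the IRT analogy concrete, here is a minimal Python sketch, assuming the logistic form of the 2PL model (ability theta, discrimination a, difficulty b) and a logistic ideal point model with a case intercept alpha and slope beta. The function names and parameter values are hypothetical illustrations, not taken from any of the readings; note that Martin and Quinn use a probit link, where the same reparameterization goes through with the normal CDF in place of the logistic.

```python
import math


def two_pl_prob(theta: float, a: float, b: float) -> float:
    """2PL IRT: probability of a correct answer given ability theta,
    item discrimination a, and item difficulty b (logistic link)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def ideal_point_prob(theta: float, alpha: float, beta: float) -> float:
    """Logistic ideal point model: probability of a 'yes' vote given
    ideal point theta, case intercept alpha, and case slope beta."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * theta)))


def two_pl_to_ideal_point(a: float, b: float) -> tuple[float, float]:
    """Reparameterize a*(theta - b) as alpha + beta*theta,
    i.e. alpha = -a*b and beta = a."""
    return -a * b, a


if __name__ == "__main__":
    a, b = 1.5, 0.4  # hypothetical item/case parameters
    alpha, beta = two_pl_to_ideal_point(a, b)
    for theta in (-2.0, 0.0, 0.4, 2.0):
        p_irt = two_pl_prob(theta, a, b)
        p_vote = ideal_point_prob(theta, alpha, beta)
        print(f"theta={theta:+.1f}  P={p_irt:.3f}  "
              f"match={math.isclose(p_irt, p_vote)}")
```

The "difficulty" b plays the role of the cut point on the ideological scale (at theta = b the probability is exactly 0.5), and the "discrimination" a plays the role of how sharply a case separates justices on that scale.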
We have access to pretty much anything one could ask for with regard to the Supreme Court: judge-case-level metadata, case-level metadata, processed opinion content, merits briefs, amicus briefs, transcripts of oral arguments. Here are a few useful links.
More generally, there are lots of really interesting sources out there for code and data. One nice compendium I've found is from Bicoastal Datafest: analyzing money's influence on politics, which includes a nice list of well-defined project ideas as well as pointers to projects that were done, along with great lists of data and tools.
Framing, on the other hand, is not about what gets talked about but how. Entman (1993) writes that "framing essentially involves selection and salience. To frame is to select some aspects of a perceived reality and make them more salient in a communicating text, in such a way as to promote a particular problem definition, causal interpretation, moral evaluation, and/or treatment recommendation for the item described."
There is a truly extensive literature on both of these topics, enough for an entire course in itself. This week we will get familiar with one widely cited discussion, Scheufele's article considering these concepts from the perspective of cognitive effects. Then we'll look at two computational papers related to these concepts. For agenda setting, we discuss Leskovec et al., a very innovative contribution to understanding how "memes" get on the radar in news and blogs. For framing, we cover Nguyen et al., who extend the idea of using topic models to define ideal point dimensions (cf. Lauderdale and Clark, last week) to a hierarchical topic model inspired by the treatment of framing as second-level agenda setting.
Also of interest:
Also of interest:
The papers this week look at the ability to identify depression based on people's behavior in social media. Issues to watch for: How is ground truth being defined? What signals are being included as potential features? How does the experimental setting relate to what one would actually need to accomplish for this to be useful in the real world?
Also of interest:
Also of interest:
Editorial Summary: Diagnostics: Automated speech analysis predicts later psychosis. A computer program that analyses natural speech could help predict the onset of psychosis in young people at risk. People with schizophrenia have subtle disorganization in speech, even before they first develop psychosis. In a collaboration between IBM, Columbia University Medical Center, and researchers in South America, an automated program that simulates how the human brain understands language was used to analyze interview transcripts from 34 ‘at risk’ youths. Decrease in the flow of meaning from one spoken phrase to the next, and grammatical markers of speech complexity, identified the five individuals who later developed psychosis. The computer program outperformed clinical assessments in predicting psychosis. While numbers are small in this proof-of-principle study, the authors suggest automated analysis could lay the foundation for a simple clinical test of emerging schizophrenia, which would inform early intervention.
More on psychosis
Also of interest:
Also of interest (in particular, look over the first two if you have time):