Schedule of Topics
This is the schedule of topics for
Seminar on Corpus-based Social Science, Fall 2009.
THIS SCHEDULE IS A WORK IN PROGRESS!
In addition, some topic areas may take longer than expected, so keep
an eye on the class mailing list or e-mail me for "official"
dates.
Premise for the seminar
- September 2 Administrivia, group collaboration and discussion tools,
overview of course, narrowing down "social science" for our purposes this semester
Subjectivity and sentiment
- September 9 Subjectivity in dialogue, and sentiment analysis basics
- Tim Hawes, Computational Analysis of the Conversational Dynamics of the United States Supreme Court, UMD Master's thesis. Chapters 1, 2, 4, and 6 (Chapters 1 and 6 are very short intro/conclusion); Chapters 5 and 3 are optional (in that order of priority). Encouraged: attendance at the thesis defense, Tuesday September 8th, 10:00am - 11:30am in MMH 3416.
- Bo Pang and Lillian Lee, Opinion mining and sentiment analysis, Foundations and Trends in Information Retrieval 2(1-2), pp. 1-135, July 2008. Read up through Section 3.
- Recommended but optional: Bing Liu, Sentiment Analysis and Subjectivity, to appear in Handbook of Natural Language Processing, Second Edition, (editors: N. Indurkhya and F. J. Damerau), 2010. (Completed on April 10, 2009). Read up through Section 2 (pp. 1-16).
- September 16 Gold standards and annotation
- Theresa Wilson and Jan Wiebe, Annotating
Opinions in the World Press (SIGDIAL 2003). This is the very
early Wiebe and Wilson paper describing the original structure of the
MPQA annotation back in 2003, including how to compute interannotator
agreement in this context.
- Ruppenhofer, Somasundaran, and Wiebe, Finding
the Sources and Targets of Subjective Expressions (LREC
2008). Argues that because of issues revealed during manual
annotation, semantic role labeling alone does not suffice to extract
the participants in subjective expressions.
- Chapter 7 of Theresa Wilson's 2008 doctoral dissertation, Fine-grained
subjectivity and sentiment analysis: recognizing the intensity,
polarity, and attitudes of private states. She describes some
extensions on the original MPQA that specify "attitudes", including
opinion target annotations.
- September 23 Acquisition of subjectivity lexicons and patterns
- Optional, but worth reading through quickly for
background: Vasileios Hatzivassiloglou and Kathleen R. McKeown, Predicting the
Semantic Orientation of Adjectives (ACL 1997).
- Turney, P.D. (2002), Thumbs up or thumbs down?
Semantic orientation applied to unsupervised classification of
reviews, Proceedings of the 40th Annual Meeting of the Association
for Computational Linguistics (ACL'02), Philadelphia, Pennsylvania,
417-424. (Optionally, for a more extensive discussion and related
work, see also Turney, P.D., and Littman, M.L. (2003), Measuring
praise and criticism: Inference of semantic orientation from
association, ACM Transactions on Information Systems (TOIS), 21
(4), 315-346.)
- Riloff, E., Wiebe, J., and Wilson, T. (2003) Learning
Subjective Nouns Using Extraction Pattern Bootstrapping,
Proceedings of the Seventh Conference on Natural Language Learning
(CoNLL-2003).
- Riloff, E. and Wiebe, J. (2003) Learning
Extraction Patterns for Subjective Expressions, Proceedings of the
2003 Conference on Empirical Methods in Natural Language Processing
(EMNLP-03).
- Guang Qiu, Bing Liu, Jiajun Bu, and Chun Chen, Expanding
Domain Sentiment Lexicon through Double Propagation (IJCAI 2009).
- September 30 Sentiment labeling at the phrase level
- Theresa Wilson, Janyce Wiebe, and Paul Hoffmann, Recognizing
Contextual Polarity: An Exploration of Features for Phrase-Level
Sentiment Analysis, Computational Linguistics, September 2009, Vol. 35, No. 3: 399-433.
- Apoorv Agarwal, Fadi Biadsy, and Kathleen McKeown, Contextual
phrase-level polarity analysis using lexical affect scoring and
syntactic n-grams, In EACL2009, Athens, Greece, 2009.
- Optional: Eric Breck, Yejin Choi, and Claire Cardie, Identifying
Expressions of Opinion in Context. Twentieth International Joint
Conference on Artificial Intelligence (IJCAI), 2007.
- October 7 Financial forecasting using sentiment analysis
(Guest: Nitish Ranjan, RH Smith School of Business)
- E. Fama and K. French (1992), Common
risk factors in the returns on stocks and bonds. (Look this over for background, but Nitish
says what you'll need is mainly a "passing familiarity".)
- Paul Tetlock, In a sentimental mood,
The Economist, June 1, 2006. (Quick and useful background.)
- Seth Grimes, Event
Processing Meets Text: Reuters at Gartner, Blog posting, Sept 18, 2008. (Quick and useful background.)
- Shimon Kogan, Dimitry Levin, Bryan R. Routledge, Jacob S. Sagi,
and Noah A. Smith, Predicting
Risk from Financial Reports with Regression, In Proceedings of
the North American Association for Computational Linguistics Human
Language Technologies Conference, Boulder, CO, May/June 2009. See also the
slides from Noah's talk.
(This paper is not about sentiment analysis, per se, but it provides some nice background connecting language analysis
and finance; the talk's intro slides are really nice in this respect, as well. More generally, a nice angle on
connecting observable language to underlying real-world variables of interest, and this may also turn out to be relevant
to our discussion of "spin" later in the semester.)
- Optional: Wikipedia page on Calais. (Optionally
look over some of the references on that page.)
Identifying and tracking agendas
- October 14 Tracking online discussions
- Leskovec et al., Meme-tracking and the dynamics of the news cycle,
Proc. 15th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, 2009. (See also the
MemeTracker site, which uses the ideas in this paper to visualize the news cycle.)
- Prabowo, R., Thelwall, M., Hellsten, I., & Scharnhorst, A. (2008). Evolving debate in online communication: A graph analytical approach,
Internet Research, 18(5), 520-540.
- Recommended: Attend Leskovec's talk, Thursday Oct 15, CSIC 1115, 4pm, refreshments at 3:30pm
- October 21 Lexical framing
- Required background: Golitsinski comments on agenda setting and framing
- Recommended background:
D. A. Scheufele, Framing
as a theory of media effects, Journal of Communication,
Volume 49 Issue 1, Pages 103 - 122.
- Optional background:
Robert Entman, Framing: Toward Clarification of a Fractured Paradigm,
Journal of Communication, Volume 43 Issue 4, Pages 51 - 58, 1993.
- Gentzkow and Shapiro, 2007: What
drives media slant? Evidence from U.S. daily newspapers
- Monroe, Colaresi, and Quinn, 2009:
Fightin' words: Lexical feature selection and evaluation for identifying the content of political conflict
- Also of interest: Cardie and Wilkerson, eds., Special issue on Text Annotation for Political Science Research,
Journal of Information Technology and Politics, 5:1, 2008.
- October 28 Syntactic framing
- Stephan Greene and Philip Resnik, More Than Words: Syntactic Packaging and Implicit Sentiment, NAACL 2009, Boulder, CO, May 31 - June 5, 2009.
- Greene (2007) Spin: Lexical Semantics, Transitivity, and the Identification of Implicit Sentiment,
unpublished doctoral dissertation.
- Chapter 2 up through Section 2.3 (lexical semantics background)
- Optional: Remainder of Chapter 2 (psycholinguistics study, summarized in Greene and Resnik 2009)
- Optional: Chapter 4 up through Section 4.2 (text classification experiments, summarized in Greene and Resnik 2009)
- Chapter 4, Section 4.2 (text classification extended to ConVote corpus and min-cut approach)
- Chapter 5, Section 5.2 (future work)
- Optional, related to lexical semantics background:
- Optional, relating to the idea that syntactic constructions carry semantics:
Hmmm. Maybe it's time for me to do another
lexical semantics seminar...
Methods for connecting linguistic variables with social variables
- November 4 Simple frequency-based approaches to language use and personality
- Tausczik, Y., & Pennebaker, J.W. (2009, in press). The psychological meaning of words: LIWC and computerized text analysis methods. Journal of Language and Social Psychology.
- Optional (details): Pennebaker, J. W., Chung,
C. K., Ireland, M., Gonzales, A. L., & Booth, R. J. (2007). The
development and psychometric properties of LIWC2007. Austin, TX:
LIWC.net.
- Argamon, S., Koppel, M., Pennebaker, J. W., and Schler, J. 2009. Automatically profiling the author of an anonymous text. Commun. ACM 52, 2 (Feb. 2009), 119-123. DOI= http://doi.acm.org/10.1145/1461928.1461959. [Alternative PDF version]
- Shlomo Argamon, Jeff Dodick, Paul Chase. Language Use Reflects Scientific Methodology: A Corpus-Based Study of Peer-Reviewed Journal Articles. Scientometrics, December 2006.
- For fun: Personality analysis of Twitter feeds. Take someone you know well, and rate them on
dimensions of emotional style (upbeat/worried/angry/depressed), social style (plugged in/personable/arrogant-distant/spacy-valleygirl), and
thinking style (analytic/sensory/in-the-moment). Then analyze their tweets and let us know in class how well AnalyzeWords did.
- November 11 Bayesian inference and Gibbs sampling
This will be a short class, ending at 3pm so that
people can attend the Blackwell Lecture in Linguistics (details TBA).
- November 18 Latent Dirichlet Allocation for topic and sentiment modeling
- Griffiths, T., & Steyvers, M. (2004). Finding Scientific Topics.
Proceedings of the National Academy of Sciences, 101 (suppl. 1), 5228-5235.
- D. Blei, J. McAuliffe. Supervised topic models.
In Advances in Neural Information Processing Systems 21, 2007. [digg data] [Code]
- Further reading: Yano, T., Cohen, W. W., and Smith, N. A. 2009. Predicting
response to political blog posts with topic models. In Proceedings of
Human Language Technologies: the 2009 Annual Conference of the North
American Chapter of the Association For Computational Linguistics
(Boulder, Colorado, May 31 - June 05, 2009). Human Language Technology
Conference. Association for Computational Linguistics, Morristown, NJ,
477-485.
- November 25 No class -- Happy Thanksgiving!
- December 2 Improving category proportion estimates
- December 9 Tentative: Language use in persuasion and negotiation across languages and cultures
Philip Resnik, Associate Professor
Department of Linguistics and Institute for Advanced Computer Studies
Department of Linguistics
1401 Marie Mount Hall
University of Maryland
College Park, MD 20742 USA
UMIACS phone: (301) 405-6760
Linguistics phone: (301) 405-8903
Fax: (301) 314-2644 / (301) 405-7104
http://umiacs.umd.edu/~resnik
E-mail: resnik AT umd _DOT.GOES.HERE_ edu