MPQA Releases - Corpus and Opinion Recognition System
MPQA Opinion Corpus
annotated for opinions and sentiments
The MPQA Opinion Corpus contains news articles from a wide
variety of news sources, manually annotated for opinions and other
private states (e.g., beliefs, emotions, sentiments, speculations).
The corpus was initially collected and annotated as part of the summer 2002
NRRC Workshop on Multi-Perspective Question Answering (MPQA)
sponsored by ARDA. To learn more about the subjectivity and sentiment research that produced MPQA, please visit Dr. Janyce Wiebe's
page of related publications and the CERATOPS site.
To download the MPQA Opinion Corpus, click here.
OpinionFinder
OpinionFinder is a system that processes documents and automatically
identifies subjective sentences as well as various aspects of subjectivity within
sentences, including agents who are sources of opinion, direct subjective
expressions and speech events, and sentiment expressions. OpinionFinder
was developed by researchers at the University of Pittsburgh, Cornell
University, and the University of Utah.
In addition to OpinionFinder, we are also releasing the automatic annotations
produced by running OpinionFinder on a subset of the Penn Treebank.
To go to the OpinionFinder download page, click here.
Please note that OpinionFinder only runs on Linux.
Subjectivity Lexicon
The list of subjectivity clues (the subjectivity lexicon), which is part
of OpinionFinder and was supported in part by NSF Grants
IIS-0208798 and IIS-0208985, is also available for separate download. These clues were compiled
from several sources (see the enclosed README) and were used in:
Theresa Wilson, Janyce Wiebe, and Paul Hoffmann (2005). Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis. Proc. of HLT-EMNLP-2005.
Manual Subjectivity Sense Annotations
The gold-standard manual subjectivity sense annotations used in Wiebe and Mihalcea (COLING-ACL 2006) and the larger set of annotations used in Gyamfi, Wiebe, Mihalcea, and Akkaya (NAACL-HLT 2009) are both available for download. To download them, click here, fill out the form, and choose the desired dataset. Both annotation efforts follow the annotation scheme described in Wiebe and Mihalcea (2006) and use WordNet 2.0 as the sense inventory. Further information on the data can be found in the README of the downloaded archive.
Janyce Wiebe and Rada Mihalcea (2006). Word Sense and Subjectivity. Joint Conference of the International Committee on Computational Linguistics and the Association for Computational Linguistics (COLING-ACL 2006).
Yaw Gyamfi, Janyce Wiebe, Rada Mihalcea, and Cem Akkaya (2009). Integrating Knowledge for Subjectivity Sense Labeling. Joint Conference of the North American Chapter of the Association for Computational Linguistics and the Human Language Technologies Conference (NAACL-HLT 2009).