We are excited to finally be wrapping up an alpha version of VoxPop for the Parsons thesis exhibition. Come by the Sheila C. Johnson Design Center, 66 5th Ave (right at 13th St.) between 6 and 8 PM on Tuesday, June 1st, to see our installation and enjoy the show’s opening reception. The gallery will also be open to the public through Saturday from noon to 6 PM every day. More information can be found on the exhibition website.
While we aren’t quite ready to open up VoxPop publicly on the web (we still have a bit of QA and scaling to work out), here is a sneak peek of what we have cooking. Be sure to swing by Parsons School of Design over the next few days to play with it a bit.
We have a first pass at our thesis show installation proposal. VoxPop would be displayed on a Samsung 47″ HD Screen through a Mac Mini housed within a pedestal supporting a Wacom Bamboo Touch Tablet, allowing viewers to interact with the web application.
The New York Times online consistently delivers interesting data visualizations to help enrich the stories surrounding popular news topics. The New York Times Innovation Portfolio provides a beautiful overview of all of these interactive explorations, organized by topic with project overviews, documentation, and links to the actual interactive pieces.
Since VoxPop is working with New York Times data, this collection of existing data visualizations is a treasure trove of strong precedents, several of which relate very closely to our project. Here are a handful that live within the realm of reader sentiment.
Health Care Debate is a conversation platform that allows users to discuss various issues within the health care debate. The most interesting aspect of this tool is how the relevance of specific sub-topics within the debate can be comprehended at first glance, with the surface of the tool depicting multiple “rooms” that are scaled relative to the number of comments relating to that sub-topic.
This interactive video of Obama’s speech to the Muslim world allows users to provide comments along the timeline of the speech, allowing a global discussion to unfold in the context of the time-based content that is seeding the discussion.
Election Word Train asked New York Times readers to share one word describing their state of mind on the day of the 2008 presidential election. Much like a tag cloud, words are scaled relative to the number of people sharing the sentiment, and can be filtered to show words shared by Obama or McCain supporters. Leveraging scale and letting these words ‘speak for themselves’ does effectively provide a general glimpse of reader sentiment, even if the forum is somewhat contrived: its goal of reducing group sentiment to a few dozen words possibly hinders truly organic sentiment visualization.
Inaugural Words ranks the frequency of words used by presidents in Inaugural Addresses, showing which words each president used the most. While it does not really reflect reader sentiment, it does show an interesting breakdown of word frequency across time and political position.
The Twitter Bowl interactive visualization maps Twitter chatter over the course of the 2009 Super Bowl, according to key topic mentions. This hits an interesting cross section of communicating time, space, and group sentiment, even if it is somewhat cryptic in what is actually being communicated. There is something very satisfying about seeing topics grow and shrink geographically over time, although it does not reveal what specifically about “steelers” or “ads” or “springsteen” people are sharing.
These projects all have several aspects that are worth analyzing and building upon. As we begin to re-think how people engage with the news, it’s exciting to see major players like the New York Times continuing to push the envelope and keeping their data open so that others can do the same.
The last design iteration I wrote about a couple of weeks ago started to depart from earlier iterations by exploring the idea of representing every comment on the New York Times website (that relates to any given topic) as its own entity, and visually describing its sentiment or personality.
Reflecting back on our original reasons for wanting to visualize online discussions, our thesis question really centers on how the ‘Vox Populi’ can still be heard as reader participation in the journalistic process scales to hundreds of thousands of comments spread across hundreds of articles and blog posts for even just one news source.
So this resulted in a design prototype that involved rendering comments for a given topic as balls swarming around the article that seeded the conversation, representing sentiment with color, opacity, and speed of movement, describing each comment’s polarity (how positive or negative), strength (strong or weak) and activity (active or passive) respectively.
While this iteration was both readable and interesting to look at, it suffered in terms of scalability. We could realistically only look at a couple of conversations at a time for any given topic. So the next phase of the design process involved trying to pull some of the more successful aspects of this iteration into a more real-estate-friendly composition. The logical progression of this involved breaking the conversation into a linear organization (none of the following mock-ups visualize real data; they serve as design explorations).
Of course, horizontal flows of information are rarely web-friendly, despite being a logical way to organize content chronologically. So this quickly evolved into a vertical orientation, and opened the door to exploring the concept of showing when commenters reference each other within a conversation.
While exploring existing sentiment analysis processes, we stumbled across what looks like a fully integrated open source solution to several issues identified in our recent round of research.
OpinionFinder appears to be hosted and primarily developed at the University of Pittsburgh with contributions from Cornell University and University of Utah. While the OpinionFinder system was only mentioned off hand in Bo Pang’s article Opinion Mining and Sentiment Analysis, it appears to include some of the best solutions available for a lot of the common challenges that accompany effective sentiment analysis.
OpinionFinder, which was initially released in 2006, employs a multi-stage NLP process. As stated in the project’s extended abstract,
“OpinionFinder aims to identify subjective sentences and to mark various aspects of subjectivity in these sentences, including the source (holder) of the subjectivity and words that are included in phrases expressing positive or negative sentiments.”
Working in “batch” mode as more of a back-end pipe, OpinionFinder works as follows:
Document Processing
Taking any incoming text source, HTML or XML meta info is removed, and sentences are split and POS tagged using OpenNLP. Next, stemming is accomplished using Steven Abney’s SCOL v1K stemmer program. SUNDANCE (Sentence UNDerstanding And Concept Extraction), a partial parser from the NLP laboratory at the University of Utah, is used by Autoslog-TS to identify extraction patterns needed by the sentence classifiers and the SourceFinder (which identifies the source of subjective content, distinguishing author statements from related or quoted statements). A final parse in batch mode establishes constituency parse trees which are converted to dependency parse trees for Named Entity and subject detection.
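To make the first two steps of this stage concrete, here is a minimal sketch of markup removal and sentence splitting using only the Python standard library. This is not OpinionFinder's code: the helper names (`strip_markup`, `split_sentences`, `preprocess`) are hypothetical, and the regex splitter is a crude stand-in for the OpenNLP sentence detector. POS tagging, stemming, and parsing are left out entirely.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and drops every tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_markup(raw):
    """Remove HTML/XML tags and collapse runs of whitespace."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()
    return re.sub(r"\s+", " ", "".join(parser.chunks)).strip()


def split_sentences(text):
    """Naive rule: break after '.', '!' or '?' when whitespace follows."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def preprocess(raw):
    """Markup removal followed by sentence splitting."""
    return split_sentences(strip_markup(raw))


sentences = preprocess("<p>I like this plan. It will <b>not</b> work!</p>")
```

A real pipeline would hand each of these sentences on to a tagger and parser; the sketch only shows where the sentence boundaries come from.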
Subjectivity and Sentiment Analysis
At this point a Naive Bayes classifier identifies subjective sentences. The specs seem to indicate that the classifier is trained against subjective and objective sentences generated by two additional “rule-based” (unsupervised?) classifiers drawing from “a large corpus.” This point in the process will require some exploration and validation.
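For readers unfamiliar with Naive Bayes, the sketch below shows the general idea on a toy scale: a multinomial classifier with add-one smoothing that labels a sentence as subjective or objective. It is an illustration only, not OpinionFinder's classifier. The class name, the bag-of-words features, and the six training sentences are all invented for the example; the real system trains on far larger, automatically generated data.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of word occurrences
        self.doc_counts = Counter()  # label -> number of training sentences
        self.vocab = set()

    def train(self, labelled_sentences):
        for sentence, label in labelled_sentences:
            words = sentence.lower().split()
            self.word_counts.setdefault(label, Counter()).update(words)
            self.doc_counts[label] += 1
            self.vocab.update(words)

    def classify(self, sentence):
        words = sentence.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, float("-inf")
        for label, counts in self.word_counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for word in words:
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training sentences, for illustration only.
TRAINING = [
    ("i think this policy is terrible", "subjective"),
    ("i love the new plan", "subjective"),
    ("this is a wonderful idea", "subjective"),
    ("the bill passed on tuesday", "objective"),
    ("the senate voted on the plan", "objective"),
    ("the report was released today", "objective"),
]

clf = NaiveBayes()
clf.train(TRAINING)
label = clf.classify("i think the plan is wonderful")
```

Only sentences labelled subjective would then be passed on to the sentiment stage.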
Next a direct subjective expression and speech event classifier, built by Eric Breck, tags the direct subjective expressions and speech events found within the document using WordNet.
The final step applies actual sentiment analysis to sentences that have been identified as subjective. This is accomplished with two classifiers that were developed using the BoosTexter machine learning program and trained on the MPQA Corpus.
Evaluation
While we still need to rigorously explore the source code, this system appears to be a gold mine of solutions to both previously unresolved and newly discovered issues in our sentiment analysis process. Named Entity detection along with dependency parse trees will help us filter content to only include sentiment regarding the actual topic being explored (rather than visualizing all subjective content in a comment) as well as helping to reveal popular related topics that exist within any given topic of discussion.
Subjectivity detection and Speech Event Classification are challenges that are acknowledged in a lot of research on the topic of sentiment analysis, but comprehensive solutions have been much more difficult to come by. This system seems to combine a few processes towards those goals (including leveraging WordNet in a new way), and again could really help us filter down our corpus to relevant statements of sentiment for a given topic.
Finally the actual positive/negative sentiment analysis that is applied to subjective sentences is different than any other process I have read about (most including WordNet and trained classifiers, or our original ad hoc method of matching against the General Inquirer Dictionary). We might want to experiment a bit with this phase to see how more or less effective different methods are.
One process that is surprisingly absent from the OpinionFinder system is any sort of negation detection. We may want to explore possibly integrating the algorithm Bruno Ohana experimented with in his dissertation on sentiment analysis, or investigate other solutions.
It may also be interesting to see how things change if we begin to stack some of the processes used by OpinionFinder with systems that we already have in place, such as our GI Osgood Emotive Assignments.
You can download OpinionFinder for free from the project’s website under an open academic license, or download a PDF of the extended abstract/description of the project here:
One issue for accurate sentiment analysis identified in a recent round of research is the problem of negation detection. Here a negating word (such as ‘not’) inverts the evaluative value of an affective word (for example, “not good” is similar to saying “bad”). This can be handled in natural language processing by identifying negating words, and then inverting the value of any positive or negative word within n words of the negating word, where n is the window of potential negation.
In Bruno Ohana’s 2009 dissertation “Opinion Mining with the SentiWordNet Lexical Resource” (Dublin Institute of Technology), a Python algorithm is presented to perform this task. While Ohana’s tests with this negation detection algorithm only yielded accuracy improvements of about 0.5%, this might be a good starting point for further exploration.
#
# Populates an array of negated terms based on document terms.
# vNEG[i] == 1 indicates that the term in doc[i] is negated.
# Each entry of doc is assumed to be a 'word/TAG' token.
#
def getNegationArray(doc, windowsize):
    PSEUDO = ('no increase', 'no wonder', 'no change', 'not cause',
              'not only', 'not necessarily')
    PRENEGATION = ('not', 'no', 'n\'t', 'cannot', 'declined',
                   'denied', 'denies', 'free of', 'fails to', 'no evidence',
                   'no new', 'no sign', 'no suspicious', 'no suggestion',
                   'rather than', 'with no', 'unremarkable', 'without',
                   'rules out', 'ruled out', 'rule out')
    POSNEGATION = ('unlikely', 'free', 'ruled out')
    ENDOFWINDOW = ('.', ':', ',', 'but', 'however', 'nevertheless',
                   'yet', 'though', 'although', 'still', 'aside from',
                   'except', 'apart from')

    # Initialise array
    vNEG = [0 for t in range(len(doc))]

    # Initialise window counters
    winstart = 0
    winend = min(windowsize, len(doc) - 1)
    docsize = len(doc)
    found_neg_fwd = 0
    found_neg_bck = 0
    inwindow = 0

    for i in range(docsize):
        # Reset per token; the listing as posted never cleared this flag,
        # so one pseudo negation would disable detection for the rest of
        # the document.
        found_pseudo = 0

        # Build 1-term and 2-term strings
        unigram = doc[i].split('/')[0]
        if i < (docsize - 1):
            bigram = unigram + ' ' + doc[i + 1].split('/')[0]
        else:
            bigram = unigram

        # Search for pseudo negations
        for negterm in PSEUDO:
            if bigram == negterm:
                found_pseudo = 1

        if found_pseudo == 0:
            # Look for pre negations (negate forwards)
            for negterm in PRENEGATION:
                if unigram == negterm or bigram == negterm:
                    found_neg_fwd = 1
            # Look for post negations (negate backwards)
            for negterm in POSNEGATION:
                if unigram == negterm or bigram == negterm:
                    found_neg_bck = 1

        # Forward negation: negate terms forward up to the window size
        if found_neg_fwd == 1:
            if inwindow < windowsize:
                vNEG[i] = 1
                inwindow += 1
            else:
                # Out of window space
                found_neg_fwd = 0
                inwindow = 0

        # Backward negation: negate back until the window start
        if found_neg_bck == 1:
            for counter in range(max(winstart, i - windowsize), i):
                vNEG[counter] = 1
            # Done with backwards negation
            found_neg_bck = 0

        # Now move the window
        for negterm in ENDOFWINDOW:
            if unigram == negterm or bigram == negterm:
                # Found end of negation, must reset windows
                inwindow = 0
                found_neg_fwd = 0
                winstart = i
                winend = min(windowsize + i, len(doc) - 1)

    return vNEG
I have come across some fantastic Sentiment Analysis research over the past few days, and was able to tap into several research papers and dissertations exploring computational Sentiment Analysis or Opinion Mining (OM). Two that provided significant insight were “Opinion Mining and Sentiment Analysis” (Pang and Lee, 2008) and “Opinion Mining with the SentiWordNet Lexical Resource” (Ohana, 2009).
Recent progress in Opinion Mining techniques within natural language processing identifies a handful of challenges and potential solutions for accurate sentiment analysis of text-based content.
Subjectivity
If our goal is to extract the sentiment, opinions or emotions of users, then we should really only be looking at subjective statements within a user’s comment. This will prevent positively or negatively charged words that are present in objective statements from affecting the comment’s overall sentiment score. Subjectivity could be assessed through a trained classifier algorithm like Naive Bayes or Max Entropy.
On Topic
A concern for topic relevance is an issue that we were already aware of, and were searching (with much difficulty) for solutions with dependency grammars. This new round of research seems to dismiss that approach as unrealistically difficult (I’m thinking that could be a project on its own). Unfortunately no good solution strategies were explored for this issue.
Polarity
This is our root goal of applying a negative or positive sentiment score at various text-unit levels, such as word, sentence, or comment. VoxPop has thus far been using the General Inquirer Dictionary evaluative definitions, but it appears a few recent projects have been utilizing the WordNet (which we explored earlier in our research) and newer SentiWordNet lexicons for evaluative sentiment assignments.
Negation Detection
An issue that was just now revealed to us is the problem of Negation Detection. Consider the following two sentences:
Obama’s policies are good.
Obama’s policies are not good.
A normal polarity tagger would give these two sentences the same sentiment score, since both contain one positive word (good). Of course our second sentence expresses the opposite of positive sentiment, with the adverb ‘not’ inverting the value of “good.” A negation detection process aims to identify these negating words, and then invert the value of any positive or negative words that appear within n words before or after the negating term.
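The two example sentences can be run through a toy scorer to see the effect. The sketch below is an assumption-laden illustration, not VoxPop's tagger: the three word sets are a made-up mini lexicon, `polarity` is a hypothetical function name, and the window of 3 words is an arbitrary choice.

```python
POSITIVE = {"good", "great", "strong"}   # hypothetical toy lexicon
NEGATIVE = {"bad", "poor", "weak"}
NEGATORS = {"not", "no", "never"}


def polarity(sentence, window=3, handle_negation=True):
    """Sum of word polarities, optionally flipped near negating words."""
    tokens = [t.strip(".,!?").lower() for t in sentence.split()]
    values = [1 if t in POSITIVE else -1 if t in NEGATIVE else 0
              for t in tokens]
    if handle_negation:
        for i, tok in enumerate(tokens):
            if tok in NEGATORS:
                # Invert every scored word within `window` words
                # before or after the negating term.
                lo = max(0, i - window)
                hi = min(len(tokens), i + window + 1)
                for j in range(lo, hi):
                    values[j] = -values[j]
    return sum(values)


plain = polarity("Obama's policies are not good.", handle_negation=False)
negated = polarity("Obama's policies are not good.")
```

Without negation handling both sentences score the same; with it, the second sentence flips to a negative score.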
Here are PDFs of two of the more informative articles:
Opinion Mining and Sentiment Analysis
Bo Pang, Lillian Lee
After evaluating some of the issues we observed in the first design iteration for the VoxPop visualization model, we were able to establish some more criteria as the design evolved.
One idea that I explored early on was the concept of these emotive forces “pulling” the conversation in various directions. While the metaphor seemed interesting, data quantity vs. real estate would be even more of an issue than with the first design iteration. Another big take away from the first prototype was the non-representational and homogenizing side effect of grouping words at a full conversation level, as one abnormally “colorful” comment could dramatically swing the visual representation of the entire conversation.
Additionally, breaking words into these six groups unnecessarily abstracts the way in which these semantic classifications actually describe attitudes. Going back to Charles Osgood’s Semantic Differential theory, his studies revealed that the Evaluative scale (good to bad) is the primary axis by which study participants classified affective meaning, followed by Potency and Activity as the two other universal characteristics of affective classification. Rather than thinking of these three axes as separate scales, when trying to visualize the emotional qualities of these comments it makes more sense to layer these metrics on top of each other so that, in conjunction, they draw the full “personality” of each comment.
So the idea of representing each comment as its own entity developed. In order for these three emotive characteristics to be able to layer on top of each other, representing each of the six poles as a different color (as done in design iteration 1) would not work.
In this next design iteration, the Evaluative scale is the only metric represented across a color spectrum, ranging from blue (negative) to yellow (positive), where the number of positive words minus the number of negative words dictates the comment’s place along the color spectrum. Comments with mostly positive words would be closer to pure yellow, and comments with mostly negative words would be closer to pure blue, with more neutral and balanced comments being various shades of green.
The Potency scale (strong to weak) could then be represented with opacity, where the number of “strong” words minus the number of “weak” words would dictate where the comment lives on the opacity spectrum between fully opaque and almost completely transparent.
Our last axis is Activity (active to passive). While the color-inspired evaluative scale requires a somewhat subjective color scheme choice, which of course will involve cultural and personal influences, opacity actually serves as a conveniently clear metaphor for our ‘strong to weak’ continuum. The ‘Activity’ axis presented a similarly convenient direct metaphor that could be exploited. The more active a comment is, the faster and farther it could move, and likewise the more passive a comment is, the more stagnant it would be.
The final characteristic of this design iteration was that the scale of each comment could be determined by the number of affective words it has. This could arguably be thought of as how “loud” the comment was (although this is open to some debate: a really long comment with lots of weak and passive words isn’t really “loud”… more “verbose”).
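As a rough sketch of how these four mappings could be computed from word counts, consider the function below. Everything specific in it is an assumption made for illustration: the function names, normalizing each axis to a net share between -1 and 1 (rather than using the raw difference), sweeping the hue from blue through green to yellow, the 0.1 minimum opacity, and the radius constants.

```python
import colorsys


def balance(a, b):
    """Net share of a over b in [-1, 1]; 0.0 when both counts are zero."""
    return (a - b) / (a + b) if (a + b) else 0.0


def comment_style(pos, neg, strong, weak, active, passive):
    """Map a comment's affective word counts to visual properties."""
    t = (balance(pos, neg) + 1) / 2      # 0 = all negative, 1 = all positive
    hue = (240 - 180 * t) / 360          # 240 degrees = blue, 60 = yellow
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
    return {
        # Evaluative axis: blue through green to yellow
        "color": (round(r * 255), round(g * 255), round(b * 255)),
        # Potency axis: almost transparent (0.1) to fully opaque (1.0)
        "opacity": 0.1 + 0.9 * (balance(strong, weak) + 1) / 2,
        # Activity axis: 0.0 = stagnant, 1.0 = fastest
        "speed": (balance(active, passive) + 1) / 2,
        # Scale: grows with the total number of affective words
        "radius": 4 + 2 * (pos + neg + strong + weak + active + passive),
    }


loud_positive = comment_style(pos=5, neg=0, strong=3, weak=0, active=2, passive=0)
```

Interpolating along the hue wheel rather than directly between the blue and yellow RGB values is what keeps balanced comments green instead of gray.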
So we created a new prototype exploring how this design iteration might look with real data. We started with just three articles that the comments would “swarm” around. To help keep the composition balanced and as easy to read as possible, we had comments on the ‘positive’ side of the evaluative scale swarm above the article’s title, and comments on the ‘negative’ side of the evaluative scale swarm below the article’s title. (The blue and yellow evaluative colors were accidentally reversed in this version of the prototype, with blue at the ‘positive’ end and yellow at the ‘negative’ end.)
Evaluation
This design iteration was beginning to show a lot of promise. With a quick glance a viewer might more successfully read the tone of the entire conversation surrounding the article, while still preserving the ‘personality’ of each individual comment. This was also starting to become more interesting to look at. Active comments would quickly nestle their way in towards the article title while passive comments floated aimlessly at the perimeter.
A major flaw with this design iteration was that we were moving in the wrong direction with maximizing our screen real estate, with only three articles being viewable at a time. While the readability of each article’s conversation had improved, we were even further from being able to see any sort of trends or evolution in discourse surrounding a topic over time.