Guest Post by Jeff Healey, Director of Product Marketing at Vertica Systems
The NCAA 2013 Men’s Basketball March Madness Tournament officially tipped off on Thursday, March 21st. For those of you unfamiliar with the tournament, 64 teams from colleges and universities across the United States compete for the championship, awarded to just one winner in early April. Buzzer-beating upsets are as common as fan face paint and schools from parts unknown, making it challenging to choose the winner in your office tournament bracket.
To give you a sense of the tournament’s popularity and appeal, according to USA Today, “Last year’s championship game alone had about 20 million TV viewers. The overall tournament had 52 million visits across March Madness on Demand’s broadband and mobile platforms.”
So, what is the buzz on this year’s tournament on Twitter, and can social sentiment foreshadow ultimate success? A small team of us here — representing Autonomy, HP Vertica, and HP Information Management & Analytics (IM&A) — set out to answer that very question by building a March Madness Sentiment Tracker Demo to track the “sentiment of the crowd.”
The Technology Behind the March Madness Sentiment Tracker
Using HP Labs’ Academy Awards Meter demo as our guide, we created a framework in roughly a week based on Autonomy, HP Vertica, and Tibco Spotfire.
We unveiled the demonstration at the MIT Sloan Sports Analytics Conference. See Chris Selland’s blog post from that event and his participation on the Big Data in Sports panel.
Since the MIT Sloan Sports Analytics Conference was held weeks before the tourney began, we first collected roughly half a million Tweets using Autonomy’s data aggregator from February 20th to March 1st. The Tweets included anything related to the Top 25 ranked teams at the time as well as the top scorers. Our colleagues at Autonomy also used Autonomy IDOL to add structure and sentiment to the data. For example, a Tweet like “I am excited to watch my Jayhawks win #MarchMadness!” would carry a positive sentiment. However, a Tweet like “I hate #MarchMadness – it interrupts my favorite TV shows!” would carry a negative sentiment.
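To make the idea of sentiment tagging concrete, here is a deliberately simple sketch in Python. It is not how Autonomy IDOL works; it only counts words from two small, hypothetical word lists that we made up for illustration, and the function name classify_sentiment is our own.

```python
import re

# Hypothetical word lists for illustration only. A real engine such as
# Autonomy IDOL relies on far richer language analysis than word counting.
POSITIVE_WORDS = {"excited", "love", "win", "great", "awesome"}
NEGATIVE_WORDS = {"hate", "interrupts", "awful", "boring", "lose"}


def classify_sentiment(tweet: str) -> str:
    """Return 'positive', 'negative', or 'neutral' based on word counts."""
    # Lowercase the text and keep only alphabetic tokens (hashtags lose the '#').
    words = re.findall(r"[a-z']+", tweet.lower())
    positive_hits = sum(word in POSITIVE_WORDS for word in words)
    negative_hits = sum(word in NEGATIVE_WORDS for word in words)
    score = positive_hits - negative_hits
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(classify_sentiment("I am excited to watch my Jayhawks win #MarchMadness!"))
    print(classify_sentiment("I hate #MarchMadness - it interrupts my favorite TV shows!"))
```

A word-count approach like this misses sarcasm, negation, and context, which is exactly why a dedicated engine was used for the demo.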
Our very own Will Cairns, who presented on the main stage of the MIT Sloan Sports Analytics Conference, loaded the data into the HP Vertica Analytics Platform, ran some analytical queries, and provided an output file for HP IM&A to create the visualization front-end with Tibco Spotfire. That is where the insight (and conversation with the data) began to happen.
Visualizing the Sentiment and Lessons Learned
HP IM&A created impressive visualizations that helped us (and attendees) to explore:
- Volume of tweets by team
- Volume of tweets by player
- Positive, negative, and neutral sentiment groupings
- Volume of tweets by U.S. city and by worldwide country
- Volume of tweets by language (English, French, Spanish, etc.)
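Each of the views above boils down to counting tagged tweets by one attribute. The demo ran its queries in the HP Vertica Analytics Platform, and this post does not show that schema, so the following is only a rough Python sketch under our own assumptions: the records, the field names (team, sentiment, language), and the helper volume_by are all hypothetical.

```python
from collections import Counter

# Made-up sample records for illustration; the field names are assumptions,
# not the schema used in the actual demo.
tweets = [
    {"team": "Team A", "sentiment": "positive", "language": "English"},
    {"team": "Team A", "sentiment": "negative", "language": "English"},
    {"team": "Team B", "sentiment": "positive", "language": "Spanish"},
    {"team": "Team B", "sentiment": "neutral", "language": "English"},
    {"team": "Team A", "sentiment": "positive", "language": "French"},
]


def volume_by(records, field):
    """Count how many records share each value of the given field."""
    return Counter(record[field] for record in records)


if __name__ == "__main__":
    # most_common() lists the values from highest to lowest volume.
    print(volume_by(tweets, "team").most_common())
    print(volume_by(tweets, "sentiment").most_common())
    print(volume_by(tweets, "language").most_common())
```

The same grouping pattern extends to players, cities, or countries by adding those fields to each record.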
They say that a picture is worth 1,000 words. Well, the visualizations made for great conversation. Some results were not surprising, such as perennial NCAA powers steeped in history, like Kansas and Duke, leading the total volume of tweets. Some players ranked higher than others in volume of tweets, leading attendees to observe, “Well, Trey Burke had a monster game the other night, so that makes sense.”
But why did Chicago rank as the U.S. city with the highest number of tweets, despite no Illinois college or university being ranked in the top 25 at the time? Well, the Big Ten is one of the more competitive conferences in the country this season, and Chicago-area schools (such as the University of Illinois) play Wisconsin, Indiana, Michigan, and Michigan State. Chicago is also one of the top five major media hubs in the country.
Spirited debates and conversations aside, most importantly, this exercise clearly demonstrated the power of sentiment for a range of use cases in nearly every industry with a major product, brand, or service. In the telecommunications industry, network providers are actively tracking social media channels to measure customer satisfaction. If there is an issue with the service, say in a certain region of the country, you better believe that customer service will soon receive calls to that very point. Using sentiment analysis to quickly address issues by, say, adding more network bandwidth and improving service can help reduce service costs, improve customer satisfaction, and minimize churn.
But can sentiment foreshadow success? I guess you will have to tune into the games to find out, while tracking your favorite social media channel. Better yet, why not use HP Vertica’s tight integration with R to develop a statistical model based on data available from ESPN and the like on hard basketball statistics, such as field goal percentage, points allowed, head-to-head scoring, and more? You could correlate that statistical data with sentiment data trending from Twitter.
Hmm…that sounds like a perfect complement to our March Madness Sentiment Tracker demo. Stay tuned for more details or share your thoughts on how you could marry sentiment data with statistical data to ultimately predict this year’s winner.
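As one possible starting point for marrying the two data sets, here is a small sketch that computes a Pearson correlation between a sentiment measure and a basketball statistic. It is written in plain Python rather than R, and every number in it is made up: the per-team positive-tweet shares and field goal percentages are hypothetical placeholders, not real results.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return covariance / sqrt(spread_x * spread_y)


if __name__ == "__main__":
    # Hypothetical per-team values, invented purely to show the calculation.
    positive_tweet_share = [0.62, 0.48, 0.55, 0.40]   # share of positive tweets
    field_goal_percentage = [0.47, 0.44, 0.46, 0.41]  # season field goal percentage
    r = pearson(positive_tweet_share, field_goal_percentage)
    print(f"correlation: {r:.2f}")
```

A value near 1 or -1 would suggest a strong linear relationship, but correlation alone says nothing about whether crowd sentiment can actually predict a winner.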