Workshop at Sunbelt: May 21, 2013
Since its creation in 2006, Twitter has become one of the most popular microblogging platforms in the world. By virtue of its popularity, the relative structural simplicity of Twitter posts, and a tendency towards relaxed privacy settings, Twitter has also become a popular data source for research on a range of topics in sociology, psychology, political science, and anthropology. Despite its widespread use in the research community, however, there are many pitfalls when working with Twitter data.
In this workshop, we will lead participants through the entire Twitter-based research pipeline: from obtaining Twitter data all the way through performing some of the sophisticated analyses that have been featured in recent high-profile publications. Participants will leave able to independently collect Twitter data and analyze a wide array of social processes present in the data, from sentiment to social network structure.
In the first section of the tutorial, we will cover the nuts and bolts of obtaining and working with a Twitter dataset including: using the Twitter API, the firehose, and rate limits; strategies for storing and filtering Twitter data; and how to publish your dataset for other researchers to use.
In the second section, we will delve into techniques for analyzing Twitter content. We will focus on several widely used techniques, including the measurement of mood and sentiment; entity extraction; and mention, retweet, and follower-based social network extraction. Some time will also be given to more experimental techniques, including latent attribute inference and topic classification. Throughout, time will be given to explaining the theoretical basis for the methods, but the primary focus will be on showing participants how to carry out the various analyses.
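To give a flavor of mention-based network extraction, the following is a minimal sketch, not the workshop's own material. It assumes tweets are already available as (author, text) pairs; the sample tweets, the `mention_edges` helper, and the username pattern (up to 15 letters, digits, or underscores) are illustrative assumptions. It builds a weighted, directed edge list in which each edge points from a tweet's author to a user mentioned in it.

```python
import re
from collections import Counter

# Assumed username pattern: "@" followed by 1-15 letters, digits, or
# underscores. The lookbehind skips "@" inside words such as e-mail addresses.
MENTION_RE = re.compile(r"(?<![A-Za-z0-9_])@([A-Za-z0-9_]{1,15})")


def mention_edges(tweets):
    """Count directed (author, mentioned_user) edges from (author, text) pairs."""
    edges = Counter()
    for author, text in tweets:
        for mentioned in MENTION_RE.findall(text):
            # Lowercase both ends so "@Bob" and "@bob" map to the same node.
            edges[(author.lower(), mentioned.lower())] += 1
    return edges


# Hypothetical sample data, for illustration only.
sample_tweets = [
    ("alice", "Great talk by @bob and @carol at the workshop"),
    ("bob", "Thanks @alice!"),
    ("carol", "RT @bob: Thanks @alice!"),
]

edges = mention_edges(sample_tweets)
for (source, target), weight in sorted(edges.items()):
    print("%s -> %s (%d)" % (source, target, weight))
```

The resulting edge list can be loaded into any network analysis tool. Note that this sketch treats the "RT @user" prefix as an ordinary mention; a retweet network would need to handle that prefix separately.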
We assume that participants will have little to no prior experience with mining Twitter or other social network datasets. As the workshop will be interactive, participants are encouraged to bring a laptop. Code examples and exercises will be given in Python, so participants should have some familiarity with the language. However, all concepts and techniques covered will be language-independent, so anyone with some background in scripting or programming will benefit from the workshop.
To follow along with the exercises, please set up your machine with Python 2.7.