Information Diffusion on Twitter

This spring, I volunteered to teach a lecture in a new Berkeley course called “Analyzing Big Data With Twitter,” developed jointly by Twitter and Berkeley’s School of Information. I had recently done my master’s thesis work on predicting the spread of topics on Twitter by looking at phenomena on a macroscopic scale, such as the time series of tweet activity. But I was curious to see what was happening on a microscopic scale: how do topics spread from person to person? I decided to dig into the data and find out.

First, I needed a way to track when a topic spreads from one person to another. I considered using retweets, but quickly realized that they have several drawbacks. For one, a retweet corresponds to a particular tweet rather than a topic. There could be many tweets about the same topic, so tracking the spread of a single tweet just wouldn’t do. Even tracking retweets for a collection of tweets is inadequate, because many of the tweets about the topic may not be retweets at all. In short, retweets wouldn’t reveal how the topic really spread through the network: they don’t account for all tweets about it, and retweeting is only one way for a topic to pass from one person to another.

So how does a topic spread? Often, you look at your timeline and see tweets about the topic from people you follow. This may influence you to write an original tweet commenting on the same topic.

I wanted to capture these events, which are not explicitly recorded the way that retweets are.

To do that, one has to eliminate all other ways that a person can be influenced to write a tweet about the topic. For example, if the topic was in the news or on the trending topics list, it is not clear whether someone was compelled to tweet about the topic by someone they follow or by an exogenous source.

To attempt to eliminate exogenous sources, I used only “hashtag meme” topics, the type that originate and spread within Twitter (things like #ThingsYouSayToYourBestFriend) rather than events like #election2012. In addition, I only tracked topics up until they became trending.

I generated some videos of information spreading for various topics. To simplify the spreading graph, I kept only the most recent parent for each node. The colors represent components of the resulting graph.
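Here is a minimal sketch of how a spreading graph like this might be built. It is not the code behind the videos, which used internal Twitter data. The input formats (`tweets` as `(user, timestamp)` pairs for one topic, `follows` as a map from each user to the accounts they follow), the one-hour `window`, and all function names are assumptions made for illustration.

```python
from collections import defaultdict


def build_spread_graph(tweets, follows, k=1, window=3600):
    """Return {user: [parents]} keeping each user's k most recent parents.

    tweets  -- iterable of (user, timestamp) for one topic (assumed format)
    follows -- dict: user -> set of users they follow (assumed format)
    window  -- assumed cutoff in seconds for an earlier tweet to count
    """
    first_seen = {}  # user -> time of their first tweet on the topic
    parents = {}
    for user, ts in sorted(tweets, key=lambda t: t[1]):
        if user in first_seen:
            continue  # only a user's first tweet counts as picking up the topic
        # Followed accounts that tweeted the topic earlier, within the window.
        candidates = [
            (first_seen[f], f)
            for f in follows.get(user, ())
            if f in first_seen and ts - first_seen[f] <= window
        ]
        candidates.sort(reverse=True)  # most recent first
        parents[user] = [f for _, f in candidates[:k]]
        first_seen[user] = ts
    return parents


def components(parents):
    """Group users into connected components, ignoring edge direction."""
    root = {u: u for u in parents}

    def find(u):
        while root[u] != u:
            root[u] = root[root[u]]  # path halving
            u = root[u]
        return u

    for child, ps in parents.items():
        for p in ps:
            root[find(child)] = find(p)

    groups = defaultdict(set)
    for u in parents:
        groups[find(u)].add(u)
    return list(groups.values())
```

Calling `build_spread_graph(tweets, follows, k=1)` keeps a single parent per node, so every component is a tree; a larger `k` keeps more of the original edges at the cost of a denser graph.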

In this first video, the green nodes form the largest component. The shade of green represents the distance from the first green node. The red nodes belong to all other components. The pink nodes trace out the longest path through the green component: 53 hops.
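The distances and the longest path can be computed cheaply when each node keeps only one parent, because every component is then a tree. The sketch below assumes a hypothetical `parent` dictionary mapping each node to its single parent (or `None` for the first node of a component), and it assumes that "longest path" means the longest chain of parent links starting at that first node. The post does not say how the path in the video was actually computed.

```python
def depth_and_longest_path(parent):
    """Return (depth, path) for a forest given as node -> parent or None.

    depth -- dict: node -> number of hops from the root of its tree
    path  -- nodes from a root down to one of the deepest nodes
    """
    if not parent:
        return {}, []

    depth = {}
    for start in parent:
        # Walk up until reaching a node with a known depth (or past a root).
        chain = []
        node = start
        while node is not None and node not in depth:
            chain.append(node)
            node = parent[node]
        base = depth[node] if node is not None else -1
        # Fill in depths on the way back down.
        for n in reversed(chain):
            base += 1
            depth[n] = base

    # Follow parent links up from a deepest node to recover the path.
    deepest = max(depth, key=depth.get)
    path = [deepest]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    path.reverse()
    return depth, path
```

To reproduce something like the coloring in the video, one would run this on the nodes of the largest component only; the hop count is then `len(path) - 1`.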

In this second video, I kept the 2 most recent parents for each node.

This is the first step in a work in progress. I’d like to understand more about how people are influenced to spread a topic using this approach.

For now, you can see a blog post about the lecture and the video (also above) here. I talk about some theory behind information cascades in the first half, and some more detailed experimental results and difficulties in the second half.


  1. Pingback: Video Lecture: Information Diffusion on Twitter by @snikolov | Analyzing Big Data with Twitter

  2. Pingback: blogs.ischool » Blog Archive » Video Lecture: Information Diffusion on Twitter by @snikolov

  3. Pingback: UC Berkeley Course Lectures: Analyzing Big Data With Twitter | Analyzing Big Data with Twitter

  4. Pingback: UC Berkeley Course Lectures: Analyzing Big Data With Twitter | XSData

  5. vango55


    Great job. I really like what you are doing. I’m living in Luxembourg, EU, and will launch a startup active in that field. If you would be interested in collaborating, just let me know.

  6. Haritha


    Is it possible to build an accurate retweet graph? Using the Twitter API, if we get a retweet of a retweet, it contains a reference to the original tweet instead of the intermediate retweet. I presume there is absolutely no reference or connection from this final retweet to the intermediate retweet. In that case, how can one possibly construct a retweet graph?

    • Haritha,

      Yes, unfortunately, you can’t get the intermediate retweets through the API as far as I know. I did this with internal data and did not use retweets at all (I constructed the graph from instances of a user and her followers mentioning the same topic within a short time window). You can try various hacks like filtering tweets from the API that match the manual “RT” pattern (the old style of retweeting). There may be more clever ways. I would try asking someone who has worked with retweet graphs using the API to see how they did it.

      Best wishes,

  7. Pingback: Twitterが奏でる音楽 | MASSAGE
