The Indian Twitter Election
by Debarghya Das

From the 18th to the 24th of April, we streamed the tweets of Indians around the world in an attempt to capture the voice of the 2014 Indian General Elections. In a week, we captured more than one and a half million tweets. Here's what we found.

Aam Aadmi Party
410876 tweets

Bharatiya Janata Party
829655 tweets

Indian National Congress
319340 tweets


Speed Gauge


For each of the three most talked-about parties, we used Twitter's Streaming API to stream tweets in real time, filtered with party-specific words.
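As a rough illustration of filtering by party-specific words, here is a minimal sketch that tags a tweet with every party whose keywords it mentions. The keyword lists and the function names are hypothetical; the actual filter terms are not listed in this article, and the streaming connection itself is left out.

```python
import re

# Hypothetical keyword lists, for illustration only.
# The real filter terms used for streaming are not published here.
PARTY_KEYWORDS = {
    "AAP": ["aap", "kejriwal", "aam aadmi"],
    "BJP": ["bjp", "modi", "bharatiya janata"],
    "INC": ["inc", "congress", "sonia"],
}

def compile_patterns(keywords_by_party):
    """Build one case-insensitive, whole-word regex per party."""
    patterns = {}
    for party, words in keywords_by_party.items():
        alternation = "|".join(re.escape(w) for w in words)
        patterns[party] = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)
    return patterns

def match_parties(text, patterns):
    """Return the sorted list of parties whose keywords appear in the text."""
    return sorted(party for party, rx in patterns.items() if rx.search(text))

patterns = compile_patterns(PARTY_KEYWORDS)
print(match_parties("Kejriwal takes on Modi in Varanasi", patterns))
```

A word like "Congress" also matches tweets that have nothing to do with Indian politics, which is one reason keyword filtering of this kind is noisy.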

The Aam Aadmi Party showed a steady influx of daily tweets. The tweet frequency curve is fit well by a five-term sum of sines. The graph shows us when India sleeps and the two times of day when India is most active.
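For reference, a five-term sum of sines is commonly written in the form below. This parametrisation is an assumption, since the exact model and fitted coefficients are not given here: f(t) is the tweet frequency at time t, and a_i, b_i and c_i are the amplitude, angular frequency and phase of the i-th term.

```latex
f(t) = \sum_{i=1}^{5} a_i \sin\left(b_i t + c_i\right)
```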

The Bharatiya Janata Party, too, showed a steady influx of daily tweets, although at up to almost twice the volume of the AAP. It follows similar patterns.

The Indian National Congress seems to have a more erratic flow of tweets. This may be due in part to the difficulty of extracting tweets relevant to the party, since words like 'Sonia' and 'Congress' are not a reliable indicator of context.


Sentiment Analysis




We stripped the tweets of hashtags and URLs, corrected their spelling, and used a Python library called NLTK to analyze the English tweets for their sentiment. Each analyzed tweet had a sentiment polarity from -1 (bad) to 1 (good) and a subjectivity from 0 (not really sure) to 1 (certain). We removed all unclassifiable tweets and looked at 3D histograms to see how each of the parties performed. A high-level overview is given by donut charts here.
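The cleaning step can be sketched with two regular expressions. This is a minimal sketch under stated assumptions: it drops whole hashtags (rather than only the '#' sign), the patterns and the function name clean_tweet are illustrative, and spelling correction and sentiment scoring are not shown.

```python
import re

# Assumed patterns: anything starting with http(s):// or www. is a URL,
# and a hashtag is '#' followed by word characters.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
HASHTAG_RE = re.compile(r"#\w+")

def clean_tweet(text):
    """Remove URLs and hashtags, then collapse leftover whitespace."""
    text = URL_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    return " ".join(text.split())

print(clean_tweet("Vote wisely! #Elections2014 http://example.com/x #India"))
```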

The Aam Aadmi Party showed a mean polarity of around 0.16 with a standard deviation of around 0.37. 71% of classified tweets were found to be positive.

The Bharatiya Janata Party showed a slightly lower mean of 0.13, with the exact same standard deviation as the AAP. 67% of classified tweets were found to be positive.

The Indian National Congress shows the highest mean sentiment of 0.18, and yet again the same standard deviation, even though only 71% of all classified tweets were found to be positive.
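The per-party figures above (mean polarity, standard deviation, share of positive tweets) can be computed from a list of polarity scores along these lines. This is only a sketch: the function name is hypothetical, the sample scores are made up, and treating a score of exactly 0 as "unclassified" is an assumption.

```python
from statistics import mean, pstdev

def summarize_polarity(scores):
    """Return mean, standard deviation and share of positive scores.

    Scores equal to 0 are treated as unclassified and dropped first
    (an assumption made for this sketch).
    """
    classified = [s for s in scores if s != 0]
    positive = sum(1 for s in classified if s > 0)
    return {
        "mean": mean(classified),
        "std": pstdev(classified),  # population standard deviation
        "positive_share": positive / len(classified),
    }

sample = [0.5, -0.25, 0.0, 0.75, 0.0, -0.5, 0.5]  # made-up scores
print(summarize_polarity(sample))
```

Note that the function raises an error if every score is unclassified, so a real pipeline would need to guard against an empty list.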


Location Analysis




Although only 1% of all our tweets were tagged with location data, and some of that data came from beyond our borders, we plotted the ones that were within our borders in an attempt to visualize the demographic of our tweeters.
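Keeping only the geotagged tweets that fall inside the country can be approximated with a bounding-box check. This is a simplified sketch: the box coordinates are rough, a rectangle also covers parts of neighbouring countries (a proper border polygon would be needed for an exact test), and the "coords" field is a hypothetical stand-in for however the location is stored.

```python
# Rough bounding box around India (an approximation, not the real border).
INDIA_BBOX = {"lat_min": 6.5, "lat_max": 37.1, "lon_min": 68.1, "lon_max": 97.4}

def in_bbox(lat, lon, bbox=INDIA_BBOX):
    """True if the point lies inside the rectangular bounding box."""
    return (bbox["lat_min"] <= lat <= bbox["lat_max"]
            and bbox["lon_min"] <= lon <= bbox["lon_max"])

def geotagged_in_india(tweets):
    """Keep tweets that carry coordinates falling inside the bounding box."""
    kept = []
    for tweet in tweets:
        coords = tweet.get("coords")  # hypothetical field: (lat, lon) or None
        if coords is not None and in_bbox(*coords):
            kept.append(tweet)
    return kept

tweets = [
    {"text": "from Delhi", "coords": (28.61, 77.21)},
    {"text": "from London", "coords": (51.51, -0.13)},
    {"text": "no location", "coords": None},
]
print([t["text"] for t in geotagged_in_india(tweets)])
```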

The Aam Aadmi Party seemed to be very popular in North India, especially in Delhi, a result we expected to see.

The Bharatiya Janata Party not only provided us with more data, but also had a much more widespread popularity, which seemed to cover the country quite uniformly.

The Indian National Congress, again, is a tricky case due to the lack of words that establish the appropriate context. Nonetheless, the results show that its popularity is again concentrated in the Delhi region.


Some other nifty details




We looked at the distribution of words in the tweets gathered for each of the parties, along with the distribution of hashtags and other features, and found some interesting results.
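A word distribution of this kind can be sketched with a simple counter. The stop-word list here is a tiny illustrative one, lowercasing every word is an assumption, and the function name top_words is hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "is", "in", "for", "to", "of", "and", "will"}

def top_words(tweets, n=10, stop_words=STOP_WORDS):
    """Count alphabetic words across tweets, skipping stop words."""
    counts = Counter()
    for text in tweets:
        for word in re.findall(r"[A-Za-z]+", text.lower()):
            if word not in stop_words:
                counts[word] += 1
    return counts.most_common(n)

tweets = ["Modi will win in Varanasi", "Vote for Modi", "Modi and BJP in Gujarat"]
print(top_words(tweets, n=1))
```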


The 411k tweets about the Aam Aadmi Party, unsurprisingly, contained the word "AAP" 211k times. The top 10 most used functional words were:

Rank  Word        Occurrences (in 1000s)
1     AAP         211
2     Kejriwal    81
3     Arvind      56
4     vote        55
5     Indian      31
6     Modi        30
7     politician  28
8     BJP         27
9     Varanasi    23
10    Aam         22

The 829k tweets about the Bharatiya Janata Party contained the word "Modi" a staggering 489k times. The top 10 most used functional words were:

Rank  Word        Occurrences (in 1000s)
1     Modi        489
2     BJP         272
3     Narendra    115
4     vote        69
5     India       64
6     Congress    44
7     Indian      40
8     PM          31
9     hai         31
10    Pakistan    30

The 319k tweets about the Indian National Congress, perhaps unsurprisingly, contained the word "Gandhi" a staggering 137k times. The top 10 were:

Rank  Word        Occurrences (in 1000s)
1     Gandhi      137
2     INC         92
3     Rahul       36
4     Priyanka    30
5     Mahatma     21
6     Sonia       15
7     Modi        15
8     Congress    12
9     India       10
10    BJP         10

We also looked at the next 10 most interesting words:

Aam Aadmi Party

Word        Occurrences (in 1000s)
Shazia      14
Delhi       13
Amethi      13
support     11.5
against     11.5
communal    11.2
Congress    10.4
candidate   10.1
ensure      8
politics    8

Bharatiya Janata Party

Word        Occurrences (in 1000s)
Giriraj     27
leader      27
Singh       26
politician  25
Togadia     25
Gujarat     21
Ji          20
Varanasi    19
Muslims     18
Media       18

Indian National Congress

Word        Occurrences (in 1000s)
Vadra       9
people      9
rally       6
family      6
Ji          5.1
PM          5.0
Narendra    4.9
Robert      4.9
Times       4.6
Interview   4.5




We'll let you, the reader, draw inferences from the data we've presented. It reveals a lot about the high-level nature of the elections, albeit distorted by some noise. Of only one thing can we be fairly certain: the results will be exciting.



If you have any comments or suggestions, reach out to us at any of the following:


Only time can tell us how things actually unfold.