From the 18th to the 24th of April, we decided to stream the tweets of Indians around the world in an attempt to capture the voice of the 2014 Indian General Elections. In a week, we captured more than one and a half million tweets. Here's what we found.
Aam Aadmi Party
Bharatiya Janata Party
Indian National Congress
The Aam Aadmi Party showed a steady influx of daily tweets. The tweet frequency curve fits a five-term sum-of-sines model. The graph shows when India sleeps and the two times of day when India is most active.
The Bharatiya Janata Party, too, showed a steady influx of daily tweets, although at up to almost twice the volume of the AAP. It follows similar patterns.
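As a rough illustration of what a five-term sum-of-sines description of tweet frequency looks like, the sketch below projects hourly tweet counts onto the first five harmonics of a 24-hour cycle. It is a simplification and not necessarily the fit behind the graphs: it assumes hourly bins, a whole number of days, and frequencies fixed to daily harmonics rather than fitted freely. The function names are hypothetical.

```python
import math


def fit_daily_harmonics(hourly_counts, n_terms=5, period=24):
    """Approximate counts as mean + sum of n_terms sines of a daily cycle.

    Returns (mean, [(amplitude, phase), ...]) such that
    count(t) ~ mean + sum(a_k * sin(2*pi*k*t/period + p_k)) for k = 1..n_terms.
    Assumption: frequencies are fixed to harmonics of the period.
    """
    n = len(hourly_counts)
    if n == 0 or n % period != 0:
        raise ValueError("need a whole number of days of hourly counts")
    mean = sum(hourly_counts) / n
    terms = []
    for k in range(1, n_terms + 1):
        w = 2 * math.pi * k / period
        # Discrete Fourier projection onto sin and cos of this harmonic.
        s = 2 / n * sum(y * math.sin(w * t) for t, y in enumerate(hourly_counts))
        c = 2 / n * sum(y * math.cos(w * t) for t, y in enumerate(hourly_counts))
        # s*sin(wt) + c*cos(wt) equals a*sin(wt + p) with these a and p.
        terms.append((math.hypot(s, c), math.atan2(c, s)))
    return mean, terms


def evaluate(mean, terms, t, period=24):
    """Evaluate the fitted sum of sines at hour t."""
    return mean + sum(
        a * math.sin(2 * math.pi * k * t / period + p)
        for k, (a, p) in enumerate(terms, start=1)
    )
```

Evaluating the fitted curve over one day and locating its maxima and minima is one way to read off the active hours and the overnight lull.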
We stripped the tweets of hashtags and URLs, corrected their spelling, and used a Python library called NLTK to analyze the English tweets for their sentiment. Each analyzed tweet had a sentiment polarity from -1 (bad) to 1 (good) and a subjectivity from 0 (not really sure) to 1 (certain). We removed all unclassifiable tweets and looked at 3D histograms to see how each of the parties performed. A high-level overview is given by the donut charts here.
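The first cleaning step might look something like the sketch below, which strips URLs and hashtags with regular expressions. The patterns and the function name are assumptions for illustration, not the exact rules we used, and spelling correction and sentiment scoring are left out.

```python
import re

# Illustrative patterns: links starting with http(s):// or www., and #tags.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
HASHTAG_RE = re.compile(r"#\w+")


def clean_tweet(text):
    """Remove URLs and hashtags, then collapse leftover whitespace."""
    text = URL_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    return " ".join(text.split())
```

For example, `clean_tweet("Vote today! #Elections2014 http://t.co/abc123")` leaves only `"Vote today!"`.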
The Aam Aadmi Party showed a mean polarity of around 0.16 with a standard deviation of around 0.37. 71% of classified tweets were found to be positive.
The Bharatiya Janata Party showed a slightly lower mean of 0.13, with the exact same standard deviation as the AAP. 67% of classified tweets were found to be positive.
The Indian National Congress showed the highest mean sentiment of 0.18, and yet again the same standard deviation, even though only 71% of all classified tweets were found to be positive.
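For readers who want to compute the same kind of summary from their own scores, here is a minimal sketch. It assumes each tweet's polarity is a float in [-1, 1], that unclassifiable tweets are marked as None, and that "positive" means a polarity above zero; those conventions and the function name are ours for illustration.

```python
import statistics


def summarize_polarity(polarities):
    """Return mean, standard deviation and share of positive tweets.

    Assumption: None marks an unclassifiable tweet and is dropped first.
    """
    scored = [p for p in polarities if p is not None]
    if not scored:
        raise ValueError("no classified tweets to summarize")
    return {
        "mean": statistics.fmean(scored),
        "stdev": statistics.pstdev(scored),  # population standard deviation
        "positive_share": sum(p > 0 for p in scored) / len(scored),
    }
```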
Although only 1% of all our tweets were tagged with location data, and some of that data was from beyond our borders, we plotted the ones that were within our borders in an attempt to visualize the demographic of our tweeters.
The Aam Aadmi Party seemed to be very popular in North India, especially in Delhi - a result we expected to see.
The Bharatiya Janata Party not only provided us with more data, but also had a much more widespread popularity, which seemed to cover the country quite uniformly.
The Indian National Congress, again, is a tricky case because of the lack of words that establish the appropriate context. Nonetheless, the results show that the popularity is again concentrated in the Delhi region.
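Keeping only the geotagged tweets that fall inside the country can be approximated with a bounding box, as in the sketch below. The box is a rough assumption on our part (about 6.5 to 37.5 degrees north and 68 to 97.5 degrees east) and it also covers parts of neighbouring countries, so a real border polygon would be needed for anything precise. The names are hypothetical.

```python
# Rough, illustrative bounding box around India (degrees).
LAT_MIN, LAT_MAX = 6.5, 37.5
LON_MIN, LON_MAX = 68.0, 97.5


def in_india_bbox(lat, lon):
    """True if the point lies inside the approximate bounding box."""
    return LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX


def filter_geotagged(points):
    """Keep (lat, lon) pairs inside the box; skip tweets without location."""
    return [p for p in points if p is not None and in_india_bbox(*p)]
```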
We looked at the distribution of words in the tweets gathered for each of the parties, along with the distribution of hashtags and other things, and found some interesting results.
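Counts like these can be reproduced with a simple tally. The sketch below is one way to do it, assuming the tweets are plain strings with URLs already removed; the tiny stop-word list and the function name are placeholders for illustration, not what we actually used.

```python
import re
from collections import Counter

# Placeholder stop-word list; a real one would be much longer.
STOP_WORDS = frozenset({"a", "an", "and", "the", "of", "to", "in", "is", "for"})


def count_terms(tweets, stop_words=STOP_WORDS):
    """Return (word_counts, hashtag_counts) over lower-cased tweets."""
    words, hashtags = Counter(), Counter()
    for tweet in tweets:
        for token in re.findall(r"#?\w+", tweet.lower()):
            if token.startswith("#"):
                hashtags[token] += 1
            elif token not in stop_words:
                words[token] += 1
    return words, hashtags
```

Calling `words.most_common(10)` on the first counter then gives a top 10 list, and slicing `words.most_common(20)[10:]` gives the next 10.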
The 411k tweets about the Aam Aadmi Party, unsurprisingly, contained the word "AAP" 211k times. The top 10 most used functional words were:
The 829k tweets about the Bharatiya Janata Party contained the word "Modi" a staggering 489k times. The top 10 most used functional words were:
The 319k tweets about the Indian National Congress, perhaps unsurprisingly, contained the word "Gandhi" a staggering 137k times. The top 10 most used functional words were:
We also looked at the next 10 most interesting words:
We'll let you, the reader, draw inferences from the data we've presented. It reveals a lot about the high-level nature of the elections, albeit distorted by some noise. Of only one thing can we be fairly certain: the results will be exciting.
If you have any comments or suggestions, reach out to us at any of the following: