Reading Live Tweets

  • This tutorial walks through reading live tweets and doing a basic trend analysis
  • The tweets can be filtered using a search term
  • The tweets received are stored in a JSON file, which is then loaded into a dataframe
  • The tweet messages are tokenized and stop words are removed from the text
  • The final tokens are then counted and sorted by their frequency of occurrence
In [35]:
import tweepy
from tweepy import OAuthHandler
from tweepy import Stream
from tweepy.streaming import StreamListener

How to create a Twitter application and get consumer and access tokens

  • This webpage provides detailed steps on how to log in to a Twitter account and get these details.
  • Once the account is opened and the tokens are obtained, enter the details in the cell below
In [36]:
consumer_key = ''
consumer_secret = ''
access_token = ''
access_secret = ''
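
Hard-coding the four tokens in the notebook makes them easy to leak when the notebook is shared. A minimal sketch of one alternative, assuming the tokens were exported as environment variables beforehand. The variable names and the helper load_twitter_credentials are hypothetical and not part of tweepy:

```python
import os

# Hypothetical environment variable names; any names would do.
REQUIRED_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_SECRET",
)

def load_twitter_credentials(env=None):
    """Return the four tokens as a dict, or raise if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing credentials: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

The returned values could then be assigned to consumer_key, consumer_secret, access_token and access_secret instead of typing them into the cell.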
In [37]:
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)

Provide the file name to store the tweets

In [38]:
class TweetsListener( StreamListener ):

  def on_data(self, data):
      try:
          # Append each raw JSON message to the output file
          with open('mytweets.json', 'a', newline='') as f:
              f.write(data)
              return True
      except BaseException as e:
          print("Error on_data: %s" % str(e))
      return True

  def on_error(self, status):
      return True
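
To stop after a fixed number of tweets, the listener can keep a counter as an instance attribute. The sketch below shows the idea with a plain class so that it runs without tweepy. The class name LimitedTweetWriter and its parameters are hypothetical, and it assumes the tweepy 3.x behaviour in which the stream stops once on_data returns False:

```python
class LimitedTweetWriter:
    """Append each raw tweet to a file and ask to stop after max_tweets.

    Hypothetical helper: in the notebook these methods would live in a
    StreamListener subclass; here it is a plain class for illustration.
    """

    def __init__(self, path, max_tweets):
        self.path = path
        self.max_tweets = max_tweets
        self.num_tweets = 0  # instance attribute, so on_data can update it

    def on_data(self, data):
        message = data.strip()
        if not message:
            return True  # ignore blank messages, keep listening
        # Write exactly one JSON message per line
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(message + "\n")
        self.num_tweets += 1
        # False asks the caller to stop; True asks for more data
        return self.num_tweets < self.max_tweets
```

Keeping the counter on self avoids the unbound-variable error that a bare local counter inside on_data would raise.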

Provide the filter word and start streaming the tweets

In [39]:
import os

# Delete any existing file. If you do not want to delete the earlier file,
# comment out the two lines below
if os.path.exists( 'mytweets.json' ):
  os.remove( 'mytweets.json' )

try:
  twitter_stream = Stream( auth, TweetsListener() )
  twitter_stream.filter( track=['sultan'] )
except BaseException as e:
  print( "Streaming Stopped")
Streaming Stopped
In [40]:
# read the entire file into a python list, one line per tweet
with open('mytweets.json', 'r') as f:
  data = f.readlines()

# remove the trailing "\n" from each line and drop blank lines
data = [line.rstrip() for line in data if line.strip()]
In [41]:
# Join all JSON messages into one string holding a single JSON array
data_json_str = "[" + ','.join(data) + "]"
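
Joining the lines into one JSON array fails as a whole if a single line is malformed, for example a message cut off when streaming was interrupted. A minimal sketch of a more forgiving alternative that parses each line separately; the helper name parse_json_lines and the sample data are illustrative assumptions:

```python
import json

def parse_json_lines(lines):
    """Parse one JSON message per line, skipping blank or malformed lines."""
    records, skipped = [], 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank line, nothing to parse
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            skipped += 1  # e.g. a message cut off mid-write
    return records, skipped

sample = ['{"text": "watching Sultan"}\n', '\n', '{"text": "trunc']
records, skipped = parse_json_lines(sample)
print(len(records), skipped)  # prints: 1 1
```

The resulting list of dicts can be passed to pd.DataFrame( records ), although that route does not apply the automatic date conversion that pd.read_json attempts.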
In [42]:
# The list of JSON messages can now be converted into a dataframe.
# pandas supports loading JSON directly into a dataframe
import pandas as pd
tweets_df = pd.read_json(data_json_str)
In [43]:
tweets_df.head( 2 )
contributors coordinates created_at entities extended_entities favorite_count favorited filter_level geo id ... quoted_status_id quoted_status_id_str retweet_count retweeted retweeted_status source text timestamp_ms truncated user
0 NaN NaN 2016-07-09 05:26:48 {'user_mentions': [], 'hashtags': [], 'symbols': [], 'urls': [{'indices': [5, 28], 'expanded_url': '', 'url': '', 'display_url': '…'}]} NaN 0 False low NaN 751648779468566532 ... 7.513779e+17 7.513779e+17 0 False NaN <a href="" rel="nofollow">Twitter for Android</a> 😀😀😀😀 2016-07-09 05:26:48.641 False {'created_at': 'Thu Jul 16 04:49:09 +0000 2015', 'is_translator': False, 'description': 'Known As Salman Khan Fan.', 'profile_image_url_https': '', 'profile_use_background_image': True, 'profile_banner_url': '', 'profile_background_image_url': '', 'contributors_enabled': False, 'profile_image_url': '', 'geo_enabled': True, 'default_profile': True, 'notifications': None, 'lang': 'en', 'favourites_count': 15722, 'following': None, 'profile_background_image_url_https': '', 'time_zone': Non...
1 NaN NaN 2016-07-09 05:26:49 {'user_mentions': [{'name': 'Indian Boxoffice', 'id': 3306396280, 'id_str': '3306396280', 'indices': [3, 12], 'screen_name': 'TradeBOC'}, {'name': 'Salman Khan', 'id': 132385468, 'id_str': '132385468', 'indices': [92, 108], 'screen_name': 'BeingSalmanKhan'}, {'name': 'Anushka Sharma', 'id': 54829997, 'id_str': '54829997', 'indices': [110, 124], 'screen_name': 'AnushkaSharma'}], 'hashtags': [{'indices': [63, 70], 'text': 'SULTAN'}], 'symbols': [], 'urls': []} NaN 0 False low NaN 751648781083545600 ... NaN NaN 0 False {'in_reply_to_status_id_str': None, 'retweeted': False, 'user': {'created_at': 'Tue Jun 02 05:44:19 +0000 2015', 'is_translator': False, 'description': 'Posts boxoffice collections of #Indians films,Analyise of boxoffice, Prediction,Verdicts,News etc.', 'profile_image_url_https': '', 'profile_use_background_image': True, 'profile_banner_url': '', 'profile_background_image_url': '', 'contributors_enabled': False, 'profile_image_url': '', 'geo_enabled': False, 'default_profile': True, 'notifications': None, 'lang': 'en', 'favourites_cou... <a href="" rel="nofollow">Twitter for Android</a> RT @TradeBOC: All time top 10 Domestic Second Day collection, #SULTAN ranked 1st position. @beingsalmankhan· @anushkasharma 2016-07-09 05:26:49.026 False {'created_at': 'Sun Aug 25 05:03:59 +0000 2013', 'is_translator': False, 'description': 'BEING HUMAN', 'profile_image_url_https': '', 'profile_use_background_image': True, 'profile_banner_url': '', 'profile_background_image_url': '', 'contributors_enabled': False, 'profile_image_url': '', 'geo_enabled': True, 'default_profile': True, 'notifications': None, 'lang': 'en', 'favourites_count': 2922, 'following': None, 'profile_background_image_url_https': '', 'time_zone': None, 'default_pro...

2 rows × 31 columns

In [44]:
tweets_df_text = tweets_df[['text']]
In [45]:
pd.set_option('max_colwidth', 800)
tweets_df_text
0 😀😀😀😀
1 RT @TradeBOC: All time top 10 Domestic Second Day collection, #SULTAN ranked 1st position. @beingsalmankhan· @anushkasharma
2 RT @madreeve: @madreeve 5.
3 RT @MVenkaiahNaidu: I rarely go to movies. Yesterday being a holiday, grand daughter Sushma took me to Salman Khan's starrer Sultan. 1/
4 RT @MayurStudios: #Sultan has Crushed the Lifetime Collections of previous YRF film #FAN , during early morning show today.…
5 RT @Tutejajoginder: #Sultan - @AnushkaSharma has now enjoyed a century each with Salman [Sultan], Aamir [PK], SRK [JTHJ]. Good going!
6 RT @Stenymerowin: எல்லாரும் வழக்கம் போல அனில அடிச்சு சாகடிங்க💪💪💪
7 RT @shubhh876: #SULTAN will hit 400 cr easily.. now 33 cr on day 3 is outstanding ..
8 RT @AsliShiva: #Sultan crossed 100cr mark on Friday. @BeingSalmanKhan's 10th consecutive film to do so.
9 RT @addatoday: ' @BeingSalmanKhan becomes 1st Person ever to have 'TEN' 100 crore Films at Indian BO. Creates HISTORY with #Sultan. https:/…
10 A look at the Standard Gauge Railway vs The Meter Gauge Railway at Sultan Hamud. #BrandObura exclusive
11 RT @kamaalrkhan: It's clear now that #Sultan will be first ever film which will collect 30Cr+ on first 5days n it's proof of Salman khan's…
12 RT @KRKBoxOffice: #Sultan collected 30+Cr on day3 and it's a record that any film collected 30+Cr on first 3days.
13 RT @EternalSalman_: Bajrangi Bhaijaan crossed HNY lifetime in 7\ndays. &amp; Looking like #Sultan will cross HNY lifetime in just 5 days. :P
14 RT @Salilacharya: check out a post mortem analysis of #sultan with my fellow rjs @ArchanaaPania and @Su4ita @beingSalmanKhan…
15 Sukses kk tam di kampung orang 🙏🏻\nSafe flight 😇 @ Sultan Hasanuddin International Airport
16 RT @SultanEid016: What is Stardom..??\nThe thing that other star want to follow it. But in salman case the thing which follow #sultan is sta…
17 RT @saini12ajay_: @AnushkaSharma is only actress with two 100 crores club movies and one 500 crores movie and one more should added to this…
18 RT @kamaalrkhan: It's clear now that #Sultan will be first ever film which will collect 30Cr+ on first 5days n it's proof of Salman khan's…
19 @Gr8IndianFood #FilmyFoodie Sultan for Shahi Tukda
20 RT @sameeratweeter: My sister asking me can u whistle #Sultan #SalmanKhan #chandan \nLol
21 RT @KomalNahta: IT IS HUGE HUGE HUGE! That's #Sultan. Unstoppable tsunami from Salman Khan this Eid. Let the celebrations begin!!!
22 RT @umangarora_: #Sultan is a reminder that good flick can also be made with 'Simple Story' lineup and extraordinary Direction and Screenpl…
23 RT @AttitudeKnight: I loved the movie so so much, I think I will give it a watch again with my family. #Sultan
24 RT @kabirkhankk: #Sultan delivers a solid punch... A strong Emotional core that really moves you... @BeingSalmanKhan is at his best @Anushk…
25 RT @IndiaBoxOffice1: #Sultan is setting records all over the world, meanwhile, time for a POLL :\n\nWHO HAS THE LARGER FAN FOLLOWING?
26 RT @VishalDadlani: Whaaaaaaat! That's awesome! #Sultan @BeingSalmanKhan @aliabbaszafar @yrf @AnushkaSharma @ShekharRavjiani…
27 RT @kabirkhankk: At YRF to watch #Sultan 😄👍🏼 @aliabbaszafar @BeingSalmanKhan @AnushkaSharma
28 200 Crore For Sultan: via @YouTube
29 😀😀😀😀😀
... ...
43 RT @MayurStudios: #Sultan - . @BeingSalmanKhan has now hit 10 centuries in a row - Dabangg, Ready, Bodyguard, ETT, D2, Jai Ho, Kick, BB, PR…
44 watching Sultan
45 RT @SalluLicious: Anushka Sharma looked CUTE&amp;She did her role Excellently! Salman Bro&amp;Her chemistry was really Stunning!They should do anot…
46 RT @Tutejajoginder: #Sultan - @AnushkaSharma has now enjoyed a century each with Salman [Sultan], Aamir [PK], SRK [JTHJ]. Good going!
47 With Melgi at Waiting Room Sultan Syarif Kasim II International Airport —
48 RT @brijeshrtrivedi: #Sultan is a sport movie with lots of emotions and awesome performances. Movie teach us never give up in life.
49 RT @raydeep: #Sultan storms @KomalNahta says this can be the biggest blockbuster of @BeingSalmanKhan. Wll cross Rs 300 crore…
50 RT @nikimarwah: The movie #Sultan inspires, motivates, moves and entertains. Every scene is a treat! A fabulous package and a must watch fo…
51 RT @kabirkhankk: Ladies and Gentleman.... Here's SULTANNNNNN!!!
52 RT @Bollyhungama: 100 cr club: @BeingSalmanKhan 's #SULTAN all set to surpass collections of #AIRLIFT &amp; become highest grosser of 2016\nhttp…
53 RT @TheViralFever: Releasing every #EID.\n\n#Sultan #Dangal
54 RT @sameeratweeter: Younger sister refuses to come to #chandan next time as can't hear a dialogue with such deafening whistles /claps #Sult…
55 #Sultan , I heard so much good things bout you. I am coming for you ! I can't wait! 😍 @BeingSalmanKhan
56 RT @Tutejajoginder: #Sultan - For @yrf, it is their fourth century after Ek Tha Tiger, Jab Tak Hai Jaan and Dhoom 3.
57 RT @MrUthaman: pandaram is leading. anils using all 420 work to win. Vote for thala ajith
58 RT @aamirmerijaan: I liked @iamsrk's fan more than Sultan to be honest!
59 RT @Ak47Damani: Most people celebrate this day by partying and everything, but SULTAN is my most important priority today :) #Sultan https:…
60 RT @taran_adarsh: #Sultan continued its MAJESTIC run for the third consecutive day [Fri]... Is all set for a POWER-PACKED 5-day weekend in…
61 #Sultan Box-office Collection for two days 73 Cr. 1st day 36.54 Cr&amp; Day2 37.20 Cr. @BeingSalmanKhan
62 RT @IconSalman: CROWD Gathered IN LAHORE TO WATCH SULTAN !!! #ReSultan
63 RT @aamir_khan: Saw Sultan last night . OUTSTANDING! Ali Abbas shines as writer and director. (1/2)
64 RT @iFaridoon: Watchin #Sultan wid Aarefa n Insha,packed hall..ppl clapping,laughing,clicking selfies wid Sultan cutout in interval https:/…
65 Re #Sultan kardi box office PR chadhai #SALMAN10thConsecutive100cr
66 RT @sameeratweeter: #Sultan #SalmanKhan the biggest superstar only he can break the BO collection ..\nWhoaaaaa
67 @BeingSalmanKhan खून में है मिट्टी, मिट्टी में है खून, ऊपर अल्लाह, नीचे तेरा जूनून, #Sultan. @AnushkaSharma I watched it. Its superb movie!
68 RT @Being_Soniyaa: Now KRK to SRK - Bhai mein kya kru , kitna negative bol diya SULTAN K liye but He is unstoppable. \nThen both cries 😀😁😂
69 RT @RamGKrishna: #SULTAN \nDay1 -36.54 Cr. (Pre EID)\nDay2 -37.20 Cr. (EID)\nDay3 -32-33 Cr(Working Day)\nTotal -106-107 Cr\n\nNow Weekend Sat +…
70 RT @Tutejajoginder: #Sultan @aliabbaszafar has entered #100croreclub in style after MBKD and Gunday (which were successful too). He is in s…
71 RT @satishkaushik2: Uff what a film #Sultan @BeingSalmanKhan what a great performance..u r like a raging bull..unstoppable..hats off to the…
72 RT @kamaalrkhan: It's clear now that #Sultan will be first ever film which will collect 30Cr+ on first 5days n it's proof of Salman khan's…

73 rows × 1 columns

In [46]:
# Tokenize the tweet messages
# Remove stop words
# Calculate the occurrences of words
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

vect = CountVectorizer( stop_words = "english" )
tweets_vec = vect.fit_transform( tweets_df_text.text )
# get_feature_names() was removed in scikit-learn 1.2; on releases older
# than 1.0 use it in place of get_feature_names_out()
word_freq_df = pd.DataFrame({'term': vect.get_feature_names_out(),
                           'tf':np.asarray( tweets_vec.sum( axis=0 ) ).ravel().tolist()})
word_freq_df.head()
term tf
0 10 3
1 100 3
2 100cr 2
3 100crore 1
4 100croreclub 1
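
To make the counting step less of a black box, here is a rough standard-library sketch of what the vectorizer does: lowercase the text, pull out tokens of two or more word characters (CountVectorizer's default token pattern), drop stop words, and count. The stop list is a tiny illustrative subset, not scikit-learn's built-in "english" list, and the sample tweets are shortened for the example:

```python
import re
from collections import Counter

# Default CountVectorizer token pattern: runs of two or more word characters
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")

# Tiny illustrative stop list; the built-in "english" list is much longer
STOP_WORDS = {"the", "is", "of", "and", "to", "in", "will", "for"}

def term_frequencies(texts, stop_words=STOP_WORDS):
    """Count how often each non-stop-word token occurs across all texts."""
    counts = Counter()
    for text in texts:
        tokens = TOKEN_PATTERN.findall(text.lower())
        counts.update(t for t in tokens if t not in stop_words)
    return counts

tweets = ["#Sultan will hit 400 cr easily", "watching Sultan", "RT @x: Sultan is a treat"]
freq = term_frequencies(tweets)
print(freq.most_common(1))  # prints: [('sultan', 3)]
```

Numbers such as "400" survive as tokens, which is why terms like 10 and 100 show up in the frequency table.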
In [47]:
# Print the ten most frequent words in descending order of frequency
word_freq_df.sort_values( "tf", ascending = False ).head( 10 )
term tf
402 sultan 68
351 rt 60
224 https 21
71 beingsalmankhan 15
121 cr 12
295 movie 10
137 day 9
357 salman 9
181 film 8
430 tutejajoginder 7
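
Several of the top terms (rt, https, user handles) are Twitter artifacts rather than topic words. One possible refinement is to clean the text before vectorizing. The sketch below is an illustration only: the function name clean_tweet, the regular expressions, and the sample URL are assumptions, not part of the original analysis.

```python
import html
import re

URL_RE = re.compile(r"https?:\S*")       # also catches links truncated to "https:/…"
RT_RE = re.compile(r"^RT\s+@\w+:\s*")    # leading retweet marker
MENTION_RE = re.compile(r"@\w+")

def clean_tweet(text):
    """Strip retweet markers, URLs and mentions, and unescape HTML entities."""
    text = html.unescape(text)           # "&amp;" -> "&"
    text = RT_RE.sub("", text)
    text = URL_RE.sub("", text)
    text = MENTION_RE.sub("", text)
    return " ".join(text.split())        # collapse leftover whitespace

print(clean_tweet("RT @TradeBOC: #SULTAN ranked 1st &amp; rising https://t.co/abc @yrf"))
# prints: #SULTAN ranked 1st & rising
```

The cleaned text could be fed to the vectorizer with vect.fit_transform( tweets_df_text.text.apply( clean_tweet ) ). Whether to drop mentions depends on the analysis, since handles such as beingsalmankhan may themselves be of interest.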