APIs

Application Programming Interface

Retrieving data through APIs

API stands for Application Programming Interface. It is the interface that allows software applications to communicate with one another. An API is a software-to-software interface, not a user interface. With APIs, applications talk to each other without any user knowledge or intervention.

An example is the Twitter API, a web-based JSON API that allows developers to interact programmatically with Twitter data. Because it is web-based, it must be accessed by making requests over the Internet to services that Twitter hosts. Your application sends an HTTP request, just like a web browser does, but instead of a webpage meant for human readers, the response comes back in a format that applications can easily parse. Various formats exist for this purpose; Twitter uses a popular and easy-to-use one called JSON.
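To make the idea of a machine-readable response concrete, the sketch below parses a small JSON document of the kind a web API might return. The payload is a made-up sample, not real Twitter output, and its field names are only illustrative.

```python
import json

# Hypothetical response body, as a web API might return it (not real Twitter data).
response_body = '{"id": 1, "text": "Hello from an API", "user": {"name": "example"}}'

# json.loads turns the JSON text into plain Python dicts, lists, strings and numbers.
tweet = json.loads(response_body)

print(tweet["text"])          # access a top-level field
print(tweet["user"]["name"])  # access a nested field
```

Once parsed, the data behaves like any other Python dictionary, which is why JSON is convenient for application-to-application communication.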

To access the Twitter Streaming API, we need four pieces of information from Twitter: an API key, an API secret, an access token, and an access token secret. If you go to https://apps.twitter.com/ and log in with your Twitter credentials, you can create a new app and obtain these credentials for yourself.
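One possible way to keep these four values out of your source code is to read them from environment variables. The sketch below is only a suggestion; the variable names and the `load_credentials` helper are hypothetical and not part of Twitter's or tweepy's API.

```python
import os

# Hypothetical environment variable names; pick whatever suits your setup.
CREDENTIAL_VARS = (
    "TWITTER_API_KEY",
    "TWITTER_API_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_credentials(env=os.environ):
    """Return the four credentials as a dict, or raise KeyError if any is missing."""
    missing = [name for name in CREDENTIAL_VARS if not env.get(name)]
    if missing:
        raise KeyError("Missing credentials: " + ", ".join(missing))
    return {name: env[name] for name in CREDENTIAL_VARS}
```

After exporting the four variables in your shell, `creds = load_credentials()` returns them as a dictionary that you can pass on to the authentication code.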

For the Twitter API we use the tweepy library; see https://tweepy.readthedocs.io/en/latest/

The example below is a piece of code that streams tweets and appends them to a JSON file:

# Source: http://adilmoujahid.com/posts/2014/07/twitter-analytics/
# Import the necessary classes from the tweepy library.
# Note: this uses the tweepy 3.x API; StreamListener was removed in tweepy 4.0.
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream

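# Variables that contain the user credentials to access the Twitter API.
# The strings below are placeholders; replace them with your own values.
access_token = "YOUR_ACCESS_TOKEN"
access_token_secret = "YOUR_ACCESS_TOKEN_SECRET"
consumer_key = "YOUR_API_KEY"
consumer_secret = "YOUR_API_SECRET"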


# A basic listener that appends each received tweet to a JSON file and prints it
class StdOutListener(StreamListener):

    def on_data(self, data):
        with open('data/result2.json', 'a') as f:
            f.write(data)
        print(data)
        return True

    def on_error(self, status):
        print(status)


if __name__ == '__main__':

    # This handles Twitter authentication and the connection to the Twitter Streaming API
    l = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    stream = Stream(auth, l)

    # Filter the Twitter stream to capture tweets that contain the given keywords
    stream.filter(track=['Sunday'])

Since the file is in JSON format, we can process the data accordingly. If we open the JSON file, we see that it contains records in the { } format, but the records are not enclosed in [ ]. Therefore we apply a little trick to enclose the data in the [ ] format. After that we can load the data into a pandas DataFrame and process it further.
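The bracket trick could look roughly like the sketch below. It assumes that the file holds one JSON record per line, possibly separated by blank lines; the helper names `load_tweets` and `tweets_to_dataframe` are made up for this illustration.

```python
import json

import pandas as pd


def load_tweets(path):
    """Read a file with one JSON record per line and return a list of dicts."""
    with open(path, encoding="utf-8") as f:
        # Skip blank lines that may sit between records.
        records = [line.strip() for line in f if line.strip()]
    # The trick: join the records with commas and wrap them in [ ],
    # so that the whole text becomes one valid JSON array.
    return json.loads("[" + ",".join(records) + "]")


def tweets_to_dataframe(path):
    """Load the records into a pandas DataFrame, one row per tweet."""
    return pd.DataFrame(load_tweets(path))
```

With the file written by the listener, `df = tweets_to_dataframe('data/result2.json')` would give one row per tweet. Depending on the file, `pd.read_json(path, lines=True)` may be a shorter alternative.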

DataFrame preview (transposed: one line per column, with the values for rows 0, 1 and 2)

column                  | 0 | 1 | 2
contributors            | None | None | None
coordinates             | None | None | None
created_at              | Sun May 19 17:53:12 +0000 2019 | Sun May 19 17:56:17 +0000 2019 | Sun May 19 17:59:11 +0000 2019
display_text_range      | NaN | [12, 130] | NaN
entities                | {'hashtags': [], 'urls': [{'expanded_url': 'ht... | {'hashtags': [], 'urls': [], 'symbols': [], 'u... | {'hashtags': [], 'urls': [{'expanded_url': 'ht...
extended_tweet          | {'full_text': 'Can we put this out to the spor... | NaN | {'full_text': 'Mechanical efficiency of high v...
favorite_count          | 0 | 0 | 0
favorited               | False | False | False
filter_level            | low | low | low
geo                     | None | None | None
...                     | ... | ... | ...
quoted_status_id_str    | 1130144770255413248 | NaN | NaN
quoted_status_permalink | {'expanded': 'https://twitter.com/girlontheriv... | NaN | NaN
reply_count             | 0 | 0 | 0
retweet_count           | 0 | 0 | 0
retweeted               | False | False | False
source                  | <a href="http://twitter.com/download/iphone" r... | <a href="http://twitter.com/download/iphone" r... | <a href="http://twitter.com" rel="nofollow">Tw...
text                    | Can we put this out to the sports med communit... | @mboyle1959 I might clarify that the “low” end... | Mechanical efficiency of high versus moderate ...
timestamp_ms            | 1558288392774 | 1558288577756 | 1558288751178
truncated               | True | False | True
user                    | {'listed_count': 49, 'following': None, 'defau... | {'listed_count': 3, 'following': None, 'defaul... | {'listed_count': 18, 'following': None, 'defau...

3 rows × 34 columns

For instance, we can select a single column from the DataFrame.
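Selecting a column might look like the sketch below. The small DataFrame is a stand-in built for this illustration; its column names follow the tweet data, but the values are invented.

```python
import pandas as pd

# Tiny stand-in for the tweet DataFrame; the values are illustrative only.
df = pd.DataFrame(
    {
        "created_at": ["Sun May 19 17:53:12 +0000 2019", "Sun May 19 17:56:17 +0000 2019"],
        "text": ["first example tweet", "second example tweet"],
        "retweet_count": [0, 3],
    }
)

texts = df["text"]                    # a single column comes back as a Series
subset = df[["created_at", "text"]]   # a list of names gives a smaller DataFrame

print(texts)
print(subset)
```

A single pair of brackets returns one column; a list of column names inside the brackets returns a DataFrame with just those columns.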

Extract by regex

We can apply regular expressions (regex) to filter out interesting information. For example, we can first extract the hyperlinks from the tweet text and then search for the word 'sport'.
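As a rough sketch of both steps, the code below uses Python's built-in `re` module on a few invented tweet texts. The URL pattern is deliberately simple (anything starting with http:// or https:// up to the next whitespace) and is an assumption, not a complete URL matcher.

```python
import re

# Invented example texts standing in for the 'text' column of the tweets.
texts = [
    "Great sports medicine read https://example.com/article",
    "No links here, just a Sunday walk",
    "Sport science update: https://example.org/a and https://example.org/b",
]

# Simple pattern: http:// or https:// followed by non-whitespace characters.
URL_PATTERN = re.compile(r"https?://\S+")
# Case-insensitive search for the word 'sport' (also matches 'sports').
SPORT_PATTERN = re.compile(r"sport", re.IGNORECASE)

# Step 1: extract all hyperlinks from each text.
links = [URL_PATTERN.findall(text) for text in texts]

# Step 2: keep only the texts that mention 'sport'.
about_sport = [text for text in texts if SPORT_PATTERN.search(text)]

print(links)
print(about_sport)
```

On a pandas DataFrame the same patterns can be applied to a whole column, for example with `df["text"].str.findall(r"https?://\S+")` and `df["text"].str.contains("sport", case=False)`.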

Read more

More information can be found at https://github.com/fenna/twitter_analysis
