APIs
Application Programming Interface
Retrieving data through APIs
API stands for Application Programming Interface. It is the interface that allows software applications to communicate with one another. An API is a software-to-software interface, not a user interface. With APIs, applications talk to each other without any user knowledge or intervention.
An example is the Twitter API, a web-based JSON API that lets developers interact with Twitter data programmatically. Because it is web-based, it must be accessed by sending requests over the Internet to services that Twitter hosts. Your application sends an HTTP request, just as a web browser does, but instead of a webpage meant for humans, the response comes back in a format that applications can easily parse. Several formats exist for this purpose; Twitter uses JSON, a popular and easy-to-use one.
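A minimal sketch of this request-and-parse cycle, using only Python's standard library. `fetch_json` is a hypothetical helper that sends a GET request the way a browser would; the sample body is made up, and real Twitter endpoints additionally require the credentials described next. Either way, `json.loads` turns the JSON text into ordinary Python lists and dicts.

```python
import json
import urllib.request


def fetch_json(url: str, timeout: float = 10.0):
    """Hypothetical helper: send an HTTP GET request and parse the JSON reply."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# A made-up response body in the shape a JSON API might return.
# fetch_json would produce the same kind of object from a real endpoint.
sample_body = '[{"id": 1, "text": "Happy Sunday!"}, {"id": 2, "text": "Lazy Sunday"}]'
tweets = json.loads(sample_body)  # a list of dicts, easy for a program to use

for tweet in tweets:
    print(tweet["id"], tweet["text"])
```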
To access the Twitter Streaming API, we need four pieces of information from Twitter: the API key, the API secret, the access token, and the access token secret. If you go to https://apps.twitter.com/ and log in with your Twitter credentials, you can create a new app and obtain these credentials for yourself.
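Rather than pasting the four credentials into the script, one option is to read them from environment variables. The sketch below assumes hypothetical variable names (`TWITTER_API_KEY` and so on) and a hypothetical `TwitterCredentials` container; adapt both to your own setup.

```python
import os
from typing import NamedTuple


class TwitterCredentials(NamedTuple):
    api_key: str
    api_secret: str
    access_token: str
    access_token_secret: str


# Hypothetical environment-variable names, one per credential.
ENV_NAMES = {
    "api_key": "TWITTER_API_KEY",
    "api_secret": "TWITTER_API_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_credentials() -> TwitterCredentials:
    """Read the four credentials from the environment, failing loudly if any is missing."""
    missing = [env for env in ENV_NAMES.values() if not os.environ.get(env)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return TwitterCredentials(**{field: os.environ[env] for field, env in ENV_NAMES.items()})
```

The fields can then be passed on as `OAuthHandler(creds.api_key, creds.api_secret)` and `auth.set_access_token(creds.access_token, creds.access_token_secret)`.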
To access the Twitter API from Python we use the tweepy library (see https://tweepy.readthedocs.io/en/latest/).
The example below streams tweets and appends them to a JSON file.
# Source: http://adilmoujahid.com/posts/2014/07/twitter-analytics/
# Written for tweepy 3.x: StreamListener was removed in tweepy 4.0.
# Import the necessary classes from the tweepy library
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream

# Variables that contain the user credentials to access the Twitter API
# (placeholders: replace them with your own keys)
access_token = "YOUR_ACCESS_TOKEN"
access_token_secret = "YOUR_ACCESS_TOKEN_SECRET"
consumer_key = "YOUR_API_KEY"
consumer_secret = "YOUR_API_SECRET_KEY"


# This is a basic listener that appends every received tweet to a JSON file
class StdOutListener(StreamListener):

    def on_data(self, data):
        # Each call delivers one tweet as a raw JSON string; store it on its own line.
        # The data/ directory must already exist.
        with open('data/result2.json', 'a') as f:
            f.write(data.strip() + '\n')
        print(data)
        return True

    def on_error(self, status):
        print(status)
        # 420 means the app is being rate limited: returning False disconnects the stream
        if status == 420:
            return False


if __name__ == '__main__':
    # This handles Twitter authentication and the connection to the Twitter Streaming API
    listener = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    stream = Stream(auth, listener)

    # This line filters the Twitter stream to capture tweets containing the keyword 'Sunday'
    stream.filter(track=['Sunday'])

Since the data is stored as JSON, we can process it accordingly. If we open the JSON file, we see that it contains one record in { } format per tweet, but the records are neither separated by commas nor enclosed in [ ], so the file as a whole is not a valid JSON array. Therefore we apply a little trick: join the records with commas and enclose them in [ ]. After that we can load the data into a pandas DataFrame and process it further.
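The trick can be sketched as follows, assuming each tweet was stored on its own line of the file (blank lines are skipped). `load_tweets` is a hypothetical helper name, not part of tweepy or pandas.

```python
import json


def load_tweets(path: str) -> list:
    """Read a file with one JSON object per line and parse it as one JSON array."""
    with open(path, encoding="utf-8") as f:
        # Keep only non-empty lines, stripped of surrounding whitespace/newlines
        records = [line.strip() for line in f if line.strip()]
    # The trick: join the objects with commas and wrap them in [ ]
    # so that the whole file becomes a single valid JSON array.
    return json.loads("[" + ",".join(records) + "]")
```

With pandas the resulting list converts directly, e.g. `df = pd.DataFrame(load_tweets('data/result2.json'))`, after which `df.head(3)` shows the first rows. As an alternative without the trick, `pd.read_json('data/result2.json', lines=True)` reads one JSON object per line, although it may infer column types differently.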
| column | 0 | 1 | 2 |
|---|---|---|---|
| contributors | None | None | None |
| coordinates | None | None | None |
| created_at | Sun May 19 17:53:12 +0000 2019 | Sun May 19 17:56:17 +0000 2019 | Sun May 19 17:59:11 +0000 2019 |
| display_text_range | NaN | [12, 130] | NaN |
| entities | {'hashtags': [], 'urls': [{'expanded_url': 'ht... | {'hashtags': [], 'urls': [], 'symbols': [], 'u... | {'hashtags': [], 'urls': [{'expanded_url': 'ht... |
| extended_tweet | {'full_text': 'Can we put this out to the spor... | NaN | {'full_text': 'Mechanical efficiency of high v... |
| favorite_count | 0 | 0 | 0 |
| favorited | False | False | False |
| filter_level | low | low | low |
| geo | None | None | None |
| ... | ... | ... | ... |
| quoted_status_id_str | 1130144770255413248 | NaN | NaN |
| quoted_status_permalink | {'expanded': 'https://twitter.com/girlontheriv... | NaN | NaN |
| reply_count | 0 | 0 | 0 |
| retweet_count | 0 | 0 | 0 |
| retweeted | False | False | False |
| source | <a href="http://twitter.com/download/iphone" r... | <a href="http://twitter.com/download/iphone" r... | <a href="http://twitter.com" rel="nofollow">Tw... |
| text | Can we put this out to the sports med communit... | @mboyle1959 I might clarify that the “low” end... | Mechanical efficiency of high versus moderate ... |
| timestamp_ms | 1558288392774 | 1558288577756 | 1558288751178 |
| truncated | True | False | True |
| user | {'listed_count': 49, 'following': None, 'defau... | {'listed_count': 3, 'following': None, 'defaul... | {'listed_count': 18, 'following': None, 'defau... |

3 rows × 34 columns (transposed for readability)
For instance, we can select a single column by name.
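A minimal sketch of column selection in pandas. The two-row DataFrame is made up and only stands in for the tweet DataFrame above; on the real data, `df['text']` would select the tweet texts in the same way.

```python
import pandas as pd

# Hypothetical stand-in for the tweet DataFrame: two made-up rows.
df = pd.DataFrame({
    "text": ["first example tweet", "second example tweet"],
    "retweet_count": [0, 0],
    "favorite_count": [0, 0],
})

# Selecting one column by name returns a pandas Series.
texts = df["text"]

# Passing a list of names returns a smaller DataFrame with just those columns.
subset = df[["text", "retweet_count"]]

print(texts.tolist())  # ['first example tweet', 'second example tweet']
print(subset.shape)    # (2, 2)
```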
Extract with regex
We can apply regular expressions (regex) to pick out interesting information. For example, we can first extract the hyperlinks and then search for the word 'sport'.
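A sketch of both steps with pandas' vectorised string methods `Series.str.findall` and `Series.str.contains`, run on made-up tweet texts. The URL pattern `https?://\S+` is a simple assumption (it grabs everything up to the next whitespace character), not a complete URL grammar.

```python
import pandas as pd

# Made-up tweet texts standing in for df["text"].
df = pd.DataFrame({"text": [
    "New sports science paper: https://example.com/paper",
    "Lazy Sunday, no links here",
    "Sport and recovery tips http://example.org/tips and https://example.net/more",
]})

# 1) Extract hyperlinks: every http(s) URL up to the next whitespace character.
df["links"] = df["text"].str.findall(r"https?://\S+")

# 2) Search for the word 'sport' (case-insensitive; this also matches 'sports').
sport_tweets = df[df["text"].str.contains(r"sport", case=False, regex=True)]

print(df["links"].tolist())
print(sport_tweets["text"].tolist())
```

On the real DataFrame the same two lines would be applied to `df['text']`. Since truncated tweets keep their full text in the `extended_tweet` column (under `full_text`), you may want to search that field as well.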
