How to crawl Instagram data using its public API and Python?
In 2020, the official Instagram API allows you to access only your own posts, not public posts or comments. This restriction followed rising privacy concerns among users and frequent accusations of data breaches at many big companies, including Facebook, which owns Instagram. As a result, it has become difficult for programmers to crawl Instagram data.
So, how can we crawl Instagram data?
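There’s still a workaround: Instagram exposes a publicly accessible endpoint that returns a hashtag page as JSON when you append ?__a=1 to the page URL.
Let’s try hitting this URL for the hashtag travel: https://www.instagram.com/explore/tags/travel/?__a=1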
Eureka, it’s a JSON response:
[Figure: the URL and its JSON response]
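If you’d rather inspect the response from a script than in the browser, here is a minimal sketch using only Python’s standard library. The helper names build_hashtag_url and fetch_hashtag_json are hypothetical, and the sketch assumes the endpoint still answers anonymous requests with JSON; Instagram may instead return an error status or a login page, in which case urlopen raises an HTTPError or json.load fails to decode the body.

```python
import json
import urllib.parse
import urllib.request


def build_hashtag_url(tag):
    # Hypothetical helper: the public hashtag page with the ?__a=1 JSON flag
    return "https://www.instagram.com/explore/tags/" + urllib.parse.quote(tag) + "/?__a=1"


def fetch_hashtag_json(tag, timeout=10):
    # Assumes the endpoint returns JSON to anonymous clients.
    # urlopen raises HTTPError on 4xx/5xx; json.load raises on a non-JSON body.
    with urllib.request.urlopen(build_hashtag_url(tag), timeout=timeout) as resp:
        return json.load(resp)
```

For example, print(list(fetch_hashtag_json("travel").keys())) shows the top-level keys, which is a quick way to check whether the response still has the shape described here.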
Here, travel is the hashtag, as we can also see in the JSON response, and the response contains the posts tagged with #travel. The structure is easy to follow: the posts live in the list at data["graphql"]["hashtag"]["edge_hashtag_to_media"]["edges"], where each element (an "edge") wraps a "node" that holds one post’s data. So all we need to do is parse this JSON to extract the data.
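To make the nesting concrete, here is a hand-written, heavily trimmed stand-in for the response, using the same keys the parsing code below relies on (graphql, hashtag, edge_hashtag_to_media, edges, node, edge_media_to_caption, text). The sample dict and its caption are invented for illustration; a real response carries many more fields per post.

```python
# Hypothetical, trimmed sample shaped like the hashtag JSON response
sample = {
    "graphql": {
        "hashtag": {
            "name": "travel",
            "edge_hashtag_to_media": {
                "edges": [
                    {"node": {"edge_media_to_caption": {"edges": [
                        {"node": {"text": "Sunset in Lisbon #travel"}}
                    ]}}},
                    # A post without a caption: its caption edges list is empty
                    {"node": {"edge_media_to_caption": {"edges": []}}},
                ]
            },
        }
    }
}

# graphql -> hashtag -> edge_hashtag_to_media -> edges: one entry per post
edges = sample["graphql"]["hashtag"]["edge_hashtag_to_media"]["edges"]
print(len(edges))  # → 2

# Each post's caption sits one level deeper, in its own edges list
first_caption_edges = edges[0]["node"]["edge_media_to_caption"]["edges"]
print(first_caption_edges[0]["node"]["text"])  # → Sunset in Lisbon #travel
```

Note that the caption itself is wrapped in a list of edges too, so you have to index [0] before reaching its node.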
Programmatically parsing response using Python
Libraries required: requests
Here’s a quick Python script that collects the captions from the posts; you can modify it for your own use:
```python
import requests


class Parser:
    # Keys used to walk the nested JSON returned by the hashtag endpoint
    HASH_KEY = "graphql"
    HASHTAG_KEY = "hashtag"
    MEDIA_KEY = "edge_hashtag_to_media"
    LIST_KEY = "edges"
    NODE_KEY = "node"
    CAPTION_LIST_KEY = "edge_media_to_caption"
    TEXT_KEY = "text"

    def __init__(self, tag):
        self.tag = tag

    def get_url(self):
        # e.g. https://www.instagram.com/explore/tags/travel/?__a=1
        return "https://www.instagram.com/explore/tags/" + self.tag + "/?__a=1"

    def get_request_response(self):
        r = requests.get(url=self.get_url(), timeout=10)
        r.raise_for_status()  # fail loudly on HTTP errors
        return r.json()

    def get_captions(self):
        captions = []
        data = self.get_request_response()
        # graphql -> hashtag -> edge_hashtag_to_media -> edges: one entry per post
        nodes_list = data[Parser.HASH_KEY][Parser.HASHTAG_KEY][Parser.MEDIA_KEY][Parser.LIST_KEY]
        for obj in nodes_list:
            caption_list = obj[Parser.NODE_KEY][Parser.CAPTION_LIST_KEY][Parser.LIST_KEY]
            # Posts without a caption have an empty edges list
            if len(caption_list) > 0:
                caption = caption_list[0][Parser.NODE_KEY][Parser.TEXT_KEY]
                captions.append(caption)
                print(caption)
        return captions


def main():
    parser = Parser("travel")
    parser.get_captions()


if __name__ == "__main__":
    main()
```
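The Parser class indexes straight into the response, so any change in the JSON shape surfaces as a KeyError. Below is a sketch of a more forgiving extraction step that works on an already-decoded dict. The names dig, extract_captions, MEDIA_PATH and CAPTION_PATH are hypothetical, and returning an empty list when a key is missing is a design choice of this sketch, not documented Instagram behaviour.

```python
# Key path from the response root to the list of posts
MEDIA_PATH = ("graphql", "hashtag", "edge_hashtag_to_media", "edges")
# Key path from one post entry to its list of caption edges
CAPTION_PATH = ("node", "edge_media_to_caption", "edges")


def dig(data, path):
    # Walk nested dicts along path; return None if any key is missing
    # or an intermediate value is not a dict
    for key in path:
        if not isinstance(data, dict) or key not in data:
            return None
        data = data[key]
    return data


def extract_captions(data):
    captions = []
    # Treat a missing posts list as "no posts" instead of raising
    for post in dig(data, MEDIA_PATH) or []:
        caption_edges = dig(post, CAPTION_PATH) or []
        if caption_edges:
            text = dig(caption_edges[0], ("node", "text"))
            if text is not None:
                captions.append(text)
    return captions
```

With the class above, get_captions could return extract_captions(self.get_request_response()) instead of indexing directly, so an unexpected response yields an empty list rather than a traceback.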
If you are new to Python programming, I’d recommend any of the following books:
- Python Crash Course, 2nd Edition: A Hands-On, Project-Based Introduction to Programming
- Python: The Complete Reference
- Let Us Python
- Head First Python: A Brain-Friendly Guide
At Only Code, we’re curating an extensive collection of programming questions, both ones that have been asked in interviews and ones that help people understand a programming concept. Many articles are dedicated specifically to advanced competitive programming concepts, such as:
- Modular Multiplicative Inverse
- Nth Catalan Number
- How to calculate Binomial Coefficient
- Sieve of Eratosthenes
We’ve also covered Data Structures (DS) and Algorithms topics such as Dynamic Programming, Trees, and Graphs.
Besides this, there are language-specific blog posts on C/C++, Java, and Python.
If you want to contribute, you’re most welcome. Otherwise, keep checking back; you might learn something useful.