web-scraping | TNPSC Fuhrer Notes

Web scraping requires two parts, namely the crawler and the scraper.

The crawler (also called a spider) is an automated program that browses the web by following links from page to page in order to find the pages that hold the required data. It does not need to involve artificial intelligence; most crawlers simply keep a queue of URLs to visit.
The scraper, on the other hand, is a tool built to extract specific data from a page once it has been fetched. Its design varies greatly with the complexity and scope of the project, since it has to locate the required data quickly and accurately in that site's particular markup.
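
The split between the two parts can be sketched with only the Python standard library. This is a minimal illustration, not a production crawler: the `PAGES` dictionary, the `example.test` URLs, and the names `PageParser` and `crawl` are all made up for the example, and pages are read from memory instead of the network.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "website" so the sketch runs without network access.
PAGES = {
    "http://example.test/": '<title>Home</title><a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<title>Page A</title><a href="/b">B</a>',
    "http://example.test/b": '<title>Page B</title><a href="/">Home</a>',
}


class PageParser(HTMLParser):
    """Collects link targets (for the crawler) and the title (for the scraper)."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def crawl(start_url, fetch):
    """Crawler part: follow links breadth-first. Scraper part: keep each title."""
    seen = {start_url}
    queue = deque([start_url])
    titles = {}
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # unknown page, nothing to parse
            continue
        parser = PageParser()
        parser.feed(html)
        parser.close()
        titles[url] = parser.title  # the scraped data
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:  # never queue a page twice
                seen.add(absolute)
                queue.append(absolute)
    return titles


titles = crawl("http://example.test/", PAGES.get)
```

For a real site, `fetch` would be replaced by a function that downloads the page, and the crawler should also respect the site's robots.txt and rate limits.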

Python Libraries

[[Scrapy]] is a very popular open-source web crawling framework written in Python. It handles both crawling (scheduling requests and following links) and scraping (extracting data with selectors), and it can also be used to extract data through APIs.

[[Beautiful soup]] is another Python library that is well suited to web scraping. It does not fetch pages itself; given HTML or XML that has already been downloaded, it builds a parse tree from which data can be extracted. Beautiful Soup also provides methods for navigating, searching, and modifying this parse tree.
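
The three kinds of operation on the parse tree can be illustrated with a short sketch. The HTML snippet and variable names are invented for the example, and it assumes the `beautifulsoup4` package is installed; the parser used is Python's built-in `html.parser`.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for a downloaded page.
html = """
<html><body>
  <h1 id="main">Products</h1>
  <ul>
    <li class="item"><a href="/p/1">Pen</a></li>
    <li class="item"><a href="/p/2">Ink</a></li>
  </ul>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Searching: find elements by tag name or CSS selector.
heading = soup.find("h1").get_text()
links = [a["href"] for a in soup.find_all("a")]
names = [li.get_text(strip=True) for li in soup.select("li.item")]

# Navigating: move from an element to its parent in the tree.
first_item = soup.find("li")
parent_tag = first_item.parent.name

# Modifying: replace the text of an existing element.
soup.h1.string = "Catalogue"
```

In practice the `html` string would come from an HTTP client or from a crawler such as Scrapy, with Beautiful Soup handling only the extraction step.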