
Scraping with Node.js and Cheerio


Everybody says that Python is the best choice for scraping, but I have always wondered: why don't we use Node.js for scraping?

The answer is clear: parsing scraped pages is a CPU-intensive task, and since Node.js runs JavaScript on a single thread, that work blocks the main thread. I have one solution to this problem: worker threads. We will scrape the IMDb website for the data.
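To make the worker-thread idea concrete, here is a minimal sketch that uses only Node's built-in `worker_threads` module. The names `parseInWorker`, `workerSource`, and `countTags` are my own placeholders, and `countTags` is just a stand-in for the heavy parsing step; in a real scraper that is roughly where the Cheerio work would go. Treat it as an illustration of the pattern, not as the final script.

```javascript
const { Worker } = require('worker_threads');

// Worker code inlined as a string so the sketch fits in a single file.
// countTags is a hypothetical stand-in for the CPU-heavy parsing step.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');

  function countTags(html) {
    let count = 0;
    for (let i = 0; i < html.length; i++) {
      // Count opening tags only: '<' that is not followed by '/'.
      if (html[i] === '<' && html[i + 1] !== '/') count++;
    }
    return count;
  }

  // Send the result back to the main thread.
  parentPort.postMessage({ tagCount: countTags(workerData.html) });
`;

// Runs the parsing in a separate thread and resolves with its result,
// so the main thread stays free while the worker is busy.
function parseInWorker(html) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, {
      eval: true, // treat the first argument as source code, not a file path
      workerData: { html },
    });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error('Worker stopped with exit code ' + code));
    });
  });
}

// Usage: the main thread only waits for the message.
parseInWorker('<ul><li>Cast</li><li>Episodes</li></ul>')
  .then((result) => console.log(result.tagCount));
```

In a larger project the worker code would normally live in its own file, with its path passed to `new Worker(...)`; the inline string is only there to keep the example short.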

Our goal is to extract all the data from this page. We will scrape all the details of the TV show, all the awards it has won, its cast, episodes, seasons, and much more.

The data available through scraping is far more than what our script extracts, and more than any third-party API would provide.