Downloading more than 20 years of The New York Times

Articles from 1987 to the present are available without a subscription. The copyright notice is friendly to web scraping: “… you may download material from The New York Times on the Web (one machine readable copy and one print copy per page) for your personal, noncommercial use only.” Why waste the opportunity to download these […]



Open Data Spotlight: The Ultimate European Soccer Database | Hugo Mathien

Whether you call it soccer or football, this sport is the world’s favorite to watch and play. Thanks to Hugo Mathien, who compiled, cleaned, and shared a dataset of European professional football stats on Kaggle, it can become a data scientist’s favorite playground, too. Among other data points, the database includes 25,000+ matches from […]



Implementing Web Scraping in Python with Beautiful Soup

Note: This article has also been featured on geeksforgeeks.com. There are mainly two ways to extract data from a website: (1) use the website's API, if it exists (for example, Facebook has the Facebook Graph API, which allows retrieval of data posted on Facebook); (2) access the HTML of the webpage and extract useful information/data […]
