Web Scraping in Python

In this appendix lecture we'll go over how to scrape information from the web using Python. We'll go to a website, decide what information we want, see where and how it is stored, then scrape it and store it in a pandas DataFrame!

Some things you should consider before scraping a website:

1.) Check a site's terms and conditions before you scrape it.
2.) Space out your requests so you don't overload the site's server. Sending too many requests too quickly could get you blocked.
3.) Scrapers break over time. Web pages change their layout often, so you'll more than likely have to rewrite your code.
4.) Web pages are usually inconsistent, so you'll more than likely have to clean up the data after scraping it.
5.) Every web page and situation is different, so you'll have to spend time configuring your scraper.

To learn more about HTML, I suggest these two resources:

W3Schools
Codecademy

In addition to Python, we'll need three modules:

1.) BeautifulSoup, which you can install by typing pip install beautifulsoup4 or conda install beautifulsoup4 (for the Anaconda distribution of Python) in your command prompt.
2.) lxml, which you can install by typing pip install lxml or conda install lxml (for the Anaconda distribution of Python) in your command prompt.
3.) requests, which you can install by typing pip install requests or conda install requests (for the Anaconda distribution of Python) in your command prompt.

We'll start with our imports:
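A minimal sketch of what those imports might look like, along with two small helpers that tie them to the workflow described above. The helper names polite_get and table_to_dataframe, the one-second delay, and the sample HTML table are hypothetical illustrations, not the lecture's own code. pandas is imported as well because the end goal is a DataFrame. The example at the bottom parses an inline HTML string, so it runs without touching any real website.

```python
import time                      # standard library: used to pause between requests

import requests                  # fetches pages over HTTP
import pandas as pd              # holds the scraped results as a DataFrame
from bs4 import BeautifulSoup    # parses HTML; "lxml" is passed as the parser backend


def polite_get(url, delay=1.0):
    """Hypothetical helper: fetch a URL, then pause so requests are spaced out."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()   # raise an error for 4xx/5xx responses
    time.sleep(delay)             # wait before the caller sends another request
    return response.text


def table_to_dataframe(html):
    """Hypothetical helper: turn the first <table> in an HTML string into a DataFrame."""
    soup = BeautifulSoup(html, "lxml")
    table = soup.find("table")
    rows = table.find_all("tr")
    # Treat the first row as the header row
    header = [cell.get_text(strip=True) for cell in rows[0].find_all(["th", "td"])]
    # Every remaining row becomes one record of cell text
    data = [
        [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
        for row in rows[1:]
    ]
    return pd.DataFrame(data, columns=header)


# Offline example using a made-up HTML snippet (no network request needed)
sample_html = """
<table>
  <tr><th>Name</th><th>Price</th></tr>
  <tr><td>Widget</td><td>4.99</td></tr>
  <tr><td>Gadget</td><td>12.50</td></tr>
</table>
"""
df = table_to_dataframe(sample_html)
print(df)
```

Note that every cell comes back as a string (for example "4.99"), which is one reason scraped data usually needs cleaning before analysis. On a live site you would pass the text returned by polite_get to table_to_dataframe instead of the sample string.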