In PyCharm, I used the Scrapy framework to build a distributed Python crawler. The design incorporates a proxy IP pool and server camouflage (rotating User-Agent headers), and uses Redis to improve the efficiency of data acquisition. I tested the crawler by scraping commodity information from Taobao, then wrote the scraped data asynchronously into a MySQL database using the Twisted framework. In IDLE, I used the pandas module to read the stored data from the database, together with numpy and matplotlib for data analysis and visualization. Finally, machine-learning modules such as sklearn, Gensim, and Keras were used for data classification, data clustering, and correlation analysis.
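The proxy pool and server camouflage can be sketched as a Scrapy-style downloader middleware. This is a minimal illustration, not the project's actual middleware: the proxy addresses and User-Agent strings below are hypothetical placeholders, and in the real design the proxies would be drawn from the Redis-backed pool rather than a hard-coded list.

```python
import random

# Hypothetical proxy addresses; the project draws these from a Redis-backed pool.
PROXY_POOL = [
    "http://10.0.0.1:8080",
    "http://10.0.0.2:8080",
]

# Rotating User-Agent strings implements the "server camouflage" described above.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
]


class RandomProxyMiddleware:
    """Scrapy-style downloader middleware: attach a random proxy and User-Agent."""

    def process_request(self, request, spider):
        request.meta["proxy"] = random.choice(PROXY_POOL)
        request.headers["User-Agent"] = random.choice(USER_AGENTS)
        return None  # returning None lets Scrapy continue handling the request
```

Enabled via `DOWNLOADER_MIDDLEWARES` in `settings.py`, every outgoing request then carries a randomized proxy and User-Agent, which makes the crawler harder for the target site to fingerprint.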
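The asynchronous write stage follows the pattern below. The project itself uses Twisted's `adbapi.ConnectionPool` to dispatch MySQL inserts off the reactor thread; as a runnable stand-in, this sketch uses a thread pool and an in-memory sqlite3 database, with the table name and columns invented for illustration.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the MySQL database written to via Twisted's adbapi in the project.
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE commodity (title TEXT, price REAL)")

# One worker thread keeps access to the single sqlite3 connection serialized.
executor = ThreadPoolExecutor(max_workers=1)


def insert_item(item):
    # Runs off the main thread, like a query dispatched via adbapi.runInteraction.
    with conn:
        conn.execute(
            "INSERT INTO commodity (title, price) VALUES (?, ?)",
            (item["title"], item["price"]),
        )


def process_item(item):
    # The pipeline hands the item to the worker and returns immediately,
    # so the crawl is never blocked waiting on the database.
    return executor.submit(insert_item, item)


future = process_item({"title": "example commodity", "price": 19.9})
future.result()  # wait here only so the demo can confirm the row landed
rows = conn.execute("SELECT title, price FROM commodity").fetchall()
```

The key property, in both this sketch and the Twisted version, is that `process_item` returns without waiting for the insert, so scraping throughput is decoupled from database latency.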
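The analysis stage can be sketched with `pandas.read_sql`. Here an in-memory sqlite3 table with invented sample rows stands in for the project's MySQL store; against MySQL the same call would take a pymysql/SQLAlchemy connection instead.

```python
import sqlite3

import pandas as pd

# Stand-in for the MySQL table of scraped Taobao commodity records.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE commodity (title TEXT, price REAL, sales INTEGER)")
conn.executemany(
    "INSERT INTO commodity VALUES (?, ?, ?)",
    [("a", 10.0, 500), ("b", 20.0, 300), ("c", 30.0, 100)],  # illustrative data
)
conn.commit()

# Read the stored data back into a DataFrame, as described above.
df = pd.read_sql("SELECT * FROM commodity", conn)

# A simple correlation analysis between price and sales volume.
corr = df["price"].corr(df["sales"])
```

From the same DataFrame, matplotlib can plot the distributions, and the sklearn/Gensim/Keras steps (classification, clustering) would take `df`'s numeric columns as their input features.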