Search Results

Found 15 repositories (showing 15)

Mongo_Scraper

CaptainEFFF

❤️20

# All the News That's Fit to Scrape

### Overview

In this assignment, you'll create a web app that lets users view and leave comments on the latest news. But you're not going to actually write any articles; instead, you'll flex your Mongoose and Cheerio muscles to scrape news from another site.

### Before You Begin

1. Create a GitHub repo for this assignment and clone it to your computer. Any name will do -- just make sure it's related to this project in some fashion.
2. Run `npm init`. When that's finished, install and save these npm packages:
   1. express
   2. express-handlebars
   3. mongoose
   4. cheerio
   5. axios
3. **NOTE**: If you want to earn complete credit for your work, you must use all five of these packages in your assignment.
4. In order to deploy your project to Heroku, you must set up an mLab provision. mLab is a remote MongoDB database that Heroku supports natively. Follow these steps to get it running:
5. Create a Heroku app in your project directory.
6. Run this command in your Terminal/Bash window:
   * `heroku addons:create mongolab`
   * This command will add the free mLab provision to your project.
7. When you go to connect your mongo database to mongoose, do so the following way:

   ```js
   // Load mongoose before connecting
   var mongoose = require("mongoose");

   // If deployed, use the deployed database. Otherwise use the local mongoHeadlines database
   var MONGODB_URI = process.env.MONGODB_URI || "mongodb://localhost/mongoHeadlines";
   mongoose.connect(MONGODB_URI);
   ```

   * This code should connect mongoose to your remote mongolab database if deployed, but otherwise will connect to the local mongoHeadlines database on your computer.
8. [Watch this demo of a possible submission](https://youtu.be/4ltZr3VPmno). See the deployed demo application [here](http://nyt-mongo-scraper.herokuapp.com/).
9. Your site doesn't need to match the demo's style, but feel free to attempt something similar if you'd like. Otherwise, just be creative!

### Commits

Having an active and healthy commit history on GitHub is important for your future job search. It is also extremely important for making sure your work is saved in your repository. If something breaks, committing often ensures you are able to go back to a working version of your code.

* Committing often is a signal to employers that you are actively working on your code and learning.
* We use the mantra “commit early and often.” This means that when you write code that works, add it and commit it!
* Numerous commits allow you to see how your app is progressing and give you a point to revert to if anything goes wrong.
* Be clear and descriptive in your commit messaging.
* When writing a commit message, avoid vague messages like "fixed." Be descriptive so that you and anyone else looking at your repository knows what happened with each commit.
* We would like you to have well over 200 commits by graduation, so commit early and often!

### Submission on BCS

* **This assignment must be deployed.**
* Please submit both the deployed Heroku link to your homework AND the link to the GitHub repository!

## Instructions

* Create an app that accomplishes the following:
  1. Whenever a user visits your site, the app should scrape stories from a news outlet of your choice and display them for the user. Each scraped article should be saved to your application database. At a minimum, the app should scrape and display the following information for each article:
     * Headline - the title of the article
     * Summary - a short summary of the article
     * URL - the url to the original article
     * Feel free to add more content to your database (photos, bylines, and so on).
  2. Users should also be able to leave comments on the articles displayed and revisit them later. The comments should be saved to the database as well and associated with their articles. Users should also be able to delete comments left on articles. All stored comments should be visible to every user.
* Beyond these requirements, be creative and have fun with this!

### Tips

* Go back to Saturday's activities if you need a refresher on how to partner one model with another.
* Whenever you scrape a site for stories, make sure an article isn't already represented in your database before saving it; do not save any duplicate entries.
* Don't just clear out your database and populate it with scraped articles whenever a user accesses your site.
* If your app deletes stories every time someone visits, your users won't be able to see any comments except the ones that they post.

### Helpful Links

* [MongoDB Documentation](https://docs.mongodb.com/manual/)
* [Mongoose Documentation](http://mongoosejs.com/docs/api.html)
* [Cheerio Documentation](https://github.com/cheeriojs/cheerio)

### Reminder: Submission on BCS

* Please submit both the deployed Heroku link to your homework AND the link to the GitHub repository!

---

### Minimum Requirements

* **This assignment must be deployed.**

Attempt to complete the homework assignment as described in the instructions. If unable to complete certain portions, please pseudocode these portions to describe what remains to be completed. Hosting on Heroku and adding a README.md are required for this homework. In addition, add this homework to your portfolio; more information can be found below.

---

### Hosting on Heroku

Now that we have a backend to our applications, we use Heroku for hosting. Please note that while **Heroku is free**, it will request credit card information if you have more than 5 applications at a time or are adding a database. Please see [Heroku’s Account Verification Information](https://devcenter.heroku.com/articles/account-verification) for more details.

---

### Create a README.md

Add a `README.md` to your repository describing the project. Here are some resources to help you along the way:

* [About READMEs](https://help.github.com/articles/about-readmes/)
* [Mastering Markdown](https://guides.github.com/features/mastering-markdown/)

---

### Add To Your Portfolio

After completing the homework, please add the piece to your portfolio. Make sure to add a link to your updated portfolio in the comments section of your homework so the TAs can easily ensure you completed this step when they are grading the assignment. To receive an 'A' on any assignment, you must link to it from your portfolio.

---

### One Last Thing

If you have any questions about this project or the material we have covered, please post them in the community channels in Slack so that your fellow developers can help you! If you're still having trouble, you can come to office hours for assistance from your instructor and TAs.

That goes threefold for this unit: MongoDB and Mongoose compose a challenging data management system. If there's anything you find confusing about these technologies, don't hesitate to speak with someone from the Boot Camp team.

**Good Luck!**
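
The tip about not saving duplicate entries could be approached in several ways. One minimal sketch, assuming each article's URL is treated as its unique key, is to turn scraped articles into upsert operations that only write when no matching document exists. The function name `buildUpsertOps` and the sample data are hypothetical; the field names follow the Headline, Summary, and URL fields listed in the instructions.

```javascript
// Sketch: turn scraped articles into idempotent upsert operations.
// Assumption: `url` uniquely identifies an article. `buildUpsertOps` is a
// hypothetical helper name, not part of Mongoose or the assignment.
function buildUpsertOps(scraped) {
  const seen = new Set();
  const ops = [];
  for (const article of scraped) {
    // Skip rows without a URL and repeats within the same scrape.
    if (!article.url || seen.has(article.url)) continue;
    seen.add(article.url);
    ops.push({
      updateOne: {
        filter: { url: article.url },
        // $setOnInsert only writes when the upsert creates a new document,
        // so articles already stored (and their comments) stay untouched.
        update: {
          $setOnInsert: {
            headline: article.headline,
            summary: article.summary,
            url: article.url,
          },
        },
        upsert: true,
      },
    });
  }
  return ops;
}

// Made-up sample data: the second entry repeats the first URL.
const ops = buildUpsertOps([
  { headline: "A", summary: "first", url: "https://example.com/a" },
  { headline: "A (repeat)", summary: "dup", url: "https://example.com/a" },
  { headline: "B", summary: "second", url: "https://example.com/b" },
]);
console.log(ops.length); // 2
```

With a Mongoose model (hypothetically named `Article`), the resulting array could be passed to `Article.bulkWrite(ops)`. Adding a unique index on `url` in the schema would be a reasonable extra safeguard.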

JavaScript

Updated 6 months ago

scrape-hw

jdrenteria

❤️20

JavaScript

Updated 7 months ago

my_ruby_scraper

ggerdsen

❤️20

A web scraper that searches Indeed.com for entry-level remote jobs based on job title or keywords input by the user. The scraped data is parsed and exported to a CSV file. Current fields exported include Title, Company, Location, Job Summary, Easy Apply (boolean), and URL to the job posting.

Ruby

Updated 1 year ago

csv-export nokogiri ruby +2

Proxy-Powered-Job-Scraper

RohanDas28

❤️45

This repository contains a lightweight, production-ready Python web scraper designed to collect publicly available remote developer job listings from RemoteOK. The primary focus of this project is to demonstrate the proper, sustainable integration of mobile or residential proxies within a real-world data collection workflow.

Python

Updated 1 month ago

jobsearchproxy

remote-jobs-data-scraper

Lautarocuello98

🧡60

Python ETL pipeline that scrapes remote jobs from the RemoteOK API, cleans the dataset, and exports analytics-ready files (CSV, Excel, JSON).

MIT

Python

Updated 3 weeks ago

api-scraping data-cleaning data-engineering +5

Remote-Jobs-Scraper-Data-Pipeline-mini-project-2

adeel-ai-builder

❤️45

No description available

Python

Updated 3 weeks ago

dba-job-hunter

xiangyuzeng

🧡55

Remote DBA/Data Engineer/SRE job hunter dashboard with auto-refresh, Vercel deployment, and GitHub Actions daily scraper

Python

Updated 3 weeks ago

web-scraper-toolkit

timdunn22

❤️45

HN Who Is Hiring scraper. Extracts structured job data (company, role, location, remote, tech stack) to CSV/JSON. Python + BeautifulSoup.

Python

Updated 1 month ago

web-Scraper

Hermela440

❤️35

A Python web scraper that collects remote job listings and saves them to a Django app. Users can search, filter, paginate, and export jobs via a web interface, with real-time scraping and scheduled updates for always-fresh job data.

Python

Updated 9 months ago

CodeAlpha_WebScraping

MohdShoaib98

❤️35

Python web scraper to collect remote jobs from RemoteOK in Development, Design, Marketing, Writing, and Sales. Extracts Job Title, Company, Location, Salary, and Link, saving data in a formatted Excel file with headers, borders, alternating row colours, and filters.

Python

Updated 7 months ago

Api_scrapper

Pranjal-S101

❤️35

This repository contains a RemoteOK Job Scraper script written in Python. It fetches remote job postings from the RemoteOK API and exports the data into an Excel file for easy reference. This script is useful for job seekers looking for organized and accessible job listings.

Python

Updated 1 year ago

job_list_scraper

Rafid-Rahman

❤️40

I built a web scraper that extracts remote job listings from "We Work Remotely". This Python-based script uses BeautifulSoup to collect key job details, including: Job Titles & Companies, Job Posting Date & Application Deadline and Job Type & Location. The extracted data is structured and saved in an Excel file for easy access.

MIT

Python

Updated 10 months ago

beautifulsoup4 data-extraction excel-export +5

Linked-in-scraper

BH-coding1

❤️35

🚀 LinkedIn Job Scraper Bot This Python bot automates LinkedIn to: ✅ Log in manually ✅ Extract your profile skills ✅ Search for jobs based on those skills ✅ Collect job data (title, company, location, remote/local, and link) ✅ Save results to a CSV file and optionally to a Google Sheet

Python

Updated 9 months ago

dynamic-job-scraper

minhosong88

❤️40

This tool is designed to streamline your job search by automatically extracting and compiling job listings from various online sources. Whether you are looking for remote opportunities or local positions, this scraper will save you time and effort by consolidating job data into easy-to-use CSV files.

MIT

Python

Updated 1 year ago

beautifulsoup4 playwright playwright-python +2

-ai-job-scout

nemestron

🧡65

Autonomous AI-powered job discovery system that hunts AI/ML/Data Science roles across GCCs, startups, and remote boards. 4 specialized web scrapers + deterministic scoring post 10-12 verified job links to Telegram every hour via GitHub Actions. Cloud-native: no laptop needed. Zero AI API costs.

Updated 9 hours ago
