Found 73 repositories (showing 30)
Best free, open-source datasets for data science and machine learning projects. Top government data including census, economic, financial, agricultural, image datasets, labeled and unlabeled, autonomous car datasets, and much more.
Data.gov
NOAA - https://www.ncdc.noaa.gov/cdo-web/ atmospheric, ocean
Bureau of Labor Statistics - https://www.bls.gov/data/ employment, inflation
US Census Data - https://www.census.gov/data.html demographics, income, geo, time series
Bureau of Economic Analysis - http://www.bea.gov/data/gdp/gross-dom... GDP, corporate profits, savings rates
Federal Reserve - https://fred.stlouisfed.org/ currency, interest rates, payroll
Quandl - https://www.quandl.com/ financial and economic
Data.gov.uk
UK Dataservice - https://www.ukdataservice.ac.uk Census data and much more
WorldBank - https://datacatalog.worldbank.org census, demographics, geographic, health, income, GDP
IMF - https://www.imf.org/en/Data economic, currency, finance, commodities, time series
OpenData.go.ke Kenya govt data on agriculture, education, water, health, finance, …
https://data.world/
Open Data for Africa - http://dataportal.opendataforafrica.org/ agriculture, energy, environment, industry, …
Kaggle - https://www.kaggle.com/datasets A huge variety of different datasets
Amazon Reviews - https://snap.stanford.edu/data/web-Am... 35M product reviews from 6.6M users
GroupLens - https://grouplens.org/datasets/moviel... 20M movie ratings
Yelp Reviews - https://www.yelp.com/dataset 6.7M reviews, pictures, businesses
IMDB Reviews - http://ai.stanford.edu/~amaas/data/se... 25k movie reviews
Twitter Sentiment 140 - http://help.sentiment140.com/for-stud... 160k Tweets
Airbnb - http://insideairbnb.com/get-the-data.... A TON of data by geo
UCI ML Datasets - http://mlr.cs.umass.edu/ml/ iris, wine, abalone, heart disease, poker hands, …
Enron Email dataset - http://www.cs.cmu.edu/~enron/ 500k emails from 150 people, from the 2001 energy scandal. See the movie: The Smartest Guys in the Room.
Spambase - https://archive.ics.uci.edu/ml/datase... Emails
Jeopardy Questions - https://www.reddit.com/r/datasets/com... 200k questions and answers in JSON
Gutenberg Ebooks - http://www.gutenberg.org/wiki/Gutenbe... Large collection of books
Reddit Data Science Project Ideas
jmportilla
Repo for Capstone Project
adam-mcdaniel
A data science project to analyze Reddit content
behavioral-data
Information and data for the reddit Community Values Surveys, a project of the Behavioral Data Science Lab at the University of Washington's Allen School of Computer Science and Engineering.
csush
Data Science Project: To compare toxicity of comments between Facebook and Reddit.
UMassCDS
The Center for Data Science repository for the International Hate Observatory Project and analyzing Reddit. This produces the models used in RedditMap.social.
gilaniasher
Sentiment-driven stock market analysis using Yahoo Finance, Twitter, Reddit, and NLP. CMSC320 Data Science Final Project.
LawlessJ
A data science project completed at General Assembly, using web scraping to collect data and NLP to analyze Reddit comments and posts on two opposed political subreddits.
noaimabari
Web app built with Python's micro web framework Flask that predicts the flair of a Reddit post given its URL. The project uses data science concepts and NLP techniques to extract, clean, and process text data and to perform exploratory data analysis. Machine learning algorithms are used for classification.
A research paper on this project is under review at the Journal of Universal Computer Science. The project analyzes Amazon stock data using Python. Feature extraction is performed, and ARIMA and Fourier series models are built. An LSTM with multiple features is used to predict stock prices, and sentiment analysis is then performed using news and Reddit sentiment. GANs are also used to predict stock data, with Amazon data taken from an API as the generator and CNNs as the discriminator.
MatteoLarrode
Reddit text analysis collaborative project for Fundamentals of Social Data Science at Oxford Uni (MSc Social Data Science)
sierraflanagan4
This code extracts data from Reddit comments using the RedditExtractoR package and applies sentiment analysis to categorize comments as news or conspiracy.
KiranGershenfeld
A collection of data science projects based around reddit and twitch.
angelowilliams
A data science project to scrape Reddit user data and predict users' political preferences based on the political compass.
A binary classification project in NLP completed during the General Assembly Data Science Immersive utilizing NLTK and a reddit API
tercasaskova311
This project analyzes how public discussion on Reddit responds to real-world conflict events in Gaza and the West Bank. We combine data from Reddit (online discourse) and ACLED (verified conflict events) to examine temporal relationships using Python-based data science methods.
Created a model based on posts from numerous programming-focused subreddits on reddit.com. Project completed as a student in the Data Science Immersive program at General Assembly.
hxwwong
This project uses beautifulsoup4 to scrape data from the web and store it in dataframes. The websites in question include Reddit and a book website. It was submitted in partial fulfillment of the course Fundamentals of Data Science (DATA100) under the Data Science Minor program at De La Salle University.
Pratap-Samar
A full-stack data science project that scrapes Reddit for fan opinions on Spider-Man comics, performs sentiment analysis with NLTK, and visualizes the results in an interactive Streamlit dashboard.
gabegagster
A data science project analyzing 2024 US Election discourse on Reddit using R. It employs hypothesis testing, K-means clustering, and network analysis to uncover user engagement patterns, topic prevalence, and community structures.
alimanbg
Detecting and Visualizing Echo Chambers in Reddit Vaccine Myths Discussions. Code, data, reports, and analysis for an academic project examining echo chamber formation and misinformation in r/VaccineMyths using Python data science tools. Includes sentiment analysis, topic modeling, and network-based community detection.
A full-stack data science application that performs real-time sentiment analysis on stock-related discussions from Reddit. The project uses Natural Language Processing (NLP) and Machine Learning to analyze posts from popular financial subreddits (like r/wallstreetbets, r/stocks) and provides sentiment insights through an interactive dashboard.
rohandhupar1996
We will use the pipeline we have been building and apply it to a real-world data pipeline project. From a JSON API, we will filter, clean, aggregate, and summarize data in a sequence of tasks that apply these transformations for us. The data comes from a Hacker News (HN) API that returns JSON data of the top stories in 2014. If you're unfamiliar with Hacker News, it's a link aggregator website where users vote up stories that are interesting to the community. It is similar to Reddit, but the community revolves only around computer science and entrepreneurship posts.
shenoy10
No description available
qrampah
A project analyzing top posts and comments from the DataScience subreddit in 2024
vsurendr
No description available
No description available
A data science project that helps detect and handle clone accounts on Reddit, to optimize the platform and avoid issues that have negative impacts.
AslihanYoldas
No description available