Found 1,712 repositories (showing 30)
shafiab
My Insight Data Engineering Fellowship project. I implemented a big data processing pipeline based on the lambda architecture that aggregates Twitter and US stock market data for user sentiment analysis using open source tools: Apache Kafka for data ingestion, Apache Spark & Spark Streaming for batch & real-time processing, Apache Cassandra for storage, and Flask, Bootstrap and HighCharts for the frontend.
A real-time interactive web app based on data pipelines using streaming Twitter data, automated sentiment analysis, and MySQL and PostgreSQL databases (deployed on Heroku)
ujjwalkarn
tutorial for sentiment analysis on Twitter data using Python
llSourcell
Twitter Sentiment Analysis Challenge for Learn Python for Data Science #2 by @Sirajology on Youtube
Aryia-Behroziuan
An ANN is a model based on a collection of connected units or nodes called "artificial neurons", which loosely model the neurons in a biological brain. Each connection, like the synapses in a biological brain, can transmit information, a "signal", from one artificial neuron to another. An artificial neuron that receives a signal can process it and then signal additional artificial neurons connected to it. In common ANN implementations, the signal at a connection between artificial neurons is a real number, and the output of each artificial neuron is computed by some non-linear function of the sum of its inputs. The connections between artificial neurons are called "edges". Artificial neurons and edges typically have a weight that adjusts as learning proceeds. The weight increases or decreases the strength of the signal at a connection. Artificial neurons may have a threshold such that the signal is only sent if the aggregate signal crosses that threshold. Typically, artificial neurons are aggregated into layers. Different layers may perform different kinds of transformations on their inputs. Signals travel from the first layer (the input layer) to the last layer (the output layer), possibly after traversing the layers multiple times. The original goal of the ANN approach was to solve problems in the same way that a human brain would. However, over time, attention moved to performing specific tasks, leading to deviations from biology. Artificial neural networks have been used on a variety of tasks, including computer vision, speech recognition, machine translation, social network filtering, playing board and video games and medical diagnosis. Deep learning consists of multiple hidden layers in an artificial neural network. This approach tries to model the way the human brain processes light and sound into vision and hearing. 
Some successful applications of deep learning are computer vision and speech recognition.[68]

Decision trees
Decision tree learning uses a decision tree as a predictive model to go from observations about an item (represented in the branches) to conclusions about the item's target value (represented in the leaves). It is one of the predictive modeling approaches used in statistics, data mining, and machine learning. Tree models where the target variable can take a discrete set of values are called classification trees; in these tree structures, leaves represent class labels and branches represent conjunctions of features that lead to those class labels. Decision trees where the target variable can take continuous values (typically real numbers) are called regression trees. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data, but the resulting classification tree can be an input for decision making.

Support vector machines
Support vector machines (SVMs), also known as support vector networks, are a set of related supervised learning methods used for classification and regression. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that predicts whether a new example falls into one category or the other.[69] An SVM training algorithm is a non-probabilistic, binary, linear classifier, although methods such as Platt scaling exist to use SVM in a probabilistic classification setting. In addition to performing linear classification, SVMs can efficiently perform a non-linear classification using what is called the kernel trick, implicitly mapping their inputs into high-dimensional feature spaces.

Figure: Illustration of linear regression on a data set.
Regression analysis
Regression analysis encompasses a large variety of statistical methods to estimate the relationship between input variables and their associated features. Its most common form is linear regression, where a single line is drawn to best fit the given data according to a mathematical criterion such as ordinary least squares. The latter is often extended by regularization methods to mitigate overfitting and bias, as in ridge regression. When dealing with non-linear problems, go-to models include polynomial regression (for example, used for trendline fitting in Microsoft Excel[70]), logistic regression (often used in statistical classification) or even kernel regression, which introduces non-linearity by taking advantage of the kernel trick to implicitly map input variables to higher-dimensional space.

Bayesian networks
Figure: A simple Bayesian network. Rain influences whether the sprinkler is activated, and both rain and the sprinkler influence whether the grass is wet.
A Bayesian network, belief network, or directed acyclic graphical model is a probabilistic graphical model that represents a set of random variables and their conditional independence with a directed acyclic graph (DAG). For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases. Efficient algorithms exist that perform inference and learning. Bayesian networks that model sequences of variables, like speech signals or protein sequences, are called dynamic Bayesian networks. Generalizations of Bayesian networks that can represent and solve decision problems under uncertainty are called influence diagrams.
Genetic algorithms
A genetic algorithm (GA) is a search algorithm and heuristic technique that mimics the process of natural selection, using methods such as mutation and crossover to generate new genotypes in the hope of finding good solutions to a given problem. In machine learning, genetic algorithms were used in the 1980s and 1990s.[71][72] Conversely, machine learning techniques have been used to improve the performance of genetic and evolutionary algorithms.[73]

Training models
Usually, machine learning models require a lot of data in order for them to perform well. Usually, when training a machine learning model, one needs to collect a large, representative sample of data from a training set. Data from the training set can be as varied as a corpus of text, a collection of images, and data collected from individual users of a service. Overfitting is something to watch out for when training a machine learning model.

Federated learning
Federated learning is an adapted form of distributed artificial intelligence to training machine learning models that decentralizes the training process, allowing for users' privacy to be maintained by not needing to send their data to a centralized server. This also increases efficiency by decentralizing the training process to many devices.
For example, Gboard uses federated machine learning to train search query prediction models on users' mobile phones without having to send individual searches back to Google.[74]

Applications
There are many applications for machine learning, including: Agriculture, Anatomy, Adaptive websites, Affective computing, Banking, Bioinformatics, Brain–machine interfaces, Cheminformatics, Citizen science, Computer networks, Computer vision, Credit-card fraud detection, Data quality, DNA sequence classification, Economics, Financial market analysis[75], General game playing, Handwriting recognition, Information retrieval, Insurance, Internet fraud detection, Linguistics, Machine learning control, Machine perception, Machine translation, Marketing, Medical diagnosis, Natural language processing, Natural language understanding, Online advertising, Optimization, Recommender systems, Robot locomotion, Search engines, Sentiment analysis, Sequence mining, Software engineering, Speech recognition, Structural health monitoring, Syntactic pattern recognition, Telecommunication, Theorem proving, Time series forecasting, User behavior analytics.

In 2006, the media-services provider Netflix held the first "Netflix Prize" competition to find a program to better predict user preferences and improve the accuracy of its existing Cinematch movie recommendation algorithm by at least 10%.
A joint team made up of researchers from AT&T Labs-Research in collaboration with the teams Big Chaos and Pragmatic Theory built an ensemble model to win the Grand Prize in 2009 for $1 million.[76] Shortly after the prize was awarded, Netflix realized that viewers' ratings were not the best indicators of their viewing patterns ("everything is a recommendation") and they changed their recommendation engine accordingly.[77] In 2010 The Wall Street Journal wrote about the firm Rebellion Research and their use of machine learning to predict the financial crisis.[78] In 2012, co-founder of Sun Microsystems, Vinod Khosla, predicted that 80% of medical doctors' jobs would be lost in the next two decades to automated machine learning medical diagnostic software.[79] In 2014, it was reported that a machine learning algorithm had been applied in the field of art history to study fine art paintings and that it may have revealed previously unrecognized influences among artists.[80] In 2019 Springer Nature published the first research book created using machine learning.[81]

Limitations
Although machine learning has been transformative in some fields, machine-learning programs often fail to deliver expected results.[82][83][84] Reasons for this are numerous: lack of (suitable) data, lack of access to the data, data bias, privacy problems, badly chosen tasks and algorithms, wrong tools and people, lack of resources, and evaluation problems.[85] In 2018, a self-driving car from Uber failed to detect a pedestrian, who was killed after a collision.[86] Attempts to use machine learning in healthcare with the IBM Watson system failed to deliver even after years of time and billions of dollars invested.[87][88]

Bias
Machine learning approaches in particular can suffer from different data biases.
A machine learning system trained on current customers only may not be able to predict the needs of new customer groups that are not represented in the training data. When trained on man-made data, machine learning is likely to pick up the same constitutional and unconscious biases already present in society.[89] Language models learned from data have been shown to contain human-like biases.[90][91] Machine learning systems used for criminal risk assessment have been found to be biased against black people.[92][93] In 2015, Google photos would often tag black people as gorillas,[94] and in 2018 this still was not well resolved, but Google reportedly was still using the workaround to remove all gorillas from the training data, and thus was not able to recognize real gorillas at all.[95] Similar issues with recognizing non-white people have been found in many other systems.[96] In 2016, Microsoft tested a chatbot that learned from Twitter, and it quickly picked up racist and sexist language.[97] Because of such challenges, the effective use of machine learning may take longer to be adopted in other domains.[98] Concern for fairness in machine learning, that is, reducing bias in machine learning and propelling its use for human good is increasingly expressed by artificial intelligence scientists, including Fei-Fei Li, who reminds engineers that “There’s nothing artificial about AI...It’s inspired by people, it’s created by people, and—most importantly—it impacts people. It is a powerful tool we are only just beginning to understand, and that is a profound responsibility.”[99]

Model assessments
Classification of machine learning models can be validated by accuracy estimation techniques like the holdout method, which splits the data in a training and test set (conventionally 2/3 training set and 1/3 test set designation) and evaluates the performance of the training model on the test set.
In comparison, the K-fold-cross-validation method randomly partitions the data into K subsets and then K experiments are performed each respectively considering 1 subset for evaluation and the remaining K-1 subsets for training the model. In addition to the holdout and cross-validation methods, bootstrap, which samples n instances with replacement from the dataset, can be used to assess model accuracy.[100] In addition to overall accuracy, investigators frequently report sensitivity and specificity meaning True Positive Rate (TPR) and True Negative Rate (TNR) respectively. Similarly, investigators sometimes report the false positive rate (FPR) as well as the false negative rate (FNR). However, these rates are ratios that fail to reveal their numerators and denominators. The total operating characteristic (TOC) is an effective method to express a model's diagnostic ability. TOC shows the numerators and denominators of the previously mentioned rates, thus TOC provides more information than the commonly used receiver operating characteristic (ROC) and ROC's associated area under the curve (AUC).[101]

Ethics
Machine learning poses a host of ethical questions. Systems which are trained on datasets collected with biases may exhibit these biases upon use (algorithmic bias), thus digitizing cultural prejudices.[102] For example, using job hiring data from a firm with racist hiring policies may lead to a machine learning system duplicating the bias by scoring job applicants against similarity to previous successful applicants.[103][104] Responsible collection of data and documentation of algorithmic rules used by a system thus is a critical part of machine learning. Because human languages contain biases, machines trained on language corpora will necessarily also learn these biases.[105][106] Other forms of ethical challenges, not related to personal biases, are more seen in health care.
There are concerns among health care professionals that these systems might not be designed in the public's interest but as income-generating machines. This is especially true in the United States where there is a long-standing ethical dilemma of improving health care, but also increasing profits. For example, the algorithms could be designed to provide patients with unnecessary tests or medication in which the algorithm's proprietary owners hold stakes. There is huge potential for machine learning in health care to provide professionals a great tool to diagnose, medicate, and even plan recovery paths for patients, but this will not happen until the personal biases mentioned previously, and these "greed" biases are addressed.[107]

Hardware
Since the 2010s, advances in both machine learning algorithms and computer hardware have led to more efficient methods for training deep neural networks (a particular narrow subdomain of machine learning) that contain many layers of non-linear hidden units.[108] By 2019, graphic processing units (GPUs), often with AI-specific enhancements, had displaced CPUs as the dominant method of training large-scale commercial cloud AI.[109] OpenAI estimated the hardware compute used in the largest deep learning projects from AlexNet (2012) to AlphaZero (2017), and found a 300,000-fold increase in the amount of compute required, with a doubling-time trendline of 3.4 months.[110][111]

Software
Software suites containing a variety of machine learning algorithms include the following: Free and open-source so
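The K-fold cross-validation procedure mentioned in the model-assessment passage of this description can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function name `k_fold_splits` and the `seed` parameter are hypothetical, and it works on sample indices rather than on a real dataset or model.

```python
import random

def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    indices = list(range(n_samples))
    # Random partition of the data; the seed only makes the sketch reproducible.
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k roughly equal subsets.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]  # one subset held out for evaluation
        # The remaining k-1 subsets are used for training.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

splits = list(k_fold_splits(n_samples=10, k=5))
```

Each of the K experiments would train a model on `train` and score it on `test`; averaging the K scores gives the cross-validated estimate.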
anubhavanand12qw
The coding has been done in Python 3.6.5 using Jupyter Notebook. This program fetches live data from Twitter using Tweepy. Then we clean the tweets (for example, by removing special characters). After that we perform sentiment analysis on the Twitter data and plot it for better visualization. Then we fetch the stock price from Yahoo Finance and add it to the dataset to perform prediction. We apply several machine learning algorithms (random forest, MLPClassifier, logistic regression) and train them on our dataset. Then we perform prediction on unseen data, plot it against the real data, and check the accuracy.
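The cleaning step described above (removing special characters from tweets) might look roughly like the following. It is a minimal sketch, not the repository's code: the function name `clean_tweet` and the exact rules (drop links, keep only letters, digits and spaces) are assumptions.

```python
import re

def clean_tweet(text):
    """Strip links and special characters, then collapse whitespace."""
    text = re.sub(r"https?://\S+", " ", text)    # drop URLs
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)  # drop special characters
    return re.sub(r"\s+", " ", text).strip()     # normalise spacing

cleaned = clean_tweet("Loving $AAPL today!!! http://t.co/xyz #stocks")
```

A real pipeline would likely treat mentions, hashtags and emoji more carefully, since they can carry sentiment.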
roshancyriacmathew
This project walks you through how to create a Twitter sentiment analysis model using Python. Twitter sentiment analysis is performed to identify people's sentiments towards various topics. For this project, we will be analysing the sentiment of people towards Pfizer vaccines. We will be using the data available on Kaggle to create this machine learning model. The collected tweets will be analysed using machine learning to identify the different sentiments present in them. The sentiments identified in this project are positive, negative and neutral. We will also be using different classifiers to see which one gives the best model accuracy.
kenchang408
Currently taking a course on Coursera.org called Introduction to Data Science. The very first programming assignment is called Twitter Sentiment Analysis in Python. I would like to keep track of my progress of the source code for this project and if people want to contribute after the class is over that is good. This is also used as a tutorial for people to learn how to use Twitter API in python to do Twitter sentiment analysis
Best free, open-source datasets for data science and machine learning projects. Top government data including census, economic, financial, agricultural, image datasets, labeled and unlabeled, autonomous car datasets, and much more.
Data.gov
NOAA - https://www.ncdc.noaa.gov/cdo-web/ atmospheric, ocean
Bureau of Labor Statistics - https://www.bls.gov/data/ employment, inflation
US Census Data - https://www.census.gov/data.html demographics, income, geo, time series
Bureau of Economic Analysis - http://www.bea.gov/data/gdp/gross-dom... GDP, corporate profits, savings rates
Federal Reserve - https://fred.stlouisfed.org/ currency, interest rates, payroll
Quandl - https://www.quandl.com/ financial and economic
Data.gov.uk
UK Dataservice - https://www.ukdataservice.ac.uk Census data and much more
WorldBank - https://datacatalog.worldbank.org census, demographics, geographic, health, income, GDP
IMF - https://www.imf.org/en/Data economic, currency, finance, commodities, time series
OpenData.go.ke - Kenya govt data on agriculture, education, water, health, finance, …
https://data.world/
Open Data for Africa - http://dataportal.opendataforafrica.org/ agriculture, energy, environment, industry, …
Kaggle - https://www.kaggle.com/datasets A huge variety of different datasets
Amazon Reviews - https://snap.stanford.edu/data/web-Am... 35M product reviews from 6.6M users
GroupLens - https://grouplens.org/datasets/moviel... 20M movie ratings
Yelp Reviews - https://www.yelp.com/dataset 6.7M reviews, pictures, businesses
IMDB Reviews - http://ai.stanford.edu/~amaas/data/se... 25k movie reviews
Twitter Sentiment 140 - http://help.sentiment140.com/for-stud... 160k Tweets
Airbnb - http://insideairbnb.com/get-the-data.... A ton of data by geo
UCI ML Datasets - http://mlr.cs.umass.edu/ml/ iris, wine, abalone, heart disease, poker hands, …
Enron Email dataset - http://www.cs.cmu.edu/~enron/ 500k emails from 150 people, from the 2001 energy scandal. See the movie: The Smartest Guys in the Room.
Spambase - https://archive.ics.uci.edu/ml/datase... Emails
Jeopardy Questions - https://www.reddit.com/r/datasets/com... 200k questions and answers in JSON
Gutenberg Ebooks - http://www.gutenberg.org/wiki/Gutenbe... Large collection of books
Sentiment analysis over twitter data (deep learning) in Python
pmbaumgartner
Provide a comprehensive list of tokenizers, features, and general NLP things used for text analysis with examples. The initial focus is on features used for twitter data and sentiment analysis.
ajayshewale
This project addresses the problem of sentiment analysis on Twitter. The goal was to predict the sentiment of a given Twitter post using Python. Sentiment analysis can predict many different emotions attached to a text, but in this report only three major classes were considered: positive, negative and neutral. The training dataset was small (just over 5,900 examples) and highly skewed, which made it considerably harder to build a good classifier. After creating many custom features, using bag-of-words representations and applying the Extreme Gradient Boosting algorithm, a classification accuracy of 58% was achieved. Analysing public sentiment is useful for firms trying to gauge the market response to their products, for predicting political elections, and for predicting socioeconomic phenomena such as the stock exchange.
akurniawan
char-rnn implementation for sentiment analysis on twitter data
pran4ajith
A real-time streaming ETL pipeline for streaming and performing sentiment analysis on Twitter data using Apache Kafka, Apache Spark and Delta Lake.
Aghoreshwar
Customer analytics has been one of the hottest buzzwords for years. A few years back it was the marketing department's monopoly, carried out with limited volumes of customer data stored in relational databases like Oracle or appliances like Teradata and Netezza. SAS and SPSS were the leaders in providing customer analytics, but it was restricted to segmenting customers who were likely to buy your products or services. In the 90s came web analytics, which was more popular for page hits, time on sessions, and the use of cookies for visitors, all of which then fed into customer analytics. By the late 2000s, Facebook, Twitter and all the other social channels changed the way people interacted with brands and each other. Businesses needed to have a presence on the major social sites to stay relevant. With the digital age things have changed drastically. The customer is superman now. Their mobile interactions have increased substantially and they leave a digital footprint everywhere they go. They are more informed, more connected, always on, and looking for an exceptionally simple and easy experience. This tsunami of data has changed customer analytics forever. Today customer analytics is not restricted to marketing for churn and retention; more focus is going to how to improve the customer experience, and it is done by every department of the organization. A lot of companies had problems integrating large volumes of customer data between various databases and warehouse systems. They were not completely sure which key metrics to use for profiling customers. Hence creating a customer 360-degree view became the foundation for customer analytics. It can capture all customer interactions, which can be used for further analytics. From the technology perspective, the biggest change is the introduction of big data platforms, which can run analytics very fast on all the data an organization has, instead of relying on sampling and segmentation.
Then came cloud-based platforms, which can scale up and down as the analysis requires, so companies didn't have to invest upfront in infrastructure. Predictive models of customer churn, retention and cross-sell still exist today, but they run against more data than ever before. Analytics itself has evolved from descriptive to predictive to prescriptive. Showing only what will happen next no longer helps; knowing what actions to take is becoming more critical. There are various ways customer analytics is carried out: acquiring all the customer data; understanding the customer journey; applying big data concepts to customer relationships; finding high-propensity prospects; upselling by identifying related products and interests; generating customer loyalty by discovering response patterns; predicting customer lifetime value (CLV); identifying dissatisfied customers and churn patterns; applying predictive analytics; implementing continuous improvement. Hyper-personalization now takes center stage: giving your customer the right message, on the right platform, using the right channel, at the right time. With cognitive computing and artificial intelligence from IBM Watson, Microsoft and Google cognitive services, customer analytics will become sharper, as their deep learning neural network algorithms provide a game-changing aspect. Tomorrow there may not be just plain customer sentiment analytics based on feedback, surveys or social media; with the help of cognitive services it may be based on what customers' facial expressions show in real time. There's no doubt that customer analytics is absolutely essential for brand survival.
vishalbhalla
Predict Personality of a person using Sentiment Analysis & Unigram Words as features on user's Twitter data.
shaheen-syed
Full workflow to perform sentiment analysis on Twitter data. Contains crawlers, parsers, preprocessing, machine learning model creation, and various plots.
ginking
Archimedes 1 is a sentiment-based trading bot, heavily influenced by the existing bots it was forked from, with a few enhancements here and there. It was built to understand how those bots work, so that the approach can be rolled forward in our own manner into our own complete AI-based trading system (Archimedes 2.0).

This bot watches tweets from [followed accounts] and waits for them to mention any publicly traded companies. When they do, sentiment analysis is used to determine whether the opinions are positive or negative toward those companies. The bot then automatically executes trades on the relevant stocks according to the expected market reaction.

The code is written in Python and is meant to run on a Google Compute Engine instance. It uses the Twitter Streaming APIs (in a newer version) to get notified whenever tweets of interest within its remit are posted. The entity detection and sentiment analysis is done using Google's Cloud Natural Language API, and the Wikidata Query Service provides the company data. The TradeKing (ALLY) API does the stock trading (changed to ALLY).

The main module defines a callback where incoming tweets are handled and starts streaming the user's feed:

    def twitter_callback(tweet):
        companies = analysis.find_companies(tweet)
        if companies:
            trading.make_trades(companies)
            twitter.tweet(companies, tweet)

    if __name__ == "__main__":
        twitter.start_streaming(twitter_callback)

The core algorithms are implemented in the analysis and trading modules. The former finds mentions of companies in the text of the tweet, figures out what their ticker symbol is, and assigns a sentiment score to them. The latter chooses a trading strategy, which is either buy now and sell at close or sell short now and buy to cover at close. The twitter module deals with streaming and tweeting out the summary.

Follow these steps to run the code yourself:

1. Create VM instance
Check out the quickstart to create a Cloud Platform project and a Linux VM instance with Compute Engine, then SSH into it for the steps below. The predefined machine type g1-small (1 vCPU, 1.7 GB memory) seems to work well.

2. Set up auth
The authentication keys for the different APIs are read from shell environment variables. Each service has different steps to obtain them.

Twitter
Log in to your Twitter account and create a new application. Under the Keys and Access Tokens tab for your app you'll find the Consumer Key and Consumer Secret. Export both to environment variables:

    export TWITTER_CONSUMER_KEY="<YOUR_CONSUMER_KEY>"
    export TWITTER_CONSUMER_SECRET="<YOUR_CONSUMER_SECRET>"

If you want the tweets to come from the same account that owns the application, simply use the Access Token and Access Token Secret on the same page. If you want to tweet from a different account, follow the steps to obtain an access token. Then export both to environment variables:

    export TWITTER_ACCESS_TOKEN="<YOUR_ACCESS_TOKEN>"
    export TWITTER_ACCESS_TOKEN_SECRET="<YOUR_ACCESS_TOKEN_SECRET>"

Google
Follow the Google Application Default Credentials instructions to create, download, and export a service account key.

    export GOOGLE_APPLICATION_CREDENTIALS="/path/to/credentials-file.json"

You also need to enable the Cloud Natural Language API for your Google Cloud Platform project.

TradeKing (ALLY)
Log in to your TradeKing (ALLY) account and create a new application. Behind the Details button for your application you'll find the Consumer Key, Consumer Secret, OAuth (Access) Token, and OAuth (Access) Token Secret. Export them all to environment variables:

    export TRADEKING_CONSUMER_KEY="<YOUR_CONSUMER_KEY>"
    export TRADEKING_CONSUMER_SECRET="<YOUR_CONSUMER_SECRET>"
    export TRADEKING_ACCESS_TOKEN="<YOUR_ACCESS_TOKEN>"
    export TRADEKING_ACCESS_TOKEN_SECRET="<YOUR_ACCESS_TOKEN_SECRET>"

Also export your TradeKing (ALLY) account number, which you'll find under My Accounts:

    export TRADEKING_ACCOUNT_NUMBER="<YOUR_ACCOUNT_NUMBER>"

3. Install dependencies
There are a few library dependencies, which you can install using pip:

    $ pip install -r requirements.txt

4. Run the tests
Verify that everything is working as intended by running the tests with pytest using this command:

    $ export USE_REAL_MONEY=NO && pytest *.py --verbose

5. Run the benchmark
The benchmark report shows how the current implementation of the analysis and trading algorithms would have performed against historical data. You can run it again to benchmark any changes you may have made:

    $ ./benchmark.py > benchmark.md

6. Start the bot
Enable real orders that use your money:

    $ export USE_REAL_MONEY=YES

Have the code start running in the background with this command:

    $ nohup ./main.py &

License
Archimedes (edits under Invacio); Max Braun frame under Max Braun, licensed under the Apache V2 License.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
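Since the setup steps above rely on credentials being exported as shell environment variables, a startup check along the following lines can make a missing variable fail fast. This is a hedged sketch rather than the project's actual code: the variable names come from the description, while `REQUIRED_VARS`, `load_settings` and the returned dictionary layout are hypothetical.

```python
import os

# Variable names as listed in the setup steps; the grouping is illustrative.
REQUIRED_VARS = [
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
    "GOOGLE_APPLICATION_CREDENTIALS",
    "TRADEKING_CONSUMER_KEY",
    "TRADEKING_CONSUMER_SECRET",
    "TRADEKING_ACCESS_TOKEN",
    "TRADEKING_ACCESS_TOKEN_SECRET",
    "TRADEKING_ACCOUNT_NUMBER",
]

def load_settings(env=None):
    """Collect credentials from the environment, failing on any gap."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    settings = {name: env[name] for name in REQUIRED_VARS}
    # Real orders are only enabled when USE_REAL_MONEY is exactly "YES".
    settings["use_real_money"] = env.get("USE_REAL_MONEY") == "YES"
    return settings

# Demonstration with placeholder values instead of real credentials.
demo = load_settings({name: "placeholder" for name in REQUIRED_VARS})
```

Passing a plain dictionary, as in the demonstration, keeps the check testable without touching the real environment; defaulting `use_real_money` to off is a deliberately cautious choice for a bot that places trades.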
DevikaMishra-Dataturks
Complete Guide to text processing and sentiment analysis on Twitter data.
NanditaRao
The application is a cloud service that performs sentiment analysis on stock market and financial data. It can be hosted on Google App Engine and makes use of many GAE services such as Search Service, MemCache and DataStore. Given the name of a company, data from various sources like Twitter, Facebook Graph, Google News and Google Finance is aggregated. For each source, a different model has been pretrained using some prior data. Using different models gave us a chance to apply different machine learning methodologies based on the type of data from each source. The techniques that we have built and tested are: Naive Bayes, Multinomial and Bernoulli text representations, and KNN.
This project aims to use the Hadoop framework to analyze unstructured data that we obtain from Twitter and perform sentiment and trend analysis using Hive on MapReduce and Spark on keyword “COVID19”. We then compare the Hive and Spark approaches to determine the best performance.
chandrahas-reddy
This repository helps Data Analytics/Science Enthusiasts to carry out Sentiment Analysis on text derived from Twitter, Facebook and other data sources.
UrbanTchen
Using Twitter sentiment analysis data and Santiment's blockchain activity data to do multivariate time series forecasting of altcoins' USD and BTC prices.
priyeshpatel
Moodmap is an application which correlates data from Twitter with data from the government. Tweets are put through sentiment analysis (to assess the overall mood) and then plotted on a map according to the location from which they were tweeted. Government data for deprivation is then overlaid on top of this.
Dhanuraj-22
A Natural Language Processing project that performs sentiment analysis on Twitter data using TF-IDF and Logistic Regression. The model classifies tweets as positive or negative and evaluates performance using accuracy and classification report.
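The TF-IDF plus Logistic Regression approach described above can be sketched roughly as follows. This is a minimal standard-library illustration, not the repository's code: the four example tweets, the function names, and the training settings (`epochs`, `lr`) are all assumptions, and a real project would more likely use a library such as scikit-learn.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; real tweets need more careful cleaning.
    return text.lower().split()


def fit_idf(docs):
    # Smoothed inverse document frequency for every term in the corpus.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(doc, idf):
    # Sparse, L2-normalised TF-IDF vector; unseen terms are ignored.
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(vectors, labels, epochs=200, lr=0.5):
    # Logistic regression fitted with plain stochastic gradient descent.
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for vec, y in zip(vectors, labels):
            z = bias + sum(weights.get(t, 0.0) * v for t, v in vec.items())
            err = sigmoid(z) - y
            bias -= lr * err
            for t, v in vec.items():
                weights[t] = weights.get(t, 0.0) - lr * err * v
    return weights, bias


def predict(doc, idf, weights, bias):
    # 1 = positive, 0 = negative.
    vec = tfidf(doc, idf)
    z = bias + sum(weights.get(t, 0.0) * v for t, v in vec.items())
    return 1 if sigmoid(z) >= 0.5 else 0


# Hypothetical toy training set (not from any real dataset).
tweets = [
    "i love this phone",
    "what a great day",
    "i hate this traffic",
    "this is a terrible movie",
]
labels = [1, 1, 0, 0]

idf = fit_idf(tweets)
weights, bias = train([tfidf(t, idf) for t in tweets], labels)
predictions = [predict(t, idf, weights, bias) for t in tweets]
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print("training accuracy:", accuracy)
```

Accuracy on the training tweets says little about generalisation; the description mentions a classification report, which would normally be computed on held-out tweets.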
Problem Statement
The objective of this task is to detect hate speech in tweets. For the sake of simplicity, we say a tweet contains hate speech if it has a racist or sexist sentiment associated with it. So, the task is to classify racist or sexist tweets from other tweets. Formally, given a training sample of tweets and labels, where label '1' denotes the tweet is racist/sexist and label '0' denotes the tweet is not racist/sexist, your objective is to predict the labels on the test dataset.
Motivation
Hate speech is an unfortunately common occurrence on the Internet. Often social media sites like Facebook and Twitter face the problem of identifying and censoring problematic posts while weighing the right to freedom of speech. The importance of detecting and moderating hate speech is evident from the strong connection between hate speech and actual hate crimes. Early identification of users promoting hate speech could enable outreach programs that attempt to prevent an escalation from speech to action. Sites such as Twitter and Facebook have been seeking to actively combat hate speech. In spite of these reasons, NLP research on hate speech has been very limited, primarily due to the lack of a general definition of hate speech, an analysis of its demographic influences, and an investigation of the most effective features.
Data
Our overall collection of tweets was split in the ratio of 65:35 into training and testing data. Out of the testing data, 30% is public and the rest is private.
Data Files
train.csv - For training the models, we provide a labelled dataset of 31,962 tweets. The dataset is provided in the form of a csv file with each line storing a tweet id, its label and the tweet.
There is 1 test file (public): test_tweets.csv - The test data file contains only tweet ids and the tweet text, with each tweet on a new line.
A sentiment analysis project performed on data collected from Twitter mentioning the two primary contestants in the 2020 US Elections.
Analysis of the opinions expressed on Twitter regarding the relocation of Indonesia's capital city, using a Support Vector Machine (SVM) classifier combined with Term Frequency-Inverse Document Frequency (TF-IDF) and Bag of Words features, together with a lexicon-based approach for labeling data as positive or negative sentiment.
degenspot
OnChain Sage is an AI-driven, decentralized trading assistant that fuses real-time social sentiment analysis with on-chain market data. It helps crypto traders identify high-potential tokens by scanning Twitter for trending narratives and monitoring on-chain metrics from platforms like Raydium and Dex Screener.
adriel1997
Twitter-based sentiment analysis using Java and Hadoop. This project performs sentiment analysis on Twitter data to determine whether the tweets people post are positive, negative, or neutral. Each tweet is checked against the AFINN dictionary, a list of roughly 2,500 words, each scored from -5 to +5 to indicate how negative or positive it is.
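The dictionary lookup described above can be sketched as follows. The repository itself uses Java and Hadoop; this Python version is only an illustration, and `TOY_LEXICON`, `score_tweet`, and `classify` are hypothetical names with a handful of illustrative word scores rather than the real AFINN file. It assumes a tweet's score is the sum of its word scores, with unknown words counting as zero.

```python
import re

# Hypothetical excerpt in the AFINN style: word -> integer score in [-5, +5].
# These entries are illustrative and are not read from the real AFINN file.
TOY_LEXICON = {
    "good": 3,
    "great": 3,
    "love": 3,
    "bad": -3,
    "hate": -3,
    "terrible": -3,
    "awful": -3,
}


def score_tweet(text, lexicon=TOY_LEXICON):
    # Sum the scores of all known words; words missing from the lexicon add 0.
    words = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(word, 0) for word in words)


def classify(text, lexicon=TOY_LEXICON):
    # The sign of the total score decides the label.
    score = score_tweet(text, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for tweet in [
    "I love this great phone",
    "Terrible service, I hate it",
    "Just landed in Paris",
]:
    print(tweet, "->", classify(tweet))
```

In a MapReduce setting, the per-tweet scoring would typically run in the map step, with the reduce step counting tweets per label.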