Found 571 repositories (showing 30)
canaveensetia
This project is part of the Data Science Nanodegree Program by Udacity in collaboration with Figure Eight. The dataset contains pre-labelled tweets and messages from real-life disaster events. The project's aim is to build a Natural Language Processing (NLP) model that categorizes messages in real time.
prateeksawhney97
This project is part of the Data Science Nanodegree Program by Udacity in collaboration with Figure Eight. The initial dataset contains pre-labelled tweets and messages from real-life disasters. The aim of this project is to build a Natural Language Processing tool that categorizes messages.
rsreetech
A BERT-based solution to Natural Language Processing with Disaster Tweets: predict which Tweets are about real disasters and which ones are not (Kaggle Getting Started competition).
Kaggle Project: Predict which Tweets are about real disasters and which ones are not.
Natural Language Processing (NLP) is heavily used in our text classification task, so before we begin, I want to cover a few terms and concepts we will be using. This will help you understand why a particular function or process is being called, or at the very least clear up any confusion you might have.

I) Stemming – Stemming is a process applied to a single word to derive its root. Many words used in a sentence are inflected or derived forms. To standardize our process, we stem such words so that we end up with only root words. For example, a stemmer will convert the words "walking", "walked", and "walker" to the root word "walk".

II) Tokenization – Tokens are basically words. Tokenization is the process of taking a piece of text and breaking it up into its words, producing a list of tokens as output. For example, for the sentence "Python NLP is just going great" we get the token list ["Python", "NLP", "is", "just", "going", "great"].

III) Bag of Words – The Bag of Words model in text processing builds a unique list of words and is used as a tool for feature generation. For example, consider two sentences: "Star Wars is better than Star Trek." and "Star Trek isn't as good as Star Wars." For these two sentences, the bag of words is ["Star", "Wars", "Trek", "better", "good", "isn't", "is", "as"]. The position of each word in the list is fixed. To construct a classification feature from a sentence, we use a binary array (an array where each element is either 1 or 0). For example, a new sentence, "Wars is good", will be represented as [0, 1, 0, 0, 1, 0, 1, 0]. Position 2 is set to 1 because the word at position 2 of the bag of words, "Wars", is present in our example sentence, and the same holds for the words "is" and "good".
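The three concepts above can be sketched in a few lines of Python. The suffix-stripping stemmer and the whitespace tokenizer below are deliberately toy stand-ins for real implementations (such as NLTK's PorterStemmer and word tokenizer), and the vocabulary is the Star Wars example from the text; all of it is illustrative, not the project's actual code.

```python
# Toy illustrations of stemming, tokenization, and the bag-of-words model.
# The suffix-stripping stemmer is a stand-in for a real stemmer such as
# NLTK's PorterStemmer; it only handles a few common endings.

def stem(word):
    """Strip a few common suffixes to approximate the root word."""
    for suffix in ("ing", "ed", "er"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def tokenize(text):
    """Naive whitespace tokenizer: lowercase and strip edge punctuation."""
    return [t.strip(".,!?\"").lower() for t in text.split()]

def bag_of_words(sentence, vocab):
    """Binary feature array: 1 where a vocab word occurs in the sentence."""
    tokens = set(tokenize(sentence))
    return [1 if word in tokens else 0 for word in vocab]

print([stem(w) for w in ["walking", "walked", "walker"]])
# → ['walk', 'walk', 'walk']

print(tokenize("Python NLP is just going great"))
# → ['python', 'nlp', 'is', 'just', 'going', 'great']

vocab = ["star", "wars", "trek", "better", "good", "isn't", "is", "as"]
print(bag_of_words("Wars is good", vocab))
# → [0, 1, 0, 0, 1, 0, 1, 0]
```

Note that the tokenizer lowercases everything so that "Wars" matches "wars"; a real pipeline would make the same choice before building the vocabulary.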
You can read more about the Bag of Words model here.

Step 1: Data Preparation. Before we can train a model that classifies a given text into a particular category, we have to prepare the data. We create a simple JSON file that holds the required training data. We use the 2014 India floods dataset, which contains tweets as text along with their assigned categories. The dataset has 9 different categories relating to the natural disaster, so the JSON has 9 categories, each with a set of sentences we can use to train our model. Given this data, we have to classify any given sentence into one of these 9 categories.

Step 2: Data Load and Pre-processing. We create several lists: a list "words" holds all the unique stemmed words across the training sentences, and another list "categories" holds all the different categories. The output of this step is a list of documents, each pairing the words of a sentence with the category the sentence belongs to. An example document is (["whats", "your", "age"], "age").

Step 3: Convert the Data to the TensorFlow Specification. The documents from the previous step are still in text form. TensorFlow, being a math library, accepts data in numeric form, so before the text classification begins we apply the Bag of Words model to convert each sentence into a binary numeric array. We store the labels/categories the same way, as binary numeric arrays.

Step 4: Initiate TensorFlow Text Classification. With the documents in the right form, we can now begin the TensorFlow text classification. In this step, we build a simple deep neural network and use it to train our model. The code runs for 100 epochs with a batch size of 20 and took around 2 hours to finish training; the size of the data and the type of GPU heavily determine the time taken for training.
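Steps 2 through 4 can be sketched end to end. The two categories and the training sentences below are invented purely for illustration (they are not the real 2014 India floods data), and the tiny network stands in for the larger one the project trains; the sketch assumes TensorFlow 2.x via `tensorflow.keras`.

```python
# Sketch of Steps 2-4: documents -> binary arrays -> small dense network.
# Assumes TensorFlow 2.x. The training data is invented for illustration.
import numpy as np
import tensorflow as tf

training_data = {
    "infrastructure": ["the bridge has collapsed", "roads are flooded and blocked"],
    "casualties": ["two people reported dead", "many injured in the flood"],
}

# Step 2: collect the unique words and the category list.
categories = sorted(training_data)
words = sorted({w for sents in training_data.values() for s in sents for w in s.split()})

# Step 3: bag-of-words features and one-hot labels as numeric arrays.
def to_bow(sentence):
    tokens = set(sentence.split())
    return [1.0 if w in tokens else 0.0 for w in words]

X = np.array([to_bow(s) for c in categories for s in training_data[c]])
y = np.array([[1.0 if c == cat else 0.0 for cat in categories]
              for c in categories for _ in training_data[c]])

# Step 4: a simple dense network with a softmax over the categories.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(len(words),)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(len(categories), activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.fit(X, y, epochs=100, batch_size=20, verbose=0)

# Step 5 preview: classify a new sentence via the argmax of the softmax.
probs = model.predict(np.array([to_bow("the bridge is flooded")]), verbose=0)[0]
print(categories[int(np.argmax(probs))])
```

Stemming is omitted here for brevity; in the real pipeline each word would pass through the stemmer before entering the "words" list and the bag-of-words vectors.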
Step 5: Testing the TensorFlow Text Classification Model. We can now test the neural network text classification model. The model was able to correctly classify almost all the sentences. Some sentences will still fail to be classified correctly, simply because the amount of data is small; with more data, the model will become more confident.

Conclusion. This is how you can perform TensorFlow text classification. You can use this approach and scale it to many different classification tasks, and you can use it to build chatbots as well.

How users can get started with the project: NGOs, organisations, etc. can get categorised tweets from this project, which can give them different kinds of information, such as infrastructure damage or the number of deaths. This can help them assess the current situation and make decisions accordingly.

Dataset used: the 2014 India floods dataset.

Technologies used: Python, Information Retrieval, Natural Language Processing, Deep Learning, TensorFlow, NLTK.
No description available
No description available
emirkaanozdemr
No description available
VedantGabhane
This project leverages Natural Language Processing (NLP) to classify disaster-related tweets using a Naive Bayes classifier. Implemented with Streamlit, it provides an interactive web application to visualize and analyze tweet classifications in real-time, aiding in timely and effective disaster response.
The objective of the project is to predict whether a particular tweet, whose text (and occasionally keyword and location) is provided, indicates a real disaster or not. We use various NLP techniques and classification models for this purpose and objectively compare these models by means of appropriate evaluation metrics.
No description available
A lightweight NLP baseline for the Kaggle competition “Natural Language Processing with Disaster Tweets”
MahalavanyaSriram
Kaggle Competition - Natural Language Processing with Disaster Tweets
Kaggle competition
No description available
A deep learning-based application built using Natural Language Processing (NLP) techniques to classify tweets as disaster-related or not. This solution demonstrates accurate classification of social media content for disaster detection and crisis management.
taher-software
In this Kaggle competition, we're challenged to build a machine learning model that predicts which Tweets are about real disasters and which ones aren't.
bianchi-john
The "Getting Started" competition is proposed as a task suitable for novices in the field of natural language processing. This is due, on the one hand, to the small size of the datasets provided and, on the other, to the binary classification task, which is certainly more accessible than multi-level classification. The goal of the competition is, starting from a training dataset and an evaluation dataset, to build a machine-learning model able to distinguish between tweets that refer to a real disaster situation (a natural disaster or another kind of anomaly/emergency) and those that do not. This project illustrates the realization of two different models, both able to perform this classification task: the first, based on a traditional approach, uses feature engineering, while the second uses the BERT language model through the Simple Transformers library.
Parneet-Sandhu
Natural Language Processing with Disaster Tweets: predict which Tweets are about real disasters and which ones are not. I achieved rank 15 with 0.87618 accuracy.
AbhinavSharma07
My solution to Kaggle's Getting started, "Natural Language Processing with Disaster Tweets" competition. Uses GloVe + BiLSTM
nfacciol
Nicholas Facciola NLP submission for Natural Language Processing with Disaster Tweets https://www.kaggle.com/competitions/nlp-getting-started/overview
ShirakGevorgyan
This project applies boosting techniques in Natural Language Processing (NLP) to classify disaster-related tweets. Using XGBoost and Multinomial Naive Bayes, we achieved competitive results on Kaggle's NLP with Disaster Tweets competition.
SergKhachikyan
Natural Language Processing with Disaster Tweets (Kaggle) This repository contains my full pipeline for the Kaggle competition NLP - Getting Started. The goal is to classify whether a tweet is about a real disaster or not (target = 1 or 0).
hrithickcodes
Real or Not? NLP with Disaster Tweets is a Kaggle competition problem where we have to predict which Tweets are about real disasters and which ones are not. I participated in the competition and solved the problem using Machine Learning and Natural Language Processing. This repository contains all the code for the solution I submitted at that time.
nikjohn7
My solution to Kaggle's Getting started, "Natural Language Processing with Disaster Tweets" competition. Uses GloVe + BiLSTM
siyovushchik1414
NLP from Kaggle
No description available
Using natural language processing on Twitter tweets to identify disaster-related content.
No description available
No description available