Search Results

Found 509 repositories (showing 30)

NLP-Analysis-of-Patient-Journals-for-Anxiety-Trends

Okes2024

🧡65

This project uses NLP to analyze synthetic patient journals for anxiety trends. It applies text preprocessing, scoring, and topic modeling to detect patterns over time, offering insights into mental health monitoring, clinical research, and AI applications for detecting anxiety-related changes in patient narratives.

Python

Updated 4 days ago

SMSTextSpamPrediction

FarooqMulla

💛70

Focuses on detecting spam messages in SMS text using Natural Language Processing (NLP) and Machine Learning techniques. It leverages text preprocessing, feature extraction, and classification algorithms to accurately predict whether a message is Spam or Ham (Not Spam).

MIT

Swift

Updated 3 days ago

coreml, coreml-models, ios, +7

policy-data-analyzer

wri-dssg-omdena

❤️25

Building a model to recognize incentives for landscape restoration in environmental policies from Latin America, the US and India. Bringing NLP to the world of policy analysis through an extensible framework that includes scraping, preprocessing, active learning and text analysis pipelines.

NOASSERTION

Jupyter Notebook

Updated 3 months ago

active-learning, bert, data-science, +17

Sentiment-Analysis-NLP-with-Python

yrtnsari

❤️40

A simple sentiment analysis project using NLP, written in Python in a Jupyter notebook. It shows how to preprocess text (removing bad words and stop words, lemmatization, tokenization), how to save a trained model, and how to use that model in a real-life situation. The machine learning model is a k-Nearest Neighbors classifier. Performance is evaluated with a confusion matrix and scikit-learn's classification report, which gives the accuracy, precision, recall, and F1 score of the model. The target values being classified are positive and negative reviews.
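The evaluation metrics named above can be sketched in plain Python. This is an illustrative sketch and not the repository's code: the function name `binary_metrics` and the label strings are assumptions, and the repository itself reports these numbers through scikit-learn.

```python
def binary_metrics(y_true, y_pred, positive="positive"):
    """Return confusion-matrix counts and derived metrics for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard every division so empty or one-sided inputs return 0.0.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "accuracy": accuracy, "precision": precision,
        "recall": recall, "f1": f1,
    }


if __name__ == "__main__":
    truth = ["positive", "positive", "negative", "negative"]
    preds = ["positive", "negative", "negative", "positive"]
    print(binary_metrics(truth, preds))
```

Precision and recall are computed for the class passed as `positive`; swapping in "negative" gives the metrics for the other class.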

Jupyter Notebook

Updated 2 months ago

data-science, jupyter-notebook, knn-classification, +10

maleo

jakartaresearch

❤️25

Wrapper library for text cleansing, preprocessing in NLP

MIT

Python

Updated 2 years ago

indonesian-language, machine-learning, nlp, +1

Sentiment_Analysis_with_Insights

Jai-Agarwal-04

❤️35

Sentiment Analysis with Insights using NLP and Dash

This project shows sentiment analysis of text data using NLP and Dash. I used the Amazon reviews dataset to train the model and then scraped reviews from Etsy.com to test it.

Prerequisites: Python 3, Amazon dataset (3.6 GB), Anaconda.

How was this project made?

This project was built with Python 3 to predict sentiment with machine learning, with an interactive dashboard for testing reviews. To start, I downloaded the dataset and extracted the JSON file. Next, I took a portion of 792,000 reviews, equally distributed into chunks of 24,000 reviews, using pandas. The chunks were then combined into a single CSV file called balanced_reviews.csv. This file served as the base for training the model and was filtered on the basis of review scores greater than 3 and less than 3. The filtered data was then vectorized with a TF-IDF vectorizer. After the model was trained to 90% accuracy, reviews were scraped from Etsy.com to test it. Finally, I built a dashboard where sentiment can be checked either for text entered by the user or for the reviews scraped from the website.

What is CountVectorizer?

CountVectorizer is a tool provided by the scikit-learn library in Python. It transforms a given text into a vector based on the frequency (count) of each word that occurs in the text. This is helpful when we have several such texts and want to convert each of them into vectors for further text analysis. CountVectorizer creates a matrix in which each unique word is represented by a column and each text sample is a row. The value of each cell is the count of that word in that particular text sample.

What is TF-IDF Vectorizer?

TF-IDF stands for Term Frequency - Inverse Document Frequency. It is a statistic that aims to better define how important a word is for a document while also taking into account its relation to other documents in the same corpus. It does this by looking at how many times a word appears in a document while also paying attention to how many times the same word appears in other documents in the corpus. The rationale is as follows. First, a word that appears frequently in a document is more relevant to that document, meaning there is a higher probability that the document is about or related to that word. Second, a word that appears frequently in many documents may prevent us from finding the right document in a collection: the word is relevant either for all documents or for none, so it will not help us filter out a single document or a small subset of documents from the whole set. TF-IDF is therefore a score applied to every word in every document in the dataset. For every word, the TF-IDF value increases with every appearance of the word in a document, but is gradually decreased with every appearance in other documents.

What is Plotly Dash?

Dash is a productive Python framework for building web analytic applications. Written on top of Flask, Plotly.js, and React.js, Dash is ideal for building data visualization apps with highly custom user interfaces in pure Python. It is particularly suited for anyone who works with data in Python. Dash apps are rendered in the web browser. You can deploy your apps to servers and then share them through URLs. Since Dash apps are viewed in the web browser, Dash is inherently cross-platform and mobile ready. Dash is an open source library, released under the permissive MIT license. Plotly develops Dash and offers a platform for managing Dash apps in an enterprise environment.

What is web scraping?

Web scraping is a term used to describe the use of a program or algorithm to extract and process large amounts of data from the web.

Running the project

Step 1: Download the dataset and extract the JSON data into your project folder. Make a folder named filtered_chunks and run the data_extraction.py file. This extracts data from the JSON file into equal-sized chunks and then combines them into a single CSV file called balanced_reviews.csv.

Step 2: Run the data_cleaning_preprocessing_and_vectorizing.py file. This cleans and filters the data. The filtered data is then fed to the TF-IDF vectorizer, the trained model is pickled to trained_model.pkl, and the vocabulary of the trained model is stored as vocab.pkl. Keep these two files in a folder named model_files.

Step 3: Run the etsy_review_scrapper.py file. Adjust the range of pages and products to be scraped, as it might take a very long time to process. A small amount of data is sufficient to check the accuracy of the model. The scraped data is stored in a CSV file as well as a db file.

Step 4: Finally, run the app.py file, which starts the Dash server. You can then check how the model works either by typing text or by selecting the preloaded scraped reviews.
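To make the count matrix and the TF-IDF score described above concrete, here is a minimal standard-library sketch. It is not the repository's code and does not reproduce scikit-learn's exact numbers: the helper names are hypothetical, tokenization is a plain lowercase whitespace split, and the weighting is the textbook tf * log(N / df) rather than scikit-learn's smoothed and normalized variant.

```python
import math
from collections import Counter


def tokenize(text):
    # Simplifying assumption: lowercase and split on whitespace only.
    return text.lower().split()


def count_matrix(docs):
    """One row per document, one column per unique word, cells hold counts."""
    tokenized = [tokenize(doc) for doc in docs]
    vocab = sorted({word for doc in tokenized for word in doc})
    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        rows.append([counts[word] for word in vocab])
    return vocab, rows


def tfidf_matrix(docs):
    """Weight each count by log(N / df), so words found in every document score 0."""
    vocab, counts = count_matrix(docs)
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    weighted = [[row[j] * idf[j] for j in range(len(vocab))] for row in counts]
    return vocab, weighted


if __name__ == "__main__":
    reviews = ["good product good price", "bad product"]
    print(count_matrix(reviews))
    print(tfidf_matrix(reviews))
```

In the two sample reviews, "product" appears in both documents and therefore gets a weight of zero, while "good" appears twice in one document only and gets the highest weight there.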

Python

Updated 6 months ago

textcl

alinapetukhova

❤️40

Text preprocessing package for use in NLP tasks https://pypi.org/project/textcl/

MIT

Python

Updated 1 year ago

nlp, outlier-detection, text-cleaning, +1

Python-NLP-Fundamentals

dlab-berkeley

❤️21

D-Lab's introduction to NLP in Python. Learn how to preprocess text data, apply bag-of-words methods, engage with word embeddings, and more, using Python.

CC-BY-4.0

Jupyter Notebook

Updated 3 months ago

LughaatNLP

MuhammadNoman76

🧡60

LughaatNLP: First Urdu language preprocessing library in Pakistan. Tokenization, lemmatization, stop word removal, and normalization for Urdu text. Join us to advance Urdu NLP! #OpenSource #UrduLanguage

MIT

Updated 1 week ago

Twitter-US-Airline-Sentiment-Analysis

swap-253

❤️40

This repository uses six different NLP models to predict user sentiment from Twitter reviews of airlines. The dataset is Twitter US Airline Sentiment. The best model from each of the ML and DL groups has been deployed. It employs text preprocessing,

GPL-3.0

Jupyter Notebook

Updated 1 year ago

airline-sentiment, artificial-neural-networks, bidirectional-lstm, +14

nepalikit

prabhashj07

❤️25

NepaliKit is a Python library for natural language processing (NLP) tasks in Nepali. It features tokenization (rule-based and SentencePiece), text preprocessing, stopword management, and sentence segmentation. Ideal for developers and researchers working with Nepali text data.

Python

Updated 10 months ago

nepali, nlp-library, nlp-machine-learning, +4

TSNE-on-Amazon-Fine-Food-reviews-Dataset

RohithM191

❤️35

Amazon-Food-Reviews-Analysis-and-Modelling Using Various Machine Learning Models

Performed exploratory data analysis, data cleaning, data visualization, and text featurization (BOW, tfidf, Word2Vec). Built several ML models such as KNN, Naive Bayes, Logistic Regression, SVM, Random Forest, GBDT, and LSTM (RNNs).

Objective: Given a text review, determine whether the sentiment of the review is positive or negative.

Data Source: https://www.kaggle.com/snap/amazon-fine-food-reviews

About Dataset

The Amazon Fine Food Reviews dataset consists of reviews of fine foods from Amazon.

Number of reviews: 568,454
Number of users: 256,059
Number of products: 74,258
Timespan: Oct 1999 - Oct 2012
Number of Attributes/Columns in data: 10

Attribute Information:
Id
ProductId - unique identifier for the product
UserId - unique identifier for the user
ProfileName
HelpfulnessNumerator - number of users who found the review helpful
HelpfulnessDenominator - number of users who indicated whether they found the review helpful or not
Score - rating between 1 and 5
Time - timestamp for the review
Summary - brief summary of the review
Text - text of the review

1 Amazon Food Reviews EDA, NLP, Text Preprocessing and Visualization using TSNE

Defined the problem statement.
Performed exploratory data analysis (EDA) on the Amazon Fine Food Reviews dataset and plotted word clouds, distplots, histograms, etc.
Performed data cleaning and preprocessing by removing unnecessary and duplicate rows; for text reviews, removed HTML tags, punctuation, and stopwords, and stemmed the words using the Porter Stemmer.
Documented the concepts clearly.
Plotted TSNE plots for different featurizations of the data, viz. BOW (uni-gram), tfidf, Avg-Word2Vec and tf-idf-Word2Vec.

2 KNN

Applied K-Nearest Neighbour on different featurizations of the data, viz. BOW (uni-gram), tfidf, Avg-Word2Vec and tf-idf-Word2Vec.
Used both the brute and kd-tree implementations of KNN.
Evaluated the test data on various performance metrics such as accuracy, and plotted the confusion matrix using seaborn.
Conclusions: KNN is a very slow algorithm and takes a very long time to train. The best accuracy, 89.38%, is achieved by the Avg Word2Vec featurization. The kd-tree and brute algorithms of KNN give comparatively similar results. Overall, KNN was not that good for this dataset.

3 Naive Bayes

Applied Naive Bayes using Bernoulli NB and Multinomial NB on different featurizations of the data, viz. BOW (uni-gram), tfidf.
Evaluated the test data on various performance metrics such as accuracy, f1-score, precision, and recall, and plotted the confusion matrix using seaborn.
Printed the top 25 important features for both negative and positive reviews.
Conclusions: Naive Bayes is a much faster algorithm than KNN. Bernoulli naive Bayes performs much better than multinomial naive Bayes. The best F1 score, 0.9342, is achieved by the BOW featurization.

4 Logistic Regression

Applied Logistic Regression on different featurizations of the data, viz. BOW (uni-gram), tfidf, Avg-Word2Vec and tf-idf-Word2Vec.
Used both grid search and randomized search cross-validation.
Evaluated the test data on various performance metrics such as accuracy, f1-score, precision, and recall, and plotted the confusion matrix using seaborn.
Showed how sparsity increases as lambda increases (or C decreases) when the L1 regularizer is used, for each featurization.
Ran a perturbation test to check whether the features are multi-collinear.
Conclusions: Sparsity increases as we decrease C (increase lambda) when the L1 regularizer is used. The TF_IDF featurization performs best, with an F1_score of 0.967 and an accuracy of 91.39%. Features are multi-collinear across the different featurizations. Logistic Regression is a faster algorithm.

5 SVM

Applied SVM with the rbf (radial basis function) kernel on different featurizations of the data, viz. BOW (uni-gram), tfidf, Avg-Word2Vec and tf-idf-Word2Vec.
Used both grid search and randomized search cross-validation.
Evaluated the test data on various performance metrics such as accuracy, f1-score, precision, and recall, and plotted the confusion matrix using seaborn.
Evaluated SGDClassifier on the best resulting featurization.
Conclusions: The BOW featurization with a linear kernel and grid search gave the best results, with an F1-score of 0.9201. SGDClassifier takes very little time to train.

6 Decision Trees

Applied Decision Trees on different featurizations of the data, viz. BOW (uni-gram), tfidf, Avg-Word2Vec and tf-idf-Word2Vec.
Used grid search with 30 random points to find the best max_depth.
Evaluated the test data on various performance metrics such as accuracy, f1-score, precision, and recall, and plotted the confusion matrix using seaborn.
Plotted the feature importances received from the decision tree classifier.
Conclusions: The BOW featurization (max_depth=8) gave the best results, with an accuracy of 85.8% and an F1-score of 0.858. Decision Trees on BOW and tfidf would have taken forever with all the dimensions, since the feature space is huge, so max_depth was capped at 8.

7 Ensembles (RF & GBDT)

Applied Random Forest on different featurizations of the data, viz. BOW (uni-gram), tfidf, Avg-Word2Vec and tf-idf-Word2Vec.
Used grid search with 30 random points to find the best max_depth, learning rate and n_estimators.
Evaluated the test data on various performance metrics such as accuracy, f1-score, precision, and recall, and plotted the confusion matrix using seaborn.
Plotted a word cloud of the feature importances received from the RF and GBDT classifiers.
Conclusions: The TFIDF featurization in Random Forest (BASE-LEARNERS=10) with random search gave the best results, with an F1-score of 0.857. The TFIDF featurization in GBDT (BASE-LEARNERS=275, DEPTH=10) gave the best results, with an F1-score of 0.8708.

Jupyter Notebook

Updated 3 years ago

LLM-Cancer-Classification-using-Pathology-text

MarySuneela

🧡55

This project integrates large language models with pathology text data to improve cancer classification. Using NLP techniques such as text preprocessing, Named Entity Recognition (NER), and feature engineering, we enhance clinical decision-making. Machine learning models are trained to improve accuracy and predictions in medical reports.

Jupyter Notebook

Updated 2 weeks ago

twitter-sentiment-analysis-using-python

shubhampadole68

❤️35

Millions of people use Twitter to express emotions such as happiness, sadness, and anger. Sentiment analysis is about detecting emotions, opinions, assessments, and attitudes, taking into account the way humans think. It classifies emotions into classes such as positive or negative. Industries are now interested in using textual data for semantic analysis to learn what people think of their products and services. Sentiment analysis tells them the customer satisfaction level so that they can improve their services accordingly. To work with text data, they extract it from social media platforms. Many social media sites, such as Google Plus, Facebook, and Twitter, let users express opinions, views, and emotions about topics and events. The microblogging site Twitter is expanding rapidly among online social networking sites, with about 200 million users. Twitter was founded in 2006 and is currently the most famous microblogging platform. In 2017, 2 million users shared 8.3 million tweets in one hour. Twitter users post their thoughts, emotions, and messages on their profiles; these posts are called tweets. A single tweet is limited to 140 characters. Twitter sentiment analysis is based on the field of NLP (natural language processing). For tweet text, we use NLP techniques such as tokenizing the words and removing stop words like I, me, my, our, your, is, was, etc. NLP also plays a part in preprocessing the data, such as cleaning the text and removing special characters and punctuation marks. Sentiment analysis is important because it reveals trends in people's emotions on specific topics through their tweets.
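The preprocessing steps named in this description (cleaning special characters, tokenizing, removing stop words) can be sketched as follows. This is an assumption-laden illustration and not the repository's code: the function name `preprocess_tweet` is hypothetical, lowercasing is an added step, and the stop word set contains only the examples listed in the description.

```python
import re

# Only the stop words given as examples in the description; a real list is longer.
STOP_WORDS = {"i", "me", "my", "our", "your", "is", "was"}


def preprocess_tweet(text):
    """Lowercase, strip special characters and punctuation, tokenize, drop stop words."""
    text = text.lower()
    # Replace anything that is not a letter, digit, or whitespace with a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    tokens = text.split()
    return [token for token in tokens if token not in STOP_WORDS]


if __name__ == "__main__":
    print(preprocess_tweet("My flight was GREAT!!!"))
```

Because punctuation is replaced by spaces before splitting, contractions such as "I'm" break into separate pieces; handling them properly would need a more careful tokenizer.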

Jupyter Notebook

Updated 2 years ago

Text_preprocessing_steps_for_NLP

sharadpatell

❤️35

Text preprocessing techniques used in NLP

Jupyter Notebook

Updated 1 year ago

textfab

Astromis

❤️35

Tiny library for text preprocessing in NLP

MIT

Python

Updated 1 month ago

depression-detection

ardaoezsap

❤️40

This project utilizes Natural Language Processing (NLP) and Machine Learning to identify signs of depression in Reddit posts. It applies text preprocessing techniques, TF-IDF vectorization, and a Random Forest classifier for detection. The model is evaluated based on accuracy, precision, recall, and F1-score.

MIT

Jupyter Notebook

Updated 1 year ago

Simple_chatbot

MazenAziz1

❤️35

This project is about creating a friendly chatbot that answers a large number of questions covering several health topics using cosine similarity. It helps you look more closely at phases of the NLP pipeline that are common to most text classification projects, such as text preprocessing and representation, and model training.

Jupyter Notebook

Updated 1 year ago

AI-Chatbot-For-Tiruchendur-Temple--using-Python

sabari-js

❤️35

A retrieval-based chatbot for learning about the Tiruchendur temple. It uses Natural Language Processing (NLP) and Long Short-Term Memory (LSTM). NLP is used to preprocess and clean the text, for example by removing words and stemming. LSTM is used for text classification. The model is trained using NLP and LSTM, its accuracy is measured, and it makes predictions for queries. It is developed in Python.

Python

Updated 9 months ago

NLP_Text_Summarization

Nafisur21

❤️35

Basic preprocessing tasks in NLP and rule-based automatic text summarization

Jupyter Notebook

Updated 6 years ago

Text-Preprocessing-in-NLP

abdullahzunorain

❤️35

Text Preprocessing in NLP

Jupyter Notebook

Updated 1 year ago

TextHero-Text-Preprocessing-in-NLP-

573-pankaj

❤️25

No description available

Jupyter Notebook

Updated 3 years ago

Urdu-Text-Preprocessing

MD-Ryhan

❤️35

This repository contains code for preprocessing Urdu natural language text for use in NLP applications.

Jupyter Notebook

Updated 1 year ago

counter, lemmatization, nlp, +11

Smartphone-Price-Prediction-Using-NLP-Approaches

ovaheb

❤️35

In this project, smartphone prices were predicted from the text descriptions of their advertisements, using regression and NLP preprocessing.

Python

Updated 1 year ago

regex_clean_data

vishalbpatil1

❤️35

Text preprocessing is one of the most important tasks in Natural Language Processing (NLP). For instance, you may want to remove all punctuation marks from text documents before they are used for text classification. Similarly, you may want to extract numbers from a text string. Writing manual scripts for such preprocessing tasks requires a lot of effort and is prone to errors. Given the importance of these tasks, regular expressions (regex) are available in many languages to make text preprocessing easier.
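The two tasks mentioned in this description, removing punctuation and extracting numbers, can each be done with one call to Python's standard `re` module. This is a minimal sketch rather than the repository's code; the function names are hypothetical, and "punctuation" is taken to mean the ASCII characters in `string.punctuation`.

```python
import re
import string

# Character class matching any ASCII punctuation mark, escaped for safe use in a regex.
PUNCTUATION_PATTERN = re.compile(f"[{re.escape(string.punctuation)}]")

# Integers or simple decimals such as 12 or 3.50 (no signs, no thousands separators).
NUMBER_PATTERN = re.compile(r"\d+(?:\.\d+)?")


def remove_punctuation(text):
    """Delete every ASCII punctuation character from the text."""
    return PUNCTUATION_PATTERN.sub("", text)


def extract_numbers(text):
    """Return all integer and decimal numbers found in the text, as strings."""
    return NUMBER_PATTERN.findall(text)


if __name__ == "__main__":
    print(remove_punctuation("Hello, world! It's done."))
    print(extract_numbers("Order 12 items for 3.50 dollars"))
```

Run number extraction before punctuation removal if decimals matter, since removing the dot would turn "3.50" into "350".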

Jupyter Notebook

Updated 3 years ago

Drug-Review-Sentiment-Analysis

suhaibmukhtar

❤️40

Sentiment analysis in NLP uses deep learning to categorize text sentiment into positive, negative, or neutral classes. It involves data preprocessing, representation with embeddings, diverse models (LSTMs, GRUs, transformers), and rigorous evaluation, enhancing our understanding of opinions in text.

BSD-2-Clause

Jupyter Notebook

Updated 1 year ago

LANGUAGE-DETECTION-MODEL

sujalthapa369

❤️45

Language detection model using NLP to predict the language of a given text with TF‑IDF features and a Logistic Regression classifier, including basic preprocessing and evaluation in Jupyter notebooks.

Jupyter Notebook

Updated 1 month ago

TFIDF-Caclulation-NLP

dhamotharnaf

❤️35

This repository contains all the code you need to get started with TF-IDF in NLP. It also covers cosine similarity and text preprocessing.
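Cosine similarity, mentioned above, compares two vectors by the angle between them rather than by their length, which is why it pairs well with TF-IDF vectors of documents of different sizes. The sketch below is a generic illustration, not code from the repository; the function name and the choice to return 0.0 for an all-zero vector are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Convention chosen here: an all-zero vector is similar to nothing.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(cosine_similarity([1, 2, 0], [2, 4, 0]))  # same direction
    print(cosine_similarity([1, 0], [0, 1]))        # orthogonal
```

Vectors that point in the same direction score close to 1 regardless of magnitude, and vectors with no terms in common score 0.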

Python

Updated 2 years ago

Fake-News-Prediction

KHATEEB-ARMAN

❤️45

This project, completed in May 2025, classifies news articles as fake or true using machine learning and NLP. Built with Python, it employs text preprocessing (tokenization, TF-IDF) and a Logistic Regression model to achieve 85%+ accuracy. The dataset is sourced from Kaggle, with scripts for preprocessing, training, evaluation, and prediction.

Updated 2 months ago

Mini_Chatbot

navneetjaguri

❤️40

Mini Chatbot - Offline Knowledge Base A lightweight, offline chatbot that answers questions from local text files using TF-IDF vectorization and cosine similarity. Built in Python with scikit-learn, this project demonstrates information retrieval, NLP preprocessing,

MIT

Python

Updated 3 months ago
