Found 19 repositories (showing 19)
rohitk140797k
Problem Statement: Amazon is an online shopping website that caters to millions of people worldwide. Over 34,000 consumer reviews for Amazon-brand products like the Kindle and Fire TV Stick are provided. The dataset has attributes like brand, categories, primary categories, reviews.title, reviews.text, and the sentiment. Sentiment is a categorical variable with three levels: "Positive", "Negative", and "Neutral". For given unseen data, the sentiment needs to be predicted. You are required to predict the sentiment (or satisfaction) of a purchase based on multiple features and the review text. [Dataset Snapshot]
Project Task: Week 1 Class Imbalance Problem: 1. Perform an EDA on the dataset. a) See what a positive, negative, and neutral review looks like. b) Check the count for each class; it is a class imbalance problem. 2. Convert the reviews into TF-IDF scores. 3. Run a multinomial Naive Bayes classifier. Everything will be classified as positive because of the class imbalance.
Project Task: Week 2 Tackling the Class Imbalance Problem: 1. Oversampling or undersampling can be used to tackle the class imbalance problem. 2. In case of class imbalance, use the following metrics for evaluating model performance: precision, recall, F1-score, and the AUC-ROC curve. Use F1-score as the evaluation criterion for this project. 3. Use tree-based classifiers like Random Forest and XGBoost. Note: tree-based classifiers work on two ideologies, namely bagging or boosting, and have fine-tuning parameters that take care of the imbalanced classes.
Project Task: Week 3 Model Selection: 1. Apply multi-class SVMs and neural nets. 2. Use possible ensemble techniques like XGBoost + oversampled_multinomial_NB. 3. Assign a score to the sentence sentiment (engineer a feature called sentiment score). Use this engineered feature in the model and check for improvements. Draw insights on the same.
Project Task: Week 4 Applying LSTM: 1. Use LSTM for the previous problem (tune LSTM parameters like top words, embedding length, dropout, epochs, number of layers, etc.). Hint: another variation of LSTM, the GRU (Gated Recurrent Unit), can be tried as well. 2. Compare the accuracy of neural nets with traditional ML-based algorithms. 3. Find the best settings of the LSTM and GRU networks that can best classify the reviews as positive, negative, and neutral. Hint: use techniques like grid search, cross-validation, and random search.
Optional Tasks: Week 4 Topic Modelling: 1. Cluster similar reviews. Note: some reviews may talk about the device as a gift option; other reviews may be about product looks, and some may highlight its battery and performance. Try naming the clusters. 2. Perform topic modelling. Hint: use scikit-learn's Latent Dirichlet Allocation (LDA) and Non-Negative Matrix Factorization (NMF).
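The Week 1-2 pipeline described above (TF-IDF features, multinomial Naive Bayes, then oversampling the minority classes and scoring with F1) could be sketched with scikit-learn roughly as follows. The reviews below are hypothetical stand-ins, not rows from the actual dataset, and evaluation is done on the training texts purely for illustration:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.utils import resample
from sklearn.metrics import f1_score

# Hypothetical stand-in reviews; the real dataset has ~34,000 rows.
texts = ["love this kindle", "great fire tv stick", "works perfectly",
         "amazing product", "battery died fast", "it is okay i guess"]
y = np.array(["Positive", "Positive", "Positive",
              "Positive", "Negative", "Neutral"])

# Week 1: TF-IDF featurization of the review text.
X = TfidfVectorizer().fit_transform(texts)

# Week 2: naive random oversampling of each class up to the majority count.
classes = np.unique(y)
n_max = max((y == c).sum() for c in classes)
idx = np.concatenate([
    resample(np.where(y == c)[0], replace=True, n_samples=n_max, random_state=0)
    for c in classes
])

# Multinomial NB on the balanced sample, scored with macro F1 (the project metric).
clf = MultinomialNB().fit(X[idx], y[idx])
macro_f1 = f1_score(y, clf.predict(X), average="macro")
print(round(macro_f1, 3))
```

In practice the resampling would be applied only to the training split (libraries like imbalanced-learn wrap this pattern), and the F1 score would be computed on held-out data.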
RohithM191
Amazon-Food-Reviews-Analysis-and-Modelling Using Various Machine Learning Models
Performed exploratory data analysis, data cleaning, data visualization, and text featurization (BoW, TF-IDF, Word2Vec). Built several ML models like KNN, Naive Bayes, Logistic Regression, SVM, Random Forest, GBDT, LSTM (RNNs), etc.
Objective: Given a text review, determine whether the sentiment of the review is positive or negative.
Data Source: https://www.kaggle.com/snap/amazon-fine-food-reviews
About the Dataset: The Amazon Fine Food Reviews dataset consists of reviews of fine foods from Amazon. Number of reviews: 568,454. Number of users: 256,059. Number of products: 74,258. Timespan: Oct 1999 - Oct 2012. Number of attributes/columns: 10.
Attribute Information: Id; ProductId - unique identifier for the product; UserId - unique identifier for the user; ProfileName; HelpfulnessNumerator - number of users who found the review helpful; HelpfulnessDenominator - number of users who indicated whether they found the review helpful or not; Score - rating between 1 and 5; Time - timestamp of the review; Summary - brief summary of the review; Text - text of the review.
1. Amazon Food Reviews EDA, NLP, Text Preprocessing and Visualization using t-SNE: Defined the problem statement. Performed exploratory data analysis (EDA) on the Amazon Fine Food Reviews dataset and plotted word clouds, distplots, histograms, etc. Performed data cleaning and preprocessing by removing unnecessary and duplicate rows; for the review text, removed HTML tags, punctuation, and stopwords, and stemmed the words using the Porter stemmer. Documented the concepts clearly. Plotted t-SNE plots for different featurizations of the data, viz. BoW (uni-gram), TF-IDF, Avg-Word2Vec, and TF-IDF-Word2Vec.
2. KNN: Applied K-Nearest Neighbours on different featurizations of the data, viz.
BoW (uni-gram), TF-IDF, Avg-Word2Vec, and TF-IDF-Word2Vec. Used both the brute-force and kd-tree implementations of KNN. Evaluated the test data on performance metrics like accuracy and plotted the confusion matrix using seaborn. Conclusions: KNN is a very slow algorithm and takes a very long time to train. The best accuracy, 89.38%, is achieved by the Avg-Word2Vec featurization. The kd-tree and brute-force variants of KNN give comparable results. Overall, KNN was not that good for this dataset.
3. Naive Bayes: Applied Naive Bayes using Bernoulli NB and Multinomial NB on different featurizations of the data, viz. BoW (uni-gram) and TF-IDF. Evaluated the test data on performance metrics like accuracy, F1-score, precision, recall, etc., and plotted the confusion matrix using seaborn. Printed the top 25 important features for both negative and positive reviews. Conclusions: Naive Bayes is a much faster algorithm than KNN. Bernoulli Naive Bayes performs much better than Multinomial Naive Bayes. The best F1-score, 0.9342, is achieved by the BoW featurization.
4. Logistic Regression: Applied Logistic Regression on different featurizations of the data, viz. BoW (uni-gram), TF-IDF, Avg-Word2Vec, and TF-IDF-Word2Vec. Used both grid search and randomized search cross-validation. Evaluated the test data on performance metrics like accuracy, F1-score, precision, recall, etc., and plotted the confusion matrix using seaborn. Showed how sparsity increases as we increase lambda (decrease C) when the L1 regularizer is used, for each featurization. Did a perturbation test to check whether the features are multi-collinear. Conclusions: Sparsity increases as we decrease C (increase lambda) with the L1 regularizer. The TF-IDF featurization performs best, with an F1-score of 0.967 and an accuracy of 91.39%. Features are multi-collinear across featurizations. Logistic Regression is a fast algorithm.
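The sparsity observation in the Logistic Regression section above could be reproduced on a toy corpus by counting non-zero weights as C shrinks under L1 regularization. The four texts and labels here are hypothetical, not drawn from the dataset:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical mini-corpus standing in for the review data.
texts = ["good taste great snack", "bad stale awful taste",
         "great quality good price", "awful smell bad quality"]
y = [1, 0, 1, 0]
X = TfidfVectorizer().fit_transform(texts)

# L1-regularized logistic regression: count surviving (non-zero) weights
# as C decreases, i.e. as lambda = 1/C increases.
nonzeros = {}
for C in (10.0, 1.0, 0.01):
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=C).fit(X, y)
    nonzeros[C] = int(np.count_nonzero(clf.coef_))
print(nonzeros)  # weight counts shrink toward zero as C decreases
```

The liblinear solver is used because it supports the L1 penalty on sparse input.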
5. SVM: Applied SVM with the RBF (radial basis function) kernel on different featurizations of the data, viz. BoW (uni-gram), TF-IDF, Avg-Word2Vec, and TF-IDF-Word2Vec. Used both grid search and randomized search cross-validation. Evaluated the test data on performance metrics like accuracy, F1-score, precision, recall, etc., and plotted the confusion matrix using seaborn. Evaluated SGDClassifier on the best-performing featurization. Conclusions: BoW featurization with a linear kernel and grid search gave the best results, with an F1-score of 0.9201. SGDClassifier takes much less time to train.
6. Decision Trees: Applied decision trees on different featurizations of the data, viz. BoW (uni-gram), TF-IDF, Avg-Word2Vec, and TF-IDF-Word2Vec. Used grid search with 30 random points to find the best max_depth. Evaluated the test data on performance metrics like accuracy, F1-score, precision, recall, etc., and plotted the confusion matrix using seaborn. Plotted the feature importances received from the decision tree classifier. Conclusions: BoW featurization (max_depth=8) gave the best results, with an accuracy of 85.8% and an F1-score of 0.858. Decision trees on BoW and TF-IDF would have taken forever with all dimensions, as the data is very high-dimensional, hence max_depth was capped at 8.
7. Ensembles (RF & GBDT): Applied Random Forest and GBDT on different featurizations of the data, viz. BoW (uni-gram), TF-IDF, Avg-Word2Vec, and TF-IDF-Word2Vec. Used random search with 30 points to find the best max_depth, learning rate, and n_estimators. Evaluated the test data on performance metrics like accuracy, F1-score, precision, recall, etc., and plotted the confusion matrix using seaborn. Plotted a word cloud of the feature importances received from the RF and GBDT classifiers. Conclusions: The TF-IDF featurization with Random Forest (base learners = 10) and random search gave the best results, with an F1-score of 0.857. The TF-IDF featurization with GBDT (base learners = 275, depth = 10) gave the best results, with an F1-score of 0.8708.
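The hyperparameter tuning pattern used throughout the sections above (random search over tree-ensemble parameters, scored with F1) could be sketched like this. The synthetic data and the small parameter grid are placeholders, not the project's actual search space:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic binary-classification data standing in for the featurized reviews.
X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=5, random_state=0)

# Random search over n_estimators and max_depth, scored with F1.
param_dist = {"n_estimators": [10, 50, 100], "max_depth": [4, 8, None]}
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_dist,
    n_iter=5, scoring="f1", cv=3, random_state=0,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Swapping in GradientBoostingClassifier (or XGBoost's XGBClassifier) and adding a learning-rate axis to param_dist follows the same shape.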
ishikawa08
Multivariate Time Series Forecasting with LSTM in TensorFlow 2.x
Guillem96
Multivariate LSTM Fully Convolutional Networks for Time Series Classification
Project Task: Week 1 Class Imbalance Problem: 1. Perform an EDA on the dataset. a) See what a positive, negative, and neutral review looks like. b) Check the count for each class; it is a class imbalance problem. 2. Convert the reviews into TF-IDF scores. 3. Run a multinomial Naive Bayes classifier. Everything will be classified as positive because of the class imbalance.
Project Task: Week 2 Tackling the Class Imbalance Problem: 1. Oversampling or undersampling can be used to tackle the class imbalance problem. 2. In case of class imbalance, use the following metrics for evaluating model performance: precision, recall, F1-score, and the AUC-ROC curve. Use F1-score as the evaluation criterion for this project. 3. Use tree-based classifiers like Random Forest and XGBoost. Note: tree-based classifiers work on two ideologies, namely bagging or boosting, and have fine-tuning parameters that take care of the imbalanced classes.
Project Task: Week 3 Model Selection: 1. Apply multi-class SVMs and neural nets. 2. Use possible ensemble techniques like XGBoost + oversampled_multinomial_NB. 3. Assign a score to the sentence sentiment (engineer a feature called sentiment score). Use this engineered feature in the model and check for improvements. Draw insights on the same.
Project Task: Week 4 Applying LSTM: 1. Use LSTM for the previous problem (tune LSTM parameters like top words, embedding length, dropout, epochs, number of layers, etc.). Hint: another variation of LSTM, the GRU (Gated Recurrent Unit), can be tried as well. 2. Compare the accuracy of neural nets with traditional ML-based algorithms. 3. Find the best settings of the LSTM and GRU networks that can best classify the reviews as positive, negative, and neutral. Hint: use techniques like grid search, cross-validation, and random search.
Optional Tasks: Week 4 Topic Modeling: 1. Cluster similar reviews. Note: some reviews may talk about the device as a gift option; other reviews may be about product looks, and some may highlight its battery and performance. Try naming the clusters. 2. Perform topic modeling. Hint: use scikit-learn's Latent Dirichlet Allocation (LDA) and Non-Negative Matrix Factorization (NMF). Download the datasets from here.
aszhaoweiguo
This Python file is an example of a multi-layer RNN with LSTM cells using TensorFlow.
A hybrid neural architecture for Language Identification (LID) covering 235 languages. Combines BiLSTM-Attention for sequential context with Char-level TF-IDF for statistical feature extraction. Achieves 93.75% accuracy. Built for scalability and precision in multilingual document processing.
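The character-level TF-IDF half of the hybrid described above could look like the sketch below; `char_wb` n-grams are a common choice for language identification because they capture orthographic cues. The sample sentences are placeholders, and the BiLSTM-Attention half is omitted:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical snippets in three languages; the real system covers 235.
samples = ["the quick brown fox",
           "der schnelle braune fuchs",
           "le rapide renard brun"]

# Character n-grams (within word boundaries) as statistical LID features.
vec = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
X = vec.fit_transform(samples)
print(X.shape)  # one sparse row of char-n-gram weights per document
```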
Mr-0racle
Developed a multi-class classification model using NLP techniques (TF-IDF, LSTM) to predict Myers-Briggs (MBTI) personality types from unstructured text data, achieving 85% accuracy on a dataset of 8,675 samples.
Multi-class text classification in A Comparison of Word Representations and ML/NN Models: TF-IDF vs Word2Vec embeddings with ML and deep learning models (Random Forest, LSTM, GRU, BiLSTM)
wiqilee
A complete multi-aspect ABSA (Aspect-Based Sentiment Analysis) system combining PLSA topic modeling, TF-ICF aspect expansion, AC3 semantic similarity mapping, and GloVe-LSTM sentiment classification. Includes a full Streamlit application for analysis, training, and multi-aspect inference.
Fahmidaa-Afrin
NLP course project comparing traditional ML models and advanced RNN-based architectures (SimpleRNN, GRU, LSTM, and their bidirectional variants) for multi-class text classification using BoW, TF-IDF, GloVe, and Skip-gram embeddings.
aadishrath
A full-stack, modular NLP application that showcases foundational and advanced sentiment analysis techniques. Built with NestJS, Prisma, and React, it supports multi-model inference (TF-IDF+SVM, LSTM, DistilBERT), emoji-based feedback, and analytics logging.
Nhatnguyn1710
Vietnamese text classification project using machine learning and deep learning models. The system applies text preprocessing, TF-IDF and fastText embeddings, dimensionality reduction, and multiple classifiers (SVM, Logistic Regression, XGBoost, DNN, LSTM) to evaluate performance on multi-class Vietnamese text datasets.
krittikasardar
Sentiment analysis on the Amazon Fine Food Reviews dataset using traditional ML, LSTM, and transformer-based models. This project compares TF-IDF and Word2Vec vectorizations with various class balancing techniques to evaluate model performance on multi-class sentiment classification tasks.
– Compared word representations (BoW, TF-IDF, GloVe, Skip-gram) with ML models (Logistic Regression, Naive Bayes, Random Forest) and NN models (DNN, RNN, GRU, Bidirectional LSTM) for multi-class text classification. – Used 340,000 question-answer pairs across ten categories.
mdadnanparvez
A comparative study of machine learning and neural network models for multi-class text classification using BoW, TF-IDF, GloVe, and Skip-gram embeddings, evaluating algorithms such as Logistic Regression, Naive Bayes, Random Forest, DNN, RNN, GRU, LSTM, and their bidirectional variants.
nowshinreza
A comparative study of machine learning and neural network models for multi-class text classification using BoW, TF-IDF, GloVe, and Skip-gram embeddings, evaluating algorithms such as Logistic Regression, Naive Bayes, Random Forest, DNN, RNN, GRU, LSTM, and their bidirectional variants.
rajeshwarichandratre
A machine learning multi-label classification model to identify various types of toxic comments posted on social networking sites. Used and compared different machine learning algorithms such as SVM, KNN, XGBoost, and LSTM, with NLP features from TF-IDF and GloVe. Achieved an average accuracy of 90%.
ameeraahmed04
Performed multi-class classification on mental-health text. Cleaned and preprocessed the data by removing punctuation marks and unnecessary noise. For Logistic Regression, used TF-IDF with n-grams; for the LSTM model, applied word embeddings to capture sequential context. Compared both models using F1-score to better understand the mental-health data.
All 19 repositories loaded