Found 46 repositories (showing 30)
EmmanuelOchieng01
A machine learning credit scoring system for Kenyan SMEs using explainable AI. Features transaction data simulation, risk prediction models, and SHAP interpretability, and is ready for API deployment. Built for financial inclusion in emerging markets.
An end-to-end MLOps pipeline to predict loan defaults and credit risk using XGBoost, deployed on AWS (SageMaker, Lambda, Elastic Beanstalk). Automates data ingestion, preprocessing, model training, and real-time predictions with a scalable API.
mohdareeb0x-commits
Machine learning–powered credit risk prediction API with FastAPI and REST endpoints. Supports single and batch applicant analysis, risk scoring, and default probability estimation. Fully Dockerized for easy deployment and integration into financial systems.
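APIs like this typically wrap one scoring function that accepts either a single applicant or a batch. A minimal sketch of that core logic, using a synthetic scikit-learn model and made-up feature names in place of the repository's actual model (all names here are assumptions, not taken from the repo):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny illustrative model; a real service would load a trained artifact instead.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))      # e.g. income, debt ratio, age (assumed features)
y = (X[:, 1] > 0).astype(int)      # synthetic default labels
model = LogisticRegression(max_iter=1000).fit(X, y)

def score_applicants(rows):
    """Return default probabilities for one applicant or a batch of applicants."""
    arr = np.atleast_2d(rows)              # promote a single row to a batch of one
    return model.predict_proba(arr)[:, 1]  # probability of the "default" class

single = score_applicants([0.5, 1.2, -0.3])                       # one applicant
batch = score_applicants([[0.5, 1.2, -0.3], [0.0, -2.0, 1.0]])    # two applicants
print(len(single), len(batch))  # 1 2
```

A web framework such as FastAPI would then expose `score_applicants` behind `/predict` and `/predict/batch` endpoints.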
steellsas
End-to-end credit default prediction: EDA → Feature Engineering (580 features) → LightGBM + XGBoost Ensemble (ROC AUC 0.785) → FastAPI → Docker → GCP deployment. Live demo available.
krish-mirpuri
This project builds an end-to-end credit risk prediction system that estimates the probability of loan default using machine learning and deploys the model via an API for real-time decision-making.
Kashan-Baig
Credit Risk Classification – Built a Random Forest model to predict loan risk, deployed via a Flask API with a web form UI. Laptop Price Prediction – Developed a regression model with data preprocessing and feature engineering; deployed via Streamlit. California Housing Prices – Trained regression models (Ridge, RF) with log scaling and model evaluation.
savi09
# LendingClub-ML

## Supervised Machine Learning Homework - Predicting Credit Risk

In this assignment, you will be building a machine learning model that attempts to predict whether a loan from LendingClub will become high risk or not.

## Background

LendingClub is a peer-to-peer lending services company that allows individual investors to partially fund personal loans as well as buy and sell notes backing the loans on a secondary market. LendingClub offers their previous data through an API. You will be using this data to create machine learning models to classify the risk level of given loans. Specifically, you will be comparing the Logistic Regression model and the Random Forest Classifier.

## Instructions

## Retrieve the data

In the Generator folder in Resources, there is a GenerateData.ipynb notebook that will download data from LendingClub and output two CSVs:

#### 2019loans.csv
#### 2020Q1loans.csv

You will be using an entire year's worth of data (2019) to predict the credit risk of loans from the first quarter of the next year (2020).

Note: these two CSVs have been undersampled to give an even number of high risk and low risk loans. In the original dataset, only 2.2% of loans are categorized as high risk. To get a truly accurate model, special techniques need to be used on imbalanced data. Undersampling is one of those techniques; oversampling and SMOTE (Synthetic Minority Over-sampling Technique) are other commonly used techniques.

## Preprocessing: Convert categorical data to numeric

Create a training set from the 2019 loans using pd.get_dummies() to convert the categorical data to numeric columns. Similarly, create a testing set from the 2020 loans, also using pd.get_dummies().

Note! There are categories in the 2019 loans that do not exist in the testing set. If you fit a model to the training set and try to score it on the testing set as is, you will get an error. You need to use code to fill in the missing categories in the testing set.

## Consider the models

You will be creating and comparing two models on this data: a logistic regression and a random forest classifier. Before you create, fit, and score the models, make a prediction as to which model you think will perform better. You do not need to be correct! Write down (in markdown cells in your Jupyter Notebook or in a separate document) your prediction, and provide justification for your educated guess.

## Fit a LogisticRegression model and RandomForestClassifier model

Create a LogisticRegression model, fit it to the data, and print the model's score. Do the same for a RandomForestClassifier. You may choose any starting hyperparameters you like. Which model performed better? How does that compare to your prediction? Write down your results and thoughts.

## Revisit the Preprocessing: Scale the data

The data going into these models was never scaled; scaling is an important preprocessing step. Use StandardScaler to scale the training and testing sets. Before re-fitting the LogisticRegression and RandomForestClassifier models on the scaled data, make another prediction about how you think scaling will affect the accuracy of the models. Write your predictions down and provide justification.

Fit and score the LogisticRegression and RandomForestClassifier models on the scaled data. How do the model scores compare to each other, and to the previous results on unscaled data? How does this compare to your prediction? Write down your results and thoughts.
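The missing-category fix this assignment asks for can be sketched with `pd.get_dummies()` plus a column reindex. The DataFrames and column names below are toy stand-ins, not the real `2019loans.csv`/`2020Q1loans.csv` contents:

```python
import pandas as pd

# Toy stand-ins for the 2019 training data and 2020 Q1 testing data
# (column names are assumed; the real CSVs come from GenerateData.ipynb).
train = pd.DataFrame({"home_ownership": ["RENT", "OWN", "MORTGAGE"],
                      "loan_amnt": [5000, 12000, 8000]})
test = pd.DataFrame({"home_ownership": ["RENT", "OWN"],   # "MORTGAGE" never appears
                     "loan_amnt": [7000, 9000]})

X_train = pd.get_dummies(train)
X_test = pd.get_dummies(test)

# Add any dummy columns present in training but absent from testing,
# and align the column order, so a model fit on X_train can score X_test.
X_test = X_test.reindex(columns=X_train.columns, fill_value=0)

print(list(X_test.columns) == list(X_train.columns))  # True
```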
jlira5418
In this assignment, you will be building a machine learning model that attempts to predict whether a loan from LendingClub will become high risk or not.

## Background

LendingClub is a peer-to-peer lending services company that allows individual investors to partially fund personal loans as well as buy and sell notes backing the loans on a secondary market. LendingClub offers their previous data through an API. You will be using this data to create machine learning models to classify the risk level of given loans. Specifically, you will be comparing the Logistic Regression model and the Random Forest Classifier.

## Instructions

### Retrieve the data

In the `Generator` folder in `Resources`, there is a [GenerateData.ipynb](/Resources/Generator/GenerateData.ipynb) notebook that will download data from LendingClub and output two CSVs:

* `2019loans.csv`
* `2020Q1loans.csv`

You will be using an entire year's worth of data (2019) to predict the credit risk of loans from the first quarter of the next year (2020).

Note: these two CSVs have been undersampled to give an even number of high risk and low risk loans. In the original dataset, only 2.2% of loans are categorized as high risk. To get a truly accurate model, special techniques need to be used on imbalanced data. Undersampling is one of those techniques; oversampling and SMOTE (Synthetic Minority Over-sampling Technique) are other commonly used techniques.

## Preprocessing: Convert categorical data to numeric

Create a training set from the 2019 loans using `pd.get_dummies()` to convert the categorical data to numeric columns. Similarly, create a testing set from the 2020 loans, also using `pd.get_dummies()`.

Note! There are categories in the 2019 loans that do not exist in the testing set. If you fit a model to the training set and try to score it on the testing set as is, you will get an error. You need to use code to fill in the missing categories in the testing set.

## Consider the models

You will be creating and comparing two models on this data: a logistic regression and a random forest classifier. Before you create, fit, and score the models, make a prediction as to which model you think will perform better. You do not need to be correct! Write down (in markdown cells in your Jupyter Notebook or in a separate document) your prediction, and provide justification for your educated guess.

## Fit a LogisticRegression model and RandomForestClassifier model

Create a LogisticRegression model, fit it to the data, and print the model's score. Do the same for a RandomForestClassifier. You may choose any starting hyperparameters you like. Which model performed better? How does that compare to your prediction? Write down your results and thoughts.

## Revisit the Preprocessing: Scale the data

The data going into these models was never scaled; scaling is an important preprocessing step. Use `StandardScaler` to scale the training and testing sets. Before re-fitting the LogisticRegression and RandomForestClassifier models on the scaled data, make another prediction about how you think scaling will affect the accuracy of the models. Write your predictions down and provide justification.

Fit and score the LogisticRegression and RandomForestClassifier models on the scaled data. How do the model scores compare to each other, and to the previous results on unscaled data? How does this compare to your prediction? Write down your results and thoughts.
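The scale-then-refit step described in this assignment can be sketched as follows, with `make_classification` standing in for the undersampled loan data (an assumption; the real inputs are the two CSVs above):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic, balanced stand-in for the undersampled loan data.
X, y = make_classification(n_samples=500, n_features=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Fit the scaler on the training set only, then transform both sets,
# so no information from the test set leaks into preprocessing.
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

scores = {}
for model in (LogisticRegression(max_iter=1000),
              RandomForestClassifier(random_state=1)):
    model.fit(X_train_s, y_train)
    scores[type(model).__name__] = model.score(X_test_s, y_test)
print(scores)
```

Typically scaling changes the logistic regression score noticeably while leaving the tree-based random forest mostly unaffected, since trees split on thresholds rather than distances.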
robinpats182
No description available
raghav-kh
No description available
SVChaithanya
No description available
HIIAYUSHI
Machine Learning API for predicting credit risk using Scikit-learn
No description available
No description available
yuqi-luo-nus
Credit risk prediction model with machine learning and API deployment
MatheusNRusso
Spring Boot API + FastAPI ML service for credit risk prediction
No description available
danielsnhr
Credit Risk Prediction API using CatBoost and feature engineering for loan default assessment.
LucasCunha00
Machine learning project for credit risk prediction using real financial data, with a FastAPI-based prediction API.
Saurav-VK
Production-ready credit risk prediction API using FastAPI, Scikit-learn, and Docker for real-time loan risk classification.
roshku239
Credit-risk data pipeline and prediction API built with scalable, fintech-grade engineering practices
KenzieJunaidi
A testing project for a credit risk prediction API built with FastAPI and Docker.
divya-vj
XGBoost credit risk model served as a REST API with FastAPI; returns the prediction plus a SHAP explanation as JSON
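Returning a prediction together with per-feature attributions as JSON can be sketched without XGBoost or the shap library: for a linear model, per-feature contributions can be computed directly from the coefficients, a simplified stand-in for SHAP values. All data and feature names below are made up for illustration:

```python
import json
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic training data; feature names are illustrative, not from the repo.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = (X[:, 0] - X[:, 2] > 0).astype(int)
model = LogisticRegression(max_iter=1000).fit(X, y)
features = ["income", "utilization", "late_payments"]

def explain(row):
    """Risk score plus per-feature contributions (coefficient * value),
    a crude linear stand-in for SHAP values, serialized as JSON."""
    contribs = model.coef_[0] * np.asarray(row)
    return json.dumps({
        "default_probability": float(model.predict_proba([row])[0, 1]),
        "contributions": dict(zip(features, map(float, contribs))),
    })

report = explain([1.0, -0.5, 0.2])
print(report)
```

A real deployment would replace `contribs` with values from a SHAP explainer over the trained XGBoost model.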
zero-hacker
End-to-end machine learning pipeline to predict credit default risk using XGBoost & Flask API for real-time predictions
srushtishingri
Developed a credit risk prediction system in Java to estimate loan default probability. Deployed the prediction engine as REST APIs using Spring Boot for real-time financial decision support.
sevenkushal
Developed a Logistic Regression-based credit risk model achieving 80% prediction accuracy for customer loan default probability. Incorporated the OpenAI API to generate concise risk reports based on individual outcomes.
elayemu
A comprehensive credit scoring model for Bati Bank to predict credit risk and serve real-time predictions via a REST API. The project includes data analysis, machine learning pipeline, and model serving.
ganeshchitlapally
Stacked ensemble of LightGBM/XGBoost/CatBoost models with Optuna tuning and SHAP interpretability, providing churn/credit risk predictions via a FastAPI scoring API and Dockerized deployment.
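A stacked ensemble with a meta-learner can be sketched with scikit-learn's `StackingClassifier`; here scikit-learn boosters and synthetic data stand in for the repository's LightGBM/XGBoost/CatBoost base models and real features (an assumption, and Optuna tuning is omitted):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for churn/credit-risk training data.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)

stack = StackingClassifier(
    estimators=[("gb", GradientBoostingClassifier(random_state=0)),
                ("rf", RandomForestClassifier(random_state=0))],
    final_estimator=LogisticRegression(),  # meta-learner over base predictions
    cv=3,  # out-of-fold predictions feed the meta-learner, limiting leakage
)
stack.fit(X, y)
acc = stack.score(X, y)
print(round(acc, 3))
```

The real stack would swap in the tuned gradient-boosting libraries as `estimators` and serve `stack.predict_proba` behind the FastAPI scoring endpoint.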
austinLorenzMccoy
This project is a state-of-the-art Machine Learning API designed to assist financial institutions with credit risk assessment. It leverages advanced neural networks for:
* Credit Card Default Prediction: assess the probability of a customer defaulting on their credit card payments.
* Credit Limit Recommendation: estimate an optimal credit limit based
yadavanujkumar
A comprehensive data engineering pipeline for real-time loan default risk prediction and credit analysis. This system ingests live loan application and transaction data from fintech APIs, processes it using streaming and batch ETL, and delivers model-ready data for default risk prediction.