Found 18 repositories (showing 18)
This project explores ML techniques across classification and regression. It includes penguin species classification, breast cancer prediction, and baseball performance prediction using regularization. Next, I will develop an XGBoost model for hotel cancellation prediction, analyzing key booking factors and optimizing performance. (In Progress)
dark-data
Over the past few decades, ML techniques have been widely used in intelligent healthcare systems, especially for breast cancer (BC) diagnosis and prognosis. Traditionally, diagnostic accuracy depends on the physician's experience; however, this expertise is built up over many years of observing different patients' symptoms and confirmed diagnoses. ML techniques can take over some of this complex manual work from physicians. Recently, ML techniques have come to play a significant role in BC diagnosis by applying classification techniques to identify people with BC, to distinguish benign from malignant tumours, and to predict whether a patient is affected. We focus on neural network (NN), support vector machine (SVM), and k-nearest neighbor (k-NN) techniques in BC diagnosis.
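As a rough illustration of the k-nearest-neighbour idea named above, the sketch below classifies a sample by majority vote among its k closest training points. The function name `knn_predict`, the two-feature toy points, and the labels are assumptions made for illustration; they do not come from the repository.

```python
from collections import Counter
from math import dist


def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # Pair every training point's Euclidean distance to the query with its label.
    distances = sorted(
        (dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Vote among the k closest neighbours; an odd k avoids ties with two classes.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two features per sample, not real measurements.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (3.0, 3.2), (3.1, 2.9), (2.8, 3.0)]
labels = ["benign", "benign", "benign", "malignant", "malignant", "malignant"]

print(knn_predict(points, labels, (1.1, 1.0)))
print(knn_predict(points, labels, (3.0, 3.0)))
```

In practice the features would usually be scaled first, since Euclidean distance is dominated by whichever feature has the largest range.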
shraddhaghadage
Breast cancer is one of the most common cancers among women worldwide, accounting for a large share of new cancer cases and cancer-related deaths according to global statistics, which makes it a significant public health problem in today's society. Early diagnosis can significantly improve the prognosis and chance of survival, as it enables timely clinical treatment. Furthermore, accurate classification of benign tumors can spare patients unnecessary treatments. Thus, the correct diagnosis of breast cancer and the classification of patients into malignant or benign groups is the subject of much research. Because of its advantages in detecting critical features in complex breast cancer datasets, machine learning (ML) is widely recognized as the methodology of choice for breast cancer pattern classification and forecast modelling. Classification and data mining methods are an effective way to classify data, especially in the medical field, where they are widely used in diagnosis and analysis to support decisions. Because we are categorizing whether the tissue is cancerous or benign, we will train multiple tree-based models for this task. We'll experiment with hyper-parameters to see if we can improve the accuracy. Try to solve the problem using the approach outlined below. For further information on each feature, consult the data dictionary. Decision trees (DTs) form the basis of ensemble algorithms in machine learning. These are powerful algorithms that can fit complex data. In this project, our focus is on understanding the core concepts of the Decision Tree for healthcare analysis, followed by understanding the different ensemble techniques.
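To make the tree-based approach a little more concrete, the sketch below computes Gini impurity, one common criterion a decision tree can use to score a candidate split. The description does not say which criterion this project uses, so the criterion, the helper names, and the toy labels are all assumptions.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def split_impurity(left, right):
    """Impurity of a split: child impurities weighted by child size."""
    total = len(left) + len(right)
    return (len(left) / total) * gini(left) + (len(right) / total) * gini(right)


# Hypothetical labels: "B" for benign, "M" for malignant.
parent = ["B", "B", "B", "M", "M", "M"]
left, right = ["B", "B", "B", "M"], ["M", "M"]

print(gini(parent))                 # impurity before the split
print(split_impurity(left, right))  # a lower value means purer children
```

A tree learner would evaluate many candidate thresholds this way and keep the split with the lowest weighted impurity.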
Sudip-Pandit
Description of the Project:
+ The "Breast Cancer Dataset" is used in this project. It has df.shape=(569, 31), which means 569 rows and 31 columns.
+ The dataset used in this project is available at https://www.kaggle.com/uciml/breast-cancer-wisconsin-data
+ I import the main Python packages needed to complete the project: sklearn, pandas, numpy, seaborn and matplotlib.
+ The machine learning models used are Logistic Regression, Decision Tree, Random Forest, XGBoost, AdaBoost and Gradient Boosting classifiers.
+ The performance of the machine learning models is tested on the basis of accuracy score, confusion matrix, classification report, F1 score and ROC AUC score.
+ I tuned hyperparameters to improve the performance of the XGBoost model.
+ Good visualization is also important alongside the accuracy score in model building, so the performance of the models is visualized in this project.
Problem statement: XGBoost stands for eXtreme Gradient Boosting, a model known for winning several Kaggle competitions. Much of the machine learning literature found online describes this model as accurate, efficient and practical. It is a decision-tree-based ensemble ML algorithm built on the gradient boosting framework. XGBoost also provides a convenient way to run cross-validation, which is applied to check for overfitting during the training phase. If a model gives good accuracy on the training dataset but performs very poorly on unseen test data, it is overfitting, that is, a model with low bias and high variance. I will calculate the model's training and testing errors with different learning rates. The learning rate takes a value between 0 and 1, and I will start the test with a learning rate of 0.01. The results are easier to read through good visualization, so I will also plot the training and testing errors and accuracies on a graph. Finally, I will tune the hyperparameters, which helps us predict the test dataset, i.e. x_test.
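The evaluation metrics listed above can be sketched in plain Python to show how they relate to the confusion matrix. This is an illustrative sketch only: the function name `binary_metrics` and the toy labels are assumptions, and the project itself presumably relies on library implementations rather than code like this.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix counts plus accuracy, precision, recall and F1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels: 1 = malignant, 0 = benign.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

for name, value in binary_metrics(y_true, y_pred).items():
    print(name, value)
```

Comparing these numbers on the training split and on the held-out split is one simple way to spot the overfitting pattern described above: high training scores paired with clearly lower test scores.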
Vishesh29
Presenting an early breast cancer prognosis by using a classification approach with different ML techniques on the Wisconsin Breast Cancer dataset.
AmalMirza
Classification is an important data mining technique with a wide range of applications, used to classify the many types of data found in almost all areas of our lives. The purpose of this exploratory study is to estimate the likelihood of breast cancer from anthropometric data and parameters collected in routine blood analysis. The study was performed using data from patients who were admitted to the clinic with suspected breast cancer. The values of Age (years), BMI (kg/m2), Glucose (mg/dL), Insulin (µU/mL), HOMA, Leptin (ng/mL), Adiponectin (µg/mL), Resistin (ng/mL), and MCP-1 (pg/dL) were used. In our study, classification algorithms were applied to the data to predict the disease diagnosis.
rajat1911996sharma
## The Data

### Breast cancer wisconsin (diagnostic) dataset

**Data Set Characteristics:**

:Number of Instances: 569

:Number of Attributes: 30 numeric, predictive attributes and the class

:Attribute Information:

- radius (mean of distances from center to points on the perimeter)
- texture (standard deviation of gray-scale values)
- perimeter
- area
- smoothness (local variation in radius lengths)
- compactness (perimeter^2 / area - 1.0)
- concavity (severity of concave portions of the contour)
- concave points (number of concave portions of the contour)
- symmetry
- fractal dimension ("coastline approximation" - 1)

The mean, standard error, and "worst" or largest (mean of the three largest values) of these features were computed for each image, resulting in 30 features. For instance, field 3 is Mean Radius, field 13 is Radius SE, field 23 is Worst Radius.

- class:
  - WDBC-Malignant
  - WDBC-Benign

:Summary Statistics:

===================================== ====== ======
                                      Min    Max
===================================== ====== ======
radius (mean):                        6.981  28.11
texture (mean):                       9.71   39.28
perimeter (mean):                     43.79  188.5
area (mean):                          143.5  2501.0
smoothness (mean):                    0.053  0.163
compactness (mean):                   0.019  0.345
concavity (mean):                     0.0    0.427
concave points (mean):                0.0    0.201
symmetry (mean):                      0.106  0.304
fractal dimension (mean):             0.05   0.097
radius (standard error):              0.112  2.873
texture (standard error):             0.36   4.885
perimeter (standard error):           0.757  21.98
area (standard error):                6.802  542.2
smoothness (standard error):          0.002  0.031
compactness (standard error):         0.002  0.135
concavity (standard error):           0.0    0.396
concave points (standard error):      0.0    0.053
symmetry (standard error):            0.008  0.079
fractal dimension (standard error):   0.001  0.03
radius (worst):                       7.93   36.04
texture (worst):                      12.02  49.54
perimeter (worst):                    50.41  251.2
area (worst):                         185.2  4254.0
smoothness (worst):                   0.071  0.223
compactness (worst):                  0.027  1.058
concavity (worst):                    0.0    1.252
concave points (worst):               0.0    0.291
symmetry (worst):                     0.156  0.664
fractal dimension (worst):            0.055  0.208
===================================== ====== ======

:Missing Attribute Values: None

:Class Distribution: 212 - Malignant, 357 - Benign

:Creator: Dr. William H. Wolberg, W. Nick Street, Olvi L. Mangasarian

:Donor: Nick Street

:Date: November, 1995

This is a copy of UCI ML Breast Cancer Wisconsin (Diagnostic) datasets. https://goo.gl/U2Uwz2

Features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image.

Separating plane described above was obtained using Multisurface Method-Tree (MSM-T) [K. P. Bennett, "Decision Tree Construction Via Linear Programming." Proceedings of the 4th Midwest Artificial Intelligence and Cognitive Science Society, pp. 97-101, 1992], a classification method which uses linear programming to construct a decision tree. Relevant features were selected using an exhaustive search in the space of 1-4 features and 1-3 separating planes.

The actual linear program used to obtain the separating plane in the 3-dimensional space is that described in: [K. P. Bennett and O. L. Mangasarian: "Robust Linear Programming Discrimination of Two Linearly Inseparable Sets", Optimization Methods and Software 1, 1992, 23-34].

This database is also available through the UW CS ftp server:

ftp ftp.cs.wisc.edu
cd math-prog/cpo-dataset/machine-learn/WDBC/

.. topic:: References

- W.N. Street, W.H. Wolberg and O.L. Mangasarian. Nuclear feature extraction for breast tumor diagnosis. IS&T/SPIE 1993 International Symposium on Electronic Imaging: Science and Technology, volume 1905, pages 861-870, San Jose, CA, 1993.
- O.L. Mangasarian, W.N. Street and W.H. Wolberg. Breast cancer diagnosis and prognosis via linear programming. Operations Research, 43(4), pages 570-577, July-August 1995.
- W.H. Wolberg, W.N. Street, and O.L. Mangasarian. Machine learning techniques to diagnose breast cancer from fine-needle aspirates. Cancer Letters 77 (1994) 163-171.
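The class distribution stated in the description above (212 malignant, 357 benign) gives a useful reference point: the accuracy of a trivial model that always predicts the majority class. The sketch below computes it; the function name `majority_baseline` is an assumption introduced for illustration.

```python
# Class counts as stated in the dataset description above.
class_counts = {"malignant": 212, "benign": 357}


def majority_baseline(counts):
    """Return the most frequent class and the accuracy of always predicting it."""
    total = sum(counts.values())
    label = max(counts, key=counts.get)
    return label, counts[label] / total


label, accuracy = majority_baseline(class_counts)
print(f"always predicting '{label}' gives accuracy {accuracy:.3f}")
```

Any classifier trained on this data should be judged against that baseline rather than against 50%, because the two classes are not evenly balanced.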
Rakasimanaswini
Developed a binary classification model using the Breast Cancer Wisconsin dataset to distinguish malignant and benign tumors. Built and evaluated models using Logistic Regression, Support Vector Machines (SVM), and Artificial Neural Networks (ANN) with Keras and Scikit-learn.
satheeshMulinti
Implemented machine learning models to classify breast cancer as benign or malignant using the Wisconsin dataset. Techniques include data preprocessing, feature selection, model training, and evaluation.
No description available
Neelmani13
An efficient way to detect and classify breast cancer using ML techniques
shahadeshubhu
Machine Learning coursework for COMP4139 at the University of Nottingham. Includes labs, breast cancer prediction (classification & regression), concrete strength prediction using ML models, feature selection, and evaluation techniques.
SrilekhaM-Github
Breast cancer is considered one of the most common cancers in women and is associated with various clinical, lifestyle, social, and economic factors. Machine learning (ML) algorithms help a lot here: we used various ML classification techniques to perform diagnosis from data collected in the medical field.
Aabir-coder
Built a breast cancer detection model using machine learning in Python to classify tumors as benign or malignant. Focused on data preprocessing, feature analysis, and model evaluation. Demonstrates practical use of ML in healthcare and strengthens understanding of classification techniques.
Developing a robust deep learning model for breast cancer classification, leveraging ML techniques to differentiate malignant and benign tumors from tissue data. Using CNNs and diverse datasets, we aim to enhance medical diagnostics, aiding informed healthcare decisions and improving patient outcomes
Sanjibmanna76
Objective: Predict the possibility of diseases based on patient data.
Approach: Apply classification techniques to structured medical datasets.
Key Features:
● Use features like symptoms, age, blood test results
● Algorithms: SVM, Logistic Regression, Random Forest, XGBoost
● Datasets: Heart disease, Diabetes, Breast Cancer (UCI ML Repository)
bhavani-nagarajan
This dataset describes cells taken from people suspected of having breast cancer, using a technique called fine needle aspiration. Using the cell information in the dataset, we try to determine whether the cells are benign or malignant. For this binary classification we use an ML model: Logistic Regression.
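A minimal sketch of the logistic regression idea for a binary benign/malignant decision is shown below, using a single made-up feature and plain batch gradient descent. The data, the function names, the learning rate, and the epoch count are all assumptions for illustration; the repository itself most likely uses a library implementation.

```python
from math import exp
from statistics import mean, pstdev


def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, epochs=500):
    """Fit a weight and bias for one feature by batch gradient descent."""
    weight, bias = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error per sample drives the log-loss gradient.
        errors = [sigmoid(weight * x + bias) - y for x, y in zip(xs, ys)]
        weight -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
        bias -= learning_rate * sum(errors) / n
    return weight, bias


# Hypothetical single feature (for example a size measurement); 1 = malignant.
raw = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
labels = [0, 0, 0, 1, 1, 1]

# Standardise the feature so gradient descent behaves well.
mu, sigma = mean(raw), pstdev(raw)
scaled = [(x - mu) / sigma for x in raw]

weight, bias = fit_logistic(scaled, labels)
# Classify as malignant when the predicted probability reaches 0.5.
predictions = [1 if sigmoid(weight * x + bias) >= 0.5 else 0 for x in scaled]
print(predictions)
```

The 0.5 threshold is a default choice; in a diagnostic setting it could be lowered to trade more false positives for fewer missed malignant cases.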
:Number of Attributes: 30 numeric, predictive attributes and the class
# :Attribute Information:
# - radius (mean of distances from center to points on the perimeter)
# - texture (standard deviation of gray-scale values)
# - perimeter
# - area
# - smoothness (local variation in radius lengths)
# - compactness (perimeter^2 / area - 1.0)
# - concavity (severity of concave portions of the contour)
# - concave points (number of concave portions of the contour)
# - symmetry
# - fractal dimension ("coastline approximation" - 1)
# The mean, standard error, and "worst" or largest (mean of the three
# worst/largest values) of these features were computed for each image,
# resulting in 30 features. For instance, field 0 is Mean Radius, field
# 10 is Radius SE, field 20 is Worst Radius.
# - class:
# - WDBC-Malignant
# - WDBC-Benign
# :Summary Statistics:
# ===================================== ====== ======
#                                       Min    Max
# ===================================== ====== ======
# radius (mean):                        6.981  28.11
# texture (mean):                       9.71   39.28
# perimeter (mean):                     43.79  188.5
# area (mean):                          143.5  2501.0
# smoothness (mean):                    0.053  0.163
# compactness (mean):                   0.019  0.345
# concavity (mean):                     0.0    0.427
# concave points (mean):                0.0    0.201
# symmetry (mean):                      0.106  0.304
# fractal dimension (mean):             0.05   0.097
# radius (standard error):              0.112  2.873
# texture (standard error):             0.36   4.885
# perimeter (standard error):           0.757  21.98
# area (standard error):                6.802  542.2
# smoothness (standard error):          0.002  0.031
# compactness (standard error):         0.002  0.135
# concavity (standard error):           0.0    0.396
# concave points (standard error):      0.0    0.053
# symmetry (standard error):            0.008  0.079
# fractal dimension (standard error):   0.001  0.03
# radius (worst):                       7.93   36.04
# texture (worst):                      12.02  49.54
# perimeter (worst):                    50.41  251.2
# area (worst):                         185.2  4254.0
# smoothness (worst):                   0.071  0.223
# compactness (worst):                  0.027  1.058
# concavity (worst):                    0.0    1.252
# concave points (worst):               0.0    0.291
# symmetry (worst):                     0.156  0.664
# fractal dimension (worst):            0.055  0.208
# ===================================== ====== ======
# :Missing Attribute Values: None
# :Class Distribution: 212 - Malignant, 357 - Benign
# :Creator: Dr. William H. Wolberg, W. Nick Street, Olvi L. Mangasarian
# :Donor: Nick Street
# :Date: November, 1995
# This is a copy of UCI ML Breast Cancer Wisconsin (Diagnostic) datasets.
# https://goo.gl/U2Uwz2
# Features are computed from a digitized image of a fine needle
# aspirate (FNA) of a breast mass. They describe
# characteristics of the cell nuclei present in the image.
# Separating plane described above was obtained using
# Multisurface Method-Tree (MSM-T) [K. P. Bennett, "Decision Tree
# Construction Via Linear Programming." Proceedings of the 4th
# Midwest Artificial Intelligence and Cognitive Science Society,
# pp. 97-101, 1992], a classification method which uses linear
# programming to construct a decision tree. Relevant features
# were selected using an exhaustive search in the space of 1-4
# features and 1-3 separating planes.
# The actual linear program used to obtain the separating plane
# in the 3-dimensional space is that described in:
# [K. P. Bennett and O. L. Mangasarian: "Robust Linear
# Programming Discrimination of Two Linearly Inseparable Sets",
# Optimization Methods and Software 1, 1992, 23-34].
# This database is also available through the UW CS ftp server:
# ftp ftp.cs.wisc.edu
# cd math-prog/cpo-dataset/machine-learn/WDBC/
# .. topic:: References
# - W.N. Street, W.H. Wolberg and O.L. Mangasarian. Nuclear feature extraction
# for breast tumor diagnosis. IS&T/SPIE 1993 International Symposium on
# Electronic Imaging: Science and Technology, volume 1905, pages 861-870,
# San Jose, CA, 1993.
# - O.L. Mangasarian, W.N. Street and W.H. Wolberg. Breast cancer diagnosis and
# prognosis via linear programming. Operations Research, 43(4), pages 570-577,
# July-August 1995.
# - W.H. Wolberg, W.N. Street, and O.L. Mangasarian. Machine learning techniques
# to diagnose breast cancer from fine-needle aspirates. Cancer Letters 77 (1994)
# 163-171.
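The field numbering quoted in the description above (field 0 is Mean Radius, field 10 is Radius SE, field 20 is Worst Radius) follows from laying out ten base measurements three times, once per statistic. The sketch below rebuilds that layout with zero-based indices; the constant and function names are assumptions, and the label strings mirror the summary table rather than any particular library's column names.

```python
BASE_FEATURES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave points", "symmetry",
    "fractal dimension",
]
STATISTICS = ["mean", "standard error", "worst"]


def feature_names():
    """List all 30 features: every base feature for each statistic in turn."""
    return [
        f"{feature} ({stat})"
        for stat in STATISTICS
        for feature in BASE_FEATURES
    ]


names = feature_names()
print(len(names))
print(names[0], names[10], names[20], sep=" | ")
```

Given a feature's position `i` in the base list, its three columns sit at `i`, `i + 10`, and `i + 20`, which is handy when selecting only the mean or only the worst values.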