Found 16 repositories (showing 16)
sujitjean
On my journey exploring the field of data science and predictive modeling, I explored a very interesting algorithm: k-nearest neighbors (KNN). I tried to leverage the ability of classification algorithms, which come under the supervised-learning branch of predictive modeling. I used KNN to classify whether projects submitted by teachers in the United States for their students would be approved.
The main business context of the project was to reduce the manual evaluation of projects by volunteers. That evaluation process can take a long time, may be biased by some factors, and can introduce irreducible errors. Some other important points are:
· How to scale current manual processes and resources to screen 500,000 projects so that they can be posted as quickly and as efficiently as possible
· How to increase the consistency of project vetting across different volunteers to improve the experience for teachers
· How to focus volunteer time on the applications that need the most assistance
The goal of the project is to predict whether or not a DonorsChoose.org project proposal submitted by a teacher will be approved, using the text of project descriptions as well as additional metadata about the project, teacher, and school. DonorsChoose.org can then use this information to identify projects most likely to need further review before approval.
The steps followed for data preparation and predictive modeling are as follows.
Note: giving unstructured data (garbage, in common terms) to a machine-learning algorithm gives you random output (garbage) in return.
All the code is written in a clean and understandable manner, avoiding fancy methods wherever possible. A reference for everything used in the code is given above it, so that it is easy for everyone to understand the code and leverage the potential of AI. I believe in growing together and helping others, which makes me a great team player; it also improves my storytelling ability and my ability to represent data.
For the implementation of all the code, I used the scikit-learn (sklearn) library.
1. Apply KNN (brute-force version) on these feature sets
   I formed different sets of the data to check which vectorization of the text data works better than the others:
   Set 1: categorical, numerical features + project title (BOW) + preprocessed essay (BOW)
   Set 2: categorical, numerical features + project title (TFIDF) + preprocessed essay (TFIDF)
   Set 3: categorical, numerical features + project title (AVG W2V) + preprocessed essay (AVG W2V)
   Set 4: categorical, numerical features + project title (TFIDF W2V) + preprocessed essay (TFIDF W2V)
2. Hyperparameter tuning to find the best K, and the metric used to evaluate the model
   1. Find the best hyperparameter, i.e. the one that gives the maximum AUC value
   2. Find the best hyperparameter using k-fold cross-validation or simple cross-validation data
   3. Use GridSearchCV or RandomizedSearchCV, or write your own for loops, to do this task
3. Representation of results
   1. Plot the performance of the model on both the train data and the cross-validation data for each hyperparameter
   2. Once you find the best hyperparameter, train your model M with it, find the AUC on the test data, and plot the ROC curve on both train and test data using model M
   3. Along with the ROC curve, print the confusion matrix with the predicted and original labels of the test data points
4. Select the top 2,000 features from feature Set 2 using `SelectKBest`, then apply KNN on top of these features (this is the section where we select the best features from all the features we have)
   1. Repeat steps 2 and 3 on the data matrix after feature selection
5. Conclusion
   1. Summarize the results at the end of the notebook in table format.
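Steps 2 and 3 above can be sketched without any library: tune K by cross-validated AUC, then score the test data and build a confusion matrix. This is a minimal, dependency-free sketch, not the author's notebook code. It assumes dense numeric feature vectors (for example, rows of one vectorized feature set) and 0/1 labels. Each point is scored by the fraction of its K nearest neighbours labelled 1. The toy data and candidate K values are made up. In sklearn, the corresponding pieces would be `KNeighborsClassifier(algorithm='brute')`, `GridSearchCV(..., scoring='roc_auc')`, `roc_auc_score` and `confusion_matrix`.

```python
import math
import random


def knn_scores(train_X, train_y, query_X, k):
    """Brute-force KNN: measure the distance from each query point to every
    training point, keep the k nearest, and return the fraction of those
    neighbours labelled 1 as a probability-like score for AUC."""
    scores = []
    for q in query_X:
        nearest = sorted((math.dist(q, x), y) for x, y in zip(train_X, train_y))[:k]
        scores.append(sum(y for _, y in nearest) / k)
    return scores


def roc_auc(y_true, scores):
    """AUC = probability that a random positive is scored above a random
    negative (ties count as half); this equals the area under the ROC curve."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_auc(X, y, k, n_folds=3):
    """Mean AUC of brute-force KNN with a given k over n_folds folds."""
    idx = list(range(len(X)))
    aucs = []
    for f in range(n_folds):
        held_out = idx[f::n_folds]  # every n_folds-th row forms one fold
        kept = [i for i in idx if i % n_folds != f]
        scores = knn_scores([X[i] for i in kept], [y[i] for i in kept],
                            [X[i] for i in held_out], k)
        aucs.append(roc_auc([y[i] for i in held_out], scores))
    return sum(aucs) / n_folds


def grid_search_k(X, y, candidate_ks, n_folds=3):
    """Try every candidate K and keep the one with the highest mean CV AUC."""
    results = {k: k_fold_auc(X, y, k, n_folds) for k in candidate_ks}
    return max(results, key=results.get), results


def confusion_matrix(y_true, y_pred):
    """[[TN, FP], [FN, TP]]: rows are actual labels, columns predicted labels
    (the layout sklearn.metrics.confusion_matrix uses for labels 0 and 1)."""
    m = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m


if __name__ == "__main__":
    # Made-up toy data: two noisy 2-D clusters with alternating labels.
    rng = random.Random(0)
    X = [[rng.gauss(2.0 * (i % 2), 1.0), rng.gauss(2.0 * (i % 2), 1.0)] for i in range(60)]
    y = [i % 2 for i in range(60)]
    train_X, train_y, test_X, test_y = X[:45], y[:45], X[45:], y[45:]

    best_k, cv_auc = grid_search_k(train_X, train_y, [1, 3, 5, 11, 21])
    print("CV AUC per K:", cv_auc, "best K:", best_k)

    test_scores = knn_scores(train_X, train_y, test_X, best_k)
    print("test AUC:", roc_auc(test_y, test_scores))
    print("confusion matrix:", confusion_matrix(test_y, [int(s >= 0.5) for s in test_scores]))
```

Interleaving the folds keeps both classes in every fold when labels alternate. With real, unordered data, a stratified split (as `StratifiedKFold` does) would be the safer choice.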
mickeycoi
No description available
olga-st
Test assignment for cv-school.ru
Aspirin4k
Macroscop School 2017
Gavamot
No description available
Exnus
No description available
alexmelyon
No description available
botalov
No description available
A-Kuklin
Test assignments for a job interview at the online school "Tetrika" for the "Junior Python developer" position.
MajorAxe
Test task for Perm computer vision school application
mihal94
No description available
por-quez
My CV test site for a school project
por-quez
My CV test site for a school project
Julia-Beiferman
This repository includes an experiment I did in high school to test whether we can use OpenCV and simple geometry to measure distances. A lab report that goes into more detail is included. NOTE: finalproject.py is the actual file; blob_detection.py is just a test.
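One common way to turn an image into a distance with simple geometry is the pinhole-camera, similar-triangles relation. The description does not say which geometry the experiment used, so this is an assumed illustration, not the contents of finalproject.py. The object's width in pixels would come from a detector such as OpenCV blob detection; here it is just a number, and all values are hypothetical.

```python
def focal_length_px(known_distance, known_width, width_px):
    """Calibration step: an object of real width W, photographed at a known
    distance D, appears P pixels wide, so the focal length in pixels is F = P * D / W."""
    return width_px * known_distance / known_width


def distance_to_object(known_width, focal_px, width_px):
    """Similar triangles again: D' = W * F / P', where P' is the object's width
    in pixels in the new image."""
    return known_width * focal_px / width_px


if __name__ == "__main__":
    # Hypothetical numbers: a 5 cm wide object, 10 cm from the camera, appears 100 px wide.
    F = focal_length_px(known_distance=10.0, known_width=5.0, width_px=100.0)
    # Later the same object appears 50 px wide, e.g. as measured by a blob detector.
    print(distance_to_object(known_width=5.0, focal_px=F, width_px=50.0))
```

Because distance is inversely proportional to apparent width, halving the width in pixels doubles the estimated distance. This holds only while the object faces the camera squarely and lens distortion is small.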
On my journey exploring the field of data science and predictive modeling, I explored a very interesting algorithm: Logistic Regression. I tried to leverage the ability of classification algorithms, which come under the supervised-learning branch of predictive modeling. I used Logistic Regression to classify whether projects submitted by teachers in the United States for their students would be approved.
The main business context of the project was to reduce the manual evaluation of projects by volunteers. That evaluation process can take a long time, may be biased by some factors, and can introduce irreducible errors. Some other important points are:
· How to scale current manual processes and resources to screen 500,000 projects so that they can be posted as quickly and as efficiently as possible
· How to increase the consistency of project vetting across different volunteers to improve the experience for teachers
· How to focus volunteer time on the applications that need the most assistance
The goal of the project is to predict whether or not a DonorsChoose.org project proposal submitted by a teacher will be approved, using the text of project descriptions as well as additional metadata about the project, teacher, and school. DonorsChoose.org can then use this information to identify projects most likely to need further review before approval.
The steps followed for data preparation and predictive modeling are as follows.
Note: giving unstructured data (garbage, in common terms) to a machine-learning algorithm gives you random output (garbage) in return.
All the code is written in a clean and understandable manner, avoiding fancy methods wherever possible. A reference for everything used in the code is given above it, so that it is easy for everyone to understand the code and leverage the potential of AI. I believe in growing together and helping others, which makes me a great team player; it also improves my storytelling ability and my ability to represent data.
For the implementation of all the code, I used the scikit-learn (sklearn) library.
1. Logistic Regression (either SGDClassifier with log loss, or LogisticRegression) on these feature sets
   Set 1: categorical, numerical features + project_title (BOW) + preprocessed_essay (`BOW with bi-grams` with `min_df=10` and `max_features=5000`)
   Set 2: categorical, numerical features + project_title (TFIDF) + preprocessed_essay (`TFIDF with bi-grams` with `min_df=10` and `max_features=5000`)
   Set 3: categorical, numerical features + project_title (AVG W2V) + preprocessed_essay (AVG W2V)
   Set 4: categorical, numerical features + project_title (TFIDF W2V) + preprocessed_essay (TFIDF W2V)
2. Hyperparameter tuning (find the best hyperparameters for the algorithm you choose)
   1. Find the best hyperparameter, i.e. the one that gives the maximum AUC value
   2. Find the best hyperparameter using k-fold cross-validation or simple cross-validation data
   3. Use GridSearchCV or RandomizedSearchCV, or write your own for loops, to do this hyperparameter tuning
3. Representation of results
   1. Plot the performance of the model on both the train data and the cross-validation data for each hyperparameter, as shown in the figure.
   2. Once you have found the best hyperparameter, train your model with it, find the AUC on the test data, and plot the ROC curve on both train and test data.
   3. Along with the ROC curve, print the confusion matrix with the predicted and original labels of the test data points. Please visualize your confusion matrices using seaborn heatmaps.
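The tuning loop of steps 2 and 3 can be sketched in plain Python. This minimal sketch is not the author's code and assumes dense numeric features and 0/1 labels. It fits logistic regression by SGD on the log loss with an L2 penalty of strength `alpha`, the role `alpha` plays in sklearn's `SGDClassifier` (roughly the inverse of `C` in `LogisticRegression`). It then records train and CV AUC for every candidate, so both curves can be plotted. The alpha grid, learning rate, epoch count and toy data are assumptions.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, alpha, epochs=50, lr=0.1, seed=0):
    """Fit logistic regression by plain SGD on the log loss plus an L2 penalty
    whose gradient is alpha * w; the bias is not penalised."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, X[i])) + b) - y[i]
            # Gradient of the log loss w.r.t. w is err * x; the L2 term adds alpha * w.
            w = [wj - lr * (err * xj + alpha * wj) for wj, xj in zip(w, X[i])]
            b -= lr * err
    return w, b


def predict_proba(model, X):
    """Probability of the positive class (label 1) for every row of X."""
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) for x in X]


def roc_auc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties = 0.5)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def tune_alpha(X_tr, y_tr, X_cv, y_cv, alphas):
    """Simple cross-validation: for each alpha, fit on train and record
    (alpha, train AUC, CV AUC); the best alpha maximises CV AUC."""
    curve = []
    for a in alphas:
        model = train_logreg(X_tr, y_tr, a)
        curve.append((a, roc_auc(y_tr, predict_proba(model, X_tr)),
                      roc_auc(y_cv, predict_proba(model, X_cv))))
    best_alpha = max(curve, key=lambda row: row[2])[0]
    return best_alpha, curve


if __name__ == "__main__":
    # Made-up toy data: one informative feature and one noise feature.
    rng = random.Random(1)
    X = [[rng.gauss(1.0 if i % 2 else -1.0, 1.0), rng.gauss(0.0, 1.0)] for i in range(80)]
    y = [i % 2 for i in range(80)]
    best_alpha, curve = tune_alpha(X[:60], y[:60], X[60:], y[60:], [1e-4, 1e-3, 1e-2, 1e-1, 1.0])
    for alpha, train_auc, cv_auc in curve:
        print(f"alpha={alpha:g}  train AUC={train_auc:.3f}  CV AUC={cv_auc:.3f}")
    print("best alpha:", best_alpha)
```

The `curve` list holds exactly the two series (train AUC and CV AUC against alpha) that step 3.1 asks to plot. A large gap between them suggests overfitting at small alpha.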
Task 2: Apply Logistic Regression on the feature set below (Set 5), finding the best hyperparameter as suggested in steps 2 and 3.
   Set 5:
   school state: categorical data
   clean categories: categorical data
   clean subcategories: categorical data
   project_grade_category: categorical data
   teacher prefix: categorical data
   quantity: numerical data
   teacher_number_of_previously_posted_projects: numerical data
   price: numerical data
   sentiment scores of each essay: numerical data
   number of words in the title: numerical data
   number of words in the combined essays: numerical data
4. Conclusion
   Summarize the results at the end of the notebook in table format. To print the table, use the PrettyTable library.
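Set 5 is mostly hand-built features, so a small sketch of assembling one row per project may help. Several assumptions apply, and none of this is the author's code:

· Each project is a dict with hypothetical keys (`project_title`, `essay`, and the column names).
· Categorical columns are treated as single values, although clean categories can hold several space-separated categories.
· The essay sentiment score is assumed to be precomputed elsewhere and passed in as one more numerical column.

sklearn's `OneHotEncoder` and `StandardScaler` do the same jobs at scale.

```python
import math


def word_count(text):
    """Number of whitespace-separated words."""
    return len(text.split())


def build_set5(rows, categorical, numerical):
    """One-hot encode the categorical columns, append the numerical columns,
    then append word counts of the title and the essay text."""
    vocabs = {c: sorted({r[c] for r in rows}) for c in categorical}
    matrix = []
    for r in rows:
        vec = []
        for c in categorical:
            vec.extend(1.0 if r[c] == v else 0.0 for v in vocabs[c])
        vec.extend(float(r[c]) for c in numerical)
        vec.append(float(word_count(r["project_title"])))
        vec.append(float(word_count(r["essay"])))
        matrix.append(vec)
    n_onehot = sum(len(v) for v in vocabs.values())
    return matrix, vocabs, n_onehot


def standardize_from(matrix, start):
    """Z-score every column from index `start` on (the numeric block) in place,
    so that price and word counts end up on a comparable scale."""
    n = len(matrix)
    for j in range(start, len(matrix[0])):
        col = [row[j] for row in matrix]
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n) or 1.0  # guard constant columns
        for row in matrix:
            row[j] = (row[j] - mean) / std
    return matrix


if __name__ == "__main__":
    # Two made-up projects; a real run would also list a sentiment column in `numerical`.
    rows = [
        {"school_state": "CA", "teacher_prefix": "Mrs.", "quantity": 3, "price": 100.0,
         "project_title": "Books for readers", "essay": "My students love to read"},
        {"school_state": "NY", "teacher_prefix": "Mr.", "quantity": 1, "price": 50.0,
         "project_title": "Math tools", "essay": "We need calculators"},
    ]
    X, vocabs, n_onehot = build_set5(rows, ["school_state", "teacher_prefix"], ["quantity", "price"])
    print(X[0])  # one-hot block, then quantity, price, title words, essay words
    print(standardize_from(X, n_onehot))
```

In a real pipeline, the vocabularies, means and standard deviations should be computed on the training split only and then reused for the CV and test splits, to avoid leaking information.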
Jeffrey-LXA
LxA - Our Identity in Diversity
"Have you ever thought that Excellence sounds like Diversity?" or "Do you think that Mobility and Flexibility are the key to Stronger Performance?" If you answered yes to these two questions, let us introduce your mission on the French Riviera, in Sophia Antipolis (Nice, France).
Location: Nice, France (Sophia Antipolis)
Mission:
· Participate in all phases of product development
· Design technical solutions and perform feasibility studies
· Develop software, conduct performance tests of the software, and ensure a level of quality in line with the client's guidelines
· Participate in the validation/acceptance phase of the product cycle, carrying out the fine-tuning necessary to finalize the product
· Create the relevant user documentation and provide user support during implementation phases
You will work in a team of 8-10 people from 63 nationalities (English speaking).
Profile:
· Master of Science (engineering school)
· Strong English
· C++ or Java
· 2 years of experience in the field
· Strong technical knowledge
· Communication skills
· Teamwork
· Python knowledge, project management and database knowledge are a plus
Salary: €32,000 to €40,000 per year
Who are we? LxA is a company focused on digital and IT, located at the heart of Sophia Antipolis, Nice, France. We place strong value on vigorous collaboration and lasting solidarity with our consultants. We will lead you to the main IT players in Sophia Antipolis.
What do we offer?
· A multicultural environment (15+ nationalities)
· Strong capacity for adjustment
· Open-mindedness
· A close relationship with consultants
· Advice and help to find accommodation (apartment)
· We pick you up at the airport in Nice, France
· We pay 50% of the ticket price
· A consultant-friendly culture with company events
Don't wait any longer: send us your CV (PDF and English preferred), and we will study it carefully!