Found 193 repositories (showing 30)
Repo for Applied Text Mining in Python (coursera) by University of Michigan
agniiyer
University of Michigan on Coursera
bondeanikets
Applied Data Science with Python Specialization: Course 4 (University of Michigan)
This repository contains the graded assignments, written in Python 3, for the course 'Applied Text Mining in Python', part of the 'Applied Data Science with Python' specialization by the University of Michigan, offered on Coursera.
Vaibhavabhaysharma
This repository contains solutions for the course "Applied Text Mining in Python", provided by the University of Michigan on Coursera.
Brucewuzhang
applied text mining in python (coursera course)
jhwong18
No description available
Jasonluo666
This repository contains the tests and auto-graded assignments I completed, as well as most of the course resources.
sambhipiyuushh
Applied Text Mining in Python-University of Michigan
No description available
KaoutherElhamdi
No description available
No description available
momin-butt
This course will introduce the learner to the basics of the python programming environment, including fundamental python programming techniques such as lambdas, reading and manipulating csv files, and the numpy library. The course will introduce data manipulation and cleaning techniques using the popular python pandas data science library and introduce the abstraction of the Series and DataFrame as the central data structures for data analysis, along with tutorials on how to use functions such as groupby, merge, and pivot tables effectively. By the end of this course, students will be able to take tabular data, clean it, manipulate it, and run basic inferential statistical analyses. This course should be taken before any of the other Applied Data Science with Python courses: Applied Plotting, Charting & Data Representation in Python, Applied Machine Learning in Python, Applied Text Mining in Python, Applied Social Network Analysis in Python.
This course will introduce the learner to applied machine learning, focusing more on the techniques and methods than on the statistics behind these methods. The course will start with a discussion of how machine learning is different than descriptive statistics, and introduce the scikit-learn toolkit through a tutorial. The issue of dimensionality of data will be discussed, and the task of clustering data, as well as evaluating those clusters, will be tackled. Supervised approaches for creating predictive models will be described, and learners will be able to apply the scikit-learn predictive modelling methods while understanding process issues related to data generalizability (e.g. cross validation, overfitting). The course will end with a look at more advanced techniques, such as building ensembles, and practical limitations of predictive models. By the end of this course, students will be able to identify the difference between a supervised (classification) and unsupervised (clustering) technique, identify which technique they need to apply for a particular dataset and need, engineer features to meet that need, and write Python code to carry out an analysis. This course should be taken after Introduction to Data Science in Python and Applied Plotting, Charting & Data Representation in Python and before Applied Text Mining in Python and Applied Social Network Analysis in Python.
randallscott25
The Applied Data Science program at Syracuse University's School of Information Studies provides students the opportunity to collect, manage, analyze, and develop insights using data from a multitude of domains using various tools and techniques. In courses such as Database Administration, Data Analytics, Text Mining, and Marketing Analytics, reports and presentations were developed to deliver insights using Microsoft Access, SQL Server Management Studio, Python, R, Excel, and Tableau. The skills developed at the School of Information Studies furnish data scientists focused in the field of marketing analytics with the ability to generate value within their organizations and produce actionable recommendations.
This task uses a dataset of job postings from a job portal, downloaded from Kaggle in .csv format. The dataset simulates a real-life job portal and comprises each job's title, description, and category. Because the data was labeled, this was a supervised machine learning problem: I had access to correctly labeled historical data and had to train a model on it. The main goal was to build a model that could accurately classify new, unseen data, i.e. assign the proper label to a job posting given as input. Because the data was text, the project also made extensive use of text mining techniques. Text in its basic form is unstructured, so the data needs to be thoroughly pre-processed before predictive models can be developed. The pipeline I followed was: Data Profiling, Data Cleansing, Exploratory Analysis, Data Preprocessing, Feature Extraction and Selection, Model Development, and Model Evaluation. When text data is pre-processed, the curse of dimensionality usually appears: the data becomes high-dimensional, with features numbering in the thousands. Not all of those features are helpful, and they adversely affect the performance of classifiers, so, following best practices, I applied feature extraction and feature selection techniques to keep only the features that contribute to this prediction problem. For model development, I compared the following machine learning algorithms: Bernoulli Naive Bayes, Multinomial Naive Bayes, Random Forests, and Linear SVM, on metrics such as accuracy, training time, and testing time.
In my analysis, the SVM outperformed all of the other models on accuracy. The Random Forests accuracy score was also quite good, but training took considerable time. For the implementation, I used Python, specifically the following libraries: pandas, numpy, sklearn, nltk, and matplotlib. To run the code, please make sure that the latest versions of Python, Jupyter, and the aforementioned libraries are installed on your system.
tahirs95
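The job-posting description above mentions Multinomial Naive Bayes as one of the compared classifiers. As a rough illustration of that idea only, here is a minimal, dependency-free sketch. It is not the repository's code (which reportedly used sklearn); the function names, the whitespace tokenizer, and the four toy postings are all hypothetical and chosen just to keep the example self-contained.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical, very simple tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit a bag-of-words Multinomial Naive Bayes model with Laplace smoothing."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # label -> word -> count
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)

    model = {"log_prior": {}, "log_likelihood": {}}
    for label, n_docs in class_counts.items():
        model["log_prior"][label] = math.log(n_docs / len(labels))
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        model["log_likelihood"][label] = {
            word: math.log((word_counts[label][word] + alpha) / denom)
            for word in vocab
        }
    return model


def predict(model, text):
    """Return the label with the highest log posterior; unseen words are skipped."""
    scores = {}
    for label, log_prior in model["log_prior"].items():
        likelihood = model["log_likelihood"][label]
        scores[label] = log_prior + sum(
            likelihood[token] for token in tokenize(text) if token in likelihood
        )
    return max(scores, key=scores.get)


# Toy, made-up job postings (title + description merged into one string).
docs = [
    "python developer backend api",
    "java software engineer backend",
    "registered nurse hospital care",
    "nurse patient care clinic",
]
labels = ["IT", "IT", "Healthcare", "Healthcare"]

model = train_multinomial_nb(docs, labels)
print(predict(model, "backend engineer python"))
print(predict(model, "nurse clinic care"))
```

In practice, a library implementation combined with proper preprocessing and feature selection, as the description outlines, would likely be preferable to a hand-rolled classifier like this one.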
Applied Text Mining in Python
rahulpatraiitkgp
Course - 4; Specialization: Applied Data Science with Python; University Of Michigan
GyujinSeo
No description available
bahgat-ahmed
The fourth course in the "Applied Data Science with Python" Specialization on Coursera provided by the University of Michigan.
razish88
Applied Data Science with Python - University of Michigan
ShubhMech
No description available
elgeokareem
4th course of the data science specialization
mengjie514
Assignments/Projects/Learning Notes
sgsuryawanshi
No description available
mahimaarora
No description available
JM-Rishav
No description available
hamzaelanssari
Applied Text Mining in Python Course On Coursera
Taranjeet0874
This is the fourth course of the 5-course specialization in Applied Data Science in Python.
Applied Data Science with Python Specialization