Unsupervised Learning.For this clustering project on text, you will use a dataset named 20newsgroup. This is available in sklearn.datasets. You can use the code given below in the code cells to fetch the data. Next you need to run a TFIDFVectorizer on the sentences to obtain a document-word sparse matrix. Use this array as your X . Once you have got your array, you can apply different clustering techniques such as K-Means clustering and Hierarchical clustering to obtain meaningful clusters. Check if these clusters seem relevant and well separated. Finally you can use dimensionality reduction technqiues such as PCA or t-SNE(you can read about it and use it straight away) to come up with two dimensional visualization of these clusters.
Stars
3
Forks
0
Watchers
3
Open Issues
0
Overall repository health assessment
No package.json found
This might not be a Node.js project
2
commits