How to cluster texts by most relevant words
I have a huge number of documents, and every document has its own "portrait": a set of (document_id, word, weight) entries. Basically TF-IDF.
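sklearn estimators expect a documents × words matrix rather than raw text, and a list of portrait triples can be pivoted into exactly that. Below is a minimal sketch of one way to do it. It assumes the triples fit in memory as a Python list, that document ids and words are hashable, and that each (document, word) pair appears once. The helper `portraits_to_matrix` is hypothetical, not part of any library.

```python
from scipy.sparse import csr_matrix


def portraits_to_matrix(triples):
    """Hypothetical helper: pivot (document_id, word, weight) triples into a
    sparse CSR matrix with one row per document and one column per word."""
    doc_index = {}   # document_id -> row number, in first-seen order
    word_index = {}  # word -> column number, in first-seen order
    rows, cols, vals = [], [], []
    for doc_id, word, weight in triples:
        rows.append(doc_index.setdefault(doc_id, len(doc_index)))
        cols.append(word_index.setdefault(word, len(word_index)))
        vals.append(weight)
    # Note: duplicate (row, col) pairs would be summed by csr_matrix.
    X = csr_matrix((vals, (rows, cols)), shape=(len(doc_index), len(word_index)))
    # Dict keys keep insertion order, so these lists line up with rows/columns.
    return X, list(doc_index), list(word_index)


# Usage with made-up portraits:
portraits = [
    ("d1", "cat", 0.7), ("d1", "dog", 0.3),
    ("d2", "stock", 0.9), ("d2", "market", 0.4),
]
X, doc_ids, words = portraits_to_matrix(portraits)
print(X.shape, doc_ids, words)
```

A sparse matrix matters here because each document uses only a tiny fraction of the vocabulary. A dense array over a huge corpus would mostly store zeros.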
I want to group these documents into a fixed number of clusters, say 10.
I'm trying to implement the K-Means algorithm with sklearn, but I have almost no experience with data science. All the tutorials I found take the raw texts as input (from Wikipedia or somewhere else), but I don't have access to the texts themselves, only to their portraits. Hope that makes sense.
Is this achievable with sklearn, and if so, can you point me to where I should dig or what I should look at?