cluster analysis - Text Clustering Algorithms

Keywords: cluster analysis


I am looking to cluster a bunch of Twitter hashtags based on their topics, so that all hashtags related to the same topic end up in the same cluster. I am looking for Python libraries that are popular and efficient. I would also like suggestions on which algorithms I should consider for clustering them.

2 Answers: 

Good luck: Twitter data is so messy that I doubt you will be able to get meaningful results.

Definitely try TF-IDF, and as many algorithms as you can get working on your data.

But what are you going to do with tweets such as this:

Cool: #HashTagIMadeUpForYourSOQuestionASDAS

Which "topic" should this be? How would you expect a clustering algorithm to meaningfully cluster this?
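
To make the TF-IDF suggestion and the problem with made-up hashtags concrete, here is a minimal sketch using only the standard library. Everything in it is an assumption for illustration: the helper names (`split_hashtag`, `tfidf_vectors`, `greedy_clusters`), the sample hashtags, the camel-case splitting, and the 0.3 similarity threshold. The greedy threshold grouping is a simple stand-in, not a recommended clustering algorithm. With real data you would more likely reach for scikit-learn's `TfidfVectorizer` and a proper clusterer such as `KMeans`.

```python
import math
import re
from collections import Counter


def split_hashtag(tag):
    """Split a camel-case hashtag into lowercase words (hypothetical helper)."""
    pattern = r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+"
    return [word.lower() for word in re.findall(pattern, tag)]


def tfidf_vectors(docs):
    """Turn token lists into L2-normalized {term: weight} dicts.

    Uses the smoothed IDF ln((1 + n) / (1 + df)) + 1.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two already-normalized sparse vectors."""
    return sum(weight * b.get(term, 0.0) for term, weight in a.items())


def greedy_clusters(vectors, threshold=0.3):
    """Put each vector in the first cluster whose seed is similar enough,
    otherwise start a new cluster. The threshold is an arbitrary choice."""
    seeds, labels = [], []
    for vec in vectors:
        for idx, seed in enumerate(seeds):
            if cosine(vec, seed) >= threshold:
                labels.append(idx)
                break
        else:
            seeds.append(vec)
            labels.append(len(seeds) - 1)
    return labels


# Made-up sample data, including the hashtag from this answer.
hashtags = [
    "#MachineLearning",
    "#DeepLearning",
    "#WorldCup",
    "#WorldCupFinal",
    "#HashTagIMadeUpForYourSOQuestionASDAS",
]
tokens = [split_hashtag(tag) for tag in hashtags]
labels = greedy_clusters(tfidf_vectors(tokens))
for tag, label in zip(hashtags, labels):
    print(label, tag)
```

The made-up hashtag shares no words with the others, so it lands in a cluster of its own. Nothing in the text alone tells the algorithm which topic it belongs to, and hashtags that are not camel-cased will not even split into words this way.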


I can recommend natural language processing in Python (the NLTK package). But as was said, it might be challenging with Twitter (but lots of fun too). May I ask what you need it for? ;)