machine learning - Topic modeling a corpus with one "majority topic" and several "minority topics"

Keywords: machine learning


I have a collection of documents, and most of them are about the same topic, and the rest are basically random topics. I wish to classify the documents into whether they are about the "majority topic" or are one of these random "minority topics". What would happen if I used a topic modeling algorithm on this corpus with only 2 topics? Would the corpus be partitioned into "majority topic" and "minority topics" even though the "minority topics" presumably don't have much similarity to each other?

1 Answer: 

You can use MonkeyLearn for this.

You can create a custom classifier with two categories: "majority topic" and "minority topics". You have to add some training samples to each category so that MonkeyLearn can learn to predict it.
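To make the idea of a two-category classifier trained on labeled samples more concrete, here is a minimal sketch in plain Python. It does not use MonkeyLearn or its API. It swaps in a small multinomial naive Bayes model written from scratch, and the function names (`tokenize`, `train`, `predict`), the label names, and the training texts are all made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a document and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(samples):
    """Fit a multinomial naive Bayes model from (text, label) pairs."""
    doc_counts = Counter()  # number of training documents per label
    word_counts = {}        # label -> Counter of word frequencies
    vocab = set()           # every word seen in any training document
    for text, label in samples:
        doc_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in tokenize(text):
            counts[word] += 1
            vocab.add(word)
    return doc_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior for the text."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        counts = word_counts[label]
        # Laplace (add-one) smoothing so unseen words never give log(0)
        denom = sum(counts.values()) + len(vocab)
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            if word in vocab:  # ignore words never seen in training
                score += math.log((counts[word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training samples: invented texts, for illustration only.
samples = [
    ("neural networks learn features from training data", "majority"),
    ("gradient descent optimizes the model loss", "majority"),
    ("the classifier was trained on labeled data", "majority"),
    ("bake the bread for forty minutes", "minority"),
    ("the football match ended in a draw", "minority"),
]
model = train(samples)

print(predict(model, "training a model with gradient descent"))  # majority
print(predict(model, "football match tonight"))                  # minority
```

Note that the "minority" category here only recognizes words that appeared in its own training samples, so a supervised approach like this depends on the labeled minority examples covering the kinds of off-topic documents you expect to see.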

After you train your classifier, it can be integrated with any programming language via its API.

You can try MonkeyLearn for free here:

If you have any questions, leave a comment here or email us. I am here to help.