My approach is to create filters. I decided to remove all edits that don't contain a nickname of the contributor but are identified only by the IP address of the contributor, because most of such edits is spam (though there are some good contributions). This was easy to do with regular expressions. I also removed edits that contained vulgarisms and other typical spam keywords.
Do you know some better approach utilizing algorithms or heuristics with regular expressions, AI, text-processing techniques etc.? The approach should be able to detect bad posts (minor edits or vandalisms) and should be able to incrementally learn what is good/bad contribution and update its database.