python - How to extract the information I want with NLTK



I want to extract relevant information about a few topics, for example:

  • product information
  • purchase experience of customer
  • recommendation of family or friend

As a first step, I extracted text from one of the websites. For instance:

i think AIA does a more better life insurance as my comparison and the companies comparisonand most important is also medical insurance in my opinionyes there are some agents that will sell u plans that their commission is high...dun worry u buy insurance from a company anything happens u can contact back the company also can ...better find a agent that is reliable and not just working for the commission for now , they might not service u in the future...thanksregardsdiana ""

Then, using NLTK in VS2015, I split the text into words:

toks = nltk.word_tokenize(text)

Using pos_tag, I can tag the tokens with their parts of speech:

postoks = nltk.tag.pos_tag(toks)

From this point on I am not sure what to do. Previously, I used IBM Text Analytics. In that software I would create a dictionary, define some patterns, and then analyze the data. For instance:

Sample of Dictionary: insurance_cmp : {AIA, IMG, SABB}

Sample of pattern:

insurance_cmp + Good_Feeling_Pattern

insurance_cmp + ['purchase|Buy'] + Bad_Feeling_Pattern

Good_Feeling_Pattern = [good, like it, nice]

Bad_Feeling_Pattern = [bad, worse, not good, regret]
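To make the dictionary-and-pattern idea concrete, here is a minimal sketch in plain Python using only the `re` module. It is not how IBM Text Analytics works internally. The names `any_of`, `GAP`, `PATTERNS`, and `extract` are hypothetical, and the window of up to five words between pattern elements is an assumption chosen for illustration.

```python
import re

# Dictionaries copied from the samples above; the variable names are illustrative.
INSURANCE_CMP = ["AIA", "IMG", "SABB"]
PURCHASE = ["purchase", "buy"]
GOOD_FEELING = ["good", "like it", "nice"]
BAD_FEELING = ["bad", "worse", "not good", "regret"]


def any_of(terms):
    """Build a whole-word regex alternation over literal terms, longest first."""
    ordered = sorted(terms, key=len, reverse=True)
    return r"\b(?:" + "|".join(re.escape(t) for t in ordered) + r")\b"


# Assumption: up to five arbitrary words may sit between two pattern elements.
GAP = r"(?:\W+\w+){0,5}?\W+"

PATTERNS = {
    # insurance_cmp + Good_Feeling_Pattern, skipping a negated "not good"
    "good_feeling": re.compile(
        r"(?P<cmp>" + any_of(INSURANCE_CMP) + ")" + GAP
        + r"(?<!not )" + any_of(GOOD_FEELING),
        re.IGNORECASE,
    ),
    # insurance_cmp + ['purchase|Buy'] + Bad_Feeling_Pattern
    "bad_purchase": re.compile(
        r"(?P<cmp>" + any_of(INSURANCE_CMP) + ")" + GAP
        + any_of(PURCHASE) + GAP + any_of(BAD_FEELING),
        re.IGNORECASE,
    ),
}


def extract(sentence):
    """Return a list of (company, pattern_name) pairs matched in the sentence."""
    hits = []
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(sentence):
            hits.append((match.group("cmp"), name))
    return hits


print(extract("AIA is a good company"))
print(extract("I regret my AIA purchase, it was bad"))
```

A purely lexical matcher like this ignores word order beyond the fixed sequence and handles negation only for the literal phrase "not good", so it is best read as a baseline rather than a replacement for a real pattern engine.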

Can I simulate the same approach in NLTK? Can a chunker with a custom grammar help me extract what I am looking for? I would appreciate any ideas on how to improve this.

grammar = r"""
    NBAR:
        {<NN.*|JJ>*<NN.*>}  # Nouns and adjectives, terminated with nouns

    NP:
        {<NBAR><IN><NBAR>}  # Above, connected with in/of/etc.
"""
chunker = nltk.RegexpParser(grammar)

tree = chunker.parse(postoks)
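One possible next step is to walk the resulting tree and collect the text of each chunk, which can then be compared against a dictionary. The sketch below assumes the two chunk stages are labelled `NBAR` and `NP`, and it uses hand-tagged tokens in place of real `pos_tag` output so that it runs without downloading any tokenizer or tagger models. The `phrases` helper and the sample sentence are hypothetical.

```python
from nltk import RegexpParser, Tree

# Hand-tagged tokens standing in for the output of nltk.tag.pos_tag (assumed tags).
postoks = [
    ("AIA", "NNP"), ("sells", "VBZ"), ("good", "JJ"), ("life", "NN"),
    ("insurance", "NN"), ("with", "IN"), ("medical", "JJ"), ("coverage", "NN"),
]

grammar = r"""
    NBAR:
        {<NN.*|JJ>*<NN.*>}  # Nouns and adjectives, terminated with nouns
    NP:
        {<NBAR><IN><NBAR>}  # Above, connected with in/of/etc.
"""
chunker = RegexpParser(grammar)
tree = chunker.parse(postoks)


def phrases(tree, labels=("NBAR", "NP")):
    """Yield the text of each top-level chunk whose label is in `labels`."""
    for node in tree:
        if isinstance(node, Tree) and node.label() in labels:
            # Leaves of a tagged tree are (word, tag) tuples.
            yield " ".join(word for word, tag in node.leaves())


chunks = list(phrases(tree))
print(chunks)

# Dictionary lookup on the extracted phrases, mirroring insurance_cmp.
insurance_cmp = {"AIA", "IMG", "SABB"}
companies = [c for c in chunks if c in insurance_cmp]
print(companies)
```

Chunking only groups tokens into candidate phrases; deciding whether a phrase expresses a good or bad feeling still needs a separate lookup or classifier on top.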

What should my next step be to reach my goal?

1 Answer: 

You just need to follow this video

or read this blog.