python - how to extract the information I want with NLTK


Keywords:python 


Question: 

I want to extract relevant information about a few topics, for example:

  • product information
  • purchase experience of customer
  • recommendation of family or friend

As a first step, I extract text from a website, for instance:

i think AIA does a more better life insurance as my comparison and the companies comparisonand most important is also medical insurance in my opinionyes there are some agents that will sell u plans that their commission is high...dun worry u buy insurance from a company anything happens u can contact back the company also can ...better find a agent that is reliable and not just working for the commission for now , they might not service u in the future...thanksregardsdiana ""

Then, using NLTK in VS2015, I split the text into words:

toks = nltk.word_tokenize(text)

Using pos_tag, I can tag the tokens:

postoks = nltk.tag.pos_tag(toks)

From this point on, I am not sure what I should do. Previously, I used IBM text Analytic. In that software I would create a dictionary, then create some patterns, and then analyze the data. For instance:

Sample dictionary: insurance_cmp : {AIA, IMG, SABB}

Sample patterns:

insurance_cmp + Good_Feeling_Pattern

insurance_cmp + ['purchase|Buy'] + Bad_Feeling_Pattern

Good_Feeling_Pattern = [good, like it, nice]

Bad_Feeling_Pattern = [bad, worse, not good, regret]
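
One possible reading of the dictionary and patterns above in plain Python, offered as a rough sketch rather than a reproduction of how the original tool evaluates them. It assumes that a pattern matches when all of its parts occur in the same sentence, in any order. The names `contains_term` and `find_opinions`, the labels `good` and `bad_purchase`, and the sentence-splitting rule are hypothetical and chosen only for illustration.

```python
import re

# Dictionaries copied from the samples above.
INSURANCE_CMP = {"AIA", "IMG", "SABB"}
GOOD_FEELING = ["good", "like it", "nice"]
BAD_FEELING = ["bad", "worse", "not good", "regret"]
PURCHASE = ["purchase", "buy"]


def contains_term(sentence, terms):
    """Return the first term found as a whole word or phrase, else None."""
    for term in terms:
        if re.search(r"\b" + re.escape(term) + r"\b", sentence, re.IGNORECASE):
            return term
    return None


def find_opinions(text):
    """Return (company, label, sentence) tuples for sentences matching a pattern.

    Assumption: all parts of a pattern must appear in the same sentence.
    """
    results = []
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for company in sorted(INSURANCE_CMP):
            # Company names are matched case-sensitively as whole words.
            if not re.search(r"\b" + re.escape(company) + r"\b", sentence):
                continue
            has_bad = contains_term(sentence, BAD_FEELING) is not None
            has_good = contains_term(sentence, GOOD_FEELING) is not None
            has_purchase = contains_term(sentence, PURCHASE) is not None
            if has_purchase and has_bad:
                # insurance_cmp + ['purchase|Buy'] + Bad_Feeling_Pattern
                results.append((company, "bad_purchase", sentence))
            elif has_good and not has_bad:
                # insurance_cmp + Good_Feeling_Pattern; the "not has_bad" check
                # keeps "not good" from being reported as "good".
                results.append((company, "good", sentence))
    return results
```

Calling `find_opinions("AIA is nice and reliable. I regret that I buy a plan from IMG.")` would report the first sentence as `good` for AIA and the second as `bad_purchase` for IMG. Real forum text like the sample above has little punctuation and inconsistent casing, so the sentence split and the case-sensitive company match would likely need adjusting.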

I would like to know whether I can simulate the same in NLTK. Can a chunker and a grammar help me extract what I am looking for? I would appreciate any ideas for improving my approach.

grammar = r"""
    NBAR:
        {<NN.*|JJ>*<NN.*>}  # Nouns and Adjectives, terminated with Nouns

    NP:
        {<NBAR>}
        {<NBAR><IN><NBAR>}  # Above, connected with in/of/etc...
"""
chunker = nltk.RegexpParser(grammar)

tree = chunker.parse(postoks)
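
A natural follow-up to parsing is to walk the resulting tree and collect the text of each NP chunk. The sketch below assumes NLTK is installed and uses hand-tagged tokens so that it runs without downloading tokenizer or tagger models; in practice the tagged tokens would come from `nltk.pos_tag(nltk.word_tokenize(text))`. The helper name `noun_phrases` and the sample tokens are hypothetical.

```python
import nltk

grammar = r"""
    NBAR:
        {<NN.*|JJ>*<NN.*>}  # Nouns and Adjectives, terminated with Nouns

    NP:
        {<NBAR>}
        {<NBAR><IN><NBAR>}  # Above, connected with in/of/etc...
"""
chunker = nltk.RegexpParser(grammar)

# Hand-tagged sample tokens (an assumption for illustration only).
postoks = [
    ("find", "VB"), ("a", "DT"), ("reliable", "JJ"),
    ("agent", "NN"), ("from", "IN"), ("AIA", "NNP"),
]

tree = chunker.parse(postoks)


def noun_phrases(parsed_tree):
    """Return the text of every NP chunk in the parsed tree, in order."""
    phrases = []
    for subtree in parsed_tree.subtrees(filter=lambda t: t.label() == "NP"):
        # leaves() yields the original (word, tag) pairs under this chunk.
        phrases.append(" ".join(word for word, tag in subtree.leaves()))
    return phrases


print(noun_phrases(tree))
```

The extracted phrases could then be compared against a dictionary such as insurance_cmp. With the NP rules in this order, the first rule appears to wrap every NBAR before the second rule is tried, so the `<NBAR><IN><NBAR>` rule may never fire; swapping the two NP rules is one thing to try if longer phrases are wanted.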

What could my next step be to reach my goal?


1 Answer: 

You just need to follow this video

or read this blog.