Tokenization splits a piece of text into smaller units called tokens, which can be words, characters, or subwords.
Accordingly, tokenization falls broadly into three types: word, character, and subword (character n-gram) tokenization.
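The three types can be sketched with the Python standard library alone. This is a minimal illustration, not the tool's own tokenizer: the function names and the regular expression are assumptions chosen for the example, and the subword variant uses plain character n-grams.

```python
import re

def word_tokens(text):
    # Words are runs of word characters; each punctuation mark is its own token.
    return re.findall(r"\w+|[^\w\s]", text)

def char_tokens(text):
    # Every character, including spaces, becomes a token.
    return list(text)

def char_ngrams(word, n=3):
    # Sliding window of n characters over a single word.
    return [word[i:i + n] for i in range(len(word) - n + 1)]

print(word_tokens("Tokens are useful."))  # ['Tokens', 'are', 'useful', '.']
print(char_tokens("cat"))                 # ['c', 'a', 't']
print(char_ngrams("token", 3))            # ['tok', 'oke', 'ken']
```

Word tokens keep the vocabulary readable but cannot represent unseen words; character and subword tokens trade longer sequences for full coverage.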
Stemming is a technique that reduces inflected words to their root (stem) forms. Our tool uses the Porter stemmer, one of the most popular stemming algorithms, proposed in 1980. It is based on the idea that suffixes in the English language are built from combinations of smaller, simpler suffixes.
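To give a feel for how suffix rules are applied, here is a minimal sketch of only the first rule group of the Porter algorithm (step 1a, which handles plural endings). It is not the tool's implementation; the function name is hypothetical, and the full algorithm applies several further steps with extra conditions on the remaining stem.

```python
def porter_step_1a(word):
    # Rules are tried in order; the first matching suffix wins.
    if word.endswith("sses"):
        return word[:-2]   # sses -> ss
    if word.endswith("ies"):
        return word[:-2]   # ies -> i
    if word.endswith("ss"):
        return word        # ss -> ss (unchanged)
    if word.endswith("s"):
        return word[:-1]   # s -> (removed)
    return word

for w in ["caresses", "ponies", "caress", "cats"]:
    print(w, "->", porter_step_1a(w))
```

Note that a stem such as "poni" need not be a dictionary word; stemming only aims to map related forms to the same string.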
Parsing is the process of analyzing the grammatical structure of a sentence: determining how its words group into phrases and how those phrases relate to one another. Our parser returns a tagged phrase structure tree.
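A tagged phrase structure tree can be pictured as nested labeled nodes whose leaves are tagged words. The sketch below is an assumption made for illustration, not the parser's actual output format: it stores a hand-written tree as nested tuples and renders it in the common bracketed notation. The helper names `to_bracketed` and `tagged_leaves` are hypothetical.

```python
def to_bracketed(node):
    # A node is (label, child, ...); a leaf is (tag, word).
    label, *children = node
    if len(children) == 1 and isinstance(children[0], str):
        return f"({label} {children[0]})"
    return f"({label} " + " ".join(to_bracketed(c) for c in children) + ")"

def tagged_leaves(node):
    # Collect (word, tag) pairs from left to right.
    label, *children = node
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]
    return [pair for c in children for pair in tagged_leaves(c)]

# Hand-built tree for "Harrison took the desk".
tree = ("S",
        ("NP", ("NNP", "Harrison")),
        ("VP", ("VBD", "took"),
               ("NP", ("DT", "the"), ("NN", "desk"))))

print(to_bracketed(tree))
# (S (NP (NNP Harrison)) (VP (VBD took) (NP (DT the) (NN desk))))
print(tagged_leaves(tree))
```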
Part-of-speech (POS) tagging is the process of marking up each word in a text with its part of speech, based on both its definition and its context.
Here is a list of POS tags:
- CC coordinating conjunction
- CD cardinal number
- DT determiner
- EX existential there (like: “there is” … think of it like “there exists”)
- FW foreign word
- IN preposition/subordinating conjunction
- JJ adjective "big"
- JJR adjective, comparative "bigger"
- JJS adjective, superlative "biggest"
- LS list marker 1)
- MD modal could, will
- NN noun, singular "desk"
- NNS noun plural "desks"
- NNP proper noun, singular "Harrison"
- NNPS proper noun, plural "Americans"
- PDT predeterminer "all the kids"
- POS possessive ending parent's
- PRP personal pronoun I, he, she
- PRP$ possessive pronoun my, his, hers
- RB adverb very, silently
- RBR adverb, comparative better
- RBS adverb, superlative best
- RP particle give up
- TO "to", as in go "to" the store
- UH interjection errrrrrrrm
- VB verb, base form take
- VBD verb, past tense took
- VBG verb, gerund/present participle taking
- VBN verb, past participle taken
- VBP verb, sing. present, non-3rd person take
- VBZ verb, 3rd person sing. present takes
- WDT wh-determiner which
- WP wh-pronoun who, what
- WP$ possessive wh-pronoun whose
- WRB wh-adverb where, when
- . punctuation marks . , ; !
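
To show what tagged output looks like, here is a toy lookup tagger. It is only a sketch under stated assumptions: the tiny lexicon and the default tag are invented for the example, and it ignores context entirely, whereas a real tagger (including the one in this tool) also uses the surrounding words to choose a tag.

```python
# Hypothetical mini-lexicon mapping lowercase words to tags from the list above.
LEXICON = {
    "he": "PRP",
    "took": "VBD",
    "the": "DT",
    "big": "JJ",
    "desk": "NN",
    "desks": "NNS",
    "very": "RB",
}

def tag(tokens, default="NN"):
    # Look each token up case-insensitively; unknown words get the default tag.
    return [(tok, LEXICON.get(tok.lower(), default)) for tok in tokens]

print(tag(["He", "took", "the", "big", "desk"]))
# [('He', 'PRP'), ('took', 'VBD'), ('the', 'DT'), ('big', 'JJ'), ('desk', 'NN')]
```

Defaulting unknown words to NN is a common baseline choice, since nouns are the most frequent open word class.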