

About the tool

Here you can learn more about the text analysis tools available on this website:


Tokenization is a way of separating a piece of text into smaller units called tokens. Tokens can be words, characters, or subwords.
Hence, tokenization can be broadly classified into three types: word, character, and subword (character n-gram) tokenization.
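
To make the three types concrete, here is a minimal sketch in plain Python. It is not the code behind this website's tokenizer; the function names (word_tokens, char_tokens, char_ngrams) and the choice of 3-character n-grams are assumptions made for illustration only.

```python
import re

def word_tokens(text):
    # Word tokenization: runs of word characters become tokens,
    # and each punctuation mark becomes its own token.
    return re.findall(r"\w+|[^\w\s]", text)

def char_tokens(text):
    # Character tokenization: every non-whitespace character is a token.
    return [ch for ch in text if not ch.isspace()]

def char_ngrams(word, n=3):
    # Subword tokenization (illustrative): overlapping character n-grams.
    # Words shorter than n are returned whole.
    if len(word) < n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]

print(word_tokens("Hello, world!"))
print(char_tokens("hi yo"))
print(char_ngrams("token"))
```

Real subword tokenizers usually learn their vocabulary from data rather than slicing fixed-length n-grams, so treat char_ngrams as the simplest possible stand-in.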

Stemming is a technique that reduces inflected words to their root forms. Our tool uses the Porter Stemmer, one of the most popular stemming methods, proposed in 1980. It is based on the idea that suffixes in the English language are made up of combinations of smaller and simpler suffixes.
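
The idea of stripping suffixes by rule can be illustrated with the first step of the Porter algorithm (step 1a), which handles plural endings. The sketch below covers only that single step, not the full stemmer, and the function name porter_step_1a is made up for this example; it is not the implementation used by this website.

```python
def porter_step_1a(word):
    # Step 1a of the Porter algorithm: the rules are tried in order
    # and only the first matching suffix is rewritten.
    rules = [
        ("sses", "ss"),  # caresses -> caress
        ("ies", "i"),    # ponies   -> poni
        ("ss", "ss"),    # caress   -> caress (left unchanged)
        ("s", ""),       # cats     -> cat
    ]
    for suffix, replacement in rules:
        if word.endswith(suffix):
            return word[:len(word) - len(suffix)] + replacement
    return word

for w in ["caresses", "ponies", "caress", "cats"]:
    print(w, "->", porter_step_1a(w))
```

Note that a stem such as "poni" does not have to be a dictionary word; the later steps of the full algorithm apply further rules of the same kind.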

Parsing is the process of analysing the grammatical structure of a sentence: determining how its words group into phrases and how those phrases relate to one another. Our parser returns a tagged phrase structure tree.
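
As a rough picture of what a tagged phrase structure tree looks like, the sketch below stores a small hand-built tree as nested tuples and prints it in bracket notation. The sentence, the tree, the Penn Treebank style labels, and the helper names (to_brackets, leaves) are all assumptions for illustration; the exact output format of this website's parser may differ.

```python
# A node is (label, children) for a phrase, or (tag, word) for a single word.
TREE = ("S", [
    ("NP", [("DT", "the"), ("NN", "cat")]),
    ("VP", [("VBD", "sat")]),
])

def to_brackets(node):
    # Render the tree as a bracketed string, one pair of brackets per node.
    label, content = node
    if isinstance(content, str):
        return f"({label} {content})"
    return f"({label} " + " ".join(to_brackets(child) for child in content) + ")"

def leaves(node):
    # Collect the words of the sentence, left to right.
    label, content = node
    if isinstance(content, str):
        return [content]
    return [word for child in content for word in leaves(child)]

print(to_brackets(TREE))
print(leaves(TREE))
```

Here S is the sentence, NP and VP are the noun and verb phrases, and the innermost labels (DT, NN, VBD) are the part-of-speech tags attached to individual words.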

Part-of-Speech (POS) Tagging is the process of marking up each word in a text as corresponding to a particular part of speech, based on its definition and its context. Here is a list of POS tags: