Normalization of the given text:
Sentence Segmentation:
1. Raj and Vijay are best friends.
2. They play together with other friends.
3. Raj likes to play football but Vijay prefers to play online games.
4. Raj wants to be a footballer.
5. Vijay wants to become an online gamer.
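The segmentation above can be sketched in Python. This is a minimal sketch using a regular-expression split; `segment_sentences` is a hypothetical helper name, not a production segmenter.

```python
import re

# A minimal sentence segmenter (a sketch, not a full NLP tool):
# split wherever a full stop, question mark or exclamation mark
# is followed by whitespace.
def segment_sentences(text):
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

text = ("Raj and Vijay are best friends. They play together with other friends. "
        "Raj likes to play football but Vijay prefers to play online games. "
        "Raj wants to be a footballer. Vijay wants to become an online gamer.")

sentences = segment_sentences(text)
# sentences[0] -> 'Raj and Vijay are best friends.'
```

A simple split like this works for the given text; real segmenters also handle abbreviations, decimals and quotes.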
Tokenization:
Raj and Vijay are best friends. → Raj | and | Vijay | are | best | friends | .
They play together with other friends. → They | play | together | with | other | friends | .
The same is done for all the remaining sentences.
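Tokenization can be sketched with a short regular expression; `tokenize` is an illustrative helper, assumed here rather than taken from any library.

```python
import re

# A minimal word tokenizer (a sketch): each word, and each single
# punctuation mark, becomes a separate token.
def tokenize(sentence):
    return re.findall(r"\w+|[^\w\s]", sentence)

tokens = tokenize("Raj and Vijay are best friends.")
# tokens -> ['Raj', 'and', 'Vijay', 'are', 'best', 'friends', '.']
```

Note that the full stop comes out as its own token, which is what lets the next step remove punctuation cleanly.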
Removing stop words, special characters and numbers: In this step, the tokens that do not contribute to the meaning are removed from the token list. Here, the stop words and, are, to, an and the punctuation mark (.) are removed.
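This filtering step can be sketched as follows; `STOP_WORDS` is a small illustrative set taken from the text above, not a standard stop-word list.

```python
# A sketch of stop-word and punctuation removal. STOP_WORDS is a
# small illustrative set, not a standard list.
STOP_WORDS = {"and", "are", "to", "an"}

def remove_stop_words(tokens):
    # drop stop words (case-insensitively) and any non-alphanumeric
    # token such as punctuation
    return [t for t in tokens
            if t.lower() not in STOP_WORDS and t.isalnum()]

tokens = ["Raj", "and", "Vijay", "are", "best", "friends", "."]
# remove_stop_words(tokens) -> ['Raj', 'Vijay', 'best', 'friends']
```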
Converting text to a common case: After stop-word removal, the whole text is converted to a single case, preferably lower case. Apart from names and sentence-initial capitals, the given text has no mixed-case words, so this step is not required here.
Stemming:
In this step, the remaining words are reduced to their root words. In other words, stemming is the process in which the affixes of words are removed and the words are converted to their base form.
Word      Affix   Stem
likes     -s      like
prefers   -s      prefer
wants     -s      want
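The suffix stripping shown above can be sketched as a toy stemmer. This is only an illustration covering the affixes seen in this text; real systems use an algorithm such as the Porter stemmer.

```python
# A toy suffix-stripping stemmer (a sketch, not the Porter algorithm).
# It removes a known suffix only when enough of the word remains.
def stem(word):
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# stem("likes") -> 'like'; stem("prefers") -> 'prefer'; stem("wants") -> 'want'
```

Because it strips affixes blindly, a stemmer like this can produce non-words for other inputs; that is the gap lemmatization fills.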
Lemmatization: Since stemming already reduces the remaining words to valid root forms here, lemmatization is not required for the given text.
Given Text: Raj and Vijay are best friends. They play together with other friends. Raj likes to play football but Vijay prefers to play online games. Raj wants to be a footballer. Vijay wants to become an online gamer.
Normalized Text: Raj Vijay best friends They play together with other friends Raj like play football but Vijay prefer play online games Raj want be a footballer Vijay want become online gamer