This project includes a simple English tokenizer and a vocabulary maker.
How to run the program:
Put the English text that needs tokenizing in the same directory as the program;
Run the command: cat filename | ./ > filename.tok to get the tokenized text;
Run the command: cat filename.tok | ./ > filename.tok.voc to get the vocabulary of the tokenized text;
Run the command: cat filename | ./ > filename.voc to get the vocabulary of the original text.
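For readers who want a feel for what the two steps above do, here is a minimal sketch in Python. It is not the program shipped with this project: the token pattern, the function names (tokenize, tokenize_lines, build_vocab) and the choice to sort the vocabulary by frequency are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumed token pattern (hypothetical, not taken from this project):
# a run of letters/digits, optionally with internal apostrophes (e.g. "It's"),
# or any single non-space, non-alphanumeric character (punctuation).
TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z]+)*|[^\sA-Za-z0-9]")


def tokenize(text):
    """Split one string into a list of word and punctuation tokens."""
    return TOKEN_RE.findall(text)


def tokenize_lines(lines):
    """Yield each input line as space-separated tokens, one output line per input line."""
    for line in lines:
        yield " ".join(tokenize(line))


def build_vocab(tokens):
    """Return (token, count) pairs, most frequent first, ties broken alphabetically."""
    counts = Counter(tokens)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Small usage example on an in-memory string.
sample = "Hello, world! It's fine."
sample_tokens = tokenize(sample)
# sample_tokens == ['Hello', ',', 'world', '!', "It's", 'fine', '.']
sample_vocab = build_vocab(tokenize("the cat and the dog"))
# sample_vocab == [('the', 2), ('and', 1), ('cat', 1), ('dog', 1)]
```

To mimic the pipe-based commands above, tokenize_lines could be fed sys.stdin and its output written to sys.stdout; the vocabulary step would then collect the tokens of every line before calling build_vocab.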