engTokenizer

This project includes a simple English tokenizer and a vocabulary maker.

How to run the program:

Put the English text that needs tokenizing in the same directory as the program;
Run the command: cat filename | ./eng_tokenizer.sh > filename.tok to get the tokenized text;
Run the command: cat filename.tok | ./make_voc.sh > filename.tok.voc to get the vocabulary of the tokenized text;
Run the command: cat filename | ./make_voc.sh > filename.voc to get the vocabulary of the original text.
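
To give a rough idea of what the two steps above might do, here is a minimal Java sketch. It is not the code in eng_tokenizer.java or make_voc.java; the class name TokenizerSketch, the token rule (runs of letters and digits, optionally with internal apostrophes, and every other non-space symbol as its own token) and the word-count output are all assumptions made for illustration. Whether the real make_voc emits counts or only the distinct words is not stated here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the project's actual implementation.
public class TokenizerSketch {
    // Assumed token rule: a run of letters/digits (allowing internal
    // apostrophes, as in "don't"), or any single non-space symbol.
    private static final Pattern TOKEN =
        Pattern.compile("[A-Za-z0-9]+(?:'[A-Za-z]+)*|[^\\sA-Za-z0-9]");

    // Split one line of text into tokens, separating punctuation from words.
    public static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(line);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // Count how often each token occurs; TreeMap keeps the keys sorted.
    public static Map<String, Integer> vocabulary(List<String> tokens) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String t : tokens) {
            counts.merge(t, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("Hello, world! Hello again.");
        // prints: Hello , world ! Hello again .
        System.out.println(String.join(" ", tokens));
        // prints: {!=1, ,=1, .=1, Hello=2, again=1, world=1}
        System.out.println(vocabulary(tokens));
    }
}
```

In this sketch the tokenized text is simply the tokens joined by single spaces, which is the kind of output a file such as filename.tok could plausibly hold. Building the vocabulary from the tokenized text and from the original text would give different results, because untokenized words keep their attached punctuation.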
