gdebasis/BengaliStemmer: A very simple-to-use, rule-based stemmer for Bengali (Bangla). The program takes as input a newline-separated list of words and outputs, line by line, the stem of each input word.

This is a very simple, lightweight, rule-based stemmer for Bengali. To build it on a Linux system, run make. You can then invoke the stemmer (the executable is named rbs) as ./rbs.
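To illustrate what "rule-based" stemming typically means, here is a minimal sketch of longest-match suffix stripping over UTF-8 Bengali strings. It is not the code in rbs.c: the suffix list (`SUFFIXES`), the minimum stem length (`MIN_STEM_BYTES`) and the function name `strip_suffix` are all hypothetical, chosen only to show the technique. Suffixes are ordered longest first, so the first match is also the longest one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, illustrative suffix list (UTF-8 Bengali); NOT the rule set
   used by rbs.c. Ordered by byte length, longest first, so the first
   match found is the longest matching suffix. */
static const char *SUFFIXES[] = {
    "গুলোর", "গুলো", "দের", "ের", "কে", "রা", "টা", "টি",
};

/* Hypothetical minimum stem length in bytes. Bengali code points
   (U+0980 to U+09FF) take 3 bytes each in UTF-8, so 6 bytes = 2 characters. */
#define MIN_STEM_BYTES 6

/* Strips at most one suffix from `word` in place, but only if at least
   MIN_STEM_BYTES remain. Returns the new length of `word` in bytes. */
size_t strip_suffix(char *word)
{
    assert(word != NULL);
    size_t len = strlen(word);
    for (size_t i = 0; i < sizeof SUFFIXES / sizeof SUFFIXES[0]; i++) {
        size_t slen = strlen(SUFFIXES[i]);
        if (len >= slen + MIN_STEM_BYTES &&
            memcmp(word + len - slen, SUFFIXES[i], slen) == 0) {
            word[len - slen] = '\0';   /* cut the suffix off */
            return len - slen;
        }
    }
    return len;                        /* no rule applied */
}
```

With this toy rule set, "ছেলেদের" becomes "ছেলে" and "বইটা" becomes "বই", while a short word such as "কে" is left alone because stripping would leave too short a stem.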

The input file is a newline-separated list of Bengali words. The output file is also newline-separated: each line contains the original word followed by its stemmed form.
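The input/output contract described above can be sketched as a simple line-processing loop. This is a minimal sketch under stated assumptions, not the program's actual driver: the names `stem_lines`, `stem_fn` and `identity_stem` are hypothetical, the tab separator between the word and its stem is an assumption (the README does not say which separator rbs uses), and `identity_stem` is only a placeholder for a real stemming routine.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical signature of a stemming routine: rewrites `word` in place. */
typedef void (*stem_fn)(char *word);

/* Placeholder stemmer for illustration only: leaves the word unchanged. */
static void identity_stem(char *word) { (void)word; }

/* Reads one word per line from `in` and writes "<word>\t<stem>" lines to
   `out` (tab separator is an assumption). Blank lines are skipped; lines
   longer than the buffer are split, which is acceptable for a sketch.
   Returns the number of words processed. */
int stem_lines(FILE *in, FILE *out, stem_fn stem)
{
    char word[256], stemmed[256];
    int count = 0;
    assert(in != NULL && out != NULL && stem != NULL);
    while (fgets(word, sizeof word, in)) {
        word[strcspn(word, "\r\n")] = '\0';   /* drop the line terminator */
        if (word[0] == '\0')
            continue;                         /* skip blank lines */
        strcpy(stemmed, word);                /* keep the original intact */
        stem(stemmed);
        fprintf(out, "%s\t%s\n", word, stemmed);
        count++;
    }
    return count;
}
```

Keeping the stemmer behind a function pointer separates file handling from the stemming rules, so the same loop works for a default and an aggressive rule set.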

No corpus preprocessing is required to run this stemmer. I have provided a sample input file, sample.txt. Just run ./rbs sample.txt sample.stem to see the output.

NOTE: You can provide an optional third argument to control the aggressiveness of the stemmer. Aggressive mode is off by default; to turn it on, append "1" to the end of the argument list (for example, ./rbs sample.txt sample.stem 1).
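One way to handle the command line described in the note is sketched below. It is a minimal sketch assuming the form `rbs <input> <output> [1]`; the struct `stem_options` and the function `parse_args` are hypothetical names, and treating only the exact string "1" as "aggressive on" is an assumption about how the real program interprets the third argument.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical options mirroring the README's command line:
   rbs <input> <output> [1]                                    */
struct stem_options {
    const char *input_path;
    const char *output_path;
    int aggressive;   /* 0 = default (off), 1 = aggressive mode */
};

/* Fills `opt` from argv. Returns 0 on success, -1 if the number of
   arguments is wrong (fewer than two paths, or more than one flag). */
int parse_args(int argc, char **argv, struct stem_options *opt)
{
    assert(opt != NULL);
    if (argc < 3 || argc > 4)
        return -1;
    opt->input_path = argv[1];
    opt->output_path = argv[2];
    /* Aggressive mode only when the optional third argument is exactly "1". */
    opt->aggressive = (argc == 4 && strcmp(argv[3], "1") == 0);
    return 0;
}
```

Under this sketch, ./rbs sample.txt sample.stem runs in the default mode and ./rbs sample.txt sample.stem 1 enables aggressive mode.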

Repository files (3 commits):

README.md
bn_unicode.h
makefile
rbs.c
rbs.h
sample.stem
sample.txt
stem_bn.c
