๐Ÿ๐Ÿ’ฏpySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence boundary detection that works out-of-the-box.

lianNice/pySBD

This branch is 126 commits behind nipunsadvilkar/pySBD:master.


pySBD: Python Sentence Boundary Disambiguation (SBD)

pySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence boundary detection module that works out-of-the-box.

This project is a direct port of the Ruby gem Pragmatic Segmenter, which provides rule-based sentence boundary detection.

Install

Python

pip install pysbd

Usage

  • Currently, pySBD supports only English. Support for more languages will be released soon.
import pysbd
text = "My name is Jonas E. Smith. Please turn to p. 55."
seg = pysbd.Segmenter(language="en", clean=False)
print(seg.segment(text))
# ['My name is Jonas E. Smith.', 'Please turn to p. 55.']
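
To see why the example text needs rule-based disambiguation, the sketch below compares a naive regex split with a toy rule that refuses to break after single-letter initials or a short abbreviation list. It uses only the standard library. The names `naive_split`, `toy_rule_split`, and `ABBREVIATIONS` are hypothetical and exist only for this illustration; they are not part of pySBD, whose actual rule set is far more extensive.

```python
import re

# Hypothetical abbreviation list, for illustration only (not pySBD's rules).
ABBREVIATIONS = {"p", "pp", "mr", "mrs", "dr"}


def naive_split(text):
    """Split after every '.', '!' or '?' that is followed by whitespace."""
    return re.split(r"(?<=[.!?])\s+", text.strip())


def toy_rule_split(text):
    """Rejoin naive pieces when the break follows an initial or abbreviation."""
    sentences = []
    buffer = ""
    for piece in naive_split(text):
        if not piece:
            continue  # empty input yields an empty piece; skip it
        buffer = f"{buffer} {piece}" if buffer else piece
        last_word = buffer.split()[-1].rstrip(".").lower()
        is_initial = len(last_word) == 1 and last_word.isalpha()
        if is_initial or last_word in ABBREVIATIONS:
            continue  # this period is not treated as a sentence boundary
        sentences.append(buffer)
        buffer = ""
    if buffer:
        sentences.append(buffer)
    return sentences


text = "My name is Jonas E. Smith. Please turn to p. 55."
print(naive_split(text))
# ['My name is Jonas E.', 'Smith.', 'Please turn to p.', '55.']
print(toy_rule_split(text))
# ['My name is Jonas E. Smith.', 'Please turn to p. 55.']
```

The toy rule is deliberately crude: it would wrongly merge a sentence that genuinely ends in a single letter, which is the kind of case a full rule set has to handle.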

Contributing

If you find text that pySBD segments incorrectly, please submit an issue.

  1. Fork it ( https://github.com/nipunsadvilkar/pySBD/fork )
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new Pull Request

Credit

This project wouldn't be possible without the great work done by the Pragmatic Segmenter team.
