The py-strsim library is a wrapper for the fabulous strsim Rust crate. This package
extends the functionality marginally by enabling parallelized versions of each
strsim function using rayon.
It is advised to use a virtual environment for most projects. The instructions
below assume that python is the name of your Python 3 interpreter.
git clone https://github.com/knight9114/py-strsim.git
cd py-strsim
python setup.py install
The py-strsim package has two parts: single and vectorized. The single API
matches the original strsim crate exactly. The vectorized versions of
the functions look slightly different:
strsim.vectorized.<function>(n: int, a: str, bs: list[str]) -> list[int] | list[float]:
...
The first argument, n, specifies the number of threads to use during the
computation. Each element in bs is compared against the input a, with a as the
left argument and the element of bs as the right. The ordering in the output
matches the ordering in the input bs.
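The semantics described above can be modeled in pure Python. This is only a sketch of the behavior, not how py-strsim is implemented (the package delegates to Rust and rayon): the helper names levenshtein and vectorized_levenshtein below are hypothetical stand-ins, and a Python thread pool illustrates the argument order and output ordering but will not reproduce the speedup of the native code.

```python
from concurrent.futures import ThreadPoolExecutor


def levenshtein(a: str, b: str) -> int:
    """Plain dynamic-programming edit distance (stand-in for the Rust routine)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or keep a match
            ))
        prev = curr
    return prev[-1]


def vectorized_levenshtein(n: int, a: str, bs: list[str]) -> list[int]:
    """Hypothetical model: compare a with every element of bs on n threads."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        # Executor.map yields results in input order, whichever task finishes first.
        return list(pool.map(lambda b: levenshtein(a, b), bs))


results = vectorized_levenshtein(2, "hello world", ["Hello, World", "hello world!"])
print(results)
```

Each result lines up with the element of bs at the same index, which is the ordering guarantee the vectorized API describes.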
import strsim
assert strsim.single.levenshtein('hello world', 'Hello, World') == 3
assert strsim.single.normalized_levenshtein('hello world', 'Hello, World') == 0.75
...
assert strsim.vectorized.levenshtein(2, 'hello world', ['Hello, World', 'hello world!']) == [3, 1]
...
My only contribution to this project is writing the Python bindings. All of the credit belongs to