Simple library for detecting gibberish tiles from histopathological whole-slide images (WSI).
By gibberish tiles I mean background tiles and tiles with pen marks or similar artifacts.
The library detects background tiles (based on the Otsu algorithm), red / green / blue pen marks, and black artifacts.
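Background detection with the Otsu algorithm amounts to picking the gray level that best separates dark pixels (tissue) from bright pixels (glass). The following is a minimal NumPy sketch of that idea, not the library's own implementation. The names `otsu_threshold` and `background_fraction` are hypothetical, and the sketch assumes an 8-bit grayscale tile in which background is brighter than tissue.

```python
import numpy as np


def otsu_threshold(gray: np.ndarray) -> int:
    """Return the 8-bit gray level that maximizes between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    levels = np.arange(256)

    # Pixel counts and intensity sums for the dark class (levels <= t)
    # and the bright class (levels > t), for every candidate t at once.
    weight_dark = np.cumsum(hist)
    weight_bright = weight_dark[-1] - weight_dark
    sum_dark = np.cumsum(hist * levels)
    sum_bright = sum_dark[-1] - sum_dark

    # A threshold is only meaningful if both classes are non-empty.
    valid = (weight_dark > 0) & (weight_bright > 0)
    mean_dark = np.divide(sum_dark, weight_dark, out=np.zeros(256), where=valid)
    mean_bright = np.divide(sum_bright, weight_bright, out=np.zeros(256), where=valid)

    between_var = np.where(
        valid, weight_dark * weight_bright * (mean_dark - mean_bright) ** 2, 0.0
    )
    return int(np.argmax(between_var))


def background_fraction(gray: np.ndarray) -> float:
    """Fraction of pixels brighter than the Otsu threshold (assumed background)."""
    return float(np.mean(gray > otsu_threshold(gray)))


# Synthetic 4x4 tile: one dark row (tissue-like), three bright rows (glass-like).
tile = np.full((4, 4), 240, dtype=np.uint8)
tile[0, :] = 50
print(otsu_threshold(tile), background_fraction(tile))
```

A tile whose background fraction is close to 1 contains almost no tissue and is a natural candidate for removal. The cutoff to use is a judgment call that depends on the dataset.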
The typical use case for wsi_tile_cleanup is preprocessing whole-slide images (WSI) before loading tiles into a neural network (yes, deep learning).
If you are building a deep learning pipeline, the following repositories might be of interest: @lucasrla/wsi-preprocessing and @lucasrla/wsi-preprocessing-sos-workflow.
conda create --name YOUR_ENV_NAME --channel conda-forge python=3.6 libvips pyvips numpy
conda activate YOUR_ENV_NAME
python3.6 -m pip install git+
# note: `python3.6 -m pip` is to make sure we are using pip from python=3.6
# alternatively, without conda: first of all, install libvips
# (tip: have it installed with openslide support)
# next, create a new virtualenv and activate it using your tool of choice
# (e.g., pyenv, virtualenv, etc)
# then, depending on your dependency manager, run either:
poetry add git+
# or
pip install git+
from wsi_tile_cleanup import filters, utils
img = utils.read_image("data/images/tiles/5.jpeg")
bands = utils.split_rgb(img)
colors = ["red", "green", "blue"]
for color in colors:
    perc = filters.pen_percentage(bands, color)
    print(f"{color}: {perc:.5f}")
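The loop above prints one pen percentage per color. One way to act on those numbers is to discard any tile where a pen color covers more than a chosen share of the pixels. The sketch below is only an illustration: `should_discard` and the 0.05 cutoff are hypothetical and not part of wsi_tile_cleanup, and it assumes the percentages are fractions between 0 and 1.

```python
from typing import Dict


def should_discard(pen_fractions: Dict[str, float], max_fraction: float = 0.05) -> bool:
    """Return True if any pen color covers more than max_fraction of the tile."""
    return any(frac > max_fraction for frac in pen_fractions.values())


# Hypothetical values, shaped like the per-color results of the loop above.
clean_tile = {"red": 0.0, "green": 0.00012, "blue": 0.0}
marked_tile = {"red": 0.0, "green": 0.0, "blue": 0.31}

print(should_discard(clean_tile))
print(should_discard(marked_tile))
```

Keeping the cutoff as a parameter makes it easy to tune per dataset, since the amount of ink that is tolerable depends on the staining and on the downstream model.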
See also:
Please note that wsi_tile_cleanup is just a very thin wrapper around libvips, pyvips and numpy. They are the ones doing the heavy lifting (and doing it amazingly well).
libvips: A fast image processing library with low memory needs. The official Python bindings are called pyvips.
NumPy: The fundamental package for scientific computing with Python.
deep-histopath: I ported some of their filters to this library. If you are interested in a preprocessing pipeline for deep learning, check out their nice write-up.
scikit-image: I ported their implementation of the Otsu algorithm to this library.
Cancer Digital Slide Archive: TCGA slides hosted online by the Winship Cancer Institute at Emory University.
wsi_tile_cleanup is Free Software distributed under the GNU General Public License v3.0.
Dependencies have their own licenses; check them out.