OlivierBinette/README.md

Hi 👋, I'm Olivier

Data Scientist at American Institutes for Research

PhD in Statistical Science from Duke University

🤓 Some stats

🧑‍💻 Some of the software I've worked on

  • ER-Evaluation (Python package on PyPI): Toolkit for the (statistical) evaluation of entity resolution, record linkage, and micro-clustering models.

  • StreamlitSurvey (Python package on PyPI): Streamlit apps can be a bit tricky to get right, especially if you want multi-page forms/surveys that keep track of progress and can be restored from a saved state. This Streamlit component makes it easy to incorporate more complex forms in your apps.

  • StringCompare (Python/C++ package): Most implementations of the Levenshtein distance are pretty bad. Here's how the basic algorithm can be implemented with minimal time and memory complexity.

  • CSVMeta (Python package on PyPI): You know how Excel has been repeatedly blamed for high-profile disasters? Well, CSV files are part of the problem. If you're going to use CSV files, you need to specify your CSV dialect in a metadata file. This package reads and writes CSV files with attached dialect information.

  • assert (R package on CRAN): Informative assertions in 16 lines of code with some R introspection magic (no dependencies!)

  • cache (R package on CRAN): On-disk caching for data science workflows in 15 lines of code (no dependencies!)

  • dgaFast/dga (R package on CRAN): I needed the dga R package (that I now maintain) to go fast. I re-implemented bottlenecks in C++ and did some linear algebra magic for a 100x speedup.

  • Fractals (JavaScript App): More than 10 years ago, I took a deep dive into browser multithreading to create a JavaScript Mandelbrot set visualization app.

  • Welcome to the Moon App (JavaScript App): I love board games! I created this app as a companion to the wonderful Welcome to the Moon board game.

  • FingermatchR (R & C package): Ever needed an R wrapper for fingerprint-matching C libraries? No? Well, a Duke postdoc did and sweet-talked me into building this... It may not be the most useful thing, but it was interesting!
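
To illustrate the StringCompare entry above, here is a minimal sketch of the classic two-row dynamic program for the Levenshtein distance. It is not taken from StringCompare itself; the function name `levenshtein` is an assumption made for illustration. The sketch runs in time proportional to `len(s) * len(t)` and keeps only two rows of the table in memory, sized by the shorter string.

```python
def levenshtein(s: str, t: str) -> int:
    """Edit distance between s and t (illustrative sketch, not StringCompare's code)."""
    # Put the shorter string along the row so memory is O(min(len(s), len(t))).
    if len(s) < len(t):
        s, t = t, s

    # previous[j] holds the distance between the processed prefix of s and t[:j].
    previous = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        current = [i]  # distance from s[:i] to the empty prefix of t
        for j, ct in enumerate(t, start=1):
            substitution = previous[j - 1] + (cs != ct)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]
```

Calling `levenshtein("kitten", "sitting")` returns 3 (two substitutions and one insertion).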

🧰 Some tools I use

Python R C C++ Java bash JavaScript DuckDB AWS Docker Elasticsearch Git GitHub Actions conda Linux VS Code

NumPy Pandas Plotly scikit-learn PyTorch statsmodels

Pytest

Pinned

  1. Awesome-Entity-Resolution

    List of entity resolution software and resources.

    42 stars, 2 forks

  2. er-evaluation

    An End-to-End Evaluation Framework for Entity Resolution Systems

    Python, 26 stars, 2 forks

  3. streamlit-survey

    Survey components for Streamlit apps

    Python, 21 stars, 13 forks

  4. StringCompare

    Efficient String Comparison Functions and Fuzzy String Matching

    Python, 17 stars, 2 forks

  5. CSVMeta

    Lightweight CSV read/write, keeping track of CSV dialect and other metadata.

    Python, 1 star, 1 fork

  6. cache

    Easily cache and retrieve computation results in R

    R, 6 stars, 1 fork
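
The idea behind CSVMeta, keeping the CSV dialect next to the data, can be sketched with only the Python standard library. This is not CSVMeta's API or file layout: the function names `write_with_dialect` and `read_with_dialect` and the `.dialect.json` sidecar file are assumptions made for illustration.

```python
import csv
import json
from pathlib import Path

# Standard csv.Dialect attributes saved to the (hypothetical) sidecar file.
DIALECT_FIELDS = ("delimiter", "quotechar", "escapechar", "doublequote",
                  "skipinitialspace", "lineterminator", "quoting")


def sidecar(path: Path) -> Path:
    """Hypothetical naming scheme: data.csv -> data.csv.dialect.json."""
    return path.with_name(path.name + ".dialect.json")


def write_with_dialect(path, rows, **dialect):
    """Write rows as CSV and record the dialect actually used by the writer."""
    path = Path(path)
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, **dialect)
        writer.writerows(rows)
        used = {name: getattr(writer.dialect, name) for name in DIALECT_FIELDS}
    sidecar(path).write_text(json.dumps(used, indent=2), encoding="utf-8")


def read_with_dialect(path):
    """Read a CSV file using the dialect stored in its sidecar file."""
    path = Path(path)
    dialect = json.loads(sidecar(path).read_text(encoding="utf-8"))
    with path.open(newline="", encoding="utf-8") as f:
        return list(csv.reader(f, **dialect))
```

With this sketch, a file written with `write_with_dialect("data.csv", rows, delimiter=";")` is read back by `read_with_dialect("data.csv")` without the reader having to guess the delimiter or quoting rules.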