Stars
Official repository for the implementation of HyperBitT, HyperBitBit64, and HyperTwoBits
Reference implementation of the Affirmative Sampling algorithm by Jérémie Lumbroso and Conrado Martínez (2022). 🍀
A simple, time-tested family of random hash functions in Python, based on CRC32 and xxHash, affine transformations, and the Mersenne Twister. 🎲
Generate random critical Galton-Watson trees and display their free trees in a force-directed layout
HyperMinHash: Bringing intersections to HyperLogLog
HyperLogLog with lots of sugar (Sparse, LogLog-Beta bias correction and TailCut space reduction) brought to you by Axiom
SuperMinHash: A New Minwise Hashing Algorithm for Jaccard Similarity Estimation
A test project for the proposed HyperBitBit algorithm.
Implementation of 'Recordinality' cardinality estimation sketch with distinct value sampling
Introduction to common Probabilistic Algorithms: Approximate Counting, Flajolet-Martin, LogLog, HyperLogLog, Bloom Filters
An implementation of the Recordinality sketch in Python, with a CLI
A book about the hard computer science problems that are at the core of creating accessible technology.
A short book on teaching in the University of Washington Informatics undergraduate program.
A short book on cooperative aspects of software engineering.
CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the builtin CSV module with improved dialect detection, and comes with a handy command line applica…
Four-year course planner for Princeton University students
My bachelor thesis. Design, implementation, and experimentation of the LotterySampling algorithm (among others) to solve the Heavy Hitters and Top-k problems in large streams
Supplementary material for solving projects assigned in The Data Mine.
Provides a convenient Python interface to the codePost API. Start scripting!
Python CSV, and delimiter-spaced files, for humans!
Proof-of-concept implementation of SortedMap via median-of-k jumplists
Companion code for the paper "Sesquickselect: One and a half pivots for cache-efficient selection"
Algorithms, 4th edition textbook code and libraries