Go Data Science Tooling, Packages, Libraries, etc.

This is a curated list of well-maintained and developing tools, packages, libraries, etc. related to doing data science with Go.

Also, this space includes a list of proposed packages that would fill certain gaps in the ecosystem or provide enhanced functionality.

Arithmetic

Bioinformatics

Classification

Clustering

  • github.com/salkj/kmeans - A ready-to-use naive k-means package for Go.
  • github.com/mpraski/clusters - Go implementations of several clustering algorithms (k-means++, DBSCAN, OPTICS), as well as utilities for importing data and estimating the optimal number of clusters.

CSV

Distributed Data Analysis/Pipelining

Geospatial

General data munging

General purpose machine learning

Graphs

JSON

I/O

Matrices/Arrays/Linear Algebra

Neural Networks

NLP

Non-SQL Database Interactions

Parquet

Plotting/dashboarding

Probability/statistics/experiments

Recommendation Systems

Regression

SQL-like Database Interactions

Time Series

Web Scraping

Proposed

  • Multi-dimensional slices within Go itself (Proposal).
  • A robust (and concurrent) package to handle minimizations/fits of data and histograms (gonum/optimize would provide a nice foundation for this).
  • A robust (and concurrent) package to describe statistical models (Bayesian and frequentist) with many nuisance parameters, etc.
  • A Go-native package for A/B testing.
  • A database with Datalog querying. This could draw inspiration from Rich Hickey's Datomic database, but it should be open source.
  • A Datalog query system for distributed computation, similar to Cascalog for the Hadoop ecosystem but integrated with Go tools instead.