ibis-project/ibis

Ibis

What is Ibis?

Ibis is the portable Python dataframe library.

See the documentation on "Why Ibis?" to learn more.

Getting started

You can pip install Ibis with a backend and example data:

pip install 'ibis-framework[duckdb,examples]'

💡 Tip

See the installation guide for more installation options.

Then use Ibis:

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string  │ string    │ float64        │ float64       │ int64             │ int64       │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie  │ Torgersen │           39.1 │          18.7 │               181 │        3750 │ male   │  2007 │
│ Adelie  │ Torgersen │           39.5 │          17.4 │               186 │        3800 │ female │  2007 │
│ Adelie  │ Torgersen │           40.3 │          18.0 │               195 │        3250 │ female │  2007 │
│ Adelie  │ Torgersen │           NULL │          NULL │              NULL │        NULL │ NULL   │  2007 │
│ Adelie  │ Torgersen │           36.7 │          19.3 │               193 │        3450 │ female │  2007 │
│ Adelie  │ Torgersen │           39.3 │          20.6 │               190 │        3650 │ male   │  2007 │
│ Adelie  │ Torgersen │           38.9 │          17.8 │               181 │        3625 │ female │  2007 │
│ Adelie  │ Torgersen │           39.2 │          19.6 │               195 │        4675 │ male   │  2007 │
│ Adelie  │ Torgersen │           34.1 │          18.1 │               193 │        3475 │ NULL   │  2007 │
│ Adelie  │ Torgersen │           42.0 │          20.2 │               190 │        4250 │ NULL   │  2007 │
│ …       │ …         │              … │             … │                 … │           … │ …      │     … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
>>> g = t.group_by("species", "island").agg(count=t.count()).order_by("count")
>>> g
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┓
┃ species   ┃ island    ┃ count ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━┩
│ string    │ string    │ int64 │
├───────────┼───────────┼───────┤
│ Adelie    │ Biscoe    │    44 │
│ Adelie    │ Torgersen │    52 │
│ Adelie    │ Dream     │    56 │
│ Chinstrap │ Dream     │    68 │
│ Gentoo    │ Biscoe    │   124 │
└───────────┴───────────┴───────┘

💡 Tip

See the getting started tutorial for a full introduction to Ibis.

Python + SQL: better together

For most backends, Ibis works by compiling its dataframe expressions into SQL:

>>> ibis.to_sql(g)
SELECT
  "t1"."species",
  "t1"."island",
  "t1"."count"
FROM (
  SELECT
    "t0"."species",
    "t0"."island",
    COUNT(*) AS "count"
  FROM "penguins" AS "t0"
  GROUP BY
    1,
    2
) AS "t1"
ORDER BY
  "t1"."count" ASC

You can mix SQL and Python code:

>>> a = t.sql("SELECT species, island, count(*) AS count FROM penguins GROUP BY 1, 2")
>>> a
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┓
┃ species   ┃ island    ┃ count ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━┩
│ string    │ string    │ int64 │
├───────────┼───────────┼───────┤
│ Adelie    │ Torgersen │    52 │
│ Adelie    │ Biscoe    │    44 │
│ Adelie    │ Dream     │    56 │
│ Gentoo    │ Biscoe    │   124 │
│ Chinstrap │ Dream     │    68 │
└───────────┴───────────┴───────┘
>>> b = a.order_by("count")
>>> b
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┓
┃ species   ┃ island    ┃ count ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━┩
│ string    │ string    │ int64 │
├───────────┼───────────┼───────┤
│ Adelie    │ Biscoe    │    44 │
│ Adelie    │ Torgersen │    52 │
│ Adelie    │ Dream     │    56 │
│ Chinstrap │ Dream     │    68 │
│ Gentoo    │ Biscoe    │   124 │
└───────────┴───────────┴───────┘

This allows you to combine the flexibility of Python with the scale and performance of modern SQL.

Backends

Ibis supports nearly 20 backends.

How it works

Most Python dataframe libraries are tightly coupled to their execution engine, and many databases support only SQL, with no Python API. Ibis addresses both problems by providing a common Python API for data manipulation and compiling it into each backend's native language. This means you can learn a single API and use it across any supported backend (execution engine).

Ibis broadly supports two types of backend:

  1. SQL-generating backends
  2. DataFrame-generating backends
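
To make the distinction concrete, here is a toy sketch using only the Python standard library. It is not how Ibis is implemented: `GroupCount`, `compile_to_sql`, and `execute_in_memory` are hypothetical names invented for this illustration. One small expression object is either compiled to a SQL string (run here with `sqlite3`) or evaluated directly on in-memory rows, and both paths give the same answer.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupCount:
    """Hypothetical expression: count rows of `table` per value of `key`."""
    table: str
    key: str


def compile_to_sql(expr: GroupCount) -> str:
    """SQL-generating path: turn the expression into a query string."""
    return (
        f'SELECT "{expr.key}", COUNT(*) AS "count" '
        f'FROM "{expr.table}" GROUP BY "{expr.key}"'
    )


def execute_in_memory(expr: GroupCount, tables: dict) -> dict:
    """DataFrame-style path: evaluate the expression on Python rows."""
    counts = {}
    for row in tables[expr.table]:
        counts[row[expr.key]] = counts.get(row[expr.key], 0) + 1
    return counts


expr = GroupCount(table="penguins", key="species")
rows = [{"species": "Adelie"}, {"species": "Gentoo"}, {"species": "Adelie"}]

# Path 1: compile to SQL and let a database engine execute it.
con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE "penguins" ("species" TEXT)')
con.executemany('INSERT INTO "penguins" VALUES (?)', [(r["species"],) for r in rows])
sql_counts = dict(con.execute(compile_to_sql(expr)).fetchall())

# Path 2: evaluate the same expression directly in Python.
memory_counts = execute_in_memory(expr, {"penguins": rows})

print(sql_counts == memory_counts)
```

The point of the sketch is that user code only ever builds `expr`; which engine evaluates it is a separate decision.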

Portability

To use different backends, you can set the backend Ibis uses:

>>> ibis.set_backend("duckdb")
>>> ibis.set_backend("polars")
>>> ibis.set_backend("datafusion")

Typically, you'll create a connection object:

>>> con = ibis.duckdb.connect()
>>> con = ibis.polars.connect()
>>> con = ibis.datafusion.connect()

And work with tables in that backend:

>>> con.list_tables()
['penguins']
>>> t = con.table("penguins")

You can also read from common file formats like CSV or Apache Parquet:

>>> t = con.read_csv("penguins.csv")
>>> t = con.read_parquet("penguins.parquet")

This allows you to iterate locally and deploy remotely by changing a single line of code.

💡 Tip

Check out the blog on backend agnostic arrays for one example using the same code across DuckDB and BigQuery.

Community and contributing

Ibis is an open source project and welcomes contributions from anyone in the community.

Join our community by interacting on GitHub or chatting with us on Zulip.

For more information visit https://ibis-project.org/.

Governance

The Ibis project is an independently governed open source community project to build and maintain the portable Python dataframe library. Ibis has contributors across a range of data companies and institutions.