Table expressions

Tables are one of the core data structures in Ibis.

Table

Table(self, arg)

An immutable and lazy dataframe.

Analogous to a SQL table or a pandas DataFrame. A table expression contains an ordered set of named columns, each with a single known type. Unless explicitly ordered with an .order_by(), the order of rows is undefined.

Table immutability means that the data underlying an Ibis Table cannot be modified: every method on a Table returns a new Table with those changes. Laziness means that an Ibis Table expression does not run your computation every time you call one of its methods. Instead, it is a symbolic expression that represents a set of operations to be performed, which typically is translated into a SQL query. That SQL query is then executed on a backend, where the data actually lives. The result (now small enough to be manageable) can then be materialized back into python as a pandas/pyarrow/python DataFrame/Column/scalar.

You will not create Table objects directly. Instead, you will create one

from a pandas DataFrame, pyarrow table, Polars table, or raw python dicts/lists with ibis.memtable(df)
from an existing table in a data platform with connection.table("name")
from a file or URL, into a specific backend with connection.read_csv/parquet/json("path/to/file") (only some backends, typically local ones, support this)
from a file or URL, into the default backend with ibis.read_csv/read_json/read_parquet("path/to/file")

See the user guide for more info.

Methods

Name	Description
alias	Create a table expression with a specific name `alias`.
as_scalar	Inform ibis that the table expression should be treated as a scalar.
count	Compute the number of rows in the table.
difference	Compute the set difference of multiple table expressions.
distinct	Return a Table with duplicate rows removed.
dropna	Deprecated - use `drop_null` instead.
fillna	Deprecated - use `fill_null` instead.
filter	Select rows from `table` based on `predicates`.
intersect	Compute the set intersection of multiple table expressions.
limit	Select `n` rows from `self` starting at `offset`.
order_by	Sort a table by one or more expressions.
sample	Sample a fraction of rows from a table.
select	Compute a new table expression using `exprs` and `named_exprs`.
sql	Run a SQL query against a table expression.
union	Compute the set union of multiple table expressions.
view	Create a new table expression distinct from the current one.

alias

alias(alias)

Create a table expression with a specific name alias.

This method is useful for exposing an ibis expression to the underlying backend for use in the Table.sql method.

.alias will create a temporary view

.alias creates a temporary view in the database.

This side effect will be removed in a future version of ibis and is not part of the public API.

Parameters

Name	Type	Description	Default
alias	str	Name of the child expression	required

Returns

Name	Type	Description
	Table	An table expression

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = letsql.examples.penguins.fetch()
>>> expr = t.alias("pingüinos").sql('SELECT * FROM "pingüinos" LIMIT 5')
>>> expr

┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string  │ string    │ float64        │ float64       │ float64           │ float64     │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie  │ Torgersen │           39.1 │          18.7 │             181.0 │      3750.0 │ male   │  2007 │
│ Adelie  │ Torgersen │           39.5 │          17.4 │             186.0 │      3800.0 │ female │  2007 │
│ Adelie  │ Torgersen │           40.3 │          18.0 │             195.0 │      3250.0 │ female │  2007 │
│ Adelie  │ Torgersen │           NULL │          NULL │              NULL │        NULL │ NULL   │  2007 │
│ Adelie  │ Torgersen │           36.7 │          19.3 │             193.0 │      3450.0 │ female │  2007 │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

as_scalar

as_scalar()

Inform ibis that the table expression should be treated as a scalar.

Note that the table must have exactly one column and one row for this to work. If the table has more than one column an error will be raised in expression construction time. If the table has more than one row an error will be raised by the backend when the expression is executed.

Returns

Name	Type	Description
	Scalar	A scalar subquery

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = letsql.examples.penguins.fetch()
>>> heavy_gentoo = t.filter(t.species == "Gentoo", t.body_mass_g > 6200)
>>> from_that_island = t.filter(t.island == heavy_gentoo.select("island").as_scalar())
>>> from_that_island.species.value_counts().order_by("species")

┏━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ species ┃ species_count ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ string  │ int64         │
├─────────┼───────────────┤
│ Adelie  │            44 │
│ Gentoo  │           124 │
└─────────┴───────────────┘

count

count(where=None)

Compute the number of rows in the table.

Parameters

Name	Type	Description	Default
where	ir.BooleanValue \| None	Optional boolean expression to filter rows when counting.	`None`

Returns

Name	Type	Description
	IntegerScalar	Number of rows in the table

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"a": ["foo", "bar", "baz"]})
>>> t

┏━━━━━━━━┓
┃ a      ┃
┡━━━━━━━━┩
│ string │
├────────┤
│ foo    │
│ bar    │
│ baz    │
└────────┘

>>> t.count()

┌───┐
│ 3 │
└───┘

>>> t.count(t.a != "foo")

┌───┐
│ 2 │
└───┘

>>> type(t.count())

ibis.expr.types.numeric.IntegerScalar

difference

difference(table, *rest, distinct=True)

Compute the set difference of multiple table expressions.

The input tables must have identical schemas.

Parameters

Name	Type	Description	Default
table	Table	A table expression	required
*rest	Table	Additional table expressions	`()`
distinct	bool	Only diff distinct rows not occurring in the calling table	`True`

Returns

Name	Type	Description
	Table	The rows present in `self` that are not present in `tables`.

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t1 = ibis.memtable({"a": [1, 2]})
>>> t1

┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     1 │
│     2 │
└───────┘

>>> t2 = ibis.memtable({"a": [2, 3]})
>>> t2

┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     2 │
│     3 │
└───────┘

>>> t1.difference(t2)

┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     1 │
└───────┘

distinct

distinct(on=None, keep='first')

Return a Table with duplicate rows removed.

Similar to pandas.DataFrame.drop_duplicates().

Some backends do not support keep='last'

Parameters

Name	Type	Description	Default
on	str \| Iterable[str] \| s.Selector \| None	Only consider certain columns for identifying duplicates. By default deduplicate all of the columns.	`None`
keep	Literal['first', 'last'] \| None	Determines which duplicates to keep. - `"first"`: Drop duplicates except for the first occurrence. - `"last"`: Drop duplicates except for the last occurrence. - `None`: Drop all duplicates	`'first'`

Examples

>>> import ibis
>>> import letsql.examples as ex
>>> import ibis.selectors as s
>>> ibis.options.interactive = True
>>> t = ex.penguins.fetch()
>>> t

┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string  │ string    │ float64        │ float64       │ float64           │ float64     │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie  │ Torgersen │           39.1 │          18.7 │             181.0 │      3750.0 │ male   │  2007 │
│ Adelie  │ Torgersen │           39.5 │          17.4 │             186.0 │      3800.0 │ female │  2007 │
│ Adelie  │ Torgersen │           40.3 │          18.0 │             195.0 │      3250.0 │ female │  2007 │
│ Adelie  │ Torgersen │           NULL │          NULL │              NULL │        NULL │ NULL   │  2007 │
│ Adelie  │ Torgersen │           36.7 │          19.3 │             193.0 │      3450.0 │ female │  2007 │
│ Adelie  │ Torgersen │           39.3 │          20.6 │             190.0 │      3650.0 │ male   │  2007 │
│ Adelie  │ Torgersen │           38.9 │          17.8 │             181.0 │      3625.0 │ female │  2007 │
│ Adelie  │ Torgersen │           39.2 │          19.6 │             195.0 │      4675.0 │ male   │  2007 │
│ Adelie  │ Torgersen │           34.1 │          18.1 │             193.0 │      3475.0 │ NULL   │  2007 │
│ Adelie  │ Torgersen │           42.0 │          20.2 │             190.0 │      4250.0 │ NULL   │  2007 │
│ …       │ …         │              … │             … │                 … │           … │ …      │     … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

Compute the distinct rows of a subset of columns

>>> t[["species", "island"]].distinct().order_by(s.all())

┏━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ species   ┃ island    ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━┩
│ string    │ string    │
├───────────┼───────────┤
│ Adelie    │ Biscoe    │
│ Adelie    │ Dream     │
│ Adelie    │ Torgersen │
│ Chinstrap │ Dream     │
│ Gentoo    │ Biscoe    │
└───────────┴───────────┘

Drop all duplicate rows except the first

>>> t.distinct(on=["species", "island"], keep="first").order_by(s.all())

┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species   ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string    │ string    │ float64        │ float64       │ float64           │ float64     │ string │ int64 │
├───────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie    │ Biscoe    │           37.8 │          18.3 │             174.0 │      3400.0 │ female │  2007 │
│ Adelie    │ Dream     │           39.5 │          16.7 │             178.0 │      3250.0 │ female │  2007 │
│ Adelie    │ Torgersen │           39.1 │          18.7 │             181.0 │      3750.0 │ male   │  2007 │
│ Chinstrap │ Dream     │           46.5 │          17.9 │             192.0 │      3500.0 │ female │  2007 │
│ Gentoo    │ Biscoe    │           46.1 │          13.2 │             211.0 │      4500.0 │ female │  2007 │
└───────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

Drop all duplicate rows except the last

>>> t.distinct(on=["species", "island"], keep="last").order_by(s.all())

┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species   ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string    │ string    │ float64        │ float64       │ float64           │ float64     │ string │ int64 │
├───────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie    │ Biscoe    │           42.7 │          18.3 │             196.0 │      4075.0 │ male   │  2009 │
│ Adelie    │ Dream     │           41.5 │          18.5 │             201.0 │      4000.0 │ male   │  2009 │
│ Adelie    │ Torgersen │           43.1 │          19.2 │             197.0 │      3500.0 │ male   │  2009 │
│ Chinstrap │ Dream     │           50.2 │          18.7 │             198.0 │      3775.0 │ female │  2009 │
│ Gentoo    │ Biscoe    │           49.9 │          16.1 │             213.0 │      5400.0 │ male   │  2009 │
└───────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

Drop all duplicated rows

>>> expr = t.distinct(on=["species", "island", "year", "bill_length_mm"], keep=None)
>>> expr.count()

┌─────┐
│ 273 │
└─────┘

>>> t.count()

┌─────┐
│ 344 │
└─────┘

You can pass selectors to on

>>> t.distinct(on=~s.numeric())

┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species   ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string    │ string    │ float64        │ float64       │ float64           │ float64     │ string │ int64 │
├───────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Chinstrap │ Dream     │           46.5 │          17.9 │             192.0 │      3500.0 │ female │  2007 │
│ Chinstrap │ Dream     │           50.0 │          19.5 │             196.0 │      3900.0 │ male   │  2007 │
│ Adelie    │ Biscoe    │           37.8 │          18.3 │             174.0 │      3400.0 │ female │  2007 │
│ Adelie    │ Biscoe    │           37.7 │          18.7 │             180.0 │      3600.0 │ male   │  2007 │
│ Adelie    │ Dream     │           39.5 │          16.7 │             178.0 │      3250.0 │ female │  2007 │
│ Adelie    │ Dream     │           37.2 │          18.1 │             178.0 │      3900.0 │ male   │  2007 │
│ Adelie    │ Dream     │           37.5 │          18.9 │             179.0 │      2975.0 │ NULL   │  2007 │
│ Adelie    │ Torgersen │           39.1 │          18.7 │             181.0 │      3750.0 │ male   │  2007 │
│ Adelie    │ Torgersen │           39.5 │          17.4 │             186.0 │      3800.0 │ female │  2007 │
│ Adelie    │ Torgersen │           34.1 │          18.1 │             193.0 │      3475.0 │ NULL   │  2007 │
│ …         │ …         │              … │             … │                 … │           … │ …      │     … │
└───────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

The only valid values of keep are "first", "last" and .

>>> t.distinct(on="species", keep="second")

IbisError: Invalid value for `keep`: 'second', must be 'first', 'last' or None

dropna

dropna(subset=None, how='any')

Deprecated - use drop_null instead.

fillna

fillna(replacements)

Deprecated - use fill_null instead.

filter

filter(*predicates)

Select rows from table based on predicates.

Parameters

Name	Type	Description	Default
predicates	ir.BooleanValue \| Sequence[ir.BooleanValue] \| IfAnyAll	Boolean value expressions used to select rows in `table`.	`()`

Returns

Name	Type	Description
	Table	Filtered table expression

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = letsql.examples.penguins.fetch()
>>> t

┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string  │ string    │ float64        │ float64       │ float64           │ float64     │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie  │ Torgersen │           39.1 │          18.7 │             181.0 │      3750.0 │ male   │  2007 │
│ Adelie  │ Torgersen │           39.5 │          17.4 │             186.0 │      3800.0 │ female │  2007 │
│ Adelie  │ Torgersen │           40.3 │          18.0 │             195.0 │      3250.0 │ female │  2007 │
│ Adelie  │ Torgersen │           NULL │          NULL │              NULL │        NULL │ NULL   │  2007 │
│ Adelie  │ Torgersen │           36.7 │          19.3 │             193.0 │      3450.0 │ female │  2007 │
│ Adelie  │ Torgersen │           39.3 │          20.6 │             190.0 │      3650.0 │ male   │  2007 │
│ Adelie  │ Torgersen │           38.9 │          17.8 │             181.0 │      3625.0 │ female │  2007 │
│ Adelie  │ Torgersen │           39.2 │          19.6 │             195.0 │      4675.0 │ male   │  2007 │
│ Adelie  │ Torgersen │           34.1 │          18.1 │             193.0 │      3475.0 │ NULL   │  2007 │
│ Adelie  │ Torgersen │           42.0 │          20.2 │             190.0 │      4250.0 │ NULL   │  2007 │
│ …       │ …         │              … │             … │                 … │           … │ …      │     … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

>>> t.filter([t.species == "Adelie", t.body_mass_g > 3500]).sex.value_counts().drop_null(
...     "sex"
... ).order_by("sex")

┏━━━━━━━━┳━━━━━━━━━━━┓
┃ sex    ┃ sex_count ┃
┡━━━━━━━━╇━━━━━━━━━━━┩
│ string │ int64     │
├────────┼───────────┤
│ female │        22 │
│ male   │        68 │
└────────┴───────────┘

intersect

intersect(table, *rest, distinct=True)

Compute the set intersection of multiple table expressions.

The input tables must have identical schemas.

Parameters

Name	Type	Description	Default
table	Table	A table expression	required
*rest	Table	Additional table expressions	`()`
distinct	bool	Only return distinct rows	`True`

Returns

Name	Type	Description
	Table	A new table containing the intersection of all input tables.

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t1 = ibis.memtable({"a": [1, 2]})
>>> t1

┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     1 │
│     2 │
└───────┘

>>> t2 = ibis.memtable({"a": [2, 3]})
>>> t2

┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     2 │
│     3 │
└───────┘

>>> t1.intersect(t2)

┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     2 │
└───────┘

limit

limit(n, offset=0)

Select n rows from self starting at offset.

The result set is not deterministic without a call to order_by.

Parameters

Name	Type	Description	Default
n	int \| None	Number of rows to include. If `None`, the entire table is selected starting from `offset`.	required
offset	int	Number of rows to skip first	`0`

Returns

Name	Type	Description
	Table	The first `n` rows of `self` starting at `offset`

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"a": [1, 1, 2], "b": ["c", "a", "a"]})
>>> t

┏━━━━━━━┳━━━━━━━━┓
┃ a     ┃ b      ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
│     1 │ c      │
│     1 │ a      │
│     2 │ a      │
└───────┴────────┘

>>> t.limit(2)

┏━━━━━━━┳━━━━━━━━┓
┃ a     ┃ b      ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
│     1 │ c      │
│     1 │ a      │
└───────┴────────┘

You can use None with offset to slice starting from a particular row

>>> t.limit(None, offset=1)

┏━━━━━━━┳━━━━━━━━┓
┃ a     ┃ b      ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
│     1 │ a      │
│     2 │ a      │
└───────┴────────┘

order_by

order_by(*by)

Sort a table by one or more expressions.

Similar to pandas.DataFrame.sort_values().

Parameters

Name	Type	Description	Default
by	str \| ir.Column \| s.Selector \| Sequence[str] \| Sequence[ir.Column] \| Sequence[s.Selector] \| None	Expressions to sort the table by.	`()`

Returns

Name	Type	Description
	Table	Sorted table

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable(
...     {
...         "a": [3, 2, 1, 3],
...         "b": ["a", "B", "c", "D"],
...         "c": [4, 6, 5, 7],
...     }
... )
>>> t

┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a     ┃ b      ┃ c     ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼────────┼───────┤
│     3 │ a      │     4 │
│     2 │ B      │     6 │
│     1 │ c      │     5 │
│     3 │ D      │     7 │
└───────┴────────┴───────┘

Sort by b. Default is ascending. Note how capital letters come before lowercase

>>> t.order_by("b")

┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a     ┃ b      ┃ c     ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼────────┼───────┤
│     2 │ B      │     6 │
│     3 │ D      │     7 │
│     3 │ a      │     4 │
│     1 │ c      │     5 │
└───────┴────────┴───────┘

Sort in descending order

>>> t.order_by(ibis.desc("b"))

┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a     ┃ b      ┃ c     ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼────────┼───────┤
│     1 │ c      │     5 │
│     3 │ a      │     4 │
│     3 │ D      │     7 │
│     2 │ B      │     6 │
└───────┴────────┴───────┘

You can also use the deferred API to get the same result

>>> from ibis import _
>>> t.order_by(_.b.desc())

┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a     ┃ b      ┃ c     ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼────────┼───────┤
│     1 │ c      │     5 │
│     3 │ a      │     4 │
│     3 │ D      │     7 │
│     2 │ B      │     6 │
└───────┴────────┴───────┘

Sort by multiple columns/expressions

>>> t.order_by(["a", _.c.desc()])

┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a     ┃ b      ┃ c     ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼────────┼───────┤
│     1 │ c      │     5 │
│     2 │ B      │     6 │
│     3 │ D      │     7 │
│     3 │ a      │     4 │
└───────┴────────┴───────┘

You can actually pass arbitrary expressions to use as sort keys. For example, to ignore the case of the strings in column b

>>> t.order_by(_.b.lower())

┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a     ┃ b      ┃ c     ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼────────┼───────┤
│     3 │ a      │     4 │
│     2 │ B      │     6 │
│     1 │ c      │     5 │
│     3 │ D      │     7 │
└───────┴────────┴───────┘

This means that shuffling a Table is super simple

>>> t.order_by(ibis.random())

┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a     ┃ b      ┃ c     ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼────────┼───────┤
│     2 │ B      │     6 │
│     1 │ c      │     5 │
│     3 │ a      │     4 │
│     3 │ D      │     7 │
└───────┴────────┴───────┘

Selectors are allowed as sort keys and are a concise way to sort by multiple columns matching some criteria

>>> import ibis.selectors as s
>>> penguins = letsql.examples.penguins.fetch()
>>> penguins[["year", "island"]].value_counts().order_by(s.startswith("year"))

┏━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ year  ┃ island    ┃ year_island_count ┃
┡━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ int64 │ string    │ int64             │
├───────┼───────────┼───────────────────┤
│  2007 │ Torgersen │                20 │
│  2007 │ Biscoe    │                44 │
│  2007 │ Dream     │                46 │
│  2008 │ Torgersen │                16 │
│  2008 │ Dream     │                34 │
│  2008 │ Biscoe    │                64 │
│  2009 │ Torgersen │                16 │
│  2009 │ Dream     │                44 │
│  2009 │ Biscoe    │                60 │
└───────┴───────────┴───────────────────┘

Use the across selector to apply a specific order to multiple columns

>>> penguins[["year", "island"]].value_counts().order_by(
...     s.across(s.startswith("year"), _.desc())
... )

┏━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ year  ┃ island    ┃ year_island_count ┃
┡━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ int64 │ string    │ int64             │
├───────┼───────────┼───────────────────┤
│  2009 │ Biscoe    │                60 │
│  2009 │ Dream     │                44 │
│  2009 │ Torgersen │                16 │
│  2008 │ Biscoe    │                64 │
│  2008 │ Dream     │                34 │
│  2008 │ Torgersen │                16 │
│  2007 │ Dream     │                46 │
│  2007 │ Biscoe    │                44 │
│  2007 │ Torgersen │                20 │
└───────┴───────────┴───────────────────┘

sample

sample(fraction, *, method='row', seed=None)

Sample a fraction of rows from a table.

Results may be non-repeatable

Sampling is by definition a random operation. Some backends support specifying a seed for repeatable results, but not all backends support that option. And some backends (duckdb, for example) do support specifying a seed but may still not have repeatable results in all cases.

In all cases, results are backend-specific. An execution against one backend is unlikely to sample the same rows when executed against a different backend, even with the same seed set.

Parameters

Name	Type	Description	Default
fraction	float	The percentage of rows to include in the sample, expressed as a float between 0 and 1.	required
method	Literal['row', 'block']	The sampling method to use. The default is “row”, which includes each row with a probability of `fraction`. If method is “block”, some backends may instead perform sampling a fraction of blocks of rows (where “block” is a backend dependent definition). This is identical to “row” for backends lacking a blockwise sampling implementation. For those coming from SQL, “row” and “block” correspond to “bernoulli” and “system” respectively in a TABLESAMPLE clause.	`'row'`
seed	int \| None	An optional random seed to use, for repeatable sampling. The range of possible seed values is backend specific (most support at least `[0, 2**31 - 1]`). Backends that never support specifying a seed for repeatable sampling will error appropriately. Note that some backends (like DuckDB) do support specifying a seed, but may still not have repeatable results in all cases.	`None`

Returns

Name	Type	Description
	Table	The input table, with `fraction` of rows selected.

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"x": [1, 2, 3, 4], "y": ["a", "b", "c", "d"]})
>>> t

┏━━━━━━━┳━━━━━━━━┓
┃ x     ┃ y      ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
│     1 │ a      │
│     2 │ b      │
│     3 │ c      │
│     4 │ d      │
└───────┴────────┘

Sample approximately half the rows, with a seed specified for reproducibility.

>>> t.sample(0.5, seed=1234)

┏━━━━━━━┳━━━━━━━━┓
┃ x     ┃ y      ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
│     2 │ b      │
│     3 │ c      │
└───────┴────────┘

select

select(*exprs, **named_exprs)

Compute a new table expression using exprs and named_exprs.

Passing an aggregate function to this method will broadcast the aggregate’s value over the number of rows in the table and automatically constructs a window function expression. See the examples section for more details.

For backwards compatibility the keyword argument exprs is reserved and cannot be used to name an expression. This behavior will be removed in v4.

Parameters

Name	Type	Description	Default
exprs	ir.Value \| str \| Iterable[ir.Value \| str]	Column expression, string, or list of column expressions and strings.	`()`
named_exprs	ir.Value \| str	Column expressions	`{}`

Returns

Name	Type	Description
	Table	Table expression

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = letsql.examples.penguins.fetch()
>>> t

┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string  │ string    │ float64        │ float64       │ float64           │ float64     │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie  │ Torgersen │           39.1 │          18.7 │             181.0 │      3750.0 │ male   │  2007 │
│ Adelie  │ Torgersen │           39.5 │          17.4 │             186.0 │      3800.0 │ female │  2007 │
│ Adelie  │ Torgersen │           40.3 │          18.0 │             195.0 │      3250.0 │ female │  2007 │
│ Adelie  │ Torgersen │           NULL │          NULL │              NULL │        NULL │ NULL   │  2007 │
│ Adelie  │ Torgersen │           36.7 │          19.3 │             193.0 │      3450.0 │ female │  2007 │
│ Adelie  │ Torgersen │           39.3 │          20.6 │             190.0 │      3650.0 │ male   │  2007 │
│ Adelie  │ Torgersen │           38.9 │          17.8 │             181.0 │      3625.0 │ female │  2007 │
│ Adelie  │ Torgersen │           39.2 │          19.6 │             195.0 │      4675.0 │ male   │  2007 │
│ Adelie  │ Torgersen │           34.1 │          18.1 │             193.0 │      3475.0 │ NULL   │  2007 │
│ Adelie  │ Torgersen │           42.0 │          20.2 │             190.0 │      4250.0 │ NULL   │  2007 │
│ …       │ …         │              … │             … │                 … │           … │ …      │     … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

Simple projection

>>> t.select("island", "bill_length_mm").head()

┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ island    ┃ bill_length_mm ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ string    │ float64        │
├───────────┼────────────────┤
│ Torgersen │           39.1 │
│ Torgersen │           39.5 │
│ Torgersen │           40.3 │
│ Torgersen │           NULL │
│ Torgersen │           36.7 │
└───────────┴────────────────┘

In that simple case, you could also just use python’s indexing syntax

>>> t[["island", "bill_length_mm"]].head()

┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ island    ┃ bill_length_mm ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ string    │ float64        │
├───────────┼────────────────┤
│ Torgersen │           39.1 │
│ Torgersen │           39.5 │
│ Torgersen │           40.3 │
│ Torgersen │           NULL │
│ Torgersen │           36.7 │
└───────────┴────────────────┘

Projection by zero-indexed column position

>>> t.select(t[0], t[4]).head()

┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ species ┃ flipper_length_mm ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ string  │ float64           │
├─────────┼───────────────────┤
│ Adelie  │             181.0 │
│ Adelie  │             186.0 │
│ Adelie  │             195.0 │
│ Adelie  │              NULL │
│ Adelie  │             193.0 │
└─────────┴───────────────────┘

Projection with renaming and compute in one call

>>> t.select(next_year=t.year + 1).head()

┏━━━━━━━━━━━┓
┃ next_year ┃
┡━━━━━━━━━━━┩
│ int64     │
├───────────┤
│      2008 │
│      2008 │
│      2008 │
│      2008 │
│      2008 │
└───────────┘

You can do the same thing with a named expression, and using the deferred API

>>> from ibis import _
>>> t.select((_.year + 1).name("next_year")).head()

┏━━━━━━━━━━━┓
┃ next_year ┃
┡━━━━━━━━━━━┩
│ int64     │
├───────────┤
│      2008 │
│      2008 │
│      2008 │
│      2008 │
│      2008 │
└───────────┘

Projection with aggregation expressions

>>> t.select("island", bill_mean=t.bill_length_mm.mean()).head()

┏━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ island    ┃ bill_mean ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━┩
│ string    │ float64   │
├───────────┼───────────┤
│ Torgersen │  43.92193 │
│ Torgersen │  43.92193 │
│ Torgersen │  43.92193 │
│ Torgersen │  43.92193 │
│ Torgersen │  43.92193 │
└───────────┴───────────┘

Projection with a selector

>>> import ibis.selectors as s
>>> t.select(s.numeric() & ~s.cols("year")).head()

┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ float64        │ float64       │ float64           │ float64     │
├────────────────┼───────────────┼───────────────────┼─────────────┤
│           39.1 │          18.7 │             181.0 │      3750.0 │
│           39.5 │          17.4 │             186.0 │      3800.0 │
│           40.3 │          18.0 │             195.0 │      3250.0 │
│           NULL │          NULL │              NULL │        NULL │
│           36.7 │          19.3 │             193.0 │      3450.0 │
└────────────────┴───────────────┴───────────────────┴─────────────┘

Projection + aggregation across multiple columns

>>> from ibis import _
>>> t.select(s.across(s.numeric() & ~s.cols("year"), _.mean())).head()

┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ float64        │ float64       │ float64           │ float64     │
├────────────────┼───────────────┼───────────────────┼─────────────┤
│       43.92193 │      17.15117 │        200.915205 │ 4201.754386 │
│       43.92193 │      17.15117 │        200.915205 │ 4201.754386 │
│       43.92193 │      17.15117 │        200.915205 │ 4201.754386 │
│       43.92193 │      17.15117 │        200.915205 │ 4201.754386 │
│       43.92193 │      17.15117 │        200.915205 │ 4201.754386 │
└────────────────┴───────────────┴───────────────────┴─────────────┘

sql

sql(query, dialect=None)

Run a SQL query against a table expression.

Parameters

Name	Type	Description	Default
query	str	Query string	required
dialect	str \| None	Optional string indicating the dialect of `query`. Defaults to the backend’s native dialect.	`None`

Returns

Name	Type	Description
	Table	An opaque table expression

Examples

>>> import ibis
>>> from ibis import _
>>> ibis.options.interactive = True
>>> t = letsql.examples.penguins.fetch(table_name="penguins")
>>> expr = t.sql(
...     """
...     SELECT island, mean(bill_length_mm) AS avg_bill_length
...     FROM penguins
...     GROUP BY 1
...     ORDER BY 2 DESC
...     """
... )
>>> expr

┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ island    ┃ avg_bill_length ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ string    │ float64         │
├───────────┼─────────────────┤
│ Biscoe    │       45.257485 │
│ Dream     │       44.167742 │
│ Torgersen │       38.950980 │
└───────────┴─────────────────┘

Mix and match ibis expressions with SQL queries

>>> t = letsql.examples.penguins.fetch(table_name="penguins")
>>> expr = t.sql(
...     """
...     SELECT island, mean(bill_length_mm) AS avg_bill_length
...     FROM penguins
...     GROUP BY 1
...     ORDER BY 2 DESC
...     """
... )
>>> expr = expr.mutate(
...     island=_.island.lower(),
...     avg_bill_length=_.avg_bill_length.round(1),
... )
>>> expr

┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ island    ┃ avg_bill_length ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ string    │ float64         │
├───────────┼─────────────────┤
│ torgersen │            39.0 │
│ biscoe    │            45.3 │
│ dream     │            44.2 │
└───────────┴─────────────────┘

Because ibis expressions aren’t named, they aren’t visible to subsequent .sql calls. Use the alias method to assign a name to an expression.

>>> expr.alias("b").sql("SELECT * FROM b WHERE avg_bill_length > 40")

┏━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ island ┃ avg_bill_length ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ string │ float64         │
├────────┼─────────────────┤
│ biscoe │            45.3 │
│ dream  │            44.2 │
└────────┴─────────────────┘

union

union(table, *rest, distinct=False)

Compute the set union of multiple table expressions.

The input tables must have identical schemas.

Parameters

Name	Type	Description	Default
table	Table	A table expression	required
*rest	Table	Additional table expressions	`()`
distinct	bool	Only return distinct rows	`False`

Returns

Name	Type	Description
	Table	A new table containing the union of all input tables.

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t1 = ibis.memtable({"a": [1, 2]})
>>> t1

┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     1 │
│     2 │
└───────┘

>>> t2 = ibis.memtable({"a": [2, 3]})
>>> t2

┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     2 │
│     3 │
└───────┘

>>> t1.union(t2)  # union all by default

┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     1 │
│     2 │
│     2 │
│     3 │
└───────┘

>>> t1.union(t2, distinct=True).order_by("a")

┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     1 │
│     2 │
│     3 │
└───────┘

view

view()

Create a new table expression distinct from the current one.

Use this API for any self-referencing operations like a self-join.

Returns

Name	Type	Description
	Table	Table expression

Table

Methods

alias

Parameters

Returns

Examples

as_scalar

Returns

Examples

count

Parameters

Returns

Examples

difference

Parameters

See Also

Returns

Examples

distinct

Parameters

Examples

dropna

fillna

filter

Parameters

Returns

Examples

intersect

Parameters

Returns

See Also

Examples

limit

Parameters

Returns

Examples

See Also

order_by

Parameters

Returns

Examples

sample

Parameters

Returns

Examples

select

Parameters

Returns

Examples

sql

Parameters

Returns

Examples

See Also

union

Parameters

Returns

See Also

Examples

view

Returns