Table expressions

Tables are one of the core data structures in Ibis.

Table

Table(self, arg)

An immutable and lazy dataframe.

Analogous to a SQL table or a pandas DataFrame. A table expression contains an ordered set of named columns, each with a single known type. Unless explicitly ordered with an .order_by(), the order of rows is undefined.

Table immutability means that the data underlying an Ibis Table cannot be modified: every method on a Table returns a new Table with those changes. Laziness means that an Ibis Table expression does not run your computation every time you call one of its methods. Instead, it is a symbolic expression that represents a set of operations to be performed, which typically is translated into a SQL query. That SQL query is then executed on a backend, where the data actually lives. The result (now small enough to be manageable) can then be materialized back into python as a pandas/pyarrow/python DataFrame/Column/scalar.

You will not create Table objects directly. Instead, you will create one

See the user guide for more info.

Methods

Name Description
alias Create a table expression with a specific name alias.
as_scalar Inform ibis that the table expression should be treated as a scalar.
count Compute the number of rows in the table.
difference Compute the set difference of multiple table expressions.
distinct Return a Table with duplicate rows removed.
dropna Deprecated - use drop_null instead.
fillna Deprecated - use fill_null instead.
filter Select rows from table based on predicates.
intersect Compute the set intersection of multiple table expressions.
limit Select n rows from self starting at offset.
order_by Sort a table by one or more expressions.
sample Sample a fraction of rows from a table.
select Compute a new table expression using exprs and named_exprs.
sql Run a SQL query against a table expression.
union Compute the set union of multiple table expressions.
view Create a new table expression distinct from the current one.

alias

alias(alias)

Create a table expression with a specific name alias.

This method is useful for exposing an ibis expression to the underlying backend for use in the Table.sql method.

.alias will create a temporary view

.alias creates a temporary view in the database.

This side effect will be removed in a future version of ibis and is not part of the public API.

Parameters

Name Type Description Default
alias str Name of the child expression required

Returns

Name Type Description
Table An table expression

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = letsql.examples.penguins.fetch()
>>> expr = t.alias("pingüinos").sql('SELECT * FROM "pingüinos" LIMIT 5')
>>> expr
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species  island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ stringstringfloat64float64float64float64stringint64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie Torgersen39.118.7181.03750.0male  2007 │
│ Adelie Torgersen39.517.4186.03800.0female2007 │
│ Adelie Torgersen40.318.0195.03250.0female2007 │
│ Adelie TorgersenNULLNULLNULLNULLNULL2007 │
│ Adelie Torgersen36.719.3193.03450.0female2007 │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

as_scalar

as_scalar()

Inform ibis that the table expression should be treated as a scalar.

Note that the table must have exactly one column and one row for this to work. If the table has more than one column an error will be raised in expression construction time. If the table has more than one row an error will be raised by the backend when the expression is executed.

Returns

Name Type Description
Scalar A scalar subquery

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = letsql.examples.penguins.fetch()
>>> heavy_gentoo = t.filter(t.species == "Gentoo", t.body_mass_g > 6200)
>>> from_that_island = t.filter(t.island == heavy_gentoo.select("island").as_scalar())
>>> from_that_island.species.value_counts().order_by("species")
┏━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ species  species_count ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ stringint64         │
├─────────┼───────────────┤
│ Adelie 44 │
│ Gentoo 124 │
└─────────┴───────────────┘

count

count(where=None)

Compute the number of rows in the table.

Parameters

Name Type Description Default
where ir.BooleanValue | None Optional boolean expression to filter rows when counting. None

Returns

Name Type Description
IntegerScalar Number of rows in the table

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"a": ["foo", "bar", "baz"]})
>>> t
┏━━━━━━━━┓
┃ a      ┃
┡━━━━━━━━┩
│ string │
├────────┤
│ foo    │
│ bar    │
│ baz    │
└────────┘
>>> t.count()

┌───┐
│ 3 │
└───┘
>>> t.count(t.a != "foo")

┌───┐
│ 2 │
└───┘
>>> type(t.count())
ibis.expr.types.numeric.IntegerScalar

difference

difference(table, *rest, distinct=True)

Compute the set difference of multiple table expressions.

The input tables must have identical schemas.

Parameters

Name Type Description Default
table Table A table expression required
*rest Table Additional table expressions ()
distinct bool Only diff distinct rows not occurring in the calling table True

See Also

ibis.difference

Returns

Name Type Description
Table The rows present in self that are not present in tables.

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t1 = ibis.memtable({"a": [1, 2]})
>>> t1
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     1 │
│     2 │
└───────┘
>>> t2 = ibis.memtable({"a": [2, 3]})
>>> t2
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     2 │
│     3 │
└───────┘
>>> t1.difference(t2)
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     1 │
└───────┘

distinct

distinct(on=None, keep='first')

Return a Table with duplicate rows removed.

Similar to pandas.DataFrame.drop_duplicates().

Some backends do not support keep='last'

Parameters

Name Type Description Default
on str | Iterable[str] | s.Selector | None Only consider certain columns for identifying duplicates. By default deduplicate all of the columns. None
keep Literal['first', 'last'] | None Determines which duplicates to keep. - "first": Drop duplicates except for the first occurrence. - "last": Drop duplicates except for the last occurrence. - None: Drop all duplicates 'first'

Examples

>>> import ibis
>>> import letsql.examples as ex
>>> import ibis.selectors as s
>>> ibis.options.interactive = True
>>> t = ex.penguins.fetch()
>>> t
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species  island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ stringstringfloat64float64float64float64stringint64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie Torgersen39.118.7181.03750.0male  2007 │
│ Adelie Torgersen39.517.4186.03800.0female2007 │
│ Adelie Torgersen40.318.0195.03250.0female2007 │
│ Adelie TorgersenNULLNULLNULLNULLNULL2007 │
│ Adelie Torgersen36.719.3193.03450.0female2007 │
│ Adelie Torgersen39.320.6190.03650.0male  2007 │
│ Adelie Torgersen38.917.8181.03625.0female2007 │
│ Adelie Torgersen39.219.6195.04675.0male  2007 │
│ Adelie Torgersen34.118.1193.03475.0NULL2007 │
│ Adelie Torgersen42.020.2190.04250.0NULL2007 │
│  │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

Compute the distinct rows of a subset of columns

>>> t[["species", "island"]].distinct().order_by(s.all())
┏━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ species    island    ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━┩
│ stringstring    │
├───────────┼───────────┤
│ Adelie   Biscoe    │
│ Adelie   Dream     │
│ Adelie   Torgersen │
│ ChinstrapDream     │
│ Gentoo   Biscoe    │
└───────────┴───────────┘

Drop all duplicate rows except the first

>>> t.distinct(on=["species", "island"], keep="first").order_by(s.all())
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species    island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year  ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ stringstringfloat64float64float64float64stringint64 │
├───────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie   Biscoe   37.818.3174.03400.0female2007 │
│ Adelie   Dream    39.516.7178.03250.0female2007 │
│ Adelie   Torgersen39.118.7181.03750.0male  2007 │
│ ChinstrapDream    46.517.9192.03500.0female2007 │
│ Gentoo   Biscoe   46.113.2211.04500.0female2007 │
└───────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

Drop all duplicate rows except the last

>>> t.distinct(on=["species", "island"], keep="last").order_by(s.all())
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species    island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year  ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ stringstringfloat64float64float64float64stringint64 │
├───────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie   Biscoe   42.718.3196.04075.0male  2009 │
│ Adelie   Dream    41.518.5201.04000.0male  2009 │
│ Adelie   Torgersen43.119.2197.03500.0male  2009 │
│ ChinstrapDream    50.218.7198.03775.0female2009 │
│ Gentoo   Biscoe   49.916.1213.05400.0male  2009 │
└───────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

Drop all duplicated rows

>>> expr = t.distinct(on=["species", "island", "year", "bill_length_mm"], keep=None)
>>> expr.count()

┌─────┐
│ 273 │
└─────┘
>>> t.count()

┌─────┐
│ 344 │
└─────┘

You can pass selectors to on

>>> t.distinct(on=~s.numeric())
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species    island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year  ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ stringstringfloat64float64float64float64stringint64 │
├───────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ ChinstrapDream    46.517.9192.03500.0female2007 │
│ ChinstrapDream    50.019.5196.03900.0male  2007 │
│ Adelie   Torgersen39.118.7181.03750.0male  2007 │
│ Adelie   Torgersen39.517.4186.03800.0female2007 │
│ Adelie   Torgersen34.118.1193.03475.0NULL2007 │
│ Gentoo   Biscoe   46.113.2211.04500.0female2007 │
│ Gentoo   Biscoe   50.016.3230.05700.0male  2007 │
│ Gentoo   Biscoe   44.514.3216.04100.0NULL2007 │
│ Adelie   Biscoe   37.818.3174.03400.0female2007 │
│ Adelie   Biscoe   37.718.7180.03600.0male  2007 │
│  │
└───────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

The only valid values of keep are "first", "last" and .

>>> t.distinct(on="species", keep="second")  
IbisError: Invalid value for `keep`: 'second', must be 'first', 'last' or None

dropna

dropna(subset=None, how='any')

Deprecated - use drop_null instead.

fillna

fillna(replacements)

Deprecated - use fill_null instead.

filter

filter(*predicates)

Select rows from table based on predicates.

Parameters

Name Type Description Default
predicates ir.BooleanValue | Sequence[ir.BooleanValue] | IfAnyAll Boolean value expressions used to select rows in table. ()

Returns

Name Type Description
Table Filtered table expression

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = letsql.examples.penguins.fetch()
>>> t
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species  island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ stringstringfloat64float64float64float64stringint64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie Torgersen39.118.7181.03750.0male  2007 │
│ Adelie Torgersen39.517.4186.03800.0female2007 │
│ Adelie Torgersen40.318.0195.03250.0female2007 │
│ Adelie TorgersenNULLNULLNULLNULLNULL2007 │
│ Adelie Torgersen36.719.3193.03450.0female2007 │
│ Adelie Torgersen39.320.6190.03650.0male  2007 │
│ Adelie Torgersen38.917.8181.03625.0female2007 │
│ Adelie Torgersen39.219.6195.04675.0male  2007 │
│ Adelie Torgersen34.118.1193.03475.0NULL2007 │
│ Adelie Torgersen42.020.2190.04250.0NULL2007 │
│  │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
>>> t.filter([t.species == "Adelie", t.body_mass_g > 3500]).sex.value_counts().drop_null(
...     "sex"
... ).order_by("sex")
┏━━━━━━━━┳━━━━━━━━━━━┓
┃ sex     sex_count ┃
┡━━━━━━━━╇━━━━━━━━━━━┩
│ stringint64     │
├────────┼───────────┤
│ female22 │
│ male  68 │
└────────┴───────────┘

intersect

intersect(table, *rest, distinct=True)

Compute the set intersection of multiple table expressions.

The input tables must have identical schemas.

Parameters

Name Type Description Default
table Table A table expression required
*rest Table Additional table expressions ()
distinct bool Only return distinct rows True

Returns

Name Type Description
Table A new table containing the intersection of all input tables.

See Also

ibis.intersect

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t1 = ibis.memtable({"a": [1, 2]})
>>> t1
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     1 │
│     2 │
└───────┘
>>> t2 = ibis.memtable({"a": [2, 3]})
>>> t2
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     2 │
│     3 │
└───────┘
>>> t1.intersect(t2)
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     2 │
└───────┘

limit

limit(n, offset=0)

Select n rows from self starting at offset.

The result set is not deterministic without a call to order_by.

Parameters

Name Type Description Default
n int | None Number of rows to include. If None, the entire table is selected starting from offset. required
offset int Number of rows to skip first 0

Returns

Name Type Description
Table The first n rows of self starting at offset

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"a": [1, 1, 2], "b": ["c", "a", "a"]})
>>> t
┏━━━━━━━┳━━━━━━━━┓
┃ a      b      ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64string │
├───────┼────────┤
│     1c      │
│     1a      │
│     2a      │
└───────┴────────┘
>>> t.limit(2)
┏━━━━━━━┳━━━━━━━━┓
┃ a      b      ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64string │
├───────┼────────┤
│     1c      │
│     1a      │
└───────┴────────┘

You can use None with offset to slice starting from a particular row

>>> t.limit(None, offset=1)
┏━━━━━━━┳━━━━━━━━┓
┃ a      b      ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64string │
├───────┼────────┤
│     1a      │
│     2a      │
└───────┴────────┘

See Also

Table.order_by

order_by

order_by(*by)

Sort a table by one or more expressions.

Similar to pandas.DataFrame.sort_values().

Parameters

Name Type Description Default
by str | ir.Column | s.Selector | Sequence[str] | Sequence[ir.Column] | Sequence[s.Selector] | None Expressions to sort the table by. ()

Returns

Name Type Description
Table Sorted table

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable(
...     {
...         "a": [3, 2, 1, 3],
...         "b": ["a", "B", "c", "D"],
...         "c": [4, 6, 5, 7],
...     }
... )
>>> t
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a      b       c     ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64stringint64 │
├───────┼────────┼───────┤
│     3a     4 │
│     2B     6 │
│     1c     5 │
│     3D     7 │
└───────┴────────┴───────┘

Sort by b. Default is ascending. Note how capital letters come before lowercase

>>> t.order_by("b")
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a      b       c     ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64stringint64 │
├───────┼────────┼───────┤
│     2B     6 │
│     3D     7 │
│     3a     4 │
│     1c     5 │
└───────┴────────┴───────┘

Sort in descending order

>>> t.order_by(ibis.desc("b"))
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a      b       c     ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64stringint64 │
├───────┼────────┼───────┤
│     1c     5 │
│     3a     4 │
│     3D     7 │
│     2B     6 │
└───────┴────────┴───────┘

You can also use the deferred API to get the same result

>>> from ibis import _
>>> t.order_by(_.b.desc())
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a      b       c     ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64stringint64 │
├───────┼────────┼───────┤
│     1c     5 │
│     3a     4 │
│     3D     7 │
│     2B     6 │
└───────┴────────┴───────┘

Sort by multiple columns/expressions

>>> t.order_by(["a", _.c.desc()])
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a      b       c     ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64stringint64 │
├───────┼────────┼───────┤
│     1c     5 │
│     2B     6 │
│     3D     7 │
│     3a     4 │
└───────┴────────┴───────┘

You can actually pass arbitrary expressions to use as sort keys. For example, to ignore the case of the strings in column b

>>> t.order_by(_.b.lower())
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a      b       c     ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64stringint64 │
├───────┼────────┼───────┤
│     3a     4 │
│     2B     6 │
│     1c     5 │
│     3D     7 │
└───────┴────────┴───────┘

This means that shuffling a Table is super simple

>>> t.order_by(ibis.random())
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a      b       c     ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64stringint64 │
├───────┼────────┼───────┤
│     3D     7 │
│     3a     4 │
│     2B     6 │
│     1c     5 │
└───────┴────────┴───────┘

Selectors are allowed as sort keys and are a concise way to sort by multiple columns matching some criteria

>>> import ibis.selectors as s
>>> penguins = letsql.examples.penguins.fetch()
>>> penguins[["year", "island"]].value_counts().order_by(s.startswith("year"))
┏━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ year   island     year_island_count ┃
┡━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ int64stringint64             │
├───────┼───────────┼───────────────────┤
│  2007Torgersen20 │
│  2007Biscoe   44 │
│  2007Dream    46 │
│  2008Torgersen16 │
│  2008Dream    34 │
│  2008Biscoe   64 │
│  2009Torgersen16 │
│  2009Dream    44 │
│  2009Biscoe   60 │
└───────┴───────────┴───────────────────┘

Use the across selector to apply a specific order to multiple columns

>>> penguins[["year", "island"]].value_counts().order_by(
...     s.across(s.startswith("year"), _.desc())
... )
┏━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ year   island     year_island_count ┃
┡━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ int64stringint64             │
├───────┼───────────┼───────────────────┤
│  2009Biscoe   60 │
│  2009Dream    44 │
│  2009Torgersen16 │
│  2008Biscoe   64 │
│  2008Dream    34 │
│  2008Torgersen16 │
│  2007Dream    46 │
│  2007Biscoe   44 │
│  2007Torgersen20 │
└───────┴───────────┴───────────────────┘

sample

sample(fraction, *, method='row', seed=None)

Sample a fraction of rows from a table.

Results may be non-repeatable

Sampling is by definition a random operation. Some backends support specifying a seed for repeatable results, but not all backends support that option. And some backends (duckdb, for example) do support specifying a seed but may still not have repeatable results in all cases.

In all cases, results are backend-specific. An execution against one backend is unlikely to sample the same rows when executed against a different backend, even with the same seed set.

Parameters

Name Type Description Default
fraction float The percentage of rows to include in the sample, expressed as a float between 0 and 1. required
method Literal['row', 'block'] The sampling method to use. The default is “row”, which includes each row with a probability of fraction. If method is “block”, some backends may instead perform sampling a fraction of blocks of rows (where “block” is a backend dependent definition). This is identical to “row” for backends lacking a blockwise sampling implementation. For those coming from SQL, “row” and “block” correspond to “bernoulli” and “system” respectively in a TABLESAMPLE clause. 'row'
seed int | None An optional random seed to use, for repeatable sampling. The range of possible seed values is backend specific (most support at least [0, 2**31 - 1]). Backends that never support specifying a seed for repeatable sampling will error appropriately. Note that some backends (like DuckDB) do support specifying a seed, but may still not have repeatable results in all cases. None

Returns

Name Type Description
Table The input table, with fraction of rows selected.

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"x": [1, 2, 3, 4], "y": ["a", "b", "c", "d"]})
>>> t
┏━━━━━━━┳━━━━━━━━┓
┃ x      y      ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64string │
├───────┼────────┤
│     1a      │
│     2b      │
│     3c      │
│     4d      │
└───────┴────────┘

Sample approximately half the rows, with a seed specified for reproducibility.

>>> t.sample(0.5, seed=1234)
┏━━━━━━━┳━━━━━━━━┓
┃ x      y      ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64string │
├───────┼────────┤
│     2b      │
│     3c      │
└───────┴────────┘

select

select(*exprs, **named_exprs)

Compute a new table expression using exprs and named_exprs.

Passing an aggregate function to this method will broadcast the aggregate’s value over the number of rows in the table and automatically constructs a window function expression. See the examples section for more details.

For backwards compatibility the keyword argument exprs is reserved and cannot be used to name an expression. This behavior will be removed in v4.

Parameters

Name Type Description Default
exprs ir.Value | str | Iterable[ir.Value | str] Column expression, string, or list of column expressions and strings. ()
named_exprs ir.Value | str Column expressions {}

Returns

Name Type Description
Table Table expression

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t = letsql.examples.penguins.fetch()
>>> t
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species  island     bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  sex     year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ stringstringfloat64float64float64float64stringint64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie Torgersen39.118.7181.03750.0male  2007 │
│ Adelie Torgersen39.517.4186.03800.0female2007 │
│ Adelie Torgersen40.318.0195.03250.0female2007 │
│ Adelie TorgersenNULLNULLNULLNULLNULL2007 │
│ Adelie Torgersen36.719.3193.03450.0female2007 │
│ Adelie Torgersen39.320.6190.03650.0male  2007 │
│ Adelie Torgersen38.917.8181.03625.0female2007 │
│ Adelie Torgersen39.219.6195.04675.0male  2007 │
│ Adelie Torgersen34.118.1193.03475.0NULL2007 │
│ Adelie Torgersen42.020.2190.04250.0NULL2007 │
│  │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

Simple projection

>>> t.select("island", "bill_length_mm").head()
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ island     bill_length_mm ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ stringfloat64        │
├───────────┼────────────────┤
│ Torgersen39.1 │
│ Torgersen39.5 │
│ Torgersen40.3 │
│ TorgersenNULL │
│ Torgersen36.7 │
└───────────┴────────────────┘

In that simple case, you could also just use python’s indexing syntax

>>> t[["island", "bill_length_mm"]].head()
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ island     bill_length_mm ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ stringfloat64        │
├───────────┼────────────────┤
│ Torgersen39.1 │
│ Torgersen39.5 │
│ Torgersen40.3 │
│ TorgersenNULL │
│ Torgersen36.7 │
└───────────┴────────────────┘

Projection by zero-indexed column position

>>> t.select(t[0], t[4]).head()
┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ species  flipper_length_mm ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ stringfloat64           │
├─────────┼───────────────────┤
│ Adelie 181.0 │
│ Adelie 186.0 │
│ Adelie 195.0 │
│ Adelie NULL │
│ Adelie 193.0 │
└─────────┴───────────────────┘

Projection with renaming and compute in one call

>>> t.select(next_year=t.year + 1).head()
┏━━━━━━━━━━━┓
┃ next_year ┃
┡━━━━━━━━━━━┩
│ int64     │
├───────────┤
│      2008 │
│      2008 │
│      2008 │
│      2008 │
│      2008 │
└───────────┘

You can do the same thing with a named expression, and using the deferred API

>>> from ibis import _
>>> t.select((_.year + 1).name("next_year")).head()
┏━━━━━━━━━━━┓
┃ next_year ┃
┡━━━━━━━━━━━┩
│ int64     │
├───────────┤
│      2008 │
│      2008 │
│      2008 │
│      2008 │
│      2008 │
└───────────┘

Projection with aggregation expressions

>>> t.select("island", bill_mean=t.bill_length_mm.mean()).head()
┏━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ island     bill_mean ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━┩
│ stringfloat64   │
├───────────┼───────────┤
│ Torgersen43.92193 │
│ Torgersen43.92193 │
│ Torgersen43.92193 │
│ Torgersen43.92193 │
│ Torgersen43.92193 │
└───────────┴───────────┘

Projection with a selector

>>> import ibis.selectors as s
>>> t.select(s.numeric() & ~s.cols("year")).head()
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ float64float64float64float64     │
├────────────────┼───────────────┼───────────────────┼─────────────┤
│           39.118.7181.03750.0 │
│           39.517.4186.03800.0 │
│           40.318.0195.03250.0 │
│           NULLNULLNULLNULL │
│           36.719.3193.03450.0 │
└────────────────┴───────────────┴───────────────────┴─────────────┘

Projection + aggregation across multiple columns

>>> from ibis import _
>>> t.select(s.across(s.numeric() & ~s.cols("year"), _.mean())).head()
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ float64float64float64float64     │
├────────────────┼───────────────┼───────────────────┼─────────────┤
│       43.9219317.15117200.9152054201.754386 │
│       43.9219317.15117200.9152054201.754386 │
│       43.9219317.15117200.9152054201.754386 │
│       43.9219317.15117200.9152054201.754386 │
│       43.9219317.15117200.9152054201.754386 │
└────────────────┴───────────────┴───────────────────┴─────────────┘

sql

sql(query, dialect=None)

Run a SQL query against a table expression.

Parameters

Name Type Description Default
query str Query string required
dialect str | None Optional string indicating the dialect of query. Defaults to the backend’s native dialect. None

Returns

Name Type Description
Table An opaque table expression

Examples

>>> import ibis
>>> from ibis import _
>>> ibis.options.interactive = True
>>> t = letsql.examples.penguins.fetch(table_name="penguins")
>>> expr = t.sql(
...     """
...     SELECT island, mean(bill_length_mm) AS avg_bill_length
...     FROM penguins
...     GROUP BY 1
...     ORDER BY 2 DESC
...     """
... )
>>> expr
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ island     avg_bill_length ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ stringfloat64         │
├───────────┼─────────────────┤
│ Biscoe   45.257485 │
│ Dream    44.167742 │
│ Torgersen38.950980 │
└───────────┴─────────────────┘

Mix and match ibis expressions with SQL queries

>>> t = letsql.examples.penguins.fetch(table_name="penguins")
>>> expr = t.sql(
...     """
...     SELECT island, mean(bill_length_mm) AS avg_bill_length
...     FROM penguins
...     GROUP BY 1
...     ORDER BY 2 DESC
...     """
... )
>>> expr = expr.mutate(
...     island=_.island.lower(),
...     avg_bill_length=_.avg_bill_length.round(1),
... )
>>> expr
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ island     avg_bill_length ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ stringfloat64         │
├───────────┼─────────────────┤
│ torgersen39.0 │
│ biscoe   45.3 │
│ dream    44.2 │
└───────────┴─────────────────┘

Because ibis expressions aren’t named, they aren’t visible to subsequent .sql calls. Use the alias method to assign a name to an expression.

>>> expr.alias("b").sql("SELECT * FROM b WHERE avg_bill_length > 40")
┏━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ island  avg_bill_length ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ stringfloat64         │
├────────┼─────────────────┤
│ biscoe45.3 │
│ dream 44.2 │
└────────┴─────────────────┘

See Also

Table.alias

union

union(table, *rest, distinct=False)

Compute the set union of multiple table expressions.

The input tables must have identical schemas.

Parameters

Name Type Description Default
table Table A table expression required
*rest Table Additional table expressions ()
distinct bool Only return distinct rows False

Returns

Name Type Description
Table A new table containing the union of all input tables.

See Also

ibis.union

Examples

>>> import ibis
>>> ibis.options.interactive = True
>>> t1 = ibis.memtable({"a": [1, 2]})
>>> t1
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     1 │
│     2 │
└───────┘
>>> t2 = ibis.memtable({"a": [2, 3]})
>>> t2
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     2 │
│     3 │
└───────┘
>>> t1.union(t2)  # union all by default
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     1 │
│     2 │
│     2 │
│     3 │
└───────┘
>>> t1.union(t2, distinct=True).order_by("a")
┏━━━━━━━┓
┃ a     ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│     1 │
│     2 │
│     3 │
└───────┘

view

view()

Create a new table expression distinct from the current one.

Use this API for any self-referencing operations like a self-join.

Returns

Name Type Description
Table Table expression