Tables are one of the core data structures in Ibis.
Table
An immutable and lazy dataframe.
Analogous to a SQL table or a pandas DataFrame. A table expression contains an ordered set of named columns , each with a single known type. Unless explicitly ordered with an .order_by()
, the order of rows is undefined.
Table immutability means that the data underlying an Ibis Table
cannot be modified: every method on a Table returns a new Table with those changes. Laziness means that an Ibis Table
expression does not run your computation every time you call one of its methods. Instead, it is a symbolic expression that represents a set of operations to be performed, which typically is translated into a SQL query. That SQL query is then executed on a backend, where the data actually lives. The result (now small enough to be manageable) can then be materialized back into python as a pandas/pyarrow/python DataFrame/Column/scalar.
You will not create Table objects directly. Instead, you will create one
See the user guide for more info.
Methods
alias
Create a table expression with a specific name alias
.
as_scalar
Inform ibis that the table expression should be treated as a scalar.
count
Compute the number of rows in the table.
difference
Compute the set difference of multiple table expressions.
distinct
Return a Table with duplicate rows removed.
dropna
Deprecated - use drop_null
instead.
fillna
Deprecated - use fill_null
instead.
filter
Select rows from table
based on predicates
.
intersect
Compute the set intersection of multiple table expressions.
limit
Select n
rows from self
starting at offset
.
order_by
Sort a table by one or more expressions.
sample
Sample a fraction of rows from a table.
select
Compute a new table expression using exprs
and named_exprs
.
sql
Run a SQL query against a table expression.
union
Compute the set union of multiple table expressions.
view
Create a new table expression distinct from the current one.
alias
Create a table expression with a specific name alias
.
This method is useful for exposing an ibis expression to the underlying backend for use in the Table.sql
method.
.alias
creates a temporary view in the database.
This side effect will be removed in a future version of ibis and is not part of the public API .
Parameters
alias
str
Name of the child expression
required
Returns
Table
An table expression
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = letsql.examples.penguins.fetch()
>>> expr = t.alias("pingüinos" ).sql('SELECT * FROM "pingüinos" LIMIT 5' )
>>> expr
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ string │ float64 │ float64 │ float64 │ float64 │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie │ Torgersen │ 39.1 │ 18.7 │ 181.0 │ 3750.0 │ male │ 2007 │
│ Adelie │ Torgersen │ 39.5 │ 17.4 │ 186.0 │ 3800.0 │ female │ 2007 │
│ Adelie │ Torgersen │ 40.3 │ 18.0 │ 195.0 │ 3250.0 │ female │ 2007 │
│ Adelie │ Torgersen │ NULL │ NULL │ NULL │ NULL │ NULL │ 2007 │
│ Adelie │ Torgersen │ 36.7 │ 19.3 │ 193.0 │ 3450.0 │ female │ 2007 │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
as_scalar
Inform ibis that the table expression should be treated as a scalar.
Note that the table must have exactly one column and one row for this to work. If the table has more than one column an error will be raised in expression construction time. If the table has more than one row an error will be raised by the backend when the expression is executed.
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = letsql.examples.penguins.fetch()
>>> heavy_gentoo = t.filter (t.species == "Gentoo" , t.body_mass_g > 6200 )
>>> from_that_island = t.filter (t.island == heavy_gentoo.select("island" ).as_scalar())
>>> from_that_island.species.value_counts().order_by("species" )
┏━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ species ┃ species_count ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ string │ int64 │
├─────────┼───────────────┤
│ Adelie │ 44 │
│ Gentoo │ 124 │
└─────────┴───────────────┘
count
Compute the number of rows in the table.
Parameters
where
ir .BooleanValue | None
Optional boolean expression to filter rows when counting.
None
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"a" : ["foo" , "bar" , "baz" ]})
>>> t
┏━━━━━━━━┓
┃ a ┃
┡━━━━━━━━┩
│ string │
├────────┤
│ foo │
│ bar │
│ baz │
└────────┘
>>> t.count(t.a != "foo" )
ibis.expr.types.numeric.IntegerScalar
difference
difference(table, * rest, distinct= True )
Compute the set difference of multiple table expressions.
The input tables must have identical schemas.
Parameters
table
Table
A table expression
required
*rest
Table
Additional table expressions
()
distinct
bool
Only diff distinct rows not occurring in the calling table
True
Returns
Table
The rows present in self
that are not present in tables
.
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t1 = ibis.memtable({"a" : [1 , 2 ]})
>>> t1
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 1 │
│ 2 │
└───────┘
>>> t2 = ibis.memtable({"a" : [2 , 3 ]})
>>> t2
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 2 │
│ 3 │
└───────┘
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 1 │
└───────┘
distinct
distinct(on= None , keep= 'first' )
Return a Table with duplicate rows removed.
Similar to pandas.DataFrame.drop_duplicates()
.
Parameters
on
str | Iterable [str ] | s .Selector | None
Only consider certain columns for identifying duplicates. By default deduplicate all of the columns.
None
keep
Literal ['first', 'last'] | None
Determines which duplicates to keep. - "first"
: Drop duplicates except for the first occurrence. - "last"
: Drop duplicates except for the last occurrence. - None
: Drop all duplicates
'first'
Examples
>>> import ibis
>>> import letsql.examples as ex
>>> import ibis.selectors as s
>>> ibis.options.interactive = True
>>> t = ex.penguins.fetch()
>>> t
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ string │ float64 │ float64 │ float64 │ float64 │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie │ Torgersen │ 39.1 │ 18.7 │ 181.0 │ 3750.0 │ male │ 2007 │
│ Adelie │ Torgersen │ 39.5 │ 17.4 │ 186.0 │ 3800.0 │ female │ 2007 │
│ Adelie │ Torgersen │ 40.3 │ 18.0 │ 195.0 │ 3250.0 │ female │ 2007 │
│ Adelie │ Torgersen │ NULL │ NULL │ NULL │ NULL │ NULL │ 2007 │
│ Adelie │ Torgersen │ 36.7 │ 19.3 │ 193.0 │ 3450.0 │ female │ 2007 │
│ Adelie │ Torgersen │ 39.3 │ 20.6 │ 190.0 │ 3650.0 │ male │ 2007 │
│ Adelie │ Torgersen │ 38.9 │ 17.8 │ 181.0 │ 3625.0 │ female │ 2007 │
│ Adelie │ Torgersen │ 39.2 │ 19.6 │ 195.0 │ 4675.0 │ male │ 2007 │
│ Adelie │ Torgersen │ 34.1 │ 18.1 │ 193.0 │ 3475.0 │ NULL │ 2007 │
│ Adelie │ Torgersen │ 42.0 │ 20.2 │ 190.0 │ 4250.0 │ NULL │ 2007 │
│ … │ … │ … │ … │ … │ … │ … │ … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
Compute the distinct rows of a subset of columns
>>> t[["species" , "island" ]].distinct().order_by(s.all ())
┏━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ species ┃ island ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━┩
│ string │ string │
├───────────┼───────────┤
│ Adelie │ Biscoe │
│ Adelie │ Dream │
│ Adelie │ Torgersen │
│ Chinstrap │ Dream │
│ Gentoo │ Biscoe │
└───────────┴───────────┘
Drop all duplicate rows except the first
>>> t.distinct(on= ["species" , "island" ], keep= "first" ).order_by(s.all ())
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ string │ float64 │ float64 │ float64 │ float64 │ string │ int64 │
├───────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie │ Biscoe │ 37.8 │ 18.3 │ 174.0 │ 3400.0 │ female │ 2007 │
│ Adelie │ Dream │ 39.5 │ 16.7 │ 178.0 │ 3250.0 │ female │ 2007 │
│ Adelie │ Torgersen │ 39.1 │ 18.7 │ 181.0 │ 3750.0 │ male │ 2007 │
│ Chinstrap │ Dream │ 46.5 │ 17.9 │ 192.0 │ 3500.0 │ female │ 2007 │
│ Gentoo │ Biscoe │ 46.1 │ 13.2 │ 211.0 │ 4500.0 │ female │ 2007 │
└───────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
Drop all duplicate rows except the last
>>> t.distinct(on= ["species" , "island" ], keep= "last" ).order_by(s.all ())
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ string │ float64 │ float64 │ float64 │ float64 │ string │ int64 │
├───────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie │ Biscoe │ 42.7 │ 18.3 │ 196.0 │ 4075.0 │ male │ 2009 │
│ Adelie │ Dream │ 41.5 │ 18.5 │ 201.0 │ 4000.0 │ male │ 2009 │
│ Adelie │ Torgersen │ 43.1 │ 19.2 │ 197.0 │ 3500.0 │ male │ 2009 │
│ Chinstrap │ Dream │ 50.2 │ 18.7 │ 198.0 │ 3775.0 │ female │ 2009 │
│ Gentoo │ Biscoe │ 49.9 │ 16.1 │ 213.0 │ 5400.0 │ male │ 2009 │
└───────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
Drop all duplicated rows
>>> expr = t.distinct(on= ["species" , "island" , "year" , "bill_length_mm" ], keep= None )
>>> expr.count()
You can pass selectors
to on
>>> t.distinct(on=~ s.numeric())
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ string │ float64 │ float64 │ float64 │ float64 │ string │ int64 │
├───────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Chinstrap │ Dream │ 46.5 │ 17.9 │ 192.0 │ 3500.0 │ female │ 2007 │
│ Chinstrap │ Dream │ 50.0 │ 19.5 │ 196.0 │ 3900.0 │ male │ 2007 │
│ Adelie │ Biscoe │ 37.8 │ 18.3 │ 174.0 │ 3400.0 │ female │ 2007 │
│ Adelie │ Biscoe │ 37.7 │ 18.7 │ 180.0 │ 3600.0 │ male │ 2007 │
│ Adelie │ Dream │ 39.5 │ 16.7 │ 178.0 │ 3250.0 │ female │ 2007 │
│ Adelie │ Dream │ 37.2 │ 18.1 │ 178.0 │ 3900.0 │ male │ 2007 │
│ Adelie │ Dream │ 37.5 │ 18.9 │ 179.0 │ 2975.0 │ NULL │ 2007 │
│ Adelie │ Torgersen │ 39.1 │ 18.7 │ 181.0 │ 3750.0 │ male │ 2007 │
│ Adelie │ Torgersen │ 39.5 │ 17.4 │ 186.0 │ 3800.0 │ female │ 2007 │
│ Adelie │ Torgersen │ 34.1 │ 18.1 │ 193.0 │ 3475.0 │ NULL │ 2007 │
│ … │ … │ … │ … │ … │ … │ … │ … │
└───────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
The only valid values of keep
are "first"
, "last"
and .
>>> t.distinct(on= "species" , keep= "second" )
IbisError: Invalid value for `keep`: 'second', must be 'first', 'last' or None
dropna
dropna(subset= None , how= 'any' )
Deprecated - use drop_null
instead.
fillna
Deprecated - use fill_null
instead.
filter
Select rows from table
based on predicates
.
Returns
Table
Filtered table expression
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = letsql.examples.penguins.fetch()
>>> t
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ string │ float64 │ float64 │ float64 │ float64 │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie │ Torgersen │ 39.1 │ 18.7 │ 181.0 │ 3750.0 │ male │ 2007 │
│ Adelie │ Torgersen │ 39.5 │ 17.4 │ 186.0 │ 3800.0 │ female │ 2007 │
│ Adelie │ Torgersen │ 40.3 │ 18.0 │ 195.0 │ 3250.0 │ female │ 2007 │
│ Adelie │ Torgersen │ NULL │ NULL │ NULL │ NULL │ NULL │ 2007 │
│ Adelie │ Torgersen │ 36.7 │ 19.3 │ 193.0 │ 3450.0 │ female │ 2007 │
│ Adelie │ Torgersen │ 39.3 │ 20.6 │ 190.0 │ 3650.0 │ male │ 2007 │
│ Adelie │ Torgersen │ 38.9 │ 17.8 │ 181.0 │ 3625.0 │ female │ 2007 │
│ Adelie │ Torgersen │ 39.2 │ 19.6 │ 195.0 │ 4675.0 │ male │ 2007 │
│ Adelie │ Torgersen │ 34.1 │ 18.1 │ 193.0 │ 3475.0 │ NULL │ 2007 │
│ Adelie │ Torgersen │ 42.0 │ 20.2 │ 190.0 │ 4250.0 │ NULL │ 2007 │
│ … │ … │ … │ … │ … │ … │ … │ … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
>>> t.filter ([t.species == "Adelie" , t.body_mass_g > 3500 ]).sex.value_counts().drop_null(
... "sex"
... ).order_by("sex" )
┏━━━━━━━━┳━━━━━━━━━━━┓
┃ sex ┃ sex_count ┃
┡━━━━━━━━╇━━━━━━━━━━━┩
│ string │ int64 │
├────────┼───────────┤
│ female │ 22 │
│ male │ 68 │
└────────┴───────────┘
intersect
intersect(table, * rest, distinct= True )
Compute the set intersection of multiple table expressions.
The input tables must have identical schemas.
Parameters
table
Table
A table expression
required
*rest
Table
Additional table expressions
()
distinct
bool
Only return distinct rows
True
Returns
Table
A new table containing the intersection of all input tables.
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t1 = ibis.memtable({"a" : [1 , 2 ]})
>>> t1
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 1 │
│ 2 │
└───────┘
>>> t2 = ibis.memtable({"a" : [2 , 3 ]})
>>> t2
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 2 │
│ 3 │
└───────┘
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 2 │
└───────┘
limit
Select n
rows from self
starting at offset
.
Parameters
n
int | None
Number of rows to include. If None
, the entire table is selected starting from offset
.
required
offset
int
Number of rows to skip first
0
Returns
Table
The first n
rows of self
starting at offset
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"a" : [1 , 1 , 2 ], "b" : ["c" , "a" , "a" ]})
>>> t
┏━━━━━━━┳━━━━━━━━┓
┃ a ┃ b ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
│ 1 │ c │
│ 1 │ a │
│ 2 │ a │
└───────┴────────┘
┏━━━━━━━┳━━━━━━━━┓
┃ a ┃ b ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
│ 1 │ c │
│ 1 │ a │
└───────┴────────┘
You can use None
with offset
to slice starting from a particular row
>>> t.limit(None , offset= 1 )
┏━━━━━━━┳━━━━━━━━┓
┃ a ┃ b ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
│ 1 │ a │
│ 2 │ a │
└───────┴────────┘
order_by
Sort a table by one or more expressions.
Similar to pandas.DataFrame.sort_values()
.
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable(
... {
... "a" : [3 , 2 , 1 , 3 ],
... "b" : ["a" , "B" , "c" , "D" ],
... "c" : [4 , 6 , 5 , 7 ],
... }
... )
>>> t
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a ┃ b ┃ c ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼────────┼───────┤
│ 3 │ a │ 4 │
│ 2 │ B │ 6 │
│ 1 │ c │ 5 │
│ 3 │ D │ 7 │
└───────┴────────┴───────┘
Sort by b. Default is ascending. Note how capital letters come before lowercase
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a ┃ b ┃ c ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼────────┼───────┤
│ 2 │ B │ 6 │
│ 3 │ D │ 7 │
│ 3 │ a │ 4 │
│ 1 │ c │ 5 │
└───────┴────────┴───────┘
Sort in descending order
>>> t.order_by(ibis.desc("b" ))
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a ┃ b ┃ c ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼────────┼───────┤
│ 1 │ c │ 5 │
│ 3 │ a │ 4 │
│ 3 │ D │ 7 │
│ 2 │ B │ 6 │
└───────┴────────┴───────┘
You can also use the deferred API to get the same result
>>> from ibis import _
>>> t.order_by(_.b.desc())
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a ┃ b ┃ c ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼────────┼───────┤
│ 1 │ c │ 5 │
│ 3 │ a │ 4 │
│ 3 │ D │ 7 │
│ 2 │ B │ 6 │
└───────┴────────┴───────┘
Sort by multiple columns/expressions
>>> t.order_by(["a" , _.c.desc()])
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a ┃ b ┃ c ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼────────┼───────┤
│ 1 │ c │ 5 │
│ 2 │ B │ 6 │
│ 3 │ D │ 7 │
│ 3 │ a │ 4 │
└───────┴────────┴───────┘
You can actually pass arbitrary expressions to use as sort keys. For example, to ignore the case of the strings in column b
>>> t.order_by(_.b.lower())
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a ┃ b ┃ c ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼────────┼───────┤
│ 3 │ a │ 4 │
│ 2 │ B │ 6 │
│ 1 │ c │ 5 │
│ 3 │ D │ 7 │
└───────┴────────┴───────┘
This means that shuffling a Table is super simple
>>> t.order_by(ibis.random())
┏━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ a ┃ b ┃ c ┃
┡━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼────────┼───────┤
│ 2 │ B │ 6 │
│ 3 │ a │ 4 │
│ 1 │ c │ 5 │
│ 3 │ D │ 7 │
└───────┴────────┴───────┘
Selectors are allowed as sort keys and are a concise way to sort by multiple columns matching some criteria
>>> import ibis.selectors as s
>>> penguins = letsql.examples.penguins.fetch()
>>> penguins[["year" , "island" ]].value_counts().order_by(s.startswith("year" ))
┏━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ year ┃ island ┃ year_island_count ┃
┡━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼───────────┼───────────────────┤
│ 2007 │ Torgersen │ 20 │
│ 2007 │ Biscoe │ 44 │
│ 2007 │ Dream │ 46 │
│ 2008 │ Torgersen │ 16 │
│ 2008 │ Dream │ 34 │
│ 2008 │ Biscoe │ 64 │
│ 2009 │ Torgersen │ 16 │
│ 2009 │ Dream │ 44 │
│ 2009 │ Biscoe │ 60 │
└───────┴───────────┴───────────────────┘
Use the across
selector to apply a specific order to multiple columns
>>> penguins[["year" , "island" ]].value_counts().order_by(
... s.across(s.startswith("year" ), _.desc())
... )
┏━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ year ┃ island ┃ year_island_count ┃
┡━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ int64 │ string │ int64 │
├───────┼───────────┼───────────────────┤
│ 2009 │ Biscoe │ 60 │
│ 2009 │ Dream │ 44 │
│ 2009 │ Torgersen │ 16 │
│ 2008 │ Biscoe │ 64 │
│ 2008 │ Dream │ 34 │
│ 2008 │ Torgersen │ 16 │
│ 2007 │ Dream │ 46 │
│ 2007 │ Biscoe │ 44 │
│ 2007 │ Torgersen │ 20 │
└───────┴───────────┴───────────────────┘
sample
sample(fraction, * , method= 'row' , seed= None )
Sample a fraction of rows from a table.
Sampling is by definition a random operation. Some backends support specifying a seed
for repeatable results, but not all backends support that option. And some backends (duckdb, for example) do support specifying a seed but may still not have repeatable results in all cases.
In all cases, results are backend-specific. An execution against one backend is unlikely to sample the same rows when executed against a different backend, even with the same seed
set.
Parameters
fraction
float
The percentage of rows to include in the sample, expressed as a float between 0 and 1.
required
method
Literal ['row', 'block']
The sampling method to use. The default is “row”, which includes each row with a probability of fraction
. If method is “block”, some backends may instead perform sampling a fraction of blocks of rows (where “block” is a backend dependent definition). This is identical to “row” for backends lacking a blockwise sampling implementation. For those coming from SQL, “row” and “block” correspond to “bernoulli” and “system” respectively in a TABLESAMPLE clause.
'row'
seed
int | None
An optional random seed to use, for repeatable sampling. The range of possible seed values is backend specific (most support at least [0, 2**31 - 1]
). Backends that never support specifying a seed for repeatable sampling will error appropriately. Note that some backends (like DuckDB) do support specifying a seed, but may still not have repeatable results in all cases.
None
Returns
Table
The input table, with fraction
of rows selected.
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.memtable({"x" : [1 , 2 , 3 , 4 ], "y" : ["a" , "b" , "c" , "d" ]})
>>> t
┏━━━━━━━┳━━━━━━━━┓
┃ x ┃ y ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
│ 1 │ a │
│ 2 │ b │
│ 3 │ c │
│ 4 │ d │
└───────┴────────┘
Sample approximately half the rows, with a seed specified for reproducibility.
>>> t.sample(0.5 , seed= 1234 )
┏━━━━━━━┳━━━━━━━━┓
┃ x ┃ y ┃
┡━━━━━━━╇━━━━━━━━┩
│ int64 │ string │
├───────┼────────┤
│ 2 │ b │
│ 3 │ c │
└───────┴────────┘
select
select(* exprs, ** named_exprs)
Compute a new table expression using exprs
and named_exprs
.
Passing an aggregate function to this method will broadcast the aggregate’s value over the number of rows in the table and automatically constructs a window function expression. See the examples section for more details.
For backwards compatibility the keyword argument exprs
is reserved and cannot be used to name an expression. This behavior will be removed in v4.
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t = letsql.examples.penguins.fetch()
>>> t
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex ┃ year ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string │ string │ float64 │ float64 │ float64 │ float64 │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie │ Torgersen │ 39.1 │ 18.7 │ 181.0 │ 3750.0 │ male │ 2007 │
│ Adelie │ Torgersen │ 39.5 │ 17.4 │ 186.0 │ 3800.0 │ female │ 2007 │
│ Adelie │ Torgersen │ 40.3 │ 18.0 │ 195.0 │ 3250.0 │ female │ 2007 │
│ Adelie │ Torgersen │ NULL │ NULL │ NULL │ NULL │ NULL │ 2007 │
│ Adelie │ Torgersen │ 36.7 │ 19.3 │ 193.0 │ 3450.0 │ female │ 2007 │
│ Adelie │ Torgersen │ 39.3 │ 20.6 │ 190.0 │ 3650.0 │ male │ 2007 │
│ Adelie │ Torgersen │ 38.9 │ 17.8 │ 181.0 │ 3625.0 │ female │ 2007 │
│ Adelie │ Torgersen │ 39.2 │ 19.6 │ 195.0 │ 4675.0 │ male │ 2007 │
│ Adelie │ Torgersen │ 34.1 │ 18.1 │ 193.0 │ 3475.0 │ NULL │ 2007 │
│ Adelie │ Torgersen │ 42.0 │ 20.2 │ 190.0 │ 4250.0 │ NULL │ 2007 │
│ … │ … │ … │ … │ … │ … │ … │ … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
Simple projection
>>> t.select("island" , "bill_length_mm" ).head()
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ island ┃ bill_length_mm ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ string │ float64 │
├───────────┼────────────────┤
│ Torgersen │ 39.1 │
│ Torgersen │ 39.5 │
│ Torgersen │ 40.3 │
│ Torgersen │ NULL │
│ Torgersen │ 36.7 │
└───────────┴────────────────┘
In that simple case, you could also just use python’s indexing syntax
>>> t[["island" , "bill_length_mm" ]].head()
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ island ┃ bill_length_mm ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ string │ float64 │
├───────────┼────────────────┤
│ Torgersen │ 39.1 │
│ Torgersen │ 39.5 │
│ Torgersen │ 40.3 │
│ Torgersen │ NULL │
│ Torgersen │ 36.7 │
└───────────┴────────────────┘
Projection by zero-indexed column position
>>> t.select(t[0 ], t[4 ]).head()
┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ species ┃ flipper_length_mm ┃
┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ string │ float64 │
├─────────┼───────────────────┤
│ Adelie │ 181.0 │
│ Adelie │ 186.0 │
│ Adelie │ 195.0 │
│ Adelie │ NULL │
│ Adelie │ 193.0 │
└─────────┴───────────────────┘
Projection with renaming and compute in one call
>>> t.select(next_year= t.year + 1 ).head()
┏━━━━━━━━━━━┓
┃ next_year ┃
┡━━━━━━━━━━━┩
│ int64 │
├───────────┤
│ 2008 │
│ 2008 │
│ 2008 │
│ 2008 │
│ 2008 │
└───────────┘
You can do the same thing with a named expression, and using the deferred API
>>> from ibis import _
>>> t.select((_.year + 1 ).name("next_year" )).head()
┏━━━━━━━━━━━┓
┃ next_year ┃
┡━━━━━━━━━━━┩
│ int64 │
├───────────┤
│ 2008 │
│ 2008 │
│ 2008 │
│ 2008 │
│ 2008 │
└───────────┘
Projection with aggregation expressions
>>> t.select("island" , bill_mean= t.bill_length_mm.mean()).head()
┏━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ island ┃ bill_mean ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━┩
│ string │ float64 │
├───────────┼───────────┤
│ Torgersen │ 43.92193 │
│ Torgersen │ 43.92193 │
│ Torgersen │ 43.92193 │
│ Torgersen │ 43.92193 │
│ Torgersen │ 43.92193 │
└───────────┴───────────┘
Projection with a selector
>>> import ibis.selectors as s
>>> t.select(s.numeric() & ~ s.cols("year" )).head()
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ float64 │ float64 │ float64 │ float64 │
├────────────────┼───────────────┼───────────────────┼─────────────┤
│ 39.1 │ 18.7 │ 181.0 │ 3750.0 │
│ 39.5 │ 17.4 │ 186.0 │ 3800.0 │
│ 40.3 │ 18.0 │ 195.0 │ 3250.0 │
│ NULL │ NULL │ NULL │ NULL │
│ 36.7 │ 19.3 │ 193.0 │ 3450.0 │
└────────────────┴───────────────┴───────────────────┴─────────────┘
Projection + aggregation across multiple columns
>>> from ibis import _
>>> t.select(s.across(s.numeric() & ~ s.cols("year" ), _.mean())).head()
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ float64 │ float64 │ float64 │ float64 │
├────────────────┼───────────────┼───────────────────┼─────────────┤
│ 43.92193 │ 17.15117 │ 200.915205 │ 4201.754386 │
│ 43.92193 │ 17.15117 │ 200.915205 │ 4201.754386 │
│ 43.92193 │ 17.15117 │ 200.915205 │ 4201.754386 │
│ 43.92193 │ 17.15117 │ 200.915205 │ 4201.754386 │
│ 43.92193 │ 17.15117 │ 200.915205 │ 4201.754386 │
└────────────────┴───────────────┴───────────────────┴─────────────┘
sql
Run a SQL query against a table expression.
Parameters
query
str
Query string
required
dialect
str | None
Optional string indicating the dialect of query
. Defaults to the backend’s native dialect.
None
Returns
Table
An opaque table expression
Examples
>>> import ibis
>>> from ibis import _
>>> ibis.options.interactive = True
>>> t = letsql.examples.penguins.fetch(table_name= "penguins" )
>>> expr = t.sql(
... """
... SELECT island, mean(bill_length_mm) AS avg_bill_length
... FROM penguins
... GROUP BY 1
... ORDER BY 2 DESC
... """
... )
>>> expr
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ island ┃ avg_bill_length ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ string │ float64 │
├───────────┼─────────────────┤
│ Biscoe │ 45.257485 │
│ Dream │ 44.167742 │
│ Torgersen │ 38.950980 │
└───────────┴─────────────────┘
Mix and match ibis expressions with SQL queries
>>> t = letsql.examples.penguins.fetch(table_name= "penguins" )
>>> expr = t.sql(
... """
... SELECT island, mean(bill_length_mm) AS avg_bill_length
... FROM penguins
... GROUP BY 1
... ORDER BY 2 DESC
... """
... )
>>> expr = expr.mutate(
... island= _.island.lower(),
... avg_bill_length= _.avg_bill_length.round (1 ),
... )
>>> expr
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ island ┃ avg_bill_length ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ string │ float64 │
├───────────┼─────────────────┤
│ biscoe │ 45.3 │
│ dream │ 44.2 │
│ torgersen │ 39.0 │
└───────────┴─────────────────┘
Because ibis expressions aren’t named, they aren’t visible to subsequent .sql
calls. Use the alias
method to assign a name to an expression.
>>> expr.alias("b" ).sql("SELECT * FROM b WHERE avg_bill_length > 40" )
┏━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ island ┃ avg_bill_length ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ string │ float64 │
├────────┼─────────────────┤
│ biscoe │ 45.3 │
│ dream │ 44.2 │
└────────┴─────────────────┘
union
union(table, * rest, distinct= False )
Compute the set union of multiple table expressions.
The input tables must have identical schemas.
Parameters
table
Table
A table expression
required
*rest
Table
Additional table expressions
()
distinct
bool
Only return distinct rows
False
Returns
Table
A new table containing the union of all input tables.
Examples
>>> import ibis
>>> ibis.options.interactive = True
>>> t1 = ibis.memtable({"a" : [1 , 2 ]})
>>> t1
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 1 │
│ 2 │
└───────┘
>>> t2 = ibis.memtable({"a" : [2 , 3 ]})
>>> t2
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 2 │
│ 3 │
└───────┘
>>> t1.union(t2) # union all by default
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 1 │
│ 2 │
│ 2 │
│ 3 │
└───────┘
>>> t1.union(t2, distinct= True ).order_by("a" )
┏━━━━━━━┓
┃ a ┃
┡━━━━━━━┩
│ int64 │
├───────┤
│ 1 │
│ 2 │
│ 3 │
└───────┘
view
Create a new table expression distinct from the current one.
Use this API for any self-referencing operations like a self-join.