Concepts

What is LETSQL?

LETSQL is a query engine with a pythonic dataframe interface, built on top of DataFusion that can be used to write multi-engine workflows.

What is not LETSQL?

Is not a dataframe library, while it provides a familiar pythonic dataframe interface, LETSQL is equipped with a query optimizer and can provide in-situ and federated query processing.

Why LETSQL?

By using LETSQL, you will:

  • Reduce errors thanks to a better Pythonic UX.
  • Accelerate the development process by lowering the cognitive burden induced by using multiple interacting data systems.
  • Gain in security by providing in-situ processing (the data does not move).
  • Improve performance by avoiding data transfer and redundant operations.
  • Reduce costs by easily swapping to the cheapest tool available.

What is Multi-Engine?

What makes LETSQL stand-out against other Ibis backends, is that it can be use to build multi-engine workflows, by multi-engine it means that it can an Ibis expression involving multiple backends in an optimal manner, segmenting the expression and executing each part in-situ on the corresponding backend.

For the following example we are going to use an Ibis table from a Postgres connection and perform a join with an in-memory pandas DataFrame.

import pathlib

import ibis
import pandas as pd

import letsql as ls

ibis.options.interactive = True

# create the letsql connection
con = ls.connect()


# create the Ibis Postgres connection
pg = ibis.postgres.connect(
    host="localhost",
    port=5432,
    user="postgres",
    password="postgres",
    database="ibis_testing",
)


# register Postgres table
batting = con.register(pg.table("batting"), table_name="batting")

# register csv file
df = pd.read_csv("https://raw.githubusercontent.com/ibis-project/testing-data/master/csv/awards_players.csv")
awards_players = con.register(df, table_name="awards_players")

left = batting[batting.yearID == 2015]
right = awards_players[awards_players.lgID == "NL"].drop("yearID", "lgID").execute()

left.join(right, ["playerID"], how="semi")
/home/runner/.cache/pypoetry/virtualenvs/letsql-uJiiVu6z-py3.10/lib/python3.10/site-packages/ibis/expr/types/relations.py:685: FutureWarning: Selecting/filtering arbitrary expressions in `Table.__getitem__` is deprecated and will be removed in version 10.0. Please use `Table.select` or `Table.filter` instead.
  warnings.warn(
┏━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ playerID   yearID  stint  teamID  lgID    G      AB     R      H      X2B    X3B    HR     RBI    SB     CS     BB     SO     IBB    HBP    SH     SF     GIDP  ┃
┡━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ stringint64int64stringstringint64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64int64 │
├───────────┼────────┼───────┼────────┼────────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┤
│ alvarpe0120151PIT   NL    150437601061802777204813192046 │
│ arenano0120151COL   NL    1576169717743442130253411013401117 │
│ arrieja0120151CHN   NL    337951211220014500301 │
│ axforjo0120151COL   NL    600000000000000000 │
│ barneda0120151LAN   NL    24000000000000000 │
│ barneda0120152TOR   AL    1523471024001200200 │
│ beltrad0120151TEX   AL    143567831633241883104165430818 │
│ beltrca0120151NYA   AL    133478571323411967004585220612 │
│ bournmi0120151CLE   AL    952892971121019135297600712 │
│ bournmi0120152ATL   NL    4613610303101142173100123 │
│  │
└───────────┴────────┴───────┴────────┴────────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┘