Timeseries: what are ASOF joins?

Posted by Edouard on 30 Jan 2020

In QuasarDB 3.5 we introduced ASOF joins. If you're new to timeseries databases, you may be confused about what ASOF joins are, and how useful they are.

In this blog post we'll give you examples of these joins, how they work, what they are for, and what enhancement we introduced in 3.6.

An example from industrial IoT

Aligning sensor data

Let's say we have two sensors, emitting one temperature value per second (1 Hz).

Sensor 1

2020-01-19T14:01:03.013 27.1

2020-01-19T14:01:04.014 27.2

2020-01-19T14:01:05.012 27.3

Sensor 2

2020-01-19T14:01:03.003 28.1

2020-01-19T14:01:04.002 28.2

2020-01-19T14:01:05.005 28.3

Let's put this data into QuasarDB, so you can try at home:

CREATE TABLE sensor1 ($timestamp TIMESTAMP, value DOUBLE)

INSERT INTO sensor1 ($timestamp, value) VALUES

    (2020-01-19T14:01:03.013, 27.1),

    (2020-01-19T14:01:04.014, 27.2),

    (2020-01-19T14:01:05.012, 27.3)

CREATE TABLE sensor2 ($timestamp TIMESTAMP, value DOUBLE)

INSERT INTO sensor2 ($timestamp, value) VALUES

    (2020-01-19T14:01:03.003, 28.1),

    (2020-01-19T14:01:04.002, 28.2),

    (2020-01-19T14:01:05.005, 28.3)

In this example, you can see that the timestamps are not perfectly aligned. Damn you, laws of physics!

As a data scientist, you want to keep the raw data, but you also want to align the timeseries with each other.

You can do that within Python Pandas, which requires downloading the data from the server and then doing the asof in Python. It's neither the fastest, nor the most convenient.

Since QuasarDB 3.5, you can do that directly in the database in typing:

SELECT $timestamp, sensor1.value, sensor2.value FROM sensor1 LEFT ASOF JOIN sensor2

And you get:

$timestamp                     sensor1.value sensor2.value

2020-01-19T14:01:03.013000000Z 27.1           28.1

2020-01-19T14:01:04.014000000Z 27.2           28.2

2020-01-19T14:01:05.012000000Z 27.3           28.3

What ASOF does is it takes the last known value "as of" the considered timestamp. In the above query, for each row of sensor1, we got the last known value of sensor2.

If instead, you want to have each value of sensor1 "as of" the timestamp of each row of sensor2 , you would type

SELECT $timestamp, sensor1.value, sensor2.value FROM sensor1 RIGHT ASOF JOIN sensor2

With the following result:

$timestamp                     sensor1.value sensor2.value

2020-01-19T14:01:03.003000000Z (void)        28.1

2020-01-19T14:01:04.002000000Z 27.1          28.2

2020-01-19T14:01:05.005000000Z 27.2          28.3

In case you want to align in both directions, you would type:

SELECT $timestamp, sensor1.value, sensor2.value FROM sensor1 FULL ASOF JOIN sensor2

With the following result:

$timestamp                     sensor1.value sensor2.value

2020-01-19T14:01:03.003000000Z (void)        28.1

2020-01-19T14:01:03.013000000Z 27.1          28.1

2020-01-19T14:01:04.002000000Z 27.1          28.2

2020-01-19T14:01:04.014000000Z 27.2          28.2

2020-01-19T14:01:05.005000000Z 27.2          28.3

2020-01-19T14:01:05.012000000Z 27.3          28.3

Binding sensor data

Another thing you may want to do is to align the sensor data against a strict timeline.

We introduced in QuasarDB 3.6 a feature that lets you do this.

Imagine you want to have the sensor data aligned to the second, this is how you would do it:

SELECT $timestamp, value FROM sensor1 ASOF JOIN RANGE(2020-1-19T14:01:04, +10s, +1s)

With the following result:

$timestamp                     $table  sensor1.value

2020-01-19T14:01:04.000000000Z sensor1 27.1

2020-01-19T14:01:05.000000000Z sensor1 27.2

2020-01-19T14:01:06.000000000Z sensor1 27.3

2020-01-19T14:01:07.000000000Z sensor1 27.3

2020-01-19T14:01:08.000000000Z sensor1 27.3

2020-01-19T14:01:09.000000000Z sensor1 27.3

2020-01-19T14:01:10.000000000Z sensor1 27.3

2020-01-19T14:01:11.000000000Z sensor1 27.3

2020-01-19T14:01:12.000000000Z sensor1 27.3

2020-01-19T14:01:13.000000000Z sensor1 27.3

An example from capital markets

A very simplistic explanation of stock markets

A stock market works like an auction house. Actors wishing to purchase a security, bid for a price, actors wanting to sell a security, ask for a price.

As you can imagine, several bids and asks occur in parallel at any point in time. However, stock exchanges are bound by law to account for the best bid and best ask. The best bid and best ask form a pair that is called a quote.

In a quote, the bid price is thus the maximum price a buyer is willing to pay, and the ask price the minimum price a seller is ready to take.

A trade occurs when a buyer and a seller agree for a price below the bid and above the ask.

Let the database magic begin!

Now that you're an expert in capital markets, let's have some fun with QuasarDB!

Let's store our quotes (you know, the bid-ask pairs) in a table called "quotes", and our trades in a table called... mmm "trades"?

If you've watched the "Wolf of Wall Street", you know things can get crazy on the trading floor. Tables naming is no exception.

For simplicity sakes, we'll focus on a single security, MMM, because 3M is alphabetically the first company of the DJI. We'll use made-up values.

Here are some quotes:

time                    sym bid    ask

2020-01-29T14:30:00.023 MMM 162.22 162.25

2020-01-29T14:30:00.023 MMM 162.21 162.27

2020-01-29T14:30:00.030 MMM 162.20 162.28

2020-01-29T14:30:00.041 MMM 162.22 162.26

2020-01-29T14:30:00.048 MMM 162.23 162.28

And here are some trades:

time                    sym price  quantity

2020-01-29T14:30:00.023 MMM 162.23 75

2020-01-29T14:30:00.041 MMM 162.24 50

2020-01-29T14:30:00.041 MMM 162.25 100

To help you do this with QuasarDB, here are the queries you should type:

CREATE TABLE quotes ($timestamp TIMESTAMP, sym STRING, bid DOUBLE, ask DOUBLE)

INSERT INTO quotes ($timestamp, sym, bid, ask) VALUES

    (2020-01-29T14:30:00.023, 'MMM', 162.22, 162.25),

    (2020-01-29T14:30:00.023, 'MMM', 162.21, 162.27),

    (2020-01-29T14:30:00.030, 'MMM', 162.20, 162.28),

    (2020-01-29T14:30:00.041, 'MMM', 162.22, 162.26),

    (2020-01-29T14:30:00.048, 'MMM', 162.23, 162.28)

CREATE TABLE trades ($timestamp TIMESTAMP, sym STRING, price DOUBLE, quantity INT64)

INSERT INTO trades ($timestamp, sym, price, quantity) VALUES

    (2020-01-29T14:30:00.023, 'MMM', 162.23, 75),

    (2020-01-29T14:30:00.041, 'MMM', 162.24, 50),

    (2020-01-29T14:30:00.041, 'MMM', 162.25, 100)

Wouldn't it be nice to see where in the bid-ask range (a.k.a spread) each trade occurred?

In one query, you can know

SELECT $timestamp, trades.sym, price, quantity, bid, ask

    FROM trades LEFT ASOF JOIN quotes

With the following result:

$timestamp                     sym price  quantity bid    ask

2020-01-29T14:30:00.023000000Z MMM 162.23 75       162.22 162.25

2020-01-29T14:30:00.041000000Z MMM 162.24 50       162.22 162.26

2020-01-29T14:30:00.041000000Z MMM 162.25 100      162.22 162.26

What about we use the power of arithmetic to compute the spread?

SELECT $timestamp, trades.sym, price, quantity, ask, bid, (ask - bid) AS spread

    FROM trades LEFT ASOF JOIN quotes

With the following result:

$timestamp                    sym  price  quantity ask    bid    spread

2020-01-29T14:30:00.023000000Z MMM 162.23 75       162.25 162.22 0.03

2020-01-29T14:30:00.041000000Z MMM 162.24 50       162.26 162.22 0.04

2020-01-29T14:30:00.041000000Z MMM 162.25 100      162.26 162.22 0.04

Giving it a try

The best way to understand how ASOF joins work is to play yourself with the database, so help yourself with the free community edition!

Topics: quasardb, capital markets, asof