SQL for Getting the Most Recent Values for Time Series
Given a table for time-series data, it is often useful to ask what are the most recent values for each entity to get a picture of the state up to a specific time.
For example, take sensor data; tracking attributes of a sensor over time. You might want to query the current state of some sensors.
I work with a lot of IoT data, so this is pretty handy to have at my disposal. Trying to answer the question let me down the following path(s).
Real World Example
Let’s do a little real-world example for something like IoT sensor readings.
Examples are using Postgres 10.11
Create a small schema to support our queries:
CREATE TABLE sensor_readings (
ts timestamp,
sensor_id uuid,
reading float
)
Insert some initial data to the table so we can see what’s going on more visually:
INSERT INTO sensor_readings VALUES
('2020-01-20T13:00:00 +0000', 1, 1),
('2020-01-20T14:00:00 +0000', 2, 3),
('2020-01-20T14:00:00 +0000', 3, 7),
('2020-01-20T15:00:00 +0000', 1, 11),
('2020-01-20T16:00:00 +0000', 2, 13),
('2020-01-20T17:00:00 +0000', 2, 17);
My initial solution for me was a basic correlated subquery.
Also works in Snowflake.
SELECT *
FROM sensor_readings t1
where ts = (
SELECT MAX(ts)
FROM sensor_readings t2
WHERE t1.sensor_id = t2.sensor_id
);
A DBA friend of mine used PARTITION BY
which I’m not so familiar with.
SELECT *
FROM (
SELECT *, ROW_NUMBER() OVER(PARTITION BY sensor_id ORDER BY ts DESC) AS rn
FROM sensor_readings
) t2
WHERE t2.rn = 1;
… more clearly written as:
WITH T AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY sensor_id ORDER BY ts DESC) AS rn
FROM sensor_readings
)
SELECT * FROM T WHERE rn = 1;