watching progress in mysql
- 3 minutes read - 439 wordsPretty frequently at work, I end up polling a database with some command like
SELECT MAX(id) FROM my_tbl;
SELECT MAX(id) FROM my_tbl;
SELECT MAX(id) FROM my_tbl;
SELECT MAX(id) FROM my_tbl;
-- .... ad nauseam ....
I’ve eventually noticed a few patterns I use pretty consistently:
estimate ETA by including NOW()
/UNIX_TIMESTAMP()
Generally, the point of hovering over the table is to get an estimate of when it will finish/catch up/whatever. For that, you generally want to include a timestamp in the query output, so when you come back a while later, you can know exactly how much time has elapsed. I might transform the above query into
SELECT
NOW()
, UNIX_TIMESTAMP()
, MAX(feed_sync_id)
FROM kv;
Here, I include NOW()
for a human-readable time, UNIX_TIMESTAMP()
to make it easier to estimate the duration between a pair of query results by avoiding the need to add in hr*60*60 + min*60
, and of course the actual value I’m interested in.
roll the starting point into subsequent queries
If I run
mysql root@localhost:traviscj_development> SELECT NOW(), UNIX_TIMESTAMP(), MAX(feed_sync_id) FROM kv;
+---------------------+------------------+-------------------+
| NOW() | UNIX_TIMESTAMP() | MAX(feed_sync_id) |
+---------------------+------------------+-------------------+
| 2019-01-03 20:03:18 | 1546545798 | 15 |
+---------------------+------------------+-------------------+
1 row in set
Time: 0.048s
and I know I need to get through a million feed_sync_id
s, I might further add a couple of clauses like
1e6 - MAX(feed_sync_id) AS remaining
UNIX_TIMESTAMP() - 1546545798 AS elapsed
which can be combined to estimate an ETA if useful.
structured lookback in subsequent queries
We can even take it one step further:
CREATE TEMPORARY TABLE poll_metrics (
metric VARCHAR(255) NOT NULL DEFAULT 'default',
ts TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3),
value DOUBLE NOT NULL DEFAULT '0'
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
This has some nice properties!
- it’s a temporary table, so it doesn’t interfere with anything else or have very much risk of becoming HUGE or anything like that – it’s not even visible to other clients of that database!
- it will automatically populate any of the fields if we know we’re only gathering a single metric…
- but it allows us to collect multiple metrics if we want to!
We can easily populate this from querying a different table, as simply as
INSERT INTO poll_metrics (value)
SELECT 13*(SELECT MAX(feed_sync_id) FROM kv);
or as we want:
INSERT INTO poll_metrics (metric, ts, value)
SELECT
"kv_prog"
, NOW(3)
, (SELECT MAX(feed_sync_id) FROM kv);
Together, those two queries would result in:
mysql root@localhost:traviscj_development> SELECT * FROM poll_metrics;
+---------+----------------------------+-------+
| metric | ts | value |
+---------+----------------------------+-------+
| default | 2019-01-03 20:30:42.693000 | 195.0 |
| kv_prog | 2019-01-03 20:30:59.090000 | 15.0 |
+---------+----------------------------+-------+
2 rows in set
Time: 0.005s
stored in a table for other querying.