feed sequences
- 3 minutes read - 492 wordsIn the mysql feeds post, I mentioned that the publisher could do
SELECT MAX(feed_sync_id)+1
FROM kv
to find the next feed_sync_id during the publishing process, but
this is actually a really bad idea.
(And I knew it at the time, so forgive me for selling lies…)
Republishing
Before we jump into the problematic scenario, I’d like to motivate it with a tiny bit of background.
The republish operation is extremely useful when consumers need to receive updates. It is also extremely simple! A query like
UPDATE kv
SET v = "jim"
WHERE k = "fred";
is just amended to include feed_sync_id = null, as in:
UPDATE kv
SET v = "jim"
, feed_sync_id = null
WHERE k = "fred";
By updating feed_sync_id in the same query (or at least transaction) as the other updates, we can lean on the database’s atomicity guarantee to ensure that either both fields get updated successfully or a rollback occurs.
In the latter case, the application block further processing until an update occurs.
From there, we can guarantee that the feed mechanism will deliver the final update with at-least-once semantics.
at most once?
Consider the following scenario:
t=0: consumer: initialize with cursor=""
t=1: publisher: KV(k="fred", v="bob", feed_sync_id=None, version=1)
t=2: publisher: KV(k="fred", v="bob", feed_sync_id=1, version=2)
t=3: consumer: consumes KV(fsi=1, version=2); updates cursor=1
t=4: publisher: KV(k="fred", v="jim", feed_sync_id=None, version=3)
t=5: publisher: KV(k="fred", v="jim", feed_sync_id=1, version=4)
t=6: consumer: fetches KV w/ fsi > 1; nothing to do!
To summarize, we’ve republished the last record (by feed_sync_id) after the consumer has already consumed it, and thereby missed the subsequent updates!
That is, the consumer does not see version=4 of KV(k="fred")!
sequence-based updates
To rule this out, we need to ensure that the feed_sync_id monotonically increases, even with arbitrarily republished records.
Introduce a sequences table, like:
CREATE TABLE `sequences` (
id BIGINT(20) NOT NULL AUTO_INCREMENT,
name VARCHAR(255) NOT NULL,
value BIGINT(20) NOT NULL,
version BIGINT(20) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `u_name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
where value stores the maximal feed sync id assigned, for a sequence identified by name.
The inclusion of the name field allows this table to support multiple feed publishers from a single database instance.
The version column can be used for optimistic locking, or simply omitted.
The publish operation will now need to
-
find the next
kvrecord to publishnext_id = sql(""" SELECT id FROM kv WHERE feed_sync_id IS NULL LIMIT 1 """) -
query
sequences, likeseq = sql(""" SELECT * FROM sequences WHERE sequences.name = :sn """, sn=sequence_name) next_fsi = seq.value + 1 -
use
next_fsias thefeed_sync_idfor the nextkvrecord we publishsql(""" UPDATE kv SET feed_sync_id = :fsi WHERE id = :id """, fsi=next_fsi, id=next_id) -
finally, advance the
sequences.value(assuming we actually assignednext_fsi):sql(""" UPDATE sequences SET value = :mf WHERE name = :sn """, mf=next_fsi, sn=name)
Critically, (1)-(4) should occur within a single transaction; otherwise a feed fetch for a given cursor might return different records!