feed sequences
- 3 minutes read - 492 wordsIn the mysql feeds post, I mentioned that the publisher could do
SELECT MAX(feed_sync_id)+1
FROM kv
to find the next feed_sync_id
during the publishing process, but
this is actually a really bad idea.
(And I knew it at the time, so forgive me for selling lies…)
Republishing
Before we jump into the problematic scenario, I’d like to motivate it with a tiny bit of background.
The republish operation is extremely useful when consumers need to receive updates. It is also extremely simple! A query like
UPDATE kv
SET v = "jim"
WHERE k = "fred";
is just amended to include feed_sync_id = null
, as in:
UPDATE kv
SET v = "jim"
, feed_sync_id = null
WHERE k = "fred";
By updating feed_sync_id
in the same query (or at least transaction) as the other updates, we can lean on the database’s atomicity guarantee to ensure that either both fields get updated successfully or a rollback occurs.
In the latter case, the application block further processing until an update occurs.
From there, we can guarantee that the feed mechanism will deliver the final update with at-least-once semantics.
at most once?
Consider the following scenario:
t=0: consumer: initialize with cursor=""
t=1: publisher: KV(k="fred", v="bob", feed_sync_id=None, version=1)
t=2: publisher: KV(k="fred", v="bob", feed_sync_id=1, version=2)
t=3: consumer: consumes KV(fsi=1, version=2); updates cursor=1
t=4: publisher: KV(k="fred", v="jim", feed_sync_id=None, version=3)
t=5: publisher: KV(k="fred", v="jim", feed_sync_id=1, version=4)
t=6: consumer: fetches KV w/ fsi > 1; nothing to do!
To summarize, we’ve republished the last record (by feed_sync_id
) after the consumer has already consumed it, and thereby missed the subsequent updates!
That is, the consumer does not see version=4
of KV(k="fred")
!
sequence-based updates
To rule this out, we need to ensure that the feed_sync_id
monotonically increases, even with arbitrarily republished records.
Introduce a sequences
table, like:
CREATE TABLE `sequences` (
id BIGINT(20) NOT NULL AUTO_INCREMENT,
name VARCHAR(255) NOT NULL,
value BIGINT(20) NOT NULL,
version BIGINT(20) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `u_name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
where value
stores the maximal feed sync id assigned, for a sequence identified by name
.
The inclusion of the name
field allows this table to support multiple feed publishers from a single database instance.
The version
column can be used for optimistic locking, or simply omitted.
The publish operation will now need to
-
find the next
kv
record to publishnext_id = sql(""" SELECT id FROM kv WHERE feed_sync_id IS NULL LIMIT 1 """)
-
query
sequences
, likeseq = sql(""" SELECT * FROM sequences WHERE sequences.name = :sn """, sn=sequence_name) next_fsi = seq.value + 1
-
use
next_fsi
as thefeed_sync_id
for the nextkv
record we publishsql(""" UPDATE kv SET feed_sync_id = :fsi WHERE id = :id """, fsi=next_fsi, id=next_id)
-
finally, advance the
sequences.value
(assuming we actually assignednext_fsi
):sql(""" UPDATE sequences SET value = :mf WHERE name = :sn """, mf=next_fsi, sn=name)
Critically, (1)-(4) should occur within a single transaction; otherwise a feed fetch for a given cursor might return different records!