In the mysql feeds post, I mentioned that the publisher could do
SELECT MAX(feed_sync_id)+1 FROM kv
to find the next
feed_sync_id during the publishing process, but
this is actually a really bad idea.
(And I knew it at the time, so forgive me for selling lies...)
Before we jump into the problematic scenario, I'd like to motivate it with a tiny bit of background.
The republish operation is extremely useful when consumers need to receive updates. It is also extremely simple! A query like
UPDATE kv SET v = "jim" WHERE k = "fred";
is just amended to include
feed_sync_id = null, as in:
UPDATE kv SET v = "jim" , feed_sync_id = null WHERE k = "fred";
feed_sync_id in the same query (or at least transaction) as the other updates, we can lean on the database's atomicity guarantee to ensure that either both fields get updated successfully or a rollback occurs.
In the latter case, the application block further processing until an update occurs.
From there, we can guarantee that the feed mechanism will deliver the final update with at-least-once semantics.
at most once?
Consider the following scenario:
t=0: consumer: initialize with cursor="" t=1: publisher: KV(k="fred", v="bob", feed_sync_id=None, version=1) t=2: publisher: KV(k="fred", v="bob", feed_sync_id=1, version=2) t=3: consumer: consumes KV(fsi=1, version=2); updates cursor=1 t=4: publisher: KV(k="fred", v="jim", feed_sync_id=None, version=3) t=5: publisher: KV(k="fred", v="jim", feed_sync_id=1, version=4) t=6: consumer: fetches KV w/ fsi > 1; nothing to do!
To summarize, we've republished the last record (by
feed_sync_id) after the consumer has already consumed it, and thereby missed the subsequent updates!
That is, the consumer does not see
To rule this out, we need to ensure that the
feed_sync_id monotonically increases, even with arbitrarily republished records.
sequences table, like:
CREATE TABLE `sequences` ( id BIGINT(20) NOT NULL AUTO_INCREMENT, name VARCHAR(255) NOT NULL, value BIGINT(20) NOT NULL, version BIGINT(20) NOT NULL, PRIMARY KEY (`id`), UNIQUE KEY `u_name` (`name`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
value stores the maximal feed sync id assigned, for a sequence identified by
The inclusion of the
name field allows this table to support multiple feed publishers from a single database instance.
version column can be used for optimistic locking, or simply omitted.
The publish operation will now need to
find the next
kvrecord to publish
next_id = sql(""" SELECT id FROM kv WHERE feed_sync_id IS NULL LIMIT 1 """)
seq = sql(""" SELECT * FROM sequences WHERE sequences.name = :sn """, sn=sequence_name) next_fsi = seq.value + 1
feed_sync_idfor the next
kvrecord we publish
sql(""" UPDATE kv SET feed_sync_id = :fsi WHERE id = :id """, fsi=next_fsi, id=next_id)
finally, advance the
sequences.value(assuming we actually assigned
sql(""" UPDATE sequences SET value = :mf WHERE name = :sn """, mf=next_fsi, sn=name)
Critically, (1)-(4) should occur within a single transaction; otherwise a feed fetch for a given cursor might return different records!