In the mysql feeds post, I mentioned that the publisher could do

SELECT MAX(feed_sync_id)+1 

to find the next feed_sync_id during the publishing process, but this is actually a really bad idea. (And I knew it at the time, so forgive me for selling lies...)


Before we jump into the problematic scenario, I'd like to motivate it with a tiny bit of background.

The republish operation is extremely useful when consumers need to receive updates. It is also extremely simple! A query like

SET v = "jim"
WHERE k = "fred";

is just amended to include feed_sync_id = null, as in:

SET v = "jim"
  , feed_sync_id = null
WHERE k = "fred";

By updating feed_sync_id in the same query (or at least transaction) as the other updates, we can lean on the database's atomicity guarantee to ensure that either both fields get updated successfully or a rollback occurs. In the latter case, the application block further processing until an update occurs. From there, we can guarantee that the feed mechanism will deliver the final update with at-least-once semantics.

at most once?

Consider the following scenario:

t=0: consumer:  initialize with cursor=""
t=1: publisher: KV(k="fred", v="bob", feed_sync_id=None, version=1)
t=2: publisher: KV(k="fred", v="bob", feed_sync_id=1, version=2)
t=3: consumer:  consumes KV(fsi=1, version=2); updates cursor=1
t=4: publisher: KV(k="fred", v="jim", feed_sync_id=None, version=3)
t=5: publisher: KV(k="fred", v="jim", feed_sync_id=1, version=4)
t=6: consumer:  fetches KV w/ fsi > 1; nothing to do!

To summarize, we've republished the last record (by feed_sync_id) after the consumer has already consumed it, and thereby missed the subsequent updates!

That is, the consumer does not see version=4 of KV(k="fred")!

sequence-based updates

To rule this out, we need to ensure that the feed_sync_id monotonically increases, even with arbitrarily republished records.

Introduce a sequences table, like:

CREATE TABLE `sequences` (
  name VARCHAR(255) NOT NULL,
  value BIGINT(20) NOT NULL,
  version BIGINT(20) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `u_name` (`name`)

where value stores the maximal feed sync id assigned, for a sequence identified by name. The inclusion of the name field allows this table to support multiple feed publishers from a single database instance. The version column can be used for optimistic locking, or simply omitted.

The publish operation will now need to

  1. find the next kv record to publish

    next_id = sql("""
        SELECT id
        FROM kv
        WHERE feed_sync_id IS NULL
        LIMIT 1
  2. query sequences, like

    seq = sql("""
              SELECT *
              FROM sequences
              WHERE = :sn
    next_fsi = seq.value + 1
  3. use next_fsi as the feed_sync_id for the next kv record we publish

        UPDATE kv
        SET feed_sync_id = :fsi
        WHERE id = :id
  4. finally, advance the sequences.value (assuming we actually assigned next_fsi):

        UPDATE sequences
        SET value = :mf
        WHERE name = :sn

Critically, (1)-(4) should occur within a single transaction; otherwise a feed fetch for a given cursor might return different records!