I realized I’ve left out a major part from my sequence of previous feed-related posts: a justification for why we should bother with a separate
So let’s give it a shot!
The fundamental problem is:
AUTO_INCREMENT ids are assigned in insertion order, but become visible to other threads in commit order.
To see how this causes a problem, consider the interactions and visibilities between three transactions to the same database:
t0: TRX0: BEGIN; INSERT INTO kv (ns, k, v) VALUES ("-", "k0", "v0"); COMMIT;
t1: TRX1: BEGIN; INSERT INTO kv (ns, k, v) VALUES ("-", "k1", "v1");
t2: TRX2: BEGIN; INSERT INTO kv (ns, k, v) VALUES ("-", "k2", "v2");
t3: TRX0: SELECT MAX(id) FROM kv;
t4: TRX2: COMMIT;
t5: TRX0: SELECT MAX(id) FROM kv;
t6: TRX1: COMMIT;
t7: TRX0: SELECT MAX(id) FROM kv;
Here, we have two transactions that both insert a new
The database has to assign an
id value to each of those records, because we might be creating other associations to those records in our application code.
But other threads –
TRX0 in this case – shouldn’t be able to see those records until we
COMMIT, and so indeed the
t=t3 might return
Next, we commit
This was the third insert the database saw, so we expect
MAX(id) to be
3, and indeed, it is, as verified at
Finally, we commit
Consumers will now be able to see an
id=2 record – hooray!
Except: if we applied a feed-style consumption based on
id instead of
feed_sync_id, we have introduced the possibility of skipping records.
To see how this happens, consider a feed-style fetch w/
We will run a query like
WHERE id > :cursor
ORDER BY id
KV(id=1, ns=-, k=k0, v=v0)
KV(id=3, ns=-, k=k2, v=v2)
Note that there is no
KV(id=2) record committed yet, so this response is correct!
The problem comes from subsequent fetches: In particular, we will record
3 as the cursor to use for the next fetch.
This will prevent the subsequent fetch (with
cursor:=3) from ever observing the
This is pretty bad, and that’s why we pay the penalty of storing a separately assigned
After all the updates, I figure it’d be helpful to give a reference for the current schema we’ve built up over all the blog posts mentioned above:
CREATE TABLE `kv` (
`id` bigint(22) NOT NULL AUTO_INCREMENT,
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`feed_sync_id` bigint(22) DEFAULT NULL,
`shard` int(11) DEFAULT '0',
`ns` varchar(255) COLLATE utf8_bin NOT NULL DEFAULT '',
`k` varchar(255) COLLATE utf8_bin NOT NULL,
`v` longblob NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `u_ns_k` (`ns`,`k`),
UNIQUE KEY `u_fsi` (`feed_sync_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;