Transacter and Intent/Result
At SQ, we had a family of Transacter interfaces:
public interface Transacter {
  // Runs the given block of work inside a database transaction.
  void transaction(Consumer<Session> work);

  interface Session {
    DSLContext dsl();
  }
}
where DSLContext is a jOOQ concept & is the handle for doing work against the database.
Then this would be used in some class like
public class KvDao {
  @Inject Transacter transacter;

  void put(String ns, String k, String v) {
    transacter.transaction(session -> {
      // KV is the jOOQ table reference for the kv table.
      KvRecord record = new KvRecord(ns, k, v);
      session.dsl().insertInto(KV).set(record).execute();
    });
  }
}
The parameters ns, k, and v are shorthand for namespace, key, and value, respectively.
We’ll show how these might be used together shortly!
Transacter was essentially the only way we ever interacted with the underlying database.
Implementations of this interface handled a few things:
- getting a connection from the connection pool, if the current thread didn’t already have one
- retrying optimistic lock exceptions with some bounded, nondeterministic retry interval
- perhaps most importantly: managing BEGIN and COMMIT/ROLLBACK commands, or equivalently: Transacter manages transactions.
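To make this concrete, here’s a minimal sketch of what such an implementation might look like, assuming JDBC plus jOOQ’s DSL.using. RealTransacter, MAX_ATTEMPTS, and the optimistic-lock check are illustrative rather than the actual SQ code, and the “reuse the current thread’s connection” bookkeeping is omitted:

import java.sql.Connection;
import java.sql.SQLException;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Consumer;
import javax.sql.DataSource;
import org.jooq.DSLContext;
import org.jooq.SQLDialect;
import org.jooq.impl.DSL;

public class RealTransacter implements Transacter {
  private static final int MAX_ATTEMPTS = 3; // illustrative retry bound

  private final DataSource dataSource; // the underlying connection pool

  public RealTransacter(DataSource dataSource) {
    this.dataSource = dataSource;
  }

  @Override
  public void transaction(Consumer<Session> work) {
    for (int attempt = 1; ; attempt++) {
      try (Connection conn = dataSource.getConnection()) {
        conn.setAutoCommit(false); // start a transaction: implicit BEGIN
        DSLContext dsl = DSL.using(conn, SQLDialect.MYSQL);
        try {
          work.accept(() -> dsl); // Session is a functional interface
          conn.commit(); // COMMIT
          return;
        } catch (RuntimeException e) {
          conn.rollback(); // ROLLBACK
          if (!isOptimisticLock(e) || attempt >= MAX_ATTEMPTS) {
            throw e;
          }
          // bounded, nondeterministic pause before retrying
          Thread.sleep(ThreadLocalRandom.current().nextLong(1, 50L * attempt + 1));
        }
      } catch (SQLException | InterruptedException e) {
        throw new RuntimeException(e);
      }
    }
  }

  private boolean isOptimisticLock(RuntimeException e) {
    // In practice this matches vendor-specific error codes; jOOQ surfaces
    // version-check failures as DataChangedException.
    return e instanceof org.jooq.exception.DataChangedException;
  }
}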
situation: chained calls, mixed data sources
One place this could break down pretty easily was when database updates and API calls (e.g. to third party services, or other internal microservices, or whatever) were interleaved:
public class KvDao {
  @Inject Transacter transacter;
  @Inject Gson gson;

  void complexPut(String k, String v) {
    transacter.transaction(session -> {
      KvRecord r0 = new KvRecord("intent", k, v);
      session.dsl().insertInto(KV).set(r0).execute();
      Object result = doSomeApiCall(v);
      KvRecord r1 = new KvRecord("result", k, gson.toJson(result));
      session.dsl().insertInto(KV).set(r1).execute();
    });
  }
}
We’re using ns=intent and ns=result records to persist a representation of the data we need to make the call & what the call returns, respectively.
sidebar: intents & results
Why even bother writing two records above? Why not a single kv record with intent and result subfields, or a better data model?
The guiding principle here is: “Write a new record whenever new information becomes known.”
Without abiding by this principle, we’re left not knowing for sure when we discovered a new bit of information without a detailed analysis of the data schema and the code that interacts with it.
Furthermore, once we get in the business of updating records in-place, we’ll probably eventually end up losing some information.
Often this manifests with a data model (call it foos) that has some state field – perhaps INITIAL transitions to IN_PROGRESS before COMPLETE; as we progress:
-- t=t1
INSERT INTO foos SET id = "fredbob", ..., state = "INITIAL";
-- t=t2
UPDATE foos SET state = "IN_PROGRESS" WHERE id = "fredbob";
-- t=t3
UPDATE foos SET state = "COMPLETE" WHERE id = "fredbob";
-- t=t4
Even assuming appropriately defined foos.{created_at,updated_at} fields, we can’t know what time we transitioned to IN_PROGRESS, or even whether that happened!
This presents problems for analyzing data after-the-fact & massively complicates training machine learning models on this data, as they critically depend on a precise understanding of what happened & when to avoid time leakage.
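Following the principle instead, each transition becomes its own INSERT. A minimal sketch in the style of KvDao above – the FOO_EVENTS table and FooEventsRecord are hypothetical jOOQ artifacts, not a real schema:

public class FooDao {
  @Inject Transacter transacter;

  // One row per state transition; we never UPDATE in place, so the
  // timestamp of every transition survives.
  void recordTransition(String fooId, String state) {
    transacter.transaction(session -> {
      FooEventsRecord event = new FooEventsRecord(fooId, state, Instant.now());
      session.dsl().insertInto(FOO_EVENTS).set(event).execute();
    });
  }
}

Answering “when did fredbob enter IN_PROGRESS?” is then a simple query over foo_events, rather than an unanswerable one.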
Furthermore: frequently, the “intent” information is also precisely the information needed to resume execution when doSomeApiCall() recovers.
Especially when that data is expensive to recompute (or perhaps even no longer available!), a saved intent is a compelling & simple alternative.
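As a sketch of that resumption path – list() and get() are hypothetical KvDao helpers, getK()/getV() are assumed record accessors, doSomeApiCall stands in as before, and this presumes the intent row actually got committed (which, as the next two problems show, takes some care):

public class KvRecovery {
  @Inject KvDao kvDao;
  @Inject Gson gson;

  // Replay every saved intent that never produced a result row.
  void resumePendingCalls() {
    for (KvRecord intent : kvDao.list("intent")) {      // hypothetical: all ns=intent rows
      if (kvDao.get("result", intent.getK()) == null) { // hypothetical point read
        Object result = doSomeApiCall(intent.getV());
        kvDao.put("result", intent.getK(), gson.toJson(result));
      }
    }
  }
}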
problem 1: exceptions => rollbacks
Suppose we can reach the database but find ourselves partitioned away from doSomeApiCall() & the method call throws an exception.
What happens then?
Well, remember, we automatically started a transaction & inserted the first record in that context.
However, the exception triggers a ROLLBACK command to be sent to the database, so we never commit the ns=intent record!
This is the correct behavior, but it was a major surprise for me to wrap my head around – everything I’d ever played with apparently had autocommit turned on, or I’d simply missed it.
(Probably this behavior wouldn’t have been a surprise if I’d taken a database fundamentals class, but I really struggled with this.)
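To spell it out, using the hypothetical get() helper from the recovery sketch above – the partition makes doSomeApiCall() throw inside complexPut:

try {
  kvDao.complexPut("greeting", "world"); // API call throws mid-transaction
} catch (RuntimeException e) {
  // The ns=intent INSERT ran inside the same transaction, so the ROLLBACK
  // discarded it: no row is visible afterwards, not even the intent.
  assert kvDao.get("intent", "greeting") == null;
}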
problem 2: unbounded delays
It turns out the exception case might not even be the worst problem here: what happens if doSomeApiCall() simply never terminates, or takes a very long time?
I’d argue that this is even worse, because now, in addition to other connections not being able to see the ns=intent record, we’re also holding a transaction open for an arbitrarily long time.
In a write-ahead log (WAL)-based system – which is pretty much anything with meaningful ACID guarantees – we may need to iterate over the WAL for any database interaction.
In a high-traffic system without [wastefully] significant headroom, this can quickly become catastrophic, as the extra work incurred can easily saturate an IO or CPU quota or limit & render the database server non-responsive!
the fix: separate transactions
How do we fix it, assuming we want to store intent regardless of whether the API call succeeds or fails?
We need to split the intent vs result INSERTs into two separate transactions:
public class KvDao {
  @Inject Transacter transacter;
  @Inject Gson gson;

  void complexPut(String k, String v) {
    // STEP I: durably record the intent in its own transaction.
    transacter.transaction(session -> {
      KvRecord r0 = new KvRecord("intent", k, v);
      session.dsl().insertInto(KV).set(r0).execute();
    });
    // STEP II: make the API call with no transaction open.
    Object result = doSomeApiCall(v);
    // STEP III: record the result in a second transaction.
    transacter.transaction(session -> {
      KvRecord r1 = new KvRecord("result", k, gson.toJson(result));
      session.dsl().insertInto(KV).set(r1).execute();
    });
  }
}
failure modes
The failure modes are now:
- step I fails, nothing gets written, and the entire operation can be retried
- step II fails, but we’ve saved the intent, so we can resume this operation at a later time
- step III fails, and we’re probably going to have a duplicate doSomeApiCall(v) call
The final case here is most interesting, so let’s dig in a bit more.
First off, if at all possible, one should probably avoid having step III within this code path entirely, and instead rely on doSomeApiCall to (eventually) notify your application that it happened, and do the step III work there.
But not all APIs support webhooks & in that case, we’re probably going to end up retrying step II.
Many (good) APIs accept an idempotence token to detect this situation, permitting the API to deduplicate those events and no-op the second time.
We might also be able to do some list/search/find operation to check that the API isn’t already aware of the information/events in question, but this is somewhat of a last resort, as race conditions are still possible.
(These can also be mitigated with some kind of lease mechanism to enforce a single owner of that record until some timeout has elapsed & we’re confident there isn’t another in-flight request… but this is a lot of finicky machinery & overhead compared to the API deduplicating on a unique idempotence token.)
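For instance, when the API does accept an idempotence token, the key we already have is a natural candidate. A minimal sketch – the token parameter on doSomeApiCall is an assumption here, not part of the original interface:

// STEP II, revisited: send a stable token so the server can deduplicate.
// Every retry of this operation for the same k sends the same token.
Object result = doSomeApiCall(v, /* idempotenceToken= */ "complexPut:" + k);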
If all of that fails, we might actually end up with multiple API calls in flight. This is not great, but it tends to arise in primitive/early-stage systems or those not overly sensitive to duplicates.