# espps are free money

ESPPs give employees an opportunity to buy their company's stock at a discount. In both of the plans I'm aware of, the company gives a 15% discount on the LESSER of the price on the grant date and the price on the purchase date. The purchase dates come every six months, while the grants I've seen span either 12 or 24 months.

We can analyze this mathematically by breaking it into three cases. For concreteness, let's look at ADBE for a grant date of 2019-01-02, with the stock currently trading at \$224.27/share.

First, consider the case of the stock being exactly the same price on the purchase date. The lesser price is then \$224.27, and an ESPP enrollee will be able to purchase at \$224.27*85%=\$190.63. Those shares will still be worth \$224.27, so the enrollee has made \$33.64 per share they purchase.

Next, consider the case of a decrease in the stock's value by the purchase date. Again, for concreteness, let's assume it goes down to \$200/share. Now the lesser price is \$200 and the purchase price is \$200*85%=\$170. The enrollee still makes \$30/share!
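All three cases reduce to the same arithmetic: the purchase price is 85% of the lesser of the two prices. Here's a quick Python check of the figures in this post (the helper name is mine; the \$250 increase case is worked through next):

```python
def espp_gain_per_share(grant_price, purchase_date_price, discount=0.15):
    """Gain per share, assuming an immediate sale at the purchase-date price."""
    lesser = min(grant_price, purchase_date_price)
    purchase_price = lesser * (1 - discount)
    return purchase_date_price - purchase_price

# Flat: the gain is just the discount, about $33.64/share in the example.
flat = espp_gain_per_share(224.27, 224.27)
# Drop to $200: the lesser price falls too, so there's still a $30/share gain.
drop = espp_gain_per_share(224.27, 200.00)
# Rise to $250: the purchase price stays at ~$190.63, so the gain grows to ~$59.37/share.
rise = espp_gain_per_share(224.27, 250.00)
```

Note that the gain is positive in every case; only its size changes with the stock's move.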

Finally, consider the case of an increase in the stock's value on the purchase date. For concreteness, let's assume it goes up to \$250/share. The lesser price is now the original price of \$224.27, so the enrollee can purchase at the \$190.63/share price point as in the first case. There's a critical difference, though: each share they purchase is now worth \$250, so the enrollee makes \$250-\$190.63=\$59.37/share this time!

At this point, you're probably wondering: what's the catch? What trick did traviscj pull on me? There isn't one, but there is an important caveat: this assumes you sell immediately after the purchase occurs. You might or might not want to do this! Several things play into the decision:

- Do you value holding the stock? (e.g. you want to collect the dividends, etc.)
- What are the tax implications of selling vs. holding? Selling immediately will trigger the short-term capital gains tax, since you haven't held the shares for longer than a year. Holding for at least a year will qualify the earnings for the long-term capital gains tax, which is generally more favorable -- but this incurs the risk of the stock moving against you during the holding period.

DISCLAIMER: I am not a tax professional! Please consult one before taking action on this information!

# tradeoffs

> Life is (just) a series of tradeoffs. - DawnAnn "Mom" Johnson

I heard this so many times growing up that I think I actually stopped hearing it. But recently, I realized that it's never been more pertinent than to my current day-to-day life as a software engineer.

The fundamental tradeoff we make on the Risk Systems team is between

- how much fraud we catch (and symmetrically, miss!)
- how many false positives we incur to catch that much fraud

False positives running rampant can inhibit growth (in terms of interested customers) of the product we're trying to protect, but letting too much fraud through can make a product too expensive (and even liable to be shut down!)
These consequences are pretty dire, but any project with a machine learning underpinning -- and especially those with an adversarial aspect! -- probably finds itself in a similar standoff.

A level below that, there's a more detailed level of distributed systems tradeoffs. If our datastore is unavailable, would we rather

- be unavailable, and perhaps not respond to some client?
- be available, but permit our own records to be lossy?

Another level even further below that, there's tradeoffs around how to spend the team's time! Should we

- attempt to overengineer a bulletproof, multi-datacenter, cloud-native solution, but risk never finishing it?
- ship the smallest thing that could possibly work, and risk having to throw it away later?

There's an interesting meta-tradeoff here, which is: Should we

- expend extra effort on not committing to a given path? (i.e. hedge)
- not bother, and pay that cost later if we need to? (i.e. gamble)

I don't have answers for any of these, really. Every option I've listed has been the perfect answer for some project and the death knell for another. And I think that was always Mom's point: it's just a tradeoff! So:

1. be aware of and open to the opposite approach
2. be mindful of the tradeoff you're making
3. be on the lookout for times to seek a different tradeoff

# sharded feeds

Suppose that:

1. our humble KV feed sees a lot of traffic.
2. someone needs to consume our KV feed with multiple threads.

### data

The first step is to introduce a notion of "shards" into our data model:

```sql
ALTER TABLE kv
  ADD COLUMN shard INT(11) DEFAULT '0',
  ADD INDEX k_fsi_s (feed_sync_id, shard);
```

### publishing

We don't need to alter the publishing until the publishing itself is too slow to work with a single thread. That introduces a lot of complications, so let's just hold off for now.
### serving

```sql
SELECT *
FROM kv
WHERE kv.feed_sync_id > :fsi
  AND kv.shard = :shard
ORDER BY kv.feed_sync_id
LIMIT :limit
```

which is supported by an API like

```
/_feeds/fetch/kv?after=3&limit=5&shard=3
```

We can also support client-side shard configurations by additionally supporting a shard_count argument:

```
/_feeds/fetch/kv?after=3&limit=5&shard=3&shard_count=4
```

### consumption

We need to update our cursors table to include a shard column as well:

```sql
ALTER TABLE cursors
  ADD COLUMN shard INT(11) DEFAULT '0',
  ADD COLUMN shard_count INT(11) DEFAULT '1',
  ADD UNIQUE KEY u_name_shard (name, shard),
  DROP KEY u_name;
```

### data vs consumer shards

We haven't discussed the relationship between the number of threads a particular consumer uses (the shard_count value it provides to the API) and the range of shard values in the data model. Coupling the two together is completely sufficient for a prototype, and can even work surprisingly well for a surprising amount of traffic! But at some point, when a system has grown to several consumers -- and especially when those consumers need to do different amounts of work, or have different locking requirements or performance characteristics -- it can become useful to decouple the two. The basic ideas are to

1. write the data with a relatively large number of data shards DS (distinct shard values; between `2**9` and `2**12` are common in examples I've worked with),
2. consume with relatively fewer consumer shards CS (e.g. `<=2**5`),
3. query by mapping a given cs_i to a range of [DS_min, DS_max) values.

I've still pulled one punch here: we haven't said anything about what to actually set that new shard field to when writing new records! That'll be in the next post.

# moar stupid jquery tricks

Sometimes it's useful to use Tampermonkey on a site without nice ids for the elements you want to edit. It turns out it's still pretty easy! Take my own website: I have `<h2>` elements for the headers, but none of them have any ids associated.
But as discussed in the jquery selectors post, we can still select by just the tag, with a selector like `$("h2")`.


This returns a jQuery object, which behaves like an array. If we want to select a particular one -- like "Research Interests" -- we can try accessing different elements until we get the one we want:

```
> $("h2")[0]
< <h2>Work</h2>
> $("h2")[1]
< <h2>Education</h2>
> $("h2")[2]
< <h2>Research Interests</h2>
```

One tricky bit here, though: the array access returns the DOM element, not another jQuery object! The most direct approach is using the textContent DOM property to set the "text content" of that element, like

```javascript
$("h2")[2].textContent = "RESEARCH INTERESTS"
```


But since we have been using jQuery so far, we can also wrap the DOM element back into a jQuery object by calling `$` on it:

```javascript
$($("h2")[2])
```

and now we can call the usual jQuery object methods, like `.html`:

```javascript
$($("h2")[2]).html("RESEARCH INTERESTS")
```

# skydive!

For my 32nd birthday, my wife took me skydiving!

# killing all timeouts in js

This article has a nice trick for killing all javascript timeouts:

```javascript
function stopAllTimeouts() {
  // setTimeout ids are monotonically increasing integers, so grab a
  // fresh id and clear every id below it.
  var id = window.setTimeout(null, 0);
  while (id--) {
    window.clearTimeout(id);
  }
}
```

which can be entered on the javascript console and then run with

```javascript
stopAllTimeouts()
```

This is an effective way to prevent javascript timeouts from doing a whole variety of things, like

- carousels
- endless scrolls
- any other kind of animations

# jquery selectors

This is silly, but I always forget this:

| attribute | selector | jquery |
| --- | --- | --- |
| `<tag id=x>` | `#x` | `$('#x')` |
| `<tag class=y>` | `.y` | `$('.y')` |
| `<tag>` | `tag` | `$('tag')` |

I also learned you can specify things like

```
// @require https://code.jquery.com/jquery-2.1.4.min.js
```


in a Tampermonkey script, even when a given site doesn't already have jquery.

# Naming is hard (so don't)

A lot of times I just want to record something. This should be one of those things computers are good at. Turns out this is a bit harder than it seems: many editors make you name a file to save it.

One easy-sounding way to record something is:

1. open up a text editor
2. mash the keyboard until the thought is out of your head and into the text editor
3. (hard part starts) what do we name that file?
4. where do we put that file?
5. do we remember to save that file, given the difficulty of (3) & (4)?

I think picking names too early is at best a minor annoyance, and at worst a pretty major distraction.

- You have to use that early name to open the file later to make changes.
- The process of assigning a name and referencing it repeatedly puts a damper on changing the idea "too far" from the original idea and original name.
- If that name is used in a URL, shared with anyone, or (worst of all!) linked from other documents, changing it means changing everywhere it might be linked from. This problem is hopeless if we permit our friends and coworkers to retain private documents/bookmarks/etc. Even considering only shared workspaces, it frequently isn't even possible to enumerate such places. (I'm looking at you, Google Docs.)

One alternative is:

```shell
cat >> ~/to_file/$(uuidgen) << EOF
my super clever thought
EOF
```

This is slightly unsatisfying, too: macOS tracks the update timestamp, not the creation timestamp, by default. So we maybe want a separate "meta" file here, if the creation timestamps are important. But we can tackle that with either discipline (never update!) or a small bit of extra complexity, like:

```shell
U=~/to_file/$(uuidgen)
date +"%s" > ${U}.created_at
cat >> ${U} << EOF
my even more clever thought
EOF
```


With even that tiny bit of infrastructure in place, we can generate a timestamped log of "recordings" about whatever is useful to you!

The access pattern is kinda interesting here. Obviously, we don't want to -- and won't, or at least shouldn't -- remember the raw UUIDs. So we need to search, instead. We can use something like ag:

```
~/to_file ➜ ag clever
1:my even more clever thought

8319D617-59AE-4369-85C1-D5A738C91ABD
1:my super clever thought
```


Why bother with `uuidgen`?

1. I have built some very small systems on similar ideas but with much shorter "token generation" schemes; the problem here is eventually they conflict, and you either lose data (perhaps unknowingly), concatenate entries (perhaps unknowingly), or need to implement an existence check.
2. In the case of a shared folder (like Dropbox), the existence check can't check whether records already exist on a "replica" (ie your personal laptop) that hasn't been synced yet.
3. It's built into MacOS, so there's no custom token generation code to write.

This is conceptually kinda similar to the technique of Write-ahead logging: The ~/to_file directory functions as our "log" (of "thoughts" or "recordings" or whatever), which is to be reconciled later by moving the file into a more appropriate place.

# Feeds as cache invalidation mechanism

One really cool use of feeds we've realized is that they give a very efficient mechanism for application code to load the most recent version of a table into memory. The basic idea is:

1. Set it up as a usual feed published table with an appropriate index on feed_sync_id.
2. Either alongside or within the cache, represent the latest loaded feed_sync_id.
3. Set up a cronjob/etc that reads the latest feed_sync_id and compares it to the cache's feed_sync_id.
4. If they differ, reload the cache.
5. Ensure that all changes set feed_sync_id to null!

This works really well because the feed_sync_id in the database only gets updated on changes, so the reload cronjob mostly is a no-op. This means we can reload very frequently!
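The steps above can be sketched in Python. The class and callback names here are my own; the two callbacks stand in for real queries against the feed-published table (roughly `SELECT MAX(feed_sync_id)` and a full table load):

```python
class FeedBackedCache:
    """Caches a feed-published table, reloading only when new rows appear."""

    def __init__(self, fetch_max_feed_sync_id, load_table):
        # fetch_max_feed_sync_id: e.g. runs SELECT MAX(feed_sync_id) FROM kv
        # load_table:             e.g. runs SELECT * FROM kv
        self.fetch_max_feed_sync_id = fetch_max_feed_sync_id
        self.load_table = load_table
        self.cached_feed_sync_id = None
        self.rows = []

    def maybe_reload(self):
        """Steps 3-4 above: compare ids, and reload only on a difference.
        Cheap enough to run from a very frequent cronjob."""
        latest = self.fetch_max_feed_sync_id()
        if latest == self.cached_feed_sync_id:
            return False  # the common case: a no-op
        self.rows = self.load_table()
        self.cached_feed_sync_id = latest
        return True
```

The first call loads the table; subsequent calls are no-ops until a publisher bumps feed_sync_id.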

# robustness principle and mocks

The Robustness Principle (or Postel's Law) states

> Be conservative in what you do, be liberal in what you accept from others.

(often reworded as "Be conservative in what you send, be liberal in what you accept")

This principle has some criticisms.

I realized this has interesting implications for mocks. Suppose you have

```java
public class MyObj {
  private final Integer x;

  public MyObj(Integer x) {
    if (x < 0) {
      throw new IllegalArgumentException("negative x!");
    }
    this.x = x;
  }

  public int getX() {
    return x;
  }
}
```


In a unit test, we can have

```java
MyObj myObj = mock(MyObj.class);
when(myObj.getX()).thenReturn(-1);
```


which bypasses our sanity checks!

My takeaways from this are:

### Don't use mocks for POJO/data-ish objects!

If this feels painful, create a fixtures class:

```java
public class MyObjFixtures {
  @Inject Random random;

  public MyObj create() {
    // nextInt(bound) is always non-negative; note that
    // Math.abs(random.nextInt()) would still be negative
    // for Integer.MIN_VALUE.
    return new MyObj(random.nextInt(Integer.MAX_VALUE));
  }
}
```


### Create "fake" implementations when possible/appropriate.

Usually, it's quite a bit easier to create a fake implementation than a real implementation:

1. use maps/lists instead of a real database
2. take shortcuts like avoiding concurrency where possible
3. depend on fakes of whatever API calls you need

There's some danger in this -- you should definitely consider wiring things up in a way that your tests can run against "real" implementations or sandboxes of them.

Another major concern is having a way to "reset" the fake implementations between tests to reduce flakiness, or coming up with a way of testing that avoids the need to reset at all.
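To make point 1 and the reset concern concrete, here's a minimal sketch (in Python for brevity; the `KeyValueStore` shape is hypothetical): a fake store backed by a plain dict, with a `reset()` hook for use between tests:

```python
class FakeKeyValueStore:
    """Fake of a hypothetical KeyValueStore interface: a dict stands in
    for the real database, and reset() gives tests a cheap clean slate."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def reset(self):
        """Clear state between tests to avoid cross-test flakiness."""
        self._data.clear()

store = FakeKeyValueStore()
store.put("k", "v")
assert store.get("k") == "v"
store.reset()
assert store.get("k") is None
```

A fixture or setup method can call `reset()` before each test, which is usually far cheaper than tearing down and rebuilding a real datastore.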