500 Million Dollars!

Oct 27, 2017

I watched Destin's $500 Million Dollars video:

and had a few thoughts:

my high school CS education

My high school had computer teachers, but no computer science teachers. Luckily, we did have a teacher (Dick Van Kirk!) who was willing to sponsor an independent study of some computer science topics. I read the AP Computer Science A study guides and a bunch of the Java: How to Program (Deitel & Deitel) book and managed to get through the AP test with a top score. This enabled me to skip the intro CSE142 course at University of Washington.

This whole experience made the promise to increase high school CS course availability for rural students really resonate with me -- it's very exciting to hear "rural schools" included in the earmarking callout.

substantial enough?

From the US Department of Education:

Across the United States there are 26,407 public secondary schools and 10,693 private secondary schools. (Digest of Education Statistics, 2001, Table 89)

Let's assume there's no curriculum overhead or anything like that, and it all goes completely into the public school system. So, $500,000,000 / 26,407 schools = $18,934.37/school.

According to the US News, the annual median salary for high school teachers was $57,200 in the US in 2015. So this represents approximately 1/3 of one headcount per public high school.
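The back-of-the-envelope numbers above can be reproduced in a few lines of Python. This is only a sketch that reuses the figures quoted in this post; the variable names are made up for illustration.

```python
# Figures quoted in the post; the variable names are illustrative.
total_funding = 500_000_000        # dollars
public_secondary_schools = 26_407  # Digest of Education Statistics, 2001
median_teacher_salary = 57_200     # US News, 2015

# Split the money evenly across public secondary schools.
per_school = total_funding / public_secondary_schools

# Express the per-school amount as a fraction of one teacher's salary.
headcount_per_school = per_school / median_teacher_salary

print(f"${per_school:,.2f} per school")
print(f"{headcount_per_school:.2f} teachers per school")
```

Swapping in the original $200M figure for total_funding shows how much thinner that amount would be spread.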

I'm not sure how to feel about this. On the one hand, it seems likely to have some positive impact, so it seems like good news.

On the other, the amount (and especially the original $200M presidential mandate) seems unlikely to be substantial enough to accomplish the stated goal.

confident women are getting bypassed by overconfident men

Sep 11, 2017

python attrs

Aug 31, 2017

I came across a very interesting library in a HN thread: the python attrs library.

In particular, this seems like a great way to do the "dumb data objects" they talk about in the end of object inheritance, and also related to (but maybe lighter weight than) zope.interface.

This also seems very similar to what I use autovalue for at work.

One particularly interesting application is a "code database" -- using static, checked-in-to-version-control definitions of some data model as a sort of very-fast-to-read, very-slow-to-update "Data Model". I find this fascinating: Code shares a lot of properties with great data stores: ability to rollback (git revert) and accountability/auditability (git blame). It also makes a lot of fairly hard problems much simpler: you don't need to poll the database for changes. You don't need to invalidate any caches. You don't need to consider a "split brain" environment where half of the in-memory caches have updated but the other half haven't. You don't need to consider failure cases of how long the in-memory cache is allowed to be invalid: you just fail to boot up on deploy. (Admittedly, there's still an opportunity window for split brain behavior for the duration of the deploy, but this is a lot easier to reason about than an essentially arbitrary one.)

Immutable objects also work surprisingly well for interactions with database-backed objects. The notion of "data model dirtiness" (that is, unwritten writes in an ORM framework) correlates very strongly with, and is relatively easy to observe from, the litmus test of whether the data model has round-tripped to and from the mutable representation. Difficulties like "is this timestamp database time or JVM time?" are mostly phased out, because it simply always uses the JVM time -- subject to an explicit modeling of constraints of the database (like precision).

Interestingly, jOOQ actually supports an "Immutable Pojo" flag in the codegen phase. In practice, jOOQ's approach has other difficulties -- namely, the lack of index awareness and the inconvenience of the native (read: tightly coupled with your database) objects -- but I think this is ultimately a step in the right direction.

klr650 planned upgrades

Jul 17, 2017

I'm planning on "civilizing" my KLR650 just a tiny bit, since it is coming up on 1500 miles and I haven't made it offroad at all yet.

There are two (fairly minor) issues with the stock setup, as far as I'm currently concerned:

First, I'd prefer the gearing to be just a little bit taller so I don't have to wind it up quite as much. The engine just runs a little faster than I'd like at freeway speeds, and even driving around town it just feels slightly off. So I'm thinking about swapping out the stock 15T countershaft sprocket for a 16T. That changes the final drive from 2.87 to 2.69, and saves about 400 RPM at freeway speed, which doesn't sound like much, but should quiet it up nicely. I found a website called gearingcommander that allows all of these calculations pretty nicely.
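The gearing arithmetic is simple enough to check by hand. The sketch below assumes a 43-tooth rear sprocket, which is what the 2.87 ratio implies for a 15T front but is not stated above; the 6,000 RPM figure is purely a hypothetical input.

```python
def final_drive_ratio(rear_teeth, front_teeth):
    """Final drive ratio: rear sprocket teeth per countershaft sprocket tooth."""
    return rear_teeth / front_teeth


REAR_TEETH = 43  # assumption: implied by 2.87 * 15, not stated in the post

stock = final_drive_ratio(REAR_TEETH, 15)
taller = final_drive_ratio(REAR_TEETH, 16)

# At a fixed road speed and gear, engine RPM scales with the final drive ratio.
rpm_scale = taller / stock  # equals 15/16


def rpm_saved(stock_rpm):
    """RPM drop at the same road speed after the sprocket swap."""
    return stock_rpm * (1 - rpm_scale)


print(f"stock {stock:.2f}, 16T {taller:.2f}")
print(f"engine speed drops by {1 - rpm_scale:.2%} at any given road speed")
print(f"at a hypothetical 6000 RPM cruise: {rpm_saved(6000):.0f} RPM saved")
```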

Another thing I've noticed on freeway rides is that the stock tires just don't feel great to me. Because I haven't ended up doing any offroad adventures yet, I'm thinking about switching to the Pirelli MT90 Scorpion S/T. They're supposed to be a lot nicer on pavement and great in the rain, at the cost of some dirt performance, which sounds like a good tradeoff based on my riding so far.

Fine, I might also do one other little tweak, someday...

relational java explorations with sqlite3 vtab

May 5, 2017

One idea I have been mulling over lately is exposing a Java codebase relationally, with queries like:

-- column names (c.id, c.name, a.name) are illustrative
SELECT c.name
FROM classes c
JOIN annotations a
  ON a.class_id = c.id
WHERE a.name = 'MyAnnotationClass';

The idea here is that you could search a codebase by pretending that you had tables for a whole bunch of things like:

  • classes
  • annotations
  • variables
  • literals
  • methods

This would be a useful thing because instead of relying on an IDE like IntelliJ to do "find usages" operations, you could actually script those interactions. Admittedly, of course, Java has some pretty sophisticated reflection APIs to find this sort of stuff too, but it seems like exploring the structure relationally could be much easier than writing Java code to traverse those trees.

I told sharvil about this idea and he immediately responded with a link to SQLite's vtab API. There's a Dr Dobbs article on using it as well.
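To get a feel for what querying a codebase this way might look like, here is a toy sketch using Python's built-in sqlite3 module. Ordinary in-memory tables stand in for the virtual tables; a real vtab implementation would compute these rows from the parsed source on demand. The schema, column names, and sample rows are all made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Plain tables standing in for what a vtab would expose (hypothetical schema).
conn.executescript("""
    CREATE TABLE classes (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE annotations (
        id INTEGER PRIMARY KEY,
        class_id INTEGER,
        name TEXT
    );
    INSERT INTO classes VALUES (1, 'PaymentService'), (2, 'Ledger');
    INSERT INTO annotations VALUES
        (1, 1, 'MyAnnotationClass'),
        (2, 2, 'Deprecated');
""")

# "Find usages" of an annotation, expressed as a join.
rows = conn.execute(
    """
    SELECT c.name
    FROM classes c
    JOIN annotations a ON a.class_id = c.id
    WHERE a.name = ?
    """,
    ("MyAnnotationClass",),
).fetchall()

print(rows)
```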

toy/life data models

Apr 19, 2017

I've been experimenting a lot with some kinda "toy" data models based on random things that I wish there was a database to query, but there isn't. For example:

  • What was my average arrival time this week?
  • How much of my equity has vested before a certain date?
  • When was the last time we had spaghetti for dinner?

I've been doing this with flat JSON files. This is a bit of an odd choice for me; I actually love schematizing data models in protobuf and MySQL and designing proper indices for the data models I work on during work hours.

The ones I work on off-hours are a bit different, though. I want to quickly add a new record in if I order something on Amazon or my wife asks me to do something or I think of some quick hack to do something differently. I don't want to run a "real" database server, because that would mean running a MySQL server on the internet and connecting to it (which complicates or restricts accessing that database in a safe way.) Even if I did, that database (without a ton more work) wouldn't have multiple staged backups, let alone a solid replication strategy. Because I'm trying to be scrappy with my updates, the possibility of a catastrophic update is very real -- there's nobody around I can ask to review my roll plans.

I want to edit things at work or at home in the morning and see those updates in the other space. I want to be able to roll back to points in time and undo really big commits. I want to do bulk updates -- none of these tables have more than like 50 records -- in a text editor with a multi-line cursor.

These use cases are actually (surprisingly?) all really well served by JSON + Git/Dropbox. Git gives a certain type of transactionality and time-travel query capability. Dropbox provides a replication mechanism across all of my devices and other folks have done all the hard work of inventing UIs that work appropriately on each given device.

Unfortunately, they do mean that the querying support is not as developed as a proper relational store like mysql or sqlite3. But that's okay. jq gets a big part of the way there, and some small python scripts get most of the rest of the way. Once there's a large enough corpus of records and/or enough schema stability to want a "real" database, it's pretty easy to simply import the JSON records in to that database.
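As a rough illustration of that last step, the sketch below loads a list of JSON records into an in-memory SQLite table, taking the union of the keys as the column list. The import_records helper, the purchases table, and the sample records are hypothetical; table and column names are interpolated into the SQL, so this is only suitable for data you wrote yourself.

```python
import json
import sqlite3


def import_records(conn, table, records):
    """Create `table` and insert one row per record (trusted input only)."""
    # Union of all keys, in first-seen order, becomes the column list.
    columns = list(dict.fromkeys(key for r in records for key in r))
    col_sql = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f'CREATE TABLE "{table}" ({col_sql})')
    conn.executemany(
        f'INSERT INTO "{table}" ({col_sql}) VALUES ({placeholders})',
        # Records that lack a field get NULL in that column.
        [tuple(r.get(c) for c in columns) for r in records],
    )


records = json.loads("""
[
  {"item": "usb cable", "price": 7.5},
  {"item": "helmet", "price": 120, "note": "gift"}
]
""")

conn = sqlite3.connect(":memory:")
import_records(conn, "purchases", records)

result = conn.execute(
    "SELECT item, note FROM purchases ORDER BY price"
).fetchall()
print(result)
```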

Let's do a quick example! Periodically, my wife texts me asking what I want for dinner. I wanted to record a small "menu" of these, so I popped open a new file in TextMate, selected JSON mode, and entered a couple quick entries:

[
  {"main_course": "korean beef"},
  {"main_course": "greek food"},
  {"main_course": "thai food"},
  {"main_course": "meat loaf"}
]

I can get a quick list of just the main courses with a jq command like

$ pbpaste | jq 'map(.main_course)'
[
  "korean beef",
  "greek food",
  "thai food",
  "meat loaf"
]

or pick a random entry in python with something like

import json
import random

with open('data/dinners.json') as dinners_file:
    dinners = json.load(dinners_file)

# pick one entry at random and print its main course
print(random.choice(dinners)['main_course'])

which might output something like

greek food

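If I want a mysql-cli-style table of records, I can replace the final line with something like

print(tabulate.tabulate(dinners, headers='keys', tablefmt='psql'))

(after a quick pip3 install tabulate and an import tabulate at the top, of course.)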

The beauty of this approach is that if we can't come to an agreement among those options, and I start googling around on recipe sites, and come across a super-great sounding tamale pie recipe, I can just add it in along with the recipe link:

$ git diff
diff --git a/data/dinners.json b/data/dinners.json
index 60f8f6b..abb04d2 100644
--- a/data/dinners.json
+++ b/data/dinners.json
@@ -2,5 +2,6 @@
   {"main_course": "korean beef"},
   {"main_course": "greek food"},
   {"main_course": "thai food"},
-  {"main_course": "meat loaf"}
+  {"main_course": "meat loaf"},
+  {"main_course": "tamale pie", "recipe_url": ""}
 ]

I didn't need to run any

ALTER TABLE dinners 

or anything like that. If I make a mistake in some query I don't need to delete and re-insert a new record -- I just edit the file. If I shut the lid on my laptop and ride in to work, the new entry is waiting for me on my work laptop when I get there.

These "data models" are also extremely useful as mock data when writing mustache templates or React components.

JSON files definitely aren't perfect for this use case. They're a bit verbose (christ, the quotes) and finicky (diff-minimizing trailing commas, where art thou?). As a database, they're worse in almost every respect than a real relational/KV/document store -- there are no ACID guarantees, there's no command line, everything is slow, common operations are a bit wordy, there's no phpMyAdmin, etc. But there's great support in almost every scripting or compiled language for loading JSON, so you have a lot of freedom to put together different parts ("I think I'll write this transformation in Clojure and that one in Haskell -- wait, no, MATLAB!"), and there are no servers to configure and secure, or clients to install.

In similar experiments, I have created some basic mysql tables or used a local Redis/mysql/sqlite3/mongodb. SQLite3 is an incredible, insanely battle-tested piece of software. Redis has a pretty comfortable command line and some great functionality like the HLL support and TTLs on a key-value pair. MongoDB fits a pretty similar "early experimentations" use case and has a great query language. Of course MySQL is also a very solid workhorse, especially with the InnoDB storage engine. They all absolutely have their use cases: nearly everything!! -- in contrast to flat json files, which are really barely even appropriate for my use cases! -- and in most cases any of these would be much better suited for "bigger" data, especially when any concurrency gets involved. But it's hard to argue with the immediateness of this approach, and it's easy to get frustrated when homebrew breaks the development redis instance on your work laptop or the todos you added last night are sitting in the mysql on your home laptop.

I hope I've convinced you there can be at least some utility in this kind of thing! I'm planning on describing some of the tooling I've built up for this, like tools for inserting new records, extracting field sets, adding fields, getting distinct values, aggregations and group-by's. So stay tuned!
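For a sense of how small such helpers can be, here is a rough sketch of three of them (field extraction, distinct values, and a group-by count) over a list of records. This is not the tooling mentioned above, just one way it might look; the function names and the "cuisine" field are invented for the example.

```python
import json
from collections import Counter

records = json.loads("""
[
  {"main_course": "korean beef", "cuisine": "korean"},
  {"main_course": "bibimbap", "cuisine": "korean"},
  {"main_course": "thai food", "cuisine": "thai"},
  {"main_course": "meat loaf"}
]
""")


def extract(records, fields):
    """Project each record onto `fields`; missing fields become None."""
    return [{f: r.get(f) for f in fields} for r in records]


def distinct(records, field):
    """Distinct values of `field`, in first-seen order."""
    return list(dict.fromkeys(r[field] for r in records if field in r))


def count_by(records, field):
    """Like GROUP BY field with COUNT(*); records lacking it group under None."""
    return Counter(r.get(field) for r in records)


print(distinct(records, "cuisine"))
print(dict(count_by(records, "cuisine")))
```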

appendix: starter ideas (personal)

  • account_balance_snapshots: balances held in various accounts on certain dates, because I'm too lazy to keep up with ledger-cli but still get annoyed at how slow Mint is.
  • donations: organizations I've donated to, dates, and amounts.
  • interesting_papers: links to PDFs I want to read but haven't yet.
  • travels: flight confirmation numbers, to/from airports (by IATA code), airlines, flight ids, departure times, arrival times, seat assignments.
  • stock_{purchases,sales,vestings}: tables defining stock transactions from equity compensation.

appendix: starter ideas (work)

  • app_nodes: hostnames, data centers, and environments by application. We have a service discovery err, service for this, but updates are relatively rare and
  • team_members: phone numbers, email addresses, slack ids, birthdays, etc for people on my team. Again, we have some portals at work for (at least subsets of) this data, but it still comes in handy for the rest of the data and when internet/VPN is problematic.


Apr 3, 2017

I often create a directory/file called _useful/. I have one in my Dropbox, for example, that contains:

  1. My apartment lease
  2. My car/motorcycle insurance details
  3. A textfile with my vehicle plate numbers/VINs/insurance policy numbers.

At work, I have one with

  1. the top visited links for logs/metrics/admin interfaces for the services I work with most
  2. a list of links of "typical" or "exemplar" things
    • links to our internal tool views for typical payments, merchants, etc
    • typical size (in bytes) of various protobuf messages we use a lot, size of 1M messages, #messages in 1MB/GB
  3. common coding idioms, like several variants of @RunWith that we use in various cases in our test code.
  4. useful commands for doing stuff (curl/SQL/plain old shell)

Plain text is great for all the reasons it usually is. But it's especially useful here (see what I did there?) because the file loads much faster than Google Docs or wiki pages, it's grep-able, it's trivial to copy to a new machine, there's no fuss about futzing with the document to get it to format properly, and so forth.

The naming convention is useful because it naturally gets lexicographically sorted at the top in most macOS/iOS file lists without being a special character on the shell prompt (which would complicate the aforementioned grep-ability.)


Mar 31, 2017

I came across Better output from sqlite3 command line. His .sqliterc file did not work for me, but the simpler

.mode "column"
.headers on

did work nicely.

I also found out that brew install sqlite3 does not install the sqlite3 binary to $PATH, which stinks.

The sqlite3 environment is still much less pleasant than the mysql cli. A few things on my wishlist:

  • tab completion of tables, fields, keywords, and functions.
  • nicer .schema output -- I've just gotten really used to reading them in mysql output

It sounds like maybe apsw can be something useful? Or maybe Navicat? (but that's expensive)

sprat: multiplayer solitaire

Mar 29, 2017

I took a quick pass at describing the rules of a card game I grew up playing, Sprat:

Each player (or team) has 1 deck of cards. Initial setup is 4 cards face up (the "personal piles"), 13 cards (top card face up, others face down) in the "sprat deck", and the remaining cards in the "flip deck".

The center of the table is the space for the "ace piles". A new ace pile can be started by any player with any ace; any player can play the next card of the same suit on any ace pile.

Each player can move cards/stacks of cards within their personal piles as long as each card decreases the face value by 1 and alternates color. (solitaire style)

Each player flips through their flip deck 3 cards at a time and can play the top card on any ace piles or personal piles. (again, solitaire style)

The round is over when the first player eliminates their sprat deck. Rounds usually take 2-15 minutes, with the higher end being very rare for >= 3 players. The round score is then (player's cards in ace piles) - 2*(player's remaining sprat deck).

One game consists of several rounds; the game is over when the first player reaches 100 points.
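The placement and scoring rules above translate fairly directly into code. The sketch below is not taken from the sprat repository; the Card type and function names are hypothetical, and it assumes "next card" on an ace pile means the next rank up in the same suit.

```python
from dataclasses import dataclass

RED_SUITS = {"hearts", "diamonds"}


@dataclass(frozen=True)
class Card:
    rank: int  # 1 = ace ... 13 = king
    suit: str  # "hearts", "diamonds", "clubs", or "spades"

    @property
    def is_red(self):
        return self.suit in RED_SUITS


def can_stack_on_personal_pile(card, onto):
    """Personal piles: one rank lower and the opposite color."""
    return card.rank == onto.rank - 1 and card.is_red != onto.is_red


def can_play_on_ace_pile(card, top):
    """Ace piles: an ace starts a pile; otherwise next rank up, same suit."""
    if top is None:
        return card.rank == 1
    return card.suit == top.suit and card.rank == top.rank + 1


def round_score(cards_in_ace_piles, remaining_sprat_cards):
    """(player's cards in ace piles) - 2 * (player's remaining sprat deck)."""
    return cards_in_ace_piles - 2 * remaining_sprat_cards


def game_over(totals, target=100):
    """The game ends once any player's running total reaches the target."""
    return any(total >= target for total in totals.values())


print(can_stack_on_personal_pile(Card(6, "spades"), Card(7, "hearts")))
print(round_score(cards_in_ace_piles=18, remaining_sprat_cards=4))
```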

I wrote a basic (read: not battle hardened) implementation of this as a semi-real-time game on github/traviscj/sprat. It's barely good enough to play, but I've had a bit of fun playing it with some (non-local) family!

When I was writing up the code and trying to describe the rules to someone else, I discovered an identical game called Nertz, itself a variant of Canfield.

idea: transaction ordering in ledger-cli

Mar 27, 2017

I love ledger-cli but keeping it in sync with my bank statements drives me crazy. The problem is that the transactions can end up with an essentially arbitrary ordering, and the order they clear (and even the date they clear) is not necessarily under my direct control.

One answer to this is: stop caring about the ordering of your transactions! That's a decent answer, except that not addressing the ordering issue means that you can only ever have "end-of-day" consistency. This means you need a different report to reconcile the transactions.

It also sidesteps some more fundamental concerns:

  • It doesn't help me figure out if a given transaction is already accounted for in the running transaction total.
  • I want my near-term forecast to give me a strong guarantee that I won't overdraft the account.

I had a quick idea about this. What if instead of having a "clear date", transactions had a "clearing window" -- the earliest date and the latest date that a given transaction is expected to clear?

This doesn't solve the "put it in order" problem, but the software could construct pathological orderings and generate error conditions to show potential cashflow problems.
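One way to sketch the "pathological ordering" check: for each day, assume every debit clears as early as its window allows and every credit as late as its window allows, then flag any day where the balance could dip below zero. This is not a ledger-cli feature; the Txn type, function names, and sample amounts are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Txn:
    description: str
    amount: int     # cents; negative means money leaving the account
    earliest: date  # first day the transaction could clear
    latest: date    # last day the transaction could clear


def worst_case_balances(opening_balance, txns):
    """Lowest balance the account could touch at any point during each day."""
    if not txns:
        return {}
    day = min(t.earliest for t in txns)
    end = max(t.latest for t in txns)
    balances = {}
    while day <= end:
        balance = opening_balance
        for t in txns:
            if t.amount < 0:
                # Pessimistic: a debit hits as soon as its window opens.
                cleared = t.earliest <= day
            else:
                # Pessimistic: a credit only counts once its window has closed.
                cleared = t.latest < day
            if cleared:
                balance += t.amount
        balances[day] = balance
        day += timedelta(days=1)
    return balances


def possible_overdrafts(opening_balance, txns):
    """Days on which some ordering of clears could overdraw the account."""
    worst = worst_case_balances(opening_balance, txns)
    return [d for d, balance in worst.items() if balance < 0]


txns = [
    Txn("paycheck", 2000_00, date(2017, 3, 1), date(2017, 3, 3)),
    Txn("rent", -1500_00, date(2017, 3, 1), date(2017, 3, 5)),
]
print(possible_overdrafts(100_00, txns))
```

In this example the rent could clear before the paycheck on March 1 through 3, so those days are reported even though the end-of-week total is fine.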
