# build_json.sh

This might seem silly, but I've been playing with some json.sh scripts that build legitimate JSON bodies and are easily captured into a shell variable as needed.

The basic driving idea was that there are lots of slick ways to pull data out of JSON (either programmatically with Python's json module or with a command-line tool like jq or whatever), but not as many friendly ways to build some JSON out of a given token. Often, you have a list of identifiers and you need to build a bunch of JSON blobs from that list.

For example, say we have a file called `things` that contains

```
thing1
thing2
thing3
thing4
```

and we need to generate `{"value":"$THINGTOKEN"}` for each token in `things`. Then we can simply run a tiny shell loop:

```
Traviss-MacBook-Pro% while read token; do
> echo $(newBuilder | set value $(quoted $token) | build)
> done < things
{"value":"thing1"}
{"value":"thing2"}
{"value":"thing3"}
{"value":"thing4"}
```

Easy as that! There's no waiting for heavy VMs to start up or anything like that, just run it.
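The `newBuilder` / `quoted` / `build` helpers themselves aren't shown above. Here's a minimal sketch of how such a pipeline-style builder *could* be written; the implementation details are my guess, not the original json.sh, and I've renamed `set` to `set_field` because `set` is a shell special builtin and can't be safely overridden by a function:

```shell
#!/bin/sh
# Hypothetical sketch of pipeline-style JSON builder helpers.
# Each stage reads the partial JSON body on stdin and writes an
# updated body on stdout.

newBuilder() { echo "{}"; }          # start with an empty object

quoted() { printf '"%s"' "$1"; }     # wrap a token in double quotes

# set_field KEY VALUE: append a key/value pair to the object on stdin.
set_field() {
  body=$(cat)
  key=$1
  shift
  value=$*
  if [ "$body" = "{}" ]; then
    printf '{"%s":%s}' "$key" "$value"
  else
    # strip the trailing brace, append the new pair, close the object
    printf '%s' "${body%\}}"
    printf ',"%s":%s}' "$key" "$value"
  fi
}

build() { cat; }                     # terminal stage: emit the body

newBuilder | set_field value "$(quoted thing1)" | build
# {"value":"thing1"}
```

Because every stage is just a filter, stages chain naturally: `newBuilder | set_field a 1 | set_field b 2 | build` emits an object with both keys.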

# carputer brainstorming ideas

As much as I enjoy driving, the radio almost always annoys me. There are a couple of particularly terrible stations and commercials around, but even the least objectionable of the pack really bothers me. The Focus did come with an auxiliary stereo input plug, which we have used extensively, but even that has some shortcomings. In particular, one of the things I really miss about the radio approach is that I don't really need to think much about it -- it is just there while I'm driving around. Starting up Spotify after I've started off isn't safe or prudent.

I started thinking about ways to improve this, and I think I finally came up with a workable idea.

The plan is to use a Raspberry Pi wired to the electrical system and the stereo auxiliary input. I think the Pi should start up fast enough to begin playing music shortly after starting the car, and an SD card loaded with music should hold plenty to keep me from getting bored. I also realized that if I used a program like MPD to handle the music playing, I could probably remote-control it from the iPhone. That might require another hack or two (like a Linux-compatible USB WiFi or Bluetooth network adapter), but it would be pretty awesome to have phones in the car able to select the music. The Pi is cheap enough that I could probably just hide it under the front seat and not be very worried about it getting too hot, and I could power it from the 12V phone charger that I already plug in to charge my phone.

The other cool part about this is that if I ever manage to get my radio installed in the car, I can potentially use the Pi (or some other computing unit) to send APRS packets or whatever.

Anyway, the actual setup:

1. Install Debian on the Pi, along with apt-get install mpd.
2. Copy a bunch of music onto the Pi SD card, either directly or by scp-ing files over.
3. Hook up the Pi to some speakers and troubleshoot the mpd installation.
4. Move the Pi to the car.
5. Ensure the wifi still works.
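Steps 1-3 above might look roughly like the following on the Pi. The `mpd` and `mpc` package names are real Debian packages (I'm assuming `mpc` as the command-line client), but the copy source host and paths are placeholders:

```shell
# 1. install mpd plus the mpc command-line client
sudo apt-get update
sudo apt-get install mpd mpc

# 2. copy music onto the SD card; mpd's default music directory
#    on Debian is /var/lib/mpd/music (source host/path are placeholders)
scp -r user@desktop:~/Music/* /var/lib/mpd/music/

# 3. rescan the library and start playing everything
mpc update
mpc add /
mpc play
```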

Next step will be running some machine learning analysis on the MPD play logs to see if I can build some really awesome playlists.

# car is paid off!

I sent the last check for my 2013 Ford Focus a couple of weeks ago, and finally got the title in the mail today. It also came with a "congratulations for paying off your car!" letter, which was a nice touch.

I opted to pay it off early, despite the (moderate) financial disadvantage it put me at. The loan was only 2.4%, so the cost of carrying the loan actually wasn't super significant. I had started paying the loan ahead of schedule during graduate school because I was worried about whether I would have the cashflow available to keep making payments during the time between graduate school and real paid work. Then when I did get a job, I just kept making the higher-than-required payments.

In any case, now that it is paid off, I get to enjoy a payment-free car with 20k miles on the clock until I'm ready for a new one!

# slide rules

Suppose you want to evaluate the multiplication problem $C = A\cdot B$. But you forgot your times tables, so you are stuck using the slide rule on your expedition watch or something. In fact, you can even make your own slide rule out of a couple of slips of paper, if you really need to.

First bit of background: one of the rules you probably learned and I definitely forgot was the rule of logs: $$C = A\cdot B \iff \log C = \log(A\cdot B) = \log A + \log B$$ This ends up being useful for slide rules, because you can easily add distances together by putting two things next to each other! So that is exactly what we do: we add the logarithms by putting lengths next to each other.

The trick ends up being how we actually label the distances. We pick one convenient unit for the first interval, and that represents going from the value 1 to the value 2. The key is then reusing that same interval length, but having the value go from 2 to 4. And we continue this way until we have 1, 2, 4, 8, 16, 32, 64, 128 written down the strip.

Suppose we call our two strips the left and right scales. Then here's the setup for actually doing a multiplication: align the value 1 on the right scale with the value A on the left scale. Then scan down to the value B on the right scale, and read off the number it sits next to on the left scale. That is nothing other than C!
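As a concrete check with the powers-of-two labels: to multiply $4 \cdot 8$, slide the right scale so its 1 sits next to 4 on the left scale. The mark 8 on the right scale then lands next to 32 on the left scale, because the distances add:

$$\log_2 4 + \log_2 8 = 2 + 3 = 5 = \log_2 32.$$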

If you're wondering why the answer is not $\log C$, great question! It's because the *distances* along the strip are logarithmic, but the *labels* we read off are the values themselves -- so the scale silently takes the log when we measure a distance and the exponential when we read a label. It sounds weird, but it becomes pretty clear with a bit of practice.

Now, the next problem is: what if we want to multiply by 6, or something else that we haven't labeled on our scale yet? The answer is that you end up estimating. One decent estimate, once you have all the powers of two, is just a linear interpolation between the neighboring marks. This is basically equivalent to approximating the logarithm as piecewise linear.

You can also use this to divide. If you want to evaluate $Z = X / Y$, just set X (on the fixed scale) next to Y (on the moving scale), and then read off Z (on the fixed scale) by finding the number adjacent to 1 (on the moving scale).
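Running the division recipe with concrete numbers: to compute $32 / 8$, put 32 on the fixed scale next to 8 on the moving scale; the 1 on the moving scale then sits next to 4 on the fixed scale, since distances now subtract:

$$\log_2 32 - \log_2 8 = 5 - 3 = 2 = \log_2 4.$$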

# debugging internet

My standard plan for debugging internet connections:

1. Can we reach stuff from some other device?

Usually, connection problems are something like "my iPad isn't working," which is caused by "a flash of light in the sky, like when swamp gas from a weather balloon gets trapped in a thermal pocket and reflects the light from Venus."

If it is a wifi problem:

2. Can we reach the router? Basically, this means running:

```
ping 192.168.1.1
```

except that sometimes the gateway is some other IP address. In those cases, find it with something like:

```
windows> ipconfig /all
linux$ route -nv
osx$ route -n get default | grep gateway
```

If this doesn't work, it means something is screwed up with the modem and/or router.

If it does, it means something else is wrong, so continue:

3. Can we reach the external modem IP address?

4. Can we reach another external IP address? Usually, this means running:

```
ping 8.8.8.8
```

which should be a very reliable IP address.

If this doesn't work -- i.e., anything other than output like (on OSX):

```
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: icmp_seq=0 ttl=46 time=88.441 ms
^C
--- 8.8.8.8 ping statistics ---
1 packets transmitted, 1 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 88.441/88.441/88.441/0.000 ms
```

-- it indicates that your connection between the modem and the internet is degraded.
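When that ping fails, a traceroute (not part of the original checklist, but handy at this step) can show roughly how far packets get before the path breaks:

```shell
# each line is one hop; the first hop that times out repeatedly is a
# good suspect for where the connection is broken
traceroute 8.8.8.8     # osx / linux
# tracert 8.8.8.8      # windows equivalent
```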

If that's not it, continue:

5. Can we reach another website by its URL?

If not, it probably means that DNS is degraded (i.e., the system that transforms puppies.com into the IP address 184.168.221.26 is broken somehow).

Usually, this means that your internet provider has DNS server downtime. A couple of options here are to set your primary and secondary DNS to Google's:

8.8.8.8
8.8.4.4

or, in a pinch, OpenDNS':

208.67.222.222
208.67.220.220

You might want to switch them back later.
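On OSX, for example, the switch can be done from the terminal. This assumes your network service is named "Wi-Fi"; check the actual name with `networksetup -listallnetworkservices`:

```shell
# point the Wi-Fi service at Google's DNS servers
networksetup -setdnsservers Wi-Fi 8.8.8.8 8.8.4.4
# verify the change
networksetup -getdnsservers Wi-Fi
# later, go back to the DHCP-provided servers
networksetup -setdnsservers Wi-Fi Empty
```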

# don't poll

A while ago, I wrote down a list of clever life rules in my day-to-day notebook. One of them was "don't poll". To say a bit more about it: don't spend time waiting for things to finish and checking on them constantly.

A more concrete example: I find myself running terminal commands that take ~5 minutes, and then wasting ~5 minutes watching them. This is pretty stupid. So what I have started doing instead is running something like

```
$ some_long_command
[ ... scroll scroll scroll ... ]
[ ... waiting for it to finish ... ]
[ ... give up ... ]
terminal-notifier -message "some_long_command done"
[ ... go do something else ... ]
```

Then, later, I see a notification.

If I set up a long-running command and know when it should finish, I can also fire off something like

```
at 1:05 <<EOF
terminal-notifier -message "hit 1:05 ETA for XXX being done"
EOF
```

which will fire off a message at 1:05 to remind me to check in on it. In fact, this is such a useful pattern that I wrote a small script around it:

```
#!/bin/sh
# remindat time message
TIME=$1
shift
MESSAGE=$*
at ${TIME} <<EOF
terminal-notifier -message "${MESSAGE}"
EOF
```

so now I can just run

```
remindat 1:05 hit 1:05 ETA for XXX being done
```

(Note: For the at command to work, I had to run `sudo launchctl load -w /System/Library/LaunchDaemons/com.apple.atrun.plist` after adding my username to /var/at/at.allow.)

Finally, iTerm2 includes triggers that can run a command when it observes matching text. This is great!

# logging

In grad school, I spent a lot of time writing code that read output from nonlinear optimization solvers and tried to do useful things with it. A much better way to do that is called "structured logging", an idea I experimented with a bit during grad school. It has also been coming up in my working life, so I wanted to delve into it a bit deeper. For a quick introduction, check out Thoughts on Logging. For a much longer introduction, see The Log: What every software engineer should know about real-time data's unifying abstraction.

The reason this became interesting in my graduate work was that we had one set of experimental scripts that would output information (like how many QP solver iterations, matrix-vector products, error observed, or whatever) into a flat CSV file. This works great for writing the initial version, because it's so simple! But eventually we wanted to do some moderately complex stuff with these data files, so I had some analysis scripts that pulled things out of the CSVs and generated some charts. Later still, we wanted to try a new QP solver without rewriting the analysis scripts, so we just made the new QP solver output the same structure of CSV. When we decided that we needed to track something else, we just added it to the CSV output; the analysis would have looked weird without the new column, so it was pretty easy to remember to update the analysis script as well. But what about the old QP solver? Well, we thought we wouldn't be needing that again, so no sense wasting time updating it!
Of course, history is a cruel friend, and we did end up running the old QP solver again on some new data. And the analysis scripts were silently broken until we noticed and fixed them. There's a whole other problem too: we didn't have any headers in the CSV file, so the files were not self-describing. (Probably this was FORTRAN's fault?)

To give a concrete example, the code might have output something like

```
it= 5, inner_it= 34 err: 1.2345e-3
```

or a CSV file like

```
5,34,1.2345e-3
```

Structured logging applied to this problem, I think, would have the programs outputting some JSON like

```
{
  "iteration_number": 5,
  "inner_iterations": 34,
  "error": 1.2345e-3
}
```

In practice, this would be flattened down to something like

```
{"iteration_number": 5,"inner_iterations": 34,"error": 1.2345e-3}
```

Of course, all of these options in isolation contain exactly the same data. The differences lie in how resilient to change they are, and how well we could:

- parse output to do some extra calculation on it
- reconstruct output from an old version of the code

One JSON blob per iteration is about the same amount of work to output, and perhaps slightly harder to read in than a CSV, but much, much easier than the completely unstructured logging. Parsing JSON is extremely easy in almost all languages: Java and Python in particular have great support; MATLAB has some support libraries as well. (No guarantees about FORTRAN.) And of course, as long as you keep your output routine up to date, there'll never be any confusion like "The third column of this CSV looks like a double... I wonder if that's the error, or that one weird run I did where the code was outputting CPU time instead?" because the data self-describes.

Late in my PhD I took this even further and loaded the structured JSON output into a relational database.
Then I could do things like

```
SELECT problem_name, MEAN(inner_iterations) WHERE iteration_number = 1 GROUP BY problem_name;
```

which output a table like

```
|--------------|------------------------|
| problem_name | MEAN(inner_iterations) |
|--------------|------------------------|
| problem_1    | 123.45                 |
| problem_2    | 234.56                 |
| problem_3    | 345.67                 |
| problem_4    | 456.78                 |
|--------------|------------------------|
```

or run a query like

```
SELECT iteration_number, MEAN(inner_iterations) GROUP BY iteration_number;
```

which output a table like

```
|------------------|------------------------|
| iteration_number | MEAN(inner_iterations) |
|------------------|------------------------|
| 1                | 503                    |
| 2                | 434                    |
| 3                | 418                    |
| 4                | 342                    |
| 5                | 309                    |
| 6                | 196                    |
| 7                | 113                    |
|------------------|------------------------|
```

Disclaimer: This is all made-up data. The point is: structuring the data made it really easy and much less error-prone to build reports.

# Switching to jekyll

I spent a few hours this evening switching everything over from tcjblog to Jekyll. I really do miss a few parts of the old blog setup, but one of the main reasons I switched several years ago was LaTeX support. Back then, we didn't have really sweet options like MathJax. And it just takes too long.

One of the other big things keeping me from updating the blog the last couple of months was... the stolen laptop. It turns out that I didn't have very good backups of the last few website updates, which meant that I needed to manually restore blog posts from the website. And on top of both of those, of course everything needs to be converted from jemdoc to markdown. But it is mostly done!

# 14 vs 1499 vs 15

Sometimes, it is tempting to see 14.99 and say "about 14," even though we all know better. The problem with this is that by giving a 0.07% discount (14.99 vs 15), they have made you estimate a 7% discount (14 vs 15). Nice trick!
# launchd as cron crash course

Insert

```
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.whatever.five_after</string>
  <key>ProgramArguments</key>
  <array>
    <string>/bin/sh</string>
    <string>-c</string>
    <string>echo just ran > /tmp/whatever_five_after</string>
  </array>
  <key>StartCalendarInterval</key>
  <dict>
    <key>Minute</key><integer>5</integer>
  </dict>
</dict>
</plist>
```

into $HOME/Library/LaunchAgents/com.whatever.five_after.plist. (Note the /bin/sh -c wrapper: launchd does not run ProgramArguments through a shell, so a bare string with a redirect in it would not work.)

If you call a script in the ProgramArguments section, remember to make it executable and give it a proper shebang line.

Load and start it with

```
launchctl load $HOME/Library/LaunchAgents/com.whatever.five_after.plist
launchctl start com.whatever.five_after
```

Can also run every N seconds with

```
<key>StartInterval</key><integer>N</integer>
```

Can also check the status with

```
launchctl list | grep com.whatever
```

and check for status (the second column) = 0.
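Putting it together, a quick end-to-end sanity check (using the example label and output file from above):

```shell
# confirm the agent is loaded; the second column of the matching row
# is the last exit status, which should be 0
launchctl list | grep com.whatever.five_after
# once a matching minute has passed, the job's output file should exist
cat /tmp/whatever_five_after   # if the job has fired, prints the echoed line
```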