2007: J F M A M J J A S O N D
2008: J F M A M J J A S O N D
2009: J F M A M J J A S O N D
2010: J F M A M J J A S O N D
2011: J F M A M J J A S O N D
2012: J F M A M J J A S O N D
2013: J F M A M J J A S O N D

# Blog, page 0

from the desk of travis johnson.

## implementation of set operations (from 2013/03/13)

We got in a bit of a debate yesterday in the office over the implementation of associative containers, which I thought was pretty fun. We made up the big chart of complexity results you see below.

### nomenclature:

• , , and are subsets of .

• Denote an element by .

• ,,, are the sizes of the set , , , and , respectively, and .

### Complexity

 OperationApproach Hash Table Hash Tree Binary List Entry List (sorted) Entry List (unsorted) space complexity bits.

As I said–this was just what came out of my memory of an informal discussion, so I make no guarantees that any of it is correct. Let me know if you spot something wrong! We used the examples and to think through some things.

### Details on Implementations

#### Hash Table

• Assumptions: hash collisions rare/table lookups constant time.

• contains: hash the element , table lookup, if 1 then in set.

• union: create new table , set for all in either set.

• intersection: create new table , then for each , if set .

• space: Actual space used and table lookup complexity is highly dependent on the hash function : requires bits for the table but collisions are impossible(identical to bitset); requires only 1 bit for the table but necessitates a separate list of colliding elements(identical to sorted/unsorted list, depending on how collisions are handled.)

#### Hash Tree

• Assumptions: hash collisions rare/tree lookups logarithmic time(well balanced trees).

• contains: hash elt, tree lookup, if 1 then in set.

• union: must iterate over both sets (eg, tree traversal)

• intersection: lookup for each elt in n_2.

• space: Each element requires storage, plus the overhead of the tree.

• Implementations details: To make average runtime more likely and depending on size of set, can use an AVL, Red Black, Splay, or B-Trees instead of simple binary search trees.

#### Binary List ()

• contains: access element of vector/bitset.

• union: must iterate over all items which could be in the set.

• intersection: same.

• space: Each element in the set requires one bit.

#### Element List (, sorted)

• contains: binary search on list for .

• union: pointer for each list, step until both point to end.

• intersection: pointer for each list, step the pointer indicating the smaller value until end of shorter list(the remaining elements in the longer list can not be in both lists.)

• space: Each element in the set must be stored.

#### Element List (, unsorted)

• contains: must iterate over entire list since elt could be anywhere.

• union: for each elt of n_1, will iterate over half of n_2 on average

• intersection: for each elt of n_1, will iterate over half of n_2 on average.

• space: Each element in the set must be stored.

### Tree notes

• B-Trees vs Red/Black vs AVL vs Splay:

• each try to minimize the depth of a tree:

• B-Trees by breaking the children per node and doing more comparisons at each level,

• AVL/Splay trees by keeping the tree as balanced as possible.

• B-Trees can get a performance edge from cache locality (good for disk reads, ENORMOUS amounts of data, etc)

• AVL/RedBlack trees try to self-balance to minimize depth.

• AVL rebalances (an operation) when a the heights of subtrees of a node differ by more than 1;

• RedBlack trees rebalance (an operation) when paths to leaves contain different numbers of black nodes.

• AVL requires longest and shortest depth of tree to be within one(and has work to rebalance) while RedBlack trees requires longest and shortest depth to be <=2 and has O(1) work to rebalance.

• Splay trees try to to 'splay’ commonly accessed nodes to more shallow parts of the tree so that subsequent accesses require less tree traversal.

### Practical Use

In closing, I wanted to point out a few ways these implementations come up in C++ and MATLAB.

In C, most of these options are supported. In particular:

MATLAB does support list-style set operations via the ismember, union, and intersect commands(as well as setdiff):

S1 = [3,2,1]			% declare list
ismember(1,S1)			% true
S2 = 2					% 1-elt lists are lists.
S3 = setdiff(S1,S2);	% remove elements of S2 from S1
S4 = union(S2,S3)		% recombine S2 and S3 (ie, S1)
S5 = union(S4,3)		% also works for elements.
S6 = intersection(S1,S2)% common elements


Interestingly, MATLAB returns sorted arrays regardless of whether they are sorted or not.

Sets can also be manipulated in either the binary list or element list formats, because MATLAB supports indexing matrices by either a binary vector or a vector of indices.

vec = 10:20;
oe1=mod(vec,2)==1		% logical array of odd elements
oe2=find(mod(vec,2)==1)	% index array of odd elements.
vec(oe1)				% slicing by logical
vec(oe2)				% slicing by index


While researching this blog entry, I was also surprised to learn that MATLAB has support for maps.

k={'Bid', 'Ask', 'Open', 'Close'};
v={10.10,10.15, 9.90, 9.95};
stock=containers.Map(k, v);
stock.keys
stock.values
stock.isKey('Close')
stock('Close')=3
stock('Close')
stock.remove('Close')


Pretty neat!

## website basics (from 2013/02/15)

Several of my classmates in graduate school are considering making websites around now, and have asked me to explain it. I figured I'd explain the stack as I have it set up, and where you could make tradeoffs. For now, I'm not going to include too many detail here; I just want to outline the possibilities and lingo so that researching options is a bit easier for the uninitiated.

First off, many people are perfectly happy to use the school's email and department web space. If you only want to have a list of relevant papers and soforth, then this is a fine approach. See the very bottom of this post for a couple potential ideas for organizing it.

An alternative to this is to buy a domain name, and doing a combination of email and web hosting with it. A few other reasons you might consider this path:

• You want to build an internet brand.

• One particular perk here is an email address on your domain.

• You want to develop web-based software

• You want your web identity to be independent of where you currently work(something that can persist across multiple careers or institutions)

All of these apply to me, so that's the route I went with. If you're on the fence, a few other things I use my server for include:

• Running python or C-based numerical experiments

• Running an OpenVPN to have consistent access to my desktop at home from anywhere with an internet connection

• A code repository to showcase projects I've been working on and also serve as a backup of my work

• a private bittorrent tracker to transfer large files without tying up Dropbox space

In my setup, I use NearlyFreeSpeech.NET as my registrar, Linode for both their virtual private server and DNS hosting(through their very nice web interface, not a custom setup,) and Google Apps for email and calendar. Domain registration typically runs about $10 per year, DNS/website hosting is around$20 per month, and Google Mail is around $50 per year. Less expensive options for website hosting are available, but most impose some limit on flexibility. However, if you only need website hosting, it also greatly simplifies the procedure. One economical option is to use NearlyFreeSpeech.NET as a domain registrar, DNS server and web/database server (as needed.) For a new VPS setup, the step is to secure your hosting. Once you've activated it, you'll need to set up a Linux distribution on it and configure web servers and whatever else you need. At least on Linode, this is a trivial matter of picking which distribution and having them load a disk image onto your node, then waiting for it to install(it's quick.) While that's finishing is a good time to set up Google Apps to get the email rolling. The setup is straightforward, though I haven't done it since they abolished the 'Personal Apps’ hosting. Once your setup has progressed to the point that the machine has an IP address listed and your mail setup is complete, it's a good time to purchase your domain name. You can do this sooner, but you might as well do the whole setup at once. After purchasing the domain, you can point it to the IP address providing your hosting, and also add in any Google Apps mail info, as needed. Now comes the hard part–waiting for the DNS info to propagate around the world so that other people can start seeing your website and email can be delivered. While you wait is a good time to start adding some content to your website. My two primary recommendations here are jemdoc (for academics, particularly those whose webpages have a mathematical slant) and wordpress (for everyone else.) If you like the look and feel of jemdoc but wanted a more blog-style setup, one option is my own tcjblog. If you like online, live editing of Wordpress are not interested in the blog styling, keep in mind that it is easy to create static pages as well. Now you're set! Enjoy your online presence! ## some tips for project euler problems (from 2013/01/26) I wanted to make a list of a few tips on solving Project Euler problems that have been helpful for me while I solve them. These are general principles, even though I do most of my Project Euler coding in Python. Without further ado: • If the problem is asking for something concerning the number of digits, typically this indicates that the use of the function is warranted. • If the problem is asking for the last few digits, modulo arithmetic might speed it up considerably. • Some might consider this cheating, but looking up some small numbers in the Online Encyclopedia of Integer Sequences is occasionally pretty helpful. • Many problems boil down to: Find numbers with property and property . Two solutions are: • Brute force: Try all numbers with tests of property and . • Find numbers with property and filter by a test of property . • Find numbers with property and filter by a test of property . • Find the set of numbers with property and the set of numbers with property . Compute their intersection. I've found that it's sometimes hard to predict which one will end up being the fastest. It depends on the relative speed of the tests and the generators, and the frequency of finding numbers which have that property. ## Some Recent Photography (from 2012/12/16) My girlfriend and I were out by the lake taking pictures recently. I was trying to get the Chicago skyline, but we really weren't in the right place for it. So instead I settled for taking some long exposure photos of stuff I found. I think they came out pretty good, especially because it was all a ruse to get her to the lakefront so that I could ask if she would marry me. She said yes! ## Game of Life (from 2012/11/20) When I was a sophomore in high school, I was fascinated with Conway's Game of Life. I still am. I did a pretty rudamentary study of the kinds of patterns that could form from the simple rules of the game. One thing that wasn't available when I was a sophomore was Youtube. Two of my favorite Game of Life videos: Another direction that Game of Life has gone recently is something that I really should have thought of, honestly. Many times I've thought that there's at least some superficial relationship between Game of Life and diffusion equations. Turns out that S. Rafler has extended Game of Life to continuous domains through what he calls SmoothLife. ## Perturbation Theory Problems with bvp4c (from 2012/10/22) I have been watching Nathan Kutz’ lectures on Coursera. One change he made to the course since I took AMATH 581 at University of Washington was introducing the MATLAB function bvp4c. I immediately realized that this would be nice for solving boundary layer problems that arise in asymptotics. Following my life philosophy of doing the dumbest thing that could possibly work, I tried implementing Nathan's code for a single-layer boundary layer problem from Holmes, Chapter 2: The code for this DE looks like: clear all; close all;clc epsilonVec = [1,1e-1,1e-2]; BS={}; for i=1:length(epsilonVec); epsilon = epsilonVec(i); singular_rhs_anon=@(x,y) [y(2); (-2*y(2)-2*y(1))/epsilon]; singular_bc_anon=@(yl,yr) [yl(1)-0; yr(1)-1]; init = bvpinit(linspace(0,1,20), [0,0]); sol = bvp4c(singular_rhs_anon, ... singular_bc_anon, init); x = linspace(0,1,1000); BS{i}=deval(sol, x); end plot(x,BS{1}(1,:),x,BS{2}(1,:),x,BS{3}(1,:)); legend('\epsilon=1','\epsilon=1e-1','\epsilon=1e-2');  Here, the rhs function works exactly like a standard ODE45 call(turn the second order DE into a system of first order DE; coefficient on term should be 1). The bc function specifies the right and left bounds. The guess given to bvpinit is critical, but it seems that worked okay here. Also note that the granularity might need to be turned up for more difficult problems. Finally, note the use of anonymous functions to define a function without a separate MATLAB .m file. This generates plots like the following: This is a pretty slick way to solve boundary layer problems! There's a lot of existing stuff out there on this. A quick google search for 'perturbation problems with bvp4c’ turned up: ## more stupid matlab tricks (from 2012/09/28) I've been setting up long-ish runs on my MATLAB instance lately, which is nice because I can wanter off to do something else. The trouble is that I occasionally forget to come back or refocus on work again. I've come up with two solutions to this: * The first is to use growlnotify to pop up a persistant growl message. This works well enough as long as I'm at the computer: system('growlnotify -s -m "somefile is done"')  * For more extended aways or if I'm going to take a nap while it runs, I might do something more like: system('echo "matlab: uflX_aw_constr_log_5_15 done" | msmtp -a traviscj (mycell)@txt.att.net')  You need some setup before this works: I did a brew install msmtp and set up a .msmtprc file. After that, it should work goldenly, at least for AT&T customers. Others might need to look up their email/sms gateway number. ## debugging matlab mex (from 2012/08/16) One thing I've been doing a bit of lately is debugging MATLAB mex files; it seemed worth documenting. The broad strokes: • recompile all mex codes with ’-g’ • quit MATLAB • From a command prompt, run $ /Applications/MATLAB_R2012a.app/bin/matlab -Dgdb
[snip]
(gdb) run -nojvm
>> run_your_mex_file


This gives debugging output. If there is an error in the code, you're dropped back to a gdb prompt, where you can continue debugging in a semi-normal fashion. Typically, I end up doing a 'where’ to get a stack trace, and debug from there more-or-less like usual.

## Meta post: How my blog works (from 2012/07/27)

A while back, Sharvil requested a quick overview of how I post stuff to my blog, so I thought I'd post a rundown of it.

At the base, I use a git repository to store files named entryNNNN.txt. Each has the basic format:

title: Meta post: How my blog works
author: travis johnson
date: 2012/07/27
category: software
content:
A while back, Sharvil requested a quick overview of how I post stuff to my blog, so I thought I'd post a rundown of it.
At the base, I use a git repository to store files named entryNNNN.txt. Each has the basic format:


To start a new post and fill in some of this, I have a script called 'newentry.sh’, which determines the latest entry number and adds one to get a new filename (eg, entry0081.txt).

• The title entry is blank by default

• I am hardcoded as the author in the script. (Eventually, I'd like tcjblog to support multiple authors… but it doesn't yet.)

• Category defaults to 'personal’, but I probably could have picked 'software’, instead.

• Content is empty. Everything after the 'content:’ line is parsed as standard jemdoc.

The first pass of my script loops over the entryNNNN.txt files and generates the 'by date’ and 'by category’ pages, along with the long index, in the form of .jemdoc files. Then it calls jemdoc on each file to generate the .html files from the .jemdoc files. Finally, the files are all copied to the server. This is all managed by a makefile, so these steps amount to running 'make clean && make blog && make update.’ Of course, this is a bit of a pain too, so I added it as a post-commit hook in my git repository. Therefore, after I make some updates, I just check them in, and the rest is automatic.

## More cracking D-Link Files (from 2012/06/27)

Somehow, in the process of a router reconfiguration, I reset the password without the new password getting saved into 1Password. So I found myself locked out of my own router. I was about to reset it, thinking, “Hey, at least I have a backup of the settings from 2 nights ago!” and then realized, “I bet that settings file has the password right in it.”

Googling around a bit turned up this guy, but he only wrote VBA and a Windows binary. His pseudo-code looked pretty easy to translate into Python, so I did just that. Here's the result: D-Link DIR615 B2 v2.25 Decoder (no encoder, yet…)