Performance on traviscj/blog

sharded feeds

Mon, 29 Oct 2018 18:30:00 +0000

Suppose that:

our humble KV feed sees a lot of traffic.
someone needs to consume our KV feed with multiple threads.

data

The first step is to introduce a notion of “shards” into our data model:

ALTER TABLE `kv`
  ADD COLUMN `shard` INT(11) DEFAULT '0',
  ADD INDEX `k_fsi_s` (`feed_sync_id`, `shard`);
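
As a rough sketch of where this is heading, here is how a consumer might read one shard at a time. Everything beyond the `kv` table, the `shard` and `feed_sync_id` columns, and the `k_fsi_s` index is an assumption made for illustration: the `k` and `v` columns, the `NUM_SHARDS` constant, the CRC32-based `shard_for` assignment, the `fetch_page` helper, and the use of an in-memory SQLite database as a stand-in for MySQL.

```python
import sqlite3
import zlib

NUM_SHARDS = 4  # hypothetical shard count


def shard_for(key: str) -> int:
    """Assumed shard assignment: a stable hash of the key, modulo the shard count."""
    return zlib.crc32(key.encode("utf-8")) % NUM_SHARDS


def fetch_page(conn, shard, after, limit=100):
    """Return up to `limit` published rows of one shard whose feed_sync_id is past the cursor."""
    return conn.execute(
        "SELECT feed_sync_id, k, v FROM kv "
        "WHERE shard = ? AND feed_sync_id > ? "
        "ORDER BY feed_sync_id LIMIT ?",
        (shard, after, limit),
    ).fetchall()


def demo():
    # In-memory SQLite stands in for the real database.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT, "
        "feed_sync_id INTEGER, shard INTEGER DEFAULT 0)"
    )
    conn.execute("CREATE INDEX k_fsi_s ON kv (feed_sync_id, shard)")
    for i in range(1, 21):
        key = f"key{i}"
        conn.execute(
            "INSERT INTO kv (k, v, feed_sync_id, shard) VALUES (?, ?, ?, ?)",
            (key, f"value{i}", i, shard_for(key)),
        )
    # Each consumer thread would own one shard and keep its own cursor;
    # here we simply fetch the first page of every shard in turn.
    return {shard: fetch_page(conn, shard, after=0) for shard in range(NUM_SHARDS)}
```

Because every key maps to exactly one shard, the per-shard pages never overlap, and each consumer can advance its cursor independently.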

publishing

We don’t need to change publishing until a single publisher thread is too slow to keep up. Publishing from multiple threads introduces a lot of complications, so let’s just hold off for now.

Feeds as cache invalidation mechanism

Wed, 03 Oct 2018 02:31:43 +0000

One really cool use of feeds we’ve found is as a very efficient mechanism for application code to load the most recent version of a table into memory. The basic idea is:

Set it up as a usual feed-published table with an appropriate index on feed_sync_id.
Either alongside or within the cache, record the feed_sync_id that was the latest when the cache was loaded.
Set up a cronjob/etc that reads the latest feed_sync_id from the database and compares it to the cache’s feed_sync_id.
If they differ, reload the cache.
Ensure that all changes set feed_sync_id to null!

This works really well because the feed_sync_id in the database only gets updated on changes, so the reload cronjob is mostly a no-op. This means we can run it very frequently!
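
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual implementation: the `kv` table with `k` and `v` columns, the `FeedCache` class, the `publish` helper, and the in-memory SQLite database are all hypothetical stand-ins. The publisher is assumed to hand out feed_sync_id values from a strictly increasing sequence, so the largest one always identifies the newest published change.

```python
import itertools
import sqlite3


class FeedCache:
    """Hypothetical in-memory copy of the kv table, keyed by the latest feed_sync_id seen."""

    def __init__(self, conn):
        self.conn = conn
        self.loaded_feed_sync_id = None
        self.rows = {}

    def latest_feed_sync_id(self):
        # Cheap with an index on feed_sync_id.
        return self.conn.execute("SELECT MAX(feed_sync_id) FROM kv").fetchone()[0]

    def reload_if_stale(self):
        """Reload the table only if the database has a different latest feed_sync_id."""
        latest = self.latest_feed_sync_id()
        if latest == self.loaded_feed_sync_id:
            return False  # the common case: nothing changed, so this is a no-op
        self.rows = dict(self.conn.execute("SELECT k, v FROM kv"))
        self.loaded_feed_sync_id = latest
        return True


def publish(conn, sequence):
    """Hypothetical publisher: assign the next sequence value to every unpublished row."""
    unpublished = conn.execute(
        "SELECT k FROM kv WHERE feed_sync_id IS NULL ORDER BY k"
    ).fetchall()
    for (k,) in unpublished:
        conn.execute("UPDATE kv SET feed_sync_id = ? WHERE k = ?", (next(sequence), k))


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT, feed_sync_id INTEGER)")
    conn.execute("CREATE INDEX k_fsi ON kv (feed_sync_id)")
    sequence = itertools.count(1)

    conn.executemany("INSERT INTO kv (k, v) VALUES (?, ?)", [("a", "1"), ("b", "2")])
    publish(conn, sequence)

    cache = FeedCache(conn)
    results = [cache.reload_if_stale()]      # first check loads the table
    results.append(cache.reload_if_stale())  # nothing changed since the load

    # A change must null out feed_sync_id so the publisher picks it up again.
    conn.execute("UPDATE kv SET v = ?, feed_sync_id = NULL WHERE k = ?", ("3", "b"))
    publish(conn, sequence)
    results.append(cache.reload_if_stale())  # the new feed_sync_id triggers a reload
    return results, cache.rows
```

The cronjob would simply call `reload_if_stale()` on a short interval; each call costs one indexed `MAX` query unless something actually changed.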

piping for fun and profit

Thu, 29 May 2014 00:00:00 +0000

I recently discovered something pretty cool: groovy, and in particular groovysh. It lets you do cool stuff like run JVM functions:

➜  ~  groovysh
Groovy Shell (2.3.3, JVM: 1.8.0)
Type ':help' or ':h' for help.
-------------------------------------------------------------------------------
groovy:000> new Random().nextInt()
===> 909782845

But the sad part is that it seems pretty slow on my machine:

➜  ~  time (echo :q | groovysh)
Groovy Shell (2.3.3, JVM: 1.8.0)
Type ':help' or ':h' for help.
-------------------------------------------------------------------------------
groovy:000> :q
( echo :q | groovysh; )  16.56s user 0.31s system 201% cpu 8.384 total

That’s more than 8 seconds just to start up and shut down a prompt that I might just run one command in!

Numerical Development on OSX (in the Command Line)

Mon, 19 Aug 2013 00:00:00 +0000

I’ve been working on C implementations of my research projects, which can of course be a perilous undertaking. I’ve found some tools that make it hugely, hugely better.

Homebrew

You can’t do a list like this without mentioning Homebrew. You want Homebrew instead of MacPorts or Fink or baling twine and chewing gum or whatever else you were thinking about using. Just do it: you can find the homepage at brew.sh or just install with:

implementation of set operations

Wed, 13 Mar 2013 00:00:00 +0000

We got in a bit of a debate yesterday in the office over the implementation of associative containers, which I thought was pretty fun. We made up the big chart of complexity results you see below.

nomenclature:

$S$, $S_1$, and $S_2$ are subsets of $\Omega$.
Denote an element by $e\in\Omega$.
$n$, $n_1$, $n_2$, and $N$ are the sizes of the sets $S$, $S_1$, $S_2$, and $\Omega$, respectively, and $n_1 \geq n_2$.

Complexity

Operation \ Approach	Hash Table	Hash Tree	Binary List	Entry List (sorted)	Entry List (unsorted)
$e \in S$	$O(1)$	$O(\log n)$	$O(1)$	$O(\log n)$	$O(n)$
$S_1 \cup S_2$	$O(n_1+n_2)$	$O(n_1+n_2)$	$O(N)$	$O(n_1+n_2)$	$O(n_1 n_2)$
$S_1 \cap S_2$	$O(n_1)$	$O(n_2 \log n_1)$	$O(N)$	$O(n_2)$	$O(n_1 n_2)$
Space complexity	$O(n)$	$O(n)$	$O(N)$ bits	$O(n)$	$O(n)$

As I said, this was just what came out of my memory of an informal discussion, so I make no guarantees that any of it is correct. Let me know if you spot something wrong! We used the examples $S_1 = \{1,2,3,4,5\}$ and $S_2 = \{500000\}$ to think through some things.
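
To make a few of the columns concrete, here is a small Python sketch of intersection under three of the approaches, run on the example sets above. The function names are made up for illustration, reading "Binary List" as one flag per element of $\Omega$ is an assumption, and the costs noted in the comments describe these particular sketches only.

```python
from bisect import bisect_left


def contains_sorted(sorted_entries, e):
    """Membership in a sorted entry list by binary search: O(log n)."""
    i = bisect_left(sorted_entries, e)
    return i < len(sorted_entries) and sorted_entries[i] == e


def intersect_hash(s1, s2):
    """Hash table: probe the larger set once per element of the smaller one."""
    small, large = (s1, s2) if len(s1) <= len(s2) else (s2, s1)
    return {e for e in small if e in large}  # expected O(n_2) probes


def intersect_sorted(a, b):
    """Sorted entry lists: binary-search the larger list for each element of the smaller."""
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    return [e for e in small if contains_sorted(large, e)]  # O(n_2 log n_1)


def to_bits(s, universe_size):
    """Binary list: one flag per element of the universe, so O(N) space."""
    bits = [False] * universe_size
    for e in s:
        bits[e] = True
    return bits


def intersect_bits(bits1, bits2):
    """Binary lists: element-wise AND over the whole universe, O(N)."""
    return [x and y for x, y in zip(bits1, bits2)]


S1 = {1, 2, 3, 4, 5}
S2 = {500000}
N = 500001  # smallest universe {0, ..., 500000} that holds both example sets

hash_result = intersect_hash(S1, S2)
sorted_result = intersect_sorted(sorted(S1), sorted(S2))
bits_result = intersect_bits(to_bits(S1, N), to_bits(S2, N))
```

The example sets show the trade-off nicely: the hash and sorted-list versions do a single probe, while the binary list has to walk all 500001 flags to discover that the intersection is empty.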

Computer Resurrection and Elastic Cloud Experimentation

Sat, 29 Nov 2008 00:00:00 +0000

I was home on Thanksgiving Break with Sharvil, and we decided to revive some old computers. Partly, I’d like to experiment with some clustering stuff without incurring CPU time at the AMATH department or on the Teragrid stuff I’m likely gonna be working on soon with Shea-Brown’s neuroscience research. So, it turns out I resurrected about 5-6 old computers (the final tally is still waiting on the number of successful Xubuntu installs on them, among other practical issues, like where the hell am I going to put six computers…?): the very first computer I built (a P3 450), a P3 700, a dual P2 266, a couple of AMD64 3200s, and a Sony Vaio P3 733. The cool thing is that the neuron spiking models are basically embarrassingly parallel (well, each run isn’t necessarily, but from what I’ve gathered so far, we’re looking for averages over a bunch of them). So, sweet! Again, this would be terrible for actual research, especially against something like TG or even Amazon’s EC2, which is another thing I really need to check out.

Fame and FORTRAN

Sat, 08 Nov 2008 00:00:00 +0000

I must be getting more popular on some search engines somewhere. I just got six random comment-spam messages. Awesome. I guess that’s why the more important bloggers have come to rely on Bayesian filters and so forth for taming the wild flow of spam. Hopefully that trend doesn’t continue.

Also, it seems as though I am now learning FORTRAN. I’m sort of starting to work with Eric Shea-Brown on some neuroscience research, working with HPC on NSF’s Teragrid. It’s pretty exciting stuff, and I’m really excited about getting moving on it. Anyways, back to FORTRANizing, I suppose.

Cython

Fri, 09 May 2008 12:00:00 +0000

After I had finally convinced myself to get out of bed this morning to go to my ACMS seminar, I quickly checked my email and my heart sank a little. Today’s talk was on SAGE. I don’t have anything against SAGE, but I thought it was just a big pile of open source packages in a big, heavy install. Sorta cool, but worthless, in other words.

Turns out, I was pretty wrong about that. It is that, but it’s also 70k new lines of code that does a whole bunch of exciting stuff. Near the end of his talk, William Stein mentioned that they had created a new tool called Cython. (Well, extended Pyrex, but… whatever.)