Sunday, September 28, 2014

Gay rights vs national prosperity

The other day I saw a post on reddit via /r/bestof that was a counter to the claim that "Every civilization that has accepted homosexuality has failed".  I've never heard this argument, but I'll accept that some people actually claim it's true.  The rebuttal was a list of several modern countries that banned homosexuality and several that had many gay rights.  The latter countries were the ones that were better off.

While his argument seemed reasonable, it would be easy to fall victim to selection bias by picking only the examples that you are well aware of.  I wanted to see if there actually was a correlation between gay rights and national prosperity.

Quantifying national prosperity is a common task.  I chose GDP (PPP) per capita, GNI per capita, GNI (PPP) per capita, and HDI.  HDI combines GNI (PPP) per capita with an education index and a life expectancy index.

Quantifying gay rights was harder.  I knew of a survey that asked if gays should be accepted, but it wasn't asked in many countries.  I thought about using the number of years that homosexuality had been legal, or the punishment if it was still illegal.  However, as I was looking that up I found this Wikipedia page that listed 7 types of LGBT rights for each country.  They are: legality, civil union or similar, marriage, adoption, gays in the military, anti-discrimination laws, and laws related to gender identity.  Each category has a green check or red x in it for each country.  I gave 1 point for a check, 0 for an x, and 0.5 points for both or a ? (eg, the US gets 0.5 for marriage since it's legal in some places and illegal in others).  Note that regardless of how you feel about those 7 things, it is safe to say that a country with more green checks is more friendly towards gays, and that is what we are trying to quantify here.

The results were pretty similar for each of the 4 measures of prosperity.  All were just about a 0.70 correlation between gay rights and prosperity, which is quite high indeed.  I used the 125 most populous countries.  Just for fun, I also looked at the correlation between population and gay rights, and as expected I found none (-0.04).

GNI (PPP)   GNI    HDI    GDP (PPP)   Population
0.70        0.68   0.69   0.68        -0.04
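
For anyone who wants to repeat this, here is a minimal sketch of the two calculations in Python.  It assumes the correlation is the ordinary Pearson coefficient (the post doesn't say which one was used), and the function names and mark labels are made up for illustration.

```python
import math

# Points per mark on the Wikipedia table: check = 1, x = 0, both or "?" = 0.5.
POINTS = {"check": 1.0, "x": 0.0, "both": 0.5, "?": 0.5}

def rights_score(marks):
    """Add up the points for one country's 7 category marks."""
    return sum(POINTS[m] for m in marks)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical country: 3 checks, 3 x's, and one mixed category.
print(rights_score(["check"] * 3 + ["x"] * 3 + ["both"]))  # 3.5
```

Feed pearson() one list of scores and one list of, say, GNI (PPP) per capita values in the same country order to get a number comparable to the ones above.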

Scatter plots are always fun so here are some of those:


Wednesday, September 24, 2014

Odds of dying in shark attack vs driving to ocean

I've long held that there is a greater chance of dying in a car accident on the way to the ocean than there is of dying from a shark attack in the ocean.  I felt justified in this belief because I know car accidents are a large cause of deaths and that shark attacks are quite rare.  That being said, I never had any numbers to back this up.

The odds of a shark attack vary greatly depending on where you go swimming.  I had the idea briefly of making a map of the US that showed how far you'd have to drive from to equal the odds of dying from a shark attack there.  The problem was in estimating the number of swimmers in each state.

In fact, the only figure I could find for the annual number of ocean swimmers in the US was in this Huffington Post article, which claims it is 75 million.  This number seems high, and is unsourced.  Still, the majority of people live near the coast, so it is possible that a quarter of the population visits the ocean every year.

The number of shark attacks is easier to estimate.  Wikipedia lists 11 fatal attacks in the US in the 2000s, and 12 in the 1990s.  This site lists 1.8 fatal attacks on average per year, and 41.2 injuries per year.

The NHTSA gives us the number for deaths per 100 million miles traveled as 1.13 in 2012.  I assume this includes miles traveled as a passenger. 

Using the 75 million swimmers figure, and the 1.8 fatal attacks and 41.2 injuries figures gives us these results:

You would have to drive 49 miles to have a greater chance of dying in a car accident than being injured by a shark attack.  You would have to drive 2.1 miles to have a greater chance of dying in a car accident than being killed by a shark.

There are a lot of caveats on these numbers.  First, I couldn't find injuries per mile driven, so keep in mind that the 49 mile figure compares dying in a car accident with any shark attack injury.  It is also the round trip figure.

As I used the high 75 million swimmer figure, the real shark attack odds could be higher.  That figure is 25% of the US population swimming in the ocean every year.  I would be surprised if the actual figure were lower than 10%, so that only increases the odds by a factor of 2.5, which means the round trip distance for equal odds of dying in a car accident vs a fatal shark attack would still only be about 5 or 6 miles.
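
The arithmetic behind those distances is short enough to write out.  This is a sketch in Python using only the figures quoted above; the function name is made up, and it treats every swimmer as carrying the same yearly risk, which is a simplification.

```python
SWIMMERS_PER_YEAR = 75_000_000      # Huffington Post figure (unsourced)
FATAL_ATTACKS_PER_YEAR = 1.8
INJURIES_PER_YEAR = 41.2
ROAD_DEATHS_PER_100M_MILES = 1.13   # NHTSA, 2012

def break_even_miles(events_per_year, swimmers=SWIMMERS_PER_YEAR):
    """Miles of driving whose fatality risk equals one swimmer's yearly shark risk."""
    shark_risk = events_per_year / swimmers
    road_risk_per_mile = ROAD_DEATHS_PER_100M_MILES / 100_000_000
    return shark_risk / road_risk_per_mile

print(round(break_even_miles(FATAL_ATTACKS_PER_YEAR), 1))   # 2.1
print(round(break_even_miles(INJURIES_PER_YEAR)))           # 49
# Same calculation if only 10% of the population (about 30 million) swims:
print(round(break_even_miles(FATAL_ATTACKS_PER_YEAR, swimmers=30_000_000), 1))  # 5.3
```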

Also, the vast majority of attacks in the US happen in Florida and Hawaii.  Avoid swimming in those two states and you cut your shark attack odds to a quarter or less.

Tuesday, September 16, 2014

Grain of sand sized piece of neutron star matter

So the realistic scenario would be, you reach out in curiosity to touch the grain of sand. Your fingertip gets 1 cm away, and you feel like it's being gently pulled on. You bring it half a centimeter closer, hrm, more of a pull, then you get to 1-2 mm away, and it suddenly feels like someone yanked on your nail as your finger suddenly snaps to the grain of sand (much like two closely spaced magnets suddenly snap into one another).

At that very instant, it feels like someone has grabbed a square millimeter of your fingertip with pliers, so you yank your arm back and that tiny little area on your fingertip is torn off in the process. Your fingertip now has a little crater on it, about 1mm in diameter, and looks like you had a teeny little wart removed.

Sunday, August 24, 2014

Cost of Living

Recently, someone posted a map of the cost of living for states in the US.  The issue was that the state level is far too coarse to be useful for that.  I decided to do it at the county level.

First I had to find a data source for the cost of living at the county level.  I found this site, which seems to have good county level data.  I wrote a script to scrape that data, and ended up with a CSV with the cost of living and FIPS code for every county.

For plotting I followed this R tutorial.  It was easy enough to follow/adapt, but I noticed the result looked odd.  It was much more random than I would have expected, and some spot checking confirmed the plot was wrong.  It took me quite a while to figure out there were a few counties that are out of order in the maps package.  This still doesn't make sense as I used a match command so they should all be matched up.  Either way, I finally got an R script that plotted the data correctly.

The code and data are up on GitHub.

Saturday, August 23, 2014


After hearing about this California drought nonstop, I made this quick Perl script to make a GIF from the PNGs on this site.

I made one for the whole CONUS and uploaded it to Gfycat because the original is 14 MB.

Thursday, August 7, 2014

Learn X in Y minutes

This site is a great reference for people that already know how to program but switch between languages frequently, and get caught up on the subtle syntax differences between them.

Wednesday, July 16, 2014

Tax Brackets

I saw a post where someone was advocating a sliding tax scale instead of tax brackets.  I think many people misunderstand how brackets work: two people making $37k and $89k do not pay the same tax rate, despite both belonging to the 25% bracket.

I wrote a quick Perl script to calculate the effective tax rate for taxable incomes from $0 to $1,000,000 and plotted it.  Let me stress taxable income here: this is after deductions are applied.

Here's another for just up to $100k:

The result looks more or less like a logarithmic curve that levels off toward the top marginal rate, which is realistically the shape you'd end up with using a sliding scale too, since you couldn't have the rate rise forever.  The other alternative would be a logistic curve.
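
Here is a rough Python sketch of the effective rate calculation (the original was a Perl script, so this is not that code).  The bracket table is my assumption of the 2014 federal figures for single filers and is worth checking against the IRS tables before relying on it.

```python
# Assumed 2014 US federal brackets, single filer: (upper limit, marginal rate).
BRACKETS_2014_SINGLE = [
    (9_075, 0.10),
    (36_900, 0.15),
    (89_350, 0.25),
    (186_350, 0.28),
    (405_100, 0.33),
    (406_750, 0.35),
    (float("inf"), 0.396),
]

def tax_owed(taxable_income, brackets=BRACKETS_2014_SINGLE):
    """Tax each slice of income at its own bracket's rate and add them up."""
    tax = 0.0
    lower = 0.0
    for upper, rate in brackets:
        if taxable_income <= lower:
            break
        tax += (min(taxable_income, upper) - lower) * rate
        lower = upper
    return tax

def effective_rate(taxable_income, brackets=BRACKETS_2014_SINGLE):
    """Total tax divided by taxable income."""
    if taxable_income <= 0:
        return 0.0
    return tax_owed(taxable_income, brackets) / taxable_income

for income in (37_000, 89_000):
    print(income, f"{effective_rate(income):.1%}")
# 37000 13.8%
# 89000 20.3%
```

Under those assumed brackets, both incomes sit in the 25% bracket, yet their effective rates differ by more than six points.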

If you thirst for sweet sweet data here it is:

Monday, June 30, 2014

On being sane in insane places
Rosenhan's study was done in two parts. The first part involved the use of healthy associates or "pseudopatients" (three women and five men) who briefly feigned auditory hallucinations in an attempt to gain admission to 12 different psychiatric hospitals in five different States in various locations in the United States. All were admitted and diagnosed with psychiatric disorders. After admission, the pseudopatients acted normally and told staff that they felt fine and had not experienced any more hallucinations. All were forced to admit to having a mental illness and agree to take antipsychotic drugs as a condition of their release. The average time that the patients spent in the hospital was 19 days. All but one were diagnosed with schizophrenia "in remission" before their release. The second part of his study involved an offended hospital administration challenging Rosenhan to send pseudopatients to its facility, whom its staff would then detect. Rosenhan agreed and in the following weeks out of 193 new patients the staff identified 41 as potential pseudopatients, with 19 of these receiving suspicion from at least 1 psychiatrist and 1 other staff member. In fact Rosenhan had sent no one to the hospital.

Friday, June 13, 2014

Detroit: A Hurricane Without Water

This is a blog that goes through Google and Bing street view images and compares recent ones to old ones.  This shows the rapid deterioration of houses in Detroit.

Keep in mind the oldest images are from 2008, and the newest are from 2013.  So, 5 years is all it takes for a couple of houses to turn into couches.

Let me stress that these pictures are of the same place.

Friday, May 30, 2014

Everything there is to know about NES Tetris

When I was a kid, I kept track of the pieces I received in Tetris on a piece of paper, because I was convinced that the in game stats were lying (they weren't).

This guy wrote 25,000 words about the inner workings of Tetris (including an interesting RNG, which is actually truly random in practice).

Even if you don't understand or care about the technical details I recommend you skim through it, as there were a lot of interesting items.

Friday, May 2, 2014

The Graphing Calculator Story
People around the Apple campus saw us all the time and assumed we belonged. Few asked who we were or what we were doing. When someone did ask me, I never lied, but relied on the power of corporate apathy. The conversations usually went like this:
Q: Do you work here?
A: No.
Q: You mean you're a contractor?
A: Actually, no.
Q: But then who's paying you?
A: No one.
Q: How do you live?
A: I live simply.
Q: (Incredulously) What are you doing here?!

Sunday, April 27, 2014

Block editing text in Notepad++ or Geany

The other day I showed the concept of block editing text to 3 different people in 3 different contexts.  All were unaware of it, and agreed it was a very useful feature.

Block editing allows you to drag your cursor down across several rows of text and edit them all at once.  Two main uses are to either type on several rows at once, or to select a column of data to be copied to somewhere else.

In Notepad++ it is Alt + Shift with the arrow keys (or Alt while dragging the mouse); in Geany it is Alt + Shift.

Consider this data:
-7.1382    6918.30976
-7.4637    7244.35
-7.7947    7585.7757
-8.1307    7943
-8.4715    8317.637
-8.8169    8709.635
-9.1664    9120.1
-9.5199    9549.9258
-9.8770   10000

If you wanted to remove those negatives, there are a few ways you could go about it, but with the ability to drag your cursor across several rows it is a simple matter to drag it down those rows and delete the negative.

If you wanted to paste this into a spreadsheet it may be able to figure out how to separate the columns, but by just selecting one column at a time and pasting it you can guarantee each column will end up organized properly.

I think block editing is pretty self-explanatory, and the stolen gif does a good job of showing its potential.  However, I'd like to also touch on using find and replace.

Ctrl+H will bring up the find and replace box.  There is an option for regular, extended (escape sequences), or regex.  I prefer the extended option.  Let's say we wanted to enclose all the above data in quotes, separated by commas, and semicolons for each pair.  In other words we want this output:
"-7.1382", "6918.30976";
"-7.4637", "7244.35";
"-7.7947", "7585.7757";
"-8.1307", "7943";
"-8.4715", "8317.637";
"-8.8169", "8709.635";
"-9.1664", "9120.1";
"-9.5199", "9549.9258";
"-9.8770", "10000"

This is a pretty common format for input to various programs.  First, I would use block editing to change the spaces to ", " which admittedly wouldn't work for the last row, since it has one fewer space.  If there were more rows with gaps of varied length I'd have to use a regex to check for multiple spaces (\s+).  The real usefulness comes from find and replace with newlines.

The newline character (\n) represents each line break.  Search for that, and in the replace box enter ";\n" (including both quotes).  The opening quote on the first line and the closing quote on the last line still have to be typed by hand.  If you want the output all on one line, leave the \n out of the replace text, and maybe use a space instead.  On Windows you will likely need to use \r\n instead of just \n (\r = carriage return, \n = newline).
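
The same two replacements can be scripted when there is too much text for an editor.  A minimal Python sketch, assuming the two columns are separated by runs of spaces or tabs (the function name is made up):

```python
import re

def quote_pairs(text):
    """Turn whitespace-separated pairs into quoted, comma and semicolon separated rows."""
    # Drop surrounding blank space and normalise Windows line endings.
    text = text.strip().replace("\r\n", "\n")
    # Any run of spaces/tabs between the columns becomes: ", "
    text = re.sub(r"[ \t]+", '", "', text)
    # Every line break becomes: ";\n" (closing quote, semicolon, opening quote)
    text = text.replace("\n", '";\n"')
    # The first opening quote and last closing quote have no line break to supply them.
    return '"' + text + '"'

sample = "-7.1382    6918.30976\n-7.4637    7244.35\n-9.8770   10000"
print(quote_pairs(sample))
```

Matching a run of whitespace rather than exactly four spaces is what lets the last row, with its shorter gap, come out the same as the others.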

These two techniques have saved me countless hours manually formatting text.

Jack Churchill
Lieutenant Colonel John Malcolm Thorpe Fleming "Jack" Churchill, DSO & Bar, MC & Bar (16 September 1906 – 8 March 1996), nicknamed Fighting Jack Churchill and Mad Jack, was a British soldier who fought throughout the Second World War armed with a longbow, and a Scottish sword (a basket-hilted claybeg commonly but incorrectly called a claymore). He is known for the motto "any officer who goes into action without his sword is improperly dressed." Churchill also carried out the last recorded bow and arrow killing in action, shooting a German officer in 1940 in a French village.

Monday, March 24, 2014

Backups to/from a Linux machine from Windows

I'm currently using Windows 7 on a laptop and Linux at home on my PC.  I plan on switching the laptop to Linux at some point, but for now I wanted to be able to back files up to my home PC over the internet.  Now if you're a normal person there are plenty of services that do this for you (eg, Dropbox), but if you're me, then you insist upon doing it manually and having total control.

Linux on Windows
Step one is to get a Linux like environment on Windows.  This means installing Cygwin.  You will want to install openssh, rsync, git, and whatever languages you may want to program in. 

With Cygwin you should be able to directly use Linux commands in the Windows command prompt.  Use 'ls' to test this.

SSH (Secure SHell) allows you to log in to a remote machine over an encrypted connection.  Once logged in you have the same control as you would in a shell prompt on that machine locally.  It supports public key encryption so that we can log in without a password prompt.

As it is, you can log in with just the password using ssh dale@  Replace dale with your account on the remote machine.  You can use an actual IP to log in over the internet, but first you have to forward port 22 on your router.

Generate Keys
Now it is time to generate some ssh keys to allow you to log into your home computer remotely.  There are two ways to use ssh keys.  You can generate keys that require a password to use, or you can generate keys that do not require a password.  If you use passwordless keys, then you will have a private key file that anyone could use to log in to any computer where you've set it up to allow logins with that key.  Of course, you can keep the key on an encrypted laptop, and if the laptop is stolen or you have reason to believe the key was stolen it's as easy as deleting the key from the remote computer.

Passwordless keys are nice, and I use one to admin my site, but for my home PC I didn't want to allow access with just the keyfile.  This complicated the commands slightly, but I will get to that later.

Generating keys is easy.  Just run ssh-keygen, and answer the questions.  You now have the .ssh directory in your home directory.  Inside, there should be (at least) two files.  First is id_rsa, which is your private key and must be protected.  The second file is, and that is your public key.  As the name implies there is no need to keep this a secret.  You will place this file on any machine you want to be able to log in to.

To allow you to log in to a remote machine, find the .ssh directory in your home directory, and then the authorized_keys file in it.  Copy and paste the public key text string into that file.

You may need to restart some processes, but you should now be able to log in from the laptop to the remote machine.  You use the same command as above, but with the username you put in the ssh key (which can be the same as you use on different machines, Linux will pair it with @host to make it unique).

Now that you have the ability to log in to the remote machine you can make some scripts to move files between them.  rsync is nice for this because it will automatically use ssh to tunnel the connections to hide the data on public networks, and it does a good job of only sending the data that has changed.

Because I used a password with my key I wanted to combine the rsync commands into one, so I only had to enter my password once.  I made 4 batch files: one to receive files and one to send files over the local network, and the same pair for going over the internet.

My local receive command looks like this:
rsync -ahvz --progress --stats dale@ /cygdrive/C/Users/Dale/Documents/

You can check the flags if you want, but I found those work well for me.  Note that it takes all the contents of the Documents folder on the remote machine and dumps it into the Documents folder on the laptop.  Also notice how Cygwin handles drives.  You use /cygdrive/C/ to represent C:, with all forward slashes.

For sending files it was a little harder:
rsync -ahvz --progress --stats Other Programming School dale@

I can't remember exactly why, but I couldn't just use the whole folder.  I had to send those individual folders.  It took a little while to get it, but this syntax works.  Note that 'Other', 'Programming', and 'School' are each folders on the laptop that need to be sent to Documents on the remote machine.
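
If you would rather drive this from a script than a batch file, here is a small Python sketch that builds the same kind of single rsync call, so the key password is only asked for once.  The destination string is a placeholder I made up, not the real address, and the function name is hypothetical.

```python
import shlex

# The same flags as the commands above.
RSYNC_FLAGS = ["-ahvz", "--progress", "--stats"]

def rsync_command(sources, destination):
    """Build one rsync invocation that copies several sources in a single run."""
    if not sources:
        raise ValueError("need at least one source folder")
    return ["rsync", *RSYNC_FLAGS, *sources, destination]

# Placeholder destination: swap in your own user, host and path.
cmd = rsync_command(["Other", "Programming", "School"],
                    "user@example-host:Documents/")
print(shlex.join(cmd))
# rsync -ahvz --progress --stats Other Programming School user@example-host:Documents/
```

Passing the list to subprocess.run(cmd, check=True) would actually run it; printing it first is a cheap way to check the command before anything is copied.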

For the over the internet versions, just replace the IP with your home IP, and make sure you set up port forwarding.  If you plan on accessing the remote machine over the internet, it makes sense to use the hosts file to create a shortcut for the IP.  The hosts file is at: C:\windows\system32\drivers\etc\hosts.  Simply add a line with the IP and then a name, like: 123.456.78.90 home.  Then you can replace that IP with the name in scripts or on the command line.

These are all pretty rudimentary, but they have been working well for me for a few months.  As it stands, they never delete a file, so if you move something you must manually delete it from its old location on both the remote and local machines between syncs.  There are flags to do this automatically, but I kind of like it as is.

Sunday, March 23, 2014


I never know what the world looks like outside my little internet bubble.  I'm often surprised when I bring something up that has, from my perspective, been talked about constantly for weeks and people don't know what I'm talking about.  I have a feeling this 2048 game is a case of this.

It has spawned an absurd number of forks and variants.  At one point there were 10 threads about it on the top 100 posts on hackernews.  I thought it was some crazy new revolutionary technology.  Nope, silly flash game.

Anyway, while I didn't care about the original that much, one of the clones has already eaten several hours of my (very valuable) time.

In case you haven't played any of these, the point is to move squares with the same number together and they will merge into a single square with the sum.  You end up with powers of 2, and are aiming for the highest possible value.  The goal being the eponymous 2048 square.
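
The merging rule is easy to show in code.  This is a sketch of one row sliding left, in Python, assuming the usual rule that each tile can merge only once per move; it is not taken from the game's source.

```python
def merge_left(row):
    """Slide one row to the left, merging equal neighbours once per move."""
    tiles = [v for v in row if v != 0]       # drop the empty cells
    merged = []
    i = 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            merged.append(tiles[i] * 2)      # two equal tiles become their sum
            i += 2                           # both tiles are used up
        else:
            merged.append(tiles[i])
            i += 1
    # Pad with empty cells so the row keeps its length.
    return merged + [0] * (len(row) - len(merged))

print(merge_left([2, 2, 4, 0]))  # [4, 4, 0, 0]
print(merge_left([2, 2, 2, 2]))  # [4, 4, 0, 0]
```

The other three directions are the same function applied to reversed rows or to columns.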

In the tetris version the pieces will fold together due to gravity, and can be quickly moved left and right for unlimited length combos.

The key is a good organization strategy that is robust enough to handle the 4s and 8s.  I've gotten to 1024 a few times, but 2048 eludes me.

Another random variant is this one, where you place the blocks and the AI plays against you by collapsing them.

Sunday, March 16, 2014

The Earth and Moon

I stumbled upon this image of the Earth and the Moon photoshopped together to scale.  I decided to make my own in 1920x1080.  I took the images and the average radii, orbit, and inclination from Wikipedia.

8 kB PNG:

28 kB JPG

Wednesday, March 12, 2014

What’s gone wrong with democracy
Yet these days the exhilaration generated by events like those in Kiev is mixed with anxiety, for a troubling pattern has repeated itself in capital after capital. The people mass in the main square. Regime-sanctioned thugs try to fight back but lose their nerve in the face of popular intransigence and global news coverage. The world applauds the collapse of the regime and offers to help build a democracy. But turfing out an autocrat turns out to be much easier than setting up a viable democratic government. The new regime stumbles, the economy flounders and the country finds itself in a state at least as bad as it was before. This is what happened in much of the Arab spring, and also in Ukraine’s Orange revolution a decade ago. In 2004 Mr Yanukovych was ousted from office by vast street protests, only to be re-elected to the presidency (with the help of huge amounts of Russian money) in 2010, after the opposition politicians who replaced him turned out to be just as hopeless.

Wednesday, February 19, 2014

The Highest-Denominated Bill Ever Issued Gives Value to Worthless Zimbabwe Currency
A 100-trillion-dollar bill, it turns out, is worth about $5.

That's the going rate for Zimbabwe's highest denomination note, the biggest ever produced for legal tender—and a national symbol of monetary policy run amok. At one point in 2009, a hundred-trillion-dollar bill couldn't buy a bus ticket in the capital of Harare.

But since then the value of the Zimbabwe dollar has soared. Not in Zimbabwe, where the currency has been abandoned, but on eBay.

The notes are a hot commodity among currency collectors and novelty buyers, fetching 15 times what they were officially worth in circulation. In the past decade, President Robert Mugabe and his allies attempted to prop up the economy—and their government—by printing money. Instead, the country's central bankers sparked hyperinflation by issuing bills with more zeros.

Sunday, February 16, 2014

Cryptic Crossword: Amateur Crypto and Reverse Engineering
Something that my friend had noticed was that when we scrambled a puzzle twice in a row, the two keys would be different, but only in the first half. The third and fourth digits were the same. At first I thought that this might be due to scrambling the same grid, but further exploration suggested that it was entirely due to temporal proximity. So naturally, I tried running two instances of the Across Lite program at the same time, and hit Alt-S S on both of them as quickly as possible. In this way I obtained two grids scrambled with the same key.

With this technique, I had my first inroad, a way to start making some actual progress in the investigation. I could now create two crossword grids that differed in some specific way, scramble them both with the same key, and then compare the results, seeing directly how a change input affected the scrambled output.

Monday, January 27, 2014

Truth Tables

I made a truth table generator.  There are many of these, but as usual, none met my standards for working exactly how I wanted them to.

You can change what operators are used to represent AND, OR, and NOT, and you can use the shorthand of writing variables together for AND or OR.  It also shows the actual logical comparison it makes in JavaScript, so you know whether what you wrote is being interpreted correctly.  I wanted to color code the columns in the table to show which functions were equivalent, but that was annoying, so it just displays decimal values at the end for you to manually compare.
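
To illustrate the decimal value idea, here is a sketch in Python (not the generator's actual JavaScript).  It assumes the decimal value is the output column read as a binary number, which is my guess at how such a value would be formed; the function name is made up.

```python
from itertools import product

def signature(func, n_vars):
    """Read a function's truth table output column as one binary number.

    Rows run from all-False to all-True; the first row is the most
    significant bit.  Equal signatures mean equivalent functions.
    """
    value = 0
    for values in product([False, True], repeat=n_vars):
        value = value * 2 + int(bool(func(*values)))
    return value

# De Morgan: NOT (A AND B) is equivalent to (NOT A) OR (NOT B).
nand = lambda a, b: not (a and b)
de_morgan = lambda a, b: (not a) or (not b)
xor = lambda a, b: a != b

print(signature(nand, 2), signature(de_morgan, 2), signature(xor, 2))  # 14 14 6
```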

Tuesday, January 21, 2014

Snapping windows to half screen in Fluxbox

I've recently begun using Windows 7 more heavily, and one feature I do miss when on Linux, despite initially scoffing at it, is the ability to snap a window to half the screen.  I figured this wouldn't be too hard to replicate, and it turns out I was right.  There is a program called wmctrl which should be usable with any window manager, but Fluxbox had some built in commands I used.

I stole a very good idea from this thread about using the numpad to move windows to either the left/right, top/bottom, or the four corners of the screen.

Here is what I added to my ~/.fluxbox/keys file:
#1920 / 2 = 960
#1080 / 2 = 540
Control KP_0 :Minimize
Control KP_1 :MacroCmd {ResizeTo 958  540} {MoveTo 00 00 LowerLeft}   
Control KP_2 :MacroCmd {ResizeTo 1920 540} {MoveTo 00 00 LowerLeft}   
Control KP_3 :MacroCmd {ResizeTo 958  540} {MoveTo 00 00 LowerRight}   
Control KP_4 :MacroCmd {ResizeTo 958 1060} {MoveTo 00 00 UpperLeft}   
Control KP_5 :Maximize
Control KP_6 :MacroCmd {ResizeTo 958 1060} {MoveTo 00 00 UpperRight}   
Control KP_7 :MacroCmd {ResizeTo 958  540} {MoveTo 00 00 UpperLeft}   
Control KP_8 :MacroCmd {ResizeTo 1920 540} {MoveTo 00 00 UpperLeft}   
Control KP_9 :MacroCmd {ResizeTo 958  540} {MoveTo 00 00 UpperRight}  

Note the subtraction of 2 pixels from the left/right measurements to account for window borders.  Also, note the subtraction of 20 pixels from the full height measurement to account for my taskbar.  Finally, note that the title bar will be overlapped when stacked vertically.
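
If your screen, border, or taskbar sizes differ, the numbers are easy to regenerate.  A small Python sketch that rebuilds the MacroCmd lines above from the three measurements mentioned in the notes (the variable and function names are mine):

```python
SCREEN_W, SCREEN_H = 1920, 1080
BORDER = 2      # pixels trimmed from half-width windows for the borders
TASKBAR = 20    # pixels trimmed from full-height windows for the taskbar

half_w = SCREEN_W // 2 - BORDER    # 958
half_h = SCREEN_H // 2             # 540
full_h = SCREEN_H - TASKBAR        # 1060

# Numpad key -> (width, height, anchor corner), mirroring the keypad layout.
LAYOUT = {
    1: (half_w, half_h, "LowerLeft"),
    2: (SCREEN_W, half_h, "LowerLeft"),
    3: (half_w, half_h, "LowerRight"),
    4: (half_w, full_h, "UpperLeft"),
    6: (half_w, full_h, "UpperRight"),
    7: (half_w, half_h, "UpperLeft"),
    8: (SCREEN_W, half_h, "UpperLeft"),
    9: (half_w, half_h, "UpperRight"),
}

def keys_lines(layout=LAYOUT):
    """Return the keys-file lines for every numpad key in the layout."""
    lines = []
    for key in sorted(layout):
        w, h, anchor = layout[key]
        lines.append(
            f"Control KP_{key} :MacroCmd {{ResizeTo {w} {h}}} {{MoveTo 00 00 {anchor}}}"
        )
    return lines

print("\n".join(keys_lines()))
```

KP_0 (Minimize) and KP_5 (Maximize) don't depend on the screen size, so they are left out.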

Here is a sample after using Ctrl+7, Ctrl+1, Ctrl+6:

Friday, January 17, 2014

Wednesday, January 1, 2014

The Best of The Onion

A reddit thread asked for favorite headlines, and someone at The Onion coded up a page for the highest voted ones.

Wednesday, December 18, 2013

Boston Dynamics

Google buying Boston Dynamics is old news at this point, but I wanted to see what kind of things they were up to after that Bigdog robot.  If this video doesn't seem that impressive, skip to about 3:55, and imagine that thing walking towards you, possibly carrying a plasma rifle of some sort, asking you for your real name.

Saturday, December 7, 2013

Wednesday, November 6, 2013

Electrocution in Water
How Dangerous is it to swim in a pool when there is live wire in the water? What are the chances of electrocution? Take a look and you may get some ideas!

Friday, October 4, 2013


Jesus already gave me two burrito forks. One at the end of each arm. They’re called fucking HANDS.

A fork. My god. I haven’t cried since I was six, but I’m fucking sobbing now.

People eat burritos with forks?

God is sorry he made us.

Friday, September 20, 2013

New Jersey Municipalities

So by popular demand, a post on the types of municipalities in New Jersey.

The structure of NJ Municipalities is rather unique, and complicated.  To begin, we must discuss the 5 traditional (ie used before 1900) types, all of which were recently redefined in the late 1980s:

The township is the oldest form.  It began as a direct democracy with a town hall meeting.  That system was replaced in the late 1800s and then again in 1989.

Boroughs began as a special form, requiring an act of legislature for each.  In the late 1800s any township or area not exceeding 4 square miles and a population of 5000 could form a borough.  Boroughs were considered a separate school district and thus could avoid paying taxes as well as exercise greater control over their own schools.  This, of course, led to the infamous Borough Fever.  The legislature removed the ability for boroughs to self-incorporate.  The latest rewrite came in 1987.

Cities were created in the late 1800s by various special laws, with no real pattern, besides a max population cap of 12,000.  In the late 1980s when all the municipality laws were rewritten there were only 11 cities. 

Towns were created in the late 1800s for municipalities over 5000.  The law was rewritten with the rest in the late 1980s.

Villages were also created in the late 1800s for areas of at least 300 people.  In the late 1980s rewrite villages were all but abolished (there was only 1 at the time).  Now, they operate under the same rules as a township, but with some name changes.  For example, the mayor is called president of the board, which is a terrible name for the leader of a village.  It should probably be chieftain.

Currently, all 5 of these types of municipalities are legally equal.  This is opposed to other states, where townships are made up of towns or boroughs.

Now that we've discussed the 5 types of municipalities, we must discuss the 12 forms of government.  To begin, each of the 5 types has a default form of the same name.  They may either keep the default or change to one of the 7 modern forms.

To begin, there is the special charter, which is another name for 'other'.  As in, a special charter granted by the state legislature which doesn't fit one of the other forms.

Next there is the Walsh Act, and the 1923 Municipal Manager Law.  Both created in the early 1900s.

The most important of the modern forms, though, are the 4 Faulkner Act forms.  The Faulkner Act (aka Optional Municipal Charter Law) was passed in 1950 and created 4 new optional forms: Mayor-Council, Council-Manager, Small Municipality and Mayor-Council-Administrator.  Each of these forms has several sub plans, designated by letters (eg plan B).

For some stats I went to this wiki page:

Unfortunately, it doesn't have most forms, only the types.  It also seems to be wrong in a few cases.  I used the links to the individual pages, and scraped the form from them directly.  Even so, I know there are at least a few errors.  For example, there are no more village forms, the last changed a few years ago.  My data still lists 1.  Still, I hope that it is accurate for a few years ago.

Form                                        Count  Min pop.  Median pop.  Mean pop.  Max pop.
Faulkner (all)                                136       492       23,417     32,983   277,140
Faulkner Act (Mayor-Council)                   73       492       30,719     45,143   277,140
Faulkner Act (Council-Manager)                 43     1,170       22,866     22,929    62,300
Walsh Act                                      30         5        4,300     12,144    66,455
Faulkner Act (Small Municipality)              17     1,673        5,357      7,327    26,535
Special Charter                                10     8,213       24,338     29,126    66,522
1923 Municipal Manager Law                      7        67       24,136     28,871    84,136
Faulkner Act (Mayor-Council-Administrator)      3    13,183       25,850     26,592    40,742

This is a good PDF with a good historical overview:

This page gives a good overview of how each form operates:

Here's my data:

Tuesday, September 17, 2013

High Point Itineraries

A week or so ago I made a post comparing state high point elevations to dates of admission to the union.  Todd commented that he wanted to see itineraries for the two different trips.  I thought it wouldn't be that hard to make a site to generate the itineraries for any HP trip, and it would be good practice at both Rails and Github.  I've done about as much work as I see myself doing on it, so here it is:

It should be pretty easy to figure out, but I'll go over the details here.

Probably the biggest negative is that it does not support Alaska and Hawaii.   You can't drive to Hawaii, and probably wouldn't to Alaska, so they don't really lend themselves to pregenerated itineraries.  I thought about adding some base time like 12 hours for each to represent a flight, but then I needed to account for time to nearest airport, which is pretty significant for some of the HPs.

You enter state abbreviations in the top box, in the order they will be traveled.  Use any separator.  Click on the dots on the map (red for HPs, green for cities) to add them to the end of the list.  Cities are abbreviated with 3 letter airport codes.  You can use full names, but only if there are no spaces in the name.  There are currently 5 built in trips with buttons to add them to the box.  These include two of our trips (NE, SE).

After entering states and cities in order, you can click 'create' to be taken to the itinerary page.

Clear will clear out the box, and reverse will reverse the order of the points in it.

There are some options below the main box, all can be left as their defaults.

Daily start/end time is the time of the day you will begin or stop hiking or driving (use either 24 hour time or 'p' for pm hours).  For example if you want your days to start at 4am and end at 9pm, you would enter 4 in the start box, and either 21, 9p, or '9 pm' in the end box.  If you enter a start time after the end time it will just swap them.

Driving/Hiking time scale is what to multiply the default data by.  The driving times come from Google directions, which I find to be slightly slow, so I might use 0.9 there.  The hiking times aren't great, they just assume 2 mph average hiking speed.  I compared those times to our hikes and they are reasonably close.

Overhead time is the amount of minutes to add to every hike.  Consider this to be the time taken to get out of the car and stop at summit for picture.  Somewhat confusingly it will be doubled for most hikes, because it is considered one way.
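To make the options above concrete, here is a minimal Python sketch of how the time boxes and the hiking numbers could be interpreted.  It is not the site's actual Rails code; the function names are made up for illustration, and I am assuming the time scale applies to the hiking time but not to the overhead.

```python
import re

def parse_hour(text):
    """Parse '4', '21', '9p', '9 pm' or '4am' into an hour from 0 to 23."""
    match = re.fullmatch(r"(\d{1,2})\s*(am?|pm?)?", text.strip().lower())
    if not match:
        raise ValueError(f"cannot parse time: {text!r}")
    hour, suffix = int(match.group(1)), match.group(2)
    if suffix:
        if not 1 <= hour <= 12:
            raise ValueError(f"bad 12 hour time: {text!r}")
        # 12am is hour 0 and 12pm is hour 12
        hour = hour % 12 + (12 if suffix.startswith("p") else 0)
    elif hour > 23:
        raise ValueError(f"bad 24 hour time: {text!r}")
    return hour

def day_window(start_text, end_text):
    """Return (start, end) hours, swapping them if start is after end."""
    start, end = parse_hour(start_text), parse_hour(end_text)
    return (start, end) if start <= end else (end, start)

def hike_minutes(one_way_miles, round_trip=True, scale=1.0,
                 overhead_min=0, mph=2.0):
    """Hiking time at 2 mph by default, with overhead added once per leg."""
    one_way = one_way_miles / mph * 60 * scale + overhead_min
    return one_way * (2 if round_trip else 1)

print(day_window("4", "9 pm"))              # start at 4am, stop at 9pm
print(hike_minutes(2.0, overhead_min=10))   # round trip, overhead counted twice
```

The last line shows why the overhead ends up doubled: it is added to each one way leg, and a round trip has two of them.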

The display page should also be pretty self-explanatory.  It lists the day, with the day count and actual date.  The code can handle any starting date, but there is currently no place to enter one other than today, mainly because I didn't feel like parsing dates in javascript.  There are two types of task, drive or hike.  For each it lists the start and end time (in 24 hour format), the duration (in decimal hours), and distance (miles).  There is also a link.  For drives it's to Google Directions, and for hikes it's a Google search for the peak and, which should take you directly to the page for most.

Note here that (RT) means round trip, and that is what most hikes will be.  The exception will be if the HP is either the starting or ending terminus.

Note that it does seem to handle multi-day drives, although I'd be careful to double check them.

At the end is total distance and time for the drive and hike.  There is no overall Google Directions link because the syntax of the URL was more annoying than I felt like dealing with right now.

You can link directly to the URL given for an itinerary:;CT;RI;MA;NH;ME;VT;NY;NJ;PHL

Some other random notes:

I've deployed this to Heroku.  I suppose I should give them credit for free hosting of a dynamic site.  However, at least half the development time was spent trying to get it to work.  The main issue was that the 'toolbelt' they insist you use kept breaking my Rails installation.  Then there was the fact that I pretty much had to switch to PostgreSQL for development to make uploading my database work.  That being said, once I got it working it was pretty easy to work with.

The code is pretty sloppy overall.  Maybe I'll refactor it all one day.

I'd vaguely like to add traveling salesman logic to find the fastest route.  I honestly don't think this would be that hard, but isn't likely to happen any time soon.
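For a handful of stops, the traveling salesman idea can be sketched by brute force: try every ordering of the middle stops and keep the fastest.  This is only an illustration in Python, not part of the site.  The drive times below are made-up placeholders, and checking every permutation stops being practical well before 48 states.

```python
from itertools import permutations

def fastest_order(start, stops, end, drive_hours):
    """Try every ordering of the middle stops and return the quickest route."""
    best_route, best_time = None, float("inf")
    for middle in permutations(stops):
        route = (start, *middle, end)
        total = sum(drive_hours[a][b] for a, b in zip(route, route[1:]))
        if total < best_time:
            best_route, best_time = route, total
    return best_route, best_time

# Hypothetical drive times in hours, for illustration only.
hours = {
    "PHL": {"NJ": 3.0, "NY": 5.0, "CT": 4.5},
    "NJ":  {"PHL": 3.0, "NY": 4.0, "CT": 2.5},
    "NY":  {"PHL": 5.0, "NJ": 4.0, "CT": 3.0},
    "CT":  {"PHL": 4.5, "NJ": 2.5, "NY": 3.0},
}

print(fastest_order("PHL", ["NJ", "CT"], "NY", hours))
```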

This is by far the most time I've ever spent for what amounts to a one-off mildly-funny inside joke.

Here's all the code:

Monty Hall Problem

I discussed this problem a while ago, but as it is one of my favorite probability problems I felt it deserved a revisit.  What makes the problem great is that nearly everyone (myself included) gets it wrong at first.  Wikipedia even claims: "Paul Erdős, one of the most prolific mathematicians in history, remained unconvinced until he was shown a computer simulation confirming the predicted result (Vazsonyi 1999)."
Suppose you’re on a game show, and you’re given the choice of three doors. Behind one door is a car, behind the others, goats. You pick a door, say #1, and the host, who knows what’s behind the doors, opens another door, say #3, which has a goat. He says to you, "Do you want to pick door #2?" Is it to your advantage to switch your choice of doors?

Yes; you should switch. The first door has a 1/3 chance of winning, but the second door has a 2/3 chance. Here’s a good way to visualize what happened. Suppose there are a million doors, and you pick door #1. Then the host, who knows what’s behind the doors and will always avoid the one with the prize, opens them all except door #777,777. You’d switch to that door pretty fast, wouldn’t you?

This is the article that brought the problem to a large audience.  She published many of the responses, which are almost all claiming she is wrong.
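Since a simulation is what reportedly convinced Erdős, here is a minimal Python sketch of one.  It assumes the standard rules from the quote: the host always opens a goat door that is not your pick.  The function names, trial count and seed are arbitrary choices.

```python
import random

def play(switch, rng):
    """Play one game; return True if the player ends up with the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host knows where the car is and opens a goat door that isn't the pick.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the one door that is neither the first pick nor the opened one.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

def win_rate(switch, trials=20000, seed=0):
    rng = random.Random(seed)
    wins = sum(play(switch, rng) for _ in range(trials))
    return wins / trials

print("stay:  ", win_rate(switch=False))
print("switch:", win_rate(switch=True))
```

Staying should land near 1/3 and switching near 2/3, matching the answer in the quote.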

Friday, September 6, 2013


So for the last hour I've been zooming in on fractals.  This Xaos program is pretty sweet.  Left click to zoom, p for random color scheme, and a digit for a different fractal (1 to jump back to default zoom of Mandelbrot).  If you go into Calculation > Iterations, and turn it up (mine is at 500) it'll let you zoom further.

The United States of America: Onwards and Upwards

I awoke this morning to some highpoint trivia, which is obviously the best way to start the day:

(11:11:59 AM) Todd Nappi: well you probably didnt read that HP records post I sent you awhile back 
(11:12:06 AM) Todd Nappi: But there are two up for grabs
(11:12:13 AM) Todd Nappi: doing them in Ascending height order 
(11:12:20 AM) Todd Nappi: And doing them in order of admittance to the union

My first thought was that order of admittance would actually work pretty well.  It would move generally west, and you end in one of the far travel ones of Hawaii and Alaska.  Then, I realized that ascending order would be pretty similar.  I wondered how similar and have been practicing R lately, so I found out.

To start with the big reveal, the correlation between state highpoint elevations and dates of admission is 0.703.  What's more, p = 0.000000013, or about a 1 in 77.5 million chance of random data of this size giving this correlation.  This is pretty conclusive: The United States has been adding states of ever increasing height in an effort to increase average state highpoint elevation.

I did some calculations and here is the relationship between year (y), and height (h):
`y = 0.01338 h + 1757.6`
`h = 74.738 y - 131 360`

Using these formulas we can predict that the US will annex Nepal/Tibet sometime around 2146, and Mars in 2748.
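As a quick check of the arithmetic, here are the two formulas as a short Python sketch.  The Everest elevation of 29,029 ft is my assumption for the Nepal/Tibet highpoint; the post does not state which figure was used.

```python
def admission_year(height_ft):
    """Predicted year of admission for a highpoint of the given height."""
    return 0.01338 * height_ft + 1757.6

def highpoint_height(year):
    """Predicted highpoint height (feet) for a state admitted in a given year."""
    return 74.738 * year - 131360

EVEREST_FT = 29029  # assumed elevation, not taken from the post

print(round(admission_year(EVEREST_FT)))   # 2146
print(round(highpoint_height(2748)))       # implied height for the Mars date
```

The second formula is just the first one solved for h, so the two always agree with each other.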

Tuesday, September 3, 2013

A basic intro to Git and Github

Git is a system for making backups of source code.  It allows you to atomize each change you make to the code, so that each can then be rolled back independently of later changes.  Github is a website where people can publish code that is tracked in git.  It allows for easy collaboration between multiple people on single projects.

I've wanted to learn more about Git for a while, since it is the trendy version control system.  However, it has a notoriously steep learning curve.  As such, there are many overviews available online.  In keeping with the general theme of this blog I've decided to provide my own inferior overview.

As I said, there are many guides online, but I liked this one.

The main thing I want to summarize is the different conceptual levels a file goes through in the backup process:
  • To begin, we have untracked files.  All your files will begin at this level, and git will do nothing with them until you tell it to.

  • The first time you add a file, it becomes tracked.  This means git will monitor it and when you run git status, it will alert you to changes.

  • You must explicitly stage files which are part of a single change.  Files that are staged will all be saved by the next step.  You stage files with git add.

  • When you commit changes it takes the currently staged files and makes a record of the changes.  At this point the changes are backed up locally, and you can roll them back later.  You can skip the staged step and directly commit any tracked file that has changed with git commit -a.

  • For remote backups, you can push your changes to a site like Github.  Once the files are uploaded there, others can see them and download them with pull.

This is the basic gist of using git as a single user backup system.  If you want to collaborate on files that's when things like branches and merges become more useful.
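The levels above map onto a short sequence of commands.  Here is a minimal Python sketch that builds that sequence and can run it with subprocess.  The helper names are made up, and the push step assumes a remote has already been configured for the repository.

```python
import subprocess

def backup_commands(files, message, remote=True):
    """Build the git commands for one change: status, stage, commit, push."""
    commands = [
        ["git", "status"],                 # shows untracked and changed files
        ["git", "add", *files],            # track and stage the files
        ["git", "commit", "-m", message],  # record the staged changes locally
    ]
    if remote:
        commands.append(["git", "push"])   # upload commits to the remote
    return commands

def run_all(commands, cwd="."):
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, cwd=cwd, check=True)

for argv in backup_commands(["parser.pl"], "Fix title matching"):
    print(" ".join(argv))
```

The file name and commit message are placeholders.  Calling run_all on the result inside a real repository would carry out the steps.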

Friday, August 30, 2013

Anatomy of a hack: How crackers ransack passwords like “qeadzcwrsfxv1331”
One of the things Gosney and other crackers have found is that passwords for a particular site are remarkably similar, despite being generated by users who have never met each other. After cracking such a large percentage of hashes from this unknown site, the next step was to analyze the plains and mimic the patterns when attempting to guess the remaining passwords. The result is a series of statistically generated brute-force attacks based on a mathematical system known as Markov chains. Hashcat makes it simple to implement this method. By looking at the list of passwords that already have been cracked, it performs probabilistically ordered, per-position brute-force attacks. Gosney thinks of it as an "intelligent brute-force" that uses statistics to drastically limit the keyspace.

Where a classic brute-force tries "aaa," "aab," "aac," and so on, a Markov attack makes highly educated guesses. It analyzes plains to determine where certain types of characters are likely to appear in a password. A Markov attack with a length of seven and a threshold of 65 tries all possible seven-character passwords with the 65 most likely characters for each position. It drops the keyspace of a classic brute-force from 95^7 to 65^7, a benefit that saves an attacker about four hours. And since passwords show surprising uniformity when it comes to the types of characters used in each position—in general, capital letters come at the beginning, lower-case letters come in the middle, and symbols and numbers come at the end—Markov attacks are able to crack almost as many passwords as a straight brute-force.
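The keyspace numbers in the quote are easy to check.  The Python sketch below also shows a simplified version of the per-position idea: rank the characters seen at each position in already cracked passwords and keep the most common ones.  This is an illustration only, not how Hashcat implements its Markov mode, and the sample passwords are made up.

```python
from collections import Counter

def keyspace(charset_size, length):
    """Number of candidates when every position uses the same charset size."""
    return charset_size ** length

def top_chars_per_position(plains, length, threshold):
    """For each position, the `threshold` most common characters in the plains."""
    tables = []
    for pos in range(length):
        counts = Counter(p[pos] for p in plains if len(p) > pos)
        tables.append([ch for ch, _ in counts.most_common(threshold)])
    return tables

full = keyspace(95, 7)     # all 95 printable ASCII characters
reduced = keyspace(65, 7)  # the 65 most likely characters per position
print(full, reduced, round(full / reduced, 1))

# Made-up sample plains, for illustration only.
print(top_chars_per_position(["Pass1", "Word2", "Pear3"], length=5, threshold=2))
```

Dropping 30 characters per position shrinks a seven-character search by a factor of about 14.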

Thursday, August 29, 2013

An overview of open source licenses

I know what you are saying, 'wait a minute, you already did a post on open source licenses in May of 2011'.  Well yes, but that was more of an intro to the concepts of open source licenses, and the virus like nature of the GPL.  Here I will do more of a review of the major licenses, as well as some good sites I've found to explain them.

During my recent WMGK parser post, I published the code to github, and that required picking a license.  This led to much more research than would strictly be necessary for something no one will ever see or use ever.  I want to now spread that knowledge to all of you.

In uncharacteristic fashion, I will give a quick summary of the licenses here, as opposed to forcing you to read through all my drunken ramblings (although we are already three paragraphs in):

  • If you provide no explicit license on your code it will default to the most restrictive one possible.  That is, no one will be able to redistribute your code in any way, unless they get your express written consent first.

  • If you want people to be able to do anything they want with your code, and not even have to attribute you, then use Unlicense, which is effectively the public domain, with a disclaimer that you can't be sued if the code burns down their house.  This is good for short code snippets that aren't worth a complex license.

  • If you want people to be able to do anything they want with your code, as long as they attribute you, then you can use the MIT or BSD licenses.  The MIT license seems to be somewhat more popular, and is probably simpler to read.  The BSD license has a 2 clause version that is basically the same as MIT, and a 3 clause version that also prohibits the use of your organization's name to endorse the derivative works.  There is also the Apache license, which prevents people from using software patents to sue; it is probably better than pure MIT for longer programs.

  • If you want people to be able to release programs based on your program, as long as they also release the source code to those derivative works, then use the GPL.  There are two versions in common use, v2 and v3.  The main update was an attempt to deal with software being released on physical devices, as well as some software patent stuff.

If you want to review software licenses in greater depth, and I know you do, here are two good sites with simple color coded overviews.

Monday, August 26, 2013

An analysis of radio song play frequency for 102.9 WMGK

Recently I had to drive a car with only a radio for music.  After just a few days I was annoyed at the variety, or lack thereof.  I decided to scrape the station's web site for recently played data and see what the long term trends were.

The local classic rock station is 102.9 WMGK.  They have a recently played page which is easily scrapable.  By contrast, the other classic rock stations I wanted to scrape for comparison either listed only the last few songs or embedded the list in Flash.

I began scraping on June 26th, so August 24th makes 60 days' worth of data.

Thanks to my recent post, we know classic rock songs average 4:48 in length.  There are 86,400 minutes in 60 days.  That would be enough time for 18,000 songs.  If we assume only 75% of the time is actually music that's 13,500 songs.  During these 60 days WMGK played 14,286 songs.

I won't speculate about what this means about the actual percentage of music on the air.  Slight changes in average song length have a big effect on numbers.
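The back-of-the-envelope numbers above can be reproduced with a few lines of Python.  This is only a sketch of the arithmetic; the function name is made up.

```python
def max_songs(days, avg_song_seconds, music_fraction=1.0):
    """How many songs of average length fit in the given number of days."""
    total_seconds = days * 24 * 60 * 60
    return total_seconds * music_fraction / avg_song_seconds

AVG_SONG = 4 * 60 + 48  # 4:48 in seconds

print(max_songs(60, AVG_SONG))        # wall-to-wall music
print(max_songs(60, AVG_SONG, 0.75))  # if only 75% of air time is music
```

Changing AVG_SONG by even a few seconds moves the result by hundreds of songs, which is why the comparison to 14,286 shouldn't be read too closely.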

So, now it is time for the interesting questions.  How many of those were unique songs?  How many songs would be needed to represent 25% or 75% coverage?

In case it isn't clear what I mean by coverage, I mean how many songs represent 10% (or whatever) of the total songs played.  For example, if a station played 1 song 10 times, and then 10 other songs 1 time each, for a total of 20 plays, that 1 top song would give 50% coverage.
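Here is a minimal Python sketch of that coverage calculation, using the toy example from the paragraph above.  The function name is made up, and the actual stats came from the Perl scripts, not from this.

```python
def songs_for_coverage(play_counts, fraction):
    """Smallest number of top songs whose plays reach `fraction` of all plays."""
    total = sum(play_counts)
    covered = 0
    for n, count in enumerate(sorted(play_counts, reverse=True), start=1):
        covered += count
        if covered >= fraction * total:
            return n
    return len(play_counts)

# One song played 10 times, then 10 other songs played once each: 20 plays.
toy = [10] + [1] * 10

print(songs_for_coverage(toy, 0.50))  # 1
print(songs_for_coverage(toy, 0.75))  # 6
```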

So, without further ado here are the stats:


924 unique plays out of 14,286 means about 6.5% of songs were being played for the first time in 60 days.  Honestly, that's not bad.  However, the 50% and 75% coverages are awful.  I have 30k unique songs in my playlist, and that's not even particularly impressive.  Admittedly, they aren't all one genre, but Led Zeppelin has 86 studio songs, all of which would be suitable for a classic rock station.

The key take away is that there are about 300 to 350 songs that get played at least every other day.  Then, they occasionally toss in a deep cut.  There is no middle ground; either a song is played every other day, or once every few months.  I made some (read: over 100) graphs that illustrate this, but for now let's stick to tables.

Want to guess what the most popular bands or songs are?

Top songs:
Plays per 30 days | Band | Song
27.5 | Warren Zevon | Werewolves Of London
27 | Cars | Just What I Needed
27 | Blue Oyster Cult | Burnin' For You
27 | Steve Miller Band | Rock 'n Me
26.5 | Supertramp | The Logical Song
26.5 | David Bowie | Changes
26.5 | Pink Floyd | Another Brick In The Wall
26.5 | Electric Light Orchestra | Do Ya
26 | J. Geils Band | Centerfold
26 | War | Low Rider

Top Bands:
Plays per 30 days | Band
356.5 | Rolling Stones
334.5 | Led Zeppelin
183 | Pink Floyd
169.5 | Van Halen
138.5 | Billy Joel
135.5 | Tom Petty And The Heartbreaker
135.5 | Steve Miller Band
135 | Electric Light Orchestra
107.5 | Bad Company
105.5 | Creedence Clearwater Revival
105 | Elton John

I would not have guessed most of those top songs.  Note how much higher the top three bands are than the rest; there is a second smaller drop off after Foreigner.  Also interesting is that none of the top three bands have a top 10 song.  I would have also guessed Beatles to be the top band.

Let's start looking at the graphs.

Here we have two graphs of bands.  The first is limited to the top 50 and has labels.  The second is all bands, and without labels, just showing the general trend.

These next two graphs really show the tendency to play the same songs.  The first shows how many songs were played various numbers of times in 60 days.  There are three clusters.  First, songs that were played once or twice.  Then there is a wasteland from 11 to 26 plays in 60 days.  After that, there is the main group of songs that are played every other day.  That tapers off, and leads to the last group of songs which are played almost every day.  Keep in mind that the number of plays compounds the number of songs in that group.  20 songs each played 35 times is a lot more plays than 200 songs played once.

The second graph that illustrates this point is this one of every single song.  It is a bit confusing as there are way too many songs to see individual bars, but you can look at the slopes to see the same three groups as before.  The top songs are the peak on the left.  Then comes the plateau of the normal roster, followed by a steep drop off to the songs played a few times.  The steep drop off illustrates the lack of a middle ground.

The last general WMGK graph is this one that shows the average daily plays in a given hour of the day.  It shows there is a drop off in the morning hours from 5am - 8am.  The night hours of 12am - 4am are the highest.  It's interesting that there is a clear rise at midnight.  I don't think WMGK has a morning show, so I'm not sure why there is the drop off.  At first I thought they increased ads during peak commuting hours, but there is no drop off in the evening.  My guess is they must provide short news and traffic conditions during those hours.

I made that graph so that I could compare individual bands to see if different bands were being played more at certain times (eg longer songs being relegated to the middle of the night).  Unfortunately, I don't have enough data to really answer this.  A typical band might have a few plays in a given hour for the entire 60 day period.  I suppose I could have increased the buckets to 6 hour windows, but was too lazy to code this.

The rest of the graphs compare WMGK plays to plays for a given band.  I'll post a few here.  All of the graphs are in this gallery on my site.

I had a hard time with the comparisons.  First, there was the fact that some of the titles had subtle differences.  This meant I had to hard code the differences in a hash.  There is also the problem of songs with multiple versions on, this will tend to change the order slightly.  Also, the api only gives play data from the last 6 months.  For most classic rock this doesn't matter, but, eg, David Bowie had a recent album, and thus his graphs are hugely skewed towards it.

Then I couldn't decide which songs to show.  I ended up making two graphs for each band.  One shows all the WMGK plays and the other shows the top 20 songs.  Each has the other's plays on them, but are sorted differently.  I think the graph is more interesting, as it shows which popular songs WMGK doesn't play.  In their defense, some songs simply aren't classic rock.  On the other hand, the WMGK sorted graph shows the sharp drop off when you go from the frequently played songs down to the deep cuts (that aren't that deep).

For example, here are the two Billy Joel graphs:

I think they both show the vast difference of opinion of what good Billy Joel songs are.


A very different Genesis:

Led Zeppelin, showing the drop off:

A not-that-surprising Pink Floyd.  Except, why does WMGK love Young Lust?

Not big Rush fans:

WMGK doesn't seem to know where the Styx CD gets good:

A crime against humanity:

Finally, I used this as an excuse to learn some git and github use.  All the code, text files, and graphs are available there. Here are some files of interest:

Text file of all plays

Text file of all songs, with play counts

Text file of all bands, with play counts

Perl script to scrape WMGK site

Perl script which provides general stats and graphs for all WMGK

Perl script which provides specific stats from one band passed to it on command line

Wednesday, August 21, 2013

Genre Average Song Lengths

I was working on a different post and had to write a script to get the average song length of a genre from I wrote a quick perl script and figured I'd get some other genres to compare. I grabbed the 250 top tags, and then grabbed the 50 top tracks for each tag.

I didn't really expect anything revolutionary.  Prog has longer songs than punk.  Some of the tags don't represent genres (eg Live), but I didn't remove them, both because they are interesting too, and I'm lazy.  Here are the results.

There's a slight positive correlation (0.23) between song length and ranking.  That is, longer genres are generally less popular.  Mean and median length across all genres are quite close, at 4:20 and 4:14 respectively.

A graph and table of the top 50 most popular:

Rank | Genre | Average Length
48 | progressive metal | 6:40
26 | progressive rock | 6:02
24 | black metal | 5:38
29 | heavy metal | 5:14
40 | thrash metal | 5:14
19 | hard rock | 4:51
9 | classic rock | 4:48
27 | death metal | 4:34
50 | hip hop | 4:31
10 | alternative rock | 4:25
8 | female vocalists | 3:58
2 | seen live | 3:41
13 | indie rock | 3:39
31 | punk rock | 3:26
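For anyone who wants to play with the numbers, here is a minimal Python sketch that parses the lengths in the table above and computes a mean length and a Pearson correlation between rank and length.  It only uses the rows shown here, not the full 250 tags, so it should not be expected to reproduce the 0.23 figure.  The helper names are made up.

```python
from math import sqrt

# (rank, average length) rows copied from the table above
rows = [(48, "6:40"), (26, "6:02"), (24, "5:38"), (29, "5:14"), (40, "5:14"),
        (19, "4:51"), (9, "4:48"), (27, "4:34"), (50, "4:31"), (10, "4:25"),
        (8, "3:58"), (2, "3:41"), (13, "3:39"), (31, "3:26")]

def to_seconds(text):
    """Convert 'm:ss' to seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)

def to_mmss(seconds):
    """Convert seconds to 'm:ss', rounding to the nearest second."""
    seconds = round(seconds)
    return f"{seconds // 60}:{seconds % 60:02d}"

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

ranks = [rank for rank, _ in rows]
lengths = [to_seconds(length) for _, length in rows]

print("mean length:", to_mmss(sum(lengths) / len(lengths)))
print("correlation:", round(pearson(ranks, lengths), 2))
```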

Sunday, August 18, 2013

A Comparison of PNG Compression Methods

You may remember my past post about which image format you should be using.  The summary was use PNG for computer generated graphics.  I briefly touched on the fact that PNG allows you to use 24 bit or 8 bit colors.  It is, however, much more complicated than that.


There are three color modes available in PNG.  The mode with the fullest color space is RGB 24 bit.  The second mode is indexed.  It allows you to use 8 bits or less for up to 256 colors.  Additionally, it allows you to choose the palette.  The last mode is grayscale.  As the name implies it's black and white, but allows a gradient of grays.  By comparison, a 1 bit indexed mode would only allow full black or full white, no grays.

Generally, switching to indexed offers a pretty big savings in file size.  However, you can often get even greater savings by going to less than 8 bit color.  Doing so will generally require generating a custom palette to take full advantage of the reduced color space.  If the file is black and white, it can sometimes be worth it to switch to grayscale or to 1 bit indexed.
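To see why fewer bits per pixel helps, here is a rough Python sketch of the size of the pixel data before compression.  It ignores the per-row filter byte, the chunk headers, and the compression step itself, so it only shows the starting point each mode hands to the compressor.  The function name and the 100 x 100 example are made up.

```python
def raw_image_bytes(width, height, bits_per_pixel, palette_entries=0):
    """Uncompressed pixel bytes, plus 3 bytes (R, G, B) per palette entry."""
    # Each row is padded up to a whole number of bytes.
    row_bytes = (width * bits_per_pixel + 7) // 8
    return row_bytes * height + 3 * palette_entries

w, h = 100, 100
print("RGB 24 bit:    ", raw_image_bytes(w, h, 24))
print("indexed 8 bit: ", raw_image_bytes(w, h, 8, palette_entries=256))
print("indexed 4 bit: ", raw_image_bytes(w, h, 4, palette_entries=16))
print("indexed 1 bit: ", raw_image_bytes(w, h, 1, palette_entries=2))
print("gray 8 bit:    ", raw_image_bytes(w, h, 8))
```

Compression can change the final ranking, since it works better on some pixel layouts than others.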

There really is no simple way to know which will be best, besides trial and error.  Doing this for one file is a bit annoying, but doing it for many files is pretty unreasonable.  As such, many PNG compressors have popped up to help automate this process.

It turns out that the compression process is not so clean cut, and thus many of the various programs perform differently on various files.  I decided to look into whether my current program was a good choice.  I found a few old comparisons, but not any from the last few years.  I decided to just download a bunch of programs and try them out.

I didn't download too many, preferring those that either had recent development, or were spoken highly of somewhere.

Programs Tested

So without further ado, here are the candidates:

TinyPNG - Has been my go to.  It's the only web based option here.  It's quite fast and has a great interface.

pngquant v1.8.3 - Some research showed that pngquant was the library being used at TinyPNG.  I figured if I could have the same compression as TinyPNG as a command line tool, that would be ideal.

pngcrush v1.7.9 - This is the standard PNG compression utility.

optipng v0.7.4 - Another standard utility, based on pngcrush.

pngout 20130221 - Based largely on the recommendation of this 6 year old Jeff Atwood post.  It is still being developed though.

Before I get into the results I should note the command line switches used.  In all cases it's none.  Some of the utilities are geared more towards automatic use than others.  There is no doubt you could achieve much better results if you were willing to experiment with the options.  However, that wasn't the use case I was testing here.  I was concerned with processing dozens or more files quickly and easily.

If you want to see the images used I uploaded them to imgur here.  Unfortunately, imgur is no dummy, they compressed the files themselves.  They also resized the largest ones and converted some to jpg when that made sense.  If you want all the files, both the original and the results, I uploaded them as a zip to mediafire.


Filesizes in KB:

Ratio to the best compression method:

In the first table, the sum for each column is just the sum of that column.  In the second table, sum is the ratio of each column's sum to the smallest sum.  In other words, a sum ratio of 100% should indicate a perfect score, but it's skewed by the large file sizes.  The geometric mean solves this for the normalized ratios.  Or does it?  Yeah, probably.
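A small Python sketch shows the difference between the two summary rows.  The tool names and file sizes below are made-up placeholders, not the measured results.

```python
import math

def ratios_to_best(sizes_by_tool):
    """For every image, divide each tool's size by the best size for that image."""
    n = len(next(iter(sizes_by_tool.values())))
    best = [min(sizes[i] for sizes in sizes_by_tool.values()) for i in range(n)]
    return {tool: [s / b for s, b in zip(sizes, best)]
            for tool, sizes in sizes_by_tool.items()}

def sum_ratio(sizes_by_tool):
    """Ratio of each tool's total size to the smallest total."""
    totals = {tool: sum(sizes) for tool, sizes in sizes_by_tool.items()}
    smallest = min(totals.values())
    return {tool: total / smallest for tool, total in totals.items()}

def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical sizes in KB for one small image and one large image.
sizes = {"tool_a": [10, 1000], "tool_b": [20, 900]}

ratios = ratios_to_best(sizes)
print(sum_ratio(sizes))
print({tool: round(geometric_mean(r), 3) for tool, r in ratios.items()})
```

Here tool_b wins on the sum because the large file dominates, while tool_a wins on the geometric mean because doubling the small file counts just as much as any other doubling.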


I think the take away here is that TinyPNG is the winner.  That being said, you'll note there were three cases where it did significantly worse than pngout.  Images b and o were both black and white, and pngout turned b into grayscale, but both left o as indexed.  What's more, e was grayscaled by pngout for worse size than the indexed TinyPNG version.  Pretty clear evidence that grayscale vs indexed isn't clear cut.

pngquant is consistently a few percent higher than TinyPNG, lending credence to the idea that TinyPNG is using pngquant, but then also doing something else to squeeze a bit of filesize out.  On the other hand, pngquant actually made a few files worse, so maybe there is some overhead on its end.

The alert reader will have noticed there is a seventh algorithm listed which wasn't defined above.  t2o is a response to the fact that either TinyPNG or pngout was the optimal algorithm.  It is the TinyPNG files run through pngout.

While it almost always does better than TinyPNG or pngout alone, you'll note the one case where it failed to return to the level pngout had achieved on the raw file.

I suppose my strategy will be to continue to use TinyPNG, and if I really need to minimize a file, run it through pngout independently, and compare the results.