Friday, December 28, 2012

Federal prisoners use snitching for personal gain
Snitching has become so commonplace that in the past five years at least 48,895 federal convicts -- one of every eight -- had their prison sentences reduced in exchange for helping government investigators, a USA TODAY examination of hundreds of thousands of court cases found. The deals can chop a decade or more off of their sentences.

How often informants pay to acquire information from brokers such as Watkins is impossible to know, in part because judges routinely seal court records that could identify them. It almost certainly represents an extreme result of a system that puts strong pressure on defendants to cooperate. Still, Watkins' case is at least the fourth such scheme to be uncovered in Atlanta alone over the past 20 years.

Those schemes are generally illegal because the people who buy information usually lie to federal agents about where they got it. They also show how staggeringly valuable good information has become -- prices ran into tens of thousands of dollars, or up to $250,000 in one case, court records show.

Friday, December 21, 2012

Great Minds

I recently read Nate Silver's newest book, The Signal and the Noise: Why Most Predictions Fail – But Some Don't, which was good if you're wondering.  Chapter 11 is about the stock market, and he points out that there is some correlation between one day's rise or fall and the next's.  He then calculates that one could make quite a bit of money investing based on this fact, but that fees would kill it.  Finally, he notes that the correlation has disappeared recently, making the whole plan worthless.

Assuming you've memorized all my posts, you should now be realizing I made a post about the same thing.  His book came out September 27, and my post was on October 13, so it would seem pretty obvious that I read the book and got the idea from there (even though I didn't).  I'm surprised no one pointed this out in the comments.  I can only assume that this is due to a lack of readership of his book.

In another case, I recently discovered this post on the expected value of a Mega Millions ticket.  He calculates the expected value of the lesser prizes, calculates the amount of the jackpot one would actually get after taxes, and then uses the Poisson Distribution to calculate the expected value of the jackpot.  He then fits a polynomial model to the past data in order to predict what various jackpots' values will be.  In other words, exactly what I did.

The key difference is he didn't spread his through 3 long posts, filled with math, and no visual breaks in the wall of text.  Also, he didn't include an unnecessary free lesson on the guts of Linear Algebra. I'll leave it to the masses to decide which approach is better.

I made my first post on the subject in April 2012, and he made his in January 2011.  So it's debatable who copied who.

Time-machine-assisted plagiarism aside, it is interesting to see someone else tackle the same problem, do it largely the same way, and produce very similar results.  One example of a difference is that he calculated an expected value from the non-jackpot prizes of $0.10, whereas I calculated $0.15.  The difference is that he applied taxes to all the prizes, whereas I exempted the $150 and below prizes from taxes.  It is interesting how much of a difference that makes.
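For anyone who hasn't memorized the earlier posts, here is a minimal sketch of the Poisson step that both write-ups rely on.  The function names and the example figures are hypothetical, and the jackpot odds assume the game format of the time (5 of 56 numbers plus 1 of 46).

```python
import math

# Assumption: Mega Millions format of the time, 5 of 56 white balls plus 1 of 46.
JACKPOT_ODDS = math.comb(56, 5) * 46  # number of distinct tickets


def expected_jackpot_share(jackpot, tickets_sold, odds=JACKPOT_ODDS):
    """Expected payout to one winning ticket when other winners are Poisson distributed.

    With N other winners ~ Poisson(lam), the winner receives jackpot / (1 + N),
    and E[1 / (1 + N)] works out to (1 - exp(-lam)) / lam.
    """
    lam = tickets_sold / odds  # expected number of other winning tickets
    if lam == 0:
        return float(jackpot)
    return jackpot * (1.0 - math.exp(-lam)) / lam


def jackpot_value_per_ticket(jackpot, tickets_sold, odds=JACKPOT_ODDS):
    """Contribution of the jackpot to the expected value of a single ticket."""
    return expected_jackpot_share(jackpot, tickets_sold, odds) / odds


if __name__ == "__main__":
    # Hypothetical after-tax jackpot and ticket sales, for illustration only.
    print(jackpot_value_per_ticket(100e6, 200e6))
```

The more tickets are sold, the more likely the jackpot is split, which is why a bigger jackpot does not raise the value of a ticket proportionally.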

Friday, November 30, 2012

North Korea Says Its Archaeologists Discovered A 'Unicorn Lair'
"The lair is located 200 meters from the Yongmyong Temple in Moran Hill in Pyongyang City. A rectangular rock carved with words "Unicorn Lair" stands in front of the lair. The carved words are believed to date back to the period of Koryo Kingdom (918-1392).

"Jo Hui Sung, director of the Institute, told KCNA:

"'Korea's history books deal with the unicorn, considered to be ridden by King Tongmyong, and its lair."

Thursday, November 15, 2012

Why Donald Rumsfeld Can’t Be Sued for Torture
The facts are a case study in system failure. Donald Vance and Nathan Ertel were Americans working for a private security firm in Iraq. When Vance became suspicious that his employer was selling weapons to groups hostile to the United States, he went to the FBI. Vance and Ertel were then fingered as arms dealers. Military personnel arrested them in 2006 and held them for several weeks.

According to the complaint, Vance and Ertel were held in solitary confinement and subjected to violence, sleep deprivation, extremes of temperature and sound, denial of food, water, and medical care, and other abuses. Though the Army Field Manual (and four judges) calls this torture, the majority opinion prefers the euphemism “harsh interrogation techniques.”
Vance was a 29-year-old Navy veteran from Chicago when he went to Iraq as a security contractor. Vance became an unpaid informant for the F.B.I., passing them evidence that seemed to suggest that the Iraqi security firm at which he worked might be engaged in illegal weapons trading, particularly to officials from the Iraqi Interior Ministry.

However, when American soldiers raided the firm, he was treated as a suspect. Another American who worked for the company, but had resigned over the alleged weapons trading, was also detained. Vance was held for three months at Camp Cropper, America’s maximum security prison site in Baghdad.

Tuesday, November 13, 2012

The Growth Of Monopoly Power
Percentage of Sales for Four Largest Firms in Selected U.S. Retail Industries:
Industry (NAICS code)                  1992    1997    2002    2007
Food & beverage stores (445)           15.4    18.3    28.2    27.7
Health & personal care stores (446)    24.7    39.1    45.7    54.4
General merchandise stores (452)       47.3    55.9    65.6    73.2
Supermarkets (44511)                   18.0    20.8    32.5    32.0
Book stores (451211)                   41.3    54.1    65.6    71.0
Computer & software stores (443120)    26.2    34.9    52.5    73.1

Get ready for the last site you'll need for the rest of your life.

I love every one of these.  I want to post each one of these individually.

Saturday, November 10, 2012

Odds of jumping into a star

In an attempt to be more like Dwight Schrute I just finished watching the new Battlestar Galactica.  Overall, it was a good show.  I did, however, have a problem with it.  Its particular answer to faster than light (FTL) travel was a jump drive.  The practical effect was that coordinates could be entered and then the ship would be more or less instantly transported to those coordinates, without covering the area between.  The coordinates were relative to the current location, so they had to be calculated for each jump at the time.  At several points during the series, in order to escape an enemy, an unexpected jump was needed immediately.  Each time, a jump to anywhere was ordered, and the response was that it could land them inside a star.  The jump was performed anyway and they beat the odds.

The problem is space is almost entirely empty space (hence the name).  I knew the odds of actually jumping into something would be remote, and decided to calculate them.

Before I start, a quick note: I seem to recall a similar objection in Star Wars.  However, in Star Wars FTL travel was via hyperdrive, which did seem to cover the area between the start and end point.  This would greatly increase the odds of actually hitting something.  I'm assuming the jumps in BSG don't cover the area in between, which I feel is justified by the name, dialogue, and jumps that were made surrounded by matter.

Volume of the Milky Way:
The series is purposely coy about whether it involves our Earth, but without spoilers it's safe to say it takes place in a galaxy very similar to the Milky Way.  Wikipedia tells me that the Milky Way is a disk 100,000 light years across, and about 1000 light years thick on average.
`V = \pi \cdot r^2 \cdot h`
`V = \pi \cdot (50,000" ly") ^2 \cdot 1000" ly" = 7,854,000,000,000" ly"^3`

Volume of stars:
Estimates for stars in the Milky Way are 300-400 billion.  I'll round that up to 500 billion.  I'll also assume stars are Sun like.  While there are a lot of stars larger and smaller than the Sun, it's an alright average.  The Sun's radius is about 695,700 km, or `7.35 \times 10^-8" ly"`.
`V = 4/3 \cdot \pi \cdot r^3 \cdot n`
`V = 4/3 \cdot \pi \cdot (7.35 \times 10^-8" ly")^3 \cdot 500 \times 10^9 = 8.32 \times 10^-10" ly"^3`

As you can see, the stars occupy a very tiny volume of the Milky Way.  The odds of a random point being inside a star are about 1 in 9,440,000,000,000,000,000,000.

Larger stars:
Some of you may be protesting that there is other matter in the galaxy besides stars.  However, this stuff is as rare relative to stars as stars are relative to the galaxy as a whole.  Still, it could be argued that jumping in close to a star would be a problem.  So let's increase the size of the average star we are using to the size of the orbit of Mercury.  Jumping in at the distance of Mercury from the Sun shouldn't pose a threat for a ship in BSG.  This also more than allows for all the other non-star matter.

`V = 4/3 \cdot \pi \cdot (7.38 \times 10^-6" ly")^3 \cdot 500 \times 10^9 = 8.42 \times 10^-4" ly"^3`

Still a pretty small chance at about 1 in 9,330,000,000,000,000.

Entire star systems:
Let's go ahead and say that the entire star system is off limits to a jump.  Pluto orbits at a max of 50 AU.  The Voyager probes are just shy of 100 AU and are currently at the heliopause, considered the edge of the solar system.  Using 100 AU radius spheres gives:
`V = 4/3 \cdot \pi \cdot (1.59 \times 10^-3" ly")^3 \cdot 500 \times 10^9 = 8348" ly"^3`

Which gives odds of about 1 in 941,000,000.

So there is only about a 1 in a billion chance of jumping into a star system, which itself is almost entirely empty.
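The whole calculation fits in a few lines of Python, which makes it easy to try other exclusion radii.  This is only a sketch: the galaxy dimensions and star count are the ones quoted above, the Sun's radius in light years (about 7.35e-8, from roughly 695,700 km) is supplied here as an assumption, and the function names are made up.

```python
import math

GALAXY_RADIUS_LY = 50_000    # disk 100,000 light years across
GALAXY_THICKNESS_LY = 1_000  # average thickness
STAR_COUNT = 500e9           # rounded up from the 300-400 billion estimate


def galaxy_volume():
    """Volume of the galactic disk, treated as a cylinder, in cubic light years."""
    return math.pi * GALAXY_RADIUS_LY ** 2 * GALAXY_THICKNESS_LY


def blocked_volume(radius_ly, count=STAR_COUNT):
    """Total volume of `count` spheres of the given radius, in cubic light years."""
    return 4.0 / 3.0 * math.pi * radius_ly ** 3 * count


def odds_against(radius_ly):
    """The N in '1 in N' for a random point landing inside one of the spheres."""
    return galaxy_volume() / blocked_volume(radius_ly)


if __name__ == "__main__":
    cases = {
        "inside a Sun-sized star": 7.35e-8,  # assumed solar radius in ly
        "inside Mercury's orbit": 7.38e-6,
        "inside 100 AU": 1.59e-3,
    }
    for label, radius in cases.items():
        print(f"{label}: about 1 in {odds_against(radius):.3g}")
```

Since both volumes carry a factor of pi, the odds depend only on the ratio of the cylinder's dimensions to the cube of the exclusion radius, so doubling the radius cuts the odds by a factor of eight.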

Tuesday, October 16, 2012

Alpha Centauri has a planet

In this case, the planet is low mass but very close in. The Doppler shift in the starlight amounts to a mere half meter per second – slower than walking speed! When I read that I was stunned; that low of a signal is incredibly hard to detect. Heck, the star’s rotation is three times that big. But looking at the paper, it’s pretty convincing. They did a fantastic job teasing that out of the noise. 

The graph displayed shows the effect of the planet on the star. RV means "radial velocity", the speed toward and away from us as the star gets tugged by the planet. The x-axis is time, measured in units of the period of the planet (in other words, where it reads as 1 that means 3.24 days). The dots look like they’re just scattered around, but when you average them together – say, taking all the dots in a one hour time period – you get the red dots shown (the vertical lines are the error bars). The signal then pops right out, and you can see the tell-tale sine wave of a planet pulling its star.

This is now my only impression of British politics

Thursday, October 11, 2012

Get Rich Quick

You may remember my work with the #1 best selling NES game of all time, Wall Street Kid.  In it, I found that the best strategy was to simply buy whichever stock did best on any given day.  As I watched actual stocks go up and down every day, I had an idea.  It seems that the market tends to go in the same direction for a whole week, either up or down.  If you bought an S&P 500 index fund and then sold it on any day when the market closed down, rebuying it on any day the market closed up, you would tend to beat the market.

I feel obliged to point out that this sort of thinking is a cardinal sin of probability theory, i.e., thinking that the outcome of one independent event affects the outcome of the next.  For example, thinking that because the roulette wheel has come up red 5 times in a row it is due to be black (or that it'll continue being red, for that matter) is wrong.  The probability is the same on every spin (50/50, ignoring the green pockets) no matter what the previous outcomes were.

That being said, the stock market is not an independent random event.  All it would take for this to be a valid method is for there to be some positive correlation between how the market does today and how it did yesterday.  Surely, that is not unreasonable.
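That correlation can be measured rather than assumed.  Here is a minimal sketch, assuming the prices are available as a plain list of daily closes; the function names and the sample prices are made up.

```python
import math


def daily_returns(closes):
    """Fractional change from each close to the next."""
    return [cur / prev - 1.0 for prev, cur in zip(closes, closes[1:])]


def lag1_correlation(returns):
    """Pearson correlation between each day's return and the previous day's."""
    today = returns[1:]
    yesterday = returns[:-1]
    n = len(today)
    mean_t = sum(today) / n
    mean_y = sum(yesterday) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(today, yesterday))
    var_t = sum((t - mean_t) ** 2 for t in today)
    var_y = sum((y - mean_y) ** 2 for y in yesterday)
    return cov / math.sqrt(var_t * var_y)


if __name__ == "__main__":
    closes = [100.0, 101.0, 102.5, 102.0, 101.0, 101.5, 103.0]  # made-up prices
    print(lag1_correlation(daily_returns(closes)))
```

A value near zero would mean yesterday says nothing about today; anything reliably above zero is what the strategy needs.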

I should also point out that there are trading fees which would create an overhead that would definitely negate any gain.  But spherical cows care not for these things.

I've often needed historical stock market prices, and Yahoo Finance is the easiest place to get them.  I thought I was going to have to write a quick program, but I did all the work in LibreOffice.

In case the process is not clear, here is the strategy.  Buy $1 worth of S&P 500 in 1950.  If the S&P 500 closes down for the day sell at opening the next day.  If the S&P 500 closes up for the day, rebuy if you had previously sold (and haven't rebought yet).

In practice, this has the effect of skipping any day that follows a negative day.  The theory is that negative days will tend to follow other negative days more often than positive days.
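For those who would rather not use a spreadsheet, the rule can be sketched in Python.  This is a simplification rather than the exact spreadsheet: it works on close-to-close changes only (so selling "at opening the next day" is approximated by simply sitting out that day), it ignores fees and dividends, and the function name and sample prices are made up.

```python
def backtest(closes):
    """Return (buy_and_hold, strategy) values of $1 over a list of daily closes.

    The strategy sits out any day that follows a down day and is back in
    the market after any up day. A flat day leaves the position unchanged.
    """
    hold = 1.0
    strategy = 1.0
    invested = True  # start fully invested
    for prev, cur in zip(closes, closes[1:]):
        change = cur / prev
        hold *= change
        if invested:
            strategy *= change
        # Decide tomorrow's position from today's close.
        if cur < prev:
            invested = False
        elif cur > prev:
            invested = True
    return hold, strategy


if __name__ == "__main__":
    closes = [100.0, 110.0, 99.0, 89.1, 98.01]  # made-up prices
    hold, strategy = backtest(closes)
    print(f"buy and hold: ${hold:.4f}, strategy: ${strategy:.4f}")
```

In the made-up series the strategy takes the first loss, skips the second down day, and also misses the rebound that follows it, which is the trade-off in miniature.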

I didn't really expect this to work.  While the premise sounds plausible, the idea of an efficient market is that if some common piece of information can be employed to beat the market, this info will be exploited and the market will adjust such that it is no longer profitable.  For example, if we had a harsh winter, and you expect this to hurt orange farms, which will in turn raise prices of frozen concentrated orange juice futures, you might think you could buy them now safe in the knowledge that they will go up in price.  But so does everybody else.  The net result is that the market adjusts the price to whatever level the best available knowledge would predict that it will be at in the future.  Now, if there were some sort of report that suddenly revealed the crop conditions and you got your hands on that, well, that would change things.

Anyway, on to the graphs:
In this graph we see that this method actually dramatically outperforms the market.  After 62 years our $1 in the S&P 500 is worth about $90 (7.5% annualized), certainly good.  However, had we been using the proposed method we'd have about $1500 (12.5% annualized), and would have had $5500 a decade ago at the market's peak (18.8% annualized).
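The annualized figures are just compound growth rates.  A minimal sketch follows; the function name is made up, and the 50-year span used for the peak is an assumption (a peak around 2000) that reproduces the 18.8% figure.

```python
def annualized_return(start_value, end_value, years):
    """Constant yearly growth rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    print(f"S&P 500:         {annualized_return(1, 90, 62):.1%}")
    print(f"Proposed method: {annualized_return(1, 1500, 62):.1%}")
    # Assumption: the peak is taken as roughly 50 years after 1950.
    print(f"Method at peak:  {annualized_return(1, 5500, 50):.1%}")
```

A five point difference in the yearly rate is what turns $90 into $1500 over six decades.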

Here we see the ratio of the proposed method / normal S&P 500, graphed on the right axis (which is identical to the left here).   I put the S&P on the left axis for comparison.

This is largely the same as the previous graph but now the proposed method is graphed on the left axis instead of the S&P 500.  Note the axes are different here.

This is perhaps the most insightful graph, but also the most confusing.  It is very important to note that the proposed method is graphed on the right axis from 0 to 6000, while the S&P 500 is graphed on the left from 0 to 100.  This means that around the year 1998 where they seem to both be about equal in the middle of the graph, the S&P 500 is worth only $50, while the proposed method is worth $3000.

What this shows is that while the alternate method worked very well, it did not gain on the two recent spikes, but still lost on the down slopes of them.  This can be explained, somewhat obviously in retrospect, by the fact that the market has been very volatile in the last decade, and that eliminated the correlation. 

When I first did this I started in 1990.  The result was about a 400% gain for the S&P 500 and a mild loss for the alternate method.  This seemed so surprising to me that I decided to look further back.  If you look back at the graphs with the ratio plotted in green, any time that was increasing this was a good strategy, and any time it was decreasing this was a poor strategy.  So, from 1950 to about 1970 it did well, in the 70s it did extremely well, from 1980 to 2000 it was neutral, and in the 2000s it did very poorly.  There is a glimmer of hope, in that it has been neutral for the last few years.

Feel free to use this strategy and just send me 10% of your net profits.

Tuesday, October 9, 2012

The CIA Burglar Who Went Rogue
The CIA station in Katmandu arranged for an official ceremony to be held more than an hour away from the capital and for all foreign diplomats to be invited. The agency knew the East Germans could not refuse to attend. That would leave Groat’s team about three hours to work. Posing as tourists, they arrived in Katmandu two days before the mission and slipped into a safe house. On the appointed day, they left the safe house wearing disguises crafted by a CIA specialist—whole-face latex masks that transformed them into Nepalese, with darker skin and jet-black hair. At the embassy, Groat popped the front door open with a small pry bar. Inside, the intruders peeled off their stifling masks and with a bolt-cutter removed a padlock barring the way to the embassy's security area. Once in the code room, Groat and two teammates strained to lift the safe from the floorboards and wrestled it down the stairs and out to a waiting van.

They drove the safe to the American Embassy, where it was opened—and found to contain no code machine. Based on faulty intelligence, the CIA had sent its break-in team on a Himalayan goose chase.

First CityWide Change Bank

I just cited this video in an economics paper I'm writing:

Protip to fellow students:  Referencing 25 year old parody commercials is how you wow your professors.

Friday, September 14, 2012

Confirmation Bias and the iPhone 5
You might expect this from people who don’t have much knowledge of iPhones; they don’t have a clear basis for comparison, so whatever features seem neat, they assume are new. But even people holding their own iPhone 4 up for direct comparison perceive the “iPhone 5” Kimmel hands them to be superior, noting a range of details -- it’s lighter, faster, just clearly better. They think a new version of a gadget must be way more awesome than the previous version, and Apple has an aura of coolness that leads people to expect their new products should be extra amazing. Since people expect a new iPhone to be awesome, they notice, or invent, features that confirm that it is, indeed, awesome.

Wednesday, September 12, 2012

How Apple and Amazon Security Flaws Led to My Epic Hacking
After coming across my [Twitter] account, the hackers did some background research. My Twitter account linked to my personal website, where they found my Gmail address. Guessing that this was also the e-mail address I used for Twitter, Phobia went to Google’s account recovery page. He didn’t even have to actually attempt a recovery. This was just a recon mission.

Because I didn’t have Google’s two-factor authentication turned on, when Phobia entered my Gmail address, he could view the alternate e-mail I had set up for account recovery. Google partially obscures that information, starring out many characters, but there were enough characters available, m•••• Jackpot.

Since he already had the e-mail, all he needed was my billing address and the last four digits of my credit card number to have Apple’s tech support issue him the keys to my account.
So how did he get this vital information? He began with the easy one. He got the billing address by doing a whois search on my personal web domain. If someone doesn’t have a domain, you can also look up his or her information on Spokeo, WhitePages, and PeopleSmart.

Getting a credit card number is tricker, but it also relies on taking advantage of a company’s back-end systems. … First you call Amazon and tell them you are the account holder, and want to add a credit card number to the account. All you need is the name on the account, an associated e-mail address, and the billing address. Amazon then allows you to input a new credit card. (Wired used a bogus credit card number from a website that generates fake card numbers that conform with the industry’s published self-check algorithm.) Then you hang up.

Next you call back, and tell Amazon that you’ve lost access to your account. Upon providing a name, billing address, and the new credit card number you gave the company on the prior call, Amazon will allow you to add a new e-mail address to the account. From here, you go to the Amazon website, and send a password reset to the new e-mail account. This allows you to see all the credit cards on file for the account — not the complete numbers, just the last four digits. But, as we know, Apple only needs those last four digits.

Sunday, September 2, 2012

Student Gulps Into Medical Literature
Mazur tells the story: "As tradition dictates, we made our own ice cream, using liquid nitrogen as a refrigerant and aerator. We spilled a little of the nitrogen onto a table and watched tiny little drops of it dance around."

Someone asked, "Why does it do that?" Mazur explained that the nitrogen evaporated when it came in contact with the table, which provided a cushion of air for the drop to sit on, and thermally insulated it to minimize further evaporation -- enabling it to do its little dance without scarring the table, boiling away or being "smeared" out. "It's this principle," he said, "that makes it possible for someone to dip his wet hand into molten lead or to put liquid nitrogen in his mouth without injury."

Mazur had worked with the chemical in a cryogenics lab several years before and believed in the principle. To prove it to the doubting ice cream socializers, he poured some into a glass and into his mouth -- fully expecting to impress the crowd by blowing smoke rings. But then he swallowed the liquid nitrogen. "Within two seconds I had collapsed on the floor, unable to breathe or feel anything other than intense pain."

Monday, August 27, 2012

Running Greasemonkey functions from page events

As you know, I wrote a simple Greasemonkey script to hide multiple results from one domain in Google searches.  This script worked very well, and was easy to set up, but I had originally wanted to have the ability to click to show the hidden results.  Doing this in javascript is pretty simple, but Greasemonkey runs all scripts in a sandbox which cannot be accessed from the page.  This is somehow related to security, which I won't pretend to understand.

I was pretty content with how the script was, but someone on the script's page requested the click to show feature, and I figured I would give it a try.

Greasemonkey has something called unsafeWindow for stuff that must break out of the sandbox, and I got click to show working pretty fast using it.  The problem is that everyone has a dogmatic animosity towards using it.  So, I decided to try to figure out one of the alternatives.

The most common suggestion was to use the location hack, which just puts the function in a bookmarklet and links to it.  If you don't know, you can run javascript on a page by just pasting it into your url bar.  Prior to Greasemonkey, I used a lot of these via bookmarks called bookmarklets.  Anyway, this is all relatively simple, but there is no actual example on the wiki.  I eventually just had to try something out that I thought would work rather than being spoon fed someone else's example.  Since the internet is no place for having to figure things out yourself I'll post the actual code I used.

Since the bookmarklet can be quite long I stored it in a variable first.  Since javascript ignores whitespace like a sensible language I broke it up over multiple lines.  The actual code is just wrapped in a self invoking function for reasons I still don't really understand.

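// Build the bookmarklet as one string; prevurl is a variable from the surrounding script.
showcode = "javascript:(function() {"+
"var results = document.getElementsByClassName('"+prevurl+"');"+
// The loop body was lost from this listing after "i"; un-hiding each result is an assumption.
"for (var i=0; i<results.length; i++) { results[i].style.display = ''; }"+
"document.getElementById('show"+prevurl+"').style.display = 'none';"+
"; } )()";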

Pasting that into something with syntax highlighting will show that prevurl is a variable, and outside the double quotes.  Everything else is inside.  This function is then called by just making the variable the destination of an onclick event.  If you want to see more, the source is on the script's page.

With this, my script is, at the risk of tooting my own horn, quite good.  It is probably the only thing I've ever made that doesn't look like it was made by a 5 year old.

Wednesday, August 22, 2012

How Hollywood Is Encouraging Online Piracy
And if you don't make your product available legally, guess what? The people will get it illegally. Traffic to illegal download sites has more than sextupled since 2009, and file downloading is expected to grow about 23 percent annually until 2015. Why? Of the 10 most pirated movies of 2011, guess how many of them are available to rent online, as I write this in midsummer 2012? Zero. That's right: Hollywood is actually encouraging the very practice they claim to be fighting (with new laws, for example).

Monday, August 20, 2012

Hiker Leaves Dog on Mountain, Charged With Animal Cruelty
When Ortolani and his friend finally made it safely off the mountain and called 911, however, they discovered that Mt. Bierstadt rangers are not allowed to send rescues for dogs.

Missy wasn't seen again until two hikers discovered her six days later. They, too, were unable to rescue her, but they posted a picture of the dog on a popular hiking website, which eventually prompted an eight-person volunteer crew to make a successful rescue -- despite a snowstorm.

The incident has sparked an outpouring of opinions. Ortolani goes to trial in October.
This is certainly interesting.  I suppose Captain Hindsight says you probably shouldn't bring a dog or person on a hike they can't physically do themselves. 

The Checkpoint: Terror, Power, and Cruelty
Threats. All day and all night he generates threats: he threatens people in order to extinguish their will. He does not know what their wills actually consist in. He can feel, by now, that part of them wishes he were dead. “If you didn’t have [your weapon], and if your fellow soldiers weren’t beside you, they would jump on you,” one soldier testifies, “beat the shit out of you, and stab you to death.” He imagines them charging his checkpoint by the hundreds: carpenters, doctors, teachers, farmers, mothers, uncles, children, grandparents, and lovers. How could so many people, people who look him in the eye every day, want him dead? How do the Palestinians see him? He does not recognize his own gaze reflected back at him from the windows of the cars he inspects or confiscates. He is no longer his own person.

Now that guilt is impossible, the soldier realizes that part of him is dying. The soldier starts to think that he is the real victim in all this, especially since no one understands that he is bound to fail, that his power makes him helpless. No one knows that since arriving here he has not made a single choice. Sure, the Palestinians are helpless too, but it is easy to see that they are victims, he tells himself. He, the soldier, is a powerful nobody—that is his tragedy.

Anger accumulates. Palestinians come up to him, one after the other, all day long, begging him to let them pass. Telling him they need to get to their schools, universities, hospitals, jobs; they need food; they want to see their children, their parents; they need to get to their funerals and weddings, to give birth. But how the hell should he know? Why do they think he has any clue as to whether they can pass through his checkpoint? He cannot tell the difference between them; they all act the same, the way terrified people act.

Sunday, August 19, 2012

Configuring Linux or: How I Learned to Stop Worrying and Love the Command Line

The 7 year old HDD in my laptop died.  This gave me an excuse to switch from Windows XP to Linux.  Since there was a lot of configuration to set things up I figured I'd document my trials and tribulations here for future reference.  Note, this was written over about two weeks as I set things up.  It sometimes reflects this, but I tried to edit it to make sense overall.

Years ago, I used Slackware with Fluxbox.  So, I installed Slackware 13.37; the install was exactly the same as it had been 10 years ago.  The network didn't work, and as I didn't really have another computer for troubleshooting I really didn't feel like dealing with Slackware.  I had been wanting to give Ubuntu a try, so I had also downloaded the newest Ubuntu, and I installed that.

Ubuntu used Gnome as its default desktop environment, although it seems to have switched to Unity, which I think is still Gnome based.  I've never liked Gnome, and now both it and KDE have gone out of their way to be unusable.  Luckily, there is a version of Ubuntu with the lightweight Xfce called Xubuntu.  This didn't matter much as I planned on using the even more lightweight Fluxbox again.

It helps if you are familiar with the general structure of files in Linux.  Here is a good overview:

Installing Programs
When I used Slackware I used to install everything by compiling source; however, Ubuntu is well known for its friendly package manager and I made extensive use of it.  It's nice because if you enter the name you think a program will have on the terminal it will tell you if it's installed.  If it's not, it will suggest similar names you may have meant.  If there is a program called that, but it's just not installed, it will tell you the command to run to install it: sudo apt-get install program-name, which can then be selected and middle clicked to install right away.

Extra Programs
I had to install a lot of random utilities to do everything here.  I won't bother mentioning them all; if you try something and it complains that something isn't installed, then just install it with apt-get.  Rather, I will list the programs that I actually use directly that I had to find.

Geany is a text editor that I found as a replacement for Notepad++.  It's ironic because when I first moved from Linux back to Windows one of my complaints was that I couldn't live without syntax highlighting provided by whatever text editor I had been using.  I found Notepad++ as a suitable replacement on Windows and grew to love it.  Now, moving back to Linux I don't know how I'll live without all the features of Notepad++.  I'm liking Geany more as I use it.  I'd say it may even be better than Notepad++.

For BitTorrent, I went with qbittorrent. By default I had Transmission installed, which seemed good for people that didn't like BitTorrent (I promise this opinion isn't biased by the fact that I associate Transmission with Macs).  Anyway, qbittorrent is an open source clone of uTorrent, which is good because ever since uTorrent was sold it has been becoming increasingly annoying.

I found gmusicbrowser as a replacement for Foobar2000.  I don't have many requirements in a music player, just a simple interface that can handle a lot of files.  This wasn't hard to find; however, I couldn't find anything that simply had a random song feature.  I should point out they all had shuffle, which randomizes the order of songs and plays them back.  What I wanted was a button that when pressed would play a random song, but then playback would proceed in the normal order.  I like to have that command mapped to one of my mouse buttons.

The closest I could find was a random album button, and to get this I had to manually edit the skin file to add the button to the system tray popup.  As far as I can tell there is no command for it, so I have to use the button in the system tray.

Note this is no longer true.  I'm using mpd, and wrote a section about it below.

Shell Scripts
Linux shell scripts are just a way to run commands that would normally be entered on the terminal via a text file.  This is similar to batch files in Windows.  The difference is that since everything in Linux is built on the command line from the ground up, they're much more powerful, and sensible to figure out.  Just put commands in a text file and then run it with the command sh file.  For more complex stuff I'll just use Perl.  The output of a script can be redirected to a text file with sh file > text.txt.  I used several shell scripts where I needed to combine several commands into one.

Fluxbox Menu
Fluxbox is pretty simple.  There are a handful of config files that control just about everything.  These files are in a folder called ~/.fluxbox.  One is called menu and it contains the Fluxbox menu that comes up when you right click the desktop (similar to the Windows start menu).  This menu is pretty easy to edit, here are some good guides.  There is one important note though.  It doesn't update unless you update it yourself.  There's a program called MenuMaker that will scan the system for programs and build a menu.  You can run this with mmaker fluxbox -f, but if you want to have a custom menu you'll need to combine its output with your custom menu.  I made a simple shell script to do this:
echo "[begin] (Dale)"
mmaker fluxbox -c -i
cat custommenu.txt
echo "[end]"

This will just dump the menu to the screen.  Redirect it to the menu file with:  sh createmenu > .fluxbox/menu.  Just put the custom part of your menu in the text file, and you're set.  This is the custom menu I used, which includes easy access to several of the commonly edited Fluxbox config files:

[submenu] (Mine)
[exec] (Thunar File Manager) {Thunar}
[exec] (FireFox) {firefox}
[exec] (gmpc) {gmpc}
[exec] (Pidgin Internet Messenger) {pidgin}
[exec] (qBittorrent) {qbittorrent}
[exec] (Calculator) {gcalctool}
[exec] (Geany) {geany}
[exec] (Leafpad) {leafpad}
[submenu] (FluxBox)
[workspaces] (Workspaces)
[submenu] (Styles)
[stylesdir] (/usr/share/fluxbox/styles)
[stylesdir] (~/.fluxbox/styles)
[submenu] (Config Files)
[exec] (startup) {geany .fluxbox/startup}
[exec] (menu) {geany .fluxbox/menu}
[exec] (keys) {geany .fluxbox/keys}
[exec] (layout) {geany .fluxbox/init}
[config] (Configure)
[reconfig] (Reconfig)
[restart] (Restart)
[exit] (Exit)

Music Player Daemon
I eventually got tired of not having a random song button.  I asked on stackexchange about a music player that had a random song command.  The answers seemed to indicate that I was not explaining the concept of what I wanted to do very well.  Regardless, the common suggestion was to use mpd, which is a background command line music player, designed to be controlled by some other frontend.  Note, mpd still doesn't have an actual play random song command, but I made a shell script to emulate that functionality.

To begin, I had trouble getting mpd to work at all.  First it wouldn't connect to my gvfs music shares, and then it didn't produce audio.  I ended up following this guide after which, despite being the same stuff I was doing, it worked.  I'm sorry I can't give more details on what I actually did to fix my problems for anyone following this (ie future me).

After that, I had to figure out what client to use.  mpc is a command line client that just issues commands.  I also had Gnome Music Player Client installed, and it was suggested elsewhere, so I went with that.  I must say, I must listen to music in a very different way from every other person on Earth, as I always find default music player's 'libraries' to be very unnecessarily complicated.  I just want all my music in one giant list which will play back in order.  Anyway, gmpc worked ok once I figured out how to set it up.

Another problem I had was that mpd purposely only supports one directory.  If you have music in more than one directory and wish to add it all you are expected to set up symlinks in the one directory to the others.  This is all well and good, but as I mentioned before all my music is on Windows Samba shares mounted via gvfs.  Despite having write access in Linux to them, I couldn't set up symlinks in one to the other.  After some experimenting to find the actual problem I found I could create symlinks to them, just not on them.  So the solution was to just set up symlinks in my home directory to the various shares.  This worked well, as it simplified the 20 levels deep nature of the shares at the same time.

All this and mpd still doesn't have an actual random command.  It did have a random mode though.  I got the functionality I wanted with this shell script:
mpc random
mpc next
mpc random

This turns on random mode, skips to the next (now random) song, and then turns off random mode.  Since mpd is very lightweight this is fast enough for me.

The Mouse and Media Buttons
I have a Logitech Performance Mouse MX, which I like very much.  Unfortunately, Logitech doesn't support Linux at all.  This meant I had to configure the pointer speed and all the extra buttons myself.  For the speed (which I like to be very fast), I found these two commands worked well; the 9 is the mouse's device id on my system, as listed by xinput list:
xinput set-prop 9 "Device Accel Velocity Scaling" 30
xinput set-prop 9 "Device Accel Profile" 2

As for the mouse buttons, I had to first find what the buttons were called.  There's a simple program called 'xev' which displays the names of buttons that are pressed.  After I found the names I added these lines to the Fluxbox config file keys:
Mouse9 :Exec sh random
Mouse7 :Exec sh tabright
Mouse6 :Exec sh tableft
Mouse13 :Exec xte 'key F5'
Mouse8 :Exec xte 'keydown Control_L' 'key W' 'keyup Control_L'

Note the shell scripts.  I did this because I use the mouse wheel left and right to switch tabs (Ctrl+page down/up).  Since the sideways motion is not very good it would send a lot of key presses all at once and was unusable.  I fixed this by adding some delays via this shell script:
xte 'keydown Control_L'
sleepenh 0.1
xte 'key Page_Up'
sleepenh 0.1
xte 'keyup Control_L'

I had to install sleepenh to get sleep times < 1 second.

Next, to get the media buttons (play, next, mute, etc) on my laptop to work I added these lines to the keys file:
None XF86AudioLowerVolume :Exec amixer -q set Master 5%- unmute
None XF86AudioRaiseVolume :Exec amixer -q set Master 5%+ unmute
None XF86AudioMute :Exec sh togglemute

171 :Exec mpc next
172 :Exec mpc toggle
173 :Exec mpc prev
174 :Exec mpc stop

A note about mute.  There appears to be a bug in amixer where the mute toggle mutes three channels, but only unmutes one.  This means you have to go in and manually unmute in the gui.  I wasted a good hour trying to figure out what was going on with that.  I had to write a shell script to mute and unmute all the channels.  Here it is:
# mute everything if Master is currently on, otherwise unmute everything
if amixer sget Master | grep -q "\[on\]"
then sh muteall
else sh unmuteall
fi

and muteall is just:
amixer set Master mute
amixer set "Master Mono" mute
amixer set PCM mute

Startup File
One of those Fluxbox config files is the startup file.  As the name implies it runs any commands in it when Fluxbox starts.  It's important to note the command exec fluxbox is what runs Fluxbox itself.  Any commands after that won't run until Fluxbox exits.  Also, any command that will continue to run needs to have an ampersand placed after it.  Here are the lines I added to my startup file:
xinput set-prop 9 "Device Accel Velocity Scaling" 30
xinput set-prop 9 "Device Accel Profile" 2
xrandr -s 1440x900 -r 60
fbsetbg -c -r ~/.fluxbox/backgrounds/
sh createmenu > .fluxbox/menu
xfce4-power-manager &
gvfs-mount smb://Server-PC/Media &
python ~/gmail-notify/ &

I already explained some.  xrandr sets the resolution.  fbsetbg sets the wallpaper to a random one.  xfce4-power-manager gives popup alerts when the laptop switches to battery power.  gvfs-mount mounts Windows Samba shares.  And, gmail notify is a Linux equivalent to the Windows gmail notifier by Google.

I made a bunch of 1440x900 wallpapers from various Hubble shots, and posted them here:

Fluxbox Theme
Fluxbox themes are simple text files.  I downloaded a few, but really didn't like most.  I finally settled on the included 'squared blue' but modified it somewhat.  I'll include my custom theme in the zip of all my custom files at the end of this post.

Cronjobs are the Linux way of running a script or program automatically at set times; however, they are much more powerful than the Windows equivalent.  The 'crontab' stores the various cronjobs set to run.  It isn't stored as a simple file on the computer (while the system is running).  To edit it enter the command crontab -e (-l will list the current one).  This will allow you to edit it in a terminal based text editor, a concept I don't think I will ever embrace.

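The format is somewhat interesting.  It is in the form of: Mi H D Mo DOW command.  Or: minute (0-59), hour (0-23, 0 = midnight), day (1-31), month (1-12), weekday (0-6, 0 = Sunday).  For example: 38 6 4 8 * command would run at 6:38 am on Aug 4th.  Be careful about filling in both the day and the weekday: cron runs the command when either one matches, so 38 6 4 8 0 command would run at 6:38 am on Aug 4th and also on every Sunday in August.  Much greater flexibility comes from wildcards (*), multiple entries (,) and steps (/):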
12 4 * * * command - Every day at 4:12 am
12 4,8 * * * command - Every day at 4:12 am and 8:12 am
*/10 * * * * command - Every minute that is evenly divisible by 10 (ie every 10 minutes).
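
To make the field syntax a bit more concrete, here is a rough Python sketch of how the five time fields could be checked against a given time.  This is only an illustration, not how cron itself is written: the names field_matches and time_matches are made up for this example, and it only handles numbers, wildcards, commas, ranges and steps (no names like mon or jan).

```python
from datetime import datetime

def field_matches(field, value, low):
    """Return True if one crontab field (e.g. '*/10' or '4,8') matches value.

    low is the smallest legal value for the field (0 for minutes, 1 for days),
    which is where a '*' starts counting from.
    """
    for part in field.split(","):
        base, _, step = part.partition("/")
        step = int(step) if step else 1
        if base == "*":
            start, end = low, None            # open-ended: any value >= low
        elif "-" in base:
            start, end = (int(x) for x in base.split("-"))
        else:
            start = end = int(base)
        if value < start or (end is not None and value > end):
            continue
        if (value - start) % step == 0:
            return True
    return False

def time_matches(spec, when):
    """Check the five time fields of a crontab line against a datetime."""
    minute, hour, day, month, weekday = spec.split()
    cron_weekday = (when.weekday() + 1) % 7   # Python: Monday=0; cron: Sunday=0
    if not (field_matches(minute, when.minute, 0)
            and field_matches(hour, when.hour, 0)
            and field_matches(month, when.month, 1)):
        return False
    day_ok = field_matches(day, when.day, 1)
    weekday_ok = field_matches(weekday, cron_weekday, 0)
    # When both day fields are restricted, cron runs if either one matches.
    if day != "*" and weekday != "*":
        return day_ok or weekday_ok
    return day_ok and weekday_ok
```

For example, time_matches("*/10 * * * *", datetime(2012, 8, 18, 6, 30)) is true, while the same call with minute 35 is not.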

Many more examples and a more thorough explanation are in this guide.

I wrote a simple backup script that currently consists of just a copy command, but I will likely expand it.  I set it up to run every hour with 38 * * * * nice -n19 ionice -c3 sh ~/backup.  Note the ionice.  In Linux, nice is a program which sets priority.  The idea being that higher priority processes will get CPU time before lower priority ones will.  In theory, this means that if a process is set very low it won't have any impact on the overall speed no matter how much CPU it eats up.

For things being run as cronjobs it is likely that you just want them to run with whatever idle resources there are, and to not get in your way.  You can do this by making the command in your crontab that calls the script have a lower priority, which means a higher nice level (note -20 is highest priority, 19 is lowest, 0 is default, so kind of backwards).

This is all well and good, but a backup script probably won't eat up too much CPU.  What it will eat up is HDD I/O.  That is where ionice comes into play.  As you've probably already guessed ionice is a similar concept but just for I/O priority.  In this case, class 3 (idle) is the lowest priority, and what I set the above backup script to.

Thunar Custom Actions
Thunar is the default file manager for xfce.  It has something called custom actions which give you the ability to run commands on files from a right click menu.  This has proven to be a very useful feature, which I've used to replace (and expand upon) the features of file menu tools in Windows.

They are stored in an xml file at ~/.config/Thunar/uca.xml.  I've included that file in the zip of my custom text files at the end of this post.

Custom Text Files
As promised I've compiled a bunch of the text files I edited and uploaded them here:

Saturday, August 18, 2012

JavaScript For Cats

I'm pretty much posting this for the picture at the end.

Thursday, August 16, 2012

Hiding multiple results from the same domain in google search

A few months ago Google changed their search result behavior.  Before, it would display one or two results from a single domain, and then give a link for more.  After the change, it simply displayed all of them.  This was made much worse by setting your results per page above the default 10.  This was ridiculously annoying, to the point where I'm inclined to believe the conspiracy theories I read that Google did it to discourage people from setting the results higher than default and thus seeing fewer ads.  I simply can't believe that Google thinks this is a legitimate improvement.

At this point, I should point out that all this really only applied if you turned off that crazy automatic search thing, and set your results per page to something high.  I had pretty much assumed that everyone on Earth did this, but the fact that there wasn't a solution online after months tends to suggest that they don't.

I asked about this on stackexchange, but was surprised to get no solution, although the fact that the question got a few votes suggested that other people were annoyed by this as well.

So, I decided to learn some more javascript by making a greasemonkey script to fix the results.   I was almost deterred by the lack of formatting in the Google search results page, but a quick online formatter made it workable.  Then, I had to figure out more of that javascript dot notation that I hate.  All in all though, it went well.  I wrote a basic script and uploaded it to userscripts.

As of now, it just looks at each result and if it's from the same domain as the previous result it hides it.  This means you will get fewer results displayed on a page than whatever you have it set to.  I wrote a second version that counts how many duplicate domains there are in a row and hides them if there are more than a set limit, but I kind of like the simplicity of the current version.  I will try to use it for a while and see what improvements I could make, as well as if anyone actually uses it.
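
The hiding logic is simple enough to sketch outside the browser.  The Python below is only an illustration of the idea, not the actual userscript: the name visible_results and the limit parameter are invented here, and it works on a plain list of URLs rather than on Google's result markup.  With limit=1 it behaves like the current version, and a higher limit is roughly what the second version does.

```python
from urllib.parse import urlparse

def visible_results(urls, limit=1):
    """Keep at most `limit` results in a row from the same domain."""
    kept = []
    previous_domain = None
    run_length = 0
    for url in urls:
        domain = urlparse(url).netloc
        if domain == previous_domain:
            run_length += 1          # another result from the same domain
        else:
            previous_domain = domain
            run_length = 1           # a new domain starts a new run
        if run_length <= limit:
            kept.append(url)
    return kept
```

Note that only results in a row are hidden; if the same domain shows up again further down the page after some other site, it is displayed again.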

How to (unsuccessfully) reflow the solder on a motherboard

The HDD in my 7 year old laptop died recently.  Todd was nice enough to give me 2 old laptops he had lying around, one of which was much nicer than the one I had been using.  The catch was that neither of them worked.  I managed to salvage the HDD from the one, and put it in my laptop, which is now going strong for another 7 years.

The other laptop was an HP Pavilion dv6000.  When I turned it on the lights came on, but nothing else, no beeps or screen output, then it began to cycle endlessly.  It turns out this is a common problem with that model, due to faulty solder joints.  There had been much success online with people fixing this by heating the motherboard up hot enough to melt the solder and let it reflow.  As I had my other laptop working, and nothing to lose I decided to try it.

My first step was to disassemble the laptop gradually over a few days without following any guide or taking any pictures.  While taking it apart I kept noting how there was no way I would remember how to reassemble this thing, but I never let that distract me.  Once I had the motherboard removed, I had to remove everything that could possibly be removed.  Of particular importance were the motherboard battery and paper stickers.

Eventually, I had the bare board.  I decided to use my toaster oven as my regular oven is gas, and there seemed to be concern about airflow moving tiny components around while the solder was liquid.  I also read it was a good idea to cover most of the board in tin foil and only expose the parts you want to reflow.  I didn't do this, because there was no tin foil in the first place I looked.  One helpful tip I found online was to put a piece of solder on one of the exposed screw-in pads, and place that where you can watch it.  This serves as a useful temperature monitor.  I put a piece on each forward corner.

I placed it in the toaster oven and set it to toast at 400F.  It only took about 5 minutes for the one piece of solder to melt; however, the other piece remained solid for a minute longer.  I let it sit molten for about 5 minutes, and then shut it off.  Right after I shut it off I heard a lot of crackling sounds.  I suspect this might have been due to the board cooling very fast in the toaster oven, as opposed to a real oven that would have cooled much more gradually.

I let it sit for 30 minutes and then opened the door and let it sit for about 10 more.  I removed it and nothing seemed like it had burst into flames or exploded.  When I lifted the board up I noticed some solder on the pan, in a splatter like pattern.  I also noticed a tiny component that had fallen off.  I tried to match up where the part was and what part of the motherboard had been above it.  I spent about 5 minutes trying to find a spot it could have come from, but to no avail, not that there was any hope I was going to resolder this flea sized thing back on even if I did know where it came from.

Still, I figured it was possible that it would still work without it.  After all, the tinier something is the less important it is, right?

I reassembled the laptop only as far as I figured it would need to be to get any sort of screen output.  I put in the CPU and RAM and connected the power and power button connector.  I plugged it in and turned it on... and... it did the exact same thing as before.  I was somewhat surprised, with that missing part, but then again I guess if the solder joints were already bad then a missing part was pretty much the same thing.  I guess I should have flashed the bios instead.

I didn't really care, as I had already gotten my laptop working and spent days researching Linux and setting everything up.  But here are some protips for anyone else that may be trying this and actually wants to succeed:
  • I think I should have used the regular oven instead of the toaster oven.  Even with the gas, I don't think it would be a problem, and I think the fast cooling of the toaster oven was worse.  Also, the fact that the two test pieces of solder melted at different times told me that it wasn't a very even heating.

  • I probably should have wrapped the board in tin foil.  Perhaps this would have helped keep that other random part from falling off.

  • I definitely should have opened windows from the start.

  • There was red glue of some sort around the solders on the chipset.  Someone mentioned that you should carefully remove this.  That was probably good advice because it was charred black.

Wednesday, August 15, 2012

Mountain Dew let the internet name its new soda
Top-voted names for the soda’s new green apple flavor were Hitler Did Nothing Wrong, Diabeetus, Gushing Granny and Fapple.

Thursday, August 9, 2012

With All The Suffering in the World, Why Invest in Science?
Before trying to describe in more detail how our space program is contributing to the solution of our earthly problems, I would like to relate briefly a supposedly true story, which may help support the argument. About 400 years ago, there lived a count in a small town in Germany. He was one of the benign counts, and he gave a large part of his income to the poor in his town. This was much appreciated, because poverty was abundant during medieval times, and there were epidemics of the plague which ravaged the country frequently. One day, the count met a strange man. He had a workbench and little laboratory in his house, and he labored hard during the daytime so that he could afford a few hours every evening to work in his laboratory. He ground small lenses from pieces of glass; he mounted the lenses in tubes, and he used these gadgets to look at very small objects. The count was particularly fascinated by the tiny creatures that could be observed with the strong magnification, and which he had never seen before. He invited the man to move with his laboratory to the castle, to become a member of the count’s household, and to devote henceforth all his time to the development and perfection of his optical gadgets as a special employee of the count.

The townspeople, however, became angry when they realized that the count was wasting his money, as they thought, on a stunt without purpose. “We are suffering from this plague” they said, “while he is paying that man for a useless hobby!” But the count remained firm. “I give you as much as I can afford,” he said, “but I will also support this man and his work, because I know that someday something will come out of it!”

Indeed, something very good came out of this work, and also out of similar work done by others at other places: the microscope. It is well known that the microscope has contributed more than any other invention to the progress of medicine, and that the elimination of the plague and many other contagious diseases from most parts of the world is largely a result of studies which the microscope made possible.

The count, by retaining some of his spending money for research and discovery, contributed far more to the relief of human suffering than he could have contributed by giving all he could possibly spare to his plague-ridden community.

Tuesday, August 7, 2012

Astronomy Wallpapers

I was looking for a new wallpaper.  I figured I'd pick something from the APOD, and downloaded a bunch that looked good.  Well I decided to just crop them all, and set up random wallpapers.

I uploaded them in case there was any interest.  There are about 100 all 1440 x 900 (16:10), which I realize is a somewhat uncommon resolution.  They are mostly galaxies and nebulae, but there is some stuff from the solar system too.


Download: - 30 MB

Random samples:

Monday, August 6, 2012

An Unexpected Ass Kicking

Javascript Color Picker

I made a quick javascript color picker.  There are about a bajillion of these already, but that has never stopped me before.

Friday, August 3, 2012

Two Ridiculous Stories
A farmer in the US state of Vermont who was facing a minor drugs charge is now in more serious trouble after driving a tractor over seven police cars.
Los Angeles police and rescue crews surrounded the former Hannah Montana star's house in Hollywood on Wednesday, while helicopters circled the property. But they soon discovered there was no one in the building. Police say the incident may have been part of a trend of prank calls, which has been dubbed "swatting".

Saturday, July 28, 2012

How to compile C++ code in Notepad++ with gcc/g++

I use Notepad++ for all my coding.  It really is a great text editor.  These days I mostly use Perl so I don't have to worry about anything else, but I occasionally use C++ for speed in numerical tasks.  The system I've been using for years is to edit in Notepad++ then hop over to Dev-C++ to compile.  Dev C++ is an IDE that uses g++ to compile, so it's pretty silly to use it as a middle man.  Still, it has worked for years, and continues to.  The problem is that it hasn't been updated in a long time, and it still uses g++ 3.4.2 which is close to 10 years old.

I knew Notepad++ could run programs and that I could just compile directly via command line from it; I had just been too lazy to set it up.  I decided to make it work, and in researching how I found a lot of old info.  I decided to explain the process I used here for future reference.

The first step is to acquire NppExec, which used to be bundled with Notepad++ but isn't any more.  It comes as a zip of one dll and two folders; just put them all in the plugins folder of Notepad++.

Now you need gcc/g++.  MinGW is the standard way of doing this on Windows.  I went with this version because it came with a lot of useful libraries like boost.   Just extract that to some path with no spaces (spaces may work, but are generally a bad idea), and run the batch file.  You may need to add that path to the windows path system variable; I already had it in there from a past install of MinGW.  At this point test the install by running: g++ --version in a command prompt.  I have something like 5 versions of g++ floating around my computer, and it kept running an old version.  I finally just deleted that one.

Now open Notepad++ and hit F6 to bring up the execute window.  Here you can enter this code:
NPP_SAVE
cd "$(CURRENT_DIRECTORY)"
g++ "$(FILE_NAME)" -o $(NAME_PART) -march=native -O3
NPP_RUN $(NAME_PART)

I'll explain what each line does, so you can edit it as you see fit.  The first saves the current document.  Next changes the directory to the one the file is in.  Before this I kept compiling and couldn't find the executable.  It was in the Notepad++ folder.  Skipping the third for now, the last line runs the executable.

The third line does the actual compiling.  g++.exe "$(FILE_NAME)" should be clear, and -o $(NAME_PART) is the output filename.  -O3 uses the highest optimization level in g++.  There is some debate online as to whether the 3rd level is more trouble than it's worth.  A lot of people suggested that the 3rd was too aggressive, which could cause compile errors and is even sometimes slower than the 2nd level.  This wasn't unanimous though, and there weren't any references to actual examples.  Frankly, I felt like this was probably an example of cargo cult programming where it may have been true 15 years ago but people have just been repeating the same thing without ever stopping to reevaluate if it is still true.  In my exhaustive test of two programs it produced a program that was the same as level 2 as far as I could tell.  -march=native detects which instruction sets (eg SSE) are supported by the system doing the compiling and enables any that are.  This is a good option when the code will run on the same or similar system to the one compiling.

Anyway, once you have this code save it as something like 'Compile C++'.  You could create a few versions here, like one that doesn't run the code after compile, or that uses different flags.  I made one for interpreted languages that don't need compiling that is just:

At this point, you are almost done.  Hitting F6 brings up that execute box where you can select which script to run.  Hitting Ctrl+F6 runs the last-run script.  However, I found hitting Ctrl+F6 to be annoying.  So I decided to remap it to just F5 (leaving F6 to bring up the box).  Under settings go to shortcut mapper.  Under main menu scroll down and find the command currently mapped to F5 (the simpler run box), and change it to something else or nothing.  Then under plugin commands find direct execute last, mapped to Ctrl+F6 and change it to F5.

That's it.  The first time you hit F5 in a document it brings up the box to let you pick which script to run.  Then every time after that it remembers and just runs it.

Thursday, July 26, 2012

Risk Management

This post will be about risk management.  Well not really, I just wanted to post that clip.  Anyway, a few days ago I posted about a site that does some good data analyses.  One of them was on the game Risk.  This post will be a hodgepodge of random items related to Risk.

The Game
Risk is one of the staples of my gaming diet.  The computer version of Risk, that I've been playing for at least 10 years, is called TurboRisk.  Risk is notorious for taking a long time to play.  But that program, while maintaining almost identical rules, reduces games to 5 minutes.  The reason is simply automation.  I'd estimate a large battle (100 vs 100 armies) would take at least 15 minutes (possibly closer to an hour) in real life, but in TurboRisk would happen in about a second.

The speed increase is not the only benefit of TurboRisk though.  It allows anyone to write AI programs that you can play against.  There are about a dozen AI scripts included and they all have some interesting varieties of strategy.

The Program
My strategy is pretty formulaic, and I've long wanted to program an AI script based on it.  Unfortunately, the scripts must be written in Pascal.  I had never used Pascal, and figured it couldn't be that bad.  However, Pascal is an old language and is closer to BASIC than C.  While I've defended BASIC a lot in my life, and still think it's a valid first language, after using C like syntax for so long the thought of using 'begin' and 'end' to delimit code blocks wasn't appealing.  Still, I gave it a shot, but the combination of the whole new API, having to code in the custom IDE instead of Notepad++, and the length needed even for a simple AI caused me to give up after a few hours.

Basic Rules
Before I get on to the fun* stuff I feel the need to explain the basic rules of Risk.  Feel free to skip over them if you are already a Risk master, however no other reason for skipping is acceptable.

Risk is a turn based strategy game.  There are about 40 territories that players can control.  You can attack any territory bordering one of yours (a handful of black lines make some territories border across an ocean).  You can attack as many times as you want on one turn.  To attack you must have at least 2 armies in the attacking territory.  The mechanics of attacking are a bit odd.  The attacker can use from 1 to 3 dice, the defender only 1 or 2.  Both players must have at least 1 army per die being used.  So, if I'm attacking a territory that has 1 army with 2 armies of my own, I can only use 1 or 2 dice and the defender can only use 1.  It's best to use as many dice as you can.

Both players roll, and then each player's dice are arranged from highest to lowest and paired up.  Each pair is compared and whoever has the higher number wins (defender wins ties).  The loser loses 1 army for each loss.  Example: Attacker rolls 5, 3, 1; defender rolls 4, 3.  5 beats 4 (defender loses 1 army), but 3 ties 3 (attacker loses one army).  If the defender had only rolled 1 die only one army in total could have been lost, but the attacker would have had 3 chances to roll higher than him.
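
Since the pairing rule is easier to follow in code than in words, here is a small Python sketch of a single roll.  The function name resolve_roll is just something I picked for the example; it takes the dice each side rolled and returns how many armies each side loses.

```python
def resolve_roll(attacker_dice, defender_dice):
    """Return (attacker_losses, defender_losses) for one roll of the dice."""
    attacker_sorted = sorted(attacker_dice, reverse=True)
    defender_sorted = sorted(defender_dice, reverse=True)
    attacker_losses = 0
    defender_losses = 0
    # zip stops at the shorter list, so any unpaired dice are ignored.
    for a, d in zip(attacker_sorted, defender_sorted):
        if a > d:
            defender_losses += 1
        else:                        # ties go to the defender
            attacker_losses += 1
    return attacker_losses, defender_losses
```

Running it on the example above, resolve_roll([5, 3, 1], [4, 3]) gives (1, 1): each side loses one army.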

This odd rolling method means that the attacker has a clear advantage (being able to roll more dice), but ties going to the defender is also a clear advantage.  The net result is that for 12 vs 12 and below the defender has the advantage; above 12, the attacker has the advantage.

Moving on, if you defeat all the defending armies you win the territory, and may move any number of armies from the attacking territory into it (you must maintain at least 1 army in all your territories at all times).  The attacker cannot lose his territory in an attack (but if reduced to 1 army it could be an easy target come someone else's turn).  After attacking however many territories you want on one turn, you get one free move.  You can move armies from one of your territories to another bordering it (the same as if you had just conquered it).  Then your turn ends.

Going back to the start of your turn, you are given new armies to place on any of your territories.  The amount you get is the sum of three sources.  First is the number of territories you control (at the start of that turn) divided by 3 (integer only).  The second is a bonus for controlling an entire continent.  The bonus is hardcoded, but ranges from 2-7, and depends on the size.  The last source of armies is for trading in cards.  You get one card at the end of any turn where you captured at least one territory.  There are three kinds of cards; you need either one of each, or 3 of one kind to trade in.  The number of armies you get starts at 4 and grows to 15 for the 6th set, and adding 5 for each set after that.  An important note, in the real game this count is for all card sets traded in by anyone.  In TurboRisk each player has his own count.
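
The army count at the start of a turn can be sketched in a few lines of Python.  Some assumptions here: the names card_set_value and new_armies are made up, the continent bonuses are passed in because I only gave the 2-7 range above, and the in-between card values (6, 8, 10, 12) are the standard board game schedule, which TurboRisk may or may not follow exactly.  The board game's minimum of 3 armies per turn is left out.

```python
def card_set_value(set_number):
    """Armies for the nth card set traded in (1 = first set)."""
    # Standard board game schedule (an assumption for TurboRisk):
    # the text above only pins down 4 for the first set and 15 for the 6th.
    schedule = [4, 6, 8, 10, 12, 15]
    if set_number <= len(schedule):
        return schedule[set_number - 1]
    return 15 + 5 * (set_number - len(schedule))   # +5 for each set after the 6th

def new_armies(territories, continent_bonuses=(), sets_traded_before=0, trading_in=False):
    """Armies received at the start of a turn from the three sources."""
    armies = territories // 3                      # integer division only
    armies += sum(continent_bonuses)               # one bonus per continent held
    if trading_in:
        armies += card_set_value(sets_traded_before + 1)
    return armies
```

So holding 11 territories including one continent with a bonus of 2, and trading in a first set of cards, new_armies(11, (2,), 0, True) gives 3 + 2 + 4 = 9 armies.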

To summarize: each turn consists of 3 parts.  Part 1, getting armies based on territories, continents and cards.  Part 2, attacking.  Part 3 moving.

My Strategy
My strategy is a somewhat common one.  Capture a continent and hold it while getting the bonus armies, and slowly build up forces.  Many of the bots do this with Australia as it only has one choke point to control.  I, however, prefer South America.  It has 2 choke points, and the same bonus as Australia, but it has much better options for growth.  From Australia you can only go into Asia, which is wide open and hotly contested.  From SA you can expand up through NA and only have 3 choke points total.  Then you can further expand to take both Europe and Africa while still only having 3 choke points (6, 7 in map).  If you can control that for a few turns you shouldn't have any problem taking the rest of the map.

A word about choke points.  Take 2 and 5, between SA and Africa, for example.  If you are holding SA you will have to build up forces in that choke point.  However, rather than putting them on your continent (2) put them in Africa (5).  This not only gives you 1 more territory, but also denies the Africa continent bonus to anyone else.  The same can be done for all the choke points.

My Analysis
I pointed out this great analysis above.  I wanted to write a program to check these odds, and it was a good excuse to do something in C++.  I spent a while working on speed improvements, with mild success.  My results match his pretty closely, except for one key fact.  As I mentioned above, you must leave 1 army behind when you conquer a new territory.  You must also have at least 2 armies before you can attack at all.  This, in effect, means if you are attacking with 15 armies, only 14 are useful.  As far as all the math is concerned this is the way to go.  This is what he did, but he never added that army back into his analysis.  This was intentional on his part: "All my calculations are based on the number of characters doing the actual attacking, not the number of armies on the starting square."

Still, I feel like it gives the wrong impression about the odds.  He says that 5 vs 5 is the cross over point, when it is actually 13 vs 13 when you count all armies.  Again, it's really just a matter of semantics, but I think it's important to remember which version you are dealing with.

Anyway, I felt like this would be a good thing to code up in javascript, so I largely replicated my C++ code in javascript.  This gave me the excuse to compile some common javascript functions I've used several times into a common shared file.

After finishing it I noted it was fast enough (50 vs 50 and 10,000 rounds takes about 2 seconds for me), but I still would have liked it to be faster.  A search for risk odds turns up several much better versions.  I decided to blame it on the default javascript PRNG.  Several hours of hunting and benchmarking later, and I can say the default is the fastest.  The first site seems to do the calculations on the server side, which seems odd, but does result in about 2 orders of magnitude of speed increase.  The second site is all javascript though.  Looking at the script shows the key difference.  It has hardcoded the odds for each of the six possible dice combinations.  This is not only faster and more accurate, but is exactly how the guy in the post that started this all did it.
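The hardcoded odds mentioned above don't have to be typed in by hand; they can be enumerated exactly. Here is a minimal Python sketch of that idea (not the code from either site, and not my C++ or javascript). It assumes the standard Risk rules: the attacker rolls 1 to 3 dice, the defender rolls 1 or 2, dice are compared highest against highest, and the defender wins ties. The names `roll_odds` and `ODDS` are made up for illustration.

```python
from fractions import Fraction
from itertools import product


def roll_odds(attack_dice, defend_dice):
    """Exact odds for one roll: {(attacker_losses, defender_losses): probability}."""
    counts = {}
    total = 0
    # Walk through every possible combination of die faces.
    for roll in product(range(1, 7), repeat=attack_dice + defend_dice):
        attack = sorted(roll[:attack_dice], reverse=True)
        defend = sorted(roll[attack_dice:], reverse=True)
        a_lost = d_lost = 0
        # Highest vs highest, then second vs second; the defender wins ties.
        for a, d in zip(attack, defend):
            if a > d:
                d_lost += 1
            else:
                a_lost += 1
        counts[(a_lost, d_lost)] = counts.get((a_lost, d_lost), 0) + 1
        total += 1
    return {outcome: Fraction(n, total) for outcome, n in counts.items()}


# The six dice combinations: attacker rolls 1-3 dice, defender rolls 1-2.
ODDS = {(a, d): roll_odds(a, d) for a in (1, 2, 3) for d in (1, 2)}

for (a, d), table in sorted(ODDS.items()):
    print(f"{a} v {d}:", {k: str(v) for k, v in sorted(table.items())})
```

Once the table exists, a simulator only needs one random number per roll instead of up to five dice plus sorting, which is presumably where the speed difference comes from.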

Making a program to calculate this all with recursive probability seems fun*, so maybe I'll do that at some point.  History says that I won't though.

* May not actually be fun.
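For the curious, the recursive version might look something like the Python sketch below. It is only an illustration under the same assumed dice rules (defender wins ties, up to 3 attacking dice and 2 defending dice), and it counts armies doing the actual attacking, not armies on the starting square. `win_chance` and `roll_odds` are hypothetical names.

```python
from functools import lru_cache
from itertools import product


def roll_odds(attack_dice, defend_dice):
    """Odds for one roll: {(attacker_losses, defender_losses): probability}."""
    rolls = list(product(range(1, 7), repeat=attack_dice + defend_dice))
    compared = min(attack_dice, defend_dice)
    counts = {}
    for roll in rolls:
        attack = sorted(roll[:attack_dice], reverse=True)
        defend = sorted(roll[attack_dice:], reverse=True)
        d_lost = sum(a > d for a, d in zip(attack, defend))  # defender wins ties
        outcome = (compared - d_lost, d_lost)
        counts[outcome] = counts.get(outcome, 0) + 1
    return {outcome: n / len(rolls) for outcome, n in counts.items()}


ODDS = {(a, d): roll_odds(a, d) for a in (1, 2, 3) for d in (1, 2)}


@lru_cache(maxsize=None)
def win_chance(attackers, defenders):
    """Chance the attacker wipes out the defender, fighting to the end.

    `attackers` is the number of armies actually attacking (the one army
    that must stay behind is not counted).
    """
    if defenders == 0:
        return 1.0
    if attackers == 0:
        return 0.0
    table = ODDS[(min(3, attackers), min(2, defenders))]
    # Weight each possible roll outcome by the chance of winning from there.
    return sum(
        p * win_chance(attackers - a_lost, defenders - d_lost)
        for (a_lost, d_lost), p in table.items()
    )


for n in (1, 4, 5, 10):
    print(n, "vs", n, round(win_chance(n, n), 4))
```

To count all armies on the starting square instead, call `win_chance(total - 1, defenders)`.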

Tuesday, July 24, 2012

Why programmers work at night
But even programmers should be sleeping at night. We are not some race of super humans. Even programmers feel more alert during the day.

Why then do we perform our most mentally complex work when the brain wants to sleep and we do simpler tasks when our brain is at its sharpest and brightest?

Because being tired makes us better coders.

Similar to the ballmer peak, being tired can make us focus better simply because when your brain is tired it has to focus! There isn’t enough left-over brainpower to afford losing concentration.

Monday, July 23, 2012

Finding interesting things in data

This is the best site I've found in a while.

Wednesday, July 18, 2012

Cryptographic Hash Generators

So, I made a cryptographic hash generator:

Unnecessary Background:
There are plenty of hash generators on the internet, like this one that I stole all the code from.  My problem with this stuff is I never like the UI.  First, they force you to hit a button, instead of just making it generate as you type.  Second, despite the fact that he has all the code for several hashes, the page only displays either sha-1 or md5, and forces you to pick one at a time.

Extra Unnecessary Background:
If you don't know, there are a bunch of extensions that take a strong general password + simple per site password and use a hash to generate a very strong password that is actually inputted into the site.  As with above, while I like the concept I don't like the implementations.

How I would like this to work is this:  Take the general pass + domain name + per site password (any of which could be blank or not used), and generate the sha-512 hash of it.  Then have the user tell it what characters are allowed and how long it may be.  Then convert to a string that meets those constraints while maximizing entropy.

This would mean you could create a nice long pass phrase using all characters as the general password, and enter it once per session, or even store it locally.  Then for extra secure sites you could enter an additional password that would only add to security.  Using the domain name would mean that your passwords would be different for every site even without the extra per site password.  This would also mean you could enter your passwords with no length or character restrictions and convert them to conform with whatever horrible rules various sites had.

The extension I linked to above is pretty good, but only allows for up to 14 chars, and you can't pick only lower or upper case (only both or none).  While 14 chars is very strong for the type of result you're going to get from a hash, there are problems.  What if a site doesn't allow the full array of special characters the extension uses, and only allows lower case?  You'd be stuck with only digits, which would dramatically reduce entropy.  Instead, why not allow any length you want?  If someone is using something like this why not let them use the full allowed password size?  Also, let the user pick exactly what characters are allowed; some sites have some really bizarre password restrictions.
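A minimal Python sketch of the scheme described above, assuming plain concatenation of the three inputs and a simple base conversion of the SHA-512 digest into the allowed character set. This is an illustration only, not the code behind the page or the extension; `derive_password` and its parameters are made-up names.

```python
import hashlib
import math
import string

HASH_BITS = 512  # size of a SHA-512 digest


def derive_password(general, domain, site_pass, allowed, length):
    """Turn (general pass + domain + per site pass) into a site password.

    Any of the three inputs may be an empty string.
    """
    alphabet = sorted(set(allowed))
    if len(alphabet) < 2:
        raise ValueError("need at least two allowed characters")
    # The digest only holds 512 bits, so longer outputs add no entropy.
    max_length = int(HASH_BITS / math.log2(len(alphabet)))
    if not 1 <= length <= max_length:
        raise ValueError(f"length must be between 1 and {max_length}")
    digest = hashlib.sha512((general + domain + site_pass).encode("utf-8")).digest()
    number = int.from_bytes(digest, "big")
    chars = []
    # Peel off one "digit" at a time in base len(alphabet).
    for _ in range(length):
        number, index = divmod(number, len(alphabet))
        chars.append(alphabet[index])
    return "".join(chars)


# A site that only allows 12 lower case letters:
print(derive_password("my long pass phrase", "example.com", "", string.ascii_lowercase, 12))
# A site that allows letters and digits, up to 30 characters:
print(derive_password("my long pass phrase", "example.org", "extra",
                      string.ascii_letters + string.digits, 30))
```

Plain concatenation means that "ab" + "c" and "a" + "bc" hash to the same thing; a separator between the fields would avoid that, at the cost of departing slightly from the description above.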

Anyway, since any extension based password generator needs a web based version for when you are away from your home computer, I decided to make the web version, while knowing full well that I will never make the extension.  I then decided to make it more general, and thus it's now just a generic hash generator.

Sunday, July 15, 2012

Orion Nebula: The Hubble View

There have been a lot of really good Astronomy Pictures of the Day, but I don't think I've ever posted one before.  This one is really good though.

Saturday, July 14, 2012

America’s economy: Points of light

A decade ago Air Tractor sold almost all its crop-dusting and fire-fighting aircraft in the United States, leaving it vulnerable both to America’s business cycle and its weather. Now, helped by federal financing, it has increased foreign sales to about half its total. Employment has more than doubled, to 270. From its home in Olney, Texas (population 3,285), Air Tractor this year will sell 40 aircraft, a fifth of its annual total, to Brazil, which needs bigger crop-dusters to expand grain sales worldwide. “If we can do it from a town that has three stop-lights and one Dairy Queen, it can be done by anyone,” says David Ickert, the chief financial officer.

This article does a good job of pointing out the benefit of globalization and a strengthening Chinese economy.  As the Chinese become richer, they go from an unlimited source of cheap labor to a new huge consumer pool.

Thursday, July 12, 2012

zxcvbn: realistic password strength estimation

I've written a password strength estimator that attempts to deal with basic dictionary attacks, but it only attempts a very small dictionary. I've found this checker that works very well, and provides a lot of info about the weaknesses in your password.

Sunday, July 1, 2012

Attempting to predict Mega Millions sales

In the last post, I overviewed the linear least squares method.  Here, we will be using that method to attempt to fit a model to the historical data of Mega Millions jackpots and ticket sales.

As I pointed out before, in order to know if buying a Mega Millions ticket has a positive expected return one needs to know the tickets sold.  That data is only available after the fact.  However, one could attempt to predict the ticket sales based on historical data.  We will do that now.

To begin, I've plotted ticket sales vs jackpot values.  It seems exponential; however, should our independent variable be in the base (`x^2`) or exponent (`e^x`)?  Also, are there any other factors we can find data for that might affect sales?  Our process allows for any number of independent variables (well any number less than the 250 or so data points I'll be using), so we might as well put anything we can think of in there.  The least squares process will minimize factors if they don't have any correlation.

A quick word about the data.  I got the data from the same site I've been using.  In Feb 2010 states were allowed to sell both Mega Millions and Powerball tickets (cross-selling).  Before that there were 12 states selling Mega Millions, and after it rose to 35.  Because of that dramatic change in sales, I am only using data from then until now.  Also, a handful of states have been added since then.  That same site also has sales figures for every state, which allowed me to correct for this.  Although I could have done the exact corrections, I just modified each drawing by the average percentage the states added.  I did this because I was too lazy to do it the more thorough way, the percentage of sales was pretty consistent, and the total change ranges from 0% - 3.8% at most.  There are 249 drawings, from Feb 2, 2010 to June 19, 2012 inclusive.

Identifying Variables
The obvious factor is jackpot amount.  We'll represent it as `x_1`.  Another factor that I thought of is simply time.  By this I mean, as time progresses the game becomes more well known and more people play, leading to increased sales.  I'm not too sure how significant this factor would be, but I'll throw it in as `x_2`.  For now we'll go with these two.  There are a few more I can think of to throw in depending on how these two work out.

The next thing to consider is how important these variables are.  More important variables can be used multiple times with different degrees.  Since `x_1` is likely to be very important, we can use it as both `x_1` and `x_1^2`, using a different coefficient for both.  If we felt two factors were interacting with each other we could use something like `(x_1 * x_2)`.

Model 1
I'll begin with a simple model and see how it goes:
`y = beta_0 + beta_1 x_2 + beta_2 x_1 + beta_3 x_1^2`
Where y is ticket sales, `x_1` is jackpot, and `x_2` is drawing number.  Jackpot and ticket sales will be in millions.  Drawing number will be an arbitrary count I made up, assigning 1 to the first drawing I have data for, and working up to 249 for the latest.  Hopefully it's clear that since we are going to build the model here it doesn't matter that we are using values in millions or an arbitrary counter, as long as we're consistent.

The betas (`beta`) are the variable coefficients we are trying to find with least squares.  Our goal here is to find actual numerical values for them, and then have a formula where we can plug in jackpot and count values and get out a predicted ticket sales.

If you want actual details of what in the world we're doing here, see my previous post.  We use a spreadsheet to find the values of `x_1` and `x_2` and that gives us a matrix 4 wide by 249 long.  We copy and paste that from the spreadsheet to Maple (which does a nice job of understanding it as a matrix).  We get the value of the vector `bb{b}` from the values for `y`, which are the ticket sales.  Pasting these two into Maple and running the worksheet gives our coefficients:
`beta_0 = 21.46800262,  beta_1 = 0.000282810868,  beta_2 = -0.1019284315,  beta_3 = 0.001671278620`

Therefore, model 1 is:
`y = 21.46800262 + ( 0.000282810868) x_2 - (0.1019284315) x_1 + (0.001671278620) x_1^2`
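For anyone without Maple, the same fit can be sketched in plain Python by building the 4-wide matrix and solving the normal equations `A^T A beta = A^T b`. This is a sketch only: the jackpots and sales below are made up (generated from known coefficients so the answer can be checked), not the real Mega Millions data, and `least_squares`, `solve` and `model1_row` are hypothetical helper names.

```python
def solve(matrix, rhs):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def least_squares(rows, y):
    """Return the betas minimizing the distance between rows*beta and y."""
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    atb = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(ata, atb)


def model1_row(jackpot, drawing):
    """One matrix row for model 1: [1, x_2, x_1, x_1^2]."""
    return [1.0, drawing, jackpot, jackpot ** 2]


# Made-up data: 40 drawings with sales generated from known coefficients.
TRUE_BETAS = [20.0, 0.001, -0.1, 0.0017]
drawings = range(1, 41)
jackpots = [12 + (n * 37) % 300 for n in drawings]
rows = [model1_row(j, n) for n, j in zip(drawings, jackpots)]
sales = [sum(b * v for b, v in zip(TRUE_BETAS, row)) for row in rows]

betas = least_squares(rows, sales)
print(betas)  # should land very close to TRUE_BETAS
```

With real data the sales column would not be an exact function of the other columns, so the betas would be a best fit and not an exact recovery.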

Evaluating the Model
In addition, Maple tells us that the normal distance between our model and the actual data is 71.46.  If you live in 249 dimensional space it should be easy for you to visualize what this represents.  For those of you still in 3 dimensional space you'll have to visualize an analogy.   Imagine we have a 2 dimensional plane in which all our answers must lie.  Now imagine we draw a line starting on the plane but pointing out into space in some random direction (a vector).  We can then imagine projecting the line onto the plane in order to get the best possible estimate of that line in our allowed plane.  If you are still visualizing this, there is now a right triangle drawn between the real line, and the projected line.  The distance between the two endpoints (one out in space, and one on our plane) is the third side to the triangle, and it is also the normal distance between the two lines.  This can be thought of as the error between the actual line and our projected 'line model'.  It should be clear that of all the infinite lines we could draw in the plane, and the resulting distance given by them, the one with the smallest distance is the best.

This picture, which I stole from here, arguably does a better job of explaining this than my drunken ramblings.  Here the real line is `bb{b}` (blue), the projection is `tt{proj}_w bb{b}` (yellow) and the normal distance is `bb{b}-tt{proj}_w bb{b}`.  And hopefully to help tie this to the concrete real world example, note that `bb{b}` is our vector containing all the previous ticket sales.  It is 249 dimensions; however, we need it to fit into only 4 dimensions as that is all the unknowns in our model.
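The plane-and-line analogy can be worked through numerically in 3 dimensions. A small Python sketch, with a made-up vector `b` and a plane spanned by two made-up vectors `w1` and `w2` (none of these numbers come from the lottery data):

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project_onto_plane(b, w1, w2):
    """Project b onto the plane spanned by w1 and w2 (2x2 normal equations)."""
    g11, g12, g22 = dot(w1, w1), dot(w1, w2), dot(w2, w2)
    r1, r2 = dot(w1, b), dot(w2, b)
    det = g11 * g22 - g12 * g12
    c1 = (r1 * g22 - r2 * g12) / det
    c2 = (g11 * r2 - g12 * r1) / det
    return [c1 * x + c2 * y for x, y in zip(w1, w2)]


b = [1.0, 2.0, 3.0]    # the "real line", pointing out of the plane
w1 = [1.0, 0.0, 0.0]   # two vectors spanning the allowed plane
w2 = [1.0, 1.0, 0.0]

proj = project_onto_plane(b, w1, w2)
residual = [bi - pi for bi, pi in zip(b, proj)]
distance = math.sqrt(dot(residual, residual))

print("projection:", proj)
print("normal distance:", distance)
# The residual is perpendicular to the plane, which is what makes it the shortest.
print("residual . w1 =", dot(residual, w1), " residual . w2 =", dot(residual, w2))
```

The 71.46 from Maple is the same quantity, just with `b` having 249 entries and the "plane" being spanned by the 4 columns of the matrix.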

Anyway, the point of all this is that our error is about 71.5.  We can use this to quantitatively compare this model to others.  We can also get a general feel for how this model holds up by calculating its predicted ticket sales in our spreadsheet and finding the relative errors from the actual sales.  Here I would say we do pretty well.  Most of the errors are less than 20%, with the largest being 40%.  This error happened on a drawing with a jackpot of $266M and sales of 79M.  This is something of an outlier and can be seen on the plot above.  Another outlier happened a few months later when there was a jackpot of $242M and sales of 122M.  Note this is less of a jackpot and more sales.  It's not hard to see then how our model failed to predict these.  It is worth looking at these outliers and attempting to come up with factors that may have caused them.  There are several I can think of, but for now let's stick to what we have and try a different model.

Model 2
I'd like to try an exponential model, ie, one where the independent variable is in the exponent.  Something like this:
`y = beta_0 * e^{beta_1 * x_1}`

There are some important notes about this one.  First notice that simply plugging in values for `x_1` will not make this linear for the remaining variables.  Since we must have a linear equation for least squares to work, we must find a way to fix this.  This leads to the second point, that we only have a single independent variable here.  Since we are about to use logs to make this a linear equation we are limited by how we can set this up.  We cannot add several variables together, as we did before, because we will end up with a log of that addition, and there isn't any property of logs that will let us do anything with that.

We make this linear by taking the natural log of both sides: `ln(y) = ln(beta_0 * e^{beta_1 * x_1}) to ln(y) = ln(beta_0) + (beta_1 * x_1)`
Is this linear?  Well `ln(y)` is ok, because once we plug in our values for y it will give a number.  It should be clear `(beta_1 * x_1)` is fine, since it is similar to the other models we've used.  It may seem like `ln(beta_0)` could give us problems, however since `beta_0` is a constant and there is not a variable in the argument of the log we can effectively substitute in a new variable such that `beta_2 = ln(beta_0)`.  I'll leave this step in my head for simplicity's sake.

We are thus ready to extract the matrix and vector and use Maple to solve.  Remember to use `ln(y)` and not just `y` values in the vector.

Maple gives us the vector [2.787582661, 0.006493826665].  We must be mindful of what this represents.  `beta_1 = 0.006493826665`, however `beta_0 != 2.787582661`.  Instead, that is the natural log of `beta_0`.  We find that `beta_0 = e^{2.787582661} = 16.2417105887`.  Our Model 2 is then:
`y = 16.2417105887 * e^{0.006493826665 * x_1}`
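The log trick can be sketched in a few lines of Python: fit a straight line to `ln(y)` against `x_1`, then exponentiate the intercept to get `beta_0` back. The data here is made up (generated from a known exponential), and `fit_exponential` is a hypothetical name; the last lines just redo the conversion of Maple's first number.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = beta_0 * e^(beta_1 * x) by least squares on ln(y)."""
    logs = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_l = sum(logs) / n
    slope = (sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, logs))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_l - slope * mean_x   # this is ln(beta_0), not beta_0
    return math.exp(intercept), slope


# Made-up data from a known curve, so the fit can be checked.
xs = [12, 50, 100, 200, 400]
ys = [16.0 * math.exp(0.006 * x) for x in xs]
beta_0, beta_1 = fit_exponential(xs, ys)
print(beta_0, beta_1)

# Converting Maple's first number back into beta_0:
print(math.exp(2.787582661))
```

Note that this minimizes the distance in `ln(y)`, not in `y` itself, so errors on large sales figures count for less than they would in a direct fit.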

Maple also tells us the normal distance is only 1.6.  Compare this to 71.5 for model 1, and it would seem this is a much better model, although since we fit `ln(y)` this distance is measured in log units, so the two numbers aren't directly comparable.  Actually looking at the errors gives some interesting insights.  One thing I notice is that with this model single digit error percentages are more common.  However, the largest error is 59%, quite a bit more than the largest of 40% in model 1.  This error is also in the recent $640 M jackpot.  This is rather important as this, ie large jackpots, is exactly where the model is most needed.  Model 1 gave a prediction of 641 M sales, vs the actual figure of 652 M, for an error of about 2%.  Taking the average of all the individual errors with both models gives 9.3% for model 1, and 7.6% for model 2.  The numbers seem to be in favor of model 2, but perhaps it would change if we only considered high jackpots, which is where the model is needed.  Perhaps we should recalculate the models with only the higher jackpots.

For now though, we'll try adding some new variables.

Model 3
I'll revert back to the basic format of model 1, as it's easier to add variables to.  So what are some other possible factors?  Well to begin with, I figure time of year likely has some effect.  The problem is how to represent time of year in a way that would provide meaningful data.  At first just using the month number seems ok; however, this creates a huge difference between December and January.  I doubt that the actual month makes much difference (although since birthdays aren't evenly distributed, maybe that would matter), rather than just the general time of year.  I think the best way to do this is to just create a variable for each time of year.  At first, this meant just having a binary 0 or 1 for 4 variables (seasons).  Then I realized that I might as well vary the intensity of the variable depending on how strongly it was that season.  Then I decided to just go with two seasons (summer/winter) centered around the hottest and coldest parts of the year (July 20, Jan 20).

I was then debating if this factor should be linear or not.  I noted how the average temperatures change quickly in fall and spring, and then generally hover in summer and winter.  Then I realized average temperature is a great factor for this.  I know I always have trouble finding historical weather data on the internet, but luckily averages are a bit easier.  I went with this source for averages in Iowa since the location really didn't matter, as long as it was somewhat representative of the US as a whole.

A second factor that I thought could prove useful was the previous drawing's sales.  I figured that this would help capture factors I didn't think of.  On my first try at this I missed something obvious.  When someone wins the jackpot and it resets, the previous sales don't matter (or at least not in the same way).  To fix this I combined the previous sales with a binary variable that was 1 normally and 0 when the jackpot had just reset.

So, I'm defining three new variables.  `x_3` will be the previous ticket sales.  `x_4` will be a binary switch 1 normally and 0 when the jackpot resets.  `x_5` will be the average temperature.  And our model is:
`y = beta_0 + beta_1 x_2 + beta_2 x_1 + beta_3 x_1^2 + beta_4(x_3*x_4) + beta_5 x_5`

Note this is the same as model 1, but with the two new terms added on the end.
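To make the new terms concrete, here is a small Python sketch of how one matrix row per drawing could be assembled for model 3. The numbers are invented, and detecting a reset as "jackpot lower than the previous drawing's jackpot" is my assumption for the sketch; `model3_rows` is a hypothetical name.

```python
def model3_rows(jackpots, sales, temps):
    """Build rows [1, x_2, x_1, x_1^2, x_3*x_4, x_5], one per drawing.

    The first drawing is skipped because it has no previous sales figure.
    """
    rows = []
    for i in range(1, len(jackpots)):
        x1 = jackpots[i]                                 # jackpot (millions)
        x2 = i + 1                                       # drawing number, from 1
        x3 = sales[i - 1]                                # previous ticket sales
        x4 = 0 if jackpots[i] < jackpots[i - 1] else 1   # 0 right after a reset
        x5 = temps[i]                                    # average temperature
        rows.append([1.0, x2, x1, x1 ** 2, x3 * x4, x5])
    return rows


# Invented example: the jackpot is won after the third drawing and resets to 12.
jackpots = [12, 20, 31, 12, 18]
sales = [17.0, 18.5, 20.1, 16.8, 17.9]
temps = [30.0, 31.0, 33.0, 34.0, 36.0]

for row in model3_rows(jackpots, sales, temps):
    print(row)
```

Multiplying `x_3` by `x_4` zeroes out the previous-sales term on the drawing right after a reset, so `beta_4` only ever gets fitted against drawings where the previous sales should matter.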

Fast forward some spreadsheet and Maple work and we have our coefficients:
`beta_0 = 24.58,  beta_1 = -0.000739,  beta_2 = -0.10927,  beta_3 = 0.001678,  beta_4 = -0.0463,  beta_5 = 0.0075598`

And our model:
`y = 24.58 - (0.000739) x_2 - (0.10927) x_1 + (0.001678) x_1^2 - (0.0463)(x_3*x_4) + (0.0075598) x_5`

How well does this model perform?  Well the good news is that it's better than model 1; the bad news is that it's basically the same.  Recall that model 1 had a distance of 71.5 and an average error of 9.3%.  Model 3 has a distance of 70.2 and an average error of 9.3%.  I really felt like the temperature should be a good indicator of general season, and that season should have some effect on sales.  I also considered that perhaps forcing the previous sales figure to be 0 when the jackpot reset was doing more harm than good.  So, I redid this model with that term removed and called it model 4.

I won't bother posting the details of model 4 because it's much the same.  Distance of 70.2 and average error of 9.3%.  I should mention that if you go out to more decimal places each additional piece of information does decrease both average error, and distance.  However, I'm not going to pretend like that is an improvement in the model.  Given enough variables, no matter how arbitrary they are, the least squares method will approach a perfect representation of the given data.  The problem is that this would only be perfect for the given past data.  If the additional variables have no real effect on the sales there would be no corresponding increase in predictive accuracy.  Many a person has lost money because they over tweaked a model with past data and were so satisfied with how well the model described past results that they were sure it would predict future results equally well.

Another thing worth noting is that our `beta_1` term, which was positive in model 1 is negative in model 3 (and 4).  It is also very small in all models.  This indicates that there isn't any strong correlation here.  So as long as we are removing superfluous variables why not that one too.  For those of you keeping score at home this means we are left with just one variable, the jackpot amount.  This leads us to:

Model 5
We are left with a simple quadratic:
`y = beta_0 + beta_1 x_1 + beta_2 x_1^2`

Its coefficients are:
`y = 21.50354999 - (0.1019449781) x_1 + (0.001671373144) x_1^2`

It has a distance of 71.5 and average error of 9.3%, ie, the same as models 1, 3, and 4.

Model 2 vs Model 5
So it would seem that Model 2 is the winner.  However, I still don't like the huge error on the largest jackpot.  I've plotted these two models and compared them to the actual sales.  I had plotted all the models at first, but since 1, 3, 4, and 5 are so similar they can be treated as the same on the plot.  Anyway, the plot shows that model 5 is very close to actual sales at all points.  Model 2 is too, except for the largest jackpot, where it is way off.  As I said above, since the big jackpots are what matters most, I'm hesitant to recommend model 2 over 5, even though all the numbers back it up.  Maybe only looking at drawings with a large jackpot will improve the models.
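The disagreement at the top end is easy to check by plugging numbers into the two formulas. A short Python sketch using the coefficients given above and the $640 M drawing with actual sales of 652 M (the function names are made up):

```python
import math


def model2(jackpot):
    """Exponential model; jackpot and sales in millions."""
    return 16.2417105887 * math.exp(0.006493826665 * jackpot)


def model5(jackpot):
    """Quadratic model; jackpot and sales in millions."""
    return 21.50354999 - 0.1019449781 * jackpot + 0.001671373144 * jackpot ** 2


def relative_error(predicted, actual):
    return abs(predicted - actual) / actual


ACTUAL_SALES_AT_640 = 652  # millions, from the record drawing

for name, model in (("model 2", model2), ("model 5", model5)):
    predicted = model(640)
    error = relative_error(predicted, ACTUAL_SALES_AT_640)
    print(f"{name}: predicted {predicted:.0f} M, error {error:.0%}")
```

Model 5 lands at about 641 M, the same as model 1, while model 2 overshoots to over a billion tickets, which is where the 59% comes from.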

Large Jackpots
I decided to use $100 M as a cut off for the jackpot.  This gives 48 drawings to use for our model.  I then found the coefficients for models 2 and 5 with this smaller data set.  I'm calling these models 2a and 5a. 
Model 2a:
`y = 18.51083081 * e^{0.006057907333 * x_1}`

Model 5a:
`y = 43.12456321 - (0.2824251538) x_1 + (0.001932541134) x_1^2`

Of course, the real question is how accurate are they.  The answer is that model 2a has a distance of 0.74 and average error of 7.0%.  Model 5a has a distance of 53, and an average error of 5.8%.  So, both are more accurate than the original models based on the larger data set.  This isn't surprising, for the same reason it wasn't surprising when adding variables improved accuracy without actually improving the models' predictive power.  Here, however, I think this may be an actual improvement.  It's hard to say though really until we have future high value jackpots to test the models with.

Will it ever be profitable to play the Mega Millions?
Now that we have a way of estimating the ticket sales based on jackpot amount we have the tools needed to answer this question.  In my Mega Millions post I provided the formula needed to determine the expected value of a Mega Millions ticket.  It requires both jackpot and ticket sales, but we now have ticket sales as a function of jackpot.  So, we can plot the expected value of tickets for various jackpots.  The formula for expected jackpot is:
`V = \frac{J}{e^r}\sum_{n=1}^{\infty} \frac{r^n}{n(n!)}`

Where r is the rate of winners. This is found by this formula:
`r = 5.699 \times 10^{-9} * y`
Where y is the tickets sold, as given by our shiny new model.  Remember that we used millions in the above model, whereas these formulas all want full figures.  You'll have to convert this manually if you're going to do these calculations yourself (and I know you are).  Also keep in mind that we determined that the non-jackpot prizes were worth about $0.15 per ticket.  Thus, our breakeven point is $0.85.

In theory, we should be able to substitute these equations into each other and end up with one with only one independent variable, J.  Then we could plot that equation, and even take the derivative to find where (and if) it peaks at some maximum expected value.  Unfortunately, if you examine those equations you'll see that we end up with a quadratic for r.  That quadratic then has to be substituted in as an exponent of e; messy but doable.  However, the real problem comes from within the summation.  We have to raise r to an incrementing power.  There's no easy way to do this.  Even letting Maple do the work, we end up with a giant expanded summation hundreds of terms long.  I feel there may yet be a way to get a simplified equation here, but I am unable to figure it out.

It's just as well as it gave me an excuse to figure out some Maple programming.  I simply iterated through all the integer jackpots, from $12 million to $3000 million, and found the expected values (with taxes accounted for, see previous post for details).  The results were interesting.
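The same brute force loop can be sketched in Python. This is not the Maple worksheet: it evaluates only the expected jackpot formula `V` above, with no taxes and no conversion to a per ticket value, so it will not reproduce the figures from the worksheet. Using Model 2a for sales, stopping at $1000 million (the exponential model makes r enormous beyond that), and cutting the summation off once the terms become negligible are all assumptions of the sketch, as are the function names.

```python
import math

WIN_RATE_PER_TICKET = 5.699e-9  # from the formula for r above


def sales_model_2a(jackpot_millions):
    """Predicted ticket sales in millions (Model 2a, assumed for this sketch)."""
    return 18.51083081 * math.exp(0.006057907333 * jackpot_millions)


def expected_jackpot(jackpot, tickets_sold):
    """V = J / e^r * sum over n >= 1 of r^n / (n * n!), with r from tickets sold."""
    r = WIN_RATE_PER_TICKET * tickets_sold
    # Terms die off quickly past n ~ r, so a generous cutoff is enough.
    n_max = int(r + 12 * math.sqrt(r) + 60)
    total = 0.0
    for n in range(1, n_max + 1):
        # Work with logs so r^n and n! never overflow.
        log_term = n * math.log(r) - r - math.lgamma(n + 1) - math.log(n)
        total += math.exp(log_term)
    return jackpot * total


best_jackpot, best_value = None, -1.0
for jackpot_millions in range(12, 1001):
    tickets = sales_model_2a(jackpot_millions) * 1e6   # millions -> full figures
    value = expected_jackpot(jackpot_millions * 1e6, tickets)
    if value > best_value:
        best_jackpot, best_value = jackpot_millions, value

print(f"V peaks at a ${best_jackpot} M jackpot, V = ${best_value / 1e6:.1f} M")
```

Since each jackpot is evaluated numerically, there is no need for a simplified closed form; the cost is that the loop only tells you where the peak is for the particular sales model plugged in.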

Since the recent record jackpot had the best expected value in the other post I had expected that the expected value would increase with jackpot.  I knew that the number of multiple winners would increase fast as jackpot increased, particularly as the model shows exponential growth in sales.  This turned out to be the case, but to my surprise the peak was lower than the record jackpot.

The best expected value turns out to be at a jackpot of $530 million.  The value is $0.564751, which falls $0.285249 short of the adjusted ticket price.  I plotted the expected values at various jackpots and found it rather interesting.  The plot is immediately recognizable as a blackbody radiation curve.  I'm not sure what to read into this, other than the obvious fact that the sun controls the Mega Millions jackpot.

Final Thoughts
So, this has all been a good time, but I suppose I should be wrapping this post up at some point.  I'll do that similarly to the first Mega Millions post, by negating everything I've done here with some casual observations.  Despite the impression someone may get after these posts, I don't actually follow the Mega Millions at all, and really had very little idea about it prior to the first post.  Something I'm still not certain of is how the jackpot is calculated and announced.  It seems the jackpot increases continuously during the period between drawings.  Worse, it seems this increase is based on sales.  In other words, the independent variable (jackpot amount) we have been using actually seems to be dependent on the dependent variable (sales), which is itself still dependent on the other.  So both are dependent on each other, and the system is a positive feedback loop.  This means jackpot amount isn't known and can't be used to model ticket sales.  This could be overcome if the jackpot advertised at the last moment to buy tickets is close enough to the final jackpot.

Either way, the Mega Millions certainly has a lot of mathematical life in it.  I could probably keep writing posts about it forever, but won't.

I've updated the lottery spreadsheet with quite a bit of data:

I've also uploaded a zip file of all my Maple worksheets for those of you who wish to play along at home: