Friday, June 29, 2012

Regional names for common things

Here's a link I found a long time ago.  I was reminded about it and had to search through old email to find it, so I'm posting it here for easy future access.  Also every time I start reading old emails I spend 3 hours, and then ask 'why in the world did I just spend 3 hours reading random old emails?'.

http://www4.uwm.edu//FLL/linguistics/dialect/maps.html

Here are two similar ones:
http://www.commoncensus.org/maps.php

http://www.popvssoda.com/countystats/total-county.html

Now back to old emails.

Friday, June 22, 2012

Psychic made me perform like a porn star to contact my dead father

http://www.dailymail.co.uk/news/article-2162634/Psychic-perform-like-porn-star-contact-dead-father-embarrassed-ashamed-victim-tells-court.html?ITO=1490
‘This meant I had to get naked and perform a bit like a porn star. He said the more outrageous I performed, the stronger I would become.’

The woman, who first visited Lang when she was 19, carried on the sessions for more than three years, during which time she was instructed on reaching higher levels of ability, described in terms of ‘levels’ and ‘colours’.

She said: ‘If you didn’t dance to his tune, all hell would break loose. Bad things would happen. I’d lose my colours if I didn’t do what he wanted.’

Thursday, June 21, 2012

Bias and the Big Fingerprint Dust-Up

http://www.psmag.com/legal-affairs/bias-and-the-big-fingerprint-dust-up-3629/
Mayfield, 37, an Oregon lawyer and Muslim convert, was arrested as a material witness after the devastating terrorist train bombings that rocked Madrid in March 2004. The FBI’s Integrated Automated Fingerprint Identification System turned up 20 possible matches to a partial print on a plastic bag of detonator caps found nearby. Three FBI agents were certain that Mayfield’s prints, in the system after a teenage arrest, were a match. Mayfield’s court-ordered, independent expert concurred.

While Mayfield spent two weeks behind bars, skeptical Spanish authorities chased other leads, ultimately determining that the print belonged to an Algerian, Ouhnane Daoud. Chastened, the FBI backed down and even apologized to the Mayfield family. Mayfield, who was sure he was targeted because of his faith, was later awarded $2 million in damages.
In another study, his team had six international experts each view eight latent prints that they’d each previously examined, but now they were accompanied with a new, mundane context — something like, “the suspect has confessed,” or, “the suspect is in custody.” More expert reversals followed. Four of the six reached different conclusions. One changed his mind three times.
I'm somehow both amazed and not surprised at all that fingerprint examiners are allowed to know the details of the case they are working on or how other examiners have decided.

Wednesday, June 20, 2012

Ego Depletion

http://youarenotsosmart.com/2012/04/17/ego-depletion/
A study published in 2010 conducted by Jonathan Levav, Shai Danziger, and Liora Avnaim-Pesso of Columbia and Ben-Gurion Universities looked at 1,112 judicial rulings over the course of 10 months concerning prisoner paroles. They found that right after breakfast and lunch, your chances of getting paroled were at their highest. On average, the judges granted parole to around 60 percent of prisoners right after the judge had eaten a meal. The rate of approval crept down after that. Right before a meal, the judges granted parole to about 20 percent of those appearing before them. The less glucose in judges’ bodies, the less willing they were to make the active choice of setting a person free and accepting the consequences and the more likely they were to go with the passive choice to put the fate of the prisoner off until a future date.

Sunday, June 17, 2012

An overview of using linear least squares to create predictive models

You undoubtedly remember my groundbreaking work with the Mega Millions probability and expected value.  To review, I showed that due to multiple winners splitting the jackpot the Mega Millions has never had a positive expected value.  However, a time could come when this is no longer true.  In that post, I provided all the math one would need to calculate whether there is indeed a positive expected value.  There is one problem, though.  To do the calculations one would need the ticket sales.  While I found a good site with all the past sales posted, what matters is current sales.  What is needed, then, is a way to predict current sales from the historical records.  I will attempt to use linear least squares to find a model that fits the historical data well enough.

Overview of Linear Least Squares
If you're already familiar with least squares or don't care, then feel free to skip ahead to the next section.

You may remember the general rule for solving systems of equations: you need as many equations as you have unknowns (variables), no more, no less.  If you have too few equations you end up with an infinite number of solutions.  If you have too many you'll end up with an inconsistent result, e.g., one where 0=1 (this ignores the rare case where everything works out perfectly even with too many equations, a situation that pretty much has to be engineered).

Consider the following pairs of x and y coordinates: (-1,3), (0,2), (1,9).

We know that with three points we can fit a quadratic curve to this data.  Starting with the familiar quadratic equation: `Ax^2 + Bx + C = y`, we have three sets of values for x and y, and can thus generate a system of 3 equations:
`A(-1)^2 + B(-1) + C = 3`
`A(0)^2 + B(0) + C =2`
`A(1)^2 + B(1) + C =9`

This is our system of equations with three unknowns and three equations.  Note here that the unknowns are A, B, and C not x or y.  Solving this system yields values of: A=4, B=3, C=2, and thus `4x^2 + 3x + 2 = y`.
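Solving the system by hand is tedious but mechanical. Here is a minimal Python sketch (not the author's Maple worksheet) that solves the same 3x3 system exactly with Gauss-Jordan elimination over fractions; the helper name `solve` is just illustrative.

```python
from fractions import Fraction

def solve(matrix, rhs):
    """Solve a square linear system exactly by Gauss-Jordan elimination."""
    n = len(matrix)
    # Augmented matrix [matrix | rhs], using exact fractions to avoid rounding
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and swap it up
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so the pivot entry becomes 1
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

points = [(-1, 3), (0, 2), (1, 9)]
# Each point (x, y) gives one equation A*x^2 + B*x + C = y
coeffs = [[x**2, x, 1] for x, _ in points]
ys = [y for _, y in points]
A, B, C = solve(coeffs, ys)
print(A, B, C)  # 4 3 2
```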

It isn't hard to see why adding a new point could be a problem.  If the new point doesn't fall exactly on that curve, the equation is no longer true, and indeed there would be no quadratic equation that would satisfy the new system with 4 points.  This is all well and good from a mathematical perspective, but what if this data represented the real world?  In effect, we would be adding information about the world, but that would result in losing information in our model (i.e., going from having a perfect model to having none at all).  This doesn't seem to make sense.  There has to be a way to find the curve that fits the available data as well as possible.
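To make the "no quadratic fits" claim concrete, this Python sketch adds a hypothetical fourth point, (2, 10), which is not from the post and was picked only because it misses the curve (`4x^2 + 3x + 2` gives y = 24 at x = 2). A linear system has a solution exactly when appending the y column to the coefficient matrix does not raise its rank, so comparing the two ranks exposes the inconsistency. The `rank` helper is illustrative, not a library function.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix, computed by exact row reduction."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0  # number of pivots found so far
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot available in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                f = m[i][col] / m[r][col]
                m[i] = [a - f * c for a, c in zip(m[i], m[r])]
        r += 1
    return r

# The three original points plus a hypothetical fourth point (2, 10)
points = [(-1, 3), (0, 2), (1, 9), (2, 10)]
coeffs = [[x**2, x, 1] for x, _ in points]
augmented = [row + [y] for row, (_, y) in zip(coeffs, points)]

# Consistent only if both ranks match (Rouché-Capelli theorem)
print(rank(coeffs), rank(augmented))  # 3 4
```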

The answer is least squares.

Using Least Squares
What happens when we have a lot of data that still clearly is represented by a quadratic equation, but doesn't fit in perfectly?  Let's start with this data:
x y
-5 86
-4 57
-3 24
-2 11
-1 2
0 3
1 8
2 23
3 49
4 81

The plot here shows that these points still clearly follow a quadratic curve.  But they don't match up perfectly with our previously found curve.  Indeed, no quadratic curve passes through all of them.  So we must use least squares to find the closest match.

To begin, we list the equations those values give:
$$
\begin{aligned}
A(-5)^2 + B(-5) + C &= 86 \\
A(-4)^2 + B(-4) + C &= 57 \\
A(-3)^2 + B(-3) + C &= 24 \\
A(-2)^2 + B(-2) + C &= 11 \\
A(-1)^2 + B(-1) + C &= 2 \\
A(0)^2 + B(0) + C &= 3 \\
A(1)^2 + B(1) + C &= 8 \\
A(2)^2 + B(2) + C &= 23 \\
A(3)^2 + B(3) + C &= 49 \\
A(4)^2 + B(4) + C &= 81
\end{aligned}
$$
We now must extract the coefficients of A, B, and C and use them to build a matrix, which is conventionally also called A (not to be confused with the unknown coefficient A).  Note that the coefficient of C is 1 in every equation.  Also note that we must square the appropriate values first.  This means that `A(-5)^2 + B(-5) + C = 86` becomes `A(25) + B(-5) + C(1) = 86`.  Even if you've never worked with a matrix before, it should be pretty simple to see what we are doing mechanically:
$$A=
\left[\begin{array}{rrr}
25 & -5 & 1 \\
16 & -4 & 1 \\
9 & -3 & 1 \\
4 & -2 & 1 \\
1 & -1 & 1 \\
0 & 0 & 1 \\
1 & 1 & 1 \\
4 & 2 & 1 \\
9 & 3 & 1 \\
16 & 4 & 1
\end{array}\right]
$$
We now take the values of y and use them to define a vector b:
`bb{b}=[[86],[57],[24],[11],[2],[3],[8],[23],[49],[81]]`
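Building A and b in code is just one row per data point. A minimal Python sketch using the table above; like the text, it reuses the name `A` for the matrix, which is separate from the unknown coefficient A.

```python
# The ten (x, y) data points from the table above
xs = [-5, -4, -3, -2, -1, 0, 1, 2, 3, 4]
ys = [86, 57, 24, 11, 2, 3, 8, 23, 49, 81]

# One row per point: the coefficients of A, B, and C are x^2, x, and 1
A = [[x**2, x, 1] for x in xs]
# The right-hand side vector b is just the y values
b = list(ys)

for row, y in zip(A, b):
    print(row, y)
```

The first printed line is `[25, -5, 1] 86`, matching the first row of the matrix and the first entry of b.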

We now use what are known as the 'normal equations', `A^T A bb{hat{x}} = A^T bb{b}`, whose solution is `bb{hat{x}} = (A^T A)^(-1) A^T bb{b}`.  The entries of the resulting vector `bb{hat{x}}` are our values for A, B, and C, chosen so that the sum of the squared differences between the predicted and actual y values is as small as possible.  Maple tells me that the values are: A=4.129, B=3.438, C=1.024.  Thus, our best fit curve is `4.129 x^2 + 3.438 x + 1.024 = y`.  This plot looks pretty similar to the other one.  However, if you compare the two it's clear that the curve passes closer to the points in this one.
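As a cross-check of the Maple numbers, here is a Python sketch that forms `A^T A` and `A^T b` for the table above and solves the resulting 3x3 system exactly with Cramer's rule. Cramer's rule is a swapped-in technique chosen only because the system is tiny; it is not necessarily what Maple does internally, and `det3`/`cramer3` are hypothetical helper names.

```python
from fractions import Fraction

# Data from the table above; one row [x^2, x, 1] per point
xs = [-5, -4, -3, -2, -1, 0, 1, 2, 3, 4]
ys = [86, 57, 24, 11, 2, 3, 8, 23, 49, 81]
A = [[x**2, x, 1] for x in xs]
b = ys

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(p, q):
    # Entry (i, j) is the dot product of row i of p with column j of q
    return [[sum(a * c for a, c in zip(row, col)) for col in zip(*q)] for row in p]

def det3(m):
    # Cofactor expansion along the first row of a 3x3 matrix
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def cramer3(m, rhs):
    # Cramer's rule: replace column i by rhs and divide the determinants
    d = Fraction(det3(m))
    result = []
    for i in range(3):
        mi = [row[:i] + [r] + row[i + 1:] for row, r in zip(m, rhs)]
        result.append(det3(mi) / d)
    return result

At = transpose(A)
AtA = matmul(At, A)                                        # A^T A
Atb = [sum(a * y for a, y in zip(row, b)) for row in At]   # A^T b
coef_A, coef_B, coef_C = cramer3(AtA, Atb)

print(AtA)  # [[1333, -125, 85], [-125, 85, -5], [85, -5, 10]]
print(Atb)  # [5161, -229, 344]
print(round(float(coef_A), 3), round(float(coef_B), 3), round(float(coef_C), 3))
# 4.129 3.438 1.024
```

Working in exact fractions shows the rounded Maple values come from A = 545/132, B = 2269/660, and C = 169/165.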

I'd like to mention a caveat about the normal equations we use to find our best fit equation.  The normal equations come from linear algebra, which, as the name implies, deals with linear systems.  Therefore, we must be taking the coefficients for our matrix A from a system of linear equations.  "But wait!" you probably don't say, "we took the coefficients above from a group of quadratic equations.  They're not linear."  However, if you look at what we actually did above, we first plugged in our values for x and y, and once we did that we had linear equations.  Take `A(-5)^2 + B(-5) + C = 86`, which evaluates to `A(25) + B(-5) + C(1) = 86`; that is indeed a linear equation in A, B, and C.  This is a key point, and why I stressed above that x and y were not the unknowns; A, B, and C were.  This reveals the power of the normal equations: the model only has to be linear in the unknown coefficients, not in x.  Polynomials of arbitrary degree, exponentials, logarithms: they can all be converted to linear equations under the right circumstances.
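To illustrate that only linearity in the unknowns matters, this sketch fits the made-up model `A e^x + B ln(x) + C = y` to synthetic data generated from A=2, B=3, C=-1 (an assumption for demonstration, not real data). Each row of the matrix is `[e^x, ln(x), 1]`, and the same normal equations recover the coefficients. The `least_squares` helper is hypothetical; it solves the normal equations by Gaussian elimination with partial pivoting.

```python
import math

def least_squares(rows, b):
    """Solve the normal equations (A^T A) x = A^T b by Gaussian elimination."""
    n = len(rows[0])
    # Form A^T A and A^T b directly from the rows of A
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    atb = [sum(r[i] * y for r, y in zip(rows, b)) for i in range(n)]
    # Forward elimination with partial pivoting (largest pivot first)
    for col in range(n):
        p = max(range(col, n), key=lambda i: abs(ata[i][col]))
        ata[col], ata[p] = ata[p], ata[col]
        atb[col], atb[p] = atb[p], atb[col]
        for i in range(col + 1, n):
            f = ata[i][col] / ata[col][col]
            ata[i] = [a - f * c for a, c in zip(ata[i], ata[col])]
            atb[i] -= f * atb[col]
    # Back substitution on the upper-triangular system
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(ata[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (atb[i] - s) / ata[i][i]
    return x

# Synthetic data from an assumed model y = 2*e^x + 3*ln(x) - 1
xs = [1, 2, 3, 4, 5, 6]
ys = [2 * math.exp(x) + 3 * math.log(x) - 1 for x in xs]

# Once x is plugged in, each equation A*e^x + B*ln(x) + C*1 = y is linear in A, B, C
rows = [[math.exp(x), math.log(x), 1.0] for x in xs]
A, B, C = least_squares(rows, ys)
print(round(A, 6), round(B, 6), round(C, 6))  # 2.0 3.0 -1.0
```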

Solving the normal equations is all pretty easy arithmetic.  Unfortunately, it is an excruciatingly long process.  However, if you live in a world where computers exist, you are in luck because computers love solving matrix problems.  I've made a Maple worksheet for the example problem I used here, with a few comments.  It should be pretty easy to figure out how to modify it to solve for different systems.  You only need to modify two lines (A and b).
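For readers without Maple, a rough Python analogue of the worksheet's "change only A and b" workflow might look like the following, assuming NumPy is installed. `numpy.linalg.lstsq` minimizes the squared error with an SVD-based solver instead of explicitly forming `A^T A`, which tends to hold up better numerically when the columns of A are nearly dependent.

```python
import numpy as np

# Only these two definitions need to change for a different problem
xs = np.arange(-5, 5)  # -5, -4, ..., 4
A = np.column_stack([xs**2, xs, np.ones_like(xs)])
b = np.array([86, 57, 24, 11, 2, 3, 8, 23, 49, 81])

# Least squares solution of A x = b, plus residuals, rank, and singular values
coef, residuals, rank, singular_values = np.linalg.lstsq(A, b, rcond=None)
print(np.round(coef, 3))  # [4.129 3.438 1.024]
```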
http://daleswanson.org/blog/leastsquares.mw

The next post where I attempt to fit a model to Mega Millions jackpots and ticket sales is here:
http://daleswanson.blogspot.com/2012/07/attempting-to-predict-mega-millions.html

Saturday, June 16, 2012

Spelling Nazis

http://science.slashdot.org/comments.pl?sid=2920459&cid=40345857

Story:
"The Black Death, a strain of bubonic plague that destroyed nearly a third of Europe's entire population between 1347 and 1369, has been found in Oregon. Health officials in Portland have confirmed that a man contracted the plague after getting bitten by a cat. The unidentified man, who is currently in his 50s, had tried to pry a dead mouse from a stray cat's mouth on June 2 when the cat attacked him. Days later, fever and sickness drove the man to check himself into Oregon's St. Charles Medical Center, where he is currently in 'critical condition.'"

Post:
"I can has worldwide pandemic?"

Reply:
"haz"

Reply:
"It's a beautiful world we live in when we have a second spelling and dialect for what we imagine our domesticated companions are telling us... and there are spelling and grammar nazis for that dialect."

Friday, June 15, 2012

Best Mountain Goats Live Shows

I rarely listen to live shows, but Mountain Goats live shows are superb.  Here are my top 11 Mountain Goats live shows, ranked by plays/tracks:
Live: 2000/10/19 - WFMU In-Studio
Live: 2003/09/27 - Mercury, Austin TX
Live: 1998/02/06 - Cow Haus, Tallahassee FL
Live: 2002/03/10 - The Green Room, Iowa City IA
Live: 2001/08/03 - 40 Watt Club, Athens GA
Live: 2000/10/15 - Go! Rehearsal Studios Room 4, Carrboro NC
Live: 1998/04/12 - Brownie's, New York NY
Live: 2007/06/17 - Farm Sanctuary, Watkins Glen NY
Live: 1996/03/20 - The Garage, London UK
Live: 2005/06/23 - Bottom Of The Hill, San Francisco CA
Live: 1996/01/18 - WNUR, Evanston IL

Most of these are on archive.org too.  A simple Google search turns up the downloads, for example:
http://archive.org/details/mountaingoats1998-02-06.flac16

Thursday, June 14, 2012

What TLDs Does Google Want?

In case you're unaware, ICANN is in the process of auctioning off new TLDs.  This means that soon, instead of google.com, you'll be able to visit .google (note the dot is required).  Of course, if you simply type 'google' into your browser now it will likely take you to google.com.  Besides, people already don't understand domains or URLs, and I can't imagine telling someone to visit .google would help (although it is my sincerest hope that somehow slashdot is able to obtain the 'dot' TLD, giving the URL http://slashdot.dot/).

Anyway, this change is useful for three reasons:
  1. Confusing everyone.
  2. Breaking a lot of software and giving no easy way to detect urls in text.
  3. Making ICANN a lot of money.
Well, after a few months of applications, the list of 1,930 initially requested TLDs is in:
http://newgtlds.icann.org/en/program-status/application-results/strings-1200utc-13jun12-en

A quick search shows that .dot is already claimed by 'Charleston Road Registry Inc.' and Dish Networks.  They will now send their fittest employee to battle in the Thunderdome.  Charleston Road Registry Inc. also requested 100 other TLDs, and some detective work (i.e., noticing the email address is @google.com) reveals that it is actually Google.  Here are the TLDs Google wants (minus 3 non-Latin-alphabet ones: google in Japanese and Chinese, and everyone in Japanese):

Ads And Android App Are Baby Blog Boo Book Buy Cal Car Channel Chrome Cloud Corp Cpa Dad Day Dclk Dds Dev Diy Docs Dog Dot Drive Earth Eat Esq Est Family Film Fly Foo Free Fun Fyi Game Gbiz Gle Gmail Gmbh Goo Goog Google Guge Hangout Here Home How Inc Ing Kid Live Llc Llp Lol Love Mail Map Mba Med Meme Mom Moto Mov Movie Music New Nexus Page Pet Phd Play Plus Prod Prof Rsvp Search Shop Show Site Soy Spot Srl Store Talk Team Tech Tour Tube Vip Web Wow You Youtube Zip

Some are obvious, but some seem really odd.  .meme?

Top requested TLDs:
13        APP
11        HOME
11        INC
10        ART
9        BLOG
9        BOOK
9        LLC
9        SHOP
8        DESIGN
8        MOVIE
8        MUSIC

Other highlights:
Walmart wants .george
Microsoft only want 11, and they're all boring
Amazon want 76, the only other legitimate company to want more than a dozen or so.
Apple only requested .apple.  Perhaps they are using a different shell company for the rest, although nothing else mentioned 'apple'.
The award for most goes to some guy named Daniel Schindler with 307, each registered to a different LLC named for the domain.  I can only assume these are some highly legitimate businesses or some sort of plot for fish to take over the internet.
No request for .jello.  Come on Kraft, get your act together.  I would pay top dollar* for a domain at .jello.
In fact, the requests were pretty boring overall.  .sex and .porn are reserved for current holders of .xxx (i.e., no one), so there was only one request for those, by someone who didn't get the memo.  Other than those, there were no dirty words that I could think of (and I can think of a lot).


*top dollar = exactly what I pay now for .org

Wednesday, June 6, 2012

The proper tax rate on capital income is zero

To see why this is so, consider twin brothers who each make $100,000 in wage income. Most people would regard these two people as equally well off, even if one freely chose to consume his income now, while the other chose to consume later. But not advocates of the income tax. They insist the more patient twin brother is “richer” and deserves to be taxed at a higher income tax rate. For instance, compare a 40% wage tax with a 40% income tax. Under the wage tax (sometimes called a “payroll tax”) the spendthrift brother is able to consume $60,000, which is 40% less than in a no-tax economy. Now assume the thrifty brother invests the after-tax wage income for 20 years, and sees the money double to $120,000. Then he can consume $120,000 20 years in the future, which is also 40% less than the no-tax consumption level. This sort of tax is neutral with respect to saving and investment; it’s essentially a flat rate tax on consumption, whenever it occurs.
I'd counter that the higher $200,000 post-investment income should be taxed at a higher rate than the $100,000 pre-investment income.

In general I'm in favor of inflation-adjusted capital gains taxes, even after this article.  However, I feel it's more important to tax wealth transferred between generations (inheritance tax).  Treat inheritance the same as gifts or prizes, and tax all of them along the lines of: $0-$10,000 exempt; $10,000-$100,000 @ 10%; $100,000-$1,000,000 @ 50%; $1,000,000+ @ 80%.
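To see what that schedule would mean in dollars, here is a tiny calculator. It assumes the rates are marginal, i.e. each rate applies only to the part of the amount inside its bracket, as with US income tax brackets; the post doesn't say which reading is intended, so treat that as an assumption. `transfer_tax` and `BRACKETS` are hypothetical names.

```python
# Hypothetical bracket table: (lower bound, upper bound or None, rate in percent)
BRACKETS = [
    (0, 10_000, 0),
    (10_000, 100_000, 10),
    (100_000, 1_000_000, 50),
    (1_000_000, None, 80),
]

def transfer_tax(amount):
    """Tax on an inheritance, gift, or prize, assuming marginal brackets."""
    tax = 0
    for low, high, pct in BRACKETS:
        if amount <= low:
            break
        top = amount if high is None else min(amount, high)
        tax += (top - low) * pct / 100  # tax only the slice inside this bracket
    return tax

for amount in (5_000, 50_000, 500_000, 2_000_000):
    print(f"${amount:,} -> ${transfer_tax(amount):,.0f}")
```

Under this reading, a $500,000 inheritance would owe $9,000 on the 10% slice plus $200,000 on the 50% slice, for $209,000 total. A flat-rate reading (the whole amount taxed at its top bracket's rate) would instead create sharp jumps at each boundary.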