Sunday, July 7, 2019

Social Science Research Network

I've been into reading random papers from SSRN lately. There's some really good stuff on there, like the paper I mentioned in my last post.


https://hq.ssrn.com/rankings/Ranking_display.cfm?TRN_gID=10

Sunday, June 30, 2019

The law of small numbers

I was listening to a podcast when I heard about an interesting probability result in the same vein as the Monty Hall Problem. The problem is this: flip a coin 100 times and record the results. Now pick random spots in the sequence and check whether the 3 flips starting there are all heads; if so, call it a streak. Repeat until you find a streak of 3. What is the probability that the next (4th) flip is also heads? Is it 50%, as we would expect? It turns out to be closer to 46%, which is not very far from 50%, but is a real, systematic effect.

You can download the paper here, and I recommend reading through the introduction, which is pretty easy to follow. I think it does a good job of explaining what is going on. Since no one will do that, here is a table from the paper that helps give some intuition.


This represents every possible outcome from flipping a coin 3 times and looking for a 'streak' of 1 heads. There are eight possible outcomes, all equally likely. In the first two, a streak of 1 heads either never happens or happens only on the last flip, where there is no following flip to look at. Those are thrown away. In the other six outcomes we get a streak at least once before the last flip. The underlined flips are the candidates: each one immediately follows a streak, so it is the flip we are trying to predict. In three of the six outcomes, the following flip is never heads. In two of them, it is always heads. In the remaining outcome, it could be heads or tails with 50/50 probability, depending on which streak you pick. Averaging over the six outcomes gives (0 + 0 + 0 + 1 + 1 + 0.5) / 6 ≈ 41.7%, well below 50%.

If you list out all the possible outcomes for any combination of streak length and total flips, you can see that some of the heads are 'consumed' by the streaks themselves. Those flips can never be the one following a streak, because they are part of the streak needed to define it. Tails have no such restriction; every one of them is available to land in the position immediately following a streak. There are simply more tails available to fill the candidate position. The effect gets smaller as you decrease the streak length or increase the total number of flips in a set.
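To check the 3-flip table, and the claim that the effect shrinks with shorter streaks or more flips, every sequence can be enumerated exactly when the number of flips is small. The Go sketch below does that; `expectedProportion` is a hypothetical helper name, and, following the table, it averages the per-sequence proportion only over sequences that contain at least one flip following a streak. Because it loops over all 2^n sequences, it is only practical up to roughly n = 20.

```go
package main

import "fmt"

// expectedProportion enumerates all 2^n equally likely sequences of n flips.
// In each sequence it looks at every flip that immediately follows k heads
// in a row and computes the fraction of those flips that are heads.
// Sequences with no such flip are discarded. It returns the average of the
// per-sequence fractions over the sequences that remain.
func expectedProportion(n, k int) float64 {
	sum, kept := 0.0, 0
	for mask := 0; mask < 1<<n; mask++ {
		// Bit i of mask is flip i: 1 = heads, 0 = tails.
		eligible, heads := 0, 0
		run := 0 // heads in a row ending at flip i-1
		for i := 0; i < n; i++ {
			h := mask>>i&1 == 1
			if run >= k { // flip i follows a streak of k heads
				eligible++
				if h {
					heads++
				}
			}
			if h {
				run++
			} else {
				run = 0
			}
		}
		if eligible > 0 {
			sum += float64(heads) / float64(eligible)
			kept++
		}
	}
	return sum / float64(kept)
}

func main() {
	fmt.Printf("n=3,  k=1: %.4f\n", expectedProportion(3, 1))
	fmt.Printf("n=10, k=1: %.4f\n", expectedProportion(10, 1))
	fmt.Printf("n=10, k=3: %.4f\n", expectedProportion(10, 3))
}
```

For 3 flips and a streak of 1 it returns 5/12 ≈ 0.4167, which matches the six outcomes described above: three score 0, two score 1, and one scores 0.5.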

I found this very surprising, so I wanted to test it out. I wrote a Ruby script to simulate various coin flips, look for streaks of different lengths, and output the results. I then rewrote it in a compiled language so it would be faster. I chose Go, since I'd never used it before and was hoping for something with a bit more syntactic sugar than C.

https://github.com/StephenWetzel/coin-flips-go

Here are the results of a bunch of combinations of streak lengths and numbers of flips from the Go program:
Looking for a streak of length  1 in    10 total flips. Performed 10000 rounds, and   9973 were successful, found 45.29% continued the streak.
Looking for a streak of length  1 in   100 total flips. Performed 10000 rounds, and  10000 were successful, found 49.43% continued the streak.
Looking for a streak of length  1 in  1000 total flips. Performed 10000 rounds, and  10000 were successful, found 49.91% continued the streak.
Looking for a streak of length  2 in    10 total flips. Performed 10000 rounds, and   8203 were successful, found 38.16% continued the streak.
Looking for a streak of length  2 in   100 total flips. Performed 10000 rounds, and  10000 were successful, found 47.72% continued the streak.
Looking for a streak of length  2 in  1000 total flips. Performed 10000 rounds, and  10000 were successful, found 50.15% continued the streak.
Looking for a streak of length  3 in    10 total flips. Performed 10000 rounds, and   4797 were successful, found 34.88% continued the streak.
Looking for a streak of length  3 in   100 total flips. Performed 10000 rounds, and   9995 were successful, found 45.84% continued the streak.
Looking for a streak of length  3 in  1000 total flips. Performed 10000 rounds, and  10000 were successful, found 49.78% continued the streak.
Looking for a streak of length  4 in    10 total flips. Performed 10000 rounds, and   2152 were successful, found 35.83% continued the streak.
Looking for a streak of length  4 in   100 total flips. Performed 10000 rounds, and   9637 were successful, found 40.61% continued the streak.
Looking for a streak of length  4 in  1000 total flips. Performed 10000 rounds, and  10000 were successful, found 49.21% continued the streak.
Looking for a streak of length  5 in    10 total flips. Performed 10000 rounds, and    985 were successful, found 37.36% continued the streak.
Looking for a streak of length  5 in   100 total flips. Performed 10000 rounds, and   7860 were successful, found 38.66% continued the streak.
Looking for a streak of length  5 in  1000 total flips. Performed 10000 rounds, and  10000 were successful, found 48.91% continued the streak.
Looking for a streak of length  6 in    10 total flips. Performed 10000 rounds, and    388 were successful, found 35.82% continued the streak.
Looking for a streak of length  6 in   100 total flips. Performed 10000 rounds, and   5190 were successful, found 35.24% continued the streak.
Looking for a streak of length  6 in  1000 total flips. Performed 10000 rounds, and   9996 were successful, found 46.68% continued the streak.
Looking for a streak of length  7 in    10 total flips. Performed 10000 rounds, and    140 were successful, found 40.71% continued the streak.
Looking for a streak of length  7 in   100 total flips. Performed 10000 rounds, and   2997 were successful, found 33.83% continued the streak.
Looking for a streak of length  7 in  1000 total flips. Performed 10000 rounds, and   9761 were successful, found 42.40% continued the streak.
Looking for a streak of length  8 in    10 total flips. Performed 10000 rounds, and     52 were successful, found 36.54% continued the streak.
Looking for a streak of length  8 in   100 total flips. Performed 10000 rounds, and   1634 were successful, found 33.60% continued the streak.
Looking for a streak of length  8 in  1000 total flips. Performed 10000 rounds, and   8365 were successful, found 38.27% continued the streak.
Looking for a streak of length  9 in    10 total flips. Performed 10000 rounds, and     17 were successful, found 47.06% continued the streak.
Looking for a streak of length  9 in   100 total flips. Performed 10000 rounds, and    784 were successful, found 33.04% continued the streak.
Looking for a streak of length  9 in  1000 total flips. Performed 10000 rounds, and   6037 were successful, found 35.80% continued the streak.
Looking for a streak of length 10 in    10 total flips. Performed 10000 rounds, and      0 were successful, found NaN% continued the streak.
Looking for a streak of length 10 in   100 total flips. Performed 10000 rounds, and    381 were successful, found 30.71% continued the streak.
Looking for a streak of length 10 in  1000 total flips. Performed 10000 rounds, and   3615 were successful, found 33.91% continued the streak.

Tuesday, April 30, 2019

Should You Time The Market?

https://ofdollarsanddata.com/even-god-couldnt-beat-dollar-cost-averaging/
You have 2 investment strategies to choose from.
  1. Dollar-cost averaging (DCA):  You invest $100 (inflation-adjusted) every month for all 40 years.
  2. Buy the Dip: You save $100 (inflation-adjusted) each month and only buy when the market is in a dip.  A “dip” is defined as anytime when the market is not at an all-time high.  But, I am going to make this second strategy even better.  Not only will you buy the dip, but I am going to make you omniscient (i.e. “God”) about when you buy.  You will know exactly when the market is at the absolute bottom between any two all-time highs.  This will ensure that when you do buy the dip, it is always at the lowest possible price.
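The two strategies can be sketched in Go to make the rules concrete. This is a toy, not the linked article's analysis, which uses historical market data: the price series here is made up, a price equal to the previous high counts as an all-time high, and cash saved after the last completed dip stays uninvested. Those choices, and the function names, are my assumptions.

```go
package main

import "fmt"

// dca invests `monthly` dollars at every month's price and returns the
// portfolio value at the final month's price.
func dca(prices []float64, monthly float64) float64 {
	shares := 0.0
	for _, p := range prices {
		shares += monthly / p
	}
	return shares * prices[len(prices)-1]
}

// buyTheDip saves `monthly` dollars every month. For each dip (a run of
// months below the running all-time high) that ends in a new all-time high,
// it spends all saved cash at the dip's lowest price, which requires perfect
// hindsight. Cash that is never spent is counted at face value.
func buyTheDip(prices []float64, monthly float64) float64 {
	n := len(prices)

	// Mark months at an all-time high (the first month counts).
	isATH := make([]bool, n)
	high := 0.0
	for i, p := range prices {
		if i == 0 || p >= high {
			isATH[i] = true
			high = p
		}
	}

	// Mark the lowest month inside each completed dip.
	buyAt := make([]bool, n)
	for i := 0; i < n; {
		if isATH[i] {
			i++
			continue
		}
		low, j := i, i
		for j < n && !isATH[j] {
			if prices[j] < prices[low] {
				low = j
			}
			j++
		}
		if j < n { // the dip ended with a new all-time high
			buyAt[low] = true
		}
		i = j
	}

	// Save every month; spend everything at each marked bottom.
	cash, shares := 0.0, 0.0
	for i, p := range prices {
		cash += monthly
		if buyAt[i] {
			shares += cash / p
			cash = 0
		}
	}
	return shares*prices[n-1] + cash
}

func main() {
	// Hypothetical toy prices, not historical market data.
	prices := []float64{100, 50, 125, 200}
	fmt.Printf("DCA:         %.2f\n", dca(prices, 100))
	fmt.Printf("Buy the dip: %.2f\n", buyTheDip(prices, 100))
}
```

On the toy series in `main`, buying the single deep dip at 50 comes out ahead. On a series that only rises, such as {100, 110, 120}, `buyTheDip` never buys and ends with $300 in cash while `dca` ends with about $329. Which strategy wins depends entirely on the price path, so the comparison is only meaningful on real historical data.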


Making a DIY smartwatch

https://imgur.com/a/FSBwD3g


Friday, March 15, 2019

Everything Smarthome

This is a long but enjoyable article, in broken Russian-English, about everything smarthome in 2019.

https://vas3k.com/blog/dumbass_home/

Wednesday, February 27, 2019

Password strength

Dropbox has a password strength estimator called zxcvbn that I like a lot. It estimates the entropy of your password by looking for matches against dictionaries and leaked-password lists. It has long bothered me when sites estimate password strength purely based on complexity. These sites say a password like Password!1 is much more secure than one like zbuwcramudbpvreorkno (scores of 72% vs 21%, respectively). I discuss this in more detail in my How to be secure online post.

However, a while ago Dropbox changed their algorithm to favor length over resistance to dictionary attacks. There is some logic in their decision, but I really feel like something is lost without the old algorithm. So I made a demo comparing the two, so you can find passwords both algorithms agree are strong. At the same time, I finally hooked up this domain, which I bought a while ago, to my GitHub Pages site.

Thursday, January 31, 2019

Time


Tuesday, December 25, 2018

How to Be Secure Online: The Blog Post

I've read a lot recently about some new types of attacks I wasn't aware of before. Most of these can be defended against pretty easily; it's just a matter of knowing the threats. I wanted to summarize some of the things everyone should be doing at this point, but most people aren't.

Use a password manager

At this point, you really should be using a password manager. You have to assume some of the sites you use will be breached in any given year, and when they are, the username and password you used there will be tried on other popular sites. The only way to be safe is to use a different random password for every site. There is no way you can memorize random passwords for every site, even if you limit yourself to the sites whose security you actually care about.

However, security isn't the only benefit of a password manager; it is also much more convenient. You can memorize one really good random password, with no restrictions on maximum length or allowed characters, and then use random passwords on every site. You'll never have to worry about password complexity restrictions, or about being forced to change your password again. Just generate a new 30-character random password and let the password manager keep track of it.
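Generating such a password takes only a few lines with a cryptographically secure random source. This Go sketch draws 30 characters uniformly from the 94 visible ASCII characters using `crypto/rand`; the helper names are hypothetical, and password managers ship their own generators with options for length and character sets.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
)

// visibleASCII returns the 94 printable ASCII characters from '!' to '~'.
// Space is left out because some sites reject it.
func visibleASCII() string {
	b := make([]byte, 0, 94)
	for c := byte('!'); c <= '~'; c++ {
		b = append(b, c)
	}
	return string(b)
}

// randomPassword picks each character independently and uniformly from
// alphabet using the operating system's secure random number generator.
// rand.Int returns a uniform value in [0, limit), so there is no modulo bias.
func randomPassword(length int, alphabet string) (string, error) {
	limit := big.NewInt(int64(len(alphabet)))
	out := make([]byte, length)
	for i := range out {
		n, err := rand.Int(rand.Reader, limit)
		if err != nil {
			return "", err
		}
		out[i] = alphabet[n.Int64()]
	}
	return string(out), nil
}

func main() {
	pw, err := randomPassword(30, visibleASCII())
	if err != nil {
		panic(err)
	}
	fmt.Println(pw)
}
```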

I wrote about password managers in more detail here.  If you just want the easiest path, then LastPass will work fine.  I use KeepassXC which is open source and offline.  You have to copy the password file between computers and phones yourself, using something like Dropbox, or the open source Syncthing.


Use a long password

You should only need one or two passwords, if you are using a password manager, so you can make them very strong.  You should make your password very long, and not worry about complexity too much.

I've always been bothered by password strength estimators that score you based on complexity. A classic example of a bad password estimator is http://www.passwordmeter.com/

If I generate a random 20-character password that consists of only lowercase letters, like xznmjetjsciqukhspaxv, passwordmeter.com gives it a score of 21% (weak). A 6-character random password like z&*4uV gets a score of 64% (strong), merely because it has lowercase, uppercase, digits, and special characters. Tacking on 2 more characters, z&*4uV.9, gets you to 100% (very strong). While that is an OK password, the 20-character one is much, much stronger, despite being all lowercase. Even if the attacker knew that your password was all lowercase, there would still be 26^20, or about 2 × 10^28, possibilities. Trying every possible 6-character password, even with all 95 printable keyboard characters, is only 95^6, or about 7 × 10^11, possibilities. That makes the 20-character password roughly 2.7 × 10^16 times (about 27 quadrillion times) harder to brute-force than the 6-character one. Even the 8-character one (95^8, about 7 × 10^15) is about three trillion times weaker than the 20-character one.

Luckily, people are starting to wise up to how useless tricks like replacing o with 0 are. NIST has updated password guidelines that are a great summary of what restrictions password systems should have. Password estimators like the one above used to be much more common, and even major companies used them. A long time ago I made my own password estimator, which attempted to replace common dictionary words and then figure out the number of possible combinations. However, Dropbox has a far better version of that idea called zxcvbn, named for the bottom row of letters on a keyboard. Using zxcvbn as a password would seem random to many estimators, but it isn't, and attackers were already trying keyboard patterns.

At some point, zxcvbn changed its algorithm for calculating entropy.  I didn't like this change, so I made a page with both the new and old versions of it so you can compare the two.


Don't use SMS for 2 factor authentication

Don't use actual cell phone numbers with a traditional carrier, like Verizon, for 2 factor auth. It is quite easy, and increasingly common, to intercept SMS codes via SIM swapping attacks. All an attacker needs is your phone number: they call your carrier, pretend to be you with a new phone and SIM card, and ask for your number to be ported to the new phone. Then they request a 2 factor auth code, and it goes to their phone instead of yours.

If you are going to use 2 factor auth, you should use a hardware device like a Yubikey or an app like Authy. If the service only supports SMS-based 2 factor auth, use a VoIP number like Google Voice, which can't be easily ported to a new carrier.

The worst part is that using plain SMS for 2 factor auth can make you less secure than no 2 factor auth at all. An attacker trying to social engineer their way into your account is more believable if they can receive the SMS codes being sent than if 2 factor is not turned on. In some cases, services allow you to reset your password using only your SMS phone number, so someone who knows your phone number, but not your password, can reset it and get into your account.


Freeze your credit

After the Equifax data breach, it's safe to assume that if you have a credit history in the US, that history, including your SSN and date of birth, was leaked. To open new accounts, one typically only needs an SSN, DOB, and name. To prove your identity online, you are sometimes asked security questions generated from your credit history (for example, which bank held your car loan in 2015). All those things were leaked.

A credit freeze simply adds a random PIN that will be needed to open new accounts; i.e., any time someone wants to do a hard pull of your credit with one of the reporting agencies, they will require you to lift the freeze using the PIN. Note that you can still use your existing accounts with the freeze in place; only opening new accounts will be blocked. You can temporarily remove a freeze (called thawing) within a few minutes. See here or here for more info on how to freeze your credit.

When freezing your credit, make sure the page uses the word "Freeze". Be careful not to sign up for any sort of credit monitoring or "locking"; those are paid services that are less effective than freezes. The agencies will push those hard, both because they can charge for them and because a freeze restricts them from doing whatever they want with your info. Worse still, if the monitoring is with a third party, they will require your SSN and other info to monitor your credit, giving your info to yet another database that will inevitably be leaked at some point.