Tuesday, December 31, 2019

Predictions for the decade, from 2010

https://news.ycombinator.com/item?id=1025681

This is a good look back at what people thought the 2010s would bring at the start of them.

Wednesday, October 30, 2019

A comparision of AWS S3 Glacier Deep Archive region pricing


I'm considering using S3 for personal backups.  They recently introduced a new tier of storage called "S3 Glacier Deep Archive" which is intended for storing files that you will likely never, or perhaps once need to read.  Every geographic region AWS offers storage in has its own pricing.  I couldn't find a nice table with all the prices compared so I found the price to store 1 TB for 1 year in each region:


Using their tool: https://aws.amazon.com/s3/pricing/

If you're considering this keep in mind there are some important caveats.  First you pay for each request, which means if you're storing 1,000,000 files you will pay $50 just for the requests.  Doesn't matter if each file is 1 MB, or 1 KB, or even 1 byte each, it's $0.50 per 1000 PUT requests.  You will then also pay storage fees every month on top of that.  As far as I can tell, you don't pay for the bandwidth to upload the files.

Retrieving the files has more caveats.  First you need to pick a speed, standard or bulk.  Standard takes up to 12 hours, and bulk is up to 48 hours.  Standard also costs about 10x as much as bulk.  And here you pay for the individual requests, the data retrieved, and (I believe) bandwidth to download from S3.

So if you're storing many smallish files (documents) you're probably much better off combing them all into a single zip file, to reduce the number of requests you have to do.  On the other hand if you're storing large files (videos), you'd probably be better off leaving them on their own so that ideally you just need to recover one or two, and then don't have to pay for the bandwidth to download them all.


I made this table to compare some scenarios.  The first 3 rows shows the costs to retrieve 1 TB split across either 1, 1024, or1048576 file.  The less file scenarios are cheaper, but not by a ton, and keep in mind if you only needed a few of those files it'd be much cheaper to just grab those individual files if they weren't zipped together.

The bottom 2 rows shows the cost to get 1 GB of files, either as 1 file or 1024 files.  Here the cost is negligible, pretty much however you store and access it.

So it seems in any case the bandwidth is the biggest cost.  Still, since you generally only pay for bandwidth out of S3 and not in to it, you should never really have to pay this, unless you're recovering from a pretty major disaster.  There is also the option to use AWS Snowball, where they will mail you a physical drive which you keep for up to 10 days then mail back.  That works out to be $200 + $0.03 per GB vs just $0.09 per GB for bandwidth.  So you need to be transferring 10s of TBs before it makes sense.

Wednesday, August 14, 2019

nandgame.com

Build a computer out of NAND gates in stages.  This is essentially a game version of my post about how computers work.

http://nandgame.com/

Sunday, July 7, 2019

Social Science Research Network

 I've been into reading random papers from SSRN lately.  There's some really good stuff on there, like the paper I mentioned in my last post.


https://hq.ssrn.com/rankings/Ranking_display.cfm?TRN_gID=10

Sunday, June 30, 2019

The law of small numbers

I was listening to a podcast when I heard about an interesting probability result in the same vein as the Monty Hall Problem.  The new problem is this: Flip a coin 100 times and record the results.  Now pick random flips in the set and see if the next 3 flips are all heads; if so we call this a streak.  Repeat until you find a streak of 3.  Now what is the probability that the 4th flip is also heads?  Is it 50% like we would expect?  It turns out to be closer to 46%, which is not very far from 50%, but is also a clear trend.

You can download the paper here, and I recommend you read through the introduction, which is pretty easy to follow.  I think does a good job of explaining what is going on.  Since no one will do that, here is a table from the paper which helps give some intuition.


This represents every possible outcome from flipping a coin 3 times and looking for a 'streak' of 1 heads.  There are eight total possible outcomes, all equally likely.   In the first two, the streak of 1 heads never happens, or happens on the last flip where there is no following flip to look at.  Those are thrown away and ignored.  In the other six possible outcomes we do get a streak, at least once, and earlier than the last flip.  The underlined flips represent the possible candidates for the flip that is following a streak.  If we pick the preceding streak, then the underlined flips will be the one we are trying to predict.  In three out of the six outcomes with a streak, the following flip will not be heads.  In two out of the six outcomes the following flip will always be heads.  And in the remaining possible outcome it could be either head or tails with 50/50 probability depending on which streak you pick.

If you list out all the possible outcomes from any combination of streak length and total flips, you can see that some number of the heads flips are 'consumed' by the streaks themselves.  Those flips can never be following a streak, because they are part of the streak needed to define the streak.  On the other hand, the tails have no restrictions, they are all available to occur in the flip immediately following a streak.  There are simply more tails available to go in the candidate position.  The effect gets smaller as you decrease the streak length or increase the total number of flips in a set.

I found this very surprising, so I wanted to test it out.  I wrote a Ruby script to simulate various coin flips and look for streaks of different lengths, and output the results.  I then decided to rewrite it in a compiled language so it would be faster.  I decided to try out Go, as I've never used it before and I was hoping for something with a bit more syntactic sugar than C.

https://github.com/StephenWetzel/coin-flips-go

Here are the results of a bunch of combinations of streak lengths and numbers of flips from the Go program:
Looking for a streak of length  1 in    10 total flips. Performed 10000 rounds, and   9973 were successful, found 45.29% continued the streak.
Looking for a streak of length  1 in   100 total flips. Performed 10000 rounds, and  10000 were successful, found 49.43% continued the streak.
Looking for a streak of length  1 in  1000 total flips. Performed 10000 rounds, and  10000 were successful, found 49.91% continued the streak.
Looking for a streak of length  2 in    10 total flips. Performed 10000 rounds, and   8203 were successful, found 38.16% continued the streak.
Looking for a streak of length  2 in   100 total flips. Performed 10000 rounds, and  10000 were successful, found 47.72% continued the streak.
Looking for a streak of length  2 in  1000 total flips. Performed 10000 rounds, and  10000 were successful, found 50.15% continued the streak.
Looking for a streak of length  3 in    10 total flips. Performed 10000 rounds, and   4797 were successful, found 34.88% continued the streak.
Looking for a streak of length  3 in   100 total flips. Performed 10000 rounds, and   9995 were successful, found 45.84% continued the streak.
Looking for a streak of length  3 in  1000 total flips. Performed 10000 rounds, and  10000 were successful, found 49.78% continued the streak.
Looking for a streak of length  4 in    10 total flips. Performed 10000 rounds, and   2152 were successful, found 35.83% continued the streak.
Looking for a streak of length  4 in   100 total flips. Performed 10000 rounds, and   9637 were successful, found 40.61% continued the streak.
Looking for a streak of length  4 in  1000 total flips. Performed 10000 rounds, and  10000 were successful, found 49.21% continued the streak.
Looking for a streak of length  5 in    10 total flips. Performed 10000 rounds, and    985 were successful, found 37.36% continued the streak.
Looking for a streak of length  5 in   100 total flips. Performed 10000 rounds, and   7860 were successful, found 38.66% continued the streak.
Looking for a streak of length  5 in  1000 total flips. Performed 10000 rounds, and  10000 were successful, found 48.91% continued the streak.
Looking for a streak of length  6 in    10 total flips. Performed 10000 rounds, and    388 were successful, found 35.82% continued the streak.
Looking for a streak of length  6 in   100 total flips. Performed 10000 rounds, and   5190 were successful, found 35.24% continued the streak.
Looking for a streak of length  6 in  1000 total flips. Performed 10000 rounds, and   9996 were successful, found 46.68% continued the streak.
Looking for a streak of length  7 in    10 total flips. Performed 10000 rounds, and    140 were successful, found 40.71% continued the streak.
Looking for a streak of length  7 in   100 total flips. Performed 10000 rounds, and   2997 were successful, found 33.83% continued the streak.
Looking for a streak of length  7 in  1000 total flips. Performed 10000 rounds, and   9761 were successful, found 42.40% continued the streak.
Looking for a streak of length  8 in    10 total flips. Performed 10000 rounds, and     52 were successful, found 36.54% continued the streak.
Looking for a streak of length  8 in   100 total flips. Performed 10000 rounds, and   1634 were successful, found 33.60% continued the streak.
Looking for a streak of length  8 in  1000 total flips. Performed 10000 rounds, and   8365 were successful, found 38.27% continued the streak.
Looking for a streak of length  9 in    10 total flips. Performed 10000 rounds, and     17 were successful, found 47.06% continued the streak.
Looking for a streak of length  9 in   100 total flips. Performed 10000 rounds, and    784 were successful, found 33.04% continued the streak.
Looking for a streak of length  9 in  1000 total flips. Performed 10000 rounds, and   6037 were successful, found 35.80% continued the streak.
Looking for a streak of length 10 in    10 total flips. Performed 10000 rounds, and      0 were successful, found NaN% continued the streak.
Looking for a streak of length 10 in   100 total flips. Performed 10000 rounds, and    381 were successful, found 30.71% continued the streak.
Looking for a streak of length 10 in  1000 total flips. Performed 10000 rounds, and   3615 were successful, found 33.91% continued the streak.

Tuesday, April 30, 2019

Should You Time The Market?

https://ofdollarsanddata.com/even-god-couldnt-beat-dollar-cost-averaging/
You have 2 investment strategies to choose from.
  1. Dollar-cost averaging (DCA):  You invest $100 (inflation-adjusted) every month for all 40 years.
  2. Buy the Dip: You save $100 (inflation-adjusted) each month and only buy when the market is in a dip.  A “dip” is defined as anytime when the market is not at an all-time high.  But, I am going to make this second strategy even better.  Not only will you buy the dip, but I am going to make you omniscient (i.e. “God”) about when you buy.  You will know exactly when the market is at the absolute bottom between any two all-time highs.  This will ensure that when you do buy the dip, it is always at the lowest possible price.


Friday, March 15, 2019

Everything Smarthome

This is a long, but enjoyable article in broken Russian-English about everything smarthome in 2019.

https://vas3k.com/blog/dumbass_home/

Wednesday, February 27, 2019

Password strength

Dropbox has a password strength estimator called zxcvbn that I like a lot.  It estimates entropy in your password by looking for dictionary or password list leak matches.  It's long bothered me when sites estimate password strength purely based on complexity.  These sites say a password like Password!1 is much more secure than one like zbuwcramudbpvreorkno (a score of 72% vs 21% respectively).  I discuss this in more detail in my How to be secure online post.

However, a while ago Dropbox changed their algorithm to favor length over resistance to dictionary attacks.  There is some logic in their decision, but I really feel like something is lost by not having the old algorithm.  So, I made a demo comparing the two so you can find passwords both algorithms agree are strong.  At the same time, I finally hooked up this domain I bought a while ago to my github pages site.

Thursday, January 31, 2019