Tuesday, March 12, 2024

Inside North Korea's Forced-Labor Program

https://www.newyorker.com/magazine/2024/03/04/inside-north-koreas-forced-labor-program-in-china

In late 2023, an investigator hired by my team visited a Chinese plant called Donggang Xinxin Foodstuff. He found hundreds of North Korean women working under a red banner that read, in Korean, “Let’s carry out the resolution of the 8th Congress of the Workers’ Party.” Soon afterward, the investigator visited a nearby plant called Donggang Haimeng Foodstuff, and found a North Korean manager sitting at a wooden desk with two miniature flags, one Chinese and one North Korean. The walls around the desk were mostly bare except for two portraits of the past North Korean leaders Kim Il Sung and Kim Jong Il. The manager took our investigator to the workers’ cafeteria to eat a North Korean cold-noodle dish called naengmyeon, and then gave him a tour of the processing floor. Several hundred North Korean women dressed in red uniforms, plastic aprons, and white rubber boots stood shoulder to shoulder at long metal tables under harsh lights, hunched over plastic baskets of seafood, slicing and sorting products by hand. “They work hard,” the manager said. The factory has exported thousands of tons of fish to companies that supply major U.S. retailers, including Walmart and ShopRite.

Thursday, January 18, 2024

Let´s play money making game.

I've recently been playing through the NES Legend of Zelda; a game I played a lot as a kid.  In the preinternet era I somehow spent enough time in this game to know every secret, every hidden door, and could beat the game in a single play through.

 



However, one aspect never really spent much time on was the "money making game" aka, the basic gambling game where you choose one of three options and randomly either won or lost rupees.  I didn't spend much time playing it as a kid, because I had the distinct memory that it wasn't a good choice, ie, it had a negative expected value (I'm not sure I would have phrased it quite like that as an 8 year old, but I digress).  Another side note, while I didn't play it much, the idea of a random gambling minigame inside an unrelated game always stuck with me, and directly sparked the inclusion of a similar concept in the game pirate2, which you're undoubtedly familiar with.

Anyway, this is turning into a long post which is just a couple links, but I was wondering if the MMG was in fact a bad value or if there was any secrets there I didn't know about.  So, I started search for the answer, and came across these two very labor intensive analyses of the game, which come from totally different angles.

The first is a look at the assembly code of the game itself to see how the RNG worked, and what exactly the distribution of negative and positive payoffs was.

The second was just a guy who played the game 500 times and kept track of the outcomes.

They both reached the same conclusion: The right rupee is the worst choice and the middle rupee is the best choice.  In fact, the middle rupee does have a positive expected value, and so you can expect to make money playing it, in the long run.

Friday, December 22, 2023

Credit card debt collection

https://www.bitsaboutmoney.com/archive/the-waste-stream-of-consumer-finance/

This was one of those articles I had bookmarked for months, and when I finally started reading it seemed like a lot of stuff I already knew.  But, it got pretty interesting towards the end.  I'd recommend reading the whole thing, but I'll quote some longer parts I found interesting here.

The rights of debtors are observed by both primary lenders and eventual debt buyers mostly in the breach. One of those rights is to a written “debt verification”, with specified information in it, and (surprisingly, if you haven’t worked in this field) despite that being the law many debts are sold in such a fashion that the buyer couldn’t produce a responsive verification even if they wanted to. That isn’t even a political claim; it’s just the engineering reality of which columns are in their CSV file.

The former advocate in me will observe that the single most effective method for resolving debts is carefully sending a series of letters invoking one’s rights under the FDCPA (and other legislation) to a debt collector who is operationally incapable of respecting those rights, then threatening them with legal or regulatory action when they inevitably infringe upon them in writing, leading to them abandoning further attempts at collection.

This effectively makes paying consumer debts basically optional in the United States, contingent on one being sufficiently organized and informed. That is likely a surprising result to many people. Is the financial industry unaware of this? Oh no. Issuing consumer debt is an enormously profitable business. The vast majority of consumers, including those with the socioeconomic wherewithal to walk away from their debts, feel themselves morally bound and pay as agreed.

Why are debt collectors so bad at debt collection? Partially it is because credit card issuers are large national institutions with large, automated processes sitting atop a legacy of corporate acquisitions, IT migrations, and similar that makes availability of non-critical information extremely fragmentary. They then want to dump that complexity through a very small pipe (CSV files) onto the debt collection industry.

And this

The FDCPA and state legislation provides for automatic damages for illegal behavior from collectors, the incidence of illegal behavior is extremely high, and a debt collector with a high school education and three months of experience will frequently commit three federal torts in a few minutes of talking to a debtor then follow up with a confirmation of the same in writing. (You think I am exaggerating. Reader, I am not. “If you don’t pay me I will sue you and then Immigration will take notice of that and yank your green card” contains three separate causes of action: (frequently) a false threat to file a suit where that is not actually a business practice of the firm, a false alleged affiliation with a government agency, and a false alleged consequence for debt nonpayment not provided for in law.)

As a result, private companies compiled databases of (public in the U.S.) court filings and organized them by Social Security number, address, and similar to allow debt collectors to identify which debtors are aware of their legal rights. In principle, a debt collector could do anything they wanted with that fact, like being extra careful to follow the law in contacting them. But the economics of debt collection do not counsel careful, individualized consideration of credit card debt.

I will bet you that, in practice, they simply avoid collecting against anyone who demonstrates ability and financial resources to enforce their rights. This is one for the history books of borked equilibriums. We devoted substantial efforts to pro-consumer legislation to address abuse of (mostly) poor people. We gated redress behind labor that is abundantly available in the professional managerial class and scarce outside of it, like writing letters and counting to 30 days. (People telling me they were incapable of doing these two things is why I started ghostwriting letters for debtors.) We now have literal computer programs exempting heuristically identified professional managerial class members from debt collection, inclusive of their legitimate debts, so that debt collectors can more profitably conserve their time to do abusive and frequently illegal shakedowns of the people the legislation was meant to benefit.

 

 

Friday, December 15, 2023

The Business of Theme Parks, How Much Money Do They Make?

Despite our extended rambling, we’d like to think that you can leave with some concrete implications.  In summary, here are some of the major ones.

  • Mega theme parks such as Disney and Universal find it difficult to be profitable in the first few years after opening, and have a very low project ROI.  But it doesn’t matter.
  • Don’t seek to compete with Disney or Universal, unless you are them, or you can line up tremendous sources of financing.  Look for creative sources of financing.
  • For investors, indoor parks are promising because they generate enough in sales productivity to be appropriate for urban retail environments.
  • Regional and superregional parks are appropriate when land prices are extremely low, and the market does not have similar types of developments.  They can be built relatively cheaply.  And once built, these parks can become the landmark attraction of a market for decades.
  • Waterparks are often a more efficient alternative to regional parks, in that they occupy less space but have a similar return profile.
  • It’s important to define the return measure for attractions.  Very often, the partner/developer/government cares less about financial returns and more about employment, tourism, and GDP impacts.  Governments can be a good partner.  On the other hand, don’t benchmark these kinds of developments if you’re a purely private operation.


https://www.theparkdb.com/blog/the-business-of-theme-parks-part-i-how-much-money-do-they-make/


https://www.theparkdb.com/blog/the-business-of-theme-parks-part-ii-how-much-do-they-cost-and-earn/

 

Tuesday, November 28, 2023

The McDonald's theory of why everyone thinks the economy sucks

https://www.natesilver.net/p/the-mcdonalds-theory-of-why-everyone

The point is simply this: it’s very easy to spend a lot more these days on fast food in ways that don’t necessarily show up in inflation data. Three years ago, I might have walked down the block and ordered a Super Duper Double Cheeseburger, fries and a Diet Coke for $9.67 before tax. Now, because Uber Eats and the restaurant have correctly determined that I’m lazy and they can price-discriminate against me and I fell for their viral marketing campaign, I’ll have them deliver me a Triple Super Duper Cheeseburger Deluxe with grilled onions, plus fries and a Diet Coke — at a price of $24.25 before tax.

And don’t think this experience — or the Idaho man’s experience — is atypical. Getting fast food delivered is pretty damned expensive. In-store, fast food can be expensive too if you start messing with upgrades, add-ons and specialty items. This is why McDonald’s revenues are still rocketing up even as inflation has slowed down.

Thursday, September 28, 2023

An analysis of radio song play frequency for 102.9 WMGK, in 2023

Background

The other day I happened to be reading through my post about how often the classic rock station in Philly, WMGK, plays different songs.  I noticed that that post was made 10 years ago, on August 26, 2013.  That made me wonder how things may have changed since then and how much work it would be to scrape their recently played page again to collect the data.  I was sure my old code wouldn't work, but figured I'd take a look at the page and see how hard it'd be to write something new.

It didn't take long to realize this was the one gift every scraper lives in hope of: Their front end was just hitting an open API to get the data.  The network tools showed the request:
https://nowplaying.bbgi.com/WMGKFM/list?limit=200&offset=0

Which returns a nicely formatted JSON list of the 200 most recently played songs.  That would make it trivial to collect this data, but it got even better.  If you noticed the offset parameter there, this was pretty much just letting me get data directly out of their database.  I tested it, and sure enough, I could just page through the data to go back as far as I tested (a few months).  This meant I could get the full 60 days worth of data, for the exact same time period as I used in 2013, instantly.

I whipped up some code real quick to do the scraping, and store the data in a basic SQLite database.  I want to mention here that I'm increasingly a fan of using SQLite for storing data for these types of quick projects.  Don't get me wrong, I still wished I had a real database a dozen times when writing queries in this project, and wouldn't use SQLite for an actual application, but the alternative here is CSV files, not a full Postgres database.  And when compared to CSV files, there's no contest.  Being able to write SQL, and being able to decide to expand the scope and add a few more tables later on, while still having those relations represented is very nice.  And from the other side, using SQLite means you still have a file you can easily share like a CSV file.  With the expected audience of this post, I probably don't have to spend any more time arguing that databases are good, so I'll stop here.

I also want to say that while I was doing the scraping, I was very paranoid they would realize what I was doing and cut me off at any moment.  That is, until I realized there was no way anyone in the world cared about this data besides me, and there was no one watching a dashboard live seeing the spike in queries and immediately launching into action to prevent the scraping of their priceless WMGK recently played data.


So what does WMGK play?

With that all out of the way, how does WMGK in 2023 compare with WMGK in 2013?  I was able to look at the exact same date range as in 2013, June 26th to August 24th, or 60 days.

In my previous post I compared what WMGK played with what the most popular songs by those bands were on Last.fm a lot.  I'm not doing any of that here.  You can refer to the graphs from that post if you want to see what the most popular songs by bands are, but here I'm focused on how WMGK has changed in 10 years.

I would assume that they are playing more recent bands, and perhaps have retired some of the older bands they used to play.  Philly still has a Rock station in WMMR though, so I would think their might be a more older bias still on the Classic Rock station.  Also WMGK (and WMMR) were bought in 2016 by the Beasley Media Group, which I assume is ran by Pam from The Office.  Might the new owners prefer more variety?  Conglomerates buying and consolidating industries usually results in a better product, right?

Coverage

To review, in 2013 WMGK played 14,286 songs during the 60 day period, of which 924 were unique.  But of those unique songs, there was a clear trend where a few dozen got played daily or every other day, and then they occasionally threw in a "deep" cut (which likely wasn't that deep).  The best summary of that fact, was that just 172 songs represented half of those 14,286 plays over 60 days.

In 2023 WMGK played 15,190 songs during the same 60 day period (6/26/2023 to 8/24/2023), of which 1060 were unique.  So far so good, that is better than last time, albeit not much.  Really the question is how distributed are those songs.  Is it the same as last time, where the same handful of songs get played constantly?  In a word, yes.

Here's the coverage breakdown.

2013
2023
Coverage Songs
Coverage Songs
10.04% 28
10.30% 31
25.27% 77
25.02% 79
50.23% 172
50.03% 168
75.13% 279
75.10% 275
90.03% 373
90.01% 381
100.00% 924
100.00% 1060

To review what I mean by "coverage" this is the number of songs you'd need to represent x% of plays.  In other words, there's a 25% chance that any random song you heard on WMGK was one of the top 79 songs they play.  If anything, it's striking how similar those numbers are.  The only noticeable difference is a slightly longer tail of songs they played once.

Top Songs

So, things aren't looking good for 2023 WMGK, but let's look closer.  What were the top songs played in 2023 vs 2013?  The first table is sorted by top songs in 2013, and the second is top songs in 2023.

Plays/30 days (2013) Band Song Plays/30 days (2023)
27.5 Warren Zevon Werewolves Of London 0
27 Cars Just What I Needed 26.5
27 Blue Oyster Cult Burnin' For You 22
27 Steve Miller Band Rock 'n Me 26
26.5 Supertramp The Logical Song 23.5
26.5 David Bowie Changes 18
26.5 Pink Floyd Another Brick In The Wall 21.5
26.5 Electric Light Orchestra Do Ya 0
26 J. Geils Band Centerfold 25.5
26 War Low Rider 25.5




Plays/30 days (2013) Band Song Plays/30 days (2023)
0 Survivor Eye Of The Tiger 27.5
0 Outfield Your Love 27.5
4.5 Allman Brothers Band Ramblin' Man 26.5
27 Cars Just What I Needed 26.5
1 Eddie Money Take Me Home Tonight (Be My Baby) 26.5
14 Pete Townshend Let My Love Open The Door 26
27 Steve Miller Band Rock 'n Me 26
16.5 Fleetwood Mac Go Your Own Way 25.5
22 Guess Who American Woman 25.5
26 J. Geils Band Centerfold 25.5

Obviously, I expected changes, but it is surprising how many songs went from 0 plays in 2013 to top 10 song in 2023 or vice versa.  While I don't have anything against Eye of the Tiger, it doesn't strike me as a particularly "Classic Rock" song.  Admittedly, there's some association with Philly due to Rocky, but still, it's an odd top song.

However, it's time to discuss the biggest travesty I discovered while looking through the 2023 data.  You may have noticed that Warren Zevon's Werewolves of London went from the #1 song, with 27.5 average plays in 30 days, down to 0 plays across 60 days.  That is bad enough, however, one of the new bands to be played in 2023 was Kid Rock.  He had a single play for a song called "All Summer Long", which I decided to listen to to see if it qualified as Classic Rock.  Well I'm guessing most people reading this are ahead of me on the punchline, but that song is the one were he "samples" the Werewolves of London intro.

So, to review: If you were listening to 102.9 WMGK, the Classic Rock station from June 26th 2023 to August 24th 2023, and you heard the Werewolves of London intro, we can say with absolute certainty that what you were actually hearing was Kid Rock.  Our ancestors weep.

I'm hesitant to even link to the Kid Rock song, but if you aren't familiar with it, here it is.  After you watch 5 seconds of that, please switch to the Werewolves of London video, and tell me that isn't an objectively better video.

Top Bands

Well, I don't even know if there's any point in going on, but for what it's worth, here's the top bands.

Plays (2013) Band Plays (2023) Change
356.5 Rolling Stones 251.5 -105
334.5 Led Zeppelin 252 -82.5
283 Beatles 69.5 -213.5
183 Pink Floyd 147.5 -35.5
177.5 Who 82 -95.5
169.5 Van Halen 130 -39.5
164 Queen 135.5 -28.5
154.5 Journey 161.5 7
143 Cars 71.5 -71.5
138.5 Billy Joel 178 39.5








Plays (2013) Band Plays (2023) Change
334.5 Led Zeppelin 252 -82.5
356.5 Rolling Stones 251.5 -105
138.5 Billy Joel 178 39.5
105 Elton John 168.5 63.5
105.5 Aerosmith 167 61.5
154.5 Journey 161.5 7
135.5 Tom Petty & The Heartbreakers 160 24.5
25.5 U2 148 122.5
183 Pink Floyd 147.5 -35.5
77.5 Fleetwood Mac 147 69.5

More similar than the top songs.  Biggest changes are the dropping of Beatles and the rise of U2.  Sorry, I can't think straight after that Kid Rock/Werewolves of London debacle.  But let's perservere

Here's a graph of the top 50 bands.

And here's the 2013 version, for comparison.

Chart is pretty similar, although the top bands in 2023 are only getting 250 plays/month vs 350 for in 2013.  The rest of the top tier bands got 150 plays/month in both 2013 and 2023.

Plays per hour of day

Here's a quick plays per hour chart.

And then the 2023 version.

Similar, but less of a clear pattern.

Top songs by band

This is the part of the post where I looked at a bunch of individual bands last time.  I'm not really going to do that here, we already know MGK doesn't play a variety from any band, they just pick 2 or 3 hits and only play those.  But I will post one comparison for one band.

That band is Yes, who admittedly are maybe not the best radio band, but they absolutely have plenty of Classic Rock hits.  In 2013 MGK played I've Seen All Good People 22 times/month, and then occasional plays from "deep cuts" like Roundabout and a few others.

In 2023 MGK played only 2 Yes songs, one was Roundabout with 1.5 plays per month (down from 3.5 in 2013).  The other got 25.5 average plays in 30 days, and that song was Owner of a Lonely Heart.

Now I like Owner of a Lonely Heart, just like I like every Yes song, but it is not the first song that comes to mind when I think of Classic Rock.  At this point, I just want to say that I did go to WMGK's site and confirm that they still refer to themselves as a "Classic Rock" station, which they do.  I just can't fathom the decision making here.  I'd like to present this dramatization of how that decision may have been made.

"Ok, we need our single song for our Yes allotment for our Classic Rock station that we can play every single day"

"What if instead of playing the same Yes song every day we just choose from a handful of the best Yes songs?"

"You're fired, who's got my pick?"

"What era do people most associate with both 'classic rock' and the band Yes?  The 80s right? So let's play the most 80s Yes song, Owner of a Lonely Heart"

"Perfect, let's go to lunch" 

Changes in band makeup

Another thing I wanted to look at was what bands from 2013 were no longer played in 2023, and what bands were new in 2023.  I don't think I ever gave this stat in my prior post, but in 2013 there were 172 distinct bands played.  In 2023 that number has gone up to 293.  Looking at a diff of the two, there aren't many that disappeared; a few that look like they did are just due to formatting changes, like replacing & with and, which I could clean up, but I'm not.

Skimming through the diff, the only bands I notice going away in 2023 are: Jefferson Airplane, The Animals, Yardbirds, and Zombies.  Which are are all very 60s era bands, so I guess that makes sense.  There's a ton of new 90s bands, which I can't say I agree with for a Classic Rock station, but that is far from the top problem.

"One hit" wonders

Finally, I attempted to answer the question, how many bands do they just play one song (or nearly one song) from?  So to start with some high level stats.  There were 293 distinct bands, and of those 78 were only played 1 time, leaving 215 bands that were played multiple times.  Of those, there were 45 bands with at least 3 plays, yet only 1 song ever played.  In addition to those 45 bands with only 1 song played multiple times, there were 33 more bands with multiple songs played, but where the top song was at least 80% of those plays.  Finally, there are only 68 bands with at least 3 plays where the top song wasn't most (> 50%) of those plays.

I eyeballed the list of some of these groups, and two stood out to me from the list of bands where MGK played just 1 song by a band: Alice Cooper (School's Out), and Thin Lizzy (The Boys Are Back In Town).  If we expand that list out to include bands with multiple songs, but where the top song was at least 80% of the plays then we get a lot more, including:

Alice In Chains Man In The Box
Neil Young Rockin' In The Free World
Yes Owner Of A Lonely Heart
Allman Brothers Band Ramblin' Man
Electric Light Orchestra Evil Woman
Van Morrison Moondance
Rush Tom Sawyer

Keep in mind these were just the bands that stood out to me as bands where I know they have a bunch of hits that could be played besides just one song.


Closing

I think the data speaks for itself (with the help of my multiple page essay you just read through): WMGK has not gotten better from 2013 to 2023, and in fact, has gotten much worse in some key areas.  I'm curious what 2033 holds.

Also, for the record, I don't think I've listened to WMGK, or any radio station since my 2013 post.  So maybe I have no room to talk, maybe the people listening to the radio in 2023 want to hear the same 80 songs on repeat every day forever.

 

Further Reading

As I mentioned above, I did a similar analysis in 2013, which you probably should have read before this post.

In the extraordinarily unlikely scenario that you want to scrape some WGMK recently played songs for yourself, the code is up on Github.

In the slightly less unlikely scenario you just want to play around with the data here is an unorganized spreadsheet with 10 tabs worth of data I used for this post.  And here is the full SQLite database which includes some additional time than just the 60 days I focused on in this post.

Wednesday, August 30, 2023

See Random Wikipedia Articles

https://wetzel.dev/tools/wikipedia.html

I made this page that shows random pages from Wikipedia.  You may know Wikipedia has a random article link already, which I've always liked.  But my version has two advantages:

  1. It shows quick summaries of articles and an image for each article so you can quickly skim through them.
  2. It allows you to set a popularity filter.  This uses the number of inbound links to that page as a proxy for how popular it is, so you can filter out huge pool of random-town-in-Wales articles.

The second point is the real reason I made it.  The actual random article link just gives so much uninteresting stubs.  I feel like using number of inbound links works well for only seeing somewhat meaningful articles.

Monday, July 31, 2023

Enshittification

https://pluralistic.net/2023/01/21/potemkin-ai/

When a platform starts, it needs users, so it makes itself valuable to users. Think of Amazon: for many years, it operated at a loss, using its access to the capital markets to subsidize everything you bought. It sold goods below cost and shipped them below cost. It operated a clean and useful search. If you searched for a product, Amazon tried its damndest to put it at the top of the search results.

This was a hell of a good deal for Amazon's customers. Lots of us piled in, and lots of brick-and-mortar retailers withered and died, making it hard to go elsewhere. Amazon sold us ebooks and audiobooks that were permanently locked to its platform with DRM, so that every dollar we spent on media was a dollar we'd have to give up if we deleted Amazon and its apps. And Amazon sold us Prime, getting us to pre-pay for a year's worth of shipping. Prime customers start their shopping on Amazon, and 90% of the time, they don't search anywhere else.

That tempted in lots of business customers – Marketplace sellers who turned Amazon into the "everything store" it had promised from the beginning. As these sellers piled in, Amazon shifted to subsidizing suppliers. Kindle and Audible creators got generous packages. Marketplace sellers reached huge audiences and Amazon took low commissions from them.

This strategy meant that it became progressively harder for shoppers to find things anywhere except Amazon, which meant that they only searched on Amazon, which meant that sellers had to sell on Amazon.

That's when Amazon started to harvest the surplus from its business customers and send it to Amazon's shareholders. Today, Marketplace sellers are handing 45%+ of the sale price to Amazon in junk fees. The company's $31b "advertising" program is really a payola scheme that pits sellers against each other, forcing them to bid on the chance to be at the top of your search.

Searching Amazon doesn't produce a list of the products that most closely match your search, it brings up a list of products whose sellers have paid the most to be at the top of that search. Those fees are built into the cost you pay for the product, and Amazon's "Most Favored Nation" requirement sellers means that they can't sell more cheaply elsewhere, so Amazon has driven prices at every retailer.

 

Wednesday, June 14, 2023

The time adding a limit to a Postgres query made it slower

This is something I ran into at work.  I wouldn't usually post stuff like this here, but I thought it was pretty interesting.  It involves optimizations to SQL queries, so best to stop reading here if that's not something you're interested in.  I've cleaned up the example below and made it generic.


Out database is Postgres 13 and we have two tables.  One is named offers which has a primary key of offer_id and a foreign key column named plan_id.  The other table is named plans and has a plan_id primary key and then a column named plan_name.  Many offers can have the same one plan_id.  There are indexes on the primary and foreign key columns.  We want to find all the offers that have a particular plan_name.


This was the starting query:
select * from offers where plan_id in (
  select plan_id from plans where plan_name = 'Example Plan'
)
order by offer_id desc
limit 5


It took about 200 seconds to run.

If I removed the order by and ran it again, it only took 14 ms for me.  So far so good, sorting can be slow, although offer_id is the primary key, and has an index, so sorting by it shouldn't be that slow.

Here's where things get interesting though.  If I removed the limit, and keep the order by:
select * from offers where plan_id in (
  select plan_id from plans where plan_name = 'Example Plan'
)
order by offer_id desc

That only takes 323 ms.

To be clear: Adding a limit to this query causes it to go from less than a second to 3 minutes!

It turns out this is a thing.  If you google "Postgres query slower with limit" you'll find lots of confused Stack Overflow posts.  The problem is when Postgres's statistics are wrong and the query planner assumes the limit will make one approach faster, but actually there are far less rows meeting your criteria than it thought, and it takes much longer.

In this case, Postgres thinks "oh I have an index on offer_id, let me just sort by that column and scan through all those rows until I find 5 rows that match that plan_id filter, that'll be fast cause there's plenty of rows that match that filter and I need to sort them anyway, so I'm accomplishing 2 things with one sort" (Postgres has a problem with run on sentences).  However, Postgres only thinks that's a good plan because it misestimated how many rows match the filter, thinking there are thousands when there are only a dozen or so.  So it wastes a lot of time scanning through rows in reverse chronological order until it finds 5.  A better plan is to use the index on the plan_id to find all the rows that match that, and then sort those and take the newest 5.  But that is only a better plan because we know how few rows actually match the filter, and Postgres has a bad estimate for that.

The "solution" is to force Postgres to not use that plan.  Postgres does not have query plan hints, so we have to use hacks to force Postgres to not take the slow approach.  There are various suggestions to use subqueries or CTEs and put the order by in the subquery, and the limit on the outer query.  So something like this
with all_results as (
  select * from offers
  join plans using (plan_id)
  where plan_name = 'Example Plan'
  order by offer_id desc
)
select * from all_results
limit 5

But it seems Postgres 13 is smarter than that, and still optimizes the combined query.  It does seem a bit faster than the first query, but still took 2 minutes to run.

However, a much hackier solution, which also seems to be fastest and simplest to understand is to "modify" the column you are ordering on so that it can't use the index.  So bizarrely, this query
select * from offers where plan_id in (
  select plan_id from plans where plan_name = 'Example Plan'
)
order by offer_id + 0 desc
limit 5

Ran in 272 ms.  Removing the + 0 causes it to go back to 3+ minutes.

This article goes into more detail, and is where I got the + 0 idea came from.

I should also mention that the "real" solution to this is to increase the amount of statistics on the columns Postgres is having a hard time estimating, but I tested this and it didn't seem to fix this case.  Removing the index on offer_id would also work, although that's not an option here with it being the primary key, it also feels like a bad idea to remove the index entirely rather than just forcing Postgres not to use it on a particular query where we know it's a bad idea.

Tuesday, May 23, 2023

Whole home energy monitoring with IoTaWatt

I recently installed an IoTaWatt open source whole home energy monitor, and wanted to summarize my experience and what I learned about the install.

Full disclosure, after I had written this post, but hadn't posted it yet the created of IoTaWatt announced he was stopping production, but continuing to support existing hardware, and working on a new ESP32 based version that he plans to make available as a DIY project.

What is a whole home energy monitor?

A whole home energy monitor is a device that measures all the electrical energy your house uses, both total usage and then usage on individual circuits.  You may have heard of Sense, which is the most well know of these.  Unfortunately, Sense fails my most fundamental rule of smart home tech: it's cloud based.  Which means both that your data will eventually be sold off or leak, and that the device will eventually stop working when the company shuts down the servers.  That's why I insist upon fully local smart home devices.

Why would I want one?

Like most things I do, the answer is data.  Knowing how much energy your house is using can help identify trends and highlight when new devices use a lot of power at idle.  That can help you save money, but realistically you're not going to save enough money to offset the few hundred you spend on the device.  Instead, knowing when circuits are drawing power and how much can help drive automations.

Here are some examples of things I'm detecting or have plans to set up:

  • Know when the hardwired dishwasher is running/done
  • Know when bathroom circuits have a large draw on them (straightener left on)
  • Detect when fridge usage is high for a long time (either door left open, or fridge failing)

There are more examples where I'm instead using a Zigbee smart outlet with energy monitoring to measure just that device (washer, dryer, sump pump).  The IoTaWatt can't replace those smart plugs entirely, as they allow me to measure just one device on a circuit that can have multiple devices on it.  But it is great for measuring the energy usage for either hardwired devices or when you do want an entire circuit at once and can't rely on a smart plug always being used (like the bathrooms example).

A line graph of several circuits being monitored over a day

How does the IoTaWatt work?

All these devices work the same way.  They use Current Transformers (CT) which clip around one of the two wires in a circuit (so either hot or neutral, but not both), and measure the strength of the magnetic field, which increases as the current increases.  There is a base device which you plug in these CTs to, and you tell it what circuit each CT is monitoring and then that reports that usage back to you.

IoTaWatt pros and cons

If you google "best whole home energy monitor" and click on a bunch of lists, you won't even find any the mention IoTaWatt.  I'm sure it comes as no surprise to anyone reading this that I'm using the hipster whole home energy monitor.  I think the main reasons IoTaWatt isn't more popular is that it's more of a DIY solution, as well as the fact that it's a one man operation with 0 SEO or marketing.  That being said, I consider both of those pros.  Here are the pros and cons as I see them, starting with the cons
Cons

  • Less user friendly to set up.
  • Less polished interface.
  • Limited to monitoring 14 circuits, of which 2 are typically your mains, so you really have 12 circuits you can monitor.
  • Must be connected to via WiFi and the antenna is quite weak, will likely require a closer AP/repeater.
  • Not the cheapest option.

Pros

  • Fully local means it will work unless there is a hardware failure.
  • No cloud server to be shut down, and no company to decide they want to start charging a subscription.
  • Made by a guy in New Hampshire, who basically lives on the forums (the fried egg profile) and will provide unlimited, detailed, support.  Often pays for shipping units back to him if he can't figure out what is wrong with it remotely.
  • Very flexible in install and configuration. 

Of the cons, the only big one to me is the 14 (or really 12) circuit limit.  That being said, the bulk of the rest of this post will go into detail on how to mitigate that limit.

Combining circuits

The TL;DR of this is that there are several ways you can combine multiple circuits on one input.  The downside is that you lose the ability to differentiate the circuits you combine, but you often don't really care if you combine them in logical ways.  For example, if you have several bathrooms each with a dedicated circuit, you can combine all of those into one common bathroom input.  You'll lose the ability to know which bathroom the power is being used in, but for most purposes just knowing "all the bathrooms combined are using x watts" is good enough.

How to combine circuits

There are 3 main ways to combine circuits:

  1. Combine them in software
  2. Combine them using a splitter
  3. Put multiple wires in one CT

#1 is the easiest by far, but it doesn't actually save you an input.  You just tell the IoTaWatt to combine inputs 1 and 2 (or whatever).  Not terribly useful, but can be nice to test combining circuits before you try the other two methods out.

#2 is interesting.  It turns out that the CTs just have standard 1/8" stereo headphones plugs on the end of them.  That means you can combine multiple CTs into one input by just using regular, cheap, headphone splitters.  There are a few caveats there, like you need to use identical CTs (size and brand), and each of your CTs need to be sized to handle the combined input of all circuits you are combining.  So, if you are combining two 20 amp circuits, you can't use two 20 amp CTs, you'd have to use two 50 amp CTs since the total power could be 40 amps.  The nice thing about this method is you can combine circuits without opening your service panel again.  You just unplug the CTs from the main IoTaWatt body, plug them both into a splitter, and then plug that splitter into one of the inputs on the IoTaWatt (and tell the software about the new configuration).  You will have to open the service panel to add a new CT if you want to actually use your newly freed up input, but you can combine several and test how they work together over weeks.

#3 has the advantage of being free, but requires the most planning.  You can simply pass multiple wires through the same CT and their current will be summed.  However, they must either be the same phase, or they must be passed through in opposite directions.  Another practical constraint is that the wires need to be physically near each other in the panel.  If you have a panel laid out in an organized way, without much slack, what this means in practice, is that you can only combine circuits in this way if they are on the same side of your panel (left vs right) and the same phase (alternating rows).

This picture from the IoTaWatt docs demonstrates how you can measure two circuits with different phases, by passing one through in the opposite direction.  I don't have slack on any of my circuits to do this kind of thing, but perhaps you do.  Extending circuits inside your panel is a good use for the new Wago inline connectors.

A circuit breaks with 2 wires leaving it looping through a CT in opposite directions

One more note about this picture, is that it's showing a 240V circuit, which uses both phases and so will always have two out of phase circuits that you likely want to combine.  If it is a "pure 240V" circuit then you can just measure one of them and double it in software (which I do for my AC compressor).  However, some 240V circuits also have 120V available, which means there will be one phase with more current than the other, and just doubling will not be accurate, although may be good enough still since the 120V current draw is likely to be quite small.  You can tell if a circuit is pure 240V or 240V plus 120V based on if it has a neutral running to it, or just two hots.  So, 2 wire (plus ground) = pure 240V, but 3 wire (plus ground) = 240V and 120V.

Other thoughts on making 12 inputs work

By combining inputs with either headphone splitters or multiple wires inside one CT you can monitor any number of circuits, at the cost of a loss insight into the individual circuits.  Some other options you have are to combine your two mains, using either of the options just discussed, or to not monitor the mains at all.  There is no reason the IoTaWatt needs to monitor the mains, so if those inputs would be more valuable to you monitoring branch circuits that is an option.  Measuring the mains though allows you to produce an "other" output though.  You can add your two mains together and then subtract each branch circuit input from it.  What is left is all the power your house is using that isn't monitoring by any of your inputs.  That's what I did and my "other" output is quite boring (which is the goal).  I have a roughly constant 14 watt draw on other, and then it spikes when the basement lights go on (which I can monitor being on via a lux meter in the basement).

My install

This is a bit out of order now, but I can't be bothered to copy and paste this section before the previous stuff.

The before state of my service panel

I started with a clean panel which I had rewired when I moved in.  I made a spread sheet of all the breakers, what they had on them, and which physical location and phase they were on, so I could plan which might be combined in one CT.  The IoTaWatt requires two plugs, one for power, and one to use as a reference voltage.  Ideally this is a dedicated circuit so there is less voltage fluctuation as things turn on and off.  I added a dedicated 20 amp outlet right next to my service panel for the IoTaWatt.

I then opened my panel, and began attaching the CTs to the circuits I wanted to measure.  Combining the ones that made sense.  I haven't mentioned this yet, but even for just single wires in a CT the direction of the wire needs to match the phase, or else it will be negative.  Luckily if you mess this up you can just tell the IoTaWatt to flip the value in software.  Still, I put some thought into this at attempted to correctly attach all the CTs in the right orientation from the start. 

Several CTs attached to the hot wires inside my service panel

I knocked out a 1 and 1/4" knockout on the bottom left of the service panel, and put a threaded conduit terminal adapter there to protect the CT wires.  I still don't have the wires in conduit, but when I do that I won't have to open the panel again because I can just unplug the CTs from the IoTaWatt and run them through the conduit and then replug them in.  The important thing is having the terminal piece there.  The 1 and 1/4" conduit is more than enough for the 15 CT wires I have now, 1" may have worked too, but there didn't seem to be much downside to going up to 1 and 1/4" since the knockout I was using went up to that anyway.

 

My CT wires entering the service panel through a conduit terminal

I should mention that I'm lucky to have a disconnect outside my house by the meter.  When I disconnect that there is no power inside my house, including the service panel.  Normally even with your main break off you still have power where your mains come in.  You can decide if you're comfortable with that, but if not an electrician should be able to install this pretty easily.

My opened service panel with all the CTs added

Another tip, which you can see in the below pic is to label all the CT wires, ideally while you still have the panel opened and as you run each one.  Personally, I just used the notion of which number breaker each was on, using + or , when there were multiple ones.

My closed service pane with the IoTaWatt installed

 

The finished product is bit messy, but A. it's a basement, and B. I can easily run conduit to a box and install everything in there without opening the panel again.