Friday, July 23, 2021

Zip - How not to design a file format.

 https://news.ycombinator.com/item?id=27925393

> How do you read a zip file?

> This is undefined by the spec.

> There are 2 obvious ways.

> 1. Scan from the front, when you see an id for a record do the appropriate thing.

> 2. Scan from the back, find the end-of-central-directory-record and then use it to read through the central directory, only looking at things the central directory references.
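
The two strategies quoted above can be sketched in a few lines of Python. This is only an illustration, not a description of how any particular tool works: the helper names (`make_zip`, `names_scanning_forward`, `names_from_central_directory`) are made up for this example, the forward scan is deliberately naive, and the backward read relies on Python's `zipfile` module, which looks for the end-of-central-directory record at the end of the data.

```python
import io
import struct
import zipfile

LOCAL_HEADER_SIG = b"PK\x03\x04"  # signature that starts each local file header


def make_zip(files):
    """Build a small zip archive in memory from a {name: bytes} mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


def names_scanning_forward(blob):
    """Strategy 1: walk the bytes from the front, reading each local file header.

    Naive on purpose: it trusts every occurrence of the signature, so it can
    be fooled if the same four bytes appear inside file data.
    """
    names = []
    pos = blob.find(LOCAL_HEADER_SIG)
    while pos != -1:
        # In a local file header the name length is a little-endian uint16
        # at offset 26, and the name itself starts at offset 30.
        (name_len,) = struct.unpack_from("<H", blob, pos + 26)
        names.append(blob[pos + 30 : pos + 30 + name_len].decode("utf-8"))
        pos = blob.find(LOCAL_HEADER_SIG, pos + 4)
    return names


def names_from_central_directory(blob):
    """Strategy 2: find the end-of-central-directory record, then list
    only the entries that the central directory references."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return zf.namelist()


if __name__ == "__main__":
    # Two archives glued together, as cat(1) would do with two files.
    combined = make_zip({"a.txt": b"alpha"}) + make_zip({"b.txt": b"beta"})
    print(names_scanning_forward(combined))
    print(names_from_central_directory(combined))
```

On two archives glued together, the forward scan reports entries from both, while the central-directory read reports only those of the last archive, so two readers can disagree about the same bytes.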

I was recently bitten by this at work. Someone sent me a zip, and I couldn't find the files that were supposed to be in it. I asked a colleague, who sent me a screenshot showing that the files were there, and that they in turn didn't see the set of files that I saw. I had listed the contents of the zip with "unzip -l"; they had used the engrampa GUI.

At that point I looked at a hexdump of the file. What caught my eye was the zip magic number near the end of the file, which was odd, since it was also present at the beginning. I suspected that someone had used cat(1) to concatenate two zips together. To check, I used dd(1) to extract the bytes before the second occurrence of the magic number into one file and the remainder into another. Sure enough, both "unzip -l" and engrampa could then open both files, and they agreed on the contents of each. It turns out engrampa was reading the file forwards, whereas unzip was reading it backwards.
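
For the record, here is a rough Python equivalent of that dd(1) step. It is a sketch, not what I actually ran: instead of cutting at the second occurrence of the magic number, it cuts after each end-of-central-directory record (22 bytes plus an optional comment), which is where one archive stops and the next begins. The function names are made up, and it assumes plain zips with no zip64 records and no stray copy of the record signature inside file data.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"  # end-of-central-directory record signature
EOCD_FIXED_SIZE = 22       # size of the record without its trailing comment


def split_concatenated(blob):
    """Split back-to-back zip archives by cutting after each EOCD record."""
    parts = []
    start = 0
    while start < len(blob):
        pos = blob.find(EOCD_SIG, start)
        if pos == -1 or pos + EOCD_FIXED_SIZE > len(blob):
            # No further complete record: keep whatever is left as one part.
            parts.append(blob[start:])
            break
        # The comment length is a little-endian uint16 at offset 20.
        (comment_len,) = struct.unpack_from("<H", blob, pos + 20)
        end = pos + EOCD_FIXED_SIZE + comment_len
        parts.append(blob[start:end])
        start = end
    return parts


def names_per_part(parts):
    """List the entries of each part via its own central directory."""
    result = []
    for part in parts:
        with zipfile.ZipFile(io.BytesIO(part)) as zf:
            result.append(zf.namelist())
    return result
```

Writing each element of `split_concatenated(data)` to its own file gives archives that a forward-reading and a backward-reading tool should both list the same way.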

