Java has had zip-reading capabilities for a long time, naturally, because
jar files are simply zip files with some metadata. The needed classes reside in the
java.util.zip package.
ZipInputStream gave me a huge headache. My use case was as simple as:
- read the zip entries of a list of zip files (each varying in size, but usually around 20MB)
- skip to the zip entry that has a certain name (a single text file with only two bytes of contents)
- read the contents of this zip entry and close the zip
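The three steps above can be sketched roughly as follows with ZipInputStream. This is an illustration under assumptions, not the original test code: the class name ZipEntryLookup and the method readEntry are made up for the example, and the entry name is whatever file you are looking for.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, named for this example only.
class ZipEntryLookup {

    /**
     * Scans the archive front to back and returns the bytes of the first
     * entry called entryName, or null if the archive has no such entry.
     */
    static byte[] readEntry(Path zipFile, String entryName) throws IOException {
        try (InputStream in = Files.newInputStream(zipFile);
             ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            // A stream cannot seek, so every entry in front of the wanted
            // one has to be read past before the next header is reached.
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals(entryName)) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] buffer = new byte[4096];
                    int n;
                    while ((n = zip.read(buffer)) != -1) {
                        out.write(buffer, 0, n);
                    }
                    return out.toByteArray();
                }
            }
            return null;
        } // try-with-resources closes the zip here
    }

    public static void main(String[] args) throws IOException {
        // Arguments: <archive.zip> <entry-name>
        byte[] data = readEntry(Paths.get(args[0]), args[1]);
        System.out.println(data == null
                ? "entry not found"
                : new String(data, StandardCharsets.UTF_8));
    }
}
```

Calling readEntry once per archive in a loop over the 25 files reproduces the shape of the use case; how long it takes will of course depend on where the wanted entry sits inside each archive.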
Doing this for about 25 files took my Pentium D (2GHz) with 3GB of RAM roughly 20 seconds. Wow, 20 seconds, really? I created a test case and profiled the code in question separately with YourKit (which is a really great tool, by the way!).
It got stuck quite a bit in java.util.zip.Inflater.inflateBytes, but that seemed to use native code, so I couldn’t profile any further.
So I went on and searched for an alternative to java.util.zip, and luckily I found one in jazzlib, which provides a pure Java implementation of ZIP compression and decompression. This library is GPL-licensed (with a small exception clause to prevent the pervasiveness of the GPL) and comes in two versions: one that duplicates the library classes under java.util.zip (as a drop-in replacement for JDK versions where this is missing) and one that comes in its own namespace.
After I went for the second version, I restarted my test, and this time it took only about 7 seconds. At first I thought there must be some downside to this approach, so I checked the timings for a complete decompression of the archive, but these were on par with those of java.util.zip (roughly 5 seconds for a single 20MB file).
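A complete-decompression check of this kind could be timed roughly like this on the java.util.zip side. It is only a sketch: the class name FullDecompressionTiming and the method inflateAll are hypothetical, and a single System.nanoTime() measurement is a crude stopwatch rather than a proper benchmark.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipInputStream;

// Hypothetical timing harness, named for this example only.
class FullDecompressionTiming {

    /** Inflates every entry and returns the total number of uncompressed bytes. */
    static long inflateAll(Path zipFile) throws IOException {
        long total = 0;
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(zipFile);
             ZipInputStream zip = new ZipInputStream(in)) {
            while (zip.getNextEntry() != null) {
                int n;
                // Drain the current entry; the data itself is discarded.
                while ((n = zip.read(buffer)) != -1) {
                    total += n;
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Argument: <archive.zip>
        long start = System.nanoTime();
        long bytes = inflateAll(Paths.get(args[0]));
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(bytes + " bytes inflated in " + elapsedMs + " ms");
    }
}
```

Running the same loop against the other library's stream class and comparing the two numbers gives the kind of like-for-like comparison described above.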
I haven’t tested compression speed, because it doesn’t matter much for my use case, but the decompression speed alone is astonishing. I wonder why nobody else stumbled upon these performance problems before…