jazzlib – an alternative for reading ZIP files in Java

Java has had ZIP-reading capabilities for a long time, naturally, since `jar` files are simply compressed ZIP files with some metadata. The classes you need live in the `java.util.zip` package: `ZipInputStream` and `ZipEntry`.

Recently, however, `ZipInputStream` gave me a huge headache. My use case was as simple as

* read the zip entries of a list of zip files (each varying in size, but usually around 20MB)
* skip to the zip entry that has a certain name (a single text file with only two bytes of contents)
* read the contents of this zip entry and close the zip
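
In code, those three steps might look roughly like the sketch below. This is not my original code: the class name `ZipEntryLookup` and the method `readEntry` are made up for illustration, and it assumes Java 7 or later for try-with-resources.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, named for illustration only.
class ZipEntryLookup {

    /** Returns the bytes of the first entry called entryName, or null if there is none. */
    static byte[] readEntry(Path zipFile, String entryName) throws IOException {
        try (InputStream in = Files.newInputStream(zipFile);
             ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            // The stream is read sequentially, so every entry that comes
            // before the wanted one is consumed on the way.
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.getName().equals(entryName)) {
                    continue;
                }
                // Found it: copy the (tiny) contents into memory.
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buffer = new byte[4096];
                int n;
                while ((n = zip.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                return out.toByteArray();
            }
            return null; // no entry with that name
        } // try-with-resources closes the zip here
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ZipEntryLookup <archive.zip> <entry-name>
        byte[] data = readEntry(Paths.get(args[0]), args[1]);
        System.out.println(data == null ? "not found" : data.length + " bytes read");
    }
}
```

Calling `readEntry` once per archive in a loop over the list of files gives the overall shape of the test.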

Doing this for about 25 files took my Pentium D (2GHz) with 3GB of RAM roughly **20 seconds**. Wow, 20 seconds, really? I created a test case and profiled the code in question separately with [YourKit](http://www.yourkit.com) (which is a really great tool, by the way!).

It got stuck quite a bit in `java.util.zip.Inflater.inflateBytes` – but that seemed to use native code, so I couldn’t profile any further.

So I went looking for an alternative to `java.util.zip`, and luckily I found one in [jazzlib](http://jazzlib.sourceforge.net), which provides a pure Java implementation of ZIP compression and decompression. The library is GPL-licensed (with a small exception clause to prevent the pervasiveness of the GPL) and comes in two versions: one that duplicates the library classes under `java.util.zip` (as a drop-in replacement for JDK versions where this package is missing) and one that lives in its own namespace, `net.sf.jazzlib`.

I went for the second version, reran my test, and this time it took only about **7 seconds**. At first I thought there must be some downside to this approach, so I also timed a complete decompression of an archive, but those timings were on par with the ones from `java.util.zip` (roughly 5 seconds for a single 20MB file).
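
For anyone who wants to repeat the full-decompression check, a minimal timing sketch could look like this. The class name `FullDecompressionTimer` and the method `inflateAll` are invented for the example, and it uses the JDK classes. To time jazzlib instead, presumably only the import would change to the `net.sf.jazzlib` variant, assuming its `ZipInputStream` mirrors the JDK one.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipInputStream;

// Hypothetical timing harness, named for illustration only.
class FullDecompressionTimer {

    /** Inflates every entry in the archive and returns the total number of uncompressed bytes. */
    static long inflateAll(Path zipFile) throws IOException {
        long total = 0;
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(zipFile);
             ZipInputStream zip = new ZipInputStream(in)) {
            while (zip.getNextEntry() != null) {
                int n;
                // Read each entry to its end; the data is discarded, only counted.
                while ((n = zip.read(buffer)) != -1) {
                    total += n;
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java FullDecompressionTimer <archive.zip>
        Path zipFile = Paths.get(args[0]);
        long start = System.nanoTime();
        long bytes = inflateAll(zipFile);
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println(bytes + " bytes inflated in " + millis + " ms");
    }
}
```

A single run like this is only a rough measurement; JIT warm-up and disk caching can easily skew the numbers, so several runs per library give a fairer picture.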

I haven’t tested compression speed, because it doesn’t matter much for my use case, but the decompression speed alone is astonishing. I wonder why nobody else has stumbled upon these performance problems before…