Tuesday, April 3, 2007

java.io.StreamCorruptedException message = invalid stream header: The problem is solved

A short history of the problem: our VoIP solution is installed on several servers. Some Java objects are serialized, and the serialized form is stored in the database (in a MySQL TEXT field). When an object is needed, the text field is read from the database, and when we try to deserialize it, this exception is thrown:

java.io.StreamCorruptedException: message = invalid stream header
java.io.ObjectInputStream::readStreamHeader [ObjectInputStream.java:764]
java.io.ObjectInputStream::<init> [ObjectInputStream.java:277]
com.ttechgroup.loggers.app.Logger::_loadSerializedObjects [Logger.java:389]
com.ttechgroup.loggers.app.Logger::getSerializable [Logger.java:199]
com.ttechgroup.loggers.SystemRestarterLogger::getCause [SystemRestarterLogger.java:59]



According to my statistics, on about 50% of the servers running our application this exception is never thrown, but on the rest it is always thrown by my function _loadSerializedObjects.
The problem is that the byte block holding the serialized objects starts with 2 magic numbers (the stream header that the exception reports as invalid) with values > 127, so they are not standard ASCII values. Our database and database connector are set up to use UTF-8, but the problem is (maybe) in the conversions String => byte[] => String. The solution, described further down, uses Base64 encoding.


A very useful article is http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4968673
Here is my "cache" of this article:

The provided test code serializes an object to a ByteArrayOutputStream, converts the generated byte array into a string using the ByteArrayOutputStream.toString() method, converts the string back into a byte array using the String.getBytes() method, and then attempts to deserialize the object from the byte array using a ByteArrayInputStream. This procedure will in most cases fail because of the transformations that take place within ByteArrayOutputStream.toString() and String.getBytes(): in order to convert the contained sequence of bytes into a string, ByteArrayOutputStream.toString() decodes the bytes according to the default charset in effect; similarly, in order to convert the string back into a sequence of bytes, String.getBytes() encodes the characters according to the default charset.

Converting bytes into characters and back again according to a given charset is generally not an identity-preserving operation. As the javadoc for the String(byte[], int, int) constructor (which is called by ByteArrayOutputStream.toString()) states, "the behavior ... when the given bytes are not valid in the default charset is unspecified". In the test case provided, the first two bytes of the serialization stream, 0xac and 0xed (see java.io.ObjectStreamConstants.STREAM_MAGIC), both get mapped to the character '?' since they are not valid in the default charset (ISO646-US in the JDK I'm running). The two '?' characters are then mapped back to the byte sequence 0x3f 0x3f in the reconstructed data stream, which do not constitute a valid header.

The solution, from the perspective of the test case, is to use ByteArrayOutputStream.toByteArray() instead of toString(), which will yield the raw byte sequence; this can then be fed directly to the ByteArrayInputStream(byte[]) constructor.
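
To see the effect described in the article, here is a minimal sketch. It is my own illustration, not the test case from the bug report: the class name RoundTripDemo and its helper methods are made up, and US-ASCII is chosen explicitly to stand in for the reporter's default charset. It serializes a String, pushes the bytes through a String and back, and then tries to deserialize both versions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.StreamCorruptedException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; none of these names come from the bug report.
public class RoundTripDemo {

    // Serialize an object and return the raw bytes via toByteArray().
    public static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        }
        return buffer.toByteArray();
    }

    // Deserialize an object from raw bytes.
    public static Object deserialize(byte[] data)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    // The lossy path: bytes -> String -> bytes through a 7-bit charset.
    // Bytes above 127 (such as 0xAC and 0xED) cannot survive this.
    public static byte[] throughAsciiString(byte[] data) {
        String text = new String(data, StandardCharsets.US_ASCII);
        return text.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws Exception {
        byte[] raw = serialize("hello");

        // Raw bytes deserialize without problems.
        System.out.println(deserialize(raw)); // prints "hello"

        // After the String round trip the stream header is destroyed.
        byte[] damaged = throughAsciiString(raw);
        try {
            deserialize(damaged);
        } catch (StreamCorruptedException e) {
            System.out.println("corrupted: " + e.getMessage());
        }
    }
}
```

With another charset the damage may look different (or, with a single-byte charset such as ISO-8859-1, not appear at all), which may be one reason why only some servers are affected.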


The solution

I convert everything to Base64 before writing it to the database. The drawback of this method is storage: the usual database charsets, UTF-8 or LATIN-1, store each of these characters in 8 bits, while a Base64 character carries only 6 bits of data (64 = 2^6). Every 3 bytes of serialized data become 4 characters, so the stored value is about 33% larger than the raw bytes. Still, this is the most portable solution, and if space is not critical, you can use it too!

For this I use slightly modified versions of the Base64InputStream and Base64OutputStream classes from the package org.mozilla.jss.util. They are modified only to remove the asserts (and maybe other little things), which is of no importance in this case. Here they are: Base64InputStream.java and Base64OutputStream.java
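
For readers who only want the idea, here is a rough sketch of the same approach. Note that it does not use the org.mozilla.jss.util classes: it assumes Java 8 or newer and swaps in the standard java.util.Base64 class instead. The class name Base64SerializationSketch and the methods toBase64 and fromBase64 are made up for this illustration; they are not the functions from our Logger class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper class, assuming Java 8+ (java.util.Base64).
public class Base64SerializationSketch {

    // Serialize the object and encode the raw bytes as Base64 text.
    // The result contains only ASCII characters, so it is safe to
    // store in a TEXT column regardless of the connection charset.
    public static String toBase64(Serializable value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        }
        return Base64.getEncoder().encodeToString(buffer.toByteArray());
    }

    // Decode the Base64 text back to raw bytes and deserialize them.
    public static Object fromBase64(String text)
            throws IOException, ClassNotFoundException {
        byte[] data = Base64.getDecoder().decode(text);
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        ArrayList<String> causes =
            new ArrayList<>(Arrays.asList("restart", "timeout"));

        String stored = toBase64(causes);      // value to write to the database
        Object restored = fromBase64(stored);  // value read back later

        System.out.println(stored.length() + " characters stored");
        System.out.println(restored.equals(causes)); // prints "true"
    }
}
```

The encoded text always begins with "rO0A", which is the Base64 form of the header bytes 0xAC 0xED 0x00, so it is easy to recognize such values when looking at the table.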

The change of source code:
Here are my changes to the source code (screenshots):
loadSerializedObjects-change.JPG
setSerializableObjectParams-change.JPG


Thanks to my boss Assen Stoyanov (softa) from TTechGroup for the idea of the solution!
If you have any difficulties with the solution, please write a comment here!