In addition, all the selected base64 characters are URL-safe.
Example:
This UUID : 7e47c34a-eebc-4387-b5a4-c6b558bdc407
is compressed down to this: 35Hw0ruvEOHbWkxrVYvcQH
public class KeyGen { private KeyGen() { } // constructor // base64url, see: http://tools.ietf.org/html/rfc4648 section 5 private static String chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"; /** * Generates a UUID and compresses it into a base 64 character string; this * results in a 22 character string and since each character represents 6 bits * of data that means the result can represent up to 132 bits. However, since * a UUID is only 128 bits, 4 additional randomize bits are inserted into the * result (if desired); this means that the number of available unique IDs is * increased by a factor of 16 * * @param enhanced specifies whether or not to enhance the result with 4 * additional bits of data since a 22 base64 characters * can hold 132 bits of data and a UUID is only 128 bits * @return a 22 character string where each character is from the file and url safe * base64 character set [A-Za-z0-9-_] */ public static String getCompressedUuid(boolean enhanced) { UUID uuid = UUID.randomUUID(); return compressLong(uuid.getMostSignificantBits(), enhanced) + compressLong(uuid.getLeastSignificantBits(), enhanced); } // getCompressedUuid() // compress a 64 bit number into 11 6-bit characters private static String compressLong(long key, boolean enhance) { // randomize 2 bits as a prefix for the leftmost character which would // otherwise only have 4 bits of data in the 6 bits long prefix = enhance ? (long)(Math.random() * 4) << 62 : 0; // extract the first 6-bit character from the key String result = "" + chars.charAt((int)(key & 0x3f)); // shifting in 2 extra random bits since we have the room key = ((key >>> 2) | prefix) >>> 4; // iterate thru the next 10 characters for (int i = 1; i < 11; i++) { // strip off the last 6 bits from the key, look up the matching character // and prepend that character to the result result = chars.charAt((int)(key & 0x3f)) + result; // logical bit shift right so we can isolate the next 6 bits key = key >>> 6; } return result; } // compressLong() } // class KeyGen
2 comments:
Good Job!!
I'm confused. The compression purpose is to reduce the number of bits. However, you have GUID with 128bit, after compression you obtain 132 bits, so 4 more bits. I can see the advantage of increasing the number of GUID but What about the compression? It seems like it doesn't help for compression. Am I right?
Hi Dominique, that's a great observation and question.
When you compress the character representation of the GUID down using a base 64 encoding you can get it down to 22 characters. You could then expand that 22 chars back into a 128 bit GUID by reversing the process (you have to drop the last 4 bits, which could have been filled in with meaningless data).
But, since a base 64 character requires 6 bits to uniquely represent it, 22 characters can effectively hold 132 bits of numeric data. So, the 22 character base 64 string has 4 extra bits.
If you don't plan to reconstitute those 22 chars into a 128 bit number you can use those 4 extra bits to increase the number of unique 22 character strings by a factor of 16. If you don't use those extra bits you can just fill them in with zeros.
Look at it this way...
32 characters = 256 bits of character data (8 bits per character)
But, only half of those bits are used to represent the 128 bit GUID. Each 4 bits of numeric data in the GUID is expanded into 8 bits of character data in the string.
Likewise, 22 characters = 132 bits of character data. Of that, with a base 64 encoding, 128 bits of that represents the numeric data. In that very last 6 bit base 64 character only 2 of its bits are being used. There are 4 bits available for actual data.
Putting those 4 bits to work by filling them with random data instead of all zeros lets you increase the number of unique 22 character strings by a factor of 16.
I hope I was more articulate with this response. Let me know if I didn't clear that up.
Thanks.
Russ
Post a Comment