In addition, all the selected base64 characters are URL-safe.
Example:
This UUID : 7e47c34a-eebc-4387-b5a4-c6b558bdc407
is compressed down to this: 35Hw0ruvEOHbWkxrVYvcQH
public class KeyGen {
private KeyGen() {
} // constructor
// base64url, see: http://tools.ietf.org/html/rfc4648 section 5
private static String chars
= "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
/**
* Generates a UUID and compresses it into a base 64 character string; this
* results in a 22 character string and since each character represents 6 bits
* of data that means the result can represent up to 132 bits. However, since
* a UUID is only 128 bits, 4 additional randomize bits are inserted into the
* result (if desired); this means that the number of available unique IDs is
* increased by a factor of 16
*
* @param enhanced specifies whether or not to enhance the result with 4
* additional bits of data since a 22 base64 characters
* can hold 132 bits of data and a UUID is only 128 bits
* @return a 22 character string where each character is from the file and url safe
* base64 character set [A-Za-z0-9-_]
*/
public static String getCompressedUuid(boolean enhanced) {
UUID uuid = UUID.randomUUID();
return compressLong(uuid.getMostSignificantBits(), enhanced)
+ compressLong(uuid.getLeastSignificantBits(), enhanced);
} // getCompressedUuid()
// compress a 64 bit number into 11 6-bit characters
private static String compressLong(long key, boolean enhance) {
// randomize 2 bits as a prefix for the leftmost character which would
// otherwise only have 4 bits of data in the 6 bits
long prefix = enhance ? (long)(Math.random() * 4) << 62 : 0;
// extract the first 6-bit character from the key
String result = "" + chars.charAt((int)(key & 0x3f));
// shifting in 2 extra random bits since we have the room
key = ((key >>> 2) | prefix) >>> 4;
// iterate thru the next 10 characters
for (int i = 1; i < 11; i++) {
// strip off the last 6 bits from the key, look up the matching character
// and prepend that character to the result
result = chars.charAt((int)(key & 0x3f)) + result;
// logical bit shift right so we can isolate the next 6 bits
key = key >>> 6;
}
return result;
} // compressLong()
} // class KeyGen
Good Job!!
ReplyDeleteI'm confused. The compression purpose is to reduce the number of bits. However, you have GUID with 128bit, after compression you obtain 132 bits, so 4 more bits. I can see the advantage of increasing the number of GUID but What about the compression? It seems like it doesn't help for compression. Am I right?
Hi Dominique, that's a great observation and question.
ReplyDeleteWhen you compress the character representation of the GUID down using a base 64 encoding you can get it down to 22 characters. You could then expand that 22 chars back into a 128 bit GUID by reversing the process (you have to drop the last 4 bits, which could have been filled in with meaningless data).
But, since a base 64 character requires 6 bits to uniquely represent it, 22 characters can effectively hold 132 bits of numeric data. So, the 22 character base 64 string has 4 extra bits.
If you don't plan to reconstitute those 22 chars into a 128 bit number you can use those 4 extra bits to increase the number of unique 22 character strings by a factor of 16. If you don't use those extra bits you can just fill them in with zeros.
Look at it this way...
32 characters = 256 bits of character data (8 bits per character)
But, only half of those bits are used to represent the 128 bit GUID. Each 4 bits of numeric data in the GUID is expanded into 8 bits of character data in the string.
Likewise, 22 characters = 132 bits of character data. Of that, with a base 64 encoding, 128 bits of that represents the numeric data. In that very last 6 bit base 64 character only 2 of its bits are being used. There are 4 bits available for actual data.
Putting those 4 bits to work by filling them with random data instead of all zeros lets you increase the number of unique 22 character strings by a factor of 16.
I hope I was more articulate with this response. Let me know if I didn't clear that up.
Thanks.
Russ