#ifndef EXAMPLE_H_123e4567_e89b_12d3_a456_426614174000
#define EXAMPLE_H_123e4567_e89b_12d3_a456_426614174000
Problem I have with UUID is that there is a lot of wasted bytes if you don’t need that level or randomness. Take an online game server. If you send it as the normal textual representation then you will be transmitting 36 bytes. Even if you send it as binary then you will be sending 16 identification bytes with every packet. On a fast paced twitch game then that will mount up to a lot of bytes, especially if you are sending every 20ms. A lot of bits go into recording the time since 1582 in 100 nanosecond increments. Also more bits are spent recording the MAC address (in some variants) to further differentiate the value. So maybe the UUID is not for us. What about other algorithms that are out there?
One is the Snowflake Id created by Twitter and is used for the IDs of tweets. This is 8 bytes in length to half that of the UUID or if you send it via hexadecimal text then it would be 16. Like UUID it has a component of time since an epoch (41 bits), but this has the resolution milliseconds. It too has a component for the machine id and several bits for sequence meaning it can handle 4096 items per millisecond.
Which maybe enough for our needs. But can we reduce the size? If we refine our use case we can look for optimisations. Say we contact a server with a all zero representation, the server will then assign us a value which becomes our identifier. We could just have an ever increasing number but we want something fairly non-guessable as a first level defence (remember obfuscation is not security!). With this knowledge we can drop the need for data that defines the machine. Lets just say that the server hits come in waves as new lobbies open.
By varying we the resolution of the time and the sequence we can determine the effect. If we have 1 second resolution from an epoch and reserve 8 bits for sequencing, if we have a 10,000 people all join at the same time then it will take nearly 40 second to assign all the id’s. If we save 10 bits then reduces to under 10 seconds. A further 2 bits will reduce that to 2.5 seconds, which maybe acceptable but do we have the bits to spare?
So let’s see if we do: The standard Unix Epoch is January 1, 1970. At the moment of writing 1680009514 seconds have elapsed. Splitting it up into more readable format we get 1,680,009,514 or just over a billion and half seconds. It doesn’t take a maths major to see that we need 31 bits to store this value. How about we shift the epoch? The dropping the billion would give us a wrap around date of 2033 which may be enough, but with some games being around a long, long time (Counterstrike, World of Warcraft, etc.) then it may not be enough. If we shift the epoch to 1.5 billion seconds after the Unix Epoch that gives us 168+ million to store. That takes our requirements to 28 bits. That only leaves us we 4 bits for sequencing (if we are aiming for 32 bits) and a wraparound date of 2026.
So we reduce further: 1.6 billion seconds after the epoch give us 27 bits which puts the wrap around date to December of next year. Then next thing is to drop the 1.68 billion. That funnily enough puts the epoch to about 3 hours ago (at time of writing). The puts the requirement to store 8 digits which can be done in 26 bits. That extends wraparound to just 2025 which is not enough.
All in all, you can’t really get a unique time based identifier to 32 bytes without some serious limitations.
So what can we do? Increase the bits. 36 bits would be enough to store the epoch, give a massive wrap around date and 8 bits of sequencing which may or may not be enough. Also accessing 8 bytes may dramatically increase access time and space due to padding and cache lines. Remember 40 seconds to service 10,000 users?!
By using 48 bits of storage you can store a fairly recent epoch (2015) to millisecond accuracy with 41 bits of time (wrap around occurs in about 1000 years) with 7 bits for sequence which means we could generate 128 ids every millisecond. That means our 10,000 users would be serviced in 78 milliseconds.
So while you can reduce the bytes used for identification you have to know your use case and tailor it to it.
Be the first to comment