UUIDs solved a problem nobody thinks about: how do two systems, never having spoken, agree on a unique identifier without coordinating? But not every UUID is built for every job. Pick the wrong version and your database performance can tank by a factor of ten. This is the honest guide.
UUID versions explained
A UUID is 128 bits — 16 bytes — usually written as 32 lowercase hex characters split into 8-4-4-4-12 groups. Inside that fixed layout, RFC 9562 (the May 2024 spec that replaces the older RFC 4122) defines eight version families that differ in how the bits are filled.
- v1 — timestamp + node. The original. Combines a 60-bit timestamp with a 48-bit node identifier. In the wild, that node was often a MAC address, which leaked machine identity into every record. RFC 4122 §4.5 permits a random multicast-flagged node instead. Our generator only does the random path — we never read your MAC.
- v3 and v5 — name-based. Deterministic hashes of a namespace + name (v3 uses MD5, v5 uses SHA-1). Same input always yields the same UUID. Useful when you need a stable identifier from a stable string.
- v4 — random. 122 bits of cryptographic randomness with the version and variant nibbles fixed. This is what most people mean when they say "UUID". Excellent collision properties, terrible insert performance on B-tree indexes.
- v6 — reordered v1. Same fields as v1 but with the timestamp parts shuffled into big-endian order so the result is sortable. Rare in practice; v7 is the better-known successor.
- v7 — Unix time + random. 48 bits of millisecond Unix epoch up front, 74 bits of random tail. Lexicographic sort order matches insertion time order. The new default for database primary keys.
- v8 — custom. Reserved for application-specific layouts; we don't ship it.
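As a rough illustration of the families above, here is a minimal sketch using Python's standard `uuid` module. The name "example.com" and the variable names are placeholders chosen for this example, and the random-node construction is one plausible way to follow the "random multicast-flagged node" option, not a description of how any particular generator does it.

```python
import secrets
import uuid

# v4: random bits with the version and variant fields fixed.
random_id = uuid.uuid4()

# v5 (SHA-1) and v3 (MD5): the same namespace + name always gives the same UUID.
name_a = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")
name_b = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")
legacy = uuid.uuid3(uuid.NAMESPACE_DNS, "example.com")

# v1 with a random 48-bit node instead of a MAC address.
# Setting the multicast bit (least significant bit of the first octet)
# marks the node as not being a real hardware address.
random_node = secrets.randbits(48) | (1 << 40)
time_based = uuid.uuid1(node=random_node)

print(random_id.version, name_a.version, legacy.version, time_based.version)
print(name_a == name_b)
```

The version nibble is readable from every result through the `.version` attribute, which is a quick way to check what kind of UUID an existing column actually holds.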
When to use v4 versus v7
The decision is almost always about whether the identifier will ever be used as a database primary key.
Pick v4 when: the UUID identifies API tokens, idempotency keys, session handles, message envelopes, distributed event IDs, or anything that lives in a hash-table-shaped store. Random distribution is exactly what those stores want.
Pick v7 when: the UUID will be the clustering key in a B-tree index — Postgres, MySQL InnoDB, SQL Server. K-sortable identifiers turn inserts into appends, which is the access pattern B-trees were designed for.
The k-sortable advantage of v7
A B-tree index keeps its leaves sorted by key. When you insert a v4 UUID, the engine has to find the right leaf at a random position in the tree, possibly splitting it because there's no room. Page splits write twice as many pages, fragment the index, and ruin the read-after-write cache locality.
A v7 UUID is monotone within the resolution of a millisecond. New inserts almost always land in the same right-most leaf as the previous insert. No page split, no fragmentation, no cache miss.
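The difference between random and time-ordered keys can be imitated without a database. The sketch below is an assumption-laden toy: `uuid7_sketch` is a hand-rolled v7 builder written for this example (not a library function), and a sorted Python list stands in for the sorted leaves of a B-tree. It counts how many inserts land at the right-hand end of the sorted order.

```python
import bisect
import os
import time
import uuid


def uuid7_sketch(ts_ms=None):
    """Build a version 7 UUID: 48-bit Unix ms timestamp, then 74 random bits."""
    if ts_ms is None:
        ts_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits, 74 are kept
    rand_a = rand >> 68                   # top 12 bits, placed after the version nibble
    rand_b = rand & ((1 << 62) - 1)       # low 62 bits, placed after the variant bits
    value = (ts_ms & ((1 << 48) - 1)) << 80
    value |= 0x7 << 76                    # version nibble
    value |= rand_a << 64
    value |= 0b10 << 62                   # RFC variant bits
    value |= rand_b
    return uuid.UUID(int=value)


def append_fraction(keys):
    """Fraction of inserts that land at the right-hand end of a sorted list."""
    ordered, appends = [], 0
    for key in keys:
        pos = bisect.bisect_right(ordered, key)
        appends += pos == len(ordered)
        ordered.insert(pos, key)
    return appends / len(keys)


# One key per millisecond, using explicit timestamps so the run is repeatable.
base_ms = 1_700_000_000_000
v7_keys = [uuid7_sketch(base_ms + i).bytes for i in range(2000)]
v4_keys = [uuid.uuid4().bytes for _ in range(2000)]

v7_fraction = append_fraction(v7_keys)
v4_fraction = append_fraction(v4_keys)
print(f"v7 appends: {v7_fraction:.1%}, v4 appends: {v4_fraction:.1%}")
```

With one key per millisecond every v7 insert is an append, while v4 inserts land almost anywhere. Keys created inside the same millisecond are ordered only by their random tail in this sketch, which is the "within the resolution of a millisecond" caveat in practice.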
Independent benchmarks on PostgreSQL — published in 2024 and 2025 — quantify the gap: inserting 50 million rows with v4 primary keys took roughly 20 minutes, while v7 finished the same load in about 1 minute 46 seconds. The v4 index ended up 24% larger and took eleven times longer to rebuild from scratch. Leaf-page density landed at 89.98% for v7 versus 71% for v4. PostgreSQL 18, released in 2025, ships a native uuidv7() generator function partly because of these numbers.
The effect compounds on workloads that read what they just wrote. A logging table, an outbox pattern, or a "show me the last 100 events" query all want the recent rows close together in the index. With v4 those rows are scattered across the index by the random high bits; with v7 they live in the same page, often still warm in shared buffers from the insert that put them there a few milliseconds earlier.
Anatomy of a UUID
The five segments of the canonical form aren't arbitrary. Each one carries different bits depending on the version:
xxxxxxxx - xxxx - Mxxx - Nxxx - xxxxxxxxxxxx
|------|   |--|   |--|   |--|   |----------|
 seg 1     seg 2  seg 3  seg 4     seg 5
 8 hex     4 hex  4 hex  4 hex     12 hex
                  ^
                  | M = version nibble (1, 4, 7…)
                         ^
                         | N = variant high bits
                         | 8/9/a/b = RFC variant
For v4, segments 1–2–5 plus the tail of 3–4 are all random. For v1, segment 1 is time_low, segment 2 is time_mid, segment 3 carries the version nibble and the high 12 bits of the timestamp, segment 4 holds the variant and a 14-bit clock sequence, and segment 5 is the 48-bit node. For v7, segments 1–2 hold the 32 + 16 = 48 bits of the Unix millisecond timestamp; segment 3 packs the version nibble and the first 12 bits of randomness; segment 4 holds the variant and the top of the remaining 62-bit random tail; segment 5 is the rest of that tail.
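The segment layout can be read back out of any UUID string. The following is a small sketch, with `describe` being a helper name invented for this example; it pulls the version nibble from segment 3, the variant nibble from segment 4, and, for v7 only, the millisecond timestamp from segments 1 and 2.

```python
import uuid
from datetime import datetime, timezone


def describe(text):
    """Return the version, variant nibble and (for v7) the embedded timestamp."""
    segments = str(uuid.UUID(text)).split("-")  # normalises case, validates length
    info = {
        "version": int(segments[2][0], 16),     # M: first hex digit of segment 3
        "variant_nibble": segments[3][0],       # N: first hex digit of segment 4
    }
    if info["version"] == 7:
        # Segments 1 and 2 together are the 48-bit Unix millisecond timestamp.
        unix_ms = int(segments[0] + segments[1], 16)
        info["unix_ms"] = unix_ms
        info["time"] = datetime.fromtimestamp(unix_ms / 1000, tz=timezone.utc)
    return info


v4_info = describe("550e8400-e29b-41d4-a716-446655440000")
v7_info = describe("017f22e2-79b0-7cc3-98c4-dc0c0c07398f")
print(v4_info)
print(v7_info)
```

The second call shows the privacy trade-off of v7 as well: anyone holding the identifier can recover when it was created.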
Collision probability
A v4 UUID has 122 random bits, about 5.3 × 10^36 distinct values. The birthday bound says you'd need to generate roughly 2.71 × 10^18 UUIDs before there's a 50% chance of a collision. Practical phrasing: if you generated one billion v4 UUIDs per second, you'd hit that 50% mark after 86 years. For most engineering decisions, that's the definition of "safe".
v7 has fewer random bits per identifier (74 instead of 122), but the timestamp prefix partitions the space: two v7 UUIDs only collide if they share the millisecond AND collide in the 74-bit tail. At any realistic per-millisecond generation rate, the practical safety margin is comparable.
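The figures in this section can be checked with the usual birthday approximation, n ≈ sqrt(2 · N · ln(1 / (1 − p))) for N possible values and collision probability p. The sketch below uses that approximation (the function name `birthday_n` is made up for this example) and applies it to the 122-bit v4 space and to the 74-bit v7 tail within a single millisecond.

```python
import math


def birthday_n(bits, p=0.5):
    """Approximate number of random draws from 2**bits values
    needed to reach collision probability p."""
    return math.sqrt(2 * (2 ** bits) * math.log(1 / (1 - p)))


v4_space = 2 ** 122                      # distinct v4 values
v4_n = birthday_n(122)                   # draws for a 50% collision chance
seconds_per_year = 365.25 * 24 * 3600
years = v4_n / 1e9 / seconds_per_year    # at one billion UUIDs per second

v7_per_ms = birthday_n(74)               # draws inside ONE millisecond for 50%

print(f"v4 space:         {v4_space:.2e}")
print(f"v4 50% threshold: {v4_n:.2e} UUIDs, about {years:.0f} years at 1e9/s")
print(f"v7 50% threshold: {v7_per_ms:.2e} UUIDs within the same millisecond")
```

The v7 number is the one to keep in mind: a collision needs on the order of 10^11 identifiers generated inside the same millisecond, which is far beyond what a single system produces.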
When NOT to use UUIDs
- Public-facing URLs. A 36-character identifier in the address bar is ugly and discouraging to share. Consider nanoid (21 characters of URL-safe random) or a slug derived from content.
- Human-readable references. "Order #12847" beats "Order 550e8400-e29b-41d4-a716-446655440000" at the help desk. Auto-increment surrogate keys still have a place.
- Tightly-packed compound keys. 16 bytes is a lot when you have 600 million rows and want the entire index in shared buffers. Compact alternatives (8-byte Snowflake IDs, for example) can be worth the coordination cost.
- "Anonymous" identifiers that aren't. If you accidentally embed information that lets an attacker enumerate accounts, the UUID's randomness saves you nothing. Random doesn't equal private.
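For the URL case in the first bullet, a shorter random identifier is easy to sketch with the standard library. This is not the nanoid package itself, only an illustration of the same idea under the assumption of a 64-symbol URL-safe alphabet; `short_id` and `ALPHABET` are names invented here.

```python
import secrets
import string

# 64 URL-safe symbols, so each character carries 6 bits of randomness.
ALPHABET = string.ascii_letters + string.digits + "_-"


def short_id(length=21):
    """Return a random URL-safe string; 21 characters is about 126 bits."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


example = short_id()
print(example, len(example))
```

At 21 characters the identifier carries roughly as much randomness as a v4 UUID while being 15 characters shorter than the canonical 36-character form.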

