Data integrity isn’t just theory—it’s the difference between a cheeseburger and a milkshake full of onions.
Intro
Picture yourself running a classic diner. Burgers sizzling on the grill, fries crisping in the fryer, and milkshakes whirring in the blender—beef, onions, vanilla, and lettuce all in their rightful place.
Now imagine your inventory app glitches. One day, a customer orders a vanilla milkshake… and gets a tall glass of onion-flavored chaos.
You’ve just witnessed foreign key confusion in action—a silent saboteur more common in databases than most developers or DBAs care to admit. In this post, we’ll explore how to keep your diner (and your data) sane by:
- Normalizing the right way
- Exposing the pitfalls of integer pseudokeys
- Showing how GUIDs banish entire classes of bugs
- Proving why BLAKE3-to-GUID deterministic IDs make distributed systems safer and simpler to merge
1. Normalization: Where Most Developers Begin
A well-run diner starts with a clean inventory. But clean doesn’t mean safe—not yet.
Here’s how most developers (and many ORM tools) would model a menu:
CREATE TABLE ingredients (
id INT PRIMARY KEY,
name TEXT UNIQUE
);
CREATE TABLE menu_items (
id INT PRIMARY KEY,
name TEXT UNIQUE
);
CREATE TABLE menu_ingredients (
id INT PRIMARY KEY,
menu_item_id INT NOT NULL,
ingredient_id INT NOT NULL,
FOREIGN KEY (menu_item_id) REFERENCES menu_items(id),
FOREIGN KEY (ingredient_id) REFERENCES ingredients(id)
);
At first glance, this seems normal:
- Every row has a numeric
id
- Names are unique
- Foreign keys link through
id
But this design contains a silent flaw that grows over time. The id
field is an opaque pseudokey—a meaningless number that doesn’t reflect the real-world identity of the row. Meanwhile, the column that does represent identity (name
) is shoved off to the side as just a UNIQUE
constraint.
❌ What’s wrong with this picture?
- “Onion” might be
id = 1
in staging andid = 42
in production. - Merges between environments or systems will silently break foreign keys.
- The database will let you create bad relationships—like linking “milkshake” to “onion”—because integer
id
s validate but don’t mean anything. - Most query bugs don’t show up until it’s too late—after the data is already corrupted.
🧠 Side Note: Foreign keys require uniqueness
Most (if not all) SQL databases require a UNIQUE
or PRIMARY KEY
constraint on the column being referenced in a foreign key. So even if you’re pointing to a name
column, you must ensure it’s constrained as UNIQUE
or PRIMARY KEY
.
This means you can make name
the primary key—and you should, if it represents the real-world uniqueness of the row.
✅ What we want instead
CREATE TABLE ingredients (
id UUID UNIQUE,
name TEXT PRIMARY KEY
);
CREATE TABLE menu_items (
id UUID UNIQUE,
name TEXT PRIMARY KEY
);
CREATE TABLE menu_ingredients (
id UUID UNIQUE,
menu_item_id UUID NOT NULL,
ingredient_id UUID NOT NULL,
PRIMARY KEY (menu_item_id, ingredient_id),
FOREIGN KEY (menu_item_id) REFERENCES menu_items(id),
FOREIGN KEY (ingredient_id) REFERENCES ingredients(id)
);
Now we’ve done three important things:
name
is the primary key—the real reason each ingredient or menu item exists.id
is always present—a consistent, deterministic UUID across all tables, including join tables. Even if not used as the primary key, it gives us a single-column system identifier that’s useful for logs, referencing, UI editing, or future use cases.menu_ingredients
uses a composite primary key of(menu_item_id, ingredient_id)
—because that’s the reason that row exists—but still includes a uniqueid
for operational consistency.
This keeps your schema relationally correct, semantically clear, and operationally consistent.
2. Column Swap: The Onion Milkshake Disaster
A developer fat-fingers an insert:
-- Meant: (burger, onion). Actually: (milkshake, onion)
INSERT INTO menu_ingredients (menu_item_id, ingredient_id)
VALUES (2, 1);
Later, you run:
SELECT i.name
FROM menu_items m
JOIN menu_ingredients mi ON m.id = mi.menu_item_id
JOIN ingredients i ON i.id = mi.ingredient_id
WHERE m.name = 'Milkshake';
Result: Onion.
No error, no alarm—just a milkshake that drives customers out. Integer keys are too forgiving. They validate—but they don’t mean anything.
3. Deterministic GUIDs: Meaningful Identity at Scale
Instead of random IDs or numeric sequences, we can generate deterministic GUIDs from input data using a hashing algorithm, like SHA-256 or SHA-384. We choose BLAKE3, for speed.
Example in code:
string key = "ingredient:onion";
var hasher = Blake3.Hasher.New();
hasher.Update(Encoding.UTF8.GetBytes(key));
byte[] hash = hasher.Finalize().AsSpan(0, 16).ToArray();
Guid id = new Guid(hash);
This lets us create stable identifiers—independent of environment, without coordination.
Now every “Onion” entry across systems gets the same deterministic UUID. While name
is the logical identity, id
is the structural key—used to link data across tables and systems. Since it’s deterministic and derived from meaningful input, it guarantees consistency. And because it’s unique and non-null, it’s safe for use in foreign keys and join operations—where ambiguity is unacceptable.
🤔 Sidebar: “But won’t GUIDs ruin my clustered index?”
This comes up often—but it’s a misunderstanding. By default, PRIMARY KEY
is also the clustered index, which determines how rows are stored. But you control that.
Since we use name
as the PRIMARY KEY
, that’s your clustered index, and that is what the business cares about. The id
column—your deterministic UUID—is UNIQUE
, but not the storage key. Its job is to enable safe joins and relational stability, not to dictate row order.
So, no, deterministic GUIDs won’t fragment your disk layout. Their purpose isn’t to dictate sort order, but to act as durable structural join keys. And if you need to control clustering directly, just declare it explicitly:
CREATE TABLE ingredients (
id UUID UNIQUE,
name TEXT PRIMARY KEY
-- name becomes clustered index
);
Bottom line: the “GUIDs wreck performance” argument only applies when people use NEWID()
as a clustered primary key. We’re not doing that. We’re designing with intent.
4. Failure by Design: Insert Crashes That Save Your Data
One of the best parts of using deterministic GUIDs is that your database will actually reject invalid relationships—on purpose.
Let’s say we hash “ingredient:onion” and “menu:milkshake” into GUIDs:
-- These rows don't exist yet in the parents
INSERT INTO menu_ingredients (menu_item_id, ingredient_id)
VALUES ('8fd80e7a-918f-4260-a0c1-04c68cb55fa4',
'eb9e9f1c-62b3-4aa2-b13b-d6d3a0c1164c');
💥 Result: Foreign key violation.
That’s not a bug. That’s your schema screaming at you:
“You tried to link a menu item or ingredient that isn’t defined. Fix your insert order or add the missing data.”
This is impossible to detect with integer keys unless you have a deep join and a data validation process. With deterministic identity, it’s built-in.
So instead of silent corruption, you get loud protection.
5. Sanity Through Structure
We’ve all tasted the chaos of treating IDs as magic numbers. Milkshakes turn onion-y, customers storm out, and data rots into garbage. But with:
- A normalized schema to keep things clean
- Real-world fields like
name
as primary keys - Deterministic UUIDs via BLAKE3 to safely link and merge data across systems
- And explicit foreign key enforcement that actually works
…you build a diner—and a database—that scales, heals, and always serves what’s on the menu.
“The moment you stop trusting that
1
means what you think it means… is the moment your schema becomes trustworthy.”🧩 Caveats & Considerations
This approach isn’t without tradeoffs. While deterministic IDs via BLAKE3 offer powerful guarantees — immutability, deduplication, and structural integrity — they also come with some caution signs.
Most importantly: determinism can leak information. If your input data is predictable (usernames, emails, slugs, etc.), then the resulting hash could be reverse-engineered or precomputed by an attacker. In public or semi-public systems, this can reveal sensitive associations or allow for enumeration of records.
In these contexts, it’s worth considering:
Using a keyed variant (e.g., HMAC-BLAKE3) to obscure the input without losing determinism.
Scoping IDs with namespaces or tenant-specific prefixes to limit exposure.
Or simply reserving this technique for internal systems where threat models are more controlled.
Design is about context and constraints. This pattern isn’t a universal answer — but in the right hands and the right systems, it’s a scalpel, not a hammer.