March 2025 – Pengdows

March 28, 2025

Onions in Milkshakes: A Case for GUIDs and Deterministic IDs with BLAKE3

Data integrity isn’t just theory—it’s the difference between a cheeseburger and a milkshake full of onions.

Intro

Picture yourself running a classic diner. Burgers sizzling on the grill, fries crisping in the fryer, and milkshakes whirring in the blender—beef, onions, vanilla, and lettuce all in their rightful place.

Now imagine your inventory app glitches. One day, a customer orders a vanilla milkshake… and gets a tall glass of onion-flavored chaos.

You’ve just witnessed foreign key confusion in action—a silent saboteur more common in databases than most developers or DBAs care to admit. In this post, we’ll explore how to keep your diner (and your data) sane by:

Normalizing the right way
Exposing the pitfalls of integer pseudokeys
Showing how GUIDs banish entire classes of bugs
Proving why BLAKE3-to-GUID deterministic IDs make distributed systems safer and simpler to merge

1. Normalization: Where Most Developers Begin

A well-run diner starts with a clean inventory. But clean doesn’t mean safe—not yet.

Here’s how most developers (and many ORM tools) would model a menu:

CREATE TABLE ingredients (
    id INT PRIMARY KEY,
    name TEXT UNIQUE
);

CREATE TABLE menu_items (
    id INT PRIMARY KEY,
    name TEXT UNIQUE
);

CREATE TABLE menu_ingredients (
    id INT PRIMARY KEY,
    menu_item_id INT NOT NULL,
    ingredient_id INT NOT NULL,
    FOREIGN KEY (menu_item_id) REFERENCES menu_items(id),
    FOREIGN KEY (ingredient_id) REFERENCES ingredients(id)
);

At first glance, this seems normal:

Every row has a numeric id
Names are unique
Foreign keys link through id

But this design contains a silent flaw that grows over time. The id field is an opaque pseudokey—a meaningless number that doesn’t reflect the real-world identity of the row. Meanwhile, the column that does represent identity (name) is shoved off to the side as just a UNIQUE constraint.

❌ What’s wrong with this picture?

“Onion” might be id = 1 in staging and id = 42 in production.
Merges between environments or systems will silently break foreign keys.
The database will let you create bad relationships—like linking “milkshake” to “onion”—because integer ids validate but don’t mean anything.
Bugs like these rarely show up until it’s too late, after the data is already corrupted.
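Here is a minimal sketch of the first bullet. It uses Python’s built-in sqlite3 module as a stand-in for your real database. Two in-memory databases play staging and production. Each loads the same ingredients in a different order and lets the database assign the integer ids. The helper names are illustrative.

```python
import sqlite3


def load_ingredients(names):
    """Create an in-memory database and insert ingredients in the given order."""
    db = sqlite3.connect(":memory:")
    # INTEGER PRIMARY KEY lets SQLite assign ids 1, 2, 3, ... in insertion order.
    db.execute("CREATE TABLE ingredients (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
    db.executemany("INSERT INTO ingredients (name) VALUES (?)", [(n,) for n in names])
    return db


def id_of(db, name):
    """Look up the integer id that a given database assigned to an ingredient."""
    return db.execute("SELECT id FROM ingredients WHERE name = ?", (name,)).fetchone()[0]


staging = load_ingredients(["Onion", "Beef", "Lettuce"])
production = load_ingredients(["Beef", "Lettuce", "Vanilla", "Onion"])

print(id_of(staging, "Onion"), id_of(production, "Onion"))  # 1 4
print(id_of(production, "Beef"))                             # 1
```

Now copy a staging row with ingredient_id = 1 into production. It quietly points at Beef instead of Onion, and nothing in the integer tells you it’s wrong.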

🧠 Side Note: Foreign keys require uniqueness

The SQL standard requires the referenced column(s) of a foreign key to carry a PRIMARY KEY or UNIQUE constraint, and most databases enforce this. So even if you’re pointing to a name column, it must be constrained as UNIQUE or PRIMARY KEY.

This means you can make name the primary key—and you should, if it represents the real-world uniqueness of the row.

✅ What we want instead

CREATE TABLE ingredients (
    id UUID NOT NULL UNIQUE,
    name TEXT PRIMARY KEY
);

CREATE TABLE menu_items (
    id UUID NOT NULL UNIQUE,
    name TEXT PRIMARY KEY
);

CREATE TABLE menu_ingredients (
    id UUID UNIQUE,
    menu_item_id UUID NOT NULL,
    ingredient_id UUID NOT NULL,
    PRIMARY KEY (menu_item_id, ingredient_id),
    FOREIGN KEY (menu_item_id) REFERENCES menu_items(id),
    FOREIGN KEY (ingredient_id) REFERENCES ingredients(id)
);

Now we’ve done three important things:

name is the primary key—the real reason each ingredient or menu item exists.
id is always present—a consistent, deterministic UUID across all tables, including join tables. Even if not used as the primary key, it gives us a single-column system identifier that’s useful for logs, referencing, UI editing, or future use cases.
menu_ingredients uses a composite primary key of (menu_item_id, ingredient_id)—because that’s the reason that row exists—but still includes a unique id for operational consistency.

This keeps your schema relationally correct, semantically clear, and operationally consistent.
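To see what the composite key buys you, here is a sketch of the schema above in SQLite, driven from Python’s sqlite3 module. It rests on a few assumptions. SQLite has no native UUID type, so the ids are stored as TEXT. Random uuid4() values stand in for the deterministic ids covered below, which is enough because only the composite key is being exercised. Finally, SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3
import uuid

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
db.executescript("""
CREATE TABLE ingredients (id TEXT NOT NULL UNIQUE, name TEXT PRIMARY KEY);
CREATE TABLE menu_items  (id TEXT NOT NULL UNIQUE, name TEXT PRIMARY KEY);
CREATE TABLE menu_ingredients (
    id            TEXT UNIQUE,
    menu_item_id  TEXT NOT NULL REFERENCES menu_items(id),
    ingredient_id TEXT NOT NULL REFERENCES ingredients(id),
    PRIMARY KEY (menu_item_id, ingredient_id)
);
""")

burger, onion = str(uuid.uuid4()), str(uuid.uuid4())  # random ids suffice for this demo
db.execute("INSERT INTO menu_items VALUES (?, 'Burger')", (burger,))
db.execute("INSERT INTO ingredients VALUES (?, 'Onion')", (onion,))

link = "INSERT INTO menu_ingredients VALUES (?, ?, ?)"
db.execute(link, (str(uuid.uuid4()), burger, onion))  # first link: accepted
try:
    db.execute(link, (str(uuid.uuid4()), burger, onion))  # same pair, new row id
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```

The section 1 schema uses a lone integer id as the primary key. There, the same duplicate pair would be accepted, as long as it got a fresh id.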

2. Column Swap: The Onion Milkshake Disaster

A developer fat-fingers an insert:

-- Meant (Burger, Onion) = (1, 1). Typed (2, 1), and menu item 2 is Milkshake.
INSERT INTO menu_ingredients (id, menu_item_id, ingredient_id)
VALUES (10, 2, 1);  -- 10 is just the next free row id

Later, you run:

SELECT i.name
FROM menu_items m
JOIN menu_ingredients mi ON m.id = mi.menu_item_id
JOIN ingredients i ON i.id = mi.ingredient_id
WHERE m.name = 'Milkshake';

Result: Onion.

No error, no alarm—just a milkshake that drives customers out. Integer keys are too forgiving. They validate—but they don’t mean anything.
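You can replay the same accident end to end. This minimal sketch again assumes SQLite through Python’s sqlite3 module, with foreign-key enforcement switched on. It uses the integer schema from section 1, and the ids are chosen to match the insert above: Burger = 1, Milkshake = 2, Onion = 1.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # enforce foreign keys (off by default in SQLite)
db.executescript("""
CREATE TABLE ingredients (id INT PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE menu_items  (id INT PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE menu_ingredients (
    id            INT PRIMARY KEY,
    menu_item_id  INT NOT NULL REFERENCES menu_items(id),
    ingredient_id INT NOT NULL REFERENCES ingredients(id)
);
INSERT INTO menu_items  VALUES (1, 'Burger'), (2, 'Milkshake');
INSERT INTO ingredients VALUES (1, 'Onion'), (2, 'Vanilla');
""")

# Meant (Burger, Onion) = (1, 1). The typo 2 is a valid menu item, so the FK check passes.
db.execute("INSERT INTO menu_ingredients VALUES (10, 2, 1)")

rows = db.execute("""
    SELECT i.name
    FROM menu_items m
    JOIN menu_ingredients mi ON m.id = mi.menu_item_id
    JOIN ingredients i ON i.id = mi.ingredient_id
    WHERE m.name = 'Milkshake'
""").fetchall()
print(rows)  # [('Onion',)]
```

Foreign keys are on, yet the database accepts the row. Both integers point at real rows, so every constraint is satisfied.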

3. Deterministic GUIDs: Meaningful Identity at Scale

Instead of random IDs or numeric sequences, we can generate deterministic GUIDs from input data with a hash function such as SHA-256 or SHA-384, keeping the first 128 bits (a GUID’s size). We use BLAKE3 for its speed.

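Example in C# (using the Blake3 NuGet package):

using System;
using System.Text;
using Blake3;                                      // Blake3 NuGet package
string key = "ingredient:onion";                   // namespace prefix + natural key
using var hasher = Hasher.New();                   // disposed at end of scope
hasher.Update(Encoding.UTF8.GetBytes(key));
Hash hash = hasher.Finalize();                     // 32-byte BLAKE3 digest
// Keep the first 16 bytes (a GUID's size). new Guid(...) stores the first three
// fields little-endian, so other languages must use the same layout to match.
Guid id = new Guid(hash.AsSpan().Slice(0, 16));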

This lets us create stable identifiers—independent of environment, without coordination.

Now every “Onion” entry across systems gets the same deterministic UUID. While name is the logical identity, id is the structural key—used to link data across tables and systems. Since it’s deterministic and derived from meaningful input, it guarantees consistency. And because it’s unique and non-null, it’s safe for use in foreign keys and join operations—where ambiguity is unacceptable.
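The same idea fits in a few lines of Python. The standard library has no BLAKE3, so this minimal sketch swaps in `hashlib.blake2b` with a 16-byte digest. Treat it as an illustration of deterministic, coordination-free ids, not as a drop-in port of the C# snippet. With the third-party `blake3` package you would take `blake3(data).digest()[:16]` instead. The helper name `deterministic_id` is hypothetical.

```python
import hashlib
import uuid


def deterministic_id(kind: str, name: str) -> uuid.UUID:
    """Derive a stable 128-bit id from a namespaced natural key.

    Stand-in: BLAKE2b (Python stdlib) replaces BLAKE3, which the stdlib lacks.
    """
    data = f"{kind}:{name}".encode("utf-8")           # e.g. "ingredient:onion"
    digest = hashlib.blake2b(data, digest_size=16).digest()
    # bytes_le reads the first three fields little-endian, the same layout
    # .NET's new Guid(byte[]) uses, so both sides print the same string.
    return uuid.UUID(bytes_le=digest)


a = deterministic_id("ingredient", "onion")
b = deterministic_id("ingredient", "onion")
c = deterministic_id("menu", "onion")
print(a == b, a == c)  # True False
```

Whatever canonical form you pick for the input (the `ingredient:` prefix, letter case, whitespace) has to be identical on every system. `ingredient:Onion` and `ingredient:onion` hash to different ids.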

🤔 Sidebar: “But won’t GUIDs ruin my clustered index?”

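This comes up often, but it’s a misunderstanding. In SQL Server (by default) and MySQL’s InnoDB engine, the PRIMARY KEY is also the clustered index, which determines how rows are stored. PostgreSQL doesn’t keep tables clustered at all. Either way, you choose which column plays that role.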

Since we use name as the PRIMARY KEY, that’s your clustered index, and that is what the business cares about. The id column—your deterministic UUID—is UNIQUE, but not the storage key. Its job is to enable safe joins and relational stability, not to dictate row order.

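So, no, deterministic GUIDs won’t dictate your table’s physical order. The UNIQUE index on id still receives values in effectively random order, but that’s a secondary index, not the table itself. Their purpose isn’t sort order. They serve as durable structural join keys. And if you need to control clustering directly, declare it explicitly. In SQL Server, for example:

CREATE TABLE ingredients (
    id   UNIQUEIDENTIFIER NOT NULL UNIQUE NONCLUSTERED,
    name NVARCHAR(100)    NOT NULL PRIMARY KEY CLUSTERED
    -- name becomes the clustered index (TEXT can't be an index key in SQL Server; length is illustrative)
);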

Bottom line: the “GUIDs wreck performance” argument applies when random-looking GUIDs are used as the clustered primary key. That covers NEWID() values, and hash-derived ones like ours. We’re not doing that. We’re designing with intent.

4. Failure by Design: Insert Crashes That Save Your Data

One of the best parts of using deterministic GUIDs is that your database will actually reject invalid relationships—on purpose.

Let’s say we hash “ingredient:onion” and “menu:milkshake” into GUIDs:

-- These rows don't exist yet in the parents
INSERT INTO menu_ingredients (menu_item_id, ingredient_id)
VALUES ('8fd80e7a-918f-4260-a0c1-04c68cb55fa4', 
        'eb9e9f1c-62b3-4aa2-b13b-d6d3a0c1164c');

💥 Result: Foreign key violation.

That’s not a bug. That’s your schema screaming at you:

“You tried to link a menu item or ingredient that isn’t defined. Fix your insert order or add the missing data.”

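With integer keys, the database only catches numbers that don’t exist at all. A wrong number that happens to belong to another row (like Milkshake’s 2) passes every check, and finding it takes a deep join and a data validation process. With deterministic identity, the id is computed from the name. A misspelled or undefined name therefore yields an id that no parent row has, and the foreign key rejects it on the spot.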

So instead of silent corruption, you get loud protection.
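Here is that failure reproduced as a minimal sketch. It assumes SQLite via Python’s sqlite3 module, with TEXT in place of UUID. It also reuses the BLAKE2b stand-in for BLAKE3, because Python’s standard library has no BLAKE3. The helpers `deterministic_id`, `add` and `link` are hypothetical. Links are made by name, so the ids are computed rather than looked up.

```python
import hashlib
import sqlite3
import uuid


def deterministic_id(kind: str, name: str) -> str:
    # Hypothetical helper; BLAKE2b stands in for BLAKE3.
    digest = hashlib.blake2b(f"{kind}:{name}".encode("utf-8"), digest_size=16).digest()
    return str(uuid.UUID(bytes_le=digest))


db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
db.executescript("""
CREATE TABLE ingredients (id TEXT NOT NULL UNIQUE, name TEXT PRIMARY KEY);
CREATE TABLE menu_items  (id TEXT NOT NULL UNIQUE, name TEXT PRIMARY KEY);
CREATE TABLE menu_ingredients (
    id            TEXT UNIQUE,
    menu_item_id  TEXT NOT NULL REFERENCES menu_items(id),
    ingredient_id TEXT NOT NULL REFERENCES ingredients(id),
    PRIMARY KEY (menu_item_id, ingredient_id)
);
""")


def add(table: str, kind: str, name: str) -> None:
    # table is a trusted constant here, never user input
    db.execute(f"INSERT INTO {table} (id, name) VALUES (?, ?)", (deterministic_id(kind, name), name))


def link(menu_name: str, ingredient_name: str) -> None:
    """Link by name: every id, including the join row's own, is computed."""
    db.execute(
        "INSERT INTO menu_ingredients VALUES (?, ?, ?)",
        (
            deterministic_id("menu_ingredient", f"{menu_name}:{ingredient_name}"),
            deterministic_id("menu", menu_name),
            deterministic_id("ingredient", ingredient_name),
        ),
    )


add("menu_items", "menu", "burger")
add("ingredients", "ingredient", "onion")
link("burger", "onion")  # both parents exist: accepted

try:
    link("burger", "onoin")  # typo: no ingredient row has this id
    outcome = "accepted"
except sqlite3.IntegrityError as exc:
    outcome = str(exc)
print(outcome)  # FOREIGN KEY constraint failed
```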

5. Sanity Through Structure

We’ve all tasted the chaos of treating IDs as magic numbers. Milkshakes turn onion-y, customers storm out, and data rots into garbage. But with:

A normalized schema to keep things clean
Real-world fields like name as primary keys
Deterministic UUIDs via BLAKE3 to safely link and merge data across systems
And explicit foreign key enforcement that actually works

…you build a diner—and a database—that scales, heals, and always serves what’s on the menu.

“The moment you stop trusting that 1 means what you think it means… is the moment your schema becomes trustworthy.”

🧩 Caveats & Considerations

This approach isn’t without tradeoffs. While deterministic IDs via BLAKE3 offer powerful guarantees — immutability, deduplication, and structural integrity — they also come with some caution signs.

Most importantly: determinism can leak information. If your input data is predictable (usernames, emails, slugs, etc.), an attacker can hash likely inputs and match the results against your ids. Nothing has to be reversed. In public or semi-public systems, this can reveal sensitive associations or allow for enumeration of records.

In these contexts, it’s worth considering:

Using a keyed hash (e.g., BLAKE3’s built-in keyed mode, or HMAC), so ids stay deterministic for anyone holding the key but can’t be precomputed without it.

Scoping IDs with namespaces or tenant-specific prefixes to limit exposure.

Or simply reserving this technique for internal systems where threat models are more controlled.
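This minimal sketch combines the first two ideas: a keyed hash over a tenant-scoped input. Python’s standard library has no BLAKE3, so BLAKE2b’s built-in keyed mode (`hashlib.blake2b(..., key=...)`) stands in for BLAKE3’s keyed mode. `SECRET_KEY` and `keyed_id` are hypothetical names, and the key shown is only a placeholder.

```python
import hashlib
import uuid

# Placeholder secret (hypothetical). Load a real one from a secrets manager, never from source.
SECRET_KEY = b"replace-with-32-random-bytes...."


def keyed_id(tenant: str, kind: str, name: str, key: bytes = SECRET_KEY) -> uuid.UUID:
    """Deterministic for key holders; unguessable without the key.

    BLAKE2b keyed mode stands in for BLAKE3 keyed mode (not in Python's stdlib).
    """
    data = f"{tenant}:{kind}:{name}".encode("utf-8")  # tenant prefix scopes the id
    return uuid.UUID(bytes_le=hashlib.blake2b(data, key=key, digest_size=16).digest())


print(keyed_id("tenant-a", "ingredient", "onion") == keyed_id("tenant-a", "ingredient", "onion"))  # True
print(keyed_id("tenant-a", "ingredient", "onion") == keyed_id("tenant-b", "ingredient", "onion"))  # False
```

Without the key, an attacker who guesses “ingredient:onion” still can’t compute the matching id. With it, every system that shares the key derives the same id, so merges keep working.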

Design is about context and constraints. This pattern isn’t a universal answer — but in the right hands and the right systems, it’s a scalpel, not a hammer.

March 15, 2025

The Danger of Email-Only E-Signatures in Real Estate Transactions

There was a time when signing important legal documents required an in-person meeting with a notary—someone who verified your identity, ensured you were signing voluntarily, and applied their seal as a mark of authenticity.

Today, companies like **DocuSign, DotLoop, and similar platforms** have removed that layer of security. They allow **critical documents, including real estate contracts, to be “signed” with nothing more than email validation.**

Let me be blunt—this is not just flawed, it’s dangerous.

The Illusion of Security

These platforms claim to simplify transactions, but in doing so, they sacrifice one of the **core tenets of legal signing: knowing who is actually signing.**

- Validating that someone controls an email address is **not the same** as validating their identity.
- Email accounts can be **hacked, shared, or even spoofed.**
- This allows **fraudulent signings** to go undetected.

You might think, **“Surely this isn’t allowed for high-stakes transactions like real estate.”**

Unfortunately, it is. Despite the **high value and legal importance** of real estate transactions, **many platforms permit email-only validation** with:

- **No verified ID check**
- **No real-time identity confirmation**
- **No notary involved**

Why Notarization Exists

Notarization has always served as a safeguard against fraud. A notary:

- Confirms the signer is **who they claim to be.**
- Verifies that the signer is acting **willingly and knowingly.**
- Provides **legal defensibility** if the signature is challenged later.

Removing notarization **and replacing it with an email check** is an open invitation to fraud. Imagine:

- Someone **gains access to your email**.
- They **digitally “sign” your property away**.
- Because the system relies on **email validation alone**, the fraud **may go undetected**.

This isn’t hypothetical—this is happening. Yet, **the platforms shift liability to users**, while profiting from a system that is **fundamentally flawed**.

Legal Loopholes and Industry Complacency

How is this even legal? The answer lies in outdated legislation.

Thanks to laws like the **E-SIGN Act** and **UETA**, electronic signatures are considered legally binding. However, these laws:

- Are **technology-neutral**, meaning they **don’t mandate identity verification**.
- Hinge on the parties **agreeing to sign electronically**, without requiring proof of **who is actually signing**.

As a result, platforms **default to the easiest, cheapest option**—validating only an email address.

Why This is a Legal Time Bomb

It’s only a matter of time before a **major fraud case** blows this whole system wide open.

Imagine someone challenges a **fraudulent real estate sale**, arguing:

- The signature was **forged using email-only validation**.
- The e-signature platform **didn’t verify the signer’s real identity.**
- The fraud resulted in **financial loss and legal consequences.**

A class-action lawsuit against these platforms could be worth **millions**, especially if courts determine they **failed to protect consumers from fraud**.

The Solution: Verified Identity in E-Signatures

Fixing this is simple—yet the industry resists it because **convenience is prioritized over security**.

A proper solution for real estate transactions would include:

- **Government ID checks** before allowing an e-signature.
- **Biometric verification** (e.g., facial recognition with liveness detection).
- **Remote Online Notarization (RON)** to verify signers in real-time.
- **Immutable audit trails** to ensure legal defensibility.

A Call to Action

If you’re involved in real estate—whether buying, selling, or facilitating—you **should be concerned**.

If you’re a **lawyer looking for the next big case**, take a closer look at **the legal exposure** these platforms are creating.

The integrity of **property rights** should never be compromised for convenience.

It’s time to **demand higher standards** for digital signatures before the inevitable fraud scandals make the news.

March 13, 2025

Why Breaking Down User Stories by Deployable Units is Best Practice in Microservices Architecture

Introduction

In modern software development, particularly within microservices architecture, there’s an ongoing debate about how to best break down user stories. While traditional Agile methods emphasize vertical slicing—cutting through the entire stack (UI, API, and database) to deliver a user-facing feature—this approach can fall apart when applied to complex, distributed systems with independent deployable units.

I’ve encountered resistance when suggesting that, after defining an initial vertical slice, it’s best to break down stories by deployable unit or repository. The concern is that this practice “bucks the industry standard.” However, when considering industry best practices, especially from organizations like Amazon, it becomes clear that breaking stories by deployable units aligns with modern Agile and DevOps principles.

The Nature of Microservices: Why Independent Deployability Matters

Microservices are designed to be independently developed, tested, and deployed. Each service is an autonomous unit responsible for a specific business capability. This architectural style aims to enable rapid, frequent, and safe deployments.

Key characteristics of microservices include:

- Independent Deployments: Each service can be updated and deployed without impacting other services.
- Decentralized Data Management: Each service manages its own database schema and data.
- Autonomous Teams: Teams can work independently on different services, reducing cross-team dependencies.

Forcing teams to combine multiple services into a single, massive vertical slice undermines these benefits, increasing complexity and deployment risks.

Why Breaking Stories by Deployable Unit is the Right Approach

1. Supports the INVEST Principles

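Agile user stories should meet the INVEST criteria (Independent, Negotiable, Valuable, Estimable, Small, Testable). Three of them matter most here: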

- Independent: Each story should be deployable without waiting for other stories.
- Small: Smaller stories reduce risk and accelerate delivery.
- Testable: Each story should be verifiable in isolation.

When stories cross multiple services, they violate independence and smallness, resulting in slower progress and riskier integrations. By breaking down stories by deployable unit (like individual microservices or front-end components), teams ensure faster, safer progress.

2. Respects Modern CI/CD Practices

Industry leaders like Amazon deploy thousands of times per day. This is only possible because:

- Services are independently deployable.
- Stories are small and focused, often aligning with single deployable components.
- Teams use feature toggles to deploy backend functionality before the frontend is ready, reducing integration risks.

Source: DevSkillBuilder
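A feature toggle can be as small as a conditional around the new behavior. This minimal sketch assumes a hypothetical `SHOW_LOYALTY_TIER` flag read from an environment variable and a hypothetical customer payload. Real systems usually read flags from a configuration or flag service instead. The backend ships the new field dark, and flipping the flag exposes it once the frontend is ready.

```python
import os
from dataclasses import dataclass


def flag_enabled(name: str) -> bool:
    """Hypothetical flag lookup: on only if the environment variable equals 'true'."""
    return os.environ.get(name, "false").lower() == "true"


@dataclass
class Customer:
    name: str
    loyalty_tier: str  # new field, deployed before any UI consumes it


def customer_response(customer: Customer) -> dict:
    """Build the API payload; the new field stays hidden until the flag is on."""
    body = {"name": customer.name}
    if flag_enabled("SHOW_LOYALTY_TIER"):
        body["loyalty_tier"] = customer.loyalty_tier
    return body
```

The backend story can merge and deploy with the flag off, and the frontend story turns it on later. Neither release has to wait for the other.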

3. Minimizes Integration Risk

Breaking stories by repo allows each service to be independently tested and deployed. Teams can validate each piece through contract tests and feature flags, avoiding large, risky deployments where a bug in one service can block the entire release.

4. Enables Parallel Work Across Teams

When stories are tied to specific repos, teams can work in parallel. Backend teams, frontend teams, and database teams can all proceed without blocking one another. This is critical in modern DevOps environments that prioritize speed and autonomy.

5. Simplifies Progress Tracking and Risk Management

Large stories that cross services hide complexity and risk. Smaller, repo-specific stories are easier to track, ensuring better visibility for project managers and reducing the likelihood of surprises late in the development cycle.

The Misunderstanding About “Tasks vs. Stories”

Some argue that microservice-specific work should be “just a task” under a larger story. However, this breaks down when considering:

- Tasks aren’t deployable, but microservices are.
- You can’t track testing and readiness effectively if it’s all buried in a single story.
- Deployments become riskier, and you lose the ability to test services independently.

By treating each deployable change (like a new endpoint or schema change) as its own story, you maintain better clarity, accountability, and alignment with CI/CD pipelines.

What About Small Vertical Slices?

Some counter-argue, “Just make the vertical slices smaller, like adding one field.” But even a “small slice” can involve:

- A database schema change (that needs to be backward compatible).
- A backend API update.
- A frontend component update to consume and display the field.
- Tests and validation for each layer.

If you bundle all that into one story, you’re still dealing with a large, cross-repo story that violates independence and smallness. By contrast, breaking it down by repo:

1. Enables independent testing and deployment.
2. Aligns with CI/CD pipelines.
3. Allows for parallel progress.
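The backward-compatible schema change from the “small slice” list is what lets the database story ship first. This minimal sketch assumes SQLite via Python’s sqlite3 module and a hypothetical customers table. Adding a nullable column is purely additive, so the old service version’s inserts keep working while the new API version rolls out.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")


def old_insert(name):
    """Old service version: knows nothing about the new field."""
    db.execute("INSERT INTO customers (name) VALUES (?)", (name,))


# Additive change: a nullable column needs no default, so old writes still succeed.
db.execute("ALTER TABLE customers ADD COLUMN loyalty_tier TEXT")


def new_insert(name, tier):
    """New service version: writes the new field."""
    db.execute("INSERT INTO customers (name, loyalty_tier) VALUES (?, ?)", (name, tier))


old_insert("Ada")
new_insert("Grace", "gold")
rows = db.execute("SELECT name, loyalty_tier FROM customers ORDER BY id").fetchall()
print(rows)  # [('Ada', None), ('Grace', 'gold')]
```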

How Industry Leaders Do It (Like Amazon)

Amazon and other leaders rely on independent, deployable services. Their approach emphasizes:

- Feature Toggles for safe, incremental releases.
- Service-Aligned User Stories where each deployable unit is a story.
- Parallel Development by autonomous teams.
- End-to-End Integration Tests only after individual components are deployed and validated.

This approach ensures that even when deploying thousands of times a day, each deployment is small, safe, and reliable.

Source: TechTarget

Conclusion

Breaking stories by deployable unit in a microservices architecture is not “bucking the industry standard”—it’s adhering to it.

- It respects microservice autonomy.
- It aligns with CI/CD and DevOps best practices.
- It reduces risk and accelerates delivery.
- It maintains the INVEST principles for Agile success.

If you’re developing in a modern, distributed system, forcing massive, cross-repo stories is outdated and risky. The industry has evolved toward small, independent, deployable stories—and for good reason.


March 12, 2025

GNU Tools on Mac via Brew

I constantly use Unix/Linux tools like grep, sed, and awk. macOS includes many of them, but not all, and the ones it has are mostly BSD rather than GNU versions. Years ago I wrote a script to install my favorites, but now things like zsh come preinstalled. So here is an updated version for anyone who might find it useful.

# Essential tools
brew install file-formula  # file command (useful when working with many file types)
brew install git           # Git (usually preinstalled; brew keeps it current)
brew install openssh       # SSH tools (included by default; brew keeps them current)
brew install perl          # Perl
brew install python        # Python 3 (or python@3.x for a specific version)
brew install rsync         # rsync
brew install svn           # Subversion (SVN)
brew install unzip         # unzip (if you need a newer version)
brew install vim           # Vim (newer version; may take precedence over the system Vim)
brew install macvim        # MacVim (if you prefer it over terminal Vim)
brew install binutils      # GNU Binutils (nm, objdump, and friends)
brew install diffutils     # GNU diff and cmp
brew install ed            # GNU ed (handy for scripting)
brew install findutils     # GNU find, more capable than the macOS version
brew install gawk          # GNU AWK
brew install gnu-indent    # GNU indent
brew install gnu-sed       # GNU sed
brew install gnu-tar       # GNU tar
brew install gnu-which     # GNU which
brew install gnutls        # GnuTLS for SSL/TLS support
brew install grep          # GNU grep
brew install gzip          # GNU gzip
brew install screen        # GNU Screen
brew install watch         # watch (repeated execution of a command)
brew install wdiff         # GNU wdiff
brew install wget          # GNU Wget (downloads over HTTP/FTP)
brew install bash          # GNU Bash 5.x (macOS defaults to zsh and ships Bash 3.2)
brew install emacs         # Emacs (alternative to Vim)
brew install gdb           # GDB debugger (Intel Macs only; see `brew info gdb` for setup)
brew install gpatch        # GNU patch
brew install less          # GNU less (pager)
brew install m4            # GNU M4 (macro processor)
brew install make          # GNU Make (latest version)
brew install nano          # GNU nano (if you prefer it over Vim or Emacs)

# Many GNU tools install with a "g" prefix (gsed, gtar, ggrep, gfind) so they don't
# shadow the macOS versions; `brew info <formula>` explains how to put the unprefixed
# "gnubin" names on your PATH.

# Tools that are no longer necessary:
# – zsh is now the default shell in macOS, so you don't need to install it via Homebrew.
# – perl, python, git, and openssh usually come with modern macOS (some via the Xcode
#   Command Line Tools); install them with brew only if you want newer versions.