Onions in Milkshakes: A Case for GUIDs and Deterministic IDs with BLAKE3

Data integrity isn’t just theory—it’s the difference between a cheeseburger and a milkshake full of onions.


Intro

Picture yourself running a classic diner. Burgers sizzling on the grill, fries crisping in the fryer, and milkshakes whirring in the blender—beef, onions, vanilla, and lettuce all in their rightful place.

Now imagine your inventory app glitches. One day, a customer orders a vanilla milkshake… and gets a tall glass of onion-flavored chaos.

You’ve just witnessed foreign key confusion in action—a silent saboteur more common in databases than most developers or DBAs care to admit. In this post, we’ll explore how to keep your diner (and your data) sane by:

  • Normalizing the right way
  • Exposing the pitfalls of integer pseudokeys
  • Showing how GUIDs banish entire classes of bugs
  • Proving why BLAKE3-to-GUID deterministic IDs make distributed systems safer and simpler to merge

1. Normalization: Where Most Developers Begin

A well-run diner starts with a clean inventory. But clean doesn’t mean safe—not yet.

Here’s how most developers (and many ORM tools) would model a menu:

CREATE TABLE ingredients (
    id INT PRIMARY KEY,
    name TEXT UNIQUE
);

CREATE TABLE menu_items (
    id INT PRIMARY KEY,
    name TEXT UNIQUE
);

CREATE TABLE menu_ingredients (
    id INT PRIMARY KEY,
    menu_item_id INT NOT NULL,
    ingredient_id INT NOT NULL,
    FOREIGN KEY (menu_item_id) REFERENCES menu_items(id),
    FOREIGN KEY (ingredient_id) REFERENCES ingredients(id)
);

At first glance, this seems normal:

  • Every row has a numeric id
  • Names are unique
  • Foreign keys link through id

But this design contains a silent flaw that grows over time. The id field is an opaque pseudokey—a meaningless number that doesn’t reflect the real-world identity of the row. Meanwhile, the column that does represent identity (name) is shoved off to the side as just a UNIQUE constraint.


❌ What’s wrong with this picture?

  • “Onion” might be id = 1 in staging and id = 42 in production.
  • Merges between environments or systems will silently break foreign keys.
  • The database will let you create bad relationships—like linking “milkshake” to “onion”—because integer ids validate but don’t mean anything.
  • Most query bugs don’t show up until it’s too late—after the data is already corrupted.

🧠 Side Note: Foreign keys require uniqueness

Most (if not all) SQL databases require a UNIQUE or PRIMARY KEY constraint on the column being referenced in a foreign key. So even if you’re pointing to a name column, you must ensure it’s constrained as UNIQUE or PRIMARY KEY.

This means you can make name the primary key—and you should, if it represents the real-world uniqueness of the row.


✅ What we want instead

CREATE TABLE ingredients (
    id UUID UNIQUE NOT NULL,
    name TEXT PRIMARY KEY
);

CREATE TABLE menu_items (
    id UUID UNIQUE NOT NULL,
    name TEXT PRIMARY KEY
);

CREATE TABLE menu_ingredients (
    id UUID UNIQUE NOT NULL,
    menu_item_id UUID NOT NULL,
    ingredient_id UUID NOT NULL,
    PRIMARY KEY (menu_item_id, ingredient_id),
    FOREIGN KEY (menu_item_id) REFERENCES menu_items(id),
    FOREIGN KEY (ingredient_id) REFERENCES ingredients(id)
);

Now we’ve done three important things:

  1. name is the primary key—the real reason each ingredient or menu item exists.
  2. id is always present—a consistent, deterministic UUID across all tables, including join tables. Even if not used as the primary key, it gives us a single-column system identifier that’s useful for logs, referencing, UI editing, or future use cases.
  3. menu_ingredients uses a composite primary key of (menu_item_id, ingredient_id)—because that’s the reason that row exists—but still includes a unique id for operational consistency.

This keeps your schema relationally correct, semantically clear, and operationally consistent.


2. Column Swap: The Onion Milkshake Disaster

A developer fat-fingers an insert:

-- Meant: (burger, onion). Actually: (milkshake, onion)
INSERT INTO menu_ingredients (menu_item_id, ingredient_id)
VALUES (2, 1);

Later, you run:

SELECT i.name
FROM menu_items m
JOIN menu_ingredients mi ON m.id = mi.menu_item_id
JOIN ingredients i ON i.id = mi.ingredient_id
WHERE m.name = 'Milkshake';

Result: Onion.

No error, no alarm—just a milkshake that drives customers out. Integer keys are too forgiving. They validate—but they don’t mean anything.


3. Deterministic GUIDs: Meaningful Identity at Scale

Instead of random IDs or numeric sequences, we can generate deterministic GUIDs from input data using a hashing algorithm like SHA-256 or SHA-384. We choose BLAKE3 for speed.

Example in code:

// Requires a BLAKE3 library that exposes Blake3.Hasher (e.g., the Blake3 NuGet package)
using System;
using System.Text;

string key = "ingredient:onion";
var hasher = Blake3.Hasher.New();
hasher.Update(Encoding.UTF8.GetBytes(key));

// Keep the first 16 bytes of the 32-byte BLAKE3 output as the GUID payload
byte[] hash = hasher.Finalize().AsSpan(0, 16).ToArray();
Guid id = new Guid(hash);

This lets us create stable identifiers—independent of environment, without coordination.

Now every “Onion” entry across systems gets the same deterministic UUID. While name is the logical identity, id is the structural key—used to link data across tables and systems. Since it’s deterministic and derived from meaningful input, it guarantees consistency. And because it’s unique and non-null, it’s safe for use in foreign keys and join operations—where ambiguity is unacceptable.
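
To make that concrete, here is a small helper (a sketch reusing the same calls as the example above; the method name and the "type:name" key format are just conventions for illustration) showing that two independent computations of the same key always agree:

using System;
using System.Text;

static Guid DeterministicGuid(string key)
{
    // Hash the key with BLAKE3 and keep the first 16 bytes as the GUID payload
    var hasher = Blake3.Hasher.New();
    hasher.Update(Encoding.UTF8.GetBytes(key));
    return new Guid(hasher.Finalize().AsSpan(0, 16).ToArray());
}

// Staging, production, or a laptop: the same input always yields the same id
Guid a = DeterministicGuid("ingredient:onion");
Guid b = DeterministicGuid("ingredient:onion");
Console.WriteLine(a == b); // True, on any machine, in any environment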


🤔 Sidebar: “But won’t GUIDs ruin my clustered index?”

This comes up often, but it's a misunderstanding. In engines like SQL Server, the PRIMARY KEY is the clustered index by default, and the clustered index determines how rows are physically stored. But you control that.

Since we use name as the PRIMARY KEY, that’s your clustered index, and that is what the business cares about. The id column—your deterministic UUID—is UNIQUE, but not the storage key. Its job is to enable safe joins and relational stability, not to dictate row order.

So, no, deterministic GUIDs won’t fragment your disk layout. Their purpose isn’t to dictate sort order, but to act as durable structural join keys. And if you need to control clustering directly, just declare it explicitly:

CREATE TABLE ingredients (
    id UUID UNIQUE NOT NULL,
    name TEXT PRIMARY KEY
    -- name becomes clustered index
);

Bottom line: the “GUIDs wreck performance” argument only applies when people use NEWID() as a clustered primary key. We’re not doing that. We’re designing with intent.


4. Failure by Design: Insert Crashes That Save Your Data

One of the best parts of using deterministic GUIDs is that your database will actually reject invalid relationships—on purpose.

Let’s say we hash “ingredient:onion” and “menu:milkshake” into GUIDs:

-- These rows don't exist yet in the parents
INSERT INTO menu_ingredients (menu_item_id, ingredient_id)
VALUES ('8fd80e7a-918f-4260-a0c1-04c68cb55fa4', 
        'eb9e9f1c-62b3-4aa2-b13b-d6d3a0c1164c');

💥 Result: Foreign key violation.

That’s not a bug. That’s your schema screaming at you:

“You tried to link a menu item or ingredient that isn’t defined. Fix your insert order or add the missing data.”

With integer keys, a bad link like this usually passes validation, because some row with that id almost always exists; catching it requires a deep join and a separate data-validation process. With deterministic identity, the check is built in.

So instead of silent corruption, you get loud protection.


5. Sanity Through Structure

We’ve all tasted the chaos of treating IDs as magic numbers. Milkshakes turn onion-y, customers storm out, and data rots into garbage. But with:

  • A normalized schema to keep things clean
  • Real-world fields like name as primary keys
  • Deterministic UUIDs via BLAKE3 to safely link and merge data across systems
  • And explicit foreign key enforcement that actually works

…you build a diner—and a database—that scales, heals, and always serves what’s on the menu.

“The moment you stop trusting that 1 means what you think it means… is the moment your schema becomes trustworthy.”

🧩 Caveats & Considerations

This approach isn’t without tradeoffs. While deterministic IDs via BLAKE3 offer powerful guarantees — immutability, deduplication, and structural integrity — they also come with some caution signs.

Most importantly: determinism can leak information. If your input data is predictable (usernames, emails, slugs, etc.), then the resulting hash could be reverse-engineered or precomputed by an attacker. In public or semi-public systems, this can reveal sensitive associations or allow for enumeration of records.

In these contexts, it’s worth considering:

  • Using a keyed variant (e.g., keyed BLAKE3 or HMAC-BLAKE3) to obscure the input without losing determinism (a short sketch follows this list).

  • Scoping IDs with namespaces or tenant-specific prefixes to limit exposure.

  • Or simply reserving this technique for internal systems where threat models are more controlled.
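
If you are using the same Blake3 package as in the example above, BLAKE3's keyed mode is the natural fit; assuming a keyed constructor along the lines of Hasher.NewKeyed (check your library's exact API), a minimal sketch looks like this:

using System;
using System.Text;

// Hypothetical 32-byte secret, kept server-side; never derive it from user data
byte[] secretKey = Encoding.UTF8.GetBytes("0123456789abcdef0123456789abcdef");

string input = "ingredient:onion";

// Keyed BLAKE3: the same determinism for anyone holding the key,
// but the output cannot be precomputed by an outsider
// (assumes the package exposes a keyed constructor; adjust to your library)
var hasher = Blake3.Hasher.NewKeyed(secretKey);
hasher.Update(Encoding.UTF8.GetBytes(input));
Guid id = new Guid(hasher.Finalize().AsSpan(0, 16).ToArray());

Console.WriteLine(id);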

Design is about context and constraints. This pattern isn’t a universal answer — but in the right hands and the right systems, it’s a scalpel, not a hammer.

The Danger of Email-Only E-Signatures in Real Estate Transactions

There was a time when signing important legal documents required an in-person meeting with a notary—someone who verified your identity, ensured you were signing voluntarily, and applied their seal as a mark of authenticity.

Today, companies like **DocuSign, DotLoop, and similar platforms** have removed that layer of security. They allow **critical documents, including real estate contracts, to be “signed” with nothing more than email validation.**

Let me be blunt—this is not just flawed, it’s dangerous.

The Illusion of Security

These platforms claim to simplify transactions, but in doing so, they sacrifice one of the **core tenets of legal signing: knowing who is actually signing.**

    • Validating that someone controls an email address is **not the same** as validating their identity.
    • Email accounts can be **hacked, shared, or even spoofed.**
    • This allows **fraudulent signings** to go undetected.

You might think, **“Surely this isn’t allowed for high-stakes transactions like real estate.”**

Unfortunately, it is. Despite the **high value and legal importance** of real estate transactions, **many platforms permit email-only validation** with:

    • **No verified ID check**
    • **No real-time identity confirmation**
    • **No notary involved**

Why Notarization Exists

Notarization has always served as a safeguard against fraud. A notary:

    • Confirms the signer is **who they claim to be.**
    • Verifies that the signer is acting **willingly and knowingly.**
    • Provides **legal defensibility** if the signature is challenged later.

Removing notarization **and replacing it with an email check** is an open invitation to fraud. Imagine:

    • Someone **gains access to your email**.
    • They **digitally “sign” your property away**.
    • Because the system relies on **email validation alone**, the fraud **may go undetected**.

This isn’t hypothetical—this is happening. Yet, **the platforms shift liability to users**, while profiting from a system that is **fundamentally flawed**.

Legal Loopholes and Industry Complacency

How is this even legal? The answer lies in outdated legislation.

Thanks to laws like the **E-SIGN Act** and **UETA**, electronic signatures are considered legally binding. However, these laws:

    • Are **technology-neutral**, meaning they **don’t mandate identity verification**.
    • Assume that **all parties consent to e-signing**, ignoring the reality of fraud.

As a result, platforms **default to the easiest, cheapest option**—validating only an email address.

Why This is a Legal Time Bomb

It’s only a matter of time before a **major fraud case** blows this whole system wide open.

Imagine someone challenges a **fraudulent real estate sale**, arguing:

    • The signature was **forged using email-only validation**.
    • The e-signature platform **didn’t verify the signer’s real identity.**
    • The fraud resulted in **financial loss and legal consequences.**

A class-action lawsuit against these platforms could be worth **millions**, especially if courts determine they **failed to protect consumers from fraud**.

The Solution: Verified Identity in E-Signatures

Fixing this is simple—yet the industry resists it because **convenience is prioritized over security**.

A proper solution for real estate transactions would include:

    • **Government ID checks** before allowing an e-signature.
    • **Biometric verification** (e.g., facial recognition with liveness detection).
    • **Remote Online Notarization (RON)** to verify signers in real-time.
    • **Immutable audit trails** to ensure legal defensibility.

A Call to Action

If you’re involved in real estate—whether buying, selling, or facilitating—you **should be concerned**.

If you’re a **lawyer looking for the next big case**, take a closer look at **the legal exposure** these platforms are creating.

The integrity of **property rights** should never be compromised for convenience.

It’s time to **demand higher standards** for digital signatures before the inevitable fraud scandals make the news.

Why Breaking Down User Stories by Deployable Units is Best Practice in Microservices Architecture

Introduction

In modern software development, particularly within microservices architecture, there’s an ongoing debate about how to best break down user stories. While traditional Agile methods emphasize vertical slicing—cutting through the entire stack (UI, API, and database) to deliver a user-facing feature—this approach can fall apart when applied to complex, distributed systems with independent deployable units.

I’ve encountered resistance when suggesting that, after defining an initial vertical slice, it’s best to break down stories by deployable unit or repository. The concern is that this practice “bucks the industry standard.” However, when considering industry best practices, especially from organizations like Amazon, it becomes clear that breaking stories by deployable units aligns with modern Agile and DevOps principles.


The Nature of Microservices: Why Independent Deployability Matters

Microservices are designed to be independently developed, tested, and deployed. Each service is an autonomous unit responsible for a specific business capability. This architectural style aims to enable rapid, frequent, and safe deployments.

Key characteristics of microservices include:

    • Independent Deployments: Each service can be updated and deployed without impacting other services.
    • Decentralized Data Management: Each service manages its own database schema and data.
    • Autonomous Teams: Teams can work independently on different services, reducing cross-team dependencies.

Forcing teams to combine multiple services into a single, massive vertical slice undermines these benefits, increasing complexity and deployment risks.


Why Breaking Stories by Deployable Unit is the Right Approach

1. Supports the INVEST Principles

Agile user stories should be:

    • Independent: Each story should be deployable without waiting for other stories.
    • Small: Smaller stories reduce risk and accelerate delivery.
    • Testable: Each story should be verifiable in isolation.

When stories cross multiple services, they violate independence and smallness, resulting in slower progress and riskier integrations. By breaking down stories by deployable unit (like individual microservices or front-end components), teams ensure faster, safer progress.


2. Respects Modern CI/CD Practices

Industry leaders like Amazon deploy thousands of times per day. This is only possible because:

    • Services are independently deployable.
    • Stories are small and focused, often aligning with single deployable components.
    • Teams use feature toggles to deploy backend functionality before the frontend is ready, reducing integration risks.

Source: DevSkillBuilder


3. Minimizes Integration Risk

Breaking stories by repo allows each service to be independently tested and deployed. Teams can validate each piece through contract tests and feature flags, avoiding large, risky deployments where a bug in one service can block the entire release.


4. Enables Parallel Work Across Teams

When stories are tied to specific repos, teams can work in parallel. Backend teams, frontend teams, and database teams can all proceed without blocking one another. This is critical in modern DevOps environments that prioritize speed and autonomy.


5. Simplifies Progress Tracking and Risk Management

Large stories that cross services hide complexity and risk. Smaller, repo-specific stories are easier to track, ensuring better visibility for project managers and reducing the likelihood of surprises late in the development cycle.


The Misunderstanding About “Tasks vs. Stories”

Some argue that microservice-specific work should be “just a task” under a larger story. However, this breaks down when considering:

    • Tasks aren’t deployable, but microservices are.
    • You can’t track testing and readiness effectively if it’s all buried in a single story.
    • Deployments become riskier, and you lose the ability to test services independently.

By treating each deployable change (like a new endpoint or schema change) as its own story, you maintain better clarity, accountability, and alignment with CI/CD pipelines.


What About Small Vertical Slices?

Some counter-argue, “Just make the vertical slices smaller, like adding one field.” But even a “small slice” can involve:

    • A database schema change (that needs to be backward compatible).
    • A backend API update.
    • A frontend component update to consume and display the field.
    • Tests and validation for each layer.

If you bundle all that into one story, you’re still dealing with a large, cross-repo story that violates independence and smallness. By contrast, breaking it down by repo:

    1. Enables independent testing and deployment.
    2. Aligns with CI/CD pipelines.
    3. Allows for parallel progress.

How Industry Leaders Do It (Like Amazon)

Amazon and other leaders rely on independent, deployable services. Their approach emphasizes:

    • Feature Toggles for safe, incremental releases.
    • Service-Aligned User Stories where each deployable unit is a story.
    • Parallel Development by autonomous teams.
    • End-to-End Integration Tests only after individual components are deployed and validated.

This approach ensures that even when deploying thousands of times a day, each deployment is small, safe, and reliable.

Source: TechTarget


Conclusion

Breaking stories by deployable unit in a microservices architecture is not “bucking the industry standard”—it’s adhering to it.

    • It respects microservice autonomy.
    • It aligns with CI/CD and DevOps best practices.
    • It reduces risk and accelerates delivery.
    • It maintains the INVEST principles for Agile success.

If you’re developing in a modern, distributed system, forcing massive, cross-repo stories is outdated and risky. The industry has evolved toward small, independent, deployable stories—and for good reason.


Gnu tools on Mac via Brew

I am always using Unix/Linux tools like grep, sed, awk, and others. macOS has many of them, but not all. Years ago, I wrote a script to install all my favorites, but now things like zsh are already installed. So here is a modernized version for anyone else who might find it useful.


# Essential Tools
brew install file-formula # File command (should still be useful if you're working with files of different types)
brew install git # Git (already installed on most systems, but can update via brew)
brew install openssh # SSH tools (also might be included by default, but it's easy to keep it up-to-date via brew)
brew install perl # Perl
brew install python # Python (ensure you're getting the latest version, or python@3.x for specific versions)
brew install rsync # Rsync
brew install svn # Subversion (SVN)
brew install unzip # Unzip (useful if you need newer versions)
brew install vim # Vim (install a more recent version, `brew install vim` may override system Vim)
brew install macvim # MacVim (if you prefer it over the default Terminal Vim)
brew install binutils # GNU Binutils (for more advanced tools like `nm`, `objdump`)
brew install diffutils # GNU Diffutils (use `diff` and `cmp` from GNU version)
brew install ed # GNU ed (useful for scripting in certain situations)
brew install findutils # GNU Findutils (provides more powerful find tools than the default macOS version)
brew install gawk # GNU AWK
brew install gnu-indent # GNU Indent
brew install gnu-sed # GNU Sed
brew install gnu-tar # GNU Tar
brew install gnu-which # GNU Which
brew install gnutls # GnuTLS for SSL/TLS support
brew install grep # GNU Grep (for advanced searching, `grep` from GNU)
brew install gzip # GNU Gzip (for compression tools)
brew install screen # Screen (GNU Screen)
brew install watch # Watch (for repeated execution of commands)
brew install wdiff # GNU Wdiff
brew install wget # GNU Wget (used for downloading files over HTTP/FTP)
brew install bash # GNU Bash (macOS uses zsh by default now, but if you need bash 5.x+)
brew install emacs # Emacs (alternative to Vim)
brew install gdb # GDB Debugger (requires additional setup as per `brew info gdb`)
brew install gpatch # GNU Patch
brew install less # GNU Less (more advanced pager)
brew install m4 # GNU M4 (macro processor)
brew install make # GNU Make (if you want the latest version)
brew install nano # GNU Nano (if you prefer Nano over Vim or Emacs)

# Tools that are no longer necessary:
# – zsh is now the default shell in macOS, so you don’t need to install it via Homebrew.
# – Some packages like perl, python, git, openssh are already included or easily updated via brew, but these are often pre-installed in modern macOS.
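
One note from experience: many of these GNU formulas install their commands with a "g" prefix (gsed, gfind, ggrep, and so on) so they don't shadow the BSD tools that ship with macOS. Most of them also provide a libexec/gnubin directory containing the unprefixed names; if you want the GNU versions to win, prepend those directories to your PATH. A sketch for ~/.zshrc (check brew info <formula> for the exact paths on your machine):

# Prefer the GNU versions over the BSD tools bundled with macOS
export PATH="$(brew --prefix)/opt/findutils/libexec/gnubin:$PATH"
export PATH="$(brew --prefix)/opt/gnu-sed/libexec/gnubin:$PATH"
export PATH="$(brew --prefix)/opt/grep/libexec/gnubin:$PATH"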

CentOS Issues

Trying to set up CentOS, which isn't my normal Linux, I found a number of issues.

      • Time drifts horribly. I tried installing the standard NTP package, but something about CentOS didn't want to let it work properly. Since CentOS wants to use chrony, I guess I will use chrony.
      • Webmin doesn't correctly authenticate against the normal users.
      • Webmin doesn't allow sudo users.

To fix the time drift: unlike most other *nix systems I have used, CentOS doesn't use the classic NTP daemon; it uses something called "chrony". To fix it, follow these instructions.
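
For reference, the short version on a recent CentOS usually boils down to something like the following (the package and service names are the standard chrony ones; see the linked instructions for anything release-specific):

yum -y install chrony
systemctl enable chronyd
systemctl start chronyd
chronyc tracking    # confirm the clock is actually being disciplined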

To install webmin so that it works as cleanly as debian based distros:

      • sudo -i
      • yum -y update
      • yum -y install perl perl-Net-SSLeay openssl perl-IO-Tty perl-Encode-Detect
      • vi /etc/yum.repos.d/webmin.repo
      • then add the following block to the file

        [Webmin]
        name=Webmin Distribution Neutral
        #baseurl=http://download.webmin.com/download/yum
        mirrorlist=http://download.webmin.com/download/yum/mirrorlist
        enabled=1

      • rpm --import http://www.webmin.com/jcameron-key.asc
      • yum install -y webmin
      • vi /etc/webmin/miniserv.conf
      • add the following line to the file

        sudo=1

      • firewall-cmd --zone=public --add-port=10000/tcp --permanent
      • firewall-cmd --reload
      • chkconfig webmin on
      • service webmin restart

Let's break this down:

“sudo -i” logs us in as root. That is dangerous, but you would need to sudo everything if you don't, so you might as well just log in as root.

yum -y update updates all the software on your machine.

yum -y install perl perl-Net-SSLeay openssl perl-IO-Tty perl-Encode-Detect installs all the prerequisites needed to enable logging in with Unix users. Why these aren't marked as requirements in the RPM, I have no idea.

Creating the webmin.repo allows webmin to be installed, and kept up to date via yum.

The rpm --import statement is there to get the GPG key that the software package is signed with. This allows yum to validate that the software install package is what the publisher created. In truth, it is actually rpm that does the verification and install, while yum is used to check for and download updates.

Modifying the miniserv.conf file is essential to let users who can sudo log in; otherwise you can only log in as root.

The firewall rules reconfigure firewalld to allow access to webmin, and reload the configuration.

The chkconfig command enables the service to start automatically on boot.

Finally, the “service restart” command starts, or restarts, webmin.

Pengdows.CRUD

This very helpful wrapper over ADO.NET has been released to NuGet free of charge and will soon be released as open source. It is now and will forever be free to use.

At the moment, I have only included the .NET 4.0 binary; this will be remedied soon. For now, create a .NET 4 application.

Here is some example code with comments on basic functionality, just to jump start you.

//assumes: using System; using System.Data; using System.Diagnostics;
//using System.Linq; plus the library's own namespace

//create a context to the database, this will allow us
// to create objects of the correct type from the factory
// as well as find out about things like quote prefix and
// suffixes. You may either choose a connection string and
// provider name, or simply pass the "name" of a
// connectionString entry in the config file.
var context = new DatabaseContext("dsn");

//create a container for the SQL and parameters etc.
 var sc = context.CreateSQLContainer();

//write any sql, I am making sure to create it using
//the provider's quotes. The SQLText property is a
//StringBuilder, allowing for complex string manipulation
sc.SQLText.AppendFormat(@"SELECT {0}CategoryID{1}
  ,{0}CategoryName{1}
  ,{0}Description{1}
  ,{0}Picture{1}
 FROM {0}Categories{1}
 WHERE {0}CategoryID{1}=", context.QuotePrefix, context.QuoteSuffix);

//create a parameter, automatically generating a name
//and attaching it to the sqlcontainer
var p = sc.AddWithValue(DbType.Int32, 7);

//append the name of the parameter to the SQL string.
//if the provider only supports positional parameters,
// that will be used. However, if named parameters are
// supported, the proper prefix will be used with the
// name. For example, @parameterName for SQL Server, and
// :parameterName for Oracle.
sc.SQLText.AppendFormat(context.SQLParameterName(p));

//write the resulting SQL for examination by the programmer
Debug.WriteLine(sc.SQLText);

// get a datatable
var dt = sc.ExecuteDataTable();

// get the first row of the datatable
var row = dt.GetFirstRow();

//loop through and output all the data to the screen.
foreach (var itm in dt.Columns.OfType<DataColumn>())
{
     Console.WriteLine("{0}: {1}", itm.ColumnName, row[itm]);
}

So this is easy, but why would you want to use this? What does it provide over plain ADO.NET, or EnterpriseBlocks?

Here are some of the benefits.

  • Self-contained blocks for execution.
    • SQLContainers – know which database to execute against
      • Carry the SQL and parameters in 1 encapsulated object
      • Adds “ExecuteDataSet” and “ExecuteDataTable” functions, making it easy to get disconnected DataSet and DataTable objects (a short sketch follows this list). Also, exposing the DataTable eliminates the overhead of always building a DataSet when only a single table is needed.
      • Changes the default on DbDataReaders to automatically close the connection upon closing of the object, based on the ConnectionMode.
    • DatabaseContext – Encapsulates much of the programming people skip
      • Using a factory to create the connections
      • Interrogates the provider to determine
        • If there is support for named parameters
        • What the quoting characters are, defaulting to the SQL-92 standard double quote ( " ).
        • If there is support for stored procedures.
        • What is the named parameter indicator (such as an @ for SQL Server or : for Oracle).
        • Validates connection string
        • Will automatically read from the “ConnectionStrings” area of the app.config or web.config
        • Allows you to specify connection mode
          • Standard – uses connection pooling, asking for a new connection each time a statement is executed, unless a transaction is being used.
          • SingleConnection – funnels everything through a single connection, useful for databases that only allow a single connection.
          • SqlCe – keeps a single connection open all the time, using it for all write access, while allowing many read-only connections. This prevents the database from being unloaded and keeps within the rule of only having a single write connection open.
          • SqlExpressUserMode – The same as “Standard”, however it keeps 1 connection open to prevent unloading of the database. This is useful for the new localDb feature in SQL Express.
        • Sets up SQL Server connections with the proper options to support Indexed Views.
        • Homogenizes connection strings using the DbConnectionStringBuilder class.
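
Building on the members shown above, here is a minimal sketch of the DataSet path mentioned in the list (the Products table and its columns are just illustrative, in the same Northwind spirit as the Categories example; only methods already demonstrated or named in this post are used):

//same setup as before: a context and a container
var context = new DatabaseContext("dsn");
var sc = context.CreateSQLContainer();

//build the SQL with the provider's own quote characters
sc.SQLText.AppendFormat(@"SELECT {0}ProductID{1}, {0}ProductName{1}
 FROM {0}Products{1}
 WHERE {0}CategoryID{1}=", context.QuotePrefix, context.QuoteSuffix);

//parameterize the filter value and append the provider-appropriate name
var p = sc.AddWithValue(DbType.Int32, 7);
sc.SQLText.Append(context.SQLParameterName(p));

//ExecuteDataSet returns a disconnected DataSet; here we just count the rows
var ds = sc.ExecuteDataSet();
Console.WriteLine(ds.Tables[0].Rows.Count);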

Building a BeagleBone Firewall: Part 6

Now we are ready to plug in the USB ethernet connector. I prefer to make this ethernet connection the one to your internet provider, but there is nothing to say you can't make it your LAN connection.

If you don't have any previous networking experience: LAN means Local Area Network, and it is what will be behind your firewall, hidden and protected from the outside world. The way we are going to set up the firewall, all the computers behind it will look like a single computer to the outside. By contrast, the internet is a Wide Area Network, or WAN for short.

Make sure your USB ethernet adapter is plugged in, and run the following command.

lsusb

You should see something like the following for the output

Bus 001 Device 004: ID 1267:0103 Logic3 / SpectraVideo plc G-720 Keyboard
Bus 001 Device 003: ID 0b95:7720 ASIX Electronics Corp. AX88772
Bus 001 Device 002: ID 2109:2811  
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

In my case, that ASIX Electronics Corp. line is my USB ethernet adapter. This is very good; it means I don't have to compile a new Linux kernel module for it. Now we want to see a little more information about it. Enter the following into the console.

ifconfig

And you will get something like the following for output

eth0      Link encap:Ethernet  HWaddr d0:39:72:54:4d:e7  
          inet addr:10.0.1.1  Bcast:10.0.1.255  Mask:255.255.255.0
          inet6 addr: fe80::d239:72ff:fe54:4de7/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:22296295 errors:0 dropped:118 overruns:0 frame:0
          TX packets:32827682 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:1367972463 (1.3 GB)  TX bytes:2528887318 (2.5 GB)
          Interrupt:40 


lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:1644 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1644 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:139327 (139.3 KB)  TX bytes:139327 (139.3 KB)

usb0      Link encap:Ethernet  HWaddr ba:67:28:61:85:ea  
          inet addr:192.168.7.2  Bcast:192.168.7.3  Mask:255.255.255.252
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

rename3   Link encap:Ethernet  HWaddr b6:c3:97:fe:20:c0  
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:34242743 errors:593 dropped:0 overruns:0 frame:593
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)    

This shows our network connections. "usb0" is NOT our USB-to-ethernet adapter; rather, it is something pre-configured in our Ubuntu distribution for the BeagleBone (I must admit, I am not 100% sure what good it is). "rename3" is the item we are looking for; as you can see, we need to set up an IP for the adapter, and the name "rename3" is rather obnoxious. To satisfy my inner "Monk", and because I am a lazy typist, I want to make the name shorter and more meaningful, so we will rename the adapter to reflect its purpose: "wan0".

We will need to take note of the HWaddr, which is the MAC address, so we can edit the next file to rename the adapter. To do this renaming, open the file using your text editor of choice.


sudo nano /etc/udev/rules.d/70-persistent-net.rules

You should see a file that looks like



# Auto generated by RootStock-NG: setup_sdcard.sh
# udevadm info -q all -p /sys/class/net/eth0 --attribute-walk

# BeagleBone: net device ()
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"

At the end of the file, add the following line, making sure to replace the MAC address from your adapter.


# USB device 0x:0x (AX88772)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="b6:c3:97:fe:20:c0", NAME="wan0"

Essentially, we are adding a device to the net(working) subsystem, uniquely identifying it by the MAC address (you did remember to change it to yours, right?).  Save the file, and do a clean reboot with the following command

sudo reboot

When it finishes the reboot, run the ifconfig command again to verify the adapter was correctly renamed.

Now we need to set up the adapter to retrieve an IP address from your ISP. To do that, we need to edit /etc/network/interfaces

sudo nano /etc/network/interfaces

Now add the following at the bottom of the file

# The WAN(internet) network interface
auto wan0
iface wan0 inet dhcp

Like many scripts and configurations in the Unix world, the "#" tells the system to ignore the line, so you can put in stuff that is meaningful to you. Programmers call these lines "comments".

auto wan0 tells the system to bring up this network interface upon boot.

iface wan0 inet dhcp tells the system, for interface (iface) wan0, to get an IPv4 internet (inet) address via DHCP. Because my ISP doesn't support IPv6, I won't set that up right now. If you have a static IP from your ISP, or want to do additional things, please refer to the Debian documentation.

Now we need to setup a very minimal firewall, so it is safe for us to connect to the internet, and make sure all these changes work.  That will be part 7.

Building a BeagleBone Firewall: Part 5

At this point we have a pretty nice little linux box, quite acceptable for doing many things. If we add a stateful firewall, it would make an acceptable kiosk machine.

However, we have a pretty big security hole we should fix right now. You see, the images we used to put Linux on the eMMC and microSD card have pre-installed SSH keys, which means every single machine installed from these images has the exact same set of public and private keys. If you don't understand what that means, that is OK, but suffice it to say, if we don't fix it there is a major security hole. So let's fix it.

First we want to remove all the old host keys, but not the config files, so from the console issue the following command.

sudo rm -rf /etc/ssh/ssh_host_*

Now we will want to generate the new keys.

sudo dpkg-reconfigure openssh-server

Finally, we need to restart the ssh server.

sudo service ssh restart

For more info on why we want to do this, read this. I highly recommend that you shut the BeagleBone down, pop out the microSD card, boot from the eMMC (simply boot without the microSD), and repeat this process on that OS as well. Of course, when you are done, shut down the BeagleBone, put the microSD back in, and boot back up.

Next up, we will configure the usb to ethernet adapter.

Building a BeagleBone Firewall: Part 4

One of the main reasons for building this device is to make sure the software is updated (patched) regularly. I have a multifaceted strategy to do that.

Before we go further, it is a good time to decide if you want X Windows (or simply X) on your firewall. X makes using the machine and configuring it more friendly. Just like anything else, though, the more software you have installed, the more software can be exploited. If you wish to remove X and all its components, it is easiest to run the following at the command line, then get a cup of coffee; this will take a while.

sudo apt-get purge libx11.* libqt.* libgd3 -y

If this fails because libgd3 isn't installed, repeat the command without it.

If you choose to keep X, it is helpful to be able to get to it remotely. You can do this via the Microsoft Remote Desktop Protocol or VNC by adding a single package, "xrdp". To install it, run the following command

sudo apt-get install xrdp -y

Now let's update all the software on the machine. Updating the software on an Ubuntu or Debian machine is really easy.

Make sure your machine is connected to the internet. Get to a command line, like the console, via SSH, or using something like xterm or terminal. Then type the following command and hit enter, then put in your password, so you get root access.

sudo apt-get update;sudo apt-get dist-upgrade -y;sudo apt-get -y --purge autoremove;sudo reboot

So let's explain this a little bit. The semicolons separate the commands. In Debian-based systems, apt-get is the basic command to work with software packages. There is a huge library of software available for free, and, like the Google Play store, packages can be installed, removed, and updated using this command, or one of the many wrappers over it.

apt-get update

Updates the local copy of what is available, and versions.

apt-get dist-upgrade -y

“dist-upgrade” tells apt-get to install all software updates, and the “-y” says, “just answer yes”.

apt-get -y --purge autoremove

"autoremove" tells apt-get to remove all software packages that are no longer needed. "-y" again means answer yes, and "--purge" says to remove all associated config files, leaving the system squeaky clean.

sudo reboot

For the most part, this isn't necessary; only a kernel upgrade truly requires a reboot.

But what about automating the updates, so they happen on a timely basis? It is a pain to log in every day, run these commands, and reboot if necessary. There is a package on Debian systems that will automatically install all security updates, called "unattended-upgrades", so let's install it. Go to the command line again, and install the package by typing the following command.

sudo apt-get install unattended-upgrades -y

Hopefully, you will get a message that says it is already installed. Then configure the package to automatically install all the updates with this command:

sudo dpkg-reconfigure -plow unattended-upgrades

However, this will neither reboot the machine when an update requires it, nor remove unused packages, nor install non-security updates. Also, "autoremove" isn't terribly efficient at removing unneeded software packages.

There is a package called "deborphan"; it will find unused packages and can be used in combination with apt-get to help keep things clean. The following command will show you all software packages that don't really need to be installed; we will make more use of this in a moment.


deborphan --guess-all

So let's make some scripts to help keep things clean. Let's start with removing old kernels. Old kernels can take up a huge amount of space. However, we do not want to remove the kernel we are currently using, so borrowing from another page as a starting point, we get the following command. I did add the "grep -v `uname -r`" because the original command had a problem of removing ALL kernels from the system (which is a great reason to have the backup OS installed on the eMMC). If you are not familiar with Unix editors like emacs, vim, or vi, I suggest you use nano to create the following files. This first file will be "remove-old-kernels.sh". To create it using nano, use the following command:

sudo nano /bin/remove-old-kernels.sh

then copy the following text into the file.

dpkg -l 'linux-image-*' | sed '/^ii/!d;/'"$(uname -r | sed "s/\(.*\)-\([^0-9]\+\)/\1/")"'/d;s/^[^ ]* [^ ]* \([^ ]*\).*/\1/;/[0-9]/!d' | grep -v `uname -r` | xargs sudo apt-get -y purge

A short explanation of the above command is as follows. The "dpkg -l" portion lists all installed kernel packages. The two "sed" pipes grab the Linux kernel versions from that list. The "grep -v" returns the list of installed kernels EXCEPT the one that is currently being used. "xargs" turns it all into an argument list, and finally "apt-get -y purge" removes everything in that argument list.


Next, we need to make our autoupdate.sh script, so create it like you did the remove-old-kernels.sh script

sudo nano /bin/autoupdate.sh

and again copy the code

apt-get update
apt-get dist-upgrade -y
apt-get autoremove -y --purge
apt-get autoclean
apt-get purge -y $(deborphan --guess-all)

Lastly, we need a script to run the first two and then reboot; we will call it 'autoupdate-and-reboot.sh'.

sudo nano /bin/autoupdate-and-reboot.sh

here is the code

/bin/remove-old-kernels.sh
/bin/autoupdate.sh
/sbin/reboot

Now we have three scripts that can be used to keep the system squeaky clean and updated. Of course, none of these neat little scripts will work until we tell Linux that they are allowed to be executed. So enter the following command, which will do just that.

sudo chmod +x /bin/*.sh

Yes, you could list out the files individually, but since there shouldn’t be any other .sh files in the freshly built machine, I am not worried about accidentally making a rogue script executable.

You can now run the “autoupdate-and-reboot.sh” script anytime you like, to update all the software, and reboot the machine. Or add it to a cron job, to make sure it is kept up-to-date.