Why Nostr? What is Njump?
2024-07-11 10:17:40

Fabio Manganiello on Nostr: In principle, I agree with the statements made in this article. Natural keys (social ...

In principle, I agree with the statements made in this article.

Natural keys (social security numbers/BSN/CPR numbers for people, chassis numbers for cars, names+locations for restaurants etc.) sound like perfect candidates for database keys. But have one person change their name/gender, or a car getting a new chassis number because of a data entry mistake, or a restaurant changing name, and everything falls apart. Not to mention the uniqueness problem - can we always be able to prove that a key always identifies one and only one record, and the other way around?

That's why synthetic keys come to the rescue.

But, as a seasoned developer who has spent two decades working with databases, I've seen too many things go wrong with synthetic keys too.

Auto-increment integers and random UUIDs are great only as long as you are in a single-threaded environment where you don't have the risk of duplicate records concurrently processed by separate services/processes/threads, where the database is small enough that you don't have to worry about sharding by key, and where performance constraints are lax enough that you don't care of the cost of doing an insert-or-update-and-select in order to retrieve the ID of the record you've just inserted.

In my experience, the best of the two worlds can be achieved when you have deterministic synthetic keys generated on an immutable subset of the attributes.

If you can prove that a subset of columns is really unique and immutable, there's nothing wrong with generating a UUID/hash out of them, and using that as the primary key. That solves the insert-and-fetch and the consistent upsert problems basically for free. It's an approach that I've been following for the past few years in my jobs, and that allowed me to write software that can work on huge shard-friendly databases with nearly no data pollution and high read+write efficiency.

If anything actually changes on those columns, it's ok to create a new record. Just have a procedure in place to explicitly deal with those cases if needed, so anything wired to the old record can be rewired to the new one. But, in my experience, cases where you have to modify what's supposed to be an immutable set of attributes are a (very uncommon) exception, not the norm.

Your database should be optimized for the norm, not for the exception.

https://blog.ploeh.dk/2024/06/03/youll-regret-using-natural-keys/
Author Public Key
npub1s9uc08n58mxqk5umvapqulwzng0sja635q86r36d8n4rr9r9ygaskjdnmu