Recently, I wrote a client-side wallet app for a cryptocurrency use case. It required transactions to be created and signed entirely on the client side, so the transaction UUID had to be generated on the client (before signing), and I was surprised at how elegant and simple the application's logic turned out to be, both on the front end and the back end.

This is a bit confusing, as it mixes two things: a BRIN index and an index on a timestamp. The main source of write amplification comes from updating random pages of the btree index. Imagine inserting 10 random UUID values into a large index: it's pretty likely those will go into 10 different leaf pages. And on the first update of each page after a checkpoint (which typically happens every 30 minutes or so), we have to write an FPI (i.e. a full-page image). So because btrees are based on ordering, random values end up on random leaf pages, causing write amplification. With a BRIN index on a UUID column this does not happen, because the index is based not on ordering but on location in the table: if the 10 rows get appended to the same table page, that's just one write, with one FPI. This is why BRIN does not have the write amplification issue. But it's also a bit pointless, because BRIN on random data is pretty useless for querying (well, at least the minmax indexes are). If you create a BRIN index on a timestamp, that's not going to have the write amplification problem, and it'll be good for querying. The thing is, a btree on the timestamp would not have the write amplification problem either, because the timestamps are going to be sequential (hence no updates to random leaf pages).

IMO, unless you're building an app for high-frequency trading, auto-incrementing IDs aren't worth the pain and lack of flexibility. Also, you could simply index by a separate date/timestamp field if you need records to be ordered by time, and probably won't incur any performance cost. I suspect that the average per-query performance loss for a typical app is probably less than 5%.
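The client-side pattern mentioned above — minting the transaction UUID before signing, so the ID itself is covered by the signature and the server never has to assign one — can be sketched roughly like this. Everything here is an illustrative assumption (the payload fields, and an HMAC standing in for a real wallet's asymmetric signature), not the actual app's code:

```python
import hashlib
import hmac
import json
import uuid

# Illustrative shared secret; a real wallet would sign with a private key.
SIGNING_KEY = b"client-side-demo-key"

def build_signed_transaction(amount: str, recipient: str) -> dict:
    # Mint the UUID on the client, BEFORE signing, so the transaction's
    # identifier is part of the signed payload.
    tx = {
        "id": str(uuid.uuid4()),
        "amount": amount,
        "recipient": recipient,
    }
    payload = json.dumps(tx, sort_keys=True).encode()
    tx["signature"] = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return tx

tx = build_signed_transaction("1.5", "addr123")
print(tx["id"], tx["signature"][:16])
```

The server can then verify the signature and insert the row using the client-supplied ID, which also makes retries naturally idempotent.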
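The 10-leaf-pages argument can be made concrete with a toy model (my own illustration, not Postgres internals): treat the btree as fixed-size leaf pages over the sorted key space and count how many distinct pages a batch of inserts dirties. The page size and key formats are arbitrary assumptions:

```python
import bisect
import uuid

KEYS_PER_LEAF = 100  # toy leaf capacity; real 8 kB pages differ

def dirtied_leaf_pages(existing_sorted, new_keys):
    """Count distinct leaf pages touched when new_keys are inserted into a
    btree modeled as fixed-size pages over the sorted key space."""
    pages = set()
    for key in new_keys:
        # A key's position in sorted order determines which leaf it lands on.
        rank = bisect.bisect_left(existing_sorted, key)
        pages.add(rank // KEYS_PER_LEAF)
    return len(pages)

# A "large index": one million existing keys.
random_index = sorted(uuid.uuid4().hex for _ in range(1_000_000))
sequential_index = [f"{i:012d}" for i in range(1_000_000)]  # already sorted

random_batch = [uuid.uuid4().hex for _ in range(10)]
sequential_batch = [f"{i:012d}" for i in range(1_000_000, 1_000_010)]

print("random UUIDs dirty:", dirtied_leaf_pages(random_index, random_batch), "leaf pages")
print("sequential keys dirty:", dirtied_leaf_pages(sequential_index, sequential_batch), "leaf page(s)")
```

With 10,000 leaf pages in the model, ten random UUIDs almost always land on ten different pages, while ten sequential keys all append to the rightmost page — which is exactly why the random case pays for ten FPIs after a checkpoint and the sequential case for one.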
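The claim that minmax BRIN is useless on random data but effective on correlated data can also be checked with a toy model of block-range summaries: store a (min, max) pair per range of heap pages and count how many ranges a point lookup would have to scan. The range size matches the Postgres default of 128 pages, but the heap layout here is an illustrative simplification:

```python
import random

PAGES_PER_RANGE = 128   # Postgres default pages_per_range for BRIN
ROWS_PER_PAGE = 100     # toy heap page capacity

def brin_ranges(values):
    """Summarize a heap (values in physical/insertion order) as one
    (min, max) pair per block range, like a minmax BRIN index."""
    rows = PAGES_PER_RANGE * ROWS_PER_PAGE
    return [
        (min(values[i:i + rows]), max(values[i:i + rows]))
        for i in range(0, len(values), rows)
    ]

def ranges_to_scan(summaries, needle):
    # A range cannot be skipped if the needle falls inside its [min, max].
    return sum(lo <= needle <= hi for lo, hi in summaries)

n = 1_000_000
sequential = list(range(n))             # e.g. an append-only timestamp column
shuffled = random.sample(range(n), n)   # e.g. a random UUID-like column

seq_summary = brin_ranges(sequential)
rnd_summary = brin_ranges(shuffled)
needle = 123_456
print("ranges scanned, sequential data:", ranges_to_scan(seq_summary, needle))
print("ranges scanned, random data:", ranges_to_scan(rnd_summary, needle),
      "of", len(rnd_summary))
```

On sequential data the lookup prunes everything but one range; on random data every range's min–max interval spans nearly the whole domain, so nothing can be skipped — the minmax summary carries no information.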
UUID actually performs much better than I thought, based on the author's example with a COUNT query. COUNT queries aren't very efficient, because typically all records are traversed, and here we're talking about a 50% slowdown on 10 million records. But how often do you need to access 10 million contiguous records? Most of the time, for user-facing apps, you'll be accessing 100 contiguous records at most and, in fact, most queries will access a single record. I can't think of many use cases where I would sacrifice the beauty and elegance of UUIDs to optimize access times by a millisecond or two.