Finding uniqueness
select * from prod.public.customer limit 100
100 rows and 73 columns were returned to me. Their names were ciphers, inviting you to discover their hidden truths. id. user_id. customer_id. lead_id. account_id. anonymous_id. Would the real primary key please stand up?
All of the values in anonymous_id and lead_id were null in this sample, so those were easily eliminated.
select user_id, count(*) from prod.public.customer group by 1 order by 2 desc limit 100
I could type no other sentence in the English language from memory faster. 10 times a day times 250 working days in a year times 10 years equaled 25,000 incantations of select <insert_col>, count(*) from <insert_table_name> group by 1, order by 2 desc limit 100. The query returned
bbd33ea228ab5528c558659338569ac3, 15
348620d4b5905a61e6813edc9ac52468, 14
7c9e1fb19c93958dd2afe38954462c52, 14
…
I didn’t need to scroll further. Another candidate promptly eliminated.
My investigation was interrupted by a sound like a kernel popping over a stove top. My focus shifted to my chat application. A question from “John” in accounting:
“When I join the customer table to the streams table, why do some streams have a null product type? Shouldn’t every stream have a product type?”
Cmd+Shift into a SQL terminal
select * from prod.public.streams limit 100
Only 15 columns. product_type was one of those.
select product_type, count(*) from prod.public.streams group by 1 order by 2 desc limit 100
paid, 27500
free, 26340
trial, 10009
No nulls. Hmm.
“What join key are you using”, I replied. “customer_id”, John fired back. Cmd+Shift to terminal.
select * from prod.public.streams where customer_id is null
No results.
select * from prod.public.customer where customer_id is null
175 results. “Can you send me your query”. John sent back a 153 line mess, but all I needed was the from statement.
<garbage> <garbage> <garbage>
...
from prod.public.customer left join prod.public.streams using(customer_id)
...
<garbage> <garbage> <garbage>
“Not all customers have a customer_id, so when you left join to streams, some customer records won’t join properly and you’ll have null product_types. You probably want to switch the order of the join i.e. streams left join customer.”
“Wow our data is so screwed up”, replied John. I didn’t respond. So customer_id wasn’t the primary key either. That left id or account_id.
select id, count(*) from prod.public.customer group by 1 order by 2 desc limit 100
First result returned a count of 1. Cool, but this was probably just the default primary key created by Postgres that by definition was unique. What I really wanted to know was if each row represented a unique “user” in the abstract sense of the word.
Only account_id remained. I could feel the anticipation bubble through my fingertips as I conjured those magic words for what I hoped would be the last time.
select account_id, count(*) from prod.public.customer group by 1 order by 2 desc limit 100
978febe48756dbf30b863da6a83eec17, 2
6f1d4e4cdff9e5f85623c766c5390be1, 1
ae0a9128cc201f46bbe5dfa8252ab288, 1
…
I was so close to tasting victory. This next one would be my last query for sure.
select username, email, created_at from prod.public.customer where account_id = ‘978febe48756dbf30b863da6a83eec17’
Kenny Ning, kenny@kndata.co, 2021-04-29T22:27:12+00:00
Kenny Ning, kenny@kndata.co, 2021-04-28T21:23:39+00:00
What was this now? A distorted portrait of myself coldly stared back at me from this warped data ware-fun-house mirror. I could hear the database gods mocking me from the clouds, a deity’s finger drunkenly hovering over the switch that would revoke my admin privileges. In a cruel twist of fate, the evil I was rooting out was no other than myself. I had spent my entire life trying to be different, and yet the data was indisputable. I failed even the most basic uniqueness test.
delete from prod.public.customer where email=‘kenny@kndata.co’
Nobody would know I had existed. A small price to pay for a table that satisfied first normal form. I only had one thing left to do.
select account_id, count(*) from prod.public.customer group by 1 order by 2 desc limit 100
6324609a915c34a196c56480fddc4194, 1
b5b82625574194e1012688c827caf1af, 1
0bf0593528c97f35407425f8382a1468, 1
…
This must be what heaven was.