,

It’s Called a Non-Lookup Dude

Seriously…..It’s called a non-lookup dude. Probably one of the most annoying situations I’ve come across when working on Enterprise Data Warehouse {EDW} teams/projects is the non-lookup problem.

First, let’s start with the basics. You know what a database constraint is? No? Go away then….maybe…ok, yeah, go away. How about a foreign key? Let’s talk about them.

Instead of trying to be the smartest person on stackoverflow with this answer, let’s be human. Yeah, that sounds good. A constraint is a developers way of keeping business people from lying to you about something, which will happen anyways. Hey, all you Coders, Engineers, Analysts out there know what I’m talking about it, admit it. You deal with it every day. Let’s role play.

Business Person “Dude we are selling tons of widgets because I’m so awesome, can you store all these sales orders with who the sales person for that order was for me somewhere? Everyone needs to know that I am the best.”

Developer“Ok, no problem fancy pants…by the way, you always assign a sales person to every sales order right?”

Business Person “Seriously? The world would end before that would not happen. Go do you easy job…what’s taking you so long?”

The uncool developer run’s back to his desk full of paper, food crumbs, and Lord of The Rings statues, he quickly writes….

CREATE TABLE dbo.CoolSalesPeople (SalesPerson NVARCHAR(30),SalesPersonName NVARCHAR(30),
CONSTRAINT PK_CoolSalesPeople PRIMARY KEY CLUSTERED (SalesPerson))


CREATE table dbo.WidgetOrders (SalesOrder INT,SalesPerson NVARCHAR(30),
CONSTRAINT FK_SalesPerson FOREIGN KEY (SalesPerson)
REFERENCES dbo.CoolSalesPeople (SalesPerson)
ON DELETE CASCADE
ON UPDATE CASCADE )

INSERT INTO dbo.CoolSalesPeople (SalesPerson) VALUES (‘Fancy Pants 1’), (‘Fancy Pants 2’)

Later that day Fancy Pants salesperson jumps in his BMW and drives home, developer unlocks his bike from the rack and pedals away.

One year later as the some ETL is running in the new EDW, at about 1 a.m., it’s raining outside, in some dusty old server room the following code runs……

SELECT SalesOrder,SalesPerson
FROM dbo.RealOrders

SalesOrder SalesPerson
1234 Fancy Pants 1
1235 Fancy Pants 2
1236 New Fancy Pants

The results set is then used in the ETL to insert into dbo.WidgetOrders

INSERT INTO dbo.WidgetOrders (SalesOrder,SalesPerson) VALUES
SELECT SalesOrder,SalesPerson
FROM dbo.RealOrders

BAM!

Msg 547, Level 16, State 0, Line 11
The INSERT statement conflicted with the FOREIGN KEY constraint “FK_SalesPerson”. The conflict occurred in database “KREG_EDW_DEV”, table “dbo.CoolSalesPeople”, column ‘SalesPerson’.

The ETL fails, a error is thrown, the next day when Fancy Pants checks his sales report he sends an angry email the VP of IT saying ..”I told you that weirdo with the Lord of the Rings stuff on his desk can’t do anything right.”

Looks like Fancy Pants was selling so many widgets he had to hire Fancy Pants 3 and forgot to tell developer dude.

Ok, that was fun. I’ve seen developers and {EDW} teams handle this type of problem two ways.

 

  1. Baaaaaah! Who needs real constraints??? That’s for the birds. If we don’t put on any foreign key’s nothing will ever break and we will be the ones driving BMW’s!!
  2. Let’s get a list of the Fancy Pants Sales people every day, we will load that into our super perfect {EDW}, we will always beat them to it! There is no way someone would forget to answer a new Fancy Pants person into the system!

So, instead of being typical naive IT people who think we can outsmart everyone and everything else, we could design our ETL with non-lookup logic to handle every situation! What and idea. It’s called being prepared and it is written like this….

INSERT INTO dbo.CoolSalesPeople (SalesPerson) VALUES (‘Unknown Fancy Pants’)

One year later in an alternate universe as the some ETL is running in the new EDW, at about 1 a.m., it’s raining outside, in some dusty old server room the following code runs……  

SELECT RealOrders.SalesOrder,CoolSalesPeople.SalesPerson
FROM dbo.RealOrders
LEFT OUTER JOIN dbo.CoolSalesPeople ON RealOrders.SalesPerson = CoolSalesPeople.SalesPerson

SalesOrder SalesPerson
1236 NULL
1234 Fancy Pants 1
1235 Fancy Pants 2

Hey, that’s one step farther, at least now we know we have a sales order coming in for which we have no sales person.

Let’s take it a step farther.

SELECT RealOrders.SalesOrder,ISNULL(CoolSalesPeople.SalesPerson,’Unknown Fancy Pants’) as Salesperson
FROM dbo.RealOrders
LEFT OUTER JOIN dbo.CoolSalesPeople ON RealOrders.SalesPerson = CoolSalesPeople.SalesPerson

SalesOrder Salesperson
1235 Unknown Fancy Pants
1236 Fancy Pants 1
1238 Fancy Pants 2

This time when the code run’s no error is thrown.

INSERT INTO dbo.WidgetOrders (SalesOrder,SalesPerson)
SELECT RealOrders.SalesOrder,ISNULL(CoolSalesPeople.SalesPerson,’Unknown Fancy Pants’) as Salesperson
FROM dbo.RealOrders
LEFT OUTER JOIN dbo.CoolSalesPeople ON RealOrders.SalesPerson = CoolSalesPeople.SalesPerson

We’ve now anticipated those Fancy Pants being wrong. There are many ways to actually write the code to handle the un-known lookups, and this one was simplistic of course, but the basic idea is the same.

Prepare your {EDW} dimensions (Look-Up) tables with a value that is basically there when all else fails. Heck, you can even have reports looking at these tables looking for the non-lookups and notifying those dang Fancy Pants when it happens.

Do you really want a RDBMS without constraints, like foreign keys? No.They force integrity and keep things from getting out of control. If you have them do you want those constraints blowing up your ETL all the time when people lie to you? No. Then get ahead of them.