Where Good Data Warehousing Goes Wrong.

It wouldn’t be the first time.

The story is usually the same, lots of people, contractors, software installation, months of ETL work, months of database work, testing testing and more testing. And then it arrives, a beautiful spiffy Enterprise Data Warehouse with all it’s facts and dimensions in all their Kimball glory.

The dudes in IT are all patting each other on the back. The report developers are hard at work. The business is happy, the Data Warehouse is now live and now all that data is at their finger tips, it’s the dawn of a new era.

After another few weeks of work the Sales team gets their first report, they don’t want anything fancy, just some yearly sales numbers by month of course. The Product team is soon to follow, they want margin information, along with just a little sales data to keep their eye on things.

Then it happens.

The product guy asks the question

“Hey!  My sales number for all my product lines don’t match the total that shows on the Sales team’s dashboard year to date number?!”

Of course the discrepancy flies through the ether hither and thither, and now everyone is wondering why this expensive and buzz wordy EDW didn’t solve the problems we had when we were using Excel.

A good data warehouse just want bad. Why?

Well a few weeks ago when the two Report Developers were putting together the dashboards one guy went and asked the Business Analyst “What’s the official calculation for our sales? Are there things we don’t count? Show me.” In the cubicle next door the other Report Developer was hacking away on another dashboard and thinking “I know how to calculate these numbers, I simply SUM up this column.”

In one fell swoop a year of work and confidence went down the toilet using the first two reports developed.

What’s the answer to keeping a good data warehouse from going bad? Data Marts.

Just because a data warehouse does have a single source of the truth doesn’t mean the work stops there. A data warehouse in itself is just something to keep a few people in IT busy with a job. The whole point is to extract intelligent business insights from the data contained in it. Enter the Data Mart.

What is Data Mart? A Data Mart is just a bunch of data that is already in the Data Warehouse, yes duplicate data, just in a form that is more usable and ready to be consumed by a reporting layer. Lets take the inaccurate sales problem we saw above.

Way number 1. Expect all Report Developers to memorize the business rules surrounding every aspect of the business and write reports accordingly and perfectly every single time.

Way number 2. Data Mart. Take an agreed upon standard and calculation, in a repeatable manner, usually in ETL, apply that business logic and output the sales data filtered and summed in a consumable format that can be used by every single report that needs a sales number in it.

Do Data Marts take more blood, sweat, and tears up front? Yes. Do they require people to agree on a single calculation and business rules? Yes. When used properly do they ensure data for a certain topic will be the same across all business units so everyone can “trust” the data? Yes.

Data Marts. Storage is cheap. Do the work once, filter, apply rules, aggregate, and watch your data discrepancies disappear. Spend time working on things that matter, not trying to figure out why two reports don’t match each other.