, ,

Columnstore Indexes – Always Faster Uh?

Columnstore indexes promise to be the savior of every data warehouse. So, what are they, when should you use them, when to stay away? Columnstore indexes are just what they sound like, data physically stored in a columnar way. This is what makes them so fast when it comes aggregating large amounts of data. The data is compressed and similar values are stored together, the database engine can grab all the values it needs to SUM for example, very quickly, this all leads to faster query results.

It’s also important to remember that seeks on specific data sets are better with row-store. At what point does a table size benefit from a columnstore index? It’s hard to find any specific recommendations, so we are going to test it out.

First, let’s create a common scenario, a table tracking sales.

CREATE TABLE dbo.Sales
	(
		SalesOrder BIGINT IDENTITY(1,1),
		OrderEntryDate DATETIME NOT NULL,
		ProductID INT NOT NULL,
		SalesAmount MONEY NOT NULL
	);

ALTER TABLE Sales ADD PRIMARY KEY (SalesOrder);

CREATE NONCLUSTERED INDEX IX_SalesDate_Amount   
    ON dbo.Sales (OrderEntryDate) INCLUDE(SalesAmount);
CREATE NONCLUSTERED COLUMNSTORE INDEX [IXC_SalesDate_Amount] ON [dbo].[Sales]
([OrderEntryDate], [SalesAmount])

I’m thinking to start with 1 million records, go up to about 100 million, testing query performance on columnar vs row-store indexes. Time to fill’er up.

DECLARE @OrderEntryDate as DATE, @ProductId as INT, @Amount as MONEY, @Counter as INT, @DateCounter as INT;

SET @Counter = 1; SET @OrderEntryDate = '1/1/2019'; SET @ProductId = 1; SET @Amount = 20; SET @DateCounter = 0;

WHILE @Counter < 1000000
	BEGIN
		SET @Amount = RAND()*(1000-5)+5; SET @ProductId = ABS(CHECKSUM(NEWID()) % 6) + 1;
		IF @DateCounter > 2700 BEGIN SET @DateCounter = 1 END
		SET @OrderEntryDate = 
			CASE WHEN @DateCounter = 2700 THEN
				   (SELECT DATEADD(day,1,@OrderEntryDate))
				ELSE @OrderEntryDate
			END 

		INSERT INTO dbo.Sales (OrderEntryDate, ProductID,  SalesAmount)
		VALUES (@OrderEntryDate, @ProductId, @Amount);
		SET @Counter = @Counter + 1; SET @DateCounter = @DateCounter + 1
	END;

Here is the simple query we will use to test.

set statistics time on
SELECT YEAR(OrderEntryDate) as Year, MONTH(OrderEntryDate) as Month, SUM(SalesAmount) as Sales
FROM dbo.SAles with(nolock)
GROUP BY YEAR(OrderEntryDate), MONTH(OrderEntryDate)
set statistics time off

Let’s try with Columnstore Non Clustered index first.

times for 1 million records on Rowstore index.
times for 1 million records no Columnar index.

CPU time down, total elapsed time up. Interesting. The normal rowstore index provides better performance.

Let’s jump to 30 million records. Here is the time for the columnstore index. Pretty quick!

times for 30 million records on Columnar index.
times for 30 million records on Rowstore index.

Now we can really see where the difference comes in! Imagine the difference once you hit 100 million records. Many times in data warehousing environments a lot of the analytical queries will be running on large amounts of rows, many times the whole table. Such a simple change as the type of Index you use can make a big difference.

It pretty apparent you would want to probably only use columnstore indexes on your large Fact tables, where most of your aggregation is done. Based on the results of 1 million vs 30 million, it makes sense to use them when you have Fact tables with 10’s of millions of rows, not on dimension lookup’s what that many records, that are probably seeking specific records anyways.