Tuesday, July 29, 2014

How to Edit Read-Only Non-clustered Columnstore Data

As I've discussed in some of my previous posts, creating a non-clustered Columnstore index makes both the index and the base table read-only, which means you can’t insert, update, or delete any data until you drop the index. This may seem like a huge issue, but in reality it’s not that much of a problem. Keep in mind that the Columnstore index feature is targeted at data warehouses that modify data infrequently. In the examples below, I go through two methods you can use to edit your read-only data.

To get started, we need to create a test table and insert a few rows.

USE master;
GO

CREATE DATABASE TEST;
GO

USE TEST;
GO

CREATE TABLE dbo.Table1
(
   col1 INT
  ,col2 VARCHAR(20)
);
GO

INSERT INTO dbo.Table1 (col1, col2) VALUES (1, 'Test Value 1');
INSERT INTO dbo.Table1 (col1, col2) VALUES (2, 'Test Value 2');
INSERT INTO dbo.Table1 (col1, col2) VALUES (3, 'Test Value 3');
INSERT INTO dbo.Table1 (col1, col2) VALUES (4, 'Test Value 4');
INSERT INTO dbo.Table1 (col1, col2) VALUES (5, 'Test Value 5');
INSERT INTO dbo.Table1 (col1, col2) VALUES (6, 'Test Value 6');
INSERT INTO dbo.Table1 (col1, col2) VALUES (7, 'Test Value 7');
INSERT INTO dbo.Table1 (col1, col2) VALUES (8, 'Test Value 8');
INSERT INTO dbo.Table1 (col1, col2) VALUES (9, 'Test Value 9');
GO

Next, we'll add a non-clustered Columnstore index to the table.

CREATE NONCLUSTERED COLUMNSTORE INDEX nci_Table1 
  ON dbo.Table1(col1,col2);
GO

At this point, we have effectively made this table read-only. We can read from it all day long, but if we attempt to update a value, we will get an error.

SELECT * FROM dbo.Table1;
GO


UPDATE dbo.Table1 SET col2 = 'changed a value' WHERE col1 = 1;
GO

Msg 35330, Level 15, State 1, Line 41
UPDATE statement failed because data cannot be updated in a table that has a nonclustered columnstore index. Consider disabling the columnstore index before issuing the UPDATE statement, and then rebuilding the columnstore index after UPDATE has completed.

Once again, I love Microsoft’s error message. It gives you very good information on one method to use for changing data in a non-clustered Columnstore index.

Disabling the non-clustered Columnstore index gives you the ability to make changes to the data; however, the index can't be used by any queries while it's disabled. I like to wrap all the commands into a single transaction using the XACT_ABORT option. With XACT_ABORT on, any run-time error rolls back the entire transaction and not just the single statement that failed.

SET XACT_ABORT ON;
GO

BEGIN TRANSACTION;
GO

ALTER INDEX nci_Table1 ON dbo.Table1 DISABLE;
GO

UPDATE dbo.Table1 SET col2 = 'changed a value' WHERE col1 = 1;
GO

ALTER INDEX nci_Table1 ON dbo.Table1 REBUILD;
GO

COMMIT TRANSACTION;
GO


Looking at the table again, we can see the first row was definitely changed.

SELECT * FROM dbo.Table1;
GO


This is probably the easiest method to change your data; however, it’s also the most resource intensive. SQL Server needs to rebuild the index for the entire table, not just for the one row that we changed. So if your table has millions or even billions of rows, the rebuild could take a lot of time and use a lot of resources. This is probably something you don’t want to do in the middle of your business day.
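
After a rebuild, it can be worth confirming that the index is enabled again before any queries depend on it. The following is a minimal sketch, assuming the table and index names from the example above. It relies on the is_disabled flag that sys.indexes exposes for each index.

```sql
USE TEST;
GO

-- is_disabled = 0 means the Columnstore index is available to queries again.
SELECT
     i.name AS IndexName
    ,i.type_desc AS IndexType
    ,i.is_disabled AS IsDisabled
FROM sys.indexes i
WHERE i.object_id = OBJECT_ID('dbo.Table1')
  AND i.name = 'nci_Table1';
GO
```

If IsDisabled comes back as 1, the index was left disabled and still needs a rebuild.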

The second method we’ll cover involves using partition switching. First, we’ll create the same table but partition it into 3 parts.

USE TEST;
GO

CREATE PARTITION FUNCTION myRangePF1 (INT) 
  AS RANGE LEFT FOR VALUES (3,6);
GO 

CREATE PARTITION SCHEME myRangePS1 
  AS PARTITION myRangePF1 
  ALL TO ([PRIMARY]);
GO 

CREATE TABLE dbo.PartitionedTable (col1 INT, col2 VARCHAR(20)) 
  ON myRangePS1 (col1);
GO

INSERT INTO dbo.PartitionedTable (col1, col2) VALUES (1, 'Test Value 1');
INSERT INTO dbo.PartitionedTable (col1, col2) VALUES (2, 'Test Value 2');
INSERT INTO dbo.PartitionedTable (col1, col2) VALUES (3, 'Test Value 3');
INSERT INTO dbo.PartitionedTable (col1, col2) VALUES (4, 'Test Value 4');
INSERT INTO dbo.PartitionedTable (col1, col2) VALUES (5, 'Test Value 5');
INSERT INTO dbo.PartitionedTable (col1, col2) VALUES (6, 'Test Value 6');
INSERT INTO dbo.PartitionedTable (col1, col2) VALUES (7, 'Test Value 7');
INSERT INTO dbo.PartitionedTable (col1, col2) VALUES (8, 'Test Value 8');
INSERT INTO dbo.PartitionedTable (col1, col2) VALUES (9, 'Test Value 9');
GO

CREATE NONCLUSTERED COLUMNSTORE INDEX nci_PartitionedTable 
  ON dbo.PartitionedTable(col1,col2);
GO

SELECT * FROM dbo.PartitionedTable;
GO


This table has the same data as before, but internally it's partitioned. Using sys.partitions to get the details, you can see there are a total of 3 partitions, each with 3 rows.

SELECT
     SCHEMA_NAME(t.schema_id) AS SchemaName
    ,OBJECT_NAME(i.object_id) AS ObjectName
    ,p.partition_number AS PartitionNumber
    ,fg.name AS FilegroupName
    ,rows AS 'Rows'
    ,au.total_pages AS 'TotalDataPages'
    ,CASE boundary_value_on_right
        WHEN 1 THEN 'less than'
        ELSE 'less than or equal to'
     END AS 'Comparison'
    ,value AS 'ComparisonValue'
    ,p.data_compression_desc AS 'DataCompression'
    ,p.partition_id
FROM sys.partitions p
    JOIN sys.indexes i ON p.object_id = i.object_id AND p.index_id = i.index_id
    JOIN sys.partition_schemes ps ON ps.data_space_id = i.data_space_id
    JOIN sys.partition_functions f ON f.function_id = ps.function_id
    LEFT JOIN sys.partition_range_values rv ON f.function_id = rv.function_id AND p.partition_number = rv.boundary_id
    JOIN sys.destination_data_spaces dds ON dds.partition_scheme_id = ps.data_space_id AND dds.destination_id = p.partition_number
    JOIN sys.filegroups fg ON dds.data_space_id = fg.data_space_id
    JOIN (SELECT container_id, sum(total_pages) as total_pages
            FROM sys.allocation_units
            GROUP BY container_id) AS au ON au.container_id = p.partition_id 
    JOIN sys.tables t ON p.object_id = t.object_id
WHERE i.index_id < 2
ORDER BY ObjectName,p.partition_number;
GO
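
As a lighter-weight cross-check of the same layout, the $PARTITION function reports which partition a given value maps to. This is only a sketch, assuming the partition function and table created above.

```sql
USE TEST;
GO

-- Show the partition number that each row's col1 value maps to
-- under the myRangePF1 partition function (RANGE LEFT, boundaries 3 and 6).
SELECT
     col1
    ,col2
    ,$PARTITION.myRangePF1(col1) AS PartitionNumber
FROM dbo.PartitionedTable
ORDER BY col1;
GO
```

With RANGE LEFT boundaries of 3 and 6, values 1 through 3 map to partition 1, values 4 through 6 to partition 2, and values 7 through 9 to partition 3.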


To use partition switching, you have to create a non-partitioned table that matches your partitioned table in every way, including all the indexes and constraints. For this example, we need to edit row 1, which resides in the first partition, so the non-partitioned table needs a check constraint that mimics the boundary of the first partition: col1 <= 3.

CREATE TABLE dbo.NonPartitionedTable
(
   col1 INT CHECK (col1 <= 3)
  ,col2 VARCHAR(20)
);
GO

CREATE NONCLUSTERED COLUMNSTORE INDEX nci_NonPartitionedTable
  ON dbo.NonPartitionedTable(col1,col2);
GO

Once we have the table created, we can switch the first partition out, disable the index, make the change, and rebuild the index on just that subset of rows.

SET XACT_ABORT ON;
GO

BEGIN TRANSACTION;
GO

ALTER TABLE dbo.PartitionedTable SWITCH PARTITION 1 TO dbo.NonPartitionedTable; 
GO

ALTER INDEX nci_NonPartitionedTable ON dbo.NonPartitionedTable DISABLE;
GO

UPDATE dbo.NonPartitionedTable SET col2 = 'changed a value' WHERE col1 = 1;
GO

ALTER INDEX nci_NonPartitionedTable ON dbo.NonPartitionedTable REBUILD;
GO

ALTER TABLE dbo.NonPartitionedTable SWITCH TO dbo.PartitionedTable PARTITION 1; 
GO

COMMIT TRANSACTION;
GO

The last statement in the transaction switches the data back into the partitioned table, and a quick query confirms the change.

SELECT * FROM dbo.PartitionedTable;
GO


I know this may look exactly like what we did in the first method, and it is. However, having our initial table partitioned gives us the ability to rebuild the non-clustered Columnstore index on a smaller subset of data instead of the entire table. If this partitioned table had millions of rows, then that ALTER INDEX REBUILD command might only need to run against a fraction of those rows and therefore complete much quicker and use far fewer resources. Both of these methods can be used not only to edit existing data in a table that has a non-clustered Columnstore index, but also to insert or delete rows.
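
To illustrate the insert and delete case, here is a sketch that reuses the disable and rebuild pattern from the first method on dbo.Table1. The new row (col1 = 10) and the choice of row to delete are made up for illustration.

```sql
USE TEST;
GO

SET XACT_ABORT ON;
GO

BEGIN TRANSACTION;
GO

-- Disable the Columnstore index so the table accepts changes again.
ALTER INDEX nci_Table1 ON dbo.Table1 DISABLE;
GO

-- Illustrative changes: add one row and remove another.
INSERT INTO dbo.Table1 (col1, col2) VALUES (10, 'Test Value 10');
DELETE FROM dbo.Table1 WHERE col1 = 9;
GO

-- Rebuild the index so queries can use it again.
ALTER INDEX nci_Table1 ON dbo.Table1 REBUILD;
GO

COMMIT TRANSACTION;
GO
```

The same statements should work against dbo.NonPartitionedTable between the two SWITCH commands if you are using the partition switching method. Keep in mind that any row you insert there still has to satisfy the check constraint, or the switch back will fail.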

On a final note, I recommend that you always partition any table that has a Columnstore index. This feature is designed for very large tables, and partitioning gives you much more flexibility, not just for working with Columnstore indexes but for other tasks as well.

You can read more about Columnstore indexes and partition switching in Books Online.

Tuesday, July 8, 2014

Columnstore Table Analyzer

As I’ve discussed in some of my previous posts, there are quite a few data types that cannot be part of a Columnstore index. While there are fewer restrictions in SQL Server 2014, they still exist. I find myself constantly looking back at Books Online to make sure the columns in my tables don’t use any of those restricted data types. It would be much easier to know from day one which tables I need to redesign, or at least which columns I need to exclude from a non-clustered Columnstore index. This is why I have created the following script.

-- Find columns in user tables that cannot be included in a columnstore index.
-- These restrictions apply to both clustered and non-clustered columnstore indexes.
-- SQL Server 2014: http://msdn.microsoft.com/en-us/library/gg492153(v=sql.120).aspx
-- SQL Server 2012: http://msdn.microsoft.com/en-us/library/gg492153(v=sql.110).aspx


-- Get the version number of SQL Server
DECLARE @ServerVersion TINYINT = CONVERT(INT,SUBSTRING(CONVERT(VARCHAR,SERVERPROPERTY('ProductVersion')),1,(CHARINDEX('.',(CONVERT(VARCHAR,SERVERPROPERTY('ProductVersion'))))-1)))

IF @ServerVersion = 11
-- This section is only for SQL Server 2012
BEGIN
  SELECT 
     s.name AS 'SchemaName'
    ,o.name AS 'TableName'
    ,c.name AS 'ColumnName'
    ,'ColumnType' = CASE t.name
      WHEN 'decimal' THEN t.name + '(' + CONVERT(VARCHAR,c.precision) + ',' + CONVERT(VARCHAR,c.scale) + ')'
      WHEN 'numeric' THEN t.name + '(' + CONVERT(VARCHAR,c.precision) + ',' + CONVERT(VARCHAR,c.scale) + ')'
      WHEN 'varchar' THEN
        CASE c.max_length
          WHEN -1 THEN 'varchar(max)'
          ELSE 'varchar(' + CONVERT(VARCHAR,c.max_length) + ')'
        END
      WHEN 'nvarchar' THEN 
        CASE c.max_length
          WHEN -1 THEN 'nvarchar(max)'
          -- max_length is in bytes; nvarchar uses 2 bytes per character
          ELSE 'nvarchar(' + CONVERT(VARCHAR,c.max_length / 2) + ')'
        END
      WHEN 'datetimeoffset' THEN t.name + '(' + CONVERT(VARCHAR,c.scale) + ')'
      ELSE t.name
     END
    ,'ColumnAttribute' = CASE 
      WHEN (c.is_filestream = 1) THEN 'Filestream'
      WHEN (c.is_sparse = 1) THEN 'Sparse'
      ELSE ''
    END
  FROM sys.columns c 
  JOIN sys.objects o ON c.object_id = o.object_id
  JOIN sys.types t ON c.user_type_id = t.user_type_id
  JOIN sys.schemas s ON o.schema_id = s.schema_id
  WHERE o.is_ms_shipped <> 1
  -- These types cannot be part of a SQL Server 2012 columnstore index
  AND (
    t.name IN 
      ('binary'
      ,'varbinary'
      ,'ntext'
      ,'text'
      ,'image'
      ,'uniqueidentifier'
      ,'rowversion'
      ,'timestamp'
      ,'sql_variant'
      ,'hierarchyid'
      ,'geography'
      ,'geometry'
      ,'xml')
  OR (
      -- Decimal & numeric cannot have a precision over 18
      t.name IN ('decimal','numeric') 
      AND c.precision > 18)
  OR (
      -- Datetimeoffset cannot have a precision over 2
      t.name = 'datetimeoffset'
      AND c.scale > 2)
  OR (
      -- Varchar(max) and nvarchar(max)
      t.name IN ('varchar','nvarchar') 
      AND c.max_length = -1)
  OR (
      -- Filestream
      c.is_filestream = 1)
  OR (
      -- Sparse
      c.is_sparse = 1)
  )
  ORDER BY s.name,o.name,c.column_id
END
ELSE IF @ServerVersion = 12
-- This section is only for SQL Server 2014
BEGIN
  SELECT 
     s.name AS 'SchemaName'
    ,o.name AS 'TableName'
    ,c.name AS 'ColumnName'
    ,'ColumnType' = CASE t.name
      WHEN 'varchar' THEN
        CASE c.max_length
          WHEN -1 THEN 'varchar(max)'
          ELSE 'varchar(' + CONVERT(VARCHAR,c.max_length) + ')'
        END
      WHEN 'nvarchar' THEN 
        CASE c.max_length
          WHEN -1 THEN 'nvarchar(max)'
          -- max_length is in bytes; nvarchar uses 2 bytes per character
          ELSE 'nvarchar(' + CONVERT(VARCHAR,c.max_length / 2) + ')'
        END
      ELSE t.name
     END
    ,'ColumnAttribute' = CASE 
      WHEN (c.is_filestream = 1) THEN 'Filestream'
      WHEN (c.is_sparse = 1) THEN 'Sparse'
      ELSE ''
    END
  FROM sys.columns c 
  JOIN sys.objects o ON c.object_id = o.object_id
  JOIN sys.types t ON c.user_type_id = t.user_type_id
  JOIN sys.schemas s ON o.schema_id = s.schema_id
  WHERE o.is_ms_shipped <> 1
  -- These types cannot be part of a SQL Server 2014 columnstore index
  AND (
    t.name IN 
      ('ntext'
      ,'text'
      ,'image'
      ,'rowversion'
      ,'timestamp'
      ,'sql_variant'
      ,'hierarchyid'
      ,'geography'
      ,'geometry'
      ,'xml')
  OR (
      -- Varchar(max) and nvarchar(max)
      t.name IN ('varchar','nvarchar') 
      AND c.max_length = -1)
  OR (
      -- Filestream
      c.is_filestream = 1)
  OR (
      -- Sparse
      c.is_sparse = 1)
  )
  ORDER BY s.name,o.name,c.column_id
END
ELSE
BEGIN
  RAISERROR ('This script only works on SQL Server 2012 and SQL Server 2014.',16,1);
END
GO

Running this against your database will output an organized list of tables along with the column name and data type that cannot be used within a Columnstore index.
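
If you want to try the script before pointing it at a real database, a small throwaway table is enough. The sketch below uses a hypothetical table, dbo.AnalyzerDemo, with column names and types picked purely for illustration.

```sql
USE TEST;
GO

-- Hypothetical demo table: a mix of allowed and restricted column types.
CREATE TABLE dbo.AnalyzerDemo
(
   id     INT              -- allowed on both versions
  ,notes  VARCHAR(MAX)     -- restricted: varchar(max)
  ,doc    XML              -- restricted: xml
  ,price  DECIMAL(38,10)   -- restricted on SQL Server 2012 only (precision over 18)
);
GO

-- Now run the analyzer script above in this database.
-- Clean up afterwards with: DROP TABLE dbo.AnalyzerDemo;
```

Based on the filters in the script, I would expect notes and doc to be listed on both versions, and price to be listed only when the script runs on SQL Server 2012.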


The script can be used whether you plan to create a clustered or a non-clustered Columnstore index, since the data type restrictions apply to both. The script can also be used to analyze databases in either SQL Server 2012 or 2014.

You can read more about the limitations and restrictions of Columnstore indexes in Books Online.