Tuesday, May 16, 2017

Create a Whitelist for SQL Server - UPDATED!!!

A while back, I posted an article about creating a WhiteList for access to SQL Server. Since then I have received a bit of feedback that it was not working as designed. My apologies for taking so long, but I believe I have come up with a fix.

The main issue is the trigger will block some or even all access to the server after it’s created. As it turns out, the issues were really permission being denied. To see it in action, let’s create everything using the original code from here.

We’ll add 1 row to the WhiteList table should allow all users from the workstation, ECHOBASE1, access, regardless of its IP address.

USE master;
GO
INSERT dbo.WhiteList(LoginName,HostName,HostIpAddress)
VALUES ('*','ECHOBASE1','*');
GO

Next, we’ll create a SQL login with only connect permission to the server and nothing else.

CREATE LOGIN LogonTriggerTest WITH PASSWORD = 'Password1';
GO

Finally, we’ll open a new query window using that login.


As you can see, we are denied access to the server because of the logon trigger. If we look in the Errorlog, we can see that we lack the VIEW SERVER STATE permission.


This was my first mistake. I did my initial testing using an administrative login. Most users are not admins; therefore, they will not have the permission required to view sys.dm_exec_connections. I was using this DMV to get the IP address of the client connection, and it requires the VIEW SERVER STATE permission. To get around this, I can use the CONNECTIONPROPERTY function, as it does not require any additional permissions.

SELECT CONNECTIONPROPERTY(‘client_net_address’);
GO

Now let’s try to connect again.


Again, we failed. This would be my second mistake. I failed to grant SELECT access to the WhiteList table. By default, a user is will have public permission to the master database, but no permission to the table. To solve this, we can grant permission to the public database role. This will allow any authenticated user to read from the WhiteList table.

USE master;
GO
GRANT SELECT ON dbo.WhiteList TO public;
GO

Finally, our connection to SQL Server is successful. Using the same code from the trigger, we can compare it what’s in the WhiteList table.

USE master;
GO
SELECT
   ORIGINAL_LOGIN() AS 'LoginName'
  ,HOST_NAME() AS 'HostName'
  ,CONNECTIONPROPERTY('client_net_address') AS 'HostIpAddress';
GO
SELECT * FROM dbo.WhiteList;
GO


If I had followed my own rules, I could have discovered most of these issues before posting the original article.

The fully updated code is below. Please let me know if you run into any other issues with this new version. I also added another column to the WhiteList table that can be used for hold comments. The idea is to provide some documentation about what the white-listed item is attempting to do.

USE master;
GO

IF OBJECT_ID('dbo.WhiteList') IS NOT NULL
  DROP TABLE dbo.WhiteList;
GO

CREATE TABLE dbo.WhiteList
(
   Id INT IDENTITY(1,1) PRIMARY KEY
  ,LoginName VARCHAR(255)
  ,HostName VARCHAR(255)
  ,HostIpAddress VARCHAR(50)
  ,Comments VARCHAR(2000)
);
GO

GRANT SELECT ON dbo.WhiteList TO PUBLIC;
GO

INSERT dbo.WhiteList(LoginName,HostName,HostIpAddress,Comments)
VALUES
   ('*','ECHOBASE1','*','Any user from the workstation "ECHOBASE1" is allowed to connect, regardless of IP address.')
  ,('WebSiteLogin','webserver1','192.168.100.55','Only the WebSiteLogin from webserver1 with an IP of 192.168.100.55 is allowed access.');
GO

CREATE TRIGGER WhiteListTrigger
ON ALL SERVER FOR LOGON
AS
BEGIN
  DECLARE 
     @LoginName VARCHAR(255) = ORIGINAL_LOGIN()
    ,@HostName VARCHAR(255) = HOST_NAME()
    ,@HostIpAddress VARCHAR(50) = CONVERT(VARCHAR(50),CONNECTIONPROPERTY('client_net_address'));

  IF 
  (
    SELECT COUNT(*) FROM dbo.WhiteList
    WHERE 
    (
      (LoginName = @LoginName) OR (LoginName = '*')
    )
    AND
    (
      (HostName = @HostName) OR (HostName = '*')
    )
    AND
    (
      (HostIpAddress = @HostIpAddress) OR (HostIpAddress = '*')
    )
  ) = 0
  ROLLBACK;
END;
GO

Tuesday, May 2, 2017

Dear McAfee, Why are you destroying my buffer pool?

For the past few months, one my customers had been trying to diagnose an issue with SQL Server paging out the entire buffer pool each morning around 8AM. See the error below.


Not only was this causing some serious performance issues, but it was also affecting every SQL Server in the environment.

I began my investigation with the standard troubleshooting techniques for buffer pool paging. One of the main workarounds that’s recommended is to enable the “locked pages in memory” permission for the SQL Server service account; however, this was not possible for this customer. Enabling that permission would violate one of their compliance rules, so I needed to find another solution.

The next stop on the investigation trail is the sys.dm_os_ring_buffers DMV. This is a great tool to help determine if the memory pressure came from within SQL Server or from the Windows OS.

WITH RingBuffer AS (
  SELECT
     CAST(dorb.record AS XML) AS xRecord
    ,dorb.timestamp
  FROM sys.dm_os_ring_buffers AS dorb
  WHERE dorb.ring_buffer_type = 'RING_BUFFER_RESOURCE_MONITOR'
)
SELECT
   DATEADD(ms, rb.timestamp-dosi.ms_ticks, GETDATE()) AS 'DateTime'
  ,xr.value('(ResourceMonitor/Notification)[1]', 'varchar(75)') AS RmNotification
  ,xr.value('(ResourceMonitor/IndicatorsProcess)[1]','tinyint') AS IndicatorsProcess
  ,xr.value('(ResourceMonitor/IndicatorsSystem)[1]','tinyint') AS IndicatorsSystem
FROM RingBuffer AS rb
CROSS APPLY rb.xRecord.nodes('Record') record (xr)
CROSS JOIN sys.dm_os_sys_info AS dosi
ORDER BY 'DateTime' DESC;
GO


Matching the time of our Errorlog entry, we can see the resource monitor reported RESOURCE_MEMPHYSICAL_LOW with an IndicatorsProcess of 0 and IndicatorsSystem of 2. This helps us determine that SQL Server was not the cause of memory pressure and that the issue was caused by another process running on the server. Since these servers are dedicated to SQL Server, I know there are very few applications installed that should be competing with SQL Server for memory.

The next place to check was the Windows event logs. In the Application Log, I found the buffer pool entry by SQL Server (EventID 17890), and then proceeded to search backwards in time. A few minutes before SQL’s error, I noticed this entry.


What is the McShield service? A quick Bing search revealed that it’s one of the services for McAfee VirusScan Enterprise. Could this be the cause? To get a quick look at all the history, I filtered the application log for event IDs: 17890 and 5000. Each time McAfee got an updated virus DAT file, SQL Server soon followed that by paging out the entire buffer pool. I checked the application log on several other SQL Servers for the same event IDs, and sure enough the same events occurred in tandem each morning. I also got confirmation from the security administration team that McAfee is scheduled to check for new DAT files each morning around 8AM. Eureka!

This seems like it could be the cause of our paging, but a little more research is needed. Searching the McAfee knowledge base, lead me to this article about the “Processes on enable” option.

Enabling this option causes memory pages of running processes to get paged to disk. And the example given is “Oracle, SQL, or other critical applications that need to be memory-resident continually, will have their process address space paged to disk when scan Processes On Enable kicks in”. OUCH! So when the McAfee service starts up or it gets a new DAT file, it will page out all processes.

To verify this was really the cause, I needed to setup a test environment. I installed McAfee VirusScan Enterprise (VSE 8.8 Patch 7) on a Windows 2012 R2 server running SQL Server 2016, and then setup HammerDB to generate some activity against the server.

I configured Performance Monitor to capture the “Process:Working Set” and “Process:Page Faults/sec” for the SQL Server process (sqlservr.exe). Once HammerDB had a little bit of time to get running, I manually issued a DAT file update through the McAfee GUI.


Once completed, I checked the Windows Application Log and I could see event ID 5000 that said McShield service had restarted with the newer DAT version 8506.


About 90 seconds later, SQL Server’s buffer pool was paged out, as evident by event ID 17890.


The McAfee knowledge base article said this would happen when a new DAT file is received or when the McShield server starts up, so I decided to manually stop and restart the McShield service. As expected, I saw the same two event IDs in the application log. Below is the data collected by Performance Monitor for both of those tests.


The red line is the Working Set of SQL Server and the blue line is the Page Faults/sec for SQL Server. The first dip in the red line was from the DAT file update, and the second dip was when I manually restarted the McShield service. Since SQL Server’s buffer pool had just been paged out, it needed to read those pages from disk which results in a page fault.

The next step was to verify McAfee had all the correct exclusions for SQL Server.


Now that we know the exclusions were set properly, we can assume the issue is caused by the "Processes on enable" setting as previously mentioned. Let's turn off that setting and rerun our tests


After giving HammerDB a few minutes to generate some activity, I issue a manual DAT file update through the McAfee GUI. The application log shows the Event ID 5000 with a new DAT version of 8507, but no record of Event ID 17890 and there was nothing in the SQL Errorlog about memory paging.


Next, we'll test the manual restart of the McShield service. And we get the same result; Event ID 5000 but no indication of memory paging. Looking at perfmon, the counters show SQL Server's Working Set and Page Faults were unchanged during the tests.


Once we completed these tests, we took that data back to the security team and asked them to turn off that setting within McAfee. They obliged and so we waited and monitored our SQL Servers for the next few weeks. During that time, we did not have any SQL Server page out memory during the times that McAfee was getting new DAT updates.

After doing a little more research on McAfee VSE, I discovered this setting "Processes on enable" is enabled if you install McAfee with the Maximum Protection level. The Standard Protection level does not have this option enabled.


The key takeaway from this whole experience is that even though a program may be a standard install for all servers doesn't mean it won't impact SQL Server. And even though it may have been fully vetted in the past, a change to its settings could impact SQL Server in the future.

The lab tests were done using McAfee VirusScan Enterprise 8.8 Patch 7. I know this behavior is still the same in VSE Patch 8, so I would assume it’s the same in the other patch levels as well.

Additional resources:
https://kc.mcafee.com/corporate/index?page=content&id=KB76157
https://blogs.msdn.microsoft.com/karthick_pk/2012/06/22/a-significant-part-of-sql-server-process-memory-has-been-paged-out/
https://support.microsoft.com/en-us/help/2028324/how-to-troubleshoot-sql-server-error-message-17890