A technology blog by a self motivated DBA .: December 2015A Sql blog

Thursday, December 10, 2015

performance metrics

I found this on the internet and keeping here for quick reference

Memory Bottleneck Analysis

Object: - Memory
Counter: - Available Mbytes
Preferred Value: - > 20MB
Description: -
Reference: - KB 889654

Object: - Memory
Counter: - Free System Page Table Entries
Preferred Value: - > 7000
Description: - Free System Page Table Entries is the number of page table entries not currently in use by the system. If < 7000, consider removing /3GB.
Reference: - KB 311901

Object: - Memory
Counter: - Pages/Sec
Preferred Value: - < 50
Description: - Pages/sec is the rate at which pages are read from or written to disk to resolve hard page faults. This counter is a primary indicator of the kinds of faults that cause system-wide delays.
Reference: - Monitoring and Tuning Your Server

Object: - Memory
Counter: - Pages Input/Sec
Preferred Value: - < 10
Description: - Pages Input/sec is the rate at which pages are read from disk to resolve hard page faults.
Reference: - KB 889654

Object: - Paging File
Counter: - %Usage
Preferred Value: - < 70%
Description: - The amount of the Page File instance in use in percent.
Reference: - KB 889654

Object: - Paging File
Counter: - %Usage
Preferred Value: - < 70%
Description: - The peak usage of the Page File instance in percent.
Reference: - KB 889654

Object: - SQL Server:Buffer Manager
Counter: - Page Life Expectancy
Preferred Value: - > 300
Description: - This performance monitor counter tells you, on average, how long data pages are staying in the buffer. If this value gets below 300 seconds, this is a potential indication that your SQL Server could use more memory in order to boost performance.
Reference: -

Object: - SQL Server:Buffer Manager
Counter: - Lazy Writes/Sec
Preferred Value: - < 20
Description: - This counter tracks how many times a second that the Lazy Writer process is moving dirty pages from the buffer to disk in order to free up buffer space. Generally speaking, this should not be a high value, say more than 20 per second or so. Ideally, it should be close to zero. If it is zero, this indicates that your SQL Server's buffer cache is plenty big and SQL Server doesn't have to free up dirty pages, instead waiting for this to occur during regular checkpoints. If this value is high, then a need for more memory is indicated.
Reference: -

Object: - SQL Server:Buffer Manager
Counter: - Checkpoint Pages/Sec
Preferred Value: - This value is relative, it varies from server to server, we need to compare the average to a base line capture to tell if the value is high or low.
Description: - When a checkpoint occurs, all dirty pages are written to disk. This is a normal procedure and will cause this counter to rise during the checkpoint process. What you don't want to see is a high value for this counter over time. This can indicate that the checkpoint process is running more often than it should, which can use up valuable server resources. If this has a high figure (and this will vary from server to server), consider adding more RAM to reduce how often the checkpoint occurs, or consider increasing the "recovery interval" SQL Server configuration setting.
Reference: -

Object: - SQL Server:Buffer Manager
Counter: - Page reads/sec
Preferred Value: - < 90
Description: - Number of physical database page reads issued. 80 – 90 per second is normal, anything that is above indicates indexing or memory constraint.
Reference: -

Object: - SQL Server:Buffer Manager
Counter: - Page writes/sec
Preferred Value: - < 90
Description: - Number of physical database page writes issued. 80 – 90 per second is normal, anything more we need to check the lazy writer/sec and checkpoint counters, if these counters are also relatively high then, it’s memory constraint.
Reference: -

Object: - SQL Server:Buffer Manager
Counter: - Free pages
Preferred Value: - > 640
Description: - Total number of pages on all free lists.
Reference: -

Object: - SQL Server:Buffer Manager
Counter: - Stolen pages
Preferred Value: - Varies. Compare with baseline
Description: - Number of pages used for miscellaneous server purposes (including procedure cache).
Reference: -

Object: - SQL Server:Buffer Manager
Counter: - Buffer Cache hit ratio
Preferred Value: - > 90%
Description: - Percentage of pages that were found in the buffer pool without having to incur a read from disk.
Reference: -

Object: - SQL Server:Buffer Manager
Counter: - Target Server Memory(KB)
Preferred Value: -
Description: - Total amount of dynamic memory the server can consume.
Reference: -

Object: - SQL Server:Buffer Manager
Counter: - Total Server Memory(KB)
Preferred Value: -
Description: - Total amount of dynamic memory (in kilobytes) that the server is using currently
Reference: -

Disk Bottleneck Analysis
Object: - PhysicalDisk
Counter: - Avg. Disk Sec/Read
Preferred Value: - < 8ms
Description: - Measure of disk latgency. Avg. Disk sec/Read is the average time, in seconds, of a read of data from the disk.
More Info:
Reads or non cached Writes
Excellent < 08 Msec ( .008 seconds )
Good < 12 Msec ( .012 seconds )
Fair < 20 Msec ( .020 seconds )
Poor > 20 Msec ( .020 seconds )
Cached Writes Only
Excellent < 01 Msec ( .001 seconds )
Good < 02 Msec ( .002 seconds )
Fair < 04 Msec ( .004 seconds )
Poor > 04 Msec ( .004 seconds
Reference: -

Object: - PhysicalDisk
Counter: - Avg. Disk sec/Write
Preferred Value: - < 8ms (non cached) < 1ms (cached)
Description: - Measure of disk latency. Avg. Disk sec/Write is the average time, in seconds, of a write of data to the disk.
Reference: -

Object: - PhysicalDisk
Counter: - Avg. Disk Read Queue Length
Preferred Value: - < 2 * spindles
Description: - Avg. Disk Read Queue Length is the average number of read requests that were queued for the selected disk during the sample interval.
More Info:
< (2+ no of spindles) Excellent
< (2*no of spindles) Good
< (3* no of spindles) Fair

Reference - Whitepaper “Performance Monitoring in Windows 2003: Best Practices” by Ben W. Christenbury

Note: If the disk has say 20 disk and it is RAID 10 then no. of spindles = 20/2 = 10. If it is RAID 5 then the no. of spindles = no of disks = 20.
Reference: -

Object: - PhysicalDisk
Counter: - Avg. Disk Write Queue Length
Preferred Value: - < 2 * spindles
Description: - Avg. Disk Write Queue Length is the average number of write requests that were queued for the selected disk during the sample interval.
Reference: -

Object: - SQL Server:Buffer Manager
Counter: - Page reads/sec
Preferred Value: - < 90
Description: - Number of physical database page reads issued. 80 – 90 per second is normal, anything that is above indicates indexing or memory constraint.
Reference: -

Object: - SQL Server:Buffer Manager
Counter: - Page writes/sec
Preferred Value: - < 90
Description: - Number of physical database page writes issued. 80 – 90 per second is normal, anything more we need to check the lazy writer/sec and checkpoint counters, if these counters are also relatively high then, it’s memory constraint.
Reference: -

Object: - SQL Server:Buffer Manager
Counter: - Free pages
Preferred Value: - > 640
Description: - Total number of pages on all free lists.
Reference: -

Object: - SQL Server:Buffer Manager
Counter: - Stolen pages
Preferred Value: - Varies. Compare with baseline
Description: - Number of pages used for miscellaneous server purposes (including procedure cache).
Reference: -

Object: - SQL Server:Buffer Manager
Counter: - Buffer Cache hit ratio
Preferred Value: - > 90%
Description: - Percentage of pages that were found in the buffer pool without having to incur a read from disk.
Reference: -

Processor Bottleneck Analysis

Object: - Processor
Counter: - %Processor Time
Preferred Value: - < 80%
Description: - % Processor Time is the percentage of elapsed time that the processor spends to execute a non-Idle thread.
Reference: -

Object: - Processor
Counter: - %Privileged Time
Preferred Value: - < 30% of Total %Processor Time
Description: - % Privileged Time is the percentage of elapsed time that the process threads spent executing code in privileged mode.
Reference: -

Object: - Process (sqlservr)
Counter: - %Processor Time
Preferred Value: - < 80%
Description: -
Reference: -

Object: - Process (sqlservr)
Counter: - %Privileged Time
Preferred Value: - < 30% of %Processor Time (sqlservr)
Description: - Note: Divide the value by number of processors
Reference: -

Object: - System
Counter: - Context Switches/sec
Preferred Value: - < 3000
Description: - 1500 – 3000 per processor Excellent – Fair
> 6000 per processor Poor
Upper limit is about 40,000 at 90 % CPU per CPU
NOTE: Remember to divide by number of processors
Reference: -

Object: - System
Counter: - Processor Queue Length
Preferred Value: - < 4 per CPU
Description: - For standard servers with long Quantums
<= 4 per CPU Excellent
< 8 per CPU Good
< 12 per CPU Fair
Reference: -

Object: - SQLServer:Access Methods
Counter: - Full Scans / sec
Preferred Value: - < 1
Description: - If we see high CPU then we need to invistigate this counter, otherwise if the full scan are on small tables we can ignore this counter. Values greater than 1 or 2 indicates that we are having table / Index page scans. We need to analyze how this can be avoided.
Reference: -

Object: - SQLServer:Access Methods
Counter: - Worktables Created/Sec
Preferred Value: - < 20
Description: - Number of worktables created in tempdb per second. Worktables are used for queries that use various spools (table spool, index spool, etc).
Reference: -

Object: - SQLServer:Access Methods
Counter: - Workfiles Created/Sec
Preferred Value: - < 20
Description: - Number of work files created per second. Tempdb workfiles are used in processing hash operations when the amount of data being processed is too big to fit into the available memory. They may be able to reduce this number by making the queries more efficient by adding/changing indexes, adding additional memory, etc.
Reference: -

Object: - SQLServer:Access Methods
Counter: - Page Splits/sec
Preferred Value: - < 20
Description: - Interesting counter that can lead us to our table / index design. This value needs to be low as possible. If you find out that the number of page splits is high, consider increasing the fillfactor of your indexes. An increased fillfactor helps to reduce page splits because there is more room in data pages before it fills up and a page split has to occur.
Reference: -

Overall SQL Server Bottleneck Analysis

Object: - SQLServer:General Statistics
Counter: - User Connections
Preferred Value: -
Description: - The number of users currently connected to the SQL Server.
Reference: -

Object: - SQLServer:General Statistics
Counter: - Logins/sec
Preferred Value: - < 2
Description: - > 2 per second indicates that the application is not correctly using connection pooling.

Reference: -

Object: - SQLServer:General Statistics
Counter: - Logouts/sec
Preferred Value: - < 2
Description: - > 2 per second indicates that the application is not correctly using connection pooling.
Reference: -

Object: - SQLServer:SQL Statistics
Counter: - Batch Requests/Sec
Preferred Value: - < 1000
Description: - Over 1000 batch requests per second indicate a very busy SQL Server.
Reference: -

Object: - SQLServer:SQL Statistics
Counter: - SQL Compilations/sec
Preferred Value: - < 10% of the number of Batch Requests / sec
Description: - The number of times per second that SQL Server compilations have occurred. This value needs to be as low as possible. If you see a high value such as over 100, then it’s an indication that there are lots or adhoc queries that are running, might cause CPU
Reference: -

Object: - SQLServer:SQL Statistics
Counter: - SQL Re-Compilations/sec
Preferred Value: - < 10% of the number of SQL Compilations/sec
Description: - This needs to be nil in our system as much as possible. A recompile can cause deadlocks and compile locks that are not compatible with any locking type.
Reference: -

Object: - SQL Server:Latches
Counter: - Average Latch Wait Time (ms)
Preferred Value: - < 300
Description: - Average latch wait time (milliseconds) for latch requests that had to wait.
Reference: -

Transaction Management

Object: - SQL Server:Locks
Counter: - Number of Deadlocks/sec
Preferred Value: - < 1
Description: - The number of lock requests that resulted in a deadlock.
Reference: -

Object: - SQL Server:Locks
Counter: - Lock Requests/sec
Preferred Value: - < 1000
Description: - Number of requests for a type of lock per second. Lock requests/sec > 1000 indicates that the queries are accessing large number of rows, the next step is to review high read queries. If you also see high Avg. Wait time, then it’s an indication of blocking, then review the blocking script output.
Reference: -

Object: - SQL Server:Locks
Counter: - Average Wait Time (ms)
Preferred Value: - < 500
Description: - This is the average wait time in milliseconds to acquire a lock. Lower the value the better it is. If the value goes higher then 500, there may be blocking going on; we need to run blocker script to identify blocking.
Reference: -

Script: finding the object_id from the page_id

 
I had a situation where there was a page split ( monitored through Extended Events) and the page_id was 46565645. I wanted to find the table so that I could work with the fill factor of the indexes. 

First I turned the following trace ON 

DBCC TRACEON (3604);

DBCC PAGE (6, 1, 295, 0);

DBCC TRACEOFF (3604);

GO

 
PAGE: (1:295)

BUFFER:

BUF @0x00000004FD8C7980

bpage = 0x00000004A2D14000          bhash = 0x0000000000000000          bpageno = (1:295)

bdbid = 6                           breferences = 0                     bcputicks = 0

bsampleCount = 0                    bUse1 = 55116                       bstat = 0x809

blog = 0x15ab215a                   bnext = 0x0000000000000000          

PAGE HEADER:

Page @0x00000004A2D14000

m_pageId = (1:295)                  m_headerVersion = 17                m_type = 17

m_typeFlagBits = 0x0                m_level = 0                         m_flagBits = 0x8200

m_objId (AllocUnitId.idObj) = 84    m_indexId (AllocUnitId.idInd) = 256

Metadata: AllocUnitId = 72057594043432960

Metadata: PartitionId = 72057594039042048                                Metadata: IndexId = 0

Metadata: ObjectId = 245575913      m_prevPage = (0:0)                  m_nextPage = (0:0)

pminlen = 8008                      m_slotCnt = 1                       m_freeCnt = 83

m_freeData = 8107                   m_reservedCnt = 0                   m_lsn = (35:200:9)

m_xactReserved = 0                  m_xdesId = (0:0)                    m_ghostRecCnt = 0

m_tornBits = 1093512791             DB Frag ID = 1                      

Allocation Status

GAM (1:2) = ALLOCATED               SGAM (1:3) = ALLOCATED

PFS (1:1) = 0x64 MIXED_EXT ALLOCATED 100_PCT_FULL                        DIFF (1:6) = CHANGED

ML (1:7) = NOT MIN_LOGGED

In the above result Metadata: ObjectId = 245575913    gave me the object_id 

I was able to find the table name with the page splits once I had the object_id .

Tuesday, December 1, 2015

Script: Performance Counters

--Buffer cache hit ratio : - close to 100%

--PLE : 300 or more

--Compilations : -- for every 10 batch requests / sec there is 1 compilations

--REcompilations : - less than 10% of compilations

-- Locks : -- Total ( 0)

--page splits : Less thann 20% of batch req/secs

DECLARE @old_cntr_value INT;

DECLARE @first_sample_date DATETIME;

　

SELECT object_name, counter_name, ((cntr_value*1)/1024)*1.00/1024 AS 'Total Server Memory (GB)'

FROM sys.dm_os_performance_counters

WHERE counter_name = 'Total Server Memory (KB)'

　

SELECT (a.cntr_value * 1.0 / b.cntr_value) * 100.0 [BufferCacheHitRatio]

FROM (SELECT * FROM sys.dm_os_performance_counters

WHERE counter_name = 'Buffer cache hit ratio'

AND object_name LIKE '%:Buffer Manager%') a

CROSS JOIN

(SELECT * FROM sys.dm_os_performance_counters

WHERE counter_name = 'Buffer cache hit ratio base'

and object_name LIKE '%:Buffer Manager%') b

　

　

　

SELECT @old_cntr_value = cntr_value,

@first_sample_date = getdate()

select *

FROM sys.dm_os_performance_counters

where counter_name = 'page splits/sec'

-- Time frame to wait before collecting second sample

WAITFOR DELAY '00:00:01'

select *

FROM sys.dm_os_performance_counters

where counter_name = 'page splits/sec'

-- Collect second sample and calculate per-second counter

SELECT (cntr_value - @old_cntr_value) /

DATEDIFF(ss,@first_sample_date, GETDATE()) as PageSplitsPerSec

FROM sys.dm_os_performance_counters

WHERE counter_name = 'page splits/sec'

　

SELECT @old_cntr_value = cntr_value,@first_sample_date = getdate()

FROM sys.dm_os_performance_counters

WHERE counter_name = 'Batch Requests/sec'

AND object_name LIKE '%SQL Statistics%';

WAITFOR DELAY '00:00:01'

　

SELECT (cntr_value - @old_cntr_value) /

DATEDIFF(ss,@first_sample_date, GETDATE()) as [Batch Requests/sec]

FROM sys.dm_os_performance_counters

WHERE counter_name = 'Batch Requests/sec'

AND object_name LIKE '%SQL Statistics%';

SELECT @old_cntr_value = cntr_value,@first_sample_date = getdate()

FROM sys.dm_os_performance_counters

WHERE counter_name = 'Batch Requests/sec'

AND object_name LIKE '%SQL Statistics%';

WAITFOR DELAY '00:00:01'

　

SELECT @old_cntr_value = cntr_value,@first_sample_date = getdate()

FROM sys.dm_os_performance_counters

WHERE counter_name = 'SQL Compilations/sec'

AND object_name LIKE '%SQL Statistics%';

WAITFOR DELAY '00:00:01'

SELECT (cntr_value - @old_cntr_value) /

DATEDIFF(ss,@first_sample_date, GETDATE()) as [compilations per sec]

FROM sys.dm_os_performance_counters

WHERE counter_name = 'SQL Compilations/sec'

AND object_name LIKE '%SQL Statistics%';

　

　

SELECT @old_cntr_value = cntr_value,@first_sample_date = getdate()

FROM sys.dm_os_performance_counters

WHERE counter_name = 'SQL Re-Compilations/Sec'

AND object_name LIKE '%SQL Statistics%';

WAITFOR DELAY '00:00:01'

SELECT (cntr_value - @old_cntr_value) /

DATEDIFF(ss,@first_sample_date, GETDATE()) as [RE-compilations per sec]

FROM sys.dm_os_performance_counters

WHERE counter_name = 'SQL Re-Compilations/Sec'

AND object_name LIKE '%SQL Statistics%';

　

　

SELECT @old_cntr_value = cntr_value,@first_sample_date = getdate()

FROM sys.dm_os_performance_counters

WHERE counter_name = 'Lock Waits/sec'

AND object_name LIKE '%SQLServer:Locks%'

and instance_name = '_Total' ;

WAITFOR DELAY '00:00:01'

　

　

　

　

SELECT (cntr_value - @old_cntr_value) /

DATEDIFF(ss,@first_sample_date, GETDATE()) as [Lock Waits/sec]

FROM sys.dm_os_performance_counters

WHERE counter_name = 'Lock Waits/sec'

AND object_name LIKE '%SQLServer:Locks%'

and instance_name = '_Total' ;

　

　

　

SELECT numa_node = ple.instance_name, ple_sec = ple.cntr_value, db_node_mem_GB = dnm.cntr_value/1048576,

ple_per_4gb = ple.cntr_value * 4194304 / dnm.cntr_value

FROM sys.dm_os_performance_counters ple join sys.dm_os_performance_counters dnm

on ple.instance_name = dnm.instance_name

and ple.counter_name='Page life expectancy' -- PLE per NUMA node

and dnm.counter_name='Database Node Memory (KB)' -- buffer pool size per NUMA node