Sql Server Alerts

miniDBA provides alerting features through sending emails, showing popups on the windows desktop, sending to Slack and writing to the windows event log

Each of the 30+ alerts can be configured differently - threshold values can be modified to raise minor and major alerts

In larger environments the configuration of the alerts on one server can be copied to another making administration straight-foward

Emails can be send to different addresses or groups for both major and minor alerts for each individual alert, giving you maximum flexibility.

Screen shot of alerts configuration screen:

The following are the alerts that can be raised by miniDBA for SQL Server - click each one for details
Database Off-line
Any databases are not in an online state
Free Disk Space
Free disk space has fallen below a certain threshold
This may mean that the drive will fill completely and the server will not be able to function correctly
Blocking Processes
Processes being blocked by other processes
A degree of this is to be expected in busy environments as locks are taken out on objects
The MaxThreshold represents the number of blocking processes that it takes to trigger an alert
Number of deadlocks detected in 1 minute
Processes deadlocked by other processes
Often caused by locks being held for too long, in turn caused by poor index design
Long Running Queries
Active processes that have been running the same sql batch for a long period of time
Huge queries may be consuming way too much of the servers resources
The MaxThreshold represents number of minutes the batch must run to trigger the alert
No Connection
The Sql Server cannot be contacted, either miniDBA does not have permission to connect or the server is down
Always On Enabled But Failed
The Always On feature is enabled but it failed to start properly and is an error state
Buffer Cache Hit Ratio
The percentage of reads that use the buffer cache, if dipped below will trigger the alarm
Plan Cache Hit Ratio
the percentage of queries that reuse execution plans from the plan cache, if dipped below will trigger the alarm
Maximum Connections
The number of concurrent client connections to the instance.
This may indicate periods of very high use or inefficient client application code.
Backup Restore
Backup/Restore in progress
This can negatively effect database performance of any databases that share the same drive as the database being backed up.
Can often occur at unexpected times and require better scheduling
File Size Limit
Data or Log file is near maximum size limit
Either the file will autogrow which may impact performance during the operation or it will run out of space if set to have a size limit
Instance Memory
Memory used by this instance (Mb)
Use this alert to tell you if the allocated memory is actually being used or not while tuning memory settings
Page Life Expectancy
If the life expectancy of memory in the buffer pool is very low it can indicate high memory pressure
Instance Cpu %
% of the Cpu used by this Sql Server instance
Sustained high usage may indicate poor T-SQL or index design
Non Instance Cpu %
% of the Cpu used by processes other than this instance
Ideally there should be minimal activity by any non SQL Server processes on the server.
Check virus scanners and other maintenance software is configured correctly
Disk Queue Length
IO items waiting to be serviced by the disk
High disk queue length can signify poor IO configuration.
Check if multiple databases sharing the same drive.
The thresholds are device dependant so experiment with the settings
Agent Not Running
SQL Server Agent is not running
No jobs will be able to run if the agent is not running
OS Memory State
The OS is lacking memory available for programs. SQL Server will attempt to trim its working set of memory pages
Failed Job
SQL Server Agent jobs that have failed
If the latest execution failed, it is raised as an alert
If this server is expected to have failing jobs then disable this alert
Tran Log % Used
Transaction Log file at high % use may need increasing in size
A full or near full transaction log may cause bad database performance and will likely have a run-away query with poor design causing it
File IO Stall Time
Average time in ms, waiting for the IO subsystem to perform operations
Lower waiting times are better - the IO Ms is calculated across total stall ms / read & write operations, giving an average rather than an on the spot calculation
Write Log Time
Time in ms, per second, the system has to wait for the IO subsystem to perform log writes
The figure is server wide not specific to any database and indicates a performance bottleneck around whichever drive the offending log file(s) are on
Error Log High Severity
Errors in the Sql Server error log of severity 17 or above
Error Log Medium Severity
Errors in the Sql Server error log of severity 15 or 16
These levels of severity covers user correctable errors such as problems with High Availability Groups or queries that throw errors
Failed Login
A connection was attempted to the SQL Server but the login was incorrect
Sort Warning
Indicates that sort operations do not fit into memory and are spilling to TempDb.
Performance of the query can be affected because multiple passes over the data are required to sort the data
File Auto Growths
Data or Log file auto growths occurred which will take longer to perform as the database gets bigger
Avoid autogrowths by giving the files a larger, more appropriate maximum size
Database Mirroring State Change
Indicates when the state of a mirrored database changes
Custom T-SQL Query
The alert is raised when the custom TSQL query is executed (every 1 minute) and the result is over a pre-defined threshold or boolean value
Always ON Failed To Start
The alert is raised when the HADR component os SQL Server is enabled but did not start properly
Always ON Failover
The alert is raised when the primary replica fails over from one cluster node to another
Always ON Replica Unhealthy
The alert is raised when replica synchronization state becomes unhealthy
Always ON Database Unhealthy
The alert is raised when database synchronization state becomes unhealthy
Always ON Transaction Delay Too Long
The alert is raised when the time a primary replica has to wait for secondary replicas to harden transmitted log data becomes too much.
IO (Checksum) Error
IO error (incorrect checksum) occurred when reading a data file. The read eventually completed successfully but encountered a checksum error. Check the device for other errors.