RDS forms a crucial part of a web application, and any problem in it can lead to application downtime, reduced performance, 5xx errors, and a degraded user experience. RDS monitoring plays an important part here. Below is a list of parameters that can be monitored to verify that the RDS instance is operating normally. Some of the metrics are provided by AWS, and the rest can be created using custom scripts.
Thresholds depend on the size of the RDS instance (CPU cores, memory, etc.). The thresholds given below are only indicative.
1. CPU Utilization:- CPU utilization increases as the workload and processing on the RDS instance increase. Alert threshold: [CPU Utilization] >= 80% for 5 minutes.
2. Database Connections:- Alert when the number of database connections grows beyond a limit, because if the application cannot get a free connection, those requests fail with connection errors. Alert threshold: [Database Connections] >= 10000.
3. Disk Queue Depth:- Disk queue depth is the number of pending I/O operations for the volume. It can rise significantly when the RDS instance is doing many more I/O operations, which results in increased latency. Adding more disk capacity can be used to overcome this scenario.
4. Free Storage Space:- The amount of storage space available on the RDS instance. Alert threshold: [Free Storage Space] < 2048 GB.
5. Read Latency:- Read latency reflects the latency of SELECT statements on the RDS instance. If it is high, you will see reduced application performance and frequent errors. The threshold depends on the application's performance requirements.
6. Write Latency:- Write latency reflects the latency of INSERT statements. It can be high during periods of high traffic. It should be minimized, and the threshold depends on the application's performance requirements.
7. Freeable Memory:- Freeable memory includes the physical memory left unused by the system plus the buffer or page cache memory that is free and available. It reflects the memory usage pattern on the RDS instance. Alert threshold: [Freeable Memory] <= 8 GB.
8. Network Receive Throughput:- Amount of data received over the network per second.
9. Network Transmit Throughput:- Amount of data sent out over the network per second.
10. ReadIOPS:- Read I/O operations per second on the RDS instance. Alert threshold: [ReadIOPS] >= 11000.
11. WriteIOPS:- Write I/O operations per second on the RDS instance. Alert threshold: [WriteIOPS] >= 11000.
12. Swap Usage:- If the RDS instance runs out of memory, swap usage increases, which severely affects the instance. Alert threshold: [Swap Usage] > 10 GB.
13. Read Throughput:- Number of bytes read from the disk per second.
14. Write Throughput:- Number of bytes written to the disk per second.
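The indicative thresholds in the list above can be written down as CloudWatch alarm definitions. The following is only a sketch: the helper names `build_alarm` and `build_all_alarms` and the alarm naming scheme are made up for illustration, the "Average" statistic is an assumption, and "GB" is assumed to mean GiB. CloudWatch reports Free Storage Space, Freeable Memory and Swap Usage in bytes, so those thresholds are converted.

```python
GIB = 1024 ** 3  # assumption: the "GB" thresholds above are treated as GiB

# (CloudWatch metric name in the AWS/RDS namespace, comparison operator, threshold)
ALARM_SPECS = [
    ("CPUUtilization", "GreaterThanOrEqualToThreshold", 80.0),
    ("DatabaseConnections", "GreaterThanOrEqualToThreshold", 10000.0),
    ("FreeStorageSpace", "LessThanThreshold", 2048 * GIB),
    ("FreeableMemory", "LessThanOrEqualToThreshold", 8 * GIB),
    ("ReadIOPS", "GreaterThanOrEqualToThreshold", 11000.0),
    ("WriteIOPS", "GreaterThanOrEqualToThreshold", 11000.0),
    ("SwapUsage", "GreaterThanThreshold", 10 * GIB),
]


def build_alarm(instance_id, metric, operator, threshold,
                period=300, evaluation_periods=1):
    """Return keyword arguments for one alarm (hypothetical helper).

    period=300 with evaluation_periods=1 approximates "for 5 minutes".
    """
    return {
        "AlarmName": f"{instance_id}-{metric}",  # naming scheme is illustrative
        "Namespace": "AWS/RDS",
        "MetricName": metric,
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": instance_id}],
        "Statistic": "Average",  # assumption: alert on the period average
        "Period": period,
        "EvaluationPeriods": evaluation_periods,
        "Threshold": float(threshold),
        "ComparisonOperator": operator,
    }


def build_all_alarms(instance_id):
    """Build one alarm definition per indicative threshold."""
    return [build_alarm(instance_id, m, op, t) for m, op, t in ALARM_SPECS]
```

Each dictionary is shaped so that it could be passed to boto3 as `boto3.client("cloudwatch").put_metric_alarm(**alarm)`. You would normally also add notification actions, and adjust the values to the size of your own instance.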
Then there are custom alerts, for which you can create scripts and run them against the RDS instance to check performance and consistency and to detect errors.
1. Tablespace Storage Threshold Limit:- Tablespace size has crossed the configured threshold.
2. Session blocked continuously for more than 300 seconds.
3. Session continuously active for more than 1 hour.
4. Scheduled job failed in the last 15 minutes.
5. Invalid objects found in the database.
6. DB error in the alert log file.
7. Problem identified with Advanced Queue (expired/enqueue/dequeue).
8. Problem identified with Advanced Queue (propagation).
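As an illustration of what such a custom script might look like, here is a minimal sketch of the two session checks (blocked for more than 300 seconds, active for more than 1 hour). It assumes the session rows have already been fetched from the database. The `SessionRow` shape, the state names and the `find_session_alerts` function are hypothetical and not tied to any particular database engine.

```python
from dataclasses import dataclass

BLOCKED_LIMIT_SECONDS = 300   # "blocked continuously for more than 300 seconds"
ACTIVE_LIMIT_SECONDS = 3600   # "continuously active for more than 1 hour"


@dataclass
class SessionRow:
    """Hypothetical shape of one row returned by a session query."""
    session_id: int
    state: str             # "BLOCKED", "ACTIVE" or "IDLE" in this sketch
    seconds_in_state: int  # how long the session has been in that state


def find_session_alerts(rows):
    """Return one alert message per session that exceeds its limit."""
    alerts = []
    for row in rows:
        if row.state == "BLOCKED" and row.seconds_in_state > BLOCKED_LIMIT_SECONDS:
            alerts.append(f"session {row.session_id} blocked for {row.seconds_in_state}s")
        elif row.state == "ACTIVE" and row.seconds_in_state > ACTIVE_LIMIT_SECONDS:
            alerts.append(f"session {row.session_id} active for {row.seconds_in_state}s")
    return alerts
```

A script like this could be run on a schedule, with the returned messages forwarded to whatever alerting channel you already use.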