In part 1 we discussed the executable RDS monitoring script, which lets you pass any SQL, takes the output of that SQL and publishes the result to CloudWatch as a custom metric. CloudWatch alarms created on that metric then raise an alert whenever the threshold is crossed.
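A minimal sketch of the "run any SQL and capture its output" step, assuming the Oracle client's `sqlplus` is on the PATH, the default listener port 1521, and that the service name matches the database name used in the config. The helpers `build_connect_string` and `run_sql` are hypothetical and are not taken from the part-1 script.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not the part-1 script itself.

# Build an Oracle Easy Connect string: user/password@//host:port/service
# Port 1521 is an assumed default when no port is given.
build_connect_string() {
  local user=$1 pass=$2 host=$3 service=$4 port=${5:-1521}
  printf '%s/%s@//%s:%s/%s\n' "$user" "$pass" "$host" "$port" "$service"
}

# Run a SQL file silently (-s) and print only its output.
# -L makes SQL*Plus try to log on once instead of prompting again on failure;
# stdin from /dev/null makes SQL*Plus exit once the script has finished.
run_sql() {
  local connect=$1 sql_file=$2
  sqlplus -s -L "$connect" @"$sql_file" </dev/null
}
```

For example, `run_sql "$(build_connect_string "$username" "$password" "$host" appprod00)" invalid_objects.sql` would capture the output of one check.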
In our use case the SQL returns 0 when no error is found on the RDS instance. If there is an error, the error message is displayed and the result is non-zero, which causes CloudWatch to trigger the alarm.
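One way to express that 0 / non-zero rule is sketched below. It assumes the monitoring SQL prints a single count and treats any non-numeric output (such as an ORA- error) as 1 so the alarm still fires; `metric_value`, `publish_metric` and the `DBName` dimension are hypothetical names, while `aws cloudwatch put-metric-data` is the standard AWS CLI call for custom metrics.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the part-1 script: convert SQL output to a metric
# value and publish it to CloudWatch as a custom metric.

# The monitoring SQL is expected to print a single count, where 0 means
# "no problem found". Any other output (for example an ORA- error text
# instead of a number) is reported as 1 so that the alarm still fires.
metric_value() {
  local re='^[[:space:]]*([0-9]+)[[:space:]]*$'
  if [[ $1 =~ $re ]]; then
    printf '%s\n' "$((10#${BASH_REMATCH[1]}))"
  else
    printf '1\n'
  fi
}

# $1 = namespace, $2 = metric name, $3 = database name, $4 = value.
# "DBName" is an assumed dimension name. Set DRY_RUN=1 to print the command
# instead of calling the AWS CLI.
publish_metric() {
  local namespace=$1 metric=$2 db=$3 value=$4
  local cmd=(aws cloudwatch put-metric-data
    --namespace "$namespace" --metric-name "$metric"
    --dimensions "DBName=$db" --value "$value")
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Usage: `publish_metric APPRDS blocked_session appprod00 "$(metric_value "$sql_output")"`.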
In addition, the SQL output is included in the email body and sent to the DBA and DevOps distribution lists (DLs).
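A minimal sketch of the mail step, assuming `mailx` is installed with a working mail transfer agent on the monitoring server. The recipient addresses and the `send_alert_mail` helper are placeholders, not the real DLs or the author's code.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: mail the raw SQL output to the DBA and DevOps lists.
# The addresses below are placeholders, not real distribution lists.
ALERT_RECIPIENTS=("dba-team@example.com" "devops-team@example.com")

# $1 = subject line, $2 = SQL output used as the mail body.
# Set DRY_RUN=1 to print the message instead of sending it.
send_alert_mail() {
  local subject=$1 body=$2
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    printf 'To: %s\nSubject: %s\n\n%s\n' "${ALERT_RECIPIENTS[*]}" "$subject" "$body"
  else
    printf '%s\n' "$body" | mailx -s "$subject" "${ALERT_RECIPIENTS[@]}"
  fi
}
```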
In this post we cover the configuration file used together with that executable script. Once it is configured as shown below, you can schedule the script in cron on any server and use the AWS CLI on that server to create the CloudWatch alarms that raise alerts for the RDS instances.
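A minimal sketch of the cron and alarm side, assuming a 5-minute schedule, an existing SNS topic for notifications, and the same hypothetical `DBName` dimension the metric is published with. The script path, topic ARN and `create_alarm` helper are placeholders; `aws cloudwatch put-metric-alarm` and its flags are the standard AWS CLI.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: create one CloudWatch alarm per custom metric.
# Example crontab entry (assumed script path and 5-minute schedule):
#   */5 * * * * /home/oracle/etc/rds_monitor.sh >> /tmp/rds_monitor.log 2>&1

# Placeholder SNS topic that receives the alarm notifications.
SNS_TOPIC_ARN="arn:aws:sns:ap-southeast-1:123456789012:rds-alerts"

# $1 = namespace, $2 = metric name, $3 = database name used as the dimension.
# The alarm fires when the metric is greater than 0 in a 5-minute period.
# Set DRY_RUN=1 to print the command instead of calling the AWS CLI.
create_alarm() {
  local namespace=$1 metric=$2 db=$3
  local cmd=(aws cloudwatch put-metric-alarm
    --alarm-name "${db}-${metric}"
    --namespace "$namespace" --metric-name "$metric"
    --dimensions "Name=DBName,Value=$db"
    --statistic Maximum --period 300 --evaluation-periods 1
    --threshold 0 --comparison-operator GreaterThanThreshold
    --alarm-actions "$SNS_TOPIC_ARN")
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

The alarm dimensions must match the dimensions the metric is published with, otherwise the alarm watches a metric that never receives data.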
[oracle@ip-10-149-22-89 etc]$ vi alert.cfg
#### The username and password used by the script to connect to the RDS
username=APPMONUSR
password='SuperSecurePassword'
### Namespace to be created for the RDS monitoring in CloudWatch
namespace=APPRDS
limit=7
f1=" %9s"
declare -A proc_matrix
###-----
#Name of the Database and RDS Endpoint used by the executable script
declare -A db_connection
db_connection[1,1]=appprod00; db_connection[1,2]=appprod00.cr4aqbsm2zuc.ap-southeast-1.rds.amazonaws.com
db_connection[2,1]=appproda0; db_connection[2,2]=appproda0.cr4aqbsm2zuc.ap-southeast-1.rds.amazonaws.com
db_connection[3,1]=appproda1; db_connection[3,2]=appproda1.cr4aqbsm2zuc.ap-southeast-1.rds.amazonaws.com
db_connection[4,1]=appproda2; db_connection[4,2]=appproda2.cr4aqbsm2zuc.ap-southeast-1.rds.amazonaws.com
db_connection[5,1]=appproda0; db_connection[5,2]=appproda0.cr4aqbsm2zuc.ap-southeast-1.rds.amazonaws.com
db_connection[6,1]=appproda1; db_connection[6,2]=appproda1.cr4aqbsm2zuc.ap-southeast-1.rds.amazonaws.com
db_connection[7,1]=appproda2; db_connection[7,2]=appproda2.cr4aqbsm2zuc.ap-southeast-1.rds.amazonaws.com
###-----
# The first column contains the metric name and the second column contains the service name to be monitored,
# that is, every row contains a metric name and its respective service name:
# proc_matrix[1,1]=<metric-name>; proc_matrix[1,2]=<service_name>
# e.g. proc_matrix[1,1]=SSHService; proc_matrix[1,2]=ssh
# Here the first column is the metric name displayed in CloudWatch and the second column is the SQL file executed on the RDS.
proc_matrix[1,1]=Error_Alertdblog; proc_matrix[1,2]=Error_Alertdblog.sql
proc_matrix[2,1]=blocked_session; proc_matrix[2,2]=blocked_session.sql
proc_matrix[3,1]=invalid_objects; proc_matrix[3,2]=invalid_objects.sql
#### SQL*Plus connection strings used by the script to connect to each database
declare -A db_matrix
db_matrix[1,1]=apakpda0; db_matrix[1,2]=tejsecprod00.cr8aqbsm1zuc.ap-southeast-1.rds.amazonaws.com
db_matrix[2,1]=apakpda1; db_matrix[2,2]=tejamkproda0.cr8aqbsm1zuc.ap-southeast-1.rds.amazonaws.com
db_matrix[3,1]=apakpda2; db_matrix[3,2]=tejamkproda1.cr8aqbsm1zuc.ap-southeast-1.rds.amazonaws.com
db_matrix[4,1]=apakpda3; db_matrix[4,2]=tejamkproda2.cr8aqbsm1zuc.ap-southeast-1.rds.amazonaws.com
db_matrix[5,1]=apakpda4; db_matrix[5,2]=tejfilproda0.cr8aqbsm1zuc.ap-southeast-1.rds.amazonaws.com
db_matrix[6,1]=apakpda5; db_matrix[6,2]=tejfilproda1.cr8aqbsm1zuc.ap-southeast-1.rds.amazonaws.com
db_matrix[7,1]=apakpda6; db_matrix[7,2]=tejfilproda2.cr8aqbsm1zuc.ap-southeast-1.rds.amazonaws.com
#### Subject line of the alert email for each check (row numbers match proc_matrix)
declare -A alert_matrix
alert_matrix[1,1]="Alert log having ORA- errors in last 5 minutes";
alert_matrix[2,1]="Blocked session found over threshold time limit";
alert_matrix[3,1]="Invalid objects found";
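The matrices are plain Bash associative arrays keyed as "row,column", so a driver script can walk them row by row. A minimal sketch, assuming alert.cfg is sourced at the top level of a Bash script (so the `declare -A` arrays become globals) and that row numbers in `alert_matrix` line up with `proc_matrix`. `list_checks` is a hypothetical helper, not part of the part-1 script, and it stops at the first missing row rather than relying on `limit`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of how a driver script could walk the alert.cfg matrices.
# Assumes alert.cfg has already been sourced at top level, so db_matrix,
# proc_matrix and alert_matrix exist as global associative arrays.

# Print one tab-separated line per (database, check) pair:
# db_name  endpoint  metric_name  sql_file  mail_subject
list_checks() {
  local i=1 j
  while [[ -n ${db_matrix[$i,1]+set} ]]; do
    j=1
    while [[ -n ${proc_matrix[$j,1]+set} ]]; do
      printf '%s\t%s\t%s\t%s\t%s\n' \
        "${db_matrix[$i,1]}" "${db_matrix[$i,2]}" \
        "${proc_matrix[$j,1]}" "${proc_matrix[$j,2]}" \
        "${alert_matrix[$j,1]}"
      j=$((j + 1))
    done
    i=$((i + 1))
  done
}

# Usage in the driver script: source ./alert.cfg && list_checks
```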