Channel: Thinking Out Loud

Monitor Linux Host Restart


The application is not RAC-aware and cannot handle ORA-3113, ORA-25402, or ORA-25409 properly.

Hence, there is a requirement to notify the application team to restart the application whenever the database server is restarted.

The initial implementation used a cron job owned by oracle, running every 5 minutes, to detect a server restart.

This was my first attempt. While the implementation is effective, it is not efficient: the job runs every 5 minutes around the clock to catch an occasional restart.

The script detects whether the server was restarted within the last X seconds by checking /proc/uptime.

If uptime is less than X seconds, the script sends a notification that the server was restarted.

Here is a high-level example:

### The script accepts one parameter: the threshold in seconds
$ /home/oracle/scripts/last_reboot.sh
/home/oracle/scripts/last_reboot.sh: line 10: 1: ---> USAGE: /home/oracle/scripts/last_reboot.sh [in seconds]

### The heart of the script: read the uptime in seconds from /proc/uptime
$ egrep -o '^[0-9]+' /proc/uptime
2132607

### Cron job scheduled every 5 minutes to check whether server uptime is less than 540 seconds and, if so, send a notification.
$ crontab -l|grep reboot
##### monitor node reboot #####
*/5 * * * * /home/oracle/scripts/last_reboot.sh 540 > /tmp/last_reboot.cron 2>&1
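The post does not show the body of last_reboot.sh. The function below is a rough sketch only. It assumes bash and follows the behaviour described above: compare the integer part of /proc/uptime against a threshold in seconds. The function name `last_reboot`, the `UPTIME_FILE` override, and the `echo` that stands in for the real notification are all hypothetical.

```shell
#!/bin/bash
# Hypothetical sketch of the check in last_reboot.sh (real body not shown).
# UPTIME_FILE is an assumed override so the check can be tested off-host.
UPTIME_FILE="${UPTIME_FILE:-/proc/uptime}"

last_reboot() {
  local threshold="$1" uptime_s
  # Require one non-negative integer: the threshold in seconds.
  if [[ ! "$threshold" =~ ^[0-9]+$ ]]; then
    echo "---> USAGE: last_reboot [in seconds]" >&2
    return 2
  fi
  # /proc/uptime holds "<seconds up> <seconds idle>"; keep the integer part.
  uptime_s=$(grep -Eo '^[0-9]+' "$UPTIME_FILE") || return 1
  # 10# forces base 10 so a value such as 0540 is not read as octal.
  if (( 10#$uptime_s < 10#$threshold )); then
    # Placeholder: the real script sends a notification (mechanism not shown).
    echo "${HOSTNAME} was restarted ${uptime_s} seconds ago"
    return 0
  fi
  # Uptime is at or above the threshold: no recent restart, nothing to report.
  return 1
}
```

In a standalone script, the last line would call `last_reboot "$1"`. The cron entry above (`last_reboot.sh 540`) would then report any boot that is less than 540 seconds old.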

A more efficient implementation is to use a cron @reboot entry, which runs the job once, automatically, after the server restarts.

Here is a high-level example:

### When the server is restarted, host_restart_alert.sh is executed
[root@oracle-12201-vagrant ~]# crontab -l
@reboot su oracle -c '/home/oracle/host_restart_alert.sh' > /tmp/host_restart_alert.out 2>&1

### Here is host_restart_alert.sh
[oracle@oracle-12201-vagrant ~]$ cat host_restart_alert.sh
#!/bin/bash -x
# Script is called from the root crontab
# uptime -p reports in whole minutes, so sleep for at least 60s after host restart
sleep 63
EMAILMESSAGE="$(hostname) was restarted `uptime -p| awk -F'up' '{print $2}'` ago at `uptime -s`"
echo $EMAILMESSAGE > /tmp/restart_$HOSTNAME.log
exit

### Comment from colleague:
### From a bash syntax perspective, it’s not wrong. It’s not great style (don’t use backticks)
printf -v EMAILMESSAGE '%s was restarted %s ago at %s' \
"$(hostname)" \
"$(uptime -p| awk -F'up' '{print $2}')" \
"$(uptime -s)"
echo $EMAILMESSAGE > /tmp/restart_$HOSTNAME.log
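The colleague's version still has one quirk. `awk -F'up'` leaves a leading space in the duration (`' 1 minute'` in the demo trace), and only the unquoted `echo` collapses it. A quoted echo keeps the double space, as in the deconstructed message. Below is a minimal alternative sketch. It assumes bash parameter expansion is acceptable, strips the literal `up ` prefix, and quotes every argument. `format_restart_message` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: build the restart message from `uptime -p` and
# `uptime -s` style strings, without awk or backticks.
format_restart_message() {
  local host="$1" pretty="$2" since="$3"
  # Drop the leading "up " that `uptime -p` prints (e.g. "up 1 minute").
  pretty="${pretty#up }"
  printf '%s was restarted %s ago at %s\n' "$host" "$pretty" "$since"
}

# Possible use inside host_restart_alert.sh (on the host itself):
# format_restart_message "$(hostname)" "$(uptime -p)" "$(uptime -s)" \
#   > "/tmp/restart_${HOSTNAME}.log"
```

The helper builds the message with `printf`, so the log line no longer depends on whether the caller quotes the variable.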

### Deconstructing uptime commands:
[oracle@oracle-12201-vagrant ~]$ uptime -p
up 17 hours, 28 minutes

[oracle@oracle-12201-vagrant ~]$ uptime -s
2021-02-15 18:00:51

### Deconstructing the message sent:
[oracle@oracle-12201-vagrant ~]$ echo "$HOSTNAME was restarted `uptime -p| awk -F'up' '{print $2}'` ago at `uptime -s`"
oracle-12201-vagrant was restarted  17 hours, 28 minutes ago at 2021-02-15 18:00:51
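Both uptime figures essentially derive from the same seconds-since-boot counter that the first script reads from /proc/uptime. As a rough sketch, assuming GNU `date` and its `-d @EPOCH` syntax, the `uptime -s` boot timestamp can be approximated to within a second as "now minus that counter". `boot_time_from_uptime` is a hypothetical helper, not part of the original scripts.

```shell
#!/bin/bash
# Hypothetical helper: approximate `uptime -s` from /proc/uptime.
# $1 = current epoch seconds, $2 = uptime file (defaults to /proc/uptime).
boot_time_from_uptime() {
  local now="$1" uptime_file="${2:-/proc/uptime}" up
  # Integer part of the seconds-since-boot value.
  up=$(grep -Eo '^[0-9]+' "$uptime_file") || return 1
  # Boot time = now - uptime, printed in the same format as `uptime -s`.
  date -d "@$(( now - 10#$up ))" '+%Y-%m-%d %H:%M:%S'
}

# On a live host: boot_time_from_uptime "$(date +%s)"
```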

### Demo (this run used the printf version of the script, as the trace shows):
[root@oracle-12201-vagrant ~]# date
Tue Feb 16 14:51:18 -05 2021

[root@oracle-12201-vagrant ~]# uptime
 14:51:22 up 1 min,  1 user,  load average: 0.58, 0.23, 0.08

[root@oracle-12201-vagrant ~]# ls -l /tmp/*restart*
-rw-r--r--. 1 root   root     271 Feb 16 14:51 /tmp/host_restart_alert.out
-rw-r--r--. 1 oracle oinstall  71 Feb 16 14:51 /tmp/restart_oracle-12201-vagrant.log

[root@oracle-12201-vagrant ~]# cat /tmp/host_restart_alert.out
+ sleep 63
++ hostname
++ uptime -p
++ awk -Fup '{print $2}'
++ uptime -s
+ printf -v EMAILMESSAGE '%s was restarted %s ago at %s' oracle-12201-vagrant ' 1 minute' '2021-02-16 14:50:02'
+ echo oracle-12201-vagrant was restarted 1 minute ago at 2021-02-16 14:50:02
+ exit

[root@oracle-12201-vagrant ~]# cat /tmp/restart_oracle-12201-vagrant.log
oracle-12201-vagrant was restarted 1 minute ago at 2021-02-16 14:50:02
[root@oracle-12201-vagrant ~]#

The scripts were tested on Oracle Linux Server releases 7.8 and 7.9.

