Channel: Thinking Out Loud

Linux Locking using flock


I need a locking mechanism for RMAN backups.

The script dbf.sh backs up a database and can run simultaneously for different databases, but not twice for the same database.

Hence running dbf.sh for PROD1 & PROD2 at the same time is VALID, and running it for PROD1 & PROD1 at the same time is NOT VALID.

While dbf.sh is running, arc.sh (backup archivelog) should not be running.

This was prompted by Laurent Schneider's post "Can you restore from a full online backup?"
http://laurentschneider.com/wordpress/2015/05/can-you-restore-from-a-full-online-backup.html

First test: dbf.sh is running, so arc.sh should not run.

The key is to base the lock on the database SID (for example PROD1) and not on the script.
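
As a rough illustration of why the lock file is named after the SID, here is a small sketch that can be run outside cron. It assumes the util-linux flock command is installed; the temporary directory and the use of sleep as a stand-in for dbf.sh are mine, not part of the real setup. It uses -n (fail instead of waiting) so the result is visible immediately.

```shell
# Sketch: one lock file per SID, shared by every script that touches that SID.
LOCKDIR=$(mktemp -d)                 # illustrative location, stands in for /tmp

# Stand-in for "dbf.sh PROD1": hold the PROD1 lock for 3 seconds in the background.
flock "$LOCKDIR/PROD1" sleep 3 &
sleep 1                              # give the background job time to take the lock

# Same SID: the lock is held, so a non-blocking attempt fails.
if flock -n "$LOCKDIR/PROD1" true; then same_sid=acquired; else same_sid=busy; fi

# Different SID: a different lock file, so this succeeds at once.
if flock -n "$LOCKDIR/PROD2" true; then other_sid=acquired; else other_sid=busy; fi

echo "PROD1: $same_sid, PROD2: $other_sid"

wait                                 # let the background holder finish
rm -rf "$LOCKDIR"
```

Because dbf.sh and arc.sh for the same SID both lock the same file, they exclude each other, while work for another SID is unaffected.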

$ crontab -l

#22 20 * * * /media/sf_working/sh/ogg_lag_sec.sh hawklas 0 > /tmp/ogg_lag_sec.sh.log 2>&1
#04 11 * * 2 [ $(date +\%d) -ge 07 ] && /home/oracle/t.sh > /tmp/t.log
* * * * * [ $(date +\%d) -ge 07 ] && /usr/bin/flock -w 3600 /tmp/PROD1 /home/oracle/dbf.sh PROD1 >> /tmp/dbfPROD1.log 2>&1
* * * * * [ $(date +\%d) -ge 07 ] && /usr/bin/flock -w 3600 /tmp/PROD2 /home/oracle/dbf.sh PROD2 >> /tmp/dbfPROD2.log 2>&1
* * * * * [ $(date +\%d) -ge 07 ] && /usr/bin/flock -w 0 /tmp/PROD1 /home/oracle/arc.sh PROD1 >> /tmp/arcPROD1.log 2>&1
* * * * * [ $(date +\%d) -ge 07 ] && /usr/bin/flock -w 0 /tmp/PROD2 /home/oracle/arc.sh PROD2 >> /tmp/arcPROD2.log 2>&1

It looks like dbf.sh ran and arc.sh did not: the arc logs are empty.

$ ls -alrt /tmp/*PROD*

-rw-r--r--. 1 oracle oinstall  0 Jun 11 20:22 /tmp/PROD1
-rw-r--r--. 1 oracle oinstall  0 Jun 11 20:22 /tmp/PROD2
-rw-r--r--. 1 oracle oinstall  0 Jun 11 20:22 /tmp/arcPROD1.log
-rw-r--r--. 1 oracle oinstall 77 Jun 11 20:22 /tmp/dbfPROD1.log
-rw-r--r--. 1 oracle oinstall  0 Jun 11 20:22 /tmp/arcPROD2.log
-rw-r--r--. 1 oracle oinstall 77 Jun 11 20:22 /tmp/dbfPROD2.log

$ cat /tmp/dbfPROD1.log

Starting /home/oracle/dbf.sh PROD1 Thu Jun 11 20:22:01 PDT 2015
Sleeping 119

$ cat /tmp/dbfPROD2.log

Starting /home/oracle/dbf.sh PROD2 Thu Jun 11 20:22:01 PDT 2015
Sleeping 119

I continued to monitor the process, and arc.sh never ran because dbf.sh was always running.
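
The arc logs stay empty because flock itself prints nothing when it fails to get the lock. If a record of skipped runs is wanted, one possible approach (my own addition, not something the crontab above does) is to log a line when flock exits non-zero. This sketch uses temporary files and -n, which fails at once if the lock is held; the message text is made up.

```shell
# Sketch: record a skipped run when the lock is already held.
LOCK=$(mktemp)                       # stand-in for /tmp/PROD1
LOG=$(mktemp)                        # stand-in for /tmp/arcPROD1.log

flock "$LOCK" sleep 3 &              # stand-in for a running dbf.sh PROD1
sleep 1                              # give the background job time to take the lock

# flock exits non-zero without running the command, so the || branch logs the skip.
flock -n "$LOCK" echo "Starting arc.sh PROD1" >> "$LOG" 2>&1 \
  || echo "Skipped arc.sh PROD1: lock is held" >> "$LOG"

result=$(cat "$LOG")
echo "$result"

wait                                 # let the background holder finish
rm -f "$LOCK" "$LOG"
```

Note that the || branch also fires if the command itself fails, so in a real job the two cases would need to be told apart.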

$ ls -alrt /tmp/*PROD*

-rw-r--r--. 1 oracle oinstall   0 Jun 11 20:22 /tmp/PROD1
-rw-r--r--. 1 oracle oinstall   0 Jun 11 20:22 /tmp/PROD2
-rw-r--r--. 1 oracle oinstall   0 Jun 11 20:22 /tmp/arcPROD1.log
-rw-r--r--. 1 oracle oinstall   0 Jun 11 20:22 /tmp/arcPROD2.log
-rw-r--r--. 1 oracle oinstall 231 Jun 11 20:25 /tmp/dbfPROD2.log
-rw-r--r--. 1 oracle oinstall 231 Jun 11 20:25 /tmp/dbfPROD1.log

$ cat /tmp/dbfPROD1.log

Starting /home/oracle/dbf.sh PROD1 Thu Jun 11 20:22:01 PDT 2015
Sleeping 119
Starting /home/oracle/dbf.sh PROD1 Thu Jun 11 20:24:00 PDT 2015
Sleeping 119
Starting /home/oracle/dbf.sh PROD1 Thu Jun 11 20:25:59 PDT 2015
Sleeping 119

$ cat /tmp/dbfPROD2.log

Starting /home/oracle/dbf.sh PROD2 Thu Jun 11 20:22:01 PDT 2015
Sleeping 119
Starting /home/oracle/dbf.sh PROD2 Thu Jun 11 20:24:00 PDT 2015
Sleeping 119
Starting /home/oracle/dbf.sh PROD2 Thu Jun 11 20:25:59 PDT 2015
Sleeping 119

Looking good so far. But what happens when arc.sh is currently running and then dbf.sh is started?

It would be a shame to have the dbf.sh backup die because arc.sh is running.

$ crontab -l

#22 20 * * * /media/sf_working/sh/ogg_lag_sec.sh hawklas 0 > /tmp/ogg_lag_sec.sh.log 2>&1
#04 11 * * 2 [ $(date +\%d) -ge 07 ] && /home/oracle/t.sh > /tmp/t.log
43 20 * * * [ $(date +\%d) -ge 07 ] && /usr/bin/flock -w 0 /tmp/PROD1 /home/oracle/arc.sh PROD1 >> /tmp/arcPROD1.log 2>&1
43 20 * * * [ $(date +\%d) -ge 07 ] && /usr/bin/flock -w 0 /tmp/PROD2 /home/oracle/arc.sh PROD2 >> /tmp/arcPROD2.log 2>&1
44 20 * * * [ $(date +\%d) -ge 07 ] && /usr/bin/flock -w 3600 /tmp/PROD1 /home/oracle/dbf.sh PROD1 >> /tmp/dbfPROD1.log 2>&1
44 20 * * * [ $(date +\%d) -ge 07 ] && /usr/bin/flock -w 3600 /tmp/PROD2 /home/oracle/dbf.sh PROD2 >> /tmp/dbfPROD2.log 2>&1

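Here /usr/bin/flock -w 3600 means flock waits up to 3600s for the lock; if it still cannot get the lock by then, it gives up and dbf.sh is not started. With -w 0, as used for arc.sh, it gives up immediately when the lock is held.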
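
To see the timeout behaviour in isolation, here is a small sketch with much shorter times. It assumes util-linux flock, whose default exit status is 1 when the -w timeout expires; the temporary lock file and the sleep stand-in for arc.sh are illustrative only.

```shell
# Sketch: -w N waits at most N seconds for the lock before giving up.
LOCK=$(mktemp)                       # stand-in for /tmp/PROD1

flock "$LOCK" sleep 4 &              # stand-in for arc.sh holding the lock for 4 s
sleep 1                              # give the background job time to take the lock

# Timeout shorter than the remaining hold time: flock gives up, command never runs.
short_rc=0
flock -w 1 "$LOCK" echo "ran with -w 1" || short_rc=$?

# Timeout longer than the remaining hold time: flock waits, then runs the command.
long_rc=0
flock -w 10 "$LOCK" echo "ran with -w 10" || long_rc=$?

echo "short timeout exit: $short_rc, long timeout exit: $long_rc"

wait                                 # the holder has already finished by now
rm -f "$LOCK"
```

The 3600s in the crontab plays the role of the long timeout: it only has to exceed the longest time arc.sh is expected to hold the lock.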

Let’s test this.

$ date

Thu Jun 11 20:43:06 PDT 2015

$ ls -alrt /tmp/*PROD*

-rw-r--r--. 1 oracle oinstall  0 Jun 11 20:43 /tmp/PROD2
-rw-r--r--. 1 oracle oinstall  0 Jun 11 20:43 /tmp/PROD1
-rw-r--r--. 1 oracle oinstall 78 Jun 11 20:43 /tmp/arcPROD2.log
-rw-r--r--. 1 oracle oinstall 78 Jun 11 20:43 /tmp/arcPROD1.log

$ cat /tmp/arcPROD1.log

Starting /home/oracle/arc.sh PROD1 Thu Jun 11 20:43:01 PDT 2015
Sleeping 135s

$ cat /tmp/arcPROD2.log

Starting /home/oracle/arc.sh PROD2 Thu Jun 11 20:43:01 PDT 2015
Sleeping 135s

arc.sh started at 20:43:01 and sleeps for 135s, while dbf.sh is scheduled to run at 20:44.

20:43:01 + 135s takes us to 20:45:16, which is well after the scheduled time for dbf.sh at 20:44.
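
The expected finish time can be checked with plain shell arithmetic, using the exact start time from the arc.sh log (20:43:01) rather than the scheduled minute. The variable names are just for illustration.

```shell
# Start time 20:43:01 expressed as seconds since midnight.
start=$(( 20*3600 + 43*60 + 1 ))

# arc.sh sleeps for 135 seconds.
end=$(( start + 135 ))

# Convert back to HH:MM:SS.
finish=$(printf '%02d:%02d:%02d' $(( end / 3600 )) $(( end % 3600 / 60 )) $(( end % 60 )))
echo "$finish"    # prints 20:45:16
```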

Let’s see if this works.

$ date

Thu Jun 11 20:45:33 PDT 2015

$ ls -alrt /tmp/*PROD*

-rw-r--r--. 1 oracle oinstall  0 Jun 11 20:43 /tmp/PROD2
-rw-r--r--. 1 oracle oinstall  0 Jun 11 20:43 /tmp/PROD1
-rw-r--r--. 1 oracle oinstall 78 Jun 11 20:43 /tmp/arcPROD2.log
-rw-r--r--. 1 oracle oinstall 78 Jun 11 20:43 /tmp/arcPROD1.log
-rw-r--r--. 1 oracle oinstall 75 Jun 11 20:45 /tmp/dbfPROD1.log
-rw-r--r--. 1 oracle oinstall 75 Jun 11 20:45 /tmp/dbfPROD2.log

$ cat /tmp/dbfPROD1.log

Starting /home/oracle/dbf.sh PROD1 Thu Jun 11 20:45:16 PDT 2015
Sleeping 1

$ cat /tmp/dbfPROD2.log

Starting /home/oracle/dbf.sh PROD2 Thu Jun 11 20:45:16 PDT 2015
Sleeping 1

dbf.sh started at 20:45:16, the moment arc.sh (started at 20:43:01, sleeping 135s) completed and released the lock.

These are the simple scripts I used for testing; you will need to modify the sleep time accordingly for each test case.

$ cat dbf.sh

#!/bin/sh
# Test stand-in for the database backup: log a start line, then sleep.
echo "Starting $0 $* $(date)"
echo "Sleeping 1"
sleep 1

$ cat arc.sh

#!/bin/sh
# Test stand-in for the archivelog backup: log a start line, then sleep.
echo "Starting $0 $* $(date)"
echo "Sleeping 135s"
sleep 135

And there you have it.

Good Night.

Reference: https://ma.ttias.be/prevent-cronjobs-from-overlapping-in-linux/


