Often, I am quite baffle with Oracle’s implementations and documentations.
RAC GoldenGate DBFS implementation has been a nightmare and here is one example DBFS Nightmare
I am about to show you another.
In general, I find any implementation using ACTION_SCRIPT is good in theory, bad in practice, but I digress.
Getting ready to shutdown CRS for system patching to find out CRS failed to shutdown.
# crsctl stop crs CRS-2675: Stop of 'dbfs_mount' on 'host02' failed CRS-2675: Stop of 'dbfs_mount' on 'host02' failed CRS-2673: Attempting to stop 'dbfs_mount' on 'host02' CRS-2675: Stop of 'dbfs_mount' on 'host02' failed CRS-2673: Attempting to stop 'dbfs_mount' on 'host02' CRS-2675: Stop of 'dbfs_mount' on 'host02' failed CRS-2799: Failed to shut down resource 'dbfs_mount' on 'host02' CRS-2799: Failed to shut down resource 'ora.GG_PROD.dg' on 'host02' CRS-2799: Failed to shut down resource 'ora.asm' on 'host02' CRS-2799: Failed to shut down resource 'ora.dbfs.db' on 'host02' CRS-2799: Failed to shut down resource 'ora.host02.ASM2.asm' on 'host02' CRS-2794: Shutdown of Cluster Ready Services-managed resources on 'host02' has failed CRS-2675: Stop of 'ora.crsd' on 'host02' failed CRS-2799: Failed to shut down resource 'ora.crsd' on 'host02' CRS-2795: Shutdown of Oracle High Availability Services-managed resources on 'host02' has failed CRS-4687: Shutdown command has completed with errors. CRS-4000: Command Stop failed, or completed with errors.
Check /var/log/messages to find errors and there’s no clue for resolution.
# grep -i dbfs /var/log/messages Apr 17 19:42:26 host02 DBFS_/ggdata: unmounting DBFS from /ggdata Apr 17 19:42:26 host02 DBFS_/ggdata: umounting the filesystem using '/bin/fusermount -u /ggdata' Apr 17 19:42:26 host02 DBFS_/ggdata: Stop - stopped, but still mounted, error Apr 17 20:45:59 host02 DBFS_/ggdata: mount-dbfs.sh mounting DBFS at /ggdata from database DBFS Apr 17 20:45:59 host02 DBFS_/ggdata: /ggdata already mounted, use mount-dbfs.sh stop before attempting to start Apr 17 21:01:29 host02 DBFS_/ggdata: unmounting DBFS from /ggdata Apr 17 21:01:29 host02 DBFS_/ggdata: umounting the filesystem using '/bin/fusermount -u /ggdata' Apr 17 21:01:29 host02 DBFS_/ggdata: Stop - stopped, but still mounted, error Apr 17 21:01:36 host02 dbfs_client[71957]: OCI_ERROR 3114 - ORA-03114: not connected to ORACLE Apr 17 21:01:41 host02 dbfs_client[71957]: /FS1/dirdat/ih000247982 Block error RC:-5 Apr 17 21:03:06 host02 DBFS_/ggdata: unmounting DBFS from /ggdata Apr 17 21:03:06 host02 DBFS_/ggdata: umounting the filesystem using '/bin/fusermount -u /ggdata' Apr 17 21:03:06 host02 DBFS_/ggdata: Stop - stopped, now not mounted Apr 17 21:09:19 host02 DBFS_/ggdata: filesystem /ggdata not currently mounted, no need to stop Apr 17 22:06:16 host02 DBFS_/ggdata: mount-dbfs.sh mounting DBFS at /ggdata from database DBFS Apr 17 22:06:17 host02 DBFS_/ggdata: ORACLE_SID is DBFS2 Apr 17 22:06:17 host02 DBFS_/ggdata: doing mount /ggdata using SID DBFS2 with wallet now Apr 17 22:06:18 host02 DBFS_/ggdata: Start -- ONLINE
The messages in log of script agent show below error (MOS documentation).
Anyone know where script agent is located at?
2019-04-17 20:56:02.793903 : AGFW:3274315520: {1:53477:37077} Agent received the message: AGENT_HB[Engine] ID 12293:16017523 2019-04-17 20:56:19.124667 :CLSDYNAM:3276416768: [dbfs_mount]{1:53477:37077} [check] Executing action script: /u02/app/12.1.0/grid/crs/script/mount-dbfs.sh[check] 2019-04-17 20:56:19.176927 :CLSDYNAM:3276416768: [dbfs_mount]{1:53477:37077} [check] Checking status now 2019-04-17 20:56:19.176973 :CLSDYNAM:3276416768: [dbfs_mount]{1:53477:37077} [check] Check -- ONLINE 2019-04-17 20:56:32.794287 : AGFW:3274315520: {1:53477:37077} Agent received the message: AGENT_HB[Engine] ID 12293:16017529 2019-04-17 20:56:43.312534 : AGFW:3274315520: {2:37893:29307} Agent received the message: RESOURCE_STOP[dbfs_mount host02 1] ID 4099:16017535 2019-04-17 20:56:43.312574 : AGFW:3274315520: {2:37893:29307} Preparing STOP command for: dbfs_mount host02 1 2019-04-17 20:56:43.312584 : AGFW:3274315520: {2:37893:29307} dbfs_mount host02 1 state changed from: ONLINE to: STOPPING 2019-04-17 20:56:43.313088 :CLSDYNAM:3276416768: [dbfs_mount]{2:37893:29307} [stop] Executing action script: /u02/app/12.1.0/grid/crs/script/mount-dbfs.sh[stop] 2019-04-17 20:56:43.365201 :CLSDYNAM:3276416768: [dbfs_mount]{2:37893:29307} [stop] unmounting DBFS from /ggdata 2019-04-17 20:56:43.415516 :CLSDYNAM:3276416768: [dbfs_mount]{2:37893:29307} [stop] umounting the filesystem using '/bin/fusermount -u /ggdata' 2019-04-17 20:56:43.415541 :CLSDYNAM:3276416768: [dbfs_mount]{2:37893:29307} [stop] /bin/fusermount: failed to unmount /ggdata: Device or resource busy 2019-04-17 20:56:43.415552 :CLSDYNAM:3276416768: [dbfs_mount]{2:37893:29307} [stop] Stop - stopped, but still mounted, error 2019-04-17 20:56:43.415611 : AGFW:3276416768: {2:37893:29307} Command: stop for resource: dbfs_mount host02 1 completed with status: FAIL 2019-04-17 20:56:43.415929 :CLSFRAME:3449863744: TM [MultiThread] is changing desired thread # to 3. Current # is 2 2019-04-17 20:56:43.415970 :CLSDYNAM:3276416768: [dbfs_mount]{2:37893:29307} [check] Executing action script: /u02/app/12.1.0/grid/crs/script/mount-dbfs.sh[check] 2019-04-17 20:56:43.416033 : AGFW:3274315520: {2:37893:29307} Agent sending reply for: RESOURCE_STOP[dbfs_mount host02 1] ID 4099:16017535 2019-04-17 20:56:43.467939 :CLSDYNAM:3276416768: [dbfs_mount]{2:37893:29307} [check] Checking status now 2019-04-17 20:56:43.467964 :CLSDYNAM:3276416768: [dbfs_mount]{2:37893:29307} [check] Check -- ONLINE
ACTION_SCRIPT can be found using crsctl as shown if you have not found where log of script agent is at.
oracle@host02 ~ $ $GRID_HOME/bin/crsctl stat res -w "TYPE = local_resource" -p | grep mount-dbfs.sh ACTION_SCRIPT=/u02/app/12.1.0/grid/crs/script/mount-dbfs.sh ACTION_SCRIPT=/u02/app/12.1.0/grid/crs/script/mount-dbfs.sh
Here’s a test case to resolve “failed to unmount /ggdata: Device or resource busy”.
First thought was to use fuser and kill the process.
# fuser -vmM /ggdata/ USER PID ACCESS COMMAND /ggdata: root kernel mount /ggdata mdinh 85368 ..c.. bash mdinh 86702 ..c.. vim #
Second thought, might not be a good idea and better idea is to let the script handle this if it can.
Let’s see what options are available for mount-dbfs.sh:
oracle@host02 ~ $ /u02/app/12.1.0/grid/crs/script/mount-dbfs.sh -h Usage: mount-dbfs.sh { start | stop | check | status | restart | clean | abort | version } oracle@host02 ~ $ /u02/app/12.1.0/grid/crs/script/mount-dbfs.sh version 20160215 oracle@host02 ~
Stop DBFS failed as expected.
oracle@host02 ~ $ /u02/app/12.1.0/grid/crs/script/mount-dbfs.sh status Checking status now Check -- ONLINE oracle@host02 ~ $ /u02/app/12.1.0/grid/crs/script/mount-dbfs.sh stop unmounting DBFS from /ggdata umounting the filesystem using '/bin/fusermount -u /ggdata' /bin/fusermount: failed to unmount /ggdata: Device or resource busy Stop - stopped, but still mounted, error oracle@host02 ~ $
Stop DBFS using clean option. Notice PID killed is 40047 and not the same as (mdinh 86702 ..c.. vim)
Note: not all output displayed for brevity.
oracle@host02 ~ $ /bin/bash -x /u02/app/12.1.0/grid/crs/script/mount-dbfs.sh clean + msg='cleaning up DBFS nicely using (fusermount -u|umount)' + '[' info = info ']' + /bin/echo cleaning up DBFS nicely using '(fusermount' '-u|umount)' cleaning up DBFS nicely using (fusermount -u|umount) + /bin/logger -t DBFS_/ggdata -p user.info 'cleaning up DBFS nicely using (fusermount -u|umount)' + '[' 1 -eq 1 ']' + /bin/fusermount -u /ggdata /bin/fusermount: failed to unmount /ggdata: Device or resource busy + /bin/sleep 1 + FORCE_CLEANUP=0 + '[' 0 -gt 1 ']' + /u02/app/12.1.0/grid/crs/script/mount-dbfs.sh status + '[' 0 -eq 0 ']' + FORCE_CLEANUP=1 + '[' 1 -eq 1 ']' + logit error 'tried (fusermount -u|umount), still mounted, now cleaning with (fusermount -u -z|umount -f) and kill' + type=error + msg='tried (fusermount -u|umount), still mounted, now cleaning with (fusermount -u -z|umount -f) and kill' + '[' error = info ']' + '[' error = error ']' + /bin/echo tried '(fusermount' '-u|umount),' still mounted, now cleaning with '(fusermount' -u '-z|umount' '-f)' and kill tried (fusermount -u|umount), still mounted, now cleaning with (fusermount -u -z|umount -f) and kill + /bin/logger -t DBFS_/ggdata -p user.error 'tried (fusermount -u|umount), still mounted, now cleaning with (fusermount -u -z|umount -f) and kill' + '[' 1 -eq 1 ']' ================================================================================ + /bin/fusermount -u -z /ggdata ================================================================================ + '[' 1 -eq 1 ']' ++ /bin/ps -ef ++ /bin/grep -w /ggdata ++ /bin/grep dbfs_client ++ /bin/grep -v grep ++ /bin/awk '{print $2}' + PIDS=40047 + '[' -n 40047 ']' ================================================================================ + /bin/kill -9 40047 ================================================================================ ++ /bin/ps -ef ++ /bin/grep -w /ggdata ++ /bin/grep mount.dbfs ++ /bin/grep -v grep ++ /bin/awk '{print $2}' + PIDS= + '[' -n '' ']' + exit 1 oracle@host02 ~ $ /u02/app/12.1.0/grid/crs/script/mount-dbfs.sh status Checking status now Check -- OFFLINE oracle@host02 ~ $ /u02/app/12.1.0/grid/crs/script/mount-dbfs.sh status Checking status now Check -- OFFLINE oracle@host02 ~ $ df -h|grep /ggdata /dev/asm/acfs_vol-177 299G 2.8G 297G 1% /ggdata1 dbfs-@DBFS:/ 60G 1.4G 59G 3% /ggdata oracle@host02 ~ $ /u02/app/12.1.0/grid/crs/script/mount-dbfs.sh status Checking status now Check -- ONLINE oracle@host02 ~ $
What PID was killed when running mount-dbfs.sh clean?
It’s for dbfs_client.
mdinh@host02 ~ $ ps -ef|grep dbfs oracle 34865 1 0 Mar29 ? 00:02:43 oracle+ASM1_asmb_dbfs1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq))) oracle 40047 1 0 Apr22 ? 00:00:10 /u01/app/oracle/product/12.1.0/db_1/bin/dbfs_client /@DBFS -o allow_other,direct_io,wallet /ggdata oracle 40081 1 0 Apr22 ? 00:00:27 oracle+ASM1_user40069_dbfs1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq))) mdinh 88748 87565 0 13:30 pts/1 00:00:00 grep --color=auto dbfs mdinh@host02 ~ $
It would have been so much better for mount-dbfs.sh to provide the info as part of the kill versus having user go debug script and trace process.
If you have read this far, then it’s only fair to provide log of script agent.
$ grep mount-dbfs.sh $ORACLE_BASE/diag/crs/$(hostname -s)/crs/trace/crsd_scriptagent_oracle.trc | grep "2019-04-17 20:5"