A production database running Oracle 11.2.0.4 RAC on Oracle Linux: after one node was rebooted, the clusterware on that node would not start.
[root@test1 ~]# su - grid
[grid@test1 ~]$ crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.
The healthy node reports the following:
[grid@test2 ~]$ crsctl stat res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ARCH.dg
               ONLINE  ONLINE       test2
ora.DATA.dg
               ONLINE  ONLINE       test2
ora.LISTENER.lsnr
               ONLINE  ONLINE       test2
ora.OCR.dg
               ONLINE  ONLINE       test2
ora.asm
               ONLINE  ONLINE       test2                    Started
ora.gsd
               OFFLINE OFFLINE      test2
ora.net1.network
               ONLINE  ONLINE       test2
ora.ons
               ONLINE  ONLINE       test2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       test2
ora.cvu
      1        ONLINE  ONLINE       test2
ora.dgdb1.vip
      1        ONLINE  INTERMEDIATE test2                    FAILED OVER
ora.dgdb2.vip
      1        ONLINE  ONLINE       test2
ora.oc4j
      1        ONLINE  ONLINE       test2
ora.test.db
      1        ONLINE  OFFLINE
      2        ONLINE  ONLINE       test2                    Open
ora.scan1.vip
      1        ONLINE  ONLINE       test2
[grid@test1 grid]$ crsctl status resource -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.
Check the CSS service status on the failed node; the connection attempt fails.
[grid@test1 grid]$ crsctl check css
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
Check the cssd process. It has not started: only cssdmonitor is running and there is no ocssd.bin.
[grid@test1 grid]$ ps -ef |grep cssd
root     22124     1  0 19:37 ?        00:00:00 /u01/app/11.2.0/grid/bin/cssdmonitor
grid     22496 15743  0 19:40 pts/3    00:00:00 grep cssd
[grid@test1 grid]$ crs_stat -p ora.cssd
CRS-0184: Cannot communicate with the CRS daemon.
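The `ps -ef | grep cssd` check above also matches cssdmonitor and the grep process itself, so it has to be read by eye. A minimal sketch of a stricter check follows. The function name `ocssd_running` is made up for this example; it only assumes that the CSS daemon shows up in the process list as `ocssd.bin`, as it does in 11.2 Grid Infrastructure.

```shell
#!/bin/sh
# ocssd_running: read a process listing on stdin and succeed only if it
# contains the CSS daemon binary (ocssd.bin). cssdmonitor does not match.
# The name of this helper is hypothetical; it is not an Oracle tool.
ocssd_running() {
    # The [o] bracket keeps a grep command line from matching itself.
    grep -q '[o]cssd\.bin'
}
```

One possible way to use it on the node is `ps -eo args | ocssd_running && echo "ocssd.bin is up" || echo "ocssd.bin is not running"`.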
Check ocssd.log:
[root@dgdb1 grid]# tail -f /u01/app/11.2.0/grid/log/test1/cssd/ocssd.log
2016-11-21 16:51:34.869: [ SKGFD][2561705728]Fetching asmlib disk :ORCL:OCR1:
2016-11-21 16:51:34.869: [ SKGFD][2561705728]Fetching asmlib disk :ORCL:OCR2:
2016-11-21 16:51:34.869: [ SKGFD][2561705728]Fetching asmlib disk :ORCL:OCR3:
2016-11-21 16:51:34.869: [ SKGFD][2561705728]Fetching asmlib disk :ORCL:TEST_ARCH1:
2016-11-21 16:51:34.869: [ SKGFD][2561705728]Fetching asmlib disk :ORCL:TEST_DATA1:
2016-11-21 16:51:34.870: [ SKGFD][2561705728]Fetching asmlib disk :ORCL:TEST_DATA2:
2016-11-21 16:51:34.870: [ SKGFD][2561705728]ERROR: -15(asmlib ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so op asm_open error Operation not permitted )
2016-11-21 16:51:34.870: [ SKGFD][2561705728]ERROR: -15(asmlib ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so op asm_open error Operation not permitted )
2016-11-21 16:51:34.870: [ SKGFD][2561705728]ERROR: -15(asmlib ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so op asm_open error Operation not permitted )
2016-11-21 16:51:34.870: [ SKGFD][2561705728]ERROR: -15(asmlib ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so op asm_open error Operation not permitted )
2016-11-21 16:51:34.870: [ SKGFD][2561705728]ERROR: -15(asmlib ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so op asm_open error Operation not permitted )
2016-11-21 16:51:34.870: [ SKGFD][2561705728]ERROR: -15(asmlib ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so op asm_open error Operation not permitted
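In the log excerpt above, six disks are fetched and six asm_open failures follow. To confirm that the two counts match without scrolling through the file, something like the following rough sketch could be used. Both function names are invented for this example, and the patterns assume the log lines look exactly like the ones shown here.

```shell
#!/bin/sh
# list_fetched_disks: read ocssd.log text on stdin and print the ASMLib
# disk label from every "Fetching asmlib disk :ORCL:<label>:" line.
list_fetched_disks() {
    sed -n 's/.*Fetching asmlib disk :ORCL:\([^:]*\):.*/\1/p'
}

# count_asm_open_errors: read ocssd.log text on stdin and print how many
# lines report an asm_open failure. grep -c exits non-zero when the count
# is 0, so "|| true" keeps the function usable under "set -e".
count_asm_open_errors() {
    grep -c 'asm_open error' || true
}
```

For example, `list_fetched_disks < ocssd.log | sort -u` prints the disk labels CSS tried to open, and `count_asm_open_errors < ocssd.log` prints the number of failures.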
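The errors above show that ASMLib failed in asm_open with "Operation not permitted": it was not allowed to open the disks. The fix is to tell ASMLib which disks to scan and which to skip during discovery. Our environment uses Multipath across multiple disks, so the scan has to include devices whose names start with dm and skip those whose names start with sd. For the same reason, this problem should only occur on disks managed by Multipath. Edit /etc/sysconfig/oracleasm: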
[root@test bin]# vi /etc/sysconfig/oracleasm
#
# This is a configuration file for automatic loading of the Oracle
# Automatic Storage Management library kernel driver.  It is generated
# By running /etc/init.d/oracleasm configure.  Please use that method
# to modify this file
#

# ORACLEASM_ENABLED: 'true' means to load the driver on boot.
ORACLEASM_ENABLED=true

# ORACLEASM_UID: Default user owning the /dev/oracleasm mount point.
ORACLEASM_UID=grid

# ORACLEASM_GID: Default group owning the /dev/oracleasm mount point.
ORACLEASM_GID=asmadmin

# ORACLEASM_SCANBOOT: 'true' means scan for ASM disks on boot.
ORACLEASM_SCANBOOT=true

# ORACLEASM_SCANORDER: Matching patterns to order disk scanning
ORACLEASM_SCANORDER="dm"    # name pattern of the disks to scan

# ORACLEASM_SCANEXCLUDE: Matching patterns to exclude disks from scan
ORACLEASM_SCANEXCLUDE="sd"  # name pattern of the disks to exclude from the scan

# ORACLEASM_USE_LOGICAL_BLOCK_SIZE: 'true' means use the logical block size
# reported by the underlying disk instead of the physical. The default
# is 'false'
ORACLEASM_USE_LOGICAL_BLOCK_SIZE=false
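Because a reboot is exactly when a wrong scan setting bites, it can be worth checking the file on every node before the next restart. The following is only a sketch: the two function names are invented, and the check is hard-wired to the values used in this article (dm in ORACLEASM_SCANORDER, sd in ORACLEASM_SCANEXCLUDE).

```shell
#!/bin/sh
# get_oracleasm_setting FILE NAME: print the value assigned to NAME in an
# oracleasm sysconfig file, without quotes or a trailing "# comment".
# If NAME is assigned more than once, the last assignment wins.
get_oracleasm_setting() {
    sed -n "s/^$2=//p" "$1" | tail -n 1 | sed 's/[[:space:]]*#.*$//' | tr -d '"'
}

# check_multipath_scan_settings FILE: succeed only if the scan order lists
# "dm" and the exclude list contains "sd", the combination used in this
# article for Multipath disks. Hypothetical helper, not part of oracleasm.
check_multipath_scan_settings() {
    order=$(get_oracleasm_setting "$1" ORACLEASM_SCANORDER)
    exclude=$(get_oracleasm_setting "$1" ORACLEASM_SCANEXCLUDE)
    case " $order " in *" dm "*) ;; *) return 1 ;; esac
    case " $exclude " in *" sd "*) ;; *) return 1 ;; esac
    return 0
}
```

On a node this could be run as `check_multipath_scan_settings /etc/sysconfig/oracleasm || echo "scan settings need fixing"`.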
Unload and reload the ASMLib driver so that the new settings take effect:
[root@test1 bin]# oracleasm exit
Unmounting ASMlib driver filesystem: /dev/oracleasm
Unloading module "oracleasm": oracleasm
[root@test1 bin]# oracleasm init
Loading module "oracleasm": oracleasm
Configuring "oracleasm" to use device physical block size
Mounting ASMlib driver filesystem: /dev/oracleasm
Scan the disks:
[root@test1 ~]# /etc/init.d/oracleasm scandisks
Scanning the system for Oracle ASMLib disks: [ OK ]
[root@test1 ~]# oracleasm listdisks
OCR1
OCR2
OCR3
TEST_ARCH1
TEST_DATA1
TEST_DATA2
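The listdisks output above should contain every label that ocssd.log tried to fetch. A small sketch of that comparison follows; `missing_disks` is a name made up for this example, and it assumes listdisks prints one label per line.

```shell
#!/bin/sh
# missing_disks LABEL...: read "oracleasm listdisks" output on stdin and
# print each expected label that is NOT present. Prints nothing when all
# expected disks were found. Hypothetical helper, not part of oracleasm.
missing_disks() {
    listed=$(cat)
    for disk in "$@"; do
        # -x: whole-line match, -F: treat the label as a fixed string
        printf '%s\n' "$listed" | grep -qxF -- "$disk" || printf '%s\n' "$disk"
    done
}
```

For the disks in this article the call would be `oracleasm listdisks | missing_disks OCR1 OCR2 OCR3 TEST_ARCH1 TEST_DATA1 TEST_DATA2`, and empty output means it is reasonable to go on and restart CRS.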
Stop CRS:
[root@test bin]# ./crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'dgdb1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'dgdb1'
CRS-2673: Attempting to stop 'ora.crf' on 'dgdb1'
CRS-2677: Stop of 'ora.mdnsd' on 'dgdb1' succeeded
CRS-2677: Stop of 'ora.crf' on 'dgdb1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'dgdb1'
CRS-2677: Stop of 'ora.gipcd' on 'dgdb1' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'dgdb1'
CRS-2677: Stop of 'ora.gpnpd' on 'dgdb1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'dgdb1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
Start CRS:
[root@test1 bin]# ./crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
[grid@test1 ~]$ crsctl stat res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ARCH.dg
               ONLINE  ONLINE       test1
               ONLINE  ONLINE       test2
ora.DATA.dg
               ONLINE  ONLINE       test1
               ONLINE  ONLINE       test2
ora.LISTENER.lsnr
               ONLINE  ONLINE       test1
               ONLINE  ONLINE       test2
ora.OCR.dg
               ONLINE  ONLINE       test1
               ONLINE  ONLINE       test2
ora.asm
               ONLINE  ONLINE       test1                    Started
               ONLINE  ONLINE       test2                    Started
ora.gsd
               OFFLINE OFFLINE      test1
               OFFLINE OFFLINE      test2
ora.net1.network
               ONLINE  ONLINE       test1
               ONLINE  ONLINE       test2
ora.ons
               ONLINE  ONLINE       test1
               ONLINE  ONLINE       test2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       test2
ora.cvu
      1        ONLINE  ONLINE       test2
ora.test1.vip
      1        ONLINE  ONLINE       test1
ora.test2.vip
      1        ONLINE  ONLINE       test2
ora.oc4j
      1        ONLINE  ONLINE       test2
ora.test.db
      1        ONLINE  ONLINE       test1                    Open
      2        ONLINE  ONLINE       test2                    Open
ora.scan1.vip
      1        ONLINE  ONLINE       test2
At this point all services on the node have started normally.
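Reading a long `crsctl stat res -t` table by eye is error-prone, so the final check can be narrowed to the resources whose target is ONLINE but whose state is something else. This is a sketch only: `not_online_resources` is an invented name, and the parsing assumes the 11.2 tabular layout shown in this article, where each resource name is on its own line starting with `ora.`.

```shell
#!/bin/sh
# not_online_resources: read "crsctl stat res -t" output on stdin and print
# "<resource> <state>" for every row whose TARGET is ONLINE while its STATE
# is not ONLINE. Rows whose target is OFFLINE (such as ora.gsd) are skipped.
not_online_resources() {
    awk '
        /^ora\./ { name = $1; next }          # remember the resource name
        {
            # Cluster resource rows start with an instance number.
            i = ($1 ~ /^[0-9]+$/) ? 2 : 1
            if ($i == "ONLINE" && NF > i && $(i + 1) != "ONLINE")
                print name, $(i + 1)
        }
    '
}
```

Running `crsctl stat res -t | not_online_resources` against the output taken from the healthy node before the fix would list ora.dgdb1.vip as INTERMEDIATE and ora.test.db as OFFLINE; after the fix it should print nothing.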