oracle – 第5页 – 但行好事莫问前程

私有IP丢失造成Oracle 12C RAC集群节点不能启动

Posted on 2020年2月20日 by YongJing(敬勇)

某生产环境Oracle Linux 7.1,Oracle 12C RAC显示节点1的集群资源没有启动，信息如下:

[grid@cs2 ~]$ crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr
               ONLINE  ONLINE       cs2                      STABLE
ora.CRS.dg
               ONLINE  ONLINE       cs2                      STABLE
ora.DATA.dg
               ONLINE  ONLINE       cs2                      STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       cs2                      STABLE
ora.chad
               ONLINE  OFFLINE      cs2                      STABLE
ora.net1.network
               ONLINE  ONLINE       cs2                      STABLE
ora.ons
               ONLINE  OFFLINE      cs2                      STABLE
ora.proxy_advm
               OFFLINE OFFLINE      cs2                      STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       cs2                      STABLE
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       cs2                      STABLE
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       cs2                      STABLE
ora.MGMTLSNR
      1        ONLINE  ONLINE       cs2                      169.254.255.92 88.88
                                                             .88.191,STABLE
ora.asm
      1        ONLINE  OFFLINE                               STABLE
      2        ONLINE  ONLINE       cs2                      Started,STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.cs.db
      1        ONLINE  OFFLINE                               STABLE
      2        ONLINE  OFFLINE                               Instance Shutdown,ST
                                                             ABLE
ora.cs1.vip
      1        ONLINE  INTERMEDIATE cs2                      FAILED OVER,STABLE
ora.cs2.vip
      1        ONLINE  ONLINE       cs2                      STABLE
ora.cvu
      1        ONLINE  ONLINE       cs2                      STABLE
ora.gns
      1        ONLINE  OFFLINE                               STABLE
ora.gns.vip
      1        ONLINE  OFFLINE                               STABLE
ora.mgmtdb
      1        ONLINE  OFFLINE                               Instance Shutdown,ST
                                                             ABLE
ora.qosmserver
      1        ONLINE  ONLINE       cs2                      STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       cs2                      STABLE
ora.scan2.vip
      1        ONLINE  ONLINE       cs2                      STABLE
ora.scan3.vip
      1        ONLINE  ONLINE       cs2                      STABLE
--------------------------------------------------------------------------------

如果查看节点1上的alert.log文件，信息如下:

[root@cs1 soft]# tail -100 /u01/app/grid/diag/crs/cs1/crs/trace/alert.log

2020-01-14 20:45:50.226 [CVUD(2366CRS-10051: CVU found following errors with Clusterware setup : PRVE-3191 :
(DESCRIPTION = (LOAD_BALANCE=on)  (ADDRESS = (PROTOCOL = TCP)(HOST = cs-cluster-scan.cs-cluster.jy.net)(PORT = 1521)) (CONNECT_DATA =(SERVER = DEDICATED)(SERVICE_NAME = cs)))
Listener refused the connection with the following error:
ORA-12514, TNS:listener does not currently know of service requested in connect descriptor
 

2020-02-17 14:49:28.563 [ORAROOTAGENT(15500)]CRS-5822: Agent '/u01/app/product/12.2.0/crs/bin/orarootagent_root' disconnected from server. Details at (:CRSAGF00117:) {0:5:63832} in /u01/app/grid/diag/crs/cs1/crs/trace/crsd_orarootagent_root.trc.
2020-02-17 14:49:28.584 [ORAAGENT(15466)]CRS-5822: Agent '/u01/app/product/12.2.0/crs/bin/oraagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:1:36} in /u01/app/grid/diag/crs/cs1/crs/trace/crsd_oraagent_grid.trc.
2020-02-17 14:49:29.607 [GPNPD(3293)]CRS-2329: GPNPD on node cs1 shut down. 
2020-02-17 14:49:30.123 [OSYSMOND(6077)]CRS-8504: Oracle Clusterware OSYSMOND process with operating system process ID 6077 is exiting
2020-02-17 14:49:31.377 [MDNSD(3254)]CRS-5602: mDNS service stopping by request.
2020-02-17 14:49:31.747 [MDNSD(3254)]CRS-8504: Oracle Clusterware MDNSD process with operating system process ID 3254 is exiting
2020-02-17 14:49:46.650 [OCTSSD(5883)]CRS-2405: The Cluster Time Synchronization Service on host cs1 is shutdown by user
2020-02-17 14:49:46.651 [OCTSSD(5883)]CRS-8504: Oracle Clusterware OCTSSD process with operating system process ID 5883 is exiting
2020-02-17 14:49:47.651 [OCSSD(3616)]CRS-1603: CSSD on node cs1 has been shut down.
2020-02-17 14:49:47.958 [OCSSD(3616)]CRS-1660: The CSS daemon shutdown has completed
2020-02-17 14:49:47.959 [OCSSD(3616)]CRS-8504: Oracle Clusterware OCSSD process with operating system process ID 3616 is exiting
2020-02-17 14:49:49.726 [ORAAGENT(3235)]CRS-5822: Agent '/u01/app/product/12.2.0/crs/bin/oraagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:9:818} in /u01/app/grid/diag/crs/cs1/crs/trace/ohasd_oraagent_grid.trc.
2020-02-17 14:49:49.727 [ORAROOTAGENT(3029)]CRS-5822: Agent '/u01/app/product/12.2.0/crs/bin/orarootagent_root' disconnected from server. Details at (:CRSAGF00117:) {0:1:21} in /u01/app/grid/diag/crs/cs1/crs/trace/ohasd_orarootagent_root.trc.

根据上面的错误信息提示查看crsd_oraagent_grid.trc跟踪文件:

[root@cs1 ~]# more /u01/app/grid/diag/crs/cs1/crs/trace/crsd_oraagent_grid.trc
Trace file /u01/app/grid/diag/crs/cs1/crs/trace/crsd_oraagent_grid.trc
Oracle Database 12c Clusterware Release 12.2.0.1.0 - Production Copyright 1996, 2016 Oracle. All rights reserved.

*** TRACE CONTINUED FROM FILE /u01/app/grid/diag/crs/cs1/crs/trace/crsd_oraagent_grid_910.trc ***

2020-02-17 14:39:22.129 :CLSDYNAM:2462013184: [ora.ASMNET1LSNR_ASM.lsnr]{0:1:18} [check] LsnrAgent::regEndpoint 110 reset regEndPointDone:0
2020-02-17 14:39:22.129 :CLSDYNAM:2462013184: [ora.ASMNET1LSNR_ASM.lsnr]{0:1:18} [check] generateEndPointStrings:generate endpoints
2020-02-17 14:39:22.129 :CLSDYNAM:2462013184: [ora.ASMNET1LSNR_ASM.lsnr]{0:1:18} [check] VendorType=0
2020-02-17 14:39:22.129 :CLSDYNAM:2462013184: [ora.ASMNET1LSNR_ASM.lsnr]{0:1:18} [check] VendorType=0
2020-02-17 14:39:22.129 :CLSDYNAM:2462013184: [ora.ASMNET1LSNR_ASM.lsnr]{0:1:18} [check] VendorType=0
2020-02-17 14:39:22.129 :CLSDYNAM:2462013184: [ora.ASMNET1LSNR_ASM.lsnr]{0:1:18} [check] VendorType=0
2020-02-17 14:39:22.129 :CLSDYNAM:2462013184: [ora.ASMNET1LSNR_ASM.lsnr]{0:1:18} [check] generateEndPointStrings:No IP matching the subnet 88.88.88.0
2020-02-17 14:39:22.130 :CLSDYNAM:2462013184: [ora.ASMNET1LSNR_ASM.lsnr]{0:1:18} [check] Ocr Context init default level 1814629536
2020-02-17 14:39:22.133 :CLSDYNAM:2462013184: [ora.ASMNET1LSNR_ASM.lsnr]{0:1:18} [check] Agent::getDeploymentPlatformId return 
2020-02-17 14:39:22.133 :CLSDYNAM:2462013184: [ora.ASMNET1LSNR_ASM.lsnr]{0:1:18} [check] LsnrAgent::generateEndPoints 000 entry {nonPriv
2020-02-17 14:39:22.133 :CLSDYNAM:2462013184: [ora.ASMNET1LSNR_ASM.lsnr]{0:1:18} [check] LsnrAgent::generateEndPoints 000 entry { lsnrResName: endpAttr:TCP:1526 type:4
2020-02-17 14:39:22.133 :CLSDYNAM:2462013184: [ora.ASMNET1LSNR_ASM.lsnr]{0:1:18} [check] LsnrAgent::generateEndPoints 040 Listener ResName:
2020-02-17 14:39:22.133 :CLSDYNAM:2462013184: [ora.ASMNET1LSNR_ASM.lsnr]{0:1:18} [check] LsnrAgent::removeDuplicates 
2020-02-17 14:39:22.133 :CLSDYNAM:2462013184: [ora.ASMNET1LSNR_ASM.lsnr]{0:1:18} [check] LsnrAgent::generateEndPoints 999 exit nonPriv}
2020-02-17 14:39:22.133 :CLSDYNAM:2462013184: [ora.ASMNET1LSNR_ASM.lsnr]{0:1:18} [check] LsnrAgent::regEndpoint 200 endpStrings empty
2020-02-17 14:39:22.134 :CLSDYNAM:2462013184: [ora.ASMNET1LSNR_ASM.lsnr]{0:1:18} [check] AsmLsnrAgent::init
2020-02-17 14:39:22.154 :CLSDYNAM:2462013184: [ora.ASMNET1LSNR_ASM.lsnr]{0:1:18} [check] VendorType=0
2020-02-17 14:39:22.154 :CLSDYNAM:2462013184: [ora.ASMNET1LSNR_ASM.lsnr]{0:1:18} [check] VendorType=0
2020-02-17 14:39:22.154 :CLSDYNAM:2462013184: [ora.ASMNET1LSNR_ASM.lsnr]{0:1:18} [check] VendorType=0
2020-02-17 14:39:22.154 :CLSDYNAM:2462013184: [ora.ASMNET1LSNR_ASM.lsnr]{0:1:18} [check] VendorType=0
2020-02-17 14:39:22.154 :CLSDYNAM:2462013184: [ora.ASMNET1LSNR_ASM.lsnr]{0:1:18} [check] Cannot find an IP address matching the subnet 88.88.88.0
2020-02-17 14:39:22.154 :CLSDYNAM:2462013184: [ora.ASMNET1LSNR_ASM.lsnr]{0:1:18} [check] Ocr Context init default level 1814629536
2020-02-17 14:39:22.157 :CLSDYNAM:2462013184: [ora.ASMNET1LSNR_ASM.lsnr]{0:1:18} [check] Agent::getDeploymentPlatformId return 
2020-02-17 14:39:22.158 : CLSCEVT:2462013184: (:CLSCE0028:)clsce_unsubscribe 0x7f725c004510 successfully unsubscribed : 974833
2020-02-17 14:39:22.158 : USRTHRD:2462013184: {0:1:18} WorkerThread::removeWorker total workers: 2
2020-02-17 14:39:22.158 : USRTHRD:2462013184: {0:1:18} Worker::schedule adding worker 'ReconnSub-LsnrAgentSub-ASMNET1LSNR_ASM' to thread 'ReconnectingSubscribers'
2020-02-17 14:39:22.158 : USRTHRD:2462013184: {0:1:18} WorkerThread::addWorker total workers: 3
2020-02-17 14:39:22.158 :CLSDYNAM:2462013184: [ora.ASMNET1LSNR_ASM.lsnr]{0:1:18} [check] Skipping CSS Initialization for rebootless recovery
2020-02-17 14:39:22.159 : CLSCEVT:2478823168: clsce_subscribe 0x7f72600d06b0 filter='(^CRS_RESOURCE_PROFILE_CHANGE.*RESOURCE_CLASS='(scan_vip|vip)')|(^CRS_RESOURCE_PROFILE_CHANGE.*TYPE='ora\.network\.type')', flags=1, handler=0xc6ad0c, 
arg=0x7f726016de10

从上面的错误信息中可以找到Cannot find an IP address matching the subnet 88.88.88.0，这说明私有IP丢失了。

如是查看网络IP地址
节点2私有IP正常

[grid@cs2 ~]$ ifconfig -a
ens160: flags=4163  mtu 1500
        inet 10.13.13.191  netmask 255.255.255.0  broadcast 10.13.13.255
        inet6 fe80::250:56ff:fea0:92af  prefixlen 64  scopeid 0x20
        ether 00:50:56:a0:92:af  txqueuelen 1000  (Ethernet)
        RX packets 127560233  bytes 10186683662 (9.4 GiB)
        RX errors 0  dropped 353  overruns 0  frame 0
        TX packets 18954431  bytes 1554204413 (1.4 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens160:1: flags=4163  mtu 1500
        inet 10.13.13.130  netmask 255.255.255.0  broadcast 10.13.13.255
        ether 00:50:56:a0:92:af  txqueuelen 1000  (Ethernet)

ens160:2: flags=4163  mtu 1500
        inet 10.13.13.147  netmask 255.255.255.0  broadcast 10.13.13.255
        ether 00:50:56:a0:92:af  txqueuelen 1000  (Ethernet)

ens160:3: flags=4163  mtu 1500
        inet 10.13.13.141  netmask 255.255.255.0  broadcast 10.13.13.255
        ether 00:50:56:a0:92:af  txqueuelen 1000  (Ethernet)

ens160:4: flags=4163  mtu 1500
        inet 10.13.13.138  netmask 255.255.255.0  broadcast 10.13.13.255
        ether 00:50:56:a0:92:af  txqueuelen 1000  (Ethernet)

ens160:5: flags=4163  mtu 1500
        inet 10.13.13.137  netmask 255.255.255.0  broadcast 10.13.13.255
        ether 00:50:56:a0:92:af  txqueuelen 1000  (Ethernet)

ens160:6: flags=4163  mtu 1500
        inet 10.13.13.136  netmask 255.255.255.0  broadcast 10.13.13.255
        ether 00:50:56:a0:92:af  txqueuelen 1000  (Ethernet)

ens192: flags=4163  mtu 1500
        inet 88.88.88.191  netmask 255.255.255.0  broadcast 88.88.88.255
        inet6 fe80::250:56ff:fea0:a05c  prefixlen 64  scopeid 0x20
        ether 00:50:56:a0:a0:5c  txqueuelen 1000  (Ethernet)
        RX packets 294716840  bytes 215309349272 (200.5 GiB)
        RX errors 0  dropped 355  overruns 0  frame 0
        TX packets 63703465  bytes 363393728693 (338.4 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens192:1: flags=4163  mtu 1500
        inet 169.254.255.92  netmask 255.255.0.0  broadcast 169.254.255.255
        ether 00:50:56:a0:a0:5c  txqueuelen 1000  (Ethernet)

lo: flags=73  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10
        loop  txqueuelen 0  (Local Loopback)
        RX packets 90029207  bytes 223833448058 (208.4 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 90029207  bytes 223833448058 (208.4 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

virbr0: flags=4099  mtu 1500
        inet 192.168.122.1  netmask 255.255.255.0  broadcast 192.168.122.255
        ether 52:54:00:d0:6e:0b  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

virbr0-nic: flags=4098  mtu 1500
        ether 52:54:00:d0:6e:0b  txqueuelen 500  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0


[grid@cs2 ~]$ cat /etc/hosts

127.0.0.1      localhost
10.13.13.190  cs1
10.13.13.191  cs2

#10.13.13.192 cs1-vip.jy.net cs1-vip
#10.13.13.193 cs2-vip.jy.net cs2-vip

88.88.88.190   cs1-priv.jy.net cs1-priv
88.88.88.191   cs2-priv.jy.net cs2-priv


10.13.13.141 cs-cluster-scan 
10.13.13.142 cs-cluster-scan 
10.13.13.143 cs-cluster-scan

节点1的私有IP丢失

[root@cs1 ~]# ifconfig -a
ens160: flags=4163  mtu 1500
        inet 10.13.13.190  netmask 255.255.255.0  broadcast 10.13.13.255
        inet6 fe80::250:56ff:fea0:4e69  prefixlen 64  scopeid 0x20
        inet6 2018::2  prefixlen 64  scopeid 0x0
        ether 00:50:56:a0:4e:69  txqueuelen 1000  (Ethernet)
        RX packets 53266410  bytes 3937179306 (3.6 GiB)
        RX errors 0  dropped 143  overruns 0  frame 0
        TX packets 2988621  bytes 284679938 (271.4 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens192: flags=4163  mtu 1500
        inet6 fe80::250:56ff:fea0:17dc  prefixlen 64  scopeid 0x20
        ether 00:50:56:a0:17:dc  txqueuelen 1000  (Ethernet)
        RX packets 70361859  bytes 5602168856 (5.2 GiB)
        RX errors 0  dropped 141  overruns 0  frame 0
        TX packets 24902418  bytes 195063342601 (181.6 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10
        loop  txqueuelen 0  (Local Loopback)
        RX packets 30235303  bytes 10958756943 (10.2 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 30235303  bytes 10958756943 (10.2 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

virbr0: flags=4099  mtu 1500
        inet 192.168.122.1  netmask 255.255.255.0  broadcast 192.168.122.255
        ether 52:54:00:d0:6e:0b  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

virbr0-nic: flags=4098  mtu 1500
        ether 52:54:00:d0:6e:0b  txqueuelen 500  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

设置私有IP

[root@cs1 network-scripts]# cat ifcfg-Ethernet_connection_ens192
TYPE=Ethernet
BOOTPROTO=none
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=no
NAME=ens192
UUID=8ec45d90-c1fb-4924-84a0-72f58fc9b82f
DEVICE=ens192
ONBOOT=yes
IPADDR=88.88.88.190
PREFIX=24
NETMASK=255.255.255.0

[root@cs1 network-scripts]# ifconfig -a
ens160: flags=4163  mtu 1500
        inet 10.138.130.190  netmask 255.255.255.0  broadcast 10.138.130.255
        inet6 fe80::250:56ff:fea0:4e69  prefixlen 64  scopeid 0x20
        inet6 2018::2  prefixlen 64  scopeid 0x0
        ether 00:50:56:a0:4e:69  txqueuelen 1000  (Ethernet)
        RX packets 53277081  bytes 3937945677 (3.6 GiB)
        RX errors 0  dropped 143  overruns 0  frame 0
        TX packets 2989099  bytes 284806540 (271.6 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens192: flags=4163  mtu 1500
        inet 88.88.88.190  netmask 255.255.255.0  broadcast 88.88.88.255
        inet6 fe80::250:56ff:fea0:17dc  prefixlen 64  scopeid 0x20
        ether 00:50:56:a0:17:dc  txqueuelen 1000  (Ethernet)
        RX packets 70371906  bytes 5602881459 (5.2 GiB)
        RX errors 0  dropped 141  overruns 0  frame 0
        TX packets 24902428  bytes 195063343317 (181.6 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10
        loop  txqueuelen 0  (Local Loopback)
        RX packets 30236949  bytes 10959017396 (10.2 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 30236949  bytes 10959017396 (10.2 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

virbr0: flags=4099  mtu 1500
        inet 192.168.122.1  netmask 255.255.255.0  broadcast 192.168.122.255
        ether 52:54:00:d0:6e:0b  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

virbr0-nic: flags=4098  mtu 1500
        ether 52:54:00:d0:6e:0b  txqueuelen 500  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@cs1 ~]# crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on ‘cs2’
CRS-2673: Attempting to stop ‘ora.mdnsd’ on ‘cs2’
CRS-2673: Attempting to stop ‘ora.gpnpd’ on ‘cs2’
CRS-2673: Attempting to stop ‘ora.crf’ on ‘cs2’
CRS-2673: Attempting to stop ‘ora.ctssd’ on ‘cs2’
CRS-2673: Attempting to stop ‘ora.evmd’ on ‘cs2’
CRS-2673: Attempting to stop ‘ora.asm’ on ‘cs2’
CRS-2673: Attempting to stop ‘ora.drivers.acfs’ on ‘cs2’
CRS-2677: Stop of ‘ora.drivers.acfs’ on ‘cs2’ succeeded
CRS-2677: Stop of ‘ora.ctssd’ on ‘cs2’ succeeded
CRS-2677: Stop of ‘ora.crf’ on ‘cs2’ succeeded
CRS-2677: Stop of ‘ora.gpnpd’ on ‘cs2’ succeeded
CRS-2677: Stop of ‘ora.evmd’ on ‘cs2’ succeeded
CRS-2677: Stop of ‘ora.mdnsd’ on ‘cs2’ succeeded
CRS-2677: Stop of ‘ora.asm’ on ‘cs2’ succeeded
CRS-2673: Attempting to stop ‘ora.cluster_interconnect.haip’ on ‘cs2’
CRS-2677: Stop of ‘ora.cluster_interconnect.haip’ on ‘cs2’ succeeded
CRS-2673: Attempting to stop ‘ora.cssd’ on ‘cs2’
CRS-2677: Stop of ‘ora.cssd’ on ‘cs2’ succeeded
CRS-2673: Attempting to stop ‘ora.gipcd’ on ‘cs2’
CRS-2673: Attempting to stop ‘ora.driver.afd’ on ‘cs2’
CRS-2677: Stop of ‘ora.driver.afd’ on ‘cs2’ succeeded
CRS-2677: Stop of ‘ora.gipcd’ on ‘cs2’ succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on ‘cs2’ has completed
CRS-4133: Oracle High Availability Services has been stopped.
[root@cs1 ~]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.

手动启动节点1的集群软件

[root@cs1 ~]# crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr
               ONLINE  ONLINE       cs1                      STABLE
               ONLINE  ONLINE       cs2                      STABLE
ora.CRS.dg
               ONLINE  ONLINE       cs1                      STABLE
               ONLINE  ONLINE       cs2                      STABLE
ora.DATA.dg
               ONLINE  ONLINE       cs1                      STABLE
               ONLINE  ONLINE       cs2                      STABLE
ora.DN.dg
               OFFLINE OFFLINE      cs1                      STABLE
               OFFLINE OFFLINE      cs2                      STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       cs1                      STABLE
               ONLINE  ONLINE       cs2                      STABLE
ora.chad
               ONLINE  ONLINE       cs1                      STABLE
               ONLINE  ONLINE       cs2                      STABLE
ora.net1.network
               ONLINE  ONLINE       cs1                      STABLE
               ONLINE  ONLINE       cs2                      STABLE
ora.ons
               ONLINE  ONLINE       cs1                      STABLE
               ONLINE  ONLINE       cs2                      STABLE
ora.proxy_advm
               OFFLINE OFFLINE      cs1                      STABLE
               OFFLINE OFFLINE      cs2                      STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       cs1                      STABLE
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       cs2                      STABLE
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       cs1                      STABLE
ora.MGMTLSNR
      1        ONLINE  ONLINE       cs1                      169.254.207.24 88.88
                                                             .88.190,STABLE
ora.asm
      1        ONLINE  ONLINE       cs1                      Started,STABLE
      2        ONLINE  ONLINE       cs2                      Started,STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.cs.db
      1        ONLINE  ONLINE       cs1                      Open,HOME=/u01/app/o
                                                             racle/product/12.2.0
                                                             /db,STABLE
      2        ONLINE  ONLINE       cs2                      Open,HOME=/u01/app/o
                                                             racle/product/12.2.0
                                                             /db,STABLE
ora.cs1.vip
      1        ONLINE  ONLINE       cs1                      STABLE
ora.cs2.vip
      1        ONLINE  ONLINE       cs2                      STABLE
ora.cvu
      1        ONLINE  ONLINE       cs1                      STABLE
ora.gns
      1        ONLINE  ONLINE       cs2                      STABLE
ora.gns.vip
      1        ONLINE  ONLINE       cs2                      STABLE
ora.mgmtdb
      1        ONLINE  OFFLINE      cs1                      Open,STABLE
ora.qosmserver
      1        ONLINE  ONLINE       cs1                      STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       cs1                      STABLE
ora.scan2.vip
      1        ONLINE  ONLINE       cs2                      STABLE
ora.scan3.vip
      1        ONLINE  ONLINE       cs1                      STABLE
--------------------------------------------------------------------------------

到此恢复正常状态。

PLSQL通过Oracle 11g客户端连接Oracle 12c服务器错误 ORA-28040

Posted on 2020年2月17日 by YongJing(敬勇)

PLSQL通过Oracle 11g客户端连接Oracle 12c服务器错误 ORA-28040
环境描述:
oracle服务器端版本：oracle 12.2.0.1.0
oracle客户端版本：oracle 11.2.0.1.0
PLSQL是11.4

因为PLSQL连接数据库也是要通过Oracle客户端，那么使用11g客户端访问oracle 12c应该也会得到如下错误：

C:\Users\Administrator>sqlplus sys/abcd@shardcat as sysdba

SQL*Plus: Release 11.2.0.1.0 Production on Sat Dec 20 22:22:07 2014

Copyright (c) 1982, 2010, Oracle.  All rights reserved.

ERROR:
ORA-28040: No matching authentication protocol

查看关于错误的详细描述:

[oracle@shard1 admin]$ oerr ora 28040
28040, 0000, "No matching authentication protocol"
// *Cause:  There was no acceptable authentication protocol for
//          either client or server.
// *Action: The administrator should set the values of the
//          SQLNET.ALLOWED_LOGON_VERSION_SERVER and
//          SQLNET.ALLOWED_LOGON_VERSION_CLIENT parameters, on both the
//          client and on the server, to values that match the minimum
//          version software supported in the system.
//          This error is also raised when the client is authenticating to
//          a user account which was created without a verifier suitable for
//          the client software version. In this situation, that account's
//          password must be reset, in order for the required verifier to
//          be generated and allow authentication to proceed successfully.

在服务器端查看表：dba_users，PASSWORD_VERSIONS 列是：11G 12C

SQL> select username, account_status,password_versions from dba_users where account_status='OPEN';

USERNAME                       ACCOUNT_STATUS                                                   PASSWORD_VERSIONS
------------------------------ ---------------------------------------------------------------- ----------------------------------
SYS                            OPEN                                                             11G 12C
SYSTEM                         OPEN                                                             11G 12C
GSMCATUSER                     OPEN                                                             11G 12C
JY                             OPEN                                                             11G 12C
MYGDSADMIN                     OPEN                                                             11G 12C
APP_SCHEMA                     OPEN                                                             11G 12C

6 rows selected.

服务器端：修改 sqlnet.ora 配置：（配置修改后，不需要重启oracle服务器）我这里是新创建的sqlnet.ora，因为原来没有创建

[oracle@shard1 admin]$ ls -lrt
总用量 12
-rw-r--r--. 1 oracle oinstall 1441 8月  28 2015 shrept.lst
drwxr-xr-x. 2 oracle oinstall   61 10月 12 2017 samples
-rw-r-----. 1 oracle oinstall  960 1月  18 2018 tnsnames.ora
-rw-r--r--  1 oracle oinstall  504 2月  17 12:58 listener.ora
[oracle@shard1 admin]$ vi sqlnet.ora
SQLNET.ALLOWED_LOGON_VERSION_SERVER=11
SQLNET.ALLOWED_LOGON_VERSION_CLIENT=11

客户端：再次尝试连接，提示用户名密码错误：

C:\Users\Administrator>sqlplus sys/abcd@shardcat as sysdba

SQL*Plus: Release 11.2.0.1.0 Production on Sat Dec 20 22:16:06 2014

Copyright (c) 1982, 2010, Oracle.  All rights reserved.

ERROR:
ORA-01017: invalid username/password; logon denied

修改服务器端 sqlnet.ora 后，需要重新登录sqlplus，再修改用户密码，否则修改用户密码后，标记的密码版本仍然为11G 12C；重新登录sqlplus，修改scott用户密码，并查看 PASSWORD_VERSIONS，多了一个 10G

[oracle@shard1 ~]$ sqlplus sys/abcd@shardcat as sysdba

SQL*Plus: Release 12.2.0.1.0 Production on Mon Feb 17 13:07:28 2020

Copyright (c) 1982, 2016, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

SQL> alter user sys identified by "abcd";

User altered.


SQL> select username, account_status,password_versions from dba_users where account_status='OPEN';

USERNAME                       ACCOUNT_STATUS                                                   PASSWORD_VERSIONS
------------------------------ ---------------------------------------------------------------- ----------------------------------
SYS                            OPEN                                                             10G 11G 12C
SYSTEM                         OPEN                                                             11G 12C
GSMCATUSER                     OPEN                                                             11G 12C
JY                             OPEN                                                             11G 12C
MYGDSADMIN                     OPEN                                                             11G 12C
APP_SCHEMA                     OPEN                                                             11G 12C

6 rows selected.

客户端：再次尝试登录oracle 12c，成功:

C:\Users\Administrator>sqlplus sys/abcd@shardcat as sysdba

SQL*Plus: Release 11.2.0.1.0 Production on Sat Dec 20 22:22:39 2014

Copyright (c) 1982, 2010, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

SQL>

总结:oracle11g 客户端连接 oracle 12c服务器，需要在服务器端配置 sqlnet.ora，并重新修改用户密码。

Oracle 12C Formatted Block Dump

Posted on 2020年2月7日 by YongJing(敬勇)

Oracle 12C Formatted Block Dump
.可以从RDBMS中dump数据块
.在用户dump目录中会生成一个跟踪文件包含dump信息

对7号文件中的135号数据块执行dump

SQL> oradebug setmypid
Statement processed.
SQL> oradebug tracefile_name
/u01/app/oracle/diag/rdbms/shardcat/shardcat/trace/shardcat_ora_13182.trc
SQL> alter system dump datafile 7 block 135;

System altered.

对于任何数据块dump都有一个数据块头信息。它决定了剩余dump的格式。在任何Oracle块dump的头部应该可以看到块头信息:
Rdba:数据块的相对DBA
Scn:SCN序列号
Seq:序列号:
SEQ->0 /* non-logged changes – do not advance seq# */
SEQ->(UB1MAXVAL-1) /* maximum possible sequence number */
SEQ->(UB1MAXVAL) /* seq# to indicate a block is corrupt, equal to FF */

Flg: Flag(在kcbh.h中定义)
Tail:一致性数据用来验证数据块的开始与结束是否是相同版本(它由SCNBase的低阶两字节加上数据块类型加上SCN的序列号)
Frmt:数据块格式:在Oracle 12c中是2
Chkval数据块的选项检查值:如果db_block_checksum=true
Type:数据块类型(在kcb.h中定义)
Rdba:相对数据块地址
Seg/Obj:Seg/Obj ID dump输出的Seg/Obj为0x1af27

SQL> select to_number('1af27','xxxxxx') from dual;

TO_NUMBER('1AF27','XXXXXX')
---------------------------
                     110375

SQL> select object_id,object_name from dba_objects where owner='JY' and object_name='T1';

 OBJECT_ID OBJECT_NAME
---------- ------------------------------
    110375 T1

Csc:最近数据块清除
itc: itl插槽数量
Flg: E
Typ: 1=DATA;2=INDEX
Itl:相关事务列表索引(ITLs由initrans和maxtrans来决定)
Xid:事务ID(UndoSeg.Slot.Wrap)
Uba:Undo段地址(UndoDBA.SeqNo.RecordNo)
Flg:C=Committed;U=Commit Upper Bound;T=Active at CSC;B=Rollback of this UBA gives before image of the ITL.
Lck:这个事务所影响的行数
Scn/Fsc:Scn=提交事务的SCN号;Fsc=Free space credit(bytes)
例如:

*** 2020-02-07T14:35:53.082746+08:00
Start dump data blocks tsn: 4 file#:7 minblk 135 maxblk 135
Block dump from cache:
Dump of buffer cache at level 4 for pdb=0 tsn=4 rdba=29360263
Block dump from disk:
buffer tsn: 4 rdba: 0x01c00087 (7/135)
scn: 0x1e1174a seq: 0x09 flg: 0x06 tail: 0x174a0609
frmt: 0x02 chkval: 0xba74 type: 0x06=trans data
Hex dump of block: st=0, typ_found=1
......省略......

Block header dump:  0x01c00087
 Object id on Block? Y
 seg/obj: 0x1af27  csc:  0x0000000001e11748  itc: 2  flg: E  typ: 1 - DATA
     brn: 0  bdba: 0x1c00080 ver: 0x01 opc: 0
     inc: 0  exflg: 0

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0008.003.00006d30  0x010001c2.2499.09  --U-    8  fsc 0x0000.01e1174a
0x02   0x0008.01b.00003a9a  0x010002b5.114f.20  C---    0  scn  0x0000000001286184
bdba: 0x01c00087

Tsiz:总的数据区域大小

SQL> select to_number('1f98','xxxxxx') from dual;

TO_NUMBER('1F98','XXXXXX')
--------------------------
                      8088

Hsiz:数据块头大小

SQL> select to_number('22','xxxxxx') from dual;

TO_NUMBER('22','XXXXXX')
------------------------
                      34

Pbl:持有数据块的缓存指针
Flag:N=pctfree hit(clusters);F=do not put on free list;K=flushable cluster keys
Ntab:表的数量(在clusters中大于1)
Nrow:行记录数
Frre:第一个可用行索引项;-1=没有添加
Fsbo:可用空间开始偏移量
Fseo:可用空间结束偏移量
Avsp:数据块中可用空间
Tosp:当所有事务提交后总的可用空间
Nrow:第一个表的行数
例如:

data_block_dump,data header at 0x7f799676f064
===============
tsiz: 0x1f98
hsiz: 0x22
pbl: 0x7f799676f064
     76543210
flag=--------
ntab=1
nrow=8
frre=-1
fsbo=0x22
fseo=0x1efa
avsp=0x1f29
tosp=0x1f2b
0xe:pti[0]      nrow=8  offs=0
0x12:pri[0]     offs=0x1f47
0x14:pri[1]     offs=0x1f3d
0x16:pri[2]     offs=0x1f33
0x18:pri[3]     offs=0x1f29
0x1a:pri[4]     offs=0x1f1f
0x1c:pri[5]     offs=0x1f15
0x1e:pri[6]     offs=0x1f08
0x20:pri[7]     offs=0x1efa

Oracle Formatted Block Dump: Data Layer
Tab: tab 0,row 0,offset
Cc:在这个行片段中的列数
Lb:锁字节:这个行被锁定的ITL条目
Fb:Flag字节
H=行片段头;K=集簇键;C=集簇表成员;D=删除的行记录;F=第一个数据片;L=最后数据片;P=紧接前一个数据片的第一个列
N=紧接下一个数据片的最后一个列
Tl:行大小(字节数加上数据)
Col:列数据
例如:

block_row_dump:
tab 0, row 0, @0x1f47
tl: 10 fb: --H-FL-- lb: 0x1  cc: 2
col  0: [ 2]  c1 02
col  1: [ 3]  6d 5f 41
tab 0, row 1, @0x1f3d
tl: 10 fb: --H-FL-- lb: 0x1  cc: 2
col  0: [ 2]  c1 03
col  1: [ 3]  6d 5f 42
tab 0, row 2, @0x1f33
tl: 10 fb: --H-FL-- lb: 0x1  cc: 2
col  0: [ 2]  c1 04
col  1: [ 3]  6d 5f 43
tab 0, row 3, @0x1f29
tl: 10 fb: --H-FL-- lb: 0x1  cc: 2
col  0: [ 2]  c1 05
col  1: [ 3]  6d 5f 44
tab 0, row 4, @0x1f1f
tl: 10 fb: --H-FL-- lb: 0x1  cc: 2
col  0: [ 2]  c1 06
col  1: [ 3]  6d 5f 45
tab 0, row 5, @0x1f15
tl: 10 fb: --H-FL-- lb: 0x1  cc: 2
col  0: [ 2]  c1 07
col  1: [ 3]  6d 5f 46
tab 0, row 6, @0x1f08
tl: 13 fb: --H-FL-- lb: 0x1  cc: 2
col  0: [ 2]  c1 08
col  1: [ 6]  6d 5f 4a 59 48 59
tab 0, row 7, @0x1efa
tl: 14 fb: --H-FL-- lb: 0x1  cc: 2
col  0: [ 2]  c1 09
col  1: [ 7]  6d 5f 4a 59 59 59 4c
end_of_block_dump
End dump data blocks tsn: 4 file#: 7 minblk 135 maxblk 135

Oracle数据块格式

Posted on 2020年1月17日2020年1月17日 by YongJing(敬勇)

数据块结构
Oracle 数据块有三部分:
Cache layer
Transaction layer
Data layer

Oracle Data Block的结构简图如下，其中从Data header到Row Data部分合称Data Layer:

———————
– Cache Layer –
———————
– Transaction Layer –
———————
– Data Header –
———————
– Table Directory –
———————
– Row Directory –
———————
– Free Space –
———————
– Row Data –
———————
– Tailchk –
———————
下面将使用bbed工具来显示相关结构信息

BBED> map
 File: /u01/app/oracle/oradata/shardcat/users01.dbf (7)
 Block: 135                                   Dba:0x01c00087
------------------------------------------------------------
 KTB Data Block (Table/Cluster)

 struct kcbh, 20 bytes                      @0

 struct ktbbh, 72 bytes                     @20

 struct kdbh, 14 bytes                      @100

 struct kdbt[1], 4 bytes                    @114

 sb2 kdbr[8]                                @118

 ub1 freespace[7983]                        @134

 ub1 rowdata[71]                            @8117

 ub4 tailchk                                @8188

数据块组件
Oracle数据块三层C结构，它被映射到SGA kcbh(内核缓存数据块头)中的数据块。Cache Layer包含关于块格式,类型(数据，索引，头等)信息和序列数据。

BBED> p kcbh
struct kcbh, 20 bytes                       @0
   ub1 type_kcbh                            @0        0x06
   ub1 frmt_kcbh                            @1        0xa2
   ub2 wrp2_kcbh                            @2        0x0000
   ub4 rdba_kcbh                            @4        0x01c00087
   ub4 bas_kcbh                             @8        0x01286184
   ub2 wrp_kcbh                             @12       0x0000
   ub1 seq_kcbh                             @14       0x01
   ub1 flg_kcbh                             @15       0x06 (KCBHFDLC, KCBHFCKV)
   ub2 chkval_kcbh                          @16       0x0137
   ub2 spare3_kcbh                          @18       0x0000

Cache Layer：Block的第一部分，长度为20字节，内部数据结构名为kcbh，包括
type_kcbh:块类型(table/index,rollback segment,temporary segment等)
frmt_kcbh:块格式(v6,v7,v8)
rdba_kcbh:块地址DBA
bas_kcbh/wrp_kcbh:SCN
seq_kcbh:块的序列号
flg_kcbh:块的标志

事务层存储了关于数据块的事务信息
Transaction Layer：内部结构名ktbbh。分成两部分，第一部分为固定长度，长度为24字节，包含事务相关的一些基本信息。第二部分为可变长度，包含itl，长度根据itl条目的个数变化，每个itl长度为24字节，内部结构名ktbbhitl

BBED> p ktbbh
struct ktbbh, 72 bytes                      @20
   ub1 ktbbhtyp                             @20       0x01 (KDDBTDATA)
   union ktbbhsid, 4 bytes                  @24
      ub4 ktbbhsg1                          @24       0x0001af27
      ub4 ktbbhod1                          @24       0x0001af27
   struct ktbbhcsc, 8 bytes                 @28
      ub4 kscnbas                           @28       0x01286182
      ub2 kscnwrp                           @32       0x8000
      ub2 kscnwrp2                          @34       0x0000
   sb2 ktbbhict                             @36       7938
   ub1 ktbbhflg                             @38       0x32 (NONE)
   ub1 ktbbhfsl                             @39       0x00
   ub4 ktbbhfnx                             @40       0x01c00080
   struct ktbbhitl[0], 24 bytes             @44
      struct ktbitxid, 8 bytes              @44
         ub2 kxidusn                        @44       0x0006
         ub2 kxidslt                        @46       0x0012
         ub4 kxidsqn                        @48       0x000038f6
      struct ktbituba, 8 bytes              @52
         ub4 kubadba                        @52       0x010002d9
         ub2 kubaseq                        @56       0x10c9
         ub1 kubarec                        @58       0x09
      ub2 ktbitflg                          @60       0x8000 (KTBFCOM)
      union _ktbitun, 2 bytes               @62
         sb2 _ktbitfsc                      @62      -32768
         ub2 _ktbitwrp                      @62       0x8000
      ub4 ktbitbas                          @64       0x01228dbb
   struct ktbbhitl[1], 24 bytes             @68
      struct ktbitxid, 8 bytes              @68
         ub2 kxidusn                        @68       0x0008
         ub2 kxidslt                        @70       0x001b
         ub4 kxidsqn                        @72       0x00003a9a
      struct ktbituba, 8 bytes              @76
         ub4 kubadba                        @76       0x010002b5
         ub2 kubaseq                        @80       0x114f
         ub1 kubarec                        @82       0x20
      ub2 ktbitflg                          @84       0x2001 (KTBFUPB)
      union _ktbitun, 2 bytes               @86
         sb2 _ktbitfsc                      @86       10
         ub2 _ktbitwrp                      @86       0x000a
      ub4 ktbitbas                          @88       0x01286184

这种结构出现在数据库每个数据块的开头部分。它甚至出现在不由redo改变的排序块中。它也会出现在数据文件头块和控制文件头块的开头部分。缓存层提供了对坏数据的规模。它也用来确保正确的数据块被读取并且数据块没有破裂或损坏。所谓破裂的数据块就是只有一部分被写入磁盘，数据块的一部分保留了之前的版本。

Data Layer：包括Data Header，Table Directory，Row Directory，Free Space和Row Data。

Data Header：长度14字节，内部数据结构名kdbh

BBED> p kdbh
struct kdbh, 14 bytes                       @100
   ub1 kdbhflag                             @100      0x00 (NONE)
   sb1 kdbhntab                             @101      1
   sb2 kdbhnrow                             @102      8
   sb2 kdbhfrre                             @104     -1
   sb2 kdbhfsbo                             @106      34
   sb2 kdbhfseo                             @108      8017
   sb2 kdbhavsp                             @110      7977
   sb2 kdbhtosp                             @112      7989

其中kdbhnrow是存储在数据块中的记录数为8,而表中确实有8条记录。(从ROWID可以判断出来)

SQL> select dbms_rowid.rowid_block_number(rowid),t1.t_id,t1.t_name from jy.t1;

DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID)       T_ID T_NAME
------------------------------------ ---------- ----------------------------------------
                                 134          9 YYL
                                 135          1 A
                                 135          2 B
                                 135          3 C
                                 135          4 D
                                 135          5 E
                                 135          6 F
                                 135          7 JYHY
                                 135          8 JYYYL

9 rows selected.

Table Directory: 一般table只有一个条目，cluster则有一个或多个条目。每个条目长4字节，内部数据结构名kdbt

BBED> p kdbt
struct kdbt[0], 4 bytes                     @114
   sb2 kdbtoffs                             @114      0
   sb2 kdbtnrow                             @116      8

Row Directory：数目由块中数据的行数决定，每个条目长2字节，内部数据结构名kdbr

BBED> p kdbr
sb2 kdbr[0]                                 @118      8080
sb2 kdbr[1]                                 @120      8072
sb2 kdbr[2]                                 @122      8064
sb2 kdbr[3]                                 @124      8056
sb2 kdbr[4]                                 @126      8048
sb2 kdbr[5]                                 @128      8040
sb2 kdbr[6]                                 @130      8029
sb2 kdbr[7]                                 @132      8017

查看表中的记录数据,一个重复计数也可以被指定用来重复执行examine命令来检查后续的行记录。下面的例子显示了先使用print命令来设置最后一行记录的偏移量，然后检查后面的8行记录的操作.

BBED> p *kdbr[7]
rowdata[0]
----------
ub1 rowdata[0]                              @8117     0x2c

BBED> x /8rnc
rowdata[0]                                  @8117
----------
flag@8117: 0x2c (KDRHFL, KDRHFF, KDRHFH)
lock@8118: 0x02
cols@8119:    2

col    0[2] @8120: 8
col    1[5] @8123: JYYYL

rowdata[12]                                 @8129
-----------
flag@8129: 0x2c (KDRHFL, KDRHFF, KDRHFH)
lock@8130: 0x00
cols@8131:    2

col    0[2] @8132: 7
col    1[4] @8135: JYHY

rowdata[23]                                 @8140
-----------
flag@8140: 0x2c (KDRHFL, KDRHFF, KDRHFH)
lock@8141: 0x00
cols@8142:    2

col    0[2] @8143: 6
col    1[1] @8146: F

rowdata[31]                                 @8148
-----------
flag@8148: 0x2c (KDRHFL, KDRHFF, KDRHFH)
lock@8149: 0x00
cols@8150:    2

col    0[2] @8151: 5
col    1[1] @8154: E

rowdata[39]                                 @8156
-----------
flag@8156: 0x2c (KDRHFL, KDRHFF, KDRHFH)
lock@8157: 0x00
cols@8158:    2

col    0[2] @8159: 4
col    1[1] @8162: D

rowdata[47]                                 @8164
-----------
flag@8164: 0x2c (KDRHFL, KDRHFF, KDRHFH)
lock@8165: 0x00
cols@8166:    2

col    0[2] @8167: 3
col    1[1] @8170: C

rowdata[55]                                 @8172
-----------
flag@8172: 0x2c (KDRHFL, KDRHFF, KDRHFH)
lock@8173: 0x00
cols@8174:    2

col    0[2] @8175: 2
col    1[1] @8178: B

rowdata[63]                                 @8180
-----------
flag@8180: 0x2c (KDRHFL, KDRHFF, KDRHFH)
lock@8181: 0x00
cols@8182:    2

col    0[2] @8183: 1
col    1[1] @8186: A

Free Space：表示数据块中可用空间，内部数据结构名freespace

Row Data：表示实际的数据，内部数据结构名rowdata

BBED> p rowdata[0]
ub1 rowdata[0]                              @8117     0x2c

BBED> d /v offset 8117
 File: /u01/app/oracle/oradata/shardcat/users01.dbf (7)
 Block: 135     Offsets: 8117 to 8191  Dba:0x01c00087
-------------------------------------------------------
 2c020202 c109054a 5959594c 2c000202 l ,......JYYYL,...
 c108044a 5948592c 000202c1 0701462c l ...JYHY,......F,
 000202c1 0601452c 000202c1 0501442c l ......E,......D,
 000202c1 0401432c 000202c1 0301422c l ......C,......B,
 000202c1 02014101 068461            l ......A...a

 <16 bytes per line>

Tailchk:保存在块结尾用于校验的数据，长度4个字节，内部结构名tailchk。所有Oracle块的最后四个字节都是tail check(结尾检查)。
对于一个Oracle 8以上版本的数据块的tail它是由SCN base的低位两字节的内容，数据块的类型与SCN序列号组成的。例如，如果SCN base为 0x01286184，数据块类型为0x06，SCN序列号为0x01，那么tail check将是0x61840601

BBED> p tailchk
ub4 tailchk                                 @8188     0x61840601

BBED> p kcbh
struct kcbh, 20 bytes                       @0
   ub1 type_kcbh                            @0        0x06
   ub1 frmt_kcbh                            @1        0xa2
   ub2 wrp2_kcbh                            @2        0x0000
   ub4 rdba_kcbh                            @4        0x01c00087
   ub4 bas_kcbh                             @8        0x01286184
   ub2 wrp_kcbh                             @12       0x0000
   ub1 seq_kcbh                             @14       0x01
   ub1 flg_kcbh                             @15       0x06 (KCBHFDLC, KCBHFCKV)
   ub2 chkval_kcbh                          @16       0x0137
   ub2 spare3_kcbh                          @18       0x0000

虽然tail check的值通常是由这三个组件级成，Oracle会对最终的值作为一个值(4字节)以单无符号整数来存储。在小字节序编码(little-endian)的构架机器中，比如Intel，这个值将以低位字节优先的方式来存储。因此如果使用标准块编辑器或dump命令来检查数据块的tail check时，字节顺序可能不一样。一个tail check为0x61840601，在Intel机器上它将以”01068461″的形式存储在磁盘。

BBED> dump /v offset 8188
 File: /u01/app/oracle/oradata/shardcat/users01.dbf (7)
 Block: 135     Offsets: 8188 to 8191  Dba:0x01c00087
-------------------------------------------------------
 01068461                            l ...a

 <16 bytes per line>

Oracle Linux 6.7中 Oracle 11.2.0.4 RAC集群CRS异常处理

Posted on 2020年1月4日 by YongJing(敬勇)

最近一个月单位上的Oracle RAC集群CRS异常导致集群命令无法使用，执行crsctl stat res -t命令出现如下错误:

[grid@db1 ~]$ crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.

[grid@db2 ~]$ crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.

但是数据库可以正常访问，业务系统也运行正常

[root@db1 ~]# ps -ef | grep pmon
root       8024 204594  0 21:11 pts/0    00:00:00 grep pmon
grid      77120      1  0 Dec21 ?        00:04:21 asm_pmon_+ASM1
oracle    77790      1  0 Dec21 ?        00:05:18 ora_pmon_CAIWU1
oracle    77794      1  0 Dec21 ?        00:05:08 ora_pmon_dadb1
oracle    77848      1  0 Dec21 ?        00:05:39 ora_pmon_chdyl1
oracle    77910      1  0 Dec21 ?        00:07:47 ora_pmon_RLZY1

[root@db2 ~]# ps -ef | grep pmon
grid      27745      1  0 Dec21 ?        00:04:21 asm_pmon_+ASM2
oracle    28393      1  0 Dec21 ?        00:05:21 ora_pmon_dadb2
oracle    28569      1  0 Dec21 ?        00:04:58 ora_pmon_CAIWU2
oracle    28573      1  0 Dec21 ?        00:05:36 ora_pmon_chdyl2
oracle    28583      1  0 Dec21 ?        00:07:49 ora_pmon_RLZY2

查看ASM磁盘组的状态，发现OCR磁盘组确实offline了

[grid@db1 ~]$ asmcmd lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  N         512   4096  1048576   3072000  2679522                0         2679522              0             N  ARCH/
MOUNTED  EXTERN  N         512   4096  1048576    204800   151138                0          151138              0             N  CWDATA/
MOUNTED  EXTERN  N         512   4096  1048576    512000   472546                0          472546              0             N  DADATA/
MOUNTED  EXTERN  N         512   4096  1048576   3072000   595334                0          595334              0             N  DATA/
MOUNTED  EXTERN  N         512   4096  1048576   1843200   609953                0          609953              0             N  SBDATA/

[grid@db2 ~]$ asmcmd lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  N         512   4096  1048576   3072000  2679522                0         2679522              0             N  ARCH/
MOUNTED  EXTERN  N         512   4096  1048576    204800   151138                0          151138              0             N  CWDATA/
MOUNTED  EXTERN  N         512   4096  1048576    512000   472546                0          472546              0             N  DADATA/
MOUNTED  EXTERN  N         512   4096  1048576   3072000   595334                0          595334              0             N  DATA/
MOUNTED  EXTERN  N         512   4096  1048576   1843200   609953                0          609953              0             N  SBDATA/

手工将crsdg上线，命令能够执行成功，但执行crsctl stat res -t命令仍然报错。

[grid@db1 ~]$ sqlplus / as sysasm

SQL*Plus: Release 11.2.0.4.0 Production on Mon Dec 30 21:15:33 2019

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> alter diskgroup ocr mount;

Diskgroup altered.

SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

[grid@db1 ~]$ crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.

[grid@db2 ~]$ sqlplus / as sysasm

SQL*Plus: Release 11.2.0.4.0 Production on Mon Dec 30 21:15:05 2019

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> alter diskgroup ocr mount;

Diskgroup altered.

SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

[grid@db2 ~]$ crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.

检查节点db1的alert_+ASM1.log有如下报错，说是不能访问OCR磁盘组的相关磁盘而强制dismount了OCR磁盘，但使用dd命令测试是能够访问的

Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_77212.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
WARNING: requested mirror side 1 of virtual extent 0 logical extent 0 offset 102400 is not allocated; I/O request failed
WARNING: requested mirror side 2 of virtual extent 0 logical extent 1 offset 102400 is not allocated; I/O request failed
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_77212.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-15078: ASM diskgroup was forcibly dismounted
Sat Dec 28 05:30:48 2019
SQL> alter diskgroup OCR check /* proxy */ 
ORA-15032: not all alterations performed
ORA-15001: diskgroup "OCR" does not exist or is not mounted
ERROR: alter diskgroup OCR check /* proxy */
NOTE: client exited [77184]
Sat Dec 28 05:30:49 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35285] opening OCR file
Sat Dec 28 05:30:51 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35305] opening OCR file
Sat Dec 28 05:30:53 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35322] opening OCR file
Sat Dec 28 05:30:55 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35346] opening OCR file
Sat Dec 28 05:30:57 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35363] opening OCR file
Sat Dec 28 05:31:00 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35459] opening OCR file
Sat Dec 28 05:31:02 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35481] opening OCR file
Sat Dec 28 05:31:04 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35520] opening OCR file
Sat Dec 28 05:31:06 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35539] opening OCR file
Sat Dec 28 05:31:08 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35557] opening OCR file
Sat Dec 28 21:00:10 2019
Warning: VKTM detected a time drift.
Time drifts can result in an unexpected behavior such as time-outs. Please check trace file for more details.

检查错误日志

[root@db1 ~]# more /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_77212.trc
Trace file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_77212.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
ORACLE_HOME = /u01/app/11.2.0/grid
System name:    Linux
Node name:      db1
Release:        3.8.13-68.3.4.el6uek.x86_64
Version:        #2 SMP Tue Jul 14 15:03:36 PDT 2015
Machine:        x86_64
Instance name: +ASM1
Redo thread mounted by this instance: 0 
Oracle process number: 24
Unix process pid: 77212, image: oracle@db1 (TNS V1-V3)


*** 2019-12-28 05:30:44.894
*** SESSION ID:(2929.3) 2019-12-28 05:30:44.894
*** CLIENT ID:() 2019-12-28 05:30:44.894
*** SERVICE NAME:() 2019-12-28 05:30:44.894
*** MODULE NAME:(crsd.bin@db1 (TNS V1-V3)) 2019-12-28 05:30:44.894
*** ACTION NAME:() 2019-12-28 05:30:44.894
 
Received ORADEBUG command (#1) 'CLEANUP_KFK_FD' from process 'Unix process pid: 35253, image: '

*** 2019-12-28 05:30:44.895
Finished processing ORADEBUG command (#1) 'CLEANUP_KFK_FD'

*** 2019-12-28 05:30:48.235
WARNING:failed xlate 1 
ORA-15078: ASM diskgroup was forcibly dismounted
ksfdrfms:Mirror Read file=+OCR.255.4294967295 fob=0x9b00e5d8 bufp=0x7f5dd012ba00 blkno=25 nbytes=4096
WARNING:failed xlate 1 
WARNING: requested mirror side 1 of virtual extent 0 logical extent 0 offset 102400 is not allocated; I/O request failed
ksfdrfms:Read failed from mirror side=1 logical extent number=0 dskno=65535
WARNING:failed xlate 1 
WARNING: requested mirror side 2 of virtual extent 0 logical extent 1 offset 102400 is not allocated; I/O request failed
ksfdrfms:Read failed from mirror side=2 logical extent number=1 dskno=65535
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-15078: ASM diskgroup was forcibly dismounted

检查节点db1的alertdb1.log有如下报错,也是说不能访问OCR磁盘组的相关磁盘

2019-12-28 05:30:48.468: 
[/u01/app/11.2.0/grid/bin/oraagent.bin(77466)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/oraagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:1:4} in /u01/app/11.2.0/grid/log/db1/agent/crsd/oraagent_grid/oraagent_grid.lo
g.
2019-12-28 05:30:48.468: 
[/u01/app/11.2.0/grid/bin/oraagent.bin(77684)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/oraagent_oracle' disconnected from server. Details at (:CRSAGF00117:) {0:7:332} in /u01/app/11.2.0/grid/log/db1/agent/crsd/oraagent_oracle/oraagent_o
racle.log.
2019-12-28 05:30:48.471: 
[/u01/app/11.2.0/grid/bin/orarootagent.bin(77482)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/orarootagent_root' disconnected from server. Details at (:CRSAGF00117:) {0:5:11497} in /u01/app/11.2.0/grid/log/db1/agent/crsd/orarootagent_root/
orarootagent_root.log.
2019-12-28 05:30:48.480: 
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:30:50.003: 
[crsd(35285)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:50.021: 
[crsd(35285)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:50.520: 
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:30:51.918: 
[crsd(35305)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:51.929: 
[crsd(35305)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:52.557: 
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:30:53.945: 
[crsd(35322)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:53.956: 
[crsd(35322)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:54.595: 
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:30:55.976: 
[crsd(35346)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:55.988: 
[crsd(35346)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:56.633: 
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:30:58.010: 
[crsd(35363)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:58.020: 
[crsd(35363)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:58.669: 
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:31:00.043: 
[crsd(35459)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:00.054: 
[crsd(35459)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:00.706: 
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:31:02.093: 
[crsd(35481)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:02.103: 
[crsd(35481)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:02.742: 
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:31:04.109: 
[crsd(35520)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:04.119: 
[crsd(35520)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:04.777: 
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:31:06.141: 
[crsd(35539)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:06.151: 
[crsd(35539)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:06.810: 
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:31:08.181: 
[crsd(35557)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:08.191: 
[crsd(35557)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:08.846: 
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:31:08.847: 
[ohasd(33022)]CRS-2771:Maximum restart attempts reached for resource 'ora.crsd'; will not restart.
2019-12-28 05:31:08.848: 
[ohasd(33022)]CRS-2769:Unable to failover resource 'ora.crsd'.

检查节点db1的oraagent_grid.log文件有如下报错，显示OCR磁盘组的状态被改变为offline了。

2019-12-28 05:30:16.531: [    AGFW][511039232]{1:30746:2} Agent received the message: AGENT_HB[Engine] ID 12293:113720
2019-12-28 05:30:37.808: [    AGFW][511039232]{1:30746:9373} Agent received the message: RESOURCE_STOP[ora.OCR.dg db1 1] ID 4099:113730
2019-12-28 05:30:37.808: [    AGFW][511039232]{1:30746:9373} Preparing STOP command for: ora.OCR.dg db1 1
2019-12-28 05:30:37.808: [    AGFW][511039232]{1:30746:9373} ora.OCR.dg db1 1 state changed from: ONLINE to: STOPPING
2019-12-28 05:30:37.809: [ora.OCR.dg][513140480]{1:30746:9373} [stop] (:CLSN00108:) clsn_agent::stop {
2019-12-28 05:30:37.810: [ora.OCR.dg][513140480]{1:30746:9373} [stop] DgpAgent::stop: enter { 
2019-12-28 05:30:37.810: [ora.OCR.dg][513140480]{1:30746:9373} [stop] getResAttrib: attrib name USR_ORA_OPI value true len 4
2019-12-28 05:30:37.810: [ora.OCR.dg][513140480]{1:30746:9373} [stop] Agent::flagUsrOraOpiIsSet(true) reason not dependency
2019-12-28 05:30:37.810: [ora.OCR.dg][513140480]{1:30746:9373} [stop] DgpAgent::stop: tha exit }
2019-12-28 05:30:37.810: [ora.OCR.dg][513140480]{1:30746:9373} [stop] DgpAgent::stopSingle status:2 }
2019-12-28 05:30:37.810: [ora.OCR.dg][513140480]{1:30746:9373} [stop] (:CLSN00108:) clsn_agent::stop }
2019-12-28 05:30:37.810: [    AGFW][513140480]{1:30746:9373} Command: stop for resource: ora.OCR.dg db1 1 completed with status: SUCCESS
2019-12-28 05:30:37.810: [ora.OCR.dg][513140480]{1:30746:9373} [check] CrsCmd::ClscrsCmdData::stat entity 1 statflag 33 useFilter 0
2019-12-28 05:30:37.811: [    AGFW][511039232]{1:30746:9373} Agent sending reply for: RESOURCE_STOP[ora.OCR.dg db1 1] ID 4099:113730
2019-12-28 05:30:37.838: [ora.OCR.dg][513140480]{1:30746:9373} [check] DgpAgent::runCheck: asm stat asmRet 0
2019-12-28 05:30:37.839: [ora.OCR.dg][513140480]{1:30746:9373} [check] DgpAgent::getConnxn connected
2019-12-28 05:30:37.844: [ora.OCR.dg][513140480]{1:30746:9373} [check] DgpAgent::queryDgStatus excp no data found
2019-12-28 05:30:37.844: [ora.OCR.dg][513140480]{1:30746:9373} [check] DgpAgent::queryDgStatus no data found in v$asm_diskgroup_stat
2019-12-28 05:30:37.844: [ora.OCR.dg][513140480]{1:30746:9373} [check] DgpAgent::queryDgStatus dgName OCR ret 1
2019-12-28 05:30:37.845: [    AGFW][511039232]{1:30746:9373} ora.OCR.dg db1 1 state changed from: STOPPING to: OFFLINE
2019-12-28 05:30:37.845: [    AGFW][511039232]{1:30746:9373} Agent sending last reply for: RESOURCE_STOP[ora.OCR.dg db1 1] ID 4099:113730
2019-12-28 05:30:43.889: [ora.asm][503641856]{1:30746:2} [check] CrsCmd::ClscrsCmdData::stat entity 1 statflag 33 useFilter 0
2019-12-28 05:30:43.920: [ora.asm][503641856]{1:30746:2} [check] AsmProxyAgent::check clsagfw_res_status 0
2019-12-28 05:30:48.465: [ CRSCOMM][521545472] IpcC: IPC client connection 6c to member 0 has been removed
2019-12-28 05:30:48.465: [CLSFRAME][521545472] Removing IPC Member:{Relative|Node:0|Process:0|Type:1}
2019-12-28 05:30:48.465: [CLSFRAME][521545472] Disconnected from CRSD:db1 process: {Relative|Node:0|Process:0|Type:1}
2019-12-28 05:30:48.474: [   AGENT][511039232]{0:1:4} {0:1:4} Created alert : (:CRSAGF00117:) :  Disconnected from server, Agent is shutting down.
2019-12-28 05:30:48.474: [    AGFW][511039232]{0:1:4} Agent is exiting with exit code: 1

检查节点db2的alert_+ASM2.log有如下报错，出现了类似的”Waited 15 secs for write IO to PST disk 0 in group 1″信息，这说明对OCR磁盘组执行写操作时超时了15秒.

Sat Dec 28 03:02:51 2019
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 3.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 3.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 5.
WARNING: Waited 15 secs for write IO to PST disk 1 in group 5.
WARNING: Waited 15 secs for write IO to PST disk 2 in group 5.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 5.
WARNING: Waited 15 secs for write IO to PST disk 1 in group 5.
WARNING: Waited 15 secs for write IO to PST disk 2 in group 5.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 6.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 6.
Sat Dec 28 03:02:51 2019
NOTE: process _b000_+asm2 (45488) initiating offline of disk 0.3916160907 (OCR1) with mask 0x7e in group 5
NOTE: process _b000_+asm2 (45488) initiating offline of disk 1.3916160906 (OCR2) with mask 0x7e in group 5
NOTE: process _b000_+asm2 (45488) initiating offline of disk 2.3916160905 (OCR3) with mask 0x7e in group 5
NOTE: checking PST: grp = 5
GMON checking disk modes for group 5 at 19 for pid 27, osid 45488
ERROR: no read quorum in group: required 2, found 0 disks
NOTE: checking PST for grp 5 done.
NOTE: initiating PST update: grp = 5, dsk = 0/0xe96bdf8b, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 5, dsk = 1/0xe96bdf8a, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 5, dsk = 2/0xe96bdf89, mask = 0x6a, op = clear
GMON updating disk modes for group 5 at 20 for pid 27, osid 45488
ERROR: no read quorum in group: required 2, found 0 disks
Sat Dec 28 03:02:51 2019
NOTE: cache dismounting (not clean) group 5/0x8F5B2F9F (OCR) 
NOTE: messaging CKPT to quiesce pins Unix process pid: 45490, image: oracle@db2 (B001)
Sat Dec 28 03:02:51 2019
NOTE: halting all I/Os to diskgroup 5 (OCR)
Sat Dec 28 03:02:52 2019
NOTE: LGWR doing non-clean dismount of group 5 (OCR)
NOTE: LGWR sync ABA=23.100 last written ABA 23.100
WARNING: Offline for disk OCR1 in mode 0x7f failed.
WARNING: Offline for disk OCR2 in mode 0x7f failed.
WARNING: Offline for disk OCR3 in mode 0x7f failed.
Sat Dec 28 03:02:52 2019
kjbdomdet send to inst 1
detach from dom 5, sending detach message to inst 1
Sat Dec 28 03:02:52 2019
List of instances:
 1 2
Dirty detach reconfiguration started (new ddet inc 1, cluster inc 36)
 Global Resource Directory partially frozen for dirty detach
* dirty detach - domain 5 invalid = TRUE 
 0 GCS resources traversed, 0 cancelled
Dirty Detach Reconfiguration complete
Sat Dec 28 03:02:52 2019
WARNING: dirty detached from domain 5
NOTE: cache dismounted group 5/0x8F5B2F9F (OCR) 
SQL> alter diskgroup OCR dismount force /* ASM SERVER:2405117855 */ 
Sat Dec 28 03:02:52 2019
NOTE: cache deleting context for group OCR 5/0x8f5b2f9f
GMON dismounting group 5 at 21 for pid 28, osid 45490
NOTE: Disk OCR1 in mode 0x7f marked for de-assignment
NOTE: Disk OCR2 in mode 0x7f marked for de-assignment
NOTE: Disk OCR3 in mode 0x7f marked for de-assignment
NOTE:Waiting for all pending writes to complete before de-registering: grpnum 5
Sat Dec 28 03:03:03 2019
WARNING: Waited 27 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 27 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 24 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 24 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 27 secs for write IO to PST disk 0 in group 3.
WARNING: Waited 27 secs for write IO to PST disk 0 in group 3.
WARNING: Waited 21 secs for write IO to PST disk 0 in group 4.
WARNING: Waited 21 secs for write IO to PST disk 0 in group 4.
WARNING: Waited 27 secs for write IO to PST disk 0 in group 6.
WARNING: Waited 27 secs for write IO to PST disk 0 in group 6.
Sat Dec 28 03:03:03 2019
ASM Health Checker found 1 new failures
Sat Dec 28 03:03:22 2019
SUCCESS: diskgroup OCR was dismounted
SUCCESS: alter diskgroup OCR dismount force /* ASM SERVER:2405117855 */
SUCCESS: ASM-initiated MANDATORY DISMOUNT of group OCR
Sat Dec 28 03:03:22 2019
NOTE: diskgroup resource ora.OCR.dg is offline
Sat Dec 28 03:03:22 2019
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Sat Dec 28 05:30:34 2019
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 6.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 6.
Sat Dec 28 05:30:37 2019
 Received dirty detach msg from inst 1 for dom 5
Sat Dec 28 05:30:37 2019
List of instances:
 1 2
Dirty detach reconfiguration started (new ddet inc 2, cluster inc 36)
 Global Resource Directory partially frozen for dirty detach
* dirty detach - domain 5 invalid = TRUE 
 0 GCS resources traversed, 0 cancelled
freeing rdom 5
Dirty Detach Reconfiguration complete
Sat Dec 28 05:30:37 2019
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
WARNING: requested mirror side 1 of virtual extent 5 logical extent 0 offset 704512 is not allocated; I/O request failed
WARNING: requested mirror side 2 of virtual extent 5 logical extent 1 offset 704512 is not allocated; I/O request failed
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-15078: ASM diskgroup was forcibly dismounted
Sat Dec 28 05:30:37 2019
SQL> alter diskgroup OCR check /* proxy */ 
ORA-15032: not all alterations performed
ORA-15001: diskgroup "OCR" does not exist or is not mounted
ERROR: alter diskgroup OCR check /* proxy */
Sat Dec 28 05:30:44 2019
WARNING: Waited 20 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 20 secs for write IO to PST disk 0 in group 2.
Sat Dec 28 05:30:48 2019
NOTE: client exited [27819]
Sat Dec 28 05:30:49 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142641] opening OCR file
Sat Dec 28 05:30:51 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142660] opening OCR file
Sat Dec 28 05:30:53 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142678] opening OCR file
Sat Dec 28 05:30:55 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142696] opening OCR file
Sat Dec 28 05:30:57 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142723] opening OCR file
Sat Dec 28 05:30:59 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142744] opening OCR file
Sat Dec 28 05:31:01 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142773] opening OCR file
Sat Dec 28 05:31:03 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142792] opening OCR file
Sat Dec 28 05:31:05 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142806] opening OCR file
Sat Dec 28 05:31:07 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142821] opening OCR file
Sat Dec 28 06:18:42 2019
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 2.

检查节点db2的alertdb2.log有如下报错,也是说不能访问OCR磁盘组的相关磁盘，但使用dd命令测试是可以访问的。

2019-12-28 05:30:48.019: 
[/u01/app/11.2.0/grid/bin/oraagent.bin(28268)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/oraagent_oracle' disconnected from server. Details at (:CRSAGF00117:) {0:7:73} in /u01/app/11.2.0/grid/log/db2/agent/crsd/oraagent_oracle/oraagent_oracle.log.
2019-12-28 05:30:48.019: 
[/u01/app/11.2.0/grid/bin/scriptagent.bin(37953)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/scriptagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:9:8} in /u01/app/11.2.0/grid/log/db2/agent/crsd/scriptagent_grid/scriptagent_grid.log.
2019-12-28 05:30:48.020: 
[/u01/app/11.2.0/grid/bin/oraagent.bin(28009)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/oraagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:1:10} in /u01/app/11.2.0/grid/log/db2/agent/crsd/oraagent_grid/oraagent_grid.log.
2019-12-28 05:30:48.021: 
[/u01/app/11.2.0/grid/bin/orarootagent.bin(28025)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/orarootagent_root' disconnected from server. Details at (:CRSAGF00117:) {0:5:373} in /u01/app/11.2.0/grid/log/db2/agent/crsd/orarootagent_root/orarootagent_root.log.
2019-12-28 05:30:48.024: 
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:30:49.410: 
[crsd(142641)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:49.420: 
[crsd(142641)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:50.063: 
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:30:51.442: 
[crsd(142660)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:51.451: 
[crsd(142660)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:52.100: 
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:30:53.471: 
[crsd(142678)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:53.480: 
[crsd(142678)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:54.138: 
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:30:55.507: 
[crsd(142696)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:55.517: 
[crsd(142696)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:56.176: 
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:30:57.551: 
[crsd(142723)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:57.560: 
[crsd(142723)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:58.216: 
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:30:59.592: 
[crsd(142744)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:59.602: 
[crsd(142744)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:31:00.253: 
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:31:01.627: 
[crsd(142773)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:31:01.636: 
[crsd(142773)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:31:02.290: 
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:31:03.658: 
[crsd(142792)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:31:03.668: 
[crsd(142792)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:31:04.327: 
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:31:05.701: 
[crsd(142806)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:31:05.711: 
[crsd(142806)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:31:06.365: 
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:31:07.726: 
[crsd(142821)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:31:07.735: 
[crsd(142821)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:31:08.402: 
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:31:08.402: 
[ohasd(13034)]CRS-2771:Maximum restart attempts reached for resource 'ora.crsd'; will not restart.
2019-12-28 05:31:08.403: 
[ohasd(13034)]CRS-2769:Unable to failover resource 'ora.crsd'.

检查节点db2的oraagent_grid.log文件有如下报错:

2019-12-28 05:29:59.329: [    AGFW][3601811200]{2:6928:2} Agent received the message: AGENT_HB[Engine] ID 12293:273304
2019-12-28 05:30:17.162: [ora.LISTENER_SCAN2.lsnr][3592312576]{1:34166:403} [check] Utils:execCmd action = 3 flags = 38 ohome = (null) cmdname = lsnrctl. 
2019-12-28 05:30:17.267: [ora.LISTENER_SCAN2.lsnr][3592312576]{1:34166:403} [check] execCmd ret = 0
2019-12-28 05:30:17.267: [ora.LISTENER_SCAN2.lsnr][3592312576]{1:34166:403} [check] CrsCmd::ClscrsCmdData::stat entity 5 statflag 32 useFilter 1
2019-12-28 05:30:17.298: [ora.LISTENER_SCAN2.lsnr][3592312576]{1:34166:403} [check] ScanLsnrAgent::checkDependentVipResource: scanVipResource = ora.scan2.vip, statRet = 0
2019-12-28 05:30:17.881: [ora.LISTENER_SCAN3.lsnr][2950686464]{1:34166:403} [check] Utils:execCmd action = 3 flags = 38 ohome = (null) cmdname = lsnrctl. 
2019-12-28 05:30:17.986: [ora.LISTENER_SCAN3.lsnr][2950686464]{1:34166:403} [check] execCmd ret = 0
2019-12-28 05:30:17.987: [ora.LISTENER_SCAN3.lsnr][2950686464]{1:34166:403} [check] CrsCmd::ClscrsCmdData::stat entity 5 statflag 32 useFilter 1
2019-12-28 05:30:18.019: [ora.LISTENER_SCAN3.lsnr][2950686464]{1:34166:403} [check] ScanLsnrAgent::checkDependentVipResource: scanVipResource = ora.scan3.vip, statRet = 0
2019-12-28 05:30:27.292: [ora.asm][2950686464]{2:6928:2} [check] CrsCmd::ClscrsCmdData::stat entity 1 statflag 33 useFilter 0
2019-12-28 05:30:27.319: [ora.asm][2950686464]{2:6928:2} [check] AsmProxyAgent::check clsagfw_res_status 0
2019-12-28 05:30:34.522: [ora.ons][2950686464]{2:6928:2} [check] getOracleHomeAttrib: oracle_home = /u01/app/11.2.0/grid
2019-12-28 05:30:34.522: [ora.ons][2950686464]{2:6928:2} [check] Utils:execCmd action = 3 flags = 6 ohome = /u01/app/11.2.0/grid/opmn/ cmdname = onsctli. 
2019-12-28 05:30:34.627: [ora.ons][2950686464]{2:6928:2} [check] (:CLSN00010:)ons is running ...
2019-12-28 05:30:34.627: [ora.ons][2950686464]{2:6928:2} [check] (:CLSN00010:)
2019-12-28 05:30:34.628: [ora.ons][2950686464]{2:6928:2} [check] execCmd ret = 0
2019-12-28 05:30:37.858: [ USRTHRD][3575973632]{1:30748:9373} Processing the event CRS_RESOURCE_STATE_CHANGE
2019-12-28 05:30:38.652: [ora.LISTENER.lsnr][3594413824]{2:6928:2} [check] Utils:execCmd action = 3 flags = 38 ohome = (null) cmdname = lsnrctl. 
2019-12-28 05:30:38.757: [ora.LISTENER.lsnr][3594413824]{2:6928:2} [check] execCmd ret = 0
2019-12-28 05:30:48.017: [ CRSCOMM][3612317440] IpcC: IPC client connection 6c to member 0 has been removed
2019-12-28 05:30:48.017: [CLSFRAME][3612317440] Removing IPC Member:{Relative|Node:0|Process:0|Type:1}
2019-12-28 05:30:48.017: [CLSFRAME][3612317440] Disconnected from CRSD:db2 process: {Relative|Node:0|Process:0|Type:1}
2019-12-28 05:30:48.020: [   AGENT][3601811200]{0:1:10} {0:1:10} Created alert : (:CRSAGF00117:) :  Disconnected from server, Agent is shutting down.
2019-12-28 05:30:48.020: [    AGFW][3601811200]{0:1:10} Agent is exiting with exit code: 1

检查2个节点的/var/log/messages日志，发现2个节点均有多路径相关的错误信息，但相关的磁盘是用于备份的，不是用于生产数据库所使用的

Dec 30 05:25:31 db1 multipathd: backup2: sdcr - emc_clariion_checker: query command indicates error
Dec 30 05:25:31 db1 multipathd: checker failed path 69:240 in map backup2
Dec 30 05:25:31 db1 kernel: device-mapper: multipath: Failing path 69:240.
Dec 30 05:25:31 db1 multipathd: backup2: remaining active paths: 3
Dec 30 05:25:37 db1 multipathd: backup2: sdcr - emc_clariion_checker: Active path is healthy.
Dec 30 05:25:37 db1 multipathd: 69:240: reinstated
Dec 30 05:25:37 db1 multipathd: backup2: remaining active paths: 4
Dec 30 05:25:37 db1 kernel: sd 5:0:3:2: emc: ALUA failover mode detected
Dec 30 05:25:37 db1 kernel: sd 5:0:3:2: emc: at SP A Port 5 (owned, default SP A)
Dec 30 05:26:03 db1 kernel: qla2xxx [0000:05:00.1]-801c:5: Abort command issued nexus=5:3:4 --  1 2002.
Dec 30 06:03:35 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 06:15:23 db1 multipathd: backup3: sdcq - emc_clariion_checker: Read error for WWN 600601608b203300d563752524c1e611.  Sense data are 0x0/0x0/0x0.
Dec 30 06:15:23 db1 kernel: qla2xxx [0000:05:00.1]-801c:5: Abort command issued nexus=5:3:1 --  1 2002.
Dec 30 06:15:23 db1 kernel: device-mapper: multipath: Failing path 69:224.
Dec 30 06:15:23 db1 multipathd: checker failed path 69:224 in map backup3
Dec 30 06:15:23 db1 multipathd: backup3: remaining active paths: 3
Dec 30 06:15:28 db1 multipathd: backup3: sdcq - emc_clariion_checker: Active path is healthy.
Dec 30 06:15:28 db1 multipathd: 69:224: reinstated
Dec 30 06:15:28 db1 multipathd: backup3: remaining active paths: 4
Dec 30 06:15:28 db1 kernel: sd 5:0:3:1: emc: ALUA failover mode detected
Dec 30 06:15:28 db1 kernel: sd 5:0:3:1: emc: at SP A Port 5 (owned, default SP A)
Dec 30 06:59:29 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 07:53:22 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 07:55:11 db1 multipathd: sdct: couldn't get asymmetric access state
Dec 30 07:55:11 db1 multipathd: backup4: load table [0 2147483648 multipath 2 queue_if_no_path retain_attached_hw_handler 1 emc 2 1 round-robin 0 2 1 70:16 1 66:240 1 round-robin 0 2 1 8:64 1 67:80 1]
Dec 30 07:55:11 db1 kernel: sd 5:0:3:4: emc: ALUA failover mode detected
Dec 30 07:55:11 db1 kernel: sd 5:0:3:4: emc: at SP A Port 5 (owned, default SP A)
Dec 30 07:55:11 db1 kernel: sd 4:0:3:4: emc: ALUA failover mode detected
Dec 30 07:55:11 db1 kernel: sd 4:0:3:4: emc: at SP A Port 4 (owned, default SP A)
Dec 30 07:55:35 db1 multipathd: backup2: sdcr - emc_clariion_checker: Read error for WWN 600601608b203300d663752524c1e611.  Sense data are 0x0/0x0/0x0.
Dec 30 07:55:35 db1 multipathd: checker failed path 69:240 in map backup2
Dec 30 07:55:35 db1 multipathd: backup2: remaining active paths: 3
Dec 30 07:55:35 db1 kernel: device-mapper: multipath: Failing path 69:240.
Dec 30 07:55:40 db1 multipathd: backup2: sdcr - emc_clariion_checker: Active path is healthy.
Dec 30 07:55:40 db1 multipathd: 69:240: reinstated
Dec 30 07:55:40 db1 multipathd: backup2: remaining active paths: 4
Dec 30 07:55:40 db1 kernel: sd 5:0:3:2: emc: ALUA failover mode detected
Dec 30 07:55:40 db1 kernel: sd 5:0:3:2: emc: at SP A Port 5 (owned, default SP A)
Dec 30 08:39:47 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 08:43:36 db1 multipathd: mpathb: load table [0 20971520 multipath 2 queue_if_no_path retain_attached_hw_handler 1 emc 2 1 round-robin 0 2 1 69:208 1 66:176 1 round-robin 0 2 1 8:0 1 67:16 1]
Dec 30 08:43:36 db1 kernel: sd 5:0:3:0: emc: ALUA failover mode detected
Dec 30 08:43:36 db1 kernel: sd 5:0:3:0: emc: at SP A Port 5 (owned, default SP A)
Dec 30 08:43:36 db1 kernel: sd 4:0:3:0: emc: ALUA failover mode detected
Dec 30 08:43:36 db1 kernel: sd 4:0:3:0: emc: at SP A Port 4 (owned, default SP A)
Dec 30 09:24:04 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 10:13:09 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 11:06:07 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 12:07:36 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 13:08:58 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 14:00:19 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 14:52:20 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 15:40:45 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 16:34:38 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 17:09:56 db1 auditd[15975]: Audit daemon rotating log files
Dec 30 17:38:16 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 18:59:38 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 19:54:43 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

经查看集群相关日志可以确定，由于存储磁盘出现IO问题（或光线闪断、或IO延迟），导致集群CRS异常宕机。但是，比较奇怪的是，虽然CSR掉线了，ASM实例和DB实例却好好的，还可以正常使用。查询oracle support发现一篇文章1581864.1?提到ASM CRS仲裁盘访问超时与隐藏参数_asm_hbeatiowait有关系，而ASM的隐藏参数_asm_hbeatiowait由于操作系统多路径Multipath配置的polling_interval有关，具体的故障原因是操作系统盘的判断访问超时远大于数据库ASM仲裁盘访问超时，导致ORACLE RAC判定ASM中仲裁盘无法访问从而将仲裁盘强制Offline。解决的思路是：首先，确定操作系统polling_interval参数与数据库ASM隐藏参数值_asm_hbeatiowait，将_asm_hbeatiowait的值调整到比polling_interval值大即可。
下面是具体的解决操作:
1、查看数据库RAC ASM的_asm_hbeatiowait值（默认是15秒）：

SQL> col ksppinm for a20
SQL> col ksppstvl for a40
SQL> col ksppdesc for a80
SQL> SELECT   ksppinm, ksppstvl, ksppdesc
  2     FROM   x$ksppi x, x$ksppcv y
  3    WHERE   x.indx = y.indx AND  ksppinm = '_asm_hbeatiowait';

KSPPINM              KSPPSTVL                                 KSPPDESC
-------------------- ---------------------------------------- --------------------------------------------------------------------------------
_asm_hbeatiowait     15                                       number of secs to wait for PST Async Hbeat IO return

2、查看操作存储盘访问超时时间（Oracle Linux 6.7默认是30秒）

[root@db1 ~]# cat /sys/block/sdb/device/timeout 
30

[root@db2 ~]# cat /sys/block/sdb/device/timeout 
30

3、将_asm_hbeatiowait 的值调整为45秒（该参数是静态参数，需要重启集群）

SQL>  alter system set "_asm_hbeatiowait"=45 scope=spfile sid='*';

System altered.

4.重启集群

5.检查集群状态

[grid@db1 ~]$ crsctl stat res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ARCH.dg
               ONLINE  ONLINE       db1                                          
               ONLINE  ONLINE       db2                                          
ora.CWDATA.dg
               ONLINE  ONLINE       db1                                          
               ONLINE  ONLINE       db2                                          
ora.DADATA.dg
               ONLINE  ONLINE       db1                                          
               ONLINE  ONLINE       db2                                          
ora.DATA.dg
               ONLINE  ONLINE       db1                                          
               ONLINE  ONLINE       db2                                          
ora.LISTENER.lsnr
               ONLINE  ONLINE       db1                                          
               ONLINE  ONLINE       db2                                          
ora.OCR.dg
               ONLINE  ONLINE       db1                                          
               ONLINE  ONLINE       db2                                          
ora.SBKDATA.dg
               ONLINE  ONLINE       db1                                          
               ONLINE  ONLINE       db2                                          
ora.asm
               ONLINE  ONLINE       db1                      Started             
               ONLINE  ONLINE       db2                      Started             
ora.gsd
               OFFLINE OFFLINE      db1                                          
               OFFLINE OFFLINE      db2                                          
ora.net1.network
               ONLINE  ONLINE       db1                                          
               ONLINE  ONLINE       db2                                          
ora.ons
               ONLINE  ONLINE       db1                                          
               ONLINE  ONLINE       db2                                          
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       db2                                          
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       db1                                          
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       db1                                          
ora.caiwu.db
      1        ONLINE  ONLINE       db1                      Open                
      2        ONLINE  ONLINE       db2                      Open                
ora.chdyl.db
      1        ONLINE  ONLINE       db1                      Open                
      2        ONLINE  ONLINE       db2                      Open                
ora.cvu
      1        ONLINE  ONLINE       db1                                          
ora.dadb.db
      1        ONLINE  ONLINE       db1                      Open                
      2        ONLINE  ONLINE       db2                      Open                
ora.db1.vip
      1        ONLINE  ONLINE       db1                                          
ora.db2.vip
      1        ONLINE  ONLINE       db2                                          
ora.oc4j
      1        ONLINE  ONLINE       db1                                          
ora.rlzy.db
      1        ONLINE  ONLINE       db1                      Open                
      2        ONLINE  ONLINE       db2                      Open                
ora.scan1.vip
      1        ONLINE  ONLINE       db2                                          
ora.scan2.vip
      1        ONLINE  ONLINE       db1                                          
ora.scan3.vip
      1        ONLINE  ONLINE       db1

到此处理完成。

RMAN删除归档日志出现RMAN-0813错误的处理

Posted on 2020年1月4日 by YongJing(敬勇)

数据库备份日志报错：RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process，DB版本为11.2.0.4，根据备策略，每天备份归档，并且自动删除档，保留一个星期的归档，手功执如下命令：

delete noprompt archivelog until time 'sysdate-7'；

依然报错，信息如下：

archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_24/thread_1_seq_2328.1241.1027887401 thread=1 sequence=2328
RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_24/thread_1_seq_2329.1242.1027887719 thread=1 sequence=2329
RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_24/thread_1_seq_2330.1243.1027887813 thread=1 sequence=2330
RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_24/thread_1_seq_2331.1244.1027887873 thread=1 sequence=2331
RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_25/thread_1_seq_2332.1245.1027902623 thread=1 sequence=2332
RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_25/thread_1_seq_2333.1246.1027902719 thread=1 sequence=2333
RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_25/thread_1_seq_2334.1247.1027908493 thread=1 sequence=2334
RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_25/thread_1_seq_2335.1248.1027911721 thread=1 sequence=2335
RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_26/thread_1_seq_2336.1249.1027989005 thread=1 sequence=2336
RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_26/thread_1_seq_2337.1250.1027989067 thread=1 sequence=2337
RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_26/thread_1_seq_2338.1251.1027990617 thread=1 sequence=2338
RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_26/thread_1_seq_2339.1252.1028041713 thread=1 sequence=2339
RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_27/thread_1_seq_2340.1253.1028075413 thread=1 sequence=2340
...............

在MOS搜索发现这是11g中的一个bug，描述的原因是在11g中，RMAN查看的是v$database中的MIN_REQUIRED_CAPTURE_CHANGE#，而不是DBA_CAPTURE。默认情况下，该值每6小时更新一次。因此，在默认情况下，v$database中的值将比当前值晚6个小时。
给出的解决方法是需要在delete 后面加上force。
手动执行delete noprompt force archivelog until time ‘sysdate-7’, 成功将归档删除。

deleted archived log
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_24/thread_1_seq_2330.1243.1027887813 RECID=18420 STAMP=1027887814
deleted archived log
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_24/thread_1_seq_2331.1244.1027887873 RECID=18421 STAMP=1027887875
deleted archived log
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_25/thread_1_seq_2332.1245.1027902623 RECID=18422 STAMP=1027902628
deleted archived log
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_25/thread_1_seq_2333.1246.1027902719 RECID=18423 STAMP=1027902721
deleted archived log
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_25/thread_1_seq_2334.1247.1027908493 RECID=18424 STAMP=1027908500
deleted archived log
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_25/thread_1_seq_2335.1248.1027911721 RECID=18425 STAMP=1027911722
deleted archived log
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_26/thread_1_seq_2336.1249.1027989005 RECID=18426 STAMP=1027989010
deleted archived log
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_26/thread_1_seq_2337.1250.1027989067 RECID=18427 STAMP=1027989069
deleted archived log
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_26/thread_1_seq_2338.1251.1027990617 RECID=18428 STAMP=1027990618
deleted archived log
archived log file name=+CWDATA/sjjhdg/archivelog/2019_12_26/thread_1_seq_2339.1252.1028041713 RECID=18429 STAMP=1028041715
...........
Deleted 800 objects

Oracle 11G RAC One Node’s Instance Arise ORA-01105 ORA-01606

Posted on 2019年12月12日 by YongJing(敬勇)

生产环境，Oracle Linux 6.4 RAC 11.2.4.0，今天出现了grid集群因为OCR磁盘组不能访问而不能通过命令来显示集群状态信息，在手动mount OCR磁盘组后，准备重启节点2时出现了ORA-01105,ORA-01606错误，具体信息如下:

SQL> startup
ORACLE instance started.

Total System Global Area 5.3447E+10 bytes
Fixed Size                  2265864 bytes
Variable Size            1.3019E+10 bytes
Database Buffers         4.0265E+10 bytes
Redo Buffers              160698368 bytes
ORA-01105: mount is incompatible with mounts by other instances
ORA-01606: parameter not identical to that of another mounted instance

查看错误详细说明，根据错误描述可知是由于2号实例与1号实例由于某些参数设置一样所导致的

[oracle@db2 dbs]$ oerr ora 1105
01105, 00000, "mount is incompatible with mounts by other instances"
// *Cause:  An attempt to mount the database discovered that another instance
//         mounted a database by the same name, but the mount is not
//         compatible. Additional errors are reported explaining why.
// *Action: See accompanying errors.
[oracle@db2 dbs]$ oerr ora 1606
01606, 00000, "parameter not identical to that of another mounted instance"
// *Cause:  A parameter was different on two instances.
// *Action: Modify the initialization parameter and restart.

使用spfile文件来创建文本格式的pfile文件

SQL> create pfile='rlcs.ora' from spfile;

File created.

[oracle@db2 dbs]$ cat rlcs.ora

*._serial_direct_read='AUTO'
*._swrf_mmon_flush=TRUE
*._use_adaptive_log_file_sync='FALSE'
*.audit_file_dest='/u01/app/oracle/admin/RL/adump'
*.audit_trail='NONE'
*.cluster_database=true
*.compatible='11.2.0.4.0'
*.control_files='+DATA/rl/controlfile/current.260.926786537','+ARCH/rl/controlfile/current.256.926786537'
*.db_block_size=8192
*.db_create_file_dest='+DATA'
*.db_domain=''
*.db_file_name_convert='+data/rl/','+data/rldg/'
*.db_name='RL'
*.db_recovery_file_dest='+ARCH'
*.db_recovery_file_dest_size=10737418240
*.diagnostic_dest='/u01/app/oracle'
*.dispatchers='(PROTOCOL=TCP) (SERVICE=RLZYXDB)'
*.fal_server='yb_st'
RL1.instance_number=1
RL2.instance_number=2
*.listener_networks=''
*.log_archive_config='dg_config=(rl,rldg)'
*.log_archive_dest_1='location=+ARCH  valid_for=(all_logfiles,all_roles) db_unique_name=rl'
*.log_archive_dest_2='service=yb_st valid_for=(online_logfiles,primary_role) db_unique_name=rldg'
*.log_archive_dest_state_1='ENABLE'
*.log_archive_dest_state_2='ENABLE'
*.log_archive_format='%t_%s_%r.arch'
*.log_file_name_convert='+data/rl/','+data/rldg/','+arch/rl/','+arch/rldg/'
*.open_cursors=300
*.parallel_adaptive_multi_user=TRUE
*.parallel_degree_policy='AUTO'
*.parallel_force_local=FALSE
*.pga_aggregate_target=21474836480
*.processes=2000
*.remote_listener='scan-ip:1521'
*.remote_login_passwordfile='exclusive'
*.service_names='rl'
*.sessions=2205
*.sga_max_size=53687091200
*.sga_target=53687091200
*.standby_file_management='manual'
RLZY2.thread=2
RLZY1.thread=1
*.undo_retention=7200
RLZY2.undo_tablespace='UNDOTBS2'
RLZY1.undo_tablespace='UNDOTBS1'
*.utl_file_dir='/rmanbak/utl'

从文本参数文件看不出来有什么参数是两个实例不一致的。

通过执行下面的语句来查看2号实例与gc相关的参数

SQL> set linesize 333
SQL> col name for a35
SQL> col description for a66
SQL> col value for a30
SQL> SELECT   i.ksppinm name,  
  2     i.ksppdesc description,  
  3     CV.ksppstvl VALUE
  4  FROM   sys.x$ksppi i, sys.x$ksppcv CV  
  5     WHERE   i.inst_id = USERENV ('Instance')  
  6     AND CV.inst_id = USERENV ('Instance')  
  7     AND i.indx = CV.indx  
  8     AND i.ksppinm LIKE '/_gc%' ESCAPE '/'  
  9  ORDER BY   REPLACE (i.ksppinm, '_', '');  

NAME                                DESCRIPTION                                                        VALUE
----------------------------------- ------------------------------------------------------------------ ------------------------------
_gc_affinity_locking                if TRUE, enable object affinity                                    TRUE
_gc_affinity_locks                  if TRUE, get affinity locks                                        TRUE
_gc_affinity_ratio                  dynamic object affinity ratio                                      50
_gc_async_memcpy                    if TRUE, use async memcpy                                          FALSE
_gc_bypass_readers                  if TRUE, modifications bypass readers                              TRUE
_gc_check_bscn                      if TRUE, check for stale blocks                                    TRUE
_gc_coalesce_recovery_reads         if TRUE, coalesce recovery reads                                   TRUE
_gc_cpu_time                        if TRUE, record the gc cpu time                                    FALSE
_gc_cr_server_read_wait             if TRUE, cr server waits for a read to complete                    TRUE
_gc_defer_ping_index_only           if TRUE, restrict deferred ping to index blocks only               TRUE
_gc_defer_time                      how long to defer pings for hot buffers in milliseconds            0

NAME                                DESCRIPTION                                                        VALUE
----------------------------------- ------------------------------------------------------------------ ------------------------------
_gc_delta_push_compression          if delta >= K bytes, compress before push                          3072
_gc_delta_push_max_level            max delta level for delta push                                     100
_gc_delta_push_objects              objects which use delta push                                       0
_gc_disable_s_lock_brr_ping_check   if TRUE, disable S lock BRR ping check for lost write protect      TRUE
_gc_down_convert_after_keep         if TRUE, down-convert lock after recovery                          TRUE
_gc_element_percent                 global cache element percent                                       110
_gc_escalate_bid                    if TRUE, escalates create a bid                                    TRUE
_gc_fg_merge                        if TRUE, merge pi buffers in the foreground                        TRUE
_gc_flush_during_affinity           if TRUE, flush during affinity                                     TRUE
_gc_fusion_compression              compress fusion blocks if there is free space                      1024
_gc_global_checkpoint_scn           if TRUE, enable global checkpoint scn                              TRUE

NAME                                DESCRIPTION                                                        VALUE
----------------------------------- ------------------------------------------------------------------ ------------------------------
_gc_global_cpu                      global cpu checks                                                  TRUE
_gc_global_lru                      turn global lru off, make it automatic, or turn it on              AUTO
_gc_global_lru_touch_count          global lru touch count                                             5
_gc_global_lru_touch_time           global lru touch time in seconds                                   60
_gc_integrity_checks                set the integrity check level                                      1
_gc_keep_recovery_buffers           if TRUE, make single instance crash recovery buffers current       TRUE
_gc_latches                         number of latches per LMS process                                  8
_gc_log_flush                       if TRUE, flush redo log before a current block transfer            TRUE
_gc_long_query_threshold            threshold for long running query                                   0
_gc_max_downcvt                     maximum downconverts to process at one time                        256
_gc_maximum_bids                    maximum number of bids which can be prepared                       0

NAME                                DESCRIPTION                                                        VALUE
----------------------------------- ------------------------------------------------------------------ ------------------------------
_gc_no_fairness_for_clones          if TRUE, no fairness if we serve a clone                           TRUE
_gc_object_queue_max_length         maximum length for an object queue                                 0
_gc_override_force_cr               if TRUE, try to override force-cr requests                         TRUE
_gc_persistent_read_mostly          if TRUE, enable persistent read-mostly locking                     TRUE
_gc_policy_minimum                  dynamic object policy minimum activity per minute                  1500
_gc_policy_time                     how often to make object policy decisions in minutes               10
_gc_read_mostly_flush_check         if TRUE, optimize flushes for read mostly objects                  FALSE
_gc_read_mostly_locking             if TRUE, enable read-mostly locking                                FALSE
_gcr_enable_high_cpu_kill           if TRUE, GCR may kill foregrounds under high load                  FALSE
_gcr_enable_high_cpu_rm             if TRUE, GCR may enable a RM plan under high load                  FALSE
_gcr_enable_high_cpu_rt             if TRUE, GCR may boost bg priority under high load                 FALSE

NAME                                DESCRIPTION                                                        VALUE
----------------------------------- ------------------------------------------------------------------ ------------------------------
_gcr_high_cpu_threshold             minimum amount of CPU process must consume to be kill target       10
_gcr_use_css                        if FALSE, GCR wont register with CSS nor use any CSS feature       TRUE
_gc_sanity_check_cr_buffers         if TRUE, sanity check CR buffers                                   FALSE
_gcs_disable_remote_handles         disable remote client/shadow handles                               FALSE
_gcs_disable_skip_close_remastering if TRUE, disable skip close optimization in remastering            FALSE
_gc_serve_high_pi_as_current        if TRUE, use a higher clone scn when serving a pi                  TRUE
_gcs_fast_reconfig                  if TRUE, enable fast reconfiguration for gcs locks                 TRUE
_gcs_latches                        number of gcs resource hash latches to be allocated per LMS proces 64
                                    s

_gcs_pkey_history                   number of pkey remastering history                                 4000

NAME                                DESCRIPTION                                                        VALUE
----------------------------------- ------------------------------------------------------------------ ------------------------------
_gcs_process_in_recovery            if TRUE, process gcs requests during instance recovery             TRUE
_gcs_resources                      number of gcs resources to be allocated
_gcs_res_per_bucket                 number of gcs resource per hash bucket                             4
_gcs_shadow_locks                   number of pcm shadow locks to be allocated
_gc_statistics                      if TRUE, kcl statistics are maintained                             TRUE
_gcs_testing                        GCS testing parameter                                              0
_gc_transfer_ratio                  dynamic object read-mostly transfer ratio                          2
_gc_undo_affinity                   if TRUE, enable dynamic undo affinity                              TRUE
_gc_undo_block_disk_reads           if TRUE, enable undo block disk reads                              TRUE
_gc_use_cr                          if TRUE, allow CR pins on PI and WRITING buffers                   TRUE
_gc_vector_read                     if TRUE, vector read current buffers                               TRUE

64 rows selected.

查看1号实例与gc相关的参数

SQL> set linesize 333
SQL> col name for a35
SQL> col description for a66
SQL> col value for a30
SQL> SELECT   i.ksppinm name,  
  2     i.ksppdesc description,  
  3     CV.ksppstvl VALUE
  4  FROM   sys.x$ksppi i, sys.x$ksppcv CV  
  5     WHERE   i.inst_id = USERENV ('Instance')  
  6     AND CV.inst_id = USERENV ('Instance')  
  7     AND i.indx = CV.indx  
  8     AND i.ksppinm LIKE '/_gc%' ESCAPE '/'  
  9  ORDER BY   REPLACE (i.ksppinm, '_', '');  

NAME                                DESCRIPTION                                                        VALUE
----------------------------------- ------------------------------------------------------------------ ------------------------------
_gc_affinity_locking                if TRUE, enable object affinity                                    TRUE
_gc_affinity_locks                  if TRUE, get affinity locks                                        TRUE
_gc_affinity_ratio                  dynamic object affinity ratio                                      50
_gc_async_memcpy                    if TRUE, use async memcpy                                          FALSE
_gc_bypass_readers                  if TRUE, modifications bypass readers                              TRUE
_gc_check_bscn                      if TRUE, check for stale blocks                                    TRUE
_gc_coalesce_recovery_reads         if TRUE, coalesce recovery reads                                   TRUE
_gc_cpu_time                        if TRUE, record the gc cpu time                                    FALSE
_gc_cr_server_read_wait             if TRUE, cr server waits for a read to complete                    TRUE
_gc_defer_ping_index_only           if TRUE, restrict deferred ping to index blocks only               TRUE
_gc_defer_time                      how long to defer pings for hot buffers in milliseconds            0

NAME                                DESCRIPTION                                                        VALUE
----------------------------------- ------------------------------------------------------------------ ------------------------------
_gc_delta_push_compression          if delta >= K bytes, compress before push                          3072
_gc_delta_push_max_level            max delta level for delta push                                     100
_gc_delta_push_objects              objects which use delta push                                       0
_gc_disable_s_lock_brr_ping_check   if TRUE, disable S lock BRR ping check for lost write protect      TRUE
_gc_down_convert_after_keep         if TRUE, down-convert lock after recovery                          TRUE
_gc_element_percent                 global cache element percent                                       110
_gc_escalate_bid                    if TRUE, escalates create a bid                                    TRUE
_gc_fg_merge                        if TRUE, merge pi buffers in the foreground                        TRUE
_gc_flush_during_affinity           if TRUE, flush during affinity                                     TRUE
_gc_fusion_compression              compress fusion blocks if there is free space                      1024
_gc_global_checkpoint_scn           if TRUE, enable global checkpoint scn                              TRUE

NAME                                DESCRIPTION                                                        VALUE
----------------------------------- ------------------------------------------------------------------ ------------------------------
_gc_global_cpu                      global cpu checks                                                  TRUE
_gc_global_lru                      turn global lru off, make it automatic, or turn it on              AUTO
_gc_global_lru_touch_count          global lru touch count                                             5
_gc_global_lru_touch_time           global lru touch time in seconds                                   60
_gc_integrity_checks                set the integrity check level                                      1
_gc_keep_recovery_buffers           if TRUE, make single instance crash recovery buffers current       TRUE
_gc_latches                         number of latches per LMS process                                  8
_gc_log_flush                       if TRUE, flush redo log before a current block transfer            TRUE
_gc_long_query_threshold            threshold for long running query                                   0
_gc_max_downcvt                     maximum downconverts to process at one time                        256
_gc_maximum_bids                    maximum number of bids which can be prepared                       0

NAME                                DESCRIPTION                                                        VALUE
----------------------------------- ------------------------------------------------------------------ ------------------------------
_gc_no_fairness_for_clones          if TRUE, no fairness if we serve a clone                           TRUE
_gc_object_queue_max_length         maximum length for an object queue                                 0
_gc_override_force_cr               if TRUE, try to override force-cr requests                         TRUE
_gc_persistent_read_mostly          if TRUE, enable persistent read-mostly locking                     TRUE
_gc_policy_minimum                  dynamic object policy minimum activity per minute                  1500
_gc_policy_time                     how often to make object policy decisions in minutes               10
_gc_read_mostly_flush_check         if TRUE, optimize flushes for read mostly objects                  FALSE
_gc_read_mostly_locking             if TRUE, enable read-mostly locking                                TRUE
_gcr_enable_high_cpu_kill           if TRUE, GCR may kill foregrounds under high load                  FALSE
_gcr_enable_high_cpu_rm             if TRUE, GCR may enable a RM plan under high load                  FALSE
_gcr_enable_high_cpu_rt             if TRUE, GCR may boost bg priority under high load                 FALSE

NAME                                DESCRIPTION                                                        VALUE
----------------------------------- ------------------------------------------------------------------ ------------------------------
_gcr_high_cpu_threshold             minimum amount of CPU process must consume to be kill target       10
_gcr_use_css                        if FALSE, GCR wont register with CSS nor use any CSS feature       TRUE
_gc_sanity_check_cr_buffers         if TRUE, sanity check CR buffers                                   FALSE
_gcs_disable_remote_handles         disable remote client/shadow handles                               FALSE
_gcs_disable_skip_close_remastering if TRUE, disable skip close optimization in remastering            FALSE
_gc_serve_high_pi_as_current        if TRUE, use a higher clone scn when serving a pi                  TRUE
_gcs_fast_reconfig                  if TRUE, enable fast reconfiguration for gcs locks                 TRUE
_gcs_latches                        number of gcs resource hash latches to be allocated per LMS proces 64
                                    s

_gcs_pkey_history                   number of pkey remastering history                                 4000

NAME                                DESCRIPTION                                                        VALUE
----------------------------------- ------------------------------------------------------------------ ------------------------------
_gcs_process_in_recovery            if TRUE, process gcs requests during instance recovery             TRUE
_gcs_resources                      number of gcs resources to be allocated
_gcs_res_per_bucket                 number of gcs resource per hash bucket                             4
_gcs_shadow_locks                   number of pcm shadow locks to be allocated
_gc_statistics                      if TRUE, kcl statistics are maintained                             TRUE
_gcs_testing                        GCS testing parameter                                              0
_gc_transfer_ratio                  dynamic object read-mostly transfer ratio                          2
_gc_undo_affinity                   if TRUE, enable dynamic undo affinity                              TRUE
_gc_undo_block_disk_reads           if TRUE, enable undo block disk reads                              TRUE
_gc_use_cr                          if TRUE, allow CR pins on PI and WRITING buffers                   TRUE
_gc_vector_read                     if TRUE, vector read current buffers                               TRUE

64 rows selected.

通过对比发现_gc_read_mostly_locking参数在1号实例中为true，2号实例为false

将所有实例中的_gc_read_mostly_locking参数设置为true

SQL> alter system set "_gc_read_mostly_locking"=true scope=spfile sid='*';

System altered.

重启2号实例成功

SQL> shutdown immediate
ORA-01507: database not mounted


ORACLE instance shut down.
SQL> startup
ORACLE instance started.

Total System Global Area 5.3447E+10 bytes
Fixed Size                  2265864 bytes
Variable Size            1.3019E+10 bytes
Database Buffers         4.0265E+10 bytes
Redo Buffers              160698368 bytes
Database mounted.
Database opened.

原因是因为之前有人修改过_gc_read_mostly_locking隐藏参数，只是只修改了1号实例。

ORA-00600 qosdExpStatRead expcnt mismatch

Posted on 2019年10月21日 by YongJing(敬勇)

Oracle 12.2.0.1最近出现了ORA–0600错误，其详细信息如下:

ORA-00600: 内部错误代码, 参数: [qosdExpStatRead: expcnt mismatch], [], [], [], [], [], [], [], [], [], [], []
ORA-06512: 在 "SYS.DBMS_STATS", line 1976
ORA-06512: 在 "SYS.DBMS_STATS", line 46935
ORA-06512: 在 "SYS.DBMS_STATS", line 47168
ORA-00600: 内部错误代码, 参数: [qosdExpStatRead: expcnt mismatch], [], [], [], [], [], [], [], [], [], [], []
ORA-06512: 在 "SYS.DBMS_STATS", line 1976
ORA-06512: 在 "SYS.DBMS_STATS", line 46935
ORA-06512: 在 "SYS.DBMS_STATS", line 47168

根据Oracle官方给出的检查SQL语句，其执行结果如下:

引起访问题的原因是由于sys.exp_obj$.exp_cnt与sys.exp_stat$的行数据不匹配所造成的

解决方案
1.备份数据库涉及到修改数据字典

2. 修复问题

SQL>update sys.exp_obj$ a set exp_cnt=(select count(*) from sys.exp_stat$ b where
a.objn=b.objn and a.snapshot_id=b.snapshot_id ) where a.objn=124789;

SQL>commit;

3. 修改后，执行如下SQL确认是否还有记录，如果没有记录，说明已经修改成功，提交。

SQL>With b as (
select count(*) cnt,objn,snapshot_id from sys.exp_stat$ es group by objn,snapshot_id)
select * from sys.exp_obj$ a, b where a.objn=b.objn and a.snapshot_id=b.snapshot_id
and a.EXP_CNT<>b.CNT;

修复后，通过一天的观察没有出现该问题了。

oracle rac 单个实例不能生成awr报告的问题

Posted on 2019年8月19日 by YongJing(敬勇)

同事对rac集群生成性能报告时发现rac集群有一个实例没有生成awr快照，另一个实例快照正常。下面是具体处理步骤。
1号实例没有生成awr快照

SQL> select SNAP_ID,END_INTERVAL_TIME,instance_number from dba_hist_snapshot where instance_number=1;

no rows selected

2号实快照正常

SQL> set long 200   
SQL> set linesize 200
SQL> select * from ( select SNAP_ID,END_INTERVAL_TIME,instance_number from dba_hist_snapshot where instance_number=2 order by SNAP_ID desc) where rownum < =10;

   SNAP_ID END_INTERVAL_TIME                                                           INSTANCE_NUMBER
---------- --------------------------------------------------------------------------- ---------------
     24405 17-AUG-19 07.00.47.595 PM                                                                 2
     24404 17-AUG-19 06.00.42.150 PM                                                                 2
     24403 17-AUG-19 05.00.37.041 PM                                                                 2
     24402 17-AUG-19 04.00.31.774 PM                                                                 2
     24401 17-AUG-19 03.00.26.414 PM                                                                 2
     24400 17-AUG-19 02.00.21.176 PM                                                                 2
     24399 17-AUG-19 01.00.16.316 PM                                                                 2
     24398 17-AUG-19 12.00.10.997 PM                                                                 2
     24397 17-AUG-19 11.00.05.446 AM                                                                 2
     24396 17-AUG-19 10.00.59.801 AM                                                                 2

10 rows selected.

mmon进程与awr快照相关，mmnl与ash相关，如是查看两个实例的mmon与mmnl进程
2号实例

[root@db2 ~]# ps -ef | grep mmon
root     128329 127956  0 18:11 pts/2    00:00:00 grep mmon
oracle   201527      1  0  2018 ?        17:17:11 ora_mmon_RLZY2
[root@db2 ~]# ps -ef | grep mmnl
root     131772 127956  0 18:17 pts/2    00:00:00 grep mmnl
oracle   201531      1  0  2018 ?        1-06:06:24 ora_mmnl_RLZY2

1号实例

[root@db1 ~]# ps -ef | grep mmon
root     239020 238963  0 18:52 pts/2    00:00:00 grep mmon
[root@db1 ~]# ps -ef | grep mmnl
root     239052 238963  0 18:52 pts/2    00:00:00 grep mmnl

可以看到1号实例没有mmon与mmnl进程了。

如是查看1号实例的mmon进程的跟踪文件

[root@db1 trace]# ls -lrt *mmon*.trc
-rw-r----- 1 oracle asmadmin 1351052 Jan 19  2018 RLZY1_mmon_20073.trc
-rw-r----- 1 oracle asmadmin  173031 Jan 22  2018 RLZY1_mmon_49119.trc
[root@db1 trace]# more RLZY1_mmon_49119.trc
Trace file /u01/app/oracle/diag/rdbms/rlzy/RLZY1/trace/RLZY1_mmon_49119.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/11.2.0/db_1
System name:    Linux
Node name:      db1
Release:        3.8.13-68.3.4.el6uek.x86_64
Version:        #2 SMP Tue Jul 14 15:03:36 PDT 2015
Machine:        x86_64
Instance name: RLZY1
Redo thread mounted by this instance: 1
Oracle process number: 36
Unix process pid: 49119, image: oracle@db1 (MMON)


*** 2018-01-19 13:55:20.030
*** SESSION ID:(1369.1) 2018-01-19 13:55:20.030
*** CLIENT ID:() 2018-01-19 13:55:20.030
*** SERVICE NAME:() 2018-01-19 13:55:20.030
*** MODULE NAME:() 2018-01-19 13:55:20.030
*** ACTION NAME:() 2018-01-19 13:55:20.030
 
minact-scn slave-status: grec-scn:0x0000.00000000 gmin-scn:0x0000.00000000 gcalc-scn:0x0000.00000000

*** 2018-01-19 14:00:20.643
minact-scn master-status: grec-scn:0x0e0e.55ad96ef gmin-scn:0x0e0e.55abf256 gcalc-scn:0x0e0e.55ac2a0a

..........
KEBM: MMON action policy violation. 'Block Cleanout Optim, Undo Segment Scan' viol=1; err=12751
minact-scn master-status: grec-scn:0x0e0e.5f0ebf8c gmin-scn:0x0e0e.5f0eac2e gcalc-scn:0x0e0e.5f0ead91
DDE rules only execution for: ORA 12751

*** 2018-01-22 07:06:04.060
----- START Event Driven Actions Dump ----
---- END Event Driven Actions Dump ----
----- START DDE Actions Dump -----
Executing SYNC actions
Executing ASYNC actions
----- START DDE Action: 'ORA_12751_DUMP' (Sync) -----
Runtime exceeded 300 seconds
Time limit violation detected at:
ksedsts()+465< -kspol_12751_dump()+145<-dbgdaExecuteAction()+1065<-dbgerRunAction()+109<-dbgerRunActions()+4134<-dbgexPhaseII()+1873<-dbgexProcessError()+2680<-dbgeExecuteForError()+88<-dbgePostErrorKGE()+2136<-dbkePostKGE_kgsf()+71<-kge
selv()+276<-ksesecl0()+162
<-ksucin()+147<-kcbzwb()+2727<-kcbgtcr()+31325<-ktucloUsMinScn()+539<-ktucloUsegScan()+992<-ksb_run_managed_action()+384<-ksbcti()+2490<-ksbabs()+1735<-kebm_mmon_main()+209<-ksbrdp()+1045<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai
_real()+250<-ssthrdmain()+265
<-main()+201<-__libc_start_main()+253Current Wait Stack:
 0: waiting for 'gc buffer busy acquire'
    file#=0x5, block#=0x278, class#=0x49
    wait_id=255378 seq_num=59358 snap_id=1
    wait times: snap=5 min 5 sec, exc=5 min 5 sec, total=5 min 5 sec
    wait times: max=infinite, heur=5 min 5 sec
    wait counts: calls=358 os=358
    in_wait=1 iflags=0x15a2
There is at least one session blocking this session.
  Dumping 1 direct blocker(s):
    inst: 1, sid: 990, ser: 1
  Dumping final blocker:
    inst: 1, sid: 990, ser: 1
Wait State:
  fixed_waits=0 flags=0x22 boundary=(nil)/-1
Session Wait History:
    elapsed time of 0.000061 sec since current wait
 0: waited for 'gc cr block 2-way'
    =0x5, =0x258, =0x47
    wait_id=255377 seq_num=59357 snap_id=1
    wait times: snap=0.000478 sec, exc=0.000478 sec, total=0.000478 sec
    wait times: max=infinite
    wait counts: calls=1 os=1
    occurred after 0.000122 sec of elapsed time
 1: waited for 'gc cr block 2-way'
    =0x5, =0x228, =0x45
    wait_id=255376 seq_num=59356 snap_id=1
    wait times: snap=0.000741 sec, exc=0.000741 sec, total=0.000741 sec
    wait times: max=infinite
    wait counts: calls=1 os=1
    occurred after 0.000120 sec of elapsed time
 2: waited for 'gc cr block 2-way'
    =0x5, =0x138, =0x43
    wait_id=255375 seq_num=59355 snap_id=1
    wait times: snap=0.000528 sec, exc=0.000528 sec, total=0.000528 sec
    wait times: max=infinite
    wait counts: calls=1 os=1
    occurred after 0.000111 sec of elapsed time
 3: waited for 'gc cr block 2-way'
    =0x5, =0xb8, =0x41
    wait_id=255374 seq_num=59354 snap_id=1
    wait times: snap=0.000583 sec, exc=0.000583 sec, total=0.000583 sec
    wait times: max=infinite
    wait counts: calls=1 os=1
    occurred after 0.000139 sec of elapsed time
 4: waited for 'gc cr block 2-way'
    =0x5, =0x110, =0x37
    wait_id=255373 seq_num=59353 snap_id=1
    wait times: snap=0.000541 sec, exc=0.000541 sec, total=0.000541 sec
    wait times: max=infinite
    wait counts: calls=1 os=1
    occurred after 0.000110 sec of elapsed time
 5: waited for 'gc cr block 2-way'
    =0x5, =0x100, =0x35
    wait_id=255372 seq_num=59352 snap_id=1
    wait times: snap=0.000629 sec, exc=0.000629 sec, total=0.000629 sec
    wait times: max=infinite
    wait counts: calls=1 os=1
    occurred after 0.000158 sec of elapsed time
 6: waited for 'gc cr block 2-way'
    =0x5, =0xf0, =0x33
    wait_id=255371 seq_num=59351 snap_id=1
    wait times: snap=0.000617 sec, exc=0.000617 sec, total=0.000617 sec
    wait times: max=infinite
    wait counts: calls=1 os=1
    occurred after 0.000128 sec of elapsed time
 7: waited for 'gc cr block 2-way'
    =0x5, =0xe0, =0x31
    wait_id=255370 seq_num=59350 snap_id=1
    wait times: snap=0.000561 sec, exc=0.000561 sec, total=0.000561 sec
    wait times: max=infinite
    wait counts: calls=1 os=1
    occurred after 0.000124 sec of elapsed time
 8: waited for 'gc cr block 2-way'
    =0x5, =0xd0, =0x2f
    wait_id=255369 seq_num=59349 snap_id=1
    wait times: snap=0.000565 sec, exc=0.000565 sec, total=0.000565 sec
    wait times: max=infinite
    wait counts: calls=1 os=1
    occurred after 0.000128 sec of elapsed time
 9: waited for 'gc cr block 2-way'
    =0x5, =0xc0, =0x2d
    wait_id=255368 seq_num=59348 snap_id=1
    wait times: snap=0.000555 sec, exc=0.000555 sec, total=0.000555 sec
    wait times: max=infinite
    wait counts: calls=1 os=1
    occurred after 0.000125 sec of elapsed time
Sampled Session History of session 1369 serial 1
---------------------------------------------------
The sampled session history is constructed by sampling
the target session every 1 second. The sampling process
captures at each sample if the session is in a non-idle wait,
an idle wait, or not in a wait. If the session is in a
non-idle wait then one interval is shown for all the samples
the session was in the same non-idle wait. If the
session is in an idle wait or not in a wait for
consecutive samples then one interval is shown for all
the consecutive samples. Though we display these consecutive
samples  in a single interval the session may NOT be continuously
idle or not in a wait (the sampling process does not know).
 
The history is displayed in reverse chronological order.
 
sample interval: 1 sec, max history 120 sec
---------------------------------------------------
  [121 samples,                                            07:04:03 - 07:06:03]
    waited for 'gc buffer busy acquire', seq_num: 59358
      p1: 'file#'=0x5
      p2: 'block#'=0x278
      p3: 'class#'=0x49
      time_waited: >= 120 sec (still in wait)
---------------------------------------------------
Sampled Session History Summary:
  longest_non_idle_wait: 'gc buffer busy acquire'
  [121 samples, 07:04:03 - 07:06:03]
      time_waited: >= 120 sec (still in wait)
---------------------------------------------------
----- END DDE Action: 'ORA_12751_DUMP' (SUCCESS, 1 csec) -----
----- END DDE Actions Dump (total 1 csec) -----
KEBM: MMON action policy violation. 'Block Cleanout Optim, Undo Segment Scan' viol=1; err=12751
minact-scn master-status: grec-scn:0x0e0e.5f0ec4ce gmin-scn:0x0e0e.5f0eac2e gcalc-scn:0x0e0e.5f0ead91

*** 2018-01-22 07:11:11.071
DDE rules only execution for: ORA 12751
----- START Event Driven Actions Dump ----
---- END Event Driven Actions Dump ----

ORA12751的错误原因是陈旧的SYS对象统计数据会导致生成次优执行计划，从而使AWR自动刷新从操作的语句运行更长时间和超时。

解决方法就是收集新的SYS对象统计信息，为优化器提供更好的统计信息，并生成更高效的执行计划

SQL> EXEC DBMS_STATS.GATHER_FIXED_OBJECTS_STATS;

PL/SQL procedure successfully completed.

SQL> EXEC DBMS_STATS.GATHER_SCHEMA_STATS ('SYS');

PL/SQL procedure successfully completed.

下面就是重启mmon和mmnl进程

SQL> alter system enable restricted session;

System altered.

SQL> alter system disable restricted session;

System altered.

查看alert日志可以看到mmon和mmnl进程已经重启了

Sat Aug 17 19:18:22 2019
Starting background process MMON
Sat Aug 17 19:18:22 2019
Starting background process MMNL
MMON started with pid=399, OS id=10373 
Sat Aug 17 19:18:22 2019
MMNL started with pid=405, OS id=10375 
ALTER SYSTEM enable restricted session;
Sat Aug 17 19:18:25 2019
Some DDE async actions failed or were cancelled
Sat Aug 17 19:18:25 2019
Sweep [inc][48021]: completed
Sweep [inc][48011]: completed
Sweep [inc][48002]: completed
Sweep [inc][35010]: completed
Sweep [inc][34706]: completed
Sweep [inc][34242]: completed
Sweep [inc][33546]: completed
Sweep [inc][33394]: completed
Sweep [inc2][48021]: completed
Sweep [inc2][48011]: completed
Sweep [inc2][48002]: completed
Sweep [inc2][35010]: completed
Sweep [inc2][34706]: completed
Sweep [inc2][34242]: completed
Sweep [inc2][33546]: completed
Sweep [inc2][33394]: completed
minact-scn: Inst 1 is a slave inc#:30 mmon proc-id:10373 status:0x2
minact-scn status: grec-scn:0x0e0e.61cb2e9c gmin-scn:0x0e0e.5f0eac2e gcalc-scn:0x0e0e.5f0ead91
Sat Aug 17 19:18:29 2019
db_recovery_file_dest_size of 10240 MB is 20.87% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Sat Aug 17 19:18:42 2019
ALTER SYSTEM disable restricted session;

再查看1号实例的mmon与mmnl进程状态

[root@db1 ~]# ps -ef | grep mmnl
oracle    10375      1  0 19:18 ?        00:00:00 ora_mmnl_RLZY1
root      10611 238963  0 19:18 pts/2    00:00:00 grep mmnl

[root@db1 ~]# ps -ef | grep mmon
oracle    10373      1  7 19:18 ?        00:00:02 ora_mmon_RLZY1
root      10630 238963  0 19:18 pts/2    00:00:00 grep mmon

过了两个小时去查看1号实例已经生成了两条快照信息

SQL> set long 200
SQL> set linesize 200
SQL> select * from ( select SNAP_ID,END_INTERVAL_TIME,instance_number from dba_hist_snapshot where instance_number=1 order by SNAP_ID desc) where rownum < =10;

   SNAP_ID END_INTERVAL_TIME                                                           INSTANCE_NUMBER
---------- --------------------------------------------------------------------------- ---------------
     24407 17-AUG-19 09.00.58.595 PM                                                                 1
     24406 17-AUG-19 08.00.40.244 PM                                                                 1

到此问题解决了。

当用户无限制使用表空间配额且表空间有足够空间时在执行DML操作时出现超出表空间的空间限额

Posted on 2019年6月3日 by YongJing(敬勇)

朋友生产数据库在向特定的一张表插入数据时报超出表空间data的空间限额如是执行下查看用户所用的表空间配额信息，查看用户所使用表空间的配额发现没有限制,因为max_bytes为 -1

SQL>  select * from dba_ts_quotas where username='data';

TABLESPACE_NAME                USERNAME                            BYTES  MAX_BYTES     BLOCKS MAX_BLOCKS DROPPED
------------------------------ ------------------------------ ---------- ---------- ---------- ---------- -------
data                           data                           8825732464         -1  107735992         -1 NO

 1 rows selected

SQL>
SQL> select
  2    fs.tablespace_name            "Tablespace",
  3    (df.totalspace-fs.freespace)  "Used MB",
  4    fs.freespace                  "Free MB",
  5    df.totalspace                 "Total MB",
  6    round(100*(fs.freespace/df.totalspace)) "Pct. Free"
  7  from
  8    (select tablespace_name,round(sum(bytes)/1024/1024) TotalSpace
  9    from dba_data_files group by  tablespace_name) df,
 10    (select tablespace_name,round(sum(bytes)/1024/1024) FreeSpace
 11    from dba_free_space group by tablespace_name) fs
 12  where df.tablespace_name=fs.tablespace_name;

Tablespace                        Used MB    Free MB   Total MB  Pct. Free
------------------------------ ---------- ---------- ---------- ----------
SYSTEM                               7207       3033      10240         30
TEST                                 6790      34170      40960         83
USERS                                 173      25427      25600         99
UNDOTBS2                              227      24013      24240         99
DATA                               990119     176281    1166400         15
SYSAUX                               3925       1195       5120         23
UNDOTBS1                            12898      28062      40960         69

7 rows selected

查看表lv_data的依赖对象

SQL> select NAME,TYPE from dba_dependencies where REFERENCED_NAME='LV_DATA';

NAME                           TYPE
------------------------------ ------------------
LV_DATA                        VIEW
FC_UPDATE_CORPFUND             PROCEDURE
FC_UPDATE_MY                   PROCEDURE
FC_UPDATE_KY                   PROCEDURE
FC_UPDATE_FACTPAY              PROCEDURE
FC_UPDATE_CALCPAY              PROCEDURE
FC_UPDATE_KY                   PROCEDURE
......
LV_DATA                        SYNONYM
LV_DATA                        VIEW
LV_DATA                        SYNONYM
LV_DATA                        SYNONYM

139 rows selected

查看所有依赖对象的所有者

SQL> select  distinct owner from dba_objects where OBJECT_NAME in(select NAME from dba_dependencies where REFERENCED_NAME='LV_DATA');

OWNER
------------------------------
SY
SY_BK
WEB
CX
DATA
OLD
TEST
XC
CZ
OSY
BACKUP
TJ

12 rows selected

对所有依赖对象所有者授权可以无限制使用表空间

SQL> grant unlimited tablespace to OSY;

Grant succeeded

SQL> grant unlimited tablespace to SBK;

Grant succeeded

SQL> grant unlimited tablespace to WEB;

Grant succeeded

SQL> grant unlimited tablespace to CX;

Grant succeeded

SQL> grant unlimited tablespace to DATA;

Grant succeeded

SQL> grant unlimited tablespace to OLD;

Grant succeeded

SQL> grant unlimited tablespace to TEST;

Grant succeeded

SQL> grant unlimited tablespace to XC;

Grant succeeded

SQL> grant unlimited tablespace to CZ;

Grant succeeded

SQL> grant unlimited tablespace to SY;

Grant succeeded

SQL> grant unlimited tablespace to BACKUP;

Grant succeeded

SQL> grant unlimited tablespace to TJ;

Grant succeeded

再向表lv_data插入数据时恢复正常