Oracle RAC修改public,private,vip scan IP

oracle rac修改IP涉及到OCR,在执行修改前要进行一个备份,这样即使修改失败,也能还原回来,如何备份可以参考
oracle-rac-ocr-和-votingdisk-的备份与恢复/
实验环境:redhat 5.4 oracle 10g rac,oracle 11g rac
原来两节点rac IP设置如下:
192.168.56.11 jy1
192.168.56.12 jy2
192.168.56.13 jy1-priv
192.168.56.14 jy2-priv
192.168.56.15 jy1-vip
192.168.56.16 jy2-vip
现在修改为
192.168.56.111 jy1
192.168.56.112 jy2
192.168.100.1 jy1-priv
192.168.100.2 jy2-priv
192.168.56.113 jy1-vip
192.168.56.114 jy2-vip

一. 停止oracle相关的所有进程, 包括数据库, asm, node application, crs
1.1 查看当前系统上crs运行的状态

[root@jy1 bin]# cd /u01/app/oracle/product/10.2.0/crs/bin
[root@jy1 bin]# pwd
/u01/app/oracle/product/10.2.0/crs/bin
[root@jy1 bin]# ./crs_stat -t
Name           Type           Target    State     Host        
------------------------------------------------------------
ora....SM1.asm application    ONLINE    ONLINE    jy1         
ora....Y1.lsnr application    ONLINE    ONLINE    jy1         
ora.jy1.gsd    application    ONLINE    ONLINE    jy1         
ora.jy1.ons    application    ONLINE    ONLINE    jy1         
ora.jy1.vip    application    ONLINE    ONLINE    jy1         
ora....SM2.asm application    ONLINE    ONLINE    jy2         
ora....Y2.lsnr application    ONLINE    ONLINE    jy2         
ora.jy2.gsd    application    ONLINE    ONLINE    jy2         
ora.jy2.ons    application    ONLINE    ONLINE    jy2         
ora.jy2.vip    application    ONLINE    ONLINE    jy2         
ora.jyrac.db   application    ONLINE    ONLINE    jy1         
ora....c1.inst application    ONLINE    ONLINE    jy1         
ora....c2.inst application    ONLINE    ONLINE    jy2     

1.2 关闭数据库

[root@jy1 bin]# ./srvctl stop database -d jyrac

1.3 关闭asm实例

[root@jy1 bin]# ./srvctl stop asm -n jy1
[root@jy1 bin]# ./srvctl stop asm -n jy2

1.4 关闭其他应用程序

[root@jy1 bin]# ./srvctl stop nodeapps -n jy1
[root@jy1 bin]# ./srvctl stop nodeapps -n jy2

1.5 关闭crs后台进程, 在操作系统一级中止运行的crs后台进程, 必须在所有节点上运行.

[root@jy1 bin]# /etc/init.d/init.crs stop
Shutting down Oracle Cluster Ready Services (CRS):
Stopping resources.
Successfully stopped CRS resources 
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.
Shutdown has begun. The daemons should exit soon.

[root@jy2 bin]# /etc/init.d/init.crs stop
Shutting down Oracle Cluster Ready Services (CRS):
Stopping resources.
Successfully stopped CRS resources 
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.
Shutdown has begun. The daemons should exit soon.

二 修改操作系统的ip设置
2.1 修改 /etc/hosts 文件。 保持2个节点的一致

[root@jy1 bin]# vi /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1               jy1 localhost.localdomain localhost
::1             localhost6.localdomain6 localhost6
192.168.56.111           jy1
192.168.56.112           jy2
192.168.100.1           jy1-priv
192.168.100.2           jy2-priv
192.168.56.113           jy1-vip
192.168.56.114           jy2-vip

[root@jy2 bin]# vi /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1               jy2 localhost.localdomain localhost
::1             localhost6.localdomain6 localhost6
192.168.56.111           jy1
192.168.56.112           jy2
192.168.100.1           jy1-priv
192.168.100.2           jy2-priv
192.168.56.113           jy1-vip
192.168.56.114           jy2-vip

2.2 在两个节点上分别修改eth0和eth1地址:

[root@jy1 bin]# vi /etc/sysconfig/network-scripts/ifcfg-eth0
# Intel Corporation 82540EM Gigabit Ethernet Controller
DEVICE=eth0
BOOTPROTO=none
ONBOOT=yes
HWADDR=08:00:27:54:5a:6a
NETMASK=255.255.255.0
IPADDR=192.168.56.111
GATEWAY=192.168.56.1
TYPE=Ethernet
USERCTL=no
IPV6INIT=no
PEERDNS=yes



[root@jy2 bin]# vi /etc/sysconfig/network-scripts/ifcfg-eth0
# Intel Corporation 82540EM Gigabit Ethernet Controller
DEVICE=eth0
BOOTPROTO=none
ONBOOT=yes
HWADDR=08:00:27:41:5d:e5
NETMASK=255.255.255.0
IPADDR=192.168.56.112
GATEWAY=192.168.56.1
TYPE=Ethernet
USERCTL=no
IPV6INIT=no
PEERDNS=yes


[root@jy1 bin]# vi /etc/sysconfig/network-scripts/ifcfg-eth1
IPADDR=192.168.100.1
# Intel Corporation 82540EM Gigabit Ethernet Controller
DEVICE=eth1
BOOTPROTO=none
ONBOOT=yes
HWADDR=08:00:27:c3:b7:37
NETMASK=255.255.255.0
IPADDR=192.168.100.1
TYPE=Ethernet
USERCTL=no
IPV6INIT=no
PEERDNS=yes

[root@jy2 bin]# vi /etc/sysconfig/network-scripts/ifcfg-eth1
GATEWAY=192.168.56.1
# Intel Corporation 82540EM Gigabit Ethernet Controller
DEVICE=eth1
BOOTPROTO=none
ONBOOT=yes
HWADDR=08:00:27:4a:e3:cb
TYPE=Ethernet
NETMASK=255.255.255.0
IPADDR=192.168.100.2
USERCTL=no
IPV6INIT=no
PEERDNS=yes

2.3 在两个节点上分别重启网络服务,是修改的IP生效

[root@jy1 bin]# service network restart

[root@jy2 bin]# service network restart

三 启动crs, 设置oracle中ip地址相关的设置
3.1 在两个节点启动crs, 并关闭随crs启动的所有服务

[root@jy1 bin]# /etc/init.d/init.crs start
Startup will be queued to init within 90 seconds.

[root@jy2 bin]# /etc/init.d/init.crs start
Startup will be queued to init within 90 seconds.

由于oracle所有服务设置为自动启动, 所以在crs启动时会试图启动所有的服务, 但是在对oracle相关的ip地址进行设置时需要crs处于运行状态. 但是数据库, asm和node application处于停止状态, 在一个节点执行以下命令来停止所有服务:

[root@jy1 bin]# ./srvctl stop database -d jyrac
[root@jy1 bin]# ./srvctl stop asm -n jy1
[root@jy1 bin]# ./srvctl stop asm -n jy2
[root@jy1 bin]# ./srvctl stop nodeapps -n jy1
[root@jy1 bin]# ./srvctl stop nodeapps -n jy2

[root@jy1 bin]# ./crs_stat -t
Name           Type           Target    State     Host        
------------------------------------------------------------
ora....SM1.asm application    OFFLINE   OFFLINE               
ora....Y1.lsnr application    OFFLINE   OFFLINE               
ora.jy1.gsd    application    OFFLINE   OFFLINE               
ora.jy1.ons    application    OFFLINE   OFFLINE               
ora.jy1.vip    application    OFFLINE   OFFLINE               
ora....SM2.asm application    OFFLINE   OFFLINE               
ora....Y2.lsnr application    OFFLINE   OFFLINE               
ora.jy2.gsd    application    OFFLINE   OFFLINE               
ora.jy2.ons    application    OFFLINE   OFFLINE               
ora.jy2.vip    application    OFFLINE   OFFLINE               
ora.jyrac.db   application    OFFLINE   OFFLINE               
ora....c1.inst application    OFFLINE   OFFLINE               
ora....c2.inst application    OFFLINE   OFFLINE     

3.2 使用oifcfg修改网卡设置, oifconfig可以用来设置和查看网卡使用的方式(如果修改的网段相同可以不做这一步)
3.2.1查看当前配置:

[root@jy1 bin]# ./oifcfg getif -global
eth0  192.168.56.0  global  public
eth1  192.168.56.0  global  cluster_interconnect

3.2.2删除当前配置

[root@jy1 bin]# ./oifcfg delif -global eth0
[root@jy1 bin]# ./oifcfg delif -global eth1
[root@jy1 bin]# ./oifcfg getif

3.2.3重新添加

[root@jy1 bin]# ./oifcfg setif -global eth0/192.168.56.0:public
[root@jy1 bin]# ./oifcfg setif -global eth1/192.168.100.0:cluster_interconnect
[root@jy1 bin]# ./oifcfg getif
eth0  192.168.56.0  global  public
eth1  192.168.100.0  global  cluster_interconnect
[root@jy1 bin]# ./oifcfg iflist
eth0  192.168.56.0
eth1  192.168.100.0

[root@jy2 bin]# ./oifcfg iflist
eth0  192.168.56.0
eth1  192.168.100.0

这里IP 地址最后一个为0,代表的是一个网段。修改的时候要切记。否在在启动OCR 时会报如下错误:[ CRSOCR][4054413904] OCR context init failure. Error: PROC-44: 网络地址和网络接口操作中出错 网络地址和网络接口操作错误 [7]

3.3 修改VIP

[root@jy1 bin]# ./srvctl modify nodeapps -n jy1 -A 192.168.56.113/255.255.255.0/eth0

[root@jy1 bin]# ./srvctl modify nodeapps -n jy2 -A 192.168.56.114/255.255.255.0/eth0

3.4 设置listener.ora和tnsnames.ora, 检查这些文件中是否有指定原来ip的地方,
修改为更改后的ip地址(注意:如果你使用了ocfs,修改ocfs配置文件(/etc/ocfs/cluster.conf),
验证修改后是否可用)

在节点jy1上没有使用原来的固定IP不用修改
[root@jy1 bin]# vi /u01/app/oracle/product/10.2.0/db/network/admin/listener.ora
# listener.ora.jy1 Network Configuration File: /u01/app/oracle/product/10.2.0/db/network/admin/listener.ora.jy1
# Generated by Oracle configuration tools.

LISTENER_JY1 =
(DESCRIPTION_LIST =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = jy1-vip)(PORT = 1521)(IP = FIRST))
)
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = 127.0.0.1)(PORT = 1521)(IP = FIRST))
)
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC))
)
)
)

节点jy2使用了原来的IP192.168.56.12所以将其修改为192.168.56.112
[root@jy2 bin]# cat /u01/app/oracle/product/10.2.0/db/network/admin/listener.ora
# listener.ora.jy2 Network Configuration File: /u01/app/oracle/product/10.2.0/db/network/admin/listener.ora.jy2
# Generated by Oracle configuration tools.

LISTENER_JY2 =
(DESCRIPTION_LIST =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = jy2-vip)(PORT = 1521)(IP = FIRST))
)
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.56.12)(PORT = 1521)(IP = FIRST))
)
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC))
)
)
)

[root@jy2 bin]# vi /u01/app/oracle/product/10.2.0/db/network/admin/listener.ora
# listener.ora.jy2 Network Configuration File: /u01/app/oracle/product/10.2.0/db/network/admin/listener.ora.jy2
# Generated by Oracle configuration tools.

LISTENER_JY2 =
(DESCRIPTION_LIST =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = jy2-vip)(PORT = 1521)(IP = FIRST))
)
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.56.112)(PORT = 1521)(IP = FIRST))
)
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC))
)
)
)

3.5 启动node applications, asm, 数据库
[root@jy1 bin]# ./srvctl start nodeapps -n jy1
[root@jy1 bin]# ./srvctl start nodeapps -n jy2
[root@jy1 bin]# ./srvctl start asm -n jy1
[root@jy1 bin]# ./srvctl start asm -n jy2
[root@jy1 bin]# ./srvctl start instance -d jyrac -i jyrac1
[root@jy1 bin]# ./srvctl start instance -d jyrac -i jyrac2

[root@jy1 bin]# ./crs_stat -t
Name Type Target State Host
————————————————————
ora….SM1.asm application ONLINE ONLINE jy1
ora….Y1.lsnr application ONLINE ONLINE jy1
ora.jy1.gsd application ONLINE ONLINE jy1
ora.jy1.ons application ONLINE ONLINE jy1
ora.jy1.vip application ONLINE ONLINE jy1
ora….SM2.asm application ONLINE ONLINE jy2
ora….Y2.lsnr application ONLINE ONLINE jy2
ora.jy2.gsd application ONLINE ONLINE jy2
ora.jy2.ons application ONLINE ONLINE jy2
ora.jy2.vip application ONLINE ONLINE jy2
ora.jyrac.db application ONLINE ONLINE jy2
ora….c1.inst application ONLINE ONLINE jy1
ora….c2.inst application ONLINE ONLINE jy2

3.6 确认IP, 修改完成。
[root@jy1 bin]# ifconfig
eth0 Link encap:Ethernet HWaddr 08:00:27:54:5A:6A
inet addr:192.168.56.111 Bcast:192.168.56.255 Mask:255.255.255.0
inet6 addr: fe80::a00:27ff:fe54:5a6a/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:515539 errors:0 dropped:0 overruns:0 frame:0
TX packets:546714 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:294661949 (281.0 MiB) TX bytes:357427245 (340.8 MiB)
Base address:0xd010 Memory:f0000000-f0020000

eth0:1 Link encap:Ethernet HWaddr 08:00:27:54:5A:6A
inet addr:192.168.56.113 Bcast:192.168.56.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Base address:0xd010 Memory:f0000000-f0020000

eth1 Link encap:Ethernet HWaddr 08:00:27:C3:B7:37
inet addr:192.168.100.1 Bcast:192.168.100.255 Mask:255.255.255.0
inet6 addr: fe80::a00:27ff:fec3:b737/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:21818 errors:0 dropped:0 overruns:0 frame:0
TX packets:32288 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:10823304 (10.3 MiB) TX bytes:26553389 (25.3 MiB)
Base address:0xd240 Memory:f0820000-f0840000

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:180050 errors:0 dropped:0 overruns:0 frame:0
TX packets:180050 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:57676272 (55.0 MiB) TX bytes:57676272 (55.0 MiB)

[root@jy2 bin]# ifconfig
eth0 Link encap:Ethernet HWaddr 08:00:27:41:5D:E5
inet addr:192.168.56.112 Bcast:192.168.56.255 Mask:255.255.255.0
inet6 addr: fe80::a00:27ff:fe41:5de5/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:545442 errors:0 dropped:0 overruns:0 frame:0
TX packets:514967 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:356961066 (340.4 MiB) TX bytes:295147830 (281.4 MiB)
Base address:0xd010 Memory:f0000000-f0020000

eth0:1 Link encap:Ethernet HWaddr 08:00:27:41:5D:E5
inet addr:192.168.56.114 Bcast:192.168.56.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Base address:0xd010 Memory:f0000000-f0020000

eth1 Link encap:Ethernet HWaddr 08:00:27:4A:E3:CB
inet addr:192.168.100.2 Bcast:192.168.100.255 Mask:255.255.255.0
inet6 addr: fe80::a00:27ff:fe4a:e3cb/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:32190 errors:0 dropped:0 overruns:0 frame:0
TX packets:21813 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:26535297 (25.3 MiB) TX bytes:10821634 (10.3 MiB)
Base address:0xd240 Memory:f0820000-f0840000

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:163867 errors:0 dropped:0 overruns:0 frame:0
TX packets:163867 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:30461066 (29.0 MiB) TX bytes:30461066 (29.0 MiB)

从输出结果可知修改成功.

四 修改oracle 11g rac的scan IP,我这里没有使用DNS,只设置了一个scan IP:10.138.130.155现将其修改为10.138.130.159(因为11g rac修改public,private,vip IP与上面10g rac一样不再说明)

root@jyrac1 ~]# cat  /etc/hosts
10.138.130.151 jyrac1
10.138.130.152 jyrac2
10.10.10.1     jyrac1-priv
10.10.10.2     jyrac2-priv
10.138.130.153 jyrac1-vip
10.138.130.154 jyrac2-vip
10.138.130.155 jyrac-scan

4.1在所有节点中 /etc/hosts 文件中修改 scan 对应的ip

[root@jyrac1 ~]# vi /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1               jyrac1 localhost.localdomain localhost
::1             localhost6.localdomain6 localhost6
10.138.130.151 jyrac1
10.138.130.152 jyrac2
10.10.10.1     jyrac1-priv
10.10.10.2     jyrac2-priv
10.138.130.153 jyrac1-vip
10.138.130.154 jyrac2-vip
10.138.130.159 jyrac-scan
~
[root@jyrac2 ~]# vi /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1               jyrac2 localhost.localdomain localhost
::1             localhost6.localdomain6 localhost6
10.138.130.151 jyrac1
10.138.130.152 jyrac2
10.10.10.1     jyrac1-priv
10.10.10.2     jyrac2-priv
10.138.130.153 jyrac1-vip
10.138.130.154 jyrac2-vip
10.138.130.159 jyrac-scan

4.2 查看scan IP信息

[grid@jyrac1 ~]$ srvctl config scan
SCAN name: jyrac-scan, Network: 1/10.138.130.0/255.255.255.0/eth0
SCAN VIP name: scan1, IP: /jyrac-scan/10.138.130.155

4.3 停止scan_listener ,scan 应用

[grid@jyrac2 ~]$ srvctl stop scan_listener
[grid@jyrac2 ~]$ srvctl stop scan

4.4 确认 scan_listener,scan 的状态

[grid@jyrac1 ~]$ srvctl status scan_listener
SCAN Listener LISTENER_SCAN1 is enabled
SCAN listener LISTENER_SCAN1 is not running
[grid@jyrac1 ~]$ srvctl status scan
SCAN VIP scan1 is enabled
SCAN VIP scan1 is not running

[grid@jyrac1 ~]$ crs_stat -t | grep  scan  
ora.scan1.vip  ora....ip.type OFFLINE   OFFLINE               
[grid@jyrac1 ~]$ crs_stat -t | grep  lsnr
ora....ER.lsnr ora....er.type ONLINE    ONLINE    jyrac1      
ora....N1.lsnr ora....er.type OFFLINE   OFFLINE               
ora....C1.lsnr application    ONLINE    ONLINE    jyrac1      
ora....C2.lsnr application    ONLINE    ONLINE    jyrac2 

4.5 修改scan 名称

[grid@jyrac1 ~]$ srvctl modify scan -h

Modifies the SCAN name.

Usage: srvctl modify scan -n 
    -n            Domain name qualified SCAN name 域名限定的 SCAN 名
    -h                       Print usage 输出帮助信息

grid 用户无权更改scan ip,必须使用root权限

[grid@jyrac2 ~]$ srvctl modify scan -n jyrac-scan
PRCS-1034 : Failed to modify Single Client Access Name jyrac-scan
PRCR-1071 : Failed to register or update resource type ora.scan_vip.type
CRS-0245:  User doesn't have enough privilege to perform the operation

[root@jyrac1 bin]# cd /u01/app/product/11.2.0/crs/bin
[root@jyrac1 bin]# pwd
/u01/app/product/11.2.0/crs/bin
[root@jyrac1 bin]# ./srvctl modify scan -n jyrac-scan

4.6 变更后,进行确认

[root@jyrac1 bin]# ./srvctl config scan
SCAN name: jyrac-scan, Network: 1/10.138.130.0/255.255.255.0/eth0
SCAN VIP name: scan1, IP: /jyrac-scan/10.138.130.159

4.7 重新启动scan,scan_listener

[grid@jyrac1 ~]$ srvctl start scan
[grid@jyrac1 ~]$ srvctl start scan_listener

4.8 确认scan 和scan_listener.ora的状态

[grid@jyrac1 ~]$ crs_stat -t | grep  scan
ora.scan1.vip  ora....ip.type ONLINE    ONLINE    jyrac1      
[grid@jyrac1 ~]$ crs_stat -t | grep  lsnr
ora....ER.lsnr ora....er.type ONLINE    ONLINE    jyrac1      
ora....N1.lsnr ora....er.type ONLINE    ONLINE    jyrac1      
ora....C1.lsnr application    ONLINE    ONLINE    jyrac1      
ora....C2.lsnr application    ONLINE    ONLINE    jyrac2  

4.9 进行ping测试

[grid@jyrac2 ~]$ ping 10.138.130.159
PING 10.138.130.159 (10.138.130.159) 56(84) bytes of data.
64 bytes from 10.138.130.159: icmp_seq=1 ttl=64 time=1.88 ms
64 bytes from 10.138.130.159: icmp_seq=2 ttl=64 time=0.166 ms

修改scan IP与修改private ip ,vip 不一样,修改scan ip 无需停止数据库实例,asm 或者重启crs比较简单

Oracle RAC OCR 和 VotingDisk 的备份与恢复

一:Voting Disk
Voting Disk 这个文件主要用于记录节点成员状态,在出现脑裂时,决定那个Partion获得控制权,其他的Partion必须从集群中剔除。Voting disk使用的是一种“多数可用算法”,如果有多个Voting disk,则必须一半以上的Votedisk同时使用,Clusterware才能正常使用。 比如配置了4个Votedisk,坏一个Votedisk,集群可以正常工作,如果坏了2个,则不能满足半数以上,集群会立即宕掉,
所有节点立即重启,所以如果添加Votedisk,尽量不要只添加一个,而应该添加2个。这点和OCR 不一样。OCR 只需配置一个。

1.1查看votedisk的位置:

[root@jy1 ~]# cd u01/app/oracle/product/10.2.0/crs/bin

[root@jy1 bin]# ./crsctl query css votedisk
 0.     0    /dev/raw/raw2

located 1 votedisk(s).

1.2备份votedisk盘:

[root@jy1 bin]# dd if=/dev/raw/raw2 of=/home/oracle/votedisk.bak
6291456+0 records in
6291456+0 records out
3221225472 bytes (3.2 GB) copied, 201.63 seconds, 16 MB/s

1.3通过Strings 命令来查看 voting disk 的内容

[root@jy1 bin]# strings /home/oracle/votedisk.bak | sort -u
fSLC
ssLckcoT
SslcLlik
sSlcrEp0
}|{z

1.4恢复votedisk盘:

[root@jy1 bin]# dd if=/home/oracle/votedisk.bak of=/dev/raw/raw2
6291456+0 records in
6291456+0 records out
3221225472 bytes (3.2 GB) copied, 201.63 seconds, 16 MB/s

二 :OCR
Oracle Clusterware把整个集群的配置信息放在共享存储上,这些信息包括了集群节点的列表,集群数据库实例到节点的映射以及CRS应用程序资源信息。存放的位置就在OCR Disk上. 在整个集群中,只有一个节点能对OCR Disk 进行读写操作,这个节点叫作Master Node,所有节点都会在内存中保留一份OCR的拷贝,同时有一个OCR Process 从这个内存中读取内容。 OCR 内容发生改变时,由Master Node的OCR Process负责同步到其他节点的OCR Process。

Oracle 每4个小时对其做一次备份,并且保留最后的3个备份,以及前一天,前一周的最后一个备份。 这个备份由Master Node CRSD进程完成,备份的默认位置是$CRS_HOME/crs/cdata/目录下,可以通过ocrconfig -backuploc 命令修改到新的目录。 每次备份后,备份文件名自动更改,以反应备份时间顺序,最近一次的备份
叫作backup00.ocr。这些备份文件除了保存在本地,DBA还应该在其他存储设备上保留一份,以防止意外的存储故障。

[root@jy1 crs]# pwd
/u01/app/oracle/product/10.2.0/crs/cdata/crs
[root@jy1 crs]# ls -lrt
total 12396
-rw-r--r-- 1 root root 4227072 Nov 17 14:45 backup00.ocr
-rw-r--r-- 1 root root 4227072 Nov 17 14:45 week.ocr
-rw-r--r-- 1 root root 4227072 Nov 17 14:45 day.ocr

在安装clusterware过程中,如果选择External Redundancy冗余方式,则只能输入一个OCR磁盘位置。 但是Oracle允许配置两个OCR 磁盘互为镜像,以防止OCR 磁盘的单点故障。 OCR 磁盘和Votedisk磁盘不一样,OCR磁盘最多只能有两个,一个Primary OCR 和一个Mirror OCR。

Oracle 推荐在对集群做调整时,比如增加,删除节点之前,修改RAC IP之前,对OCR做一个备份,可以使用export 备份到指定文件,如果做了replace或者restore 等操作,Oracle 建议使用 cluvfy comp ocr -n all 命令来做一次全面的检查。对OCR的备份与恢复,我们可以使用ocrconfig 命令。

[root@jy1 bin]# ./ocrconfig --help
Name:
        ocrconfig - Configuration tool for Oracle Cluster Registry.

Synopsis:
        ocrconfig [option]
        option:
                -export  [-s online]
                                                    - Export cluster register contents to a file
                -import                   - Import cluster registry contents from a file
                -upgrade [ []]
                                                    - Upgrade cluster registry from previous version
                -downgrade [-version ]
                                                    - Downgrade cluster registry to the specified version
                -backuploc                 - Configure periodic backup location
                -showbackup                         - Show backup information
                -restore                  - Restore from physical backup
                -replace ocr|ocrmirror [] - Add/replace/remove a OCR device/file
                -overwrite                          - Overwrite OCR configuration on disk
                -repair ocr|ocrmirror     - Repair local OCR configuration
                -help                               - Print out this help information

Note:
        A log file will be created in
        $ORACLE_HOME/log//client/ocrconfig_.log. Please ensure
        you have file creation privileges in the above directory before
        running this tool.

1. 用导出导入备份恢复OCR

1.1首先关闭所有节点的CRS

[root@jy1 bin]# ./crsctl stop crs
Stopping resources.
Successfully stopped CRS resources 
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.

[root@jy2 bin]# ./crsctl stop crs
Stopping resources.
Successfully stopped CRS resources 
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.

1.2用root 用户导出OCR内容

[root@jy1 bin]# ./ocrconfig -export /u01/ocrbak.exp
[root@jy1 bin]# ls -lrt /u01
total 96
drwxr-xr-x 3 root root  4096 Nov 10 23:12 app
drwxrwxrwx 6 root root  4096 Nov 11 11:54 tmp
-rw-r--r-- 1 root root 84375 Nov 17 16:52 ocrbak.exp

1.3重启CRS

[root@jy1 bin]# ./crsctl start crs
Attempting to start CRS stack 
The CRS stack will be started shortly

[root@jy2 bin]# ./crsctl start crs
Attempting to start CRS stack 
The CRS stack will be started shortly

1.4检查CRS 状态

Cannot communicate with EVM 
[root@jy1 bin]# ./crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy

[root@jy2 bin]# ./crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy

[root@jy1 bin]# ./crs_stat -t
Name           Type           Target    State     Host        
------------------------------------------------------------
ora....SM1.asm application    ONLINE    ONLINE    jy1         
ora....Y1.lsnr application    ONLINE    ONLINE    jy1         
ora.jy1.gsd    application    ONLINE    ONLINE    jy1         
ora.jy1.ons    application    ONLINE    ONLINE    jy1         
ora.jy1.vip    application    ONLINE    ONLINE    jy1         
ora....SM2.asm application    ONLINE    ONLINE    jy2         
ora....Y2.lsnr application    ONLINE    ONLINE    jy2         
ora.jy2.gsd    application    ONLINE    ONLINE    jy2         
ora.jy2.ons    application    ONLINE    ONLINE    jy2         
ora.jy2.vip    application    ONLINE    ONLINE    jy2         
ora.jyrac.db   application    ONLINE    ONLINE    jy2         
ora....c1.inst application    ONLINE    ONLINE    jy1         
ora....c2.inst application    ONLINE    ONLINE    jy2    

1.5 检查OCR一致性

[root@jy1 bin]# ./ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :    3145640
         Used space (kbytes)      :       3816
         Available space (kbytes) :    3141824
         ID                       : 1032702449
         Device/File Name         : /dev/raw/raw1
                                    Device/File integrity check succeeded

                                    Device/File not configured

         Cluster registry integrity check succeeded

1.6破坏OCR内容

[root@jy1 bin]# dd if=/dev/zero of=/dev/raw/raw1 bs=8192 count=1000
1000+0 records in
1000+0 records out
8192000 bytes (8.2 MB) copied, 0.355733 seconds, 23.0 MB/s

1.7再次检查OCR一致性

[root@jy1 bin]# ./ocrcheck
PROT-601: Failed to initialize ocrcheck

再来执行crs_stat -t命令就会发现crs已经终止了

[root@jy1 bin]# ./crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.

1.8使用cluvfy 工具检查一致性

[root@jy1 cluvfy]# su - oracle
[oracle@jy1 ~]$ cd /soft/clusterware/cluvfy
[oracle@jy1 ~]$ ./runcluvfy.sh comp ocr -n all

Verifying OCR integrity 
Unable to retrieve nodelist from Oracle clusterware.

Verification cannot proceed.

1.9使用Import 恢复OCR 内容(使用restore选项只能导入OCR自动产生的物理备份, import选项只能导入通过export选项导出的的逻辑备份)

[root@jy1 bin]#  ./ocrconfig -import /u01/ocrbak.exp

1.10 再次检查OCR

[root@jy1 bin]# ./ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :    3145640
         Used space (kbytes)      :       3816
         Available space (kbytes) :    3141824
         ID                       : 1032702449
         Device/File Name         : /dev/raw/raw1
                                    Device/File integrity check succeeded

                                    Device/File not configured

         Cluster registry integrity check succeeded

1.11 使用cluvfy工具检查

[root@jy1 cluvfy]# su - oracle
[oracle@jy1 ~]$ cd /soft/clusterware/cluvfy
[oracle@jy1 cluvfy]$ ./runcluvfy.sh comp ocr -n all

Verifying OCR integrity 

Checking OCR integrity...

Checking the absence of a non-clustered configuration...
All nodes free of non-clustered, local-only configurations.

Uniqueness check for OCR device passed.

Checking the version of OCR...
OCR of correct Version "2" exists.

Checking data integrity of OCR...
Data integrity check for OCR passed.

OCR integrity check passed.

Verification of OCR integrity was successful.

2使用自动备份恢复OCR
2.1关闭运行在集群数据库的所有节点上的CRS服务程序(在Oracle 11gR2 中已经没有了init.crs 命令了。 只能通过crsctl stop crs命令来关闭CRS.)
/etc/init.d/init.crs stop 或者crsctl stop crs

2.2 通过ocrconfig 的showbackup选项查看最近的备份

[root@jy1 bin]# /etc/init.d/init.crs stop
Shutting down Oracle Cluster Ready Services (CRS):
Stopping resources.
Successfully stopped CRS resources 
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.
Shutdown has begun. The daemons should exit soon.

[root@jy2 bin]# /etc/init.d/init.crs stop
Shutting down Oracle Cluster Ready Services (CRS):
Stopping resources.
Successfully stopped CRS resources 
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.
Shutdown has begun. The daemons should exit soon.

2.2通过ocrconfig 的showbackup选项查看最近的备份

[root@jy1 bin]# ./ocrconfig -showbackup

jy1     2014/11/17 14:45:54     /u01/app/oracle/product/10.2.0/crs/cdata/crs

          

          

jy1     2014/11/17 14:45:54     /u01/app/oracle/product/10.2.0/crs/cdata/crs

jy1     2014/11/17 14:45:54     /u01/app/oracle/product/10.2.0/crs/cdata/crs
[root@jy1 bin]# ls -lrt /u01/app/oracle/product/10.2.0/crs/cdata/crs
total 12396
-rw-r--r-- 1 root root 4227072 Nov 17 14:45 backup00.ocr
-rw-r--r-- 1 root root 4227072 Nov 17 14:45 week.ocr
-rw-r--r-- 1 root root 4227072 Nov 17 14:45 day.ocr

2.3破坏OCR内容

[root@jy1 bin]# dd if=/dev/zero of=/dev/raw/raw1 bs=8192 count=1000
1000+0 records in
1000+0 records out
8192000 bytes (8.2 MB) copied, 0.355733 seconds, 23.0 MB/s

2.4再次检查OCR一致性

[root@jy1 bin]# ./ocrcheck
PROT-601: Failed to initialize ocrcheck

再来执行crs_stat -t命令就会发现crs已经终止了

[root@jy1 bin]# ./crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.

2.5使用cluvfy 工具检查一致性

[root@jy1 cluvfy]# su - oracle
[oracle@jy1 ~]$ cd /soft/clusterware/cluvfy
[oracle@jy1 ~]$ ./runcluvfy.sh comp ocr -n all

Verifying OCR integrity 
Unable to retrieve nodelist from Oracle clusterware.

Verification cannot proceed.

2.6通过ocrconfig的restore或import选项导入OCR数据(使用restore选项只能导入OCR自动产生的物理备份,import选项只能导入通过export选项导出的的逻辑备份)
ocrconfig -restore filename_location

                         
[root@jy1 bin]# ./ocrconfig  -restore  /u01/app/oracle/product/10.2.0/crs/cdata/crs/backup00.ocr

2.7 检查CRS

[root@jy1 bin]# ./ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :    3145640
         Used space (kbytes)      :       3816
         Available space (kbytes) :    3141824
         ID                       : 1387716561
         Device/File Name         : /dev/raw/raw1
                                    Device/File integrity check succeeded

                                    Device/File not configured

         Cluster registry integrity check succeeded

2.8 使用cluvfy工具检查

[root@jy1 cluvfy]# su - oracle
[oracle@jy1 ~]$ cd /soft/clusterware/cluvfy
[oracle@jy1 cluvfy]$ ./runcluvfy.sh comp ocr -n all

Verifying OCR integrity 

Checking OCR integrity...

Checking the absence of a non-clustered configuration...
All nodes free of non-clustered, local-only configurations.

Uniqueness check for OCR device passed.

Checking the version of OCR...
OCR of correct Version "2" exists.

Checking data integrity of OCR...
Data integrity check for OCR passed.

OCR integrity check passed.

Verification of OCR integrity was successful. 

2.9 在所有节点上重新启动CRS
/etc/init.d/init.crs start 而在Oracle 11gR2使用:crsctl start crs 命令来启动CRS.

[root@jy1 bin]# /etc/init.d/init.crs start
Startup will be queued to init within 90 seconds.

[root@jy2 bin]# /etc/init.d/init.crs start
Startup will be queued to init within 90 seconds.

[root@jy1 bin]# ./crs_stat -t
Name           Type           Target    State     Host        
------------------------------------------------------------
ora....SM1.asm application    ONLINE    ONLINE    jy1         
ora....Y1.lsnr application    ONLINE    ONLINE    jy1         
ora.jy1.gsd    application    ONLINE    ONLINE    jy1         
ora.jy1.ons    application    ONLINE    ONLINE    jy1         
ora.jy1.vip    application    ONLINE    ONLINE    jy1         
ora....SM2.asm application    ONLINE    ONLINE    jy2         
ora....Y2.lsnr application    ONLINE    ONLINE    jy2         
ora.jy2.gsd    application    ONLINE    ONLINE    jy2         
ora.jy2.ons    application    ONLINE    ONLINE    jy2         
ora.jy2.vip    application    ONLINE    ONLINE    jy2         
ora.jyrac.db   application    ONLINE    ONLINE    jy1         
ora....c1.inst application    ONLINE    ONLINE    jy1         
ora....c2.inst application    ONLINE    ONLINE    jy2   

Linux 5.4在安装rac执行root.sh Failed to upgrade Oracle Cluster Registry configuration故障处理

因为oracle 10g版权被全省买断,11g没有买版权所以朋友要在Red hat Linux 5.4下使用裸设备来安装oracle 10g rac数据库。但在安装过程出现了一些错误 到节点2上用root用户身份执行/u01/app/oracle/product/10.2.0/crs下的root.sh:
/u01/app/product/10.2.0/crs/jdk/jre//bin/java: error while loading shared libraries:
libpthread.so.0: cannot open shared object file: No such file or directory

节点2上执行root.sh之前需要编辑两个文件,这两个文件都位于/u01/app/oracle/product/10.2.0/crs/bin下,文件名分别为vipca和srvctl。首先编辑vipca文件,找到如下内容:
Remove this workaround when the bug 3937317 is fixed arch=`uname -m`

if [ “$arch” = “i686” -o “$arch” = “ia64” ]
then
LD_ASSUME_KERNEL=2.4.19
export LD_ASSUME_KERNEL
fi
#End workaround
在fi 后新添加一行:
unset LD_ASSUME_KERNEL
然后编辑srvctl文件,找到如下内容:
LD_ASSUME_KERNEL=2.4.19
export LD_ASSUME_KERNEL
同样在其后新增加一行: unset LD_ASSUME_KERNEL

同时需要你改的还有节点1上的$ORACLE_HOME/bin/srvctl 文件,不然等装完数据库之后,srvctl 命令也是会报这个错误地。

再次执行root.sh脚本报错:Failed to upgrade Oracle Cluster Registry configuration
[root@rac2 crs]#./u01/app/oracle/product/10.2.0/crs/root.sh
WARNING: directory ‘/u01/app/oracle/product’ is not owned by root
WARNING: directory ‘/u01/app/oracle’ is not owned by root
WARNING: directory ‘/u01/app’ is not owned by root
WARNING: directory ‘/u01’ is not owned by root
Checking to see if Oracle CRS stack is already configured

Setting the permissions on OCR backup directory
Setting up NS directories
PROT-1: Failed to initialize ocrconfig
Failed to upgrade Oracle Cluster Registry configuration
一般出现这个问题的原因有几种:

(1) Bug
这种情况参考:
Executing root.sh errors with “Failed To Upgrade Oracle Cluster Registry Configuration” [ID 466673.1]
(2) raw 设备的权限问题
[root@rac2 /]# ls -lrt /dev/raw/
total 0
crw-rw—- 1 oracle oinstall 162, 4 Nov 10 23:34 raw4
crw-rw—- 1 oracle oinstall 162, 3 Nov 10 23:34 raw3
crw-rw—- 1 root oinstall 162, 2 Nov 10 23:34 raw2
crw-rw—- 1 root oinstall 162, 1 Nov 10 23:34 raw1

权限要设置成如下:
chown root.oinstall /dev/raw/raw[1-2]
chown oracle.oinstall /dev/raw/raw[3-5]

chmod 775 /dev/raw/raw[1-2]
chmod 775 /dev/raw/raw[3-5]

可以在udev的规则里设置

(3) raw 设备读写问题
这种情况参考:
Placement of Voting disk and OCR Files in Oracle RAC 10g and 11gR1 [ID 293819.1]
清空一下RAW:
[root@rac2 raw]# dd if=/dev/zero of=/dev/mapper/crsp6 bs=1M count=200

我这里是属于bug用clsfmt检查raw 设备
[root@rac2 bin]# ./clsfmt.bin ocr /dev/raw/raw1
clsfmt: Received unexpected error 4 from skgfifi
skgfifi: Additional information: -2
Additional information: -1073741824

这里没有成功是bug
用Pathch中的新文件替换clsfmt.bin 文件
[root@rac2 4679769]# cp clsfmt.bin /u01/app/oracle/product/10.2.0/crs/bin/

再次测试:
[root@rac2 bin]# ./clsfmt.bin ocr /dev/raw/raw1
clsfmt: successfully initialized file /dev/raw/raw1
[root@rac2 bin]# ./clsfmt.bin ocr /dev/raw/raw2
clsfmt: successfully initialized file /dev/raw/raw2
[root@rac2 bin]# ./clsfmt.bin ocr /dev/raw/raw3
clsfmt: successfully initialized file /dev/raw/raw3
[root@rac2 bin]# ./clsfmt.bin ocr /dev/raw/raw4
clsfmt: successfully initialized file /dev/raw/raw4

[root@jy2 bin]# ./clsfmt.bin ocr /dev/raw/raw1
clsfmt: successfully initialized file /dev/raw/raw1
[root@jy2 bin]# ./clsfmt.bin ocr /dev/raw/raw2
clsfmt: successfully initialized file /dev/raw/raw2
[root@jy2 bin]# ./clsfmt.bin ocr /dev/raw/raw3
clsfmt: successfully initialized file /dev/raw/raw3
[root@jy2 bin]# ./clsfmt.bin ocr /dev/raw/raw4
clsfmt: successfully initialized file /dev/raw/raw4

再次执行root.sh报如上错误:
Error 0(Native: listNetInterfaces:[3])
[Error 0(Native: listNetInterfaces:[3])]
You have mail in /var/spool/mail/root
Error 0(Native: listNetInterfaces:[3])
[Error 0(Native: listNetInterfaces:[3])]

解决方式如下:
[root@rac2 crs]# cd bin
[root@rac2 bin]# ./oifcfg iflist
eth0 192.168.1.0
eth1 192.168.1.0
[root@rac2 bin]# ./oifcfg setif -global eth0/192.168.1.0:public
[root@rac2 bin]# ./oifcfg setif -global eth1/192.168.1.0:cluster_interconnect
[root@rac2 bin]# ./oifcfg getif
eth0 192.168.1.0 global public
eth1 192.168.1.0 global cluster_interconnect

补丁文件:
p4679769_10201_LINUX

p4679769_10201_Linux-x86-64

p4679769_10202_LINUX-zSer

hp-ux利用rman将数据库跨平台迁移到aix平台上

这次因为要迁移的数据库大小并不大但也不是只有几十G所以选择使用rman将生产数据库从hp-ux迁移到aix上,数据库是10.2.0.4
1.操作环境
1.1 源生产环境
数据库名 CAIWU
实例名 CAIWU
DBID 338270341
数据库 HP-UX 10.2.0.4.0 – 64bit Production
数据文件目录 /cwjc_data/CAIWU
文本初始化参数文件:/oracle/product/10.2.0/dbs/init/initCAIWU.ora

1.2 目标生产环境

数据库名 CAIWU
实例名 CAIWU
DBID 338270341
数据库 HP-UX 10.2.0.4.0 – 64bit Production
数据文件目录 /yl_oradata/CAIWU
文本初始化参数文件:/oracle/product/10.2.0/db_1/dbs/init/initCAIWU.ora

2 备份
2.1 备份注意事项
1. 迁移库只能在同字节顺序平台之间(即ENDIAN_FORMAT相同)且得是ORACLE所支持的平台,可以在V$TRANSPORTABLE_PLATFORM中查看。
2. 传输前需要将源库置为只读。
3. 生成的2个脚本文件要仔细根据需要修改。

2.2 源库操作
2.2.1 将库只读,并检查是否支持迁移
一.将库只读,并检查是否支持迁移

[rx6600-2:oracle:/cwjc_data]$sqlplus / as sysdba

SQL*Plus: Release 10.2.0.4.0 - Production on Mon Sep 29 22:34:24 2014

Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> shutdown immediate
Database closed.
Database dismounted.

SQL> startup mount
ORACLE instance started.

Total System Global Area 3439329280 bytes
Fixed Size                  2060008 bytes
Variable Size            2348810520 bytes
Database Buffers         1073741824 bytes
Redo Buffers               14716928 bytes
Database mounted.
SQL> alter database open read only;

Database altered.

2.2.2 检查支持迁移平台

SQL> select * from v$transportable_platform;

PLATFORM_ID PLATFORM_NAME                          ENDIAN_FORMAT
----------- -------------------------------------- --------------
          1 Solaris[tm] OE (32-bit)                Big
          2 Solaris[tm] OE (64-bit)                Big
          7 Microsoft Windows IA (32-bit)          Little
         10 Linux IA (32-bit)                      Little
          6 AIX-Based Systems (64-bit)             Big
          3 HP-UX (64-bit)                         Big
          5 HP Tru64 UNIX                          Little
          4 HP-UX IA (64-bit)                      Big
         11 Linux IA (64-bit)                      Little
         15 HP Open VMS                            Little
          8 Microsoft Windows IA (64-bit)          Little
          9 IBM zSeries Based Linux                Big
         13 Linux x86 64-bit                       Little
         16 Apple Mac OS                           Big
         12 Microsoft Windows x86 64-bit           Little
         17 Solaris Operating System (x86)         Little
         18 IBM Power Based Linux                  Big
         20 Solaris Operating System (x86-64)      Little
         19 HP IA Open VMS                         Little

19 rows selected.

2.2.3 利用dbms_tdb 包检查数据库
二.利用dbms_tdb 包检查数据库能否被传输以及列出外部表和DIRECTORY 等无法传输的对象信息,
分别在hp-ux平台数据库通过SQLPLUS执行以下两个存储过程

SQL> set serveroutput on
SQL> declare
  2   db_ready boolean;
  3   begin
  4   /* db_ready is ignored, but with SERVEROUTPUT set to ON any
  5   * conditions preventing transport will be output to console */
  6   db_ready := dbms_tdb.check_db('AIX-Based Systems (64-bit)',
  7   dbms_tdb.skip_none);
  8   end;
  9   /

PL/SQL procedure successfully completed.

SQL>  declare
  2   external boolean;
  3   begin
  4   /* value of external is ignored, but with SERVEROUTPUT set to ON
  5   * dbms_tdb.check_external displays report of external objects
  6   * on console */
  7   external := dbms_tdb.check_external;
  8  end;
  9  /
The following directories exist in the database:
SYS.DUMP_CAIWU, SYS.DATA_PUMP_DIR, SYS.WORK_DIR, SYS.ADMIN_DIR, SYS.ORACLE_OCM_CONFIG_DIR

PL/SQL procedure successfully completed.

2.2.4 RMAN ConvertDB
连接target database 查看数据库打开模式

[rx6600-2:oracle:/home/oracle]$sqlplus / as sysdba

SQL*Plus: Release 10.2.0.4.0 - Production on Mon Sep 29 22:46:30 2014

Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> select open_mode from v$database;

OPEN_MODE
----------
READ ONLY

利用RMAN转换数据库

RMAN> convert database new database 'CAIWU'
2> transport script '/cwjc_data/CAIWURMAN/caiwu.sql'
3> to platform 'AIX-Based Systems (64-bit)'
4> db_file_name_convert '/cwjc_data/CAIWU/','/cwjc_data/CAIWURMAN/';

Starting convert at 29-SEP-14
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=539 devtype=DISK

Directory SYS.DUMP_CAIWU found in the database
Directory SYS.DATA_PUMP_DIR found in the database
Directory SYS.WORK_DIR found in the database
Directory SYS.ADMIN_DIR found in the database
Directory SYS.ORACLE_OCM_CONFIG_DIR found in the database

User SYS with SYSDBA and SYSOPER privilege found in password file
channel ORA_DISK_1: starting datafile conversion
input datafile fno=00005 name=/cwjc_data/CAIWU/zwdata01.dbf
converted datafile=/cwjc_data/CAIWURMAN/zwdata01.dbf
channel ORA_DISK_1: datafile conversion complete, elapsed time: 00:06:25
channel ORA_DISK_1: starting datafile conversion
input datafile fno=00006 name=/cwjc_data/CAIWU/zwdata02.dbf
converted datafile=/cwjc_data/CAIWURMAN/zwdata02.dbf
channel ORA_DISK_1: datafile conversion complete, elapsed time: 00:06:25
channel ORA_DISK_1: starting datafile conversion
input datafile fno=00001 name=/cwjc_data/CAIWU/system01.dbf
converted datafile=/cwjc_data/CAIWURMAN/system01.dbf
channel ORA_DISK_1: datafile conversion complete, elapsed time: 00:03:25
channel ORA_DISK_1: starting datafile conversion
input datafile fno=00002 name=/cwjc_data/CAIWU/undotbs01.dbf
converted datafile=/cwjc_data/CAIWURMAN/undotbs01.dbf
channel ORA_DISK_1: datafile conversion complete, elapsed time: 00:01:45
channel ORA_DISK_1: starting datafile conversion
input datafile fno=00003 name=/cwjc_data/CAIWU/sysaux01.dbf
converted datafile=/cwjc_data/CAIWURMAN/sysaux01.dbf
channel ORA_DISK_1: datafile conversion complete, elapsed time: 00:01:45
channel ORA_DISK_1: starting datafile conversion
input datafile fno=00004 name=/cwjc_data/CAIWU/users01.dbf
converted datafile=/cwjc_data/CAIWURMAN/users01.dbf
channel ORA_DISK_1: datafile conversion complete, elapsed time: 00:00:55
Run SQL script /cwjc_data/CAIWURMAN/caiwu.sql on the target platform to create database
Edit init.ora file /oracle/product/10.2.0/dbs/init_00pjoiqb_1_0.ora. This PFILE will be used to create the database on the target platform
To recompile all PL/SQL modules, run utlirp.sql and utlrp.sql on the target platform
To change the internal database identifier, use DBNEWID Utility
Finished backup at 29-SEP-14

将转换后的文件传输到目标主机上

[rx6600-2:oracle:/cwjc_data/CAIWURMAN]$ls -lrt
total 53002438
-rw-r-----   1 oracle     dba        8589942784 Sep 29 22:57 zwdata01.dbf
-rw-r-----   1 oracle     dba        8589942784 Sep 29 23:03 zwdata02.dbf
-rw-r-----   1 oracle     dba        4588576768 Sep 29 23:07 system01.dbf
-rw-r-----   1 oracle     dba        2147491840 Sep 29 23:08 undotbs01.dbf
-rw-r-----   1 oracle     dba        2147491840 Sep 29 23:10 sysaux01.dbf
-rw-r-----   1 oracle     dba        1073750016 Sep 29 23:11 users01.dbf
-rw-r--r--   1 oracle     dba           2704 Sep 29 23:11 caiwu.sql

ftp> get zwdata01.dbf
200 PORT command successful.
150 Opening BINARY mode data connection for zwdata01.dbf (8589942784 bytes).
226 Transfer complete.
8589942784 bytes received in 197.5 seconds (4.248e+04 Kbytes/s)
local: zwdata01.dbf remote: zwdata01.dbf
ftp> get zwdata02.dbf
200 PORT command successful.
150 Opening BINARY mode data connection for zwdata02.dbf (8589942784 bytes).
226 Transfer complete.
8589942784 bytes received in 202.7 seconds (4.138e+04 Kbytes/s)
local: zwdata02.dbf remote: zwdata02.dbf
ftp> get system01.dbf
200 PORT command successful.
150 Opening BINARY mode data connection for system01.dbf (4588576768 bytes).
226 Transfer complete.
4588576768 bytes received in 111.3 seconds (4.027e+04 Kbytes/s)
local: system01.dbf remote: system01.dbf
ftp> get undotbs01.dbf
200 PORT command successful.
150 Opening BINARY mode data connection for undotbs01.dbf (2147491840 bytes).
226 Transfer complete.
2147491840 bytes received in 52.55 seconds (3.99e+04 Kbytes/s)
local: undotbs01.dbf remote: undotbs01.dbf
ftp> get sysaux01.dbf
200 PORT command successful.
150 Opening BINARY mode data connection for sysaux01.dbf (2147491840 bytes).
226 Transfer complete.
2147491840 bytes received in 51.29 seconds (4.089e+04 Kbytes/s)
local: sysaux01.dbf remote: sysaux01.dbf
ftp> get users01.dbf
200 PORT command successful.
150 Opening BINARY mode data connection for users01.dbf (1073750016 bytes).
226 Transfer complete.
1073750016 bytes received in 26.01 seconds (4.032e+04 Kbytes/s)
local: users01.dbf remote: users01.dbf
ftp> get caiwu.sql
200 PORT command successful.
150 Opening BINARY mode data connection for caiwu.sql (2704 bytes).
226 Transfer complete.
2704 bytes received in 9e-05 seconds (2.934e+04 Kbytes/s)
local: caiwu.sql remote: caiwu.sql
ftp> get init_00pjoiqb_1_0.ora
200 PORT command successful.
150 Opening BINARY mode data connection for init_00pjoiqb_1_0.ora (1556 bytes).
226 Transfer complete.
1556 bytes received in 8.7e-05 seconds (1.747e+04 Kbytes/s)
local: init_00pjoiqb_1_0.ora remote: init_00pjoiqb_1_0.ora


[IBMP740-2:oracle:/yl_oradata/CAIWU]$ls -lrt
total 53002408
-rw-r--r--    1 oracle   dba      8589942784 Sep 29 23:02 zwdata01.dbf
-rw-r--r--    1 oracle   dba      8589942784 Sep 29 23:06 zwdata02.dbf
-rw-r--r--    1 oracle   dba      4588576768 Sep 29 23:08 system01.dbf
-rw-r--r--    1 oracle   dba      2147491840 Sep 29 23:09 undotbs01.dbf
-rw-r--r--    1 oracle   dba      2147491840 Sep 29 23:11 sysaux01.dbf
-rw-r--r--    1 oracle   dba      1073750016 Sep 29 23:11 users01.dbf
-rw-r--r--    1 oracle   dba            2704 Sep 29 23:12 caiwu.sql
-rw-r--r--    1 oracle   dba            1556 Sep 29 23:12 init_00pjoiqb_1_0.ora

将init_00pjoiqb_1_0.ora参数文件复制到$ORACLE_HOME/dbs目录下

[IBMP740-2:oracle:/yl_oradata/CAIWU]$ cp init_00pjoiqb_1_0.ora $ORACLE_HOME/dbs

2.2.5 在目标生产机器上修改pfile参数文件并将参数文件init_00pjoiqb_1_0.ora重命名为initCAIWU.ora

IBMP740-2:oracle:/oracle/product/10.2.0/db_1/dbs]$mv init_00pjoiqb_1_0.ora initCAIWU.ora

修改前参数文件内容如下:

[IBMP740-2:oracle:/oracle/product/10.2.0/db_1/dbs]$vi initCAIWU.ora
"initCAIWU.ora" 68 lines, 1556 characters 

  __shared_pool_size       = 2164260864

  __large_pool_size        = 16777216

  __java_pool_size         = 16777216

  __streams_pool_size      = 150994944

  __db_cache_size          = 1073741824
# Please change the values of the following parameters:

  control_files            ="/yl_oradata/CAIWU/control01.ctl","/yl_oradata/CAIWU/control02.ctl","/yl_oradata/CAIWU/control03.ctl"

  db_recovery_file_dest    = "/oracle/flash_recovery_area"

  db_recovery_file_dest_size= 2147483648

  background_dump_dest     = "/oracle/admin/CAIWU/bdump"

  user_dump_dest           = "/oracle/admin/CAIWU/udump"

  core_dump_dest           = "/oracle/admin/CAIWU/cdump"

  audit_file_dest          = "/oracle/admin/CAIWU/adump"

  db_name                  = "CAIWU"



# Please review the values of the following parameters:

  remote_login_passwordfile= "EXCLUSIVE"

  db_domain                = ""

  dispatchers              = "(PROTOCOL=TCP) (SERVICE=CAIWUXDB)"



# The values of the following parameters are from source database:

  processes                = 500

  sessions                 = 555

  sga_target               = 3439329280

  db_block_size            = 8192

  compatible               = "10.2.0.3.0"

# log_archive_dest_1       = "LOCATION=/yl_oradata/arch/CAIWU/"

  log_archive_format       = "%t_%s_%r.dbf"

  db_file_multiblock_read_count= 16

  undo_management          = "AUTO"

  undo_tablespace          = "UNDOTBS1"

  job_queue_processes      = 10

  open_cursors             = 300

  pga_aggregate_target     = 858783744

修改后参数文件内容如下:

[IBMP740-2:oracle:/oracle/product/10.2.0/db_1/dbs]$cat initCAIWU.ora    
# Please change the values of the following parameters:

  control_files            ="/yl_oradata/CAIWU/control01.ctl","/yl_oradata/CAIWU/control02.ctl","/yl_oradata/CAIWU/control03.ctl" 

  db_recovery_file_dest    = "/oracle/flash_recovery_area"

  db_recovery_file_dest_size= 2147483648

  background_dump_dest     = "/oracle/admin/CAIWU/bdump"

  user_dump_dest           = "/oracle/admin/CAIWU/udump"

  core_dump_dest           = "/oracle/admin/CAIWU/cdump"

  audit_file_dest          = "/oracle/admin/CAIWU/adump"

  db_name                  = "CAIWU"



# Please review the values of the following parameters:

  remote_login_passwordfile= "EXCLUSIVE"

  db_domain                = ""

  dispatchers              = "(PROTOCOL=TCP) (SERVICE=CAIWUXDB)"



# The values of the following parameters are from source database:

  processes                = 500

  sessions                 = 555

  sga_target               = 3439329280

  db_block_size            = 8192

  compatible               = "10.2.0.3.0"

# log_archive_dest_1       = "LOCATION=/yl_oradata/arch/CAIWU/"

  log_archive_format       = "%t_%s_%r.dbf"

  db_file_multiblock_read_count= 16

  undo_management          = "AUTO"

  undo_tablespace          = "UNDOTBS1"

  job_queue_processes      = 10

  open_cursors             = 300

  pga_aggregate_target     = 858783744

2.2.6 修改重建控制文件的脚本caiwu.sql
修改后的caiwu.sql内容如下:

[IBMP740-2:oracle:/yl_oradata/CAIWU]$cat caiwu.sql
-- The following commands will create a new control file and use it
-- to open the database.
-- Data used by Recovery Manager will be lost.
-- The contents of online logs will be lost and all backups will
-- be invalidated. Use this only if online logs are damaged.

-- After mounting the created controlfile, the following SQL
-- statement will place the database in the appropriate
-- protection mode:
--  ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE PERFORMANCE

STARTUP NOMOUNT PFILE='/oracle/product/10.2.0/db_1/dbs/initCAIWU.ora'
CREATE CONTROLFILE REUSE SET DATABASE "CAIWU" RESETLOGS  ARCHIVELOG
    MAXLOGFILES 16
    MAXLOGMEMBERS 3
    MAXDATAFILES 100
    MAXINSTANCES 8
    MAXLOGHISTORY 292
LOGFILE
  GROUP 1 '/yl_oradata/CAIWU/redo01.log'  SIZE 512M,
  GROUP 2 '/yl_oradata/CAIWU/redo02.log'  SIZE 512M,
  GROUP 3 '/yl_oradata/CAIWU/redo03.log'  SIZE 512M
DATAFILE
  '/yl_oradata/CAIWU/system01.dbf',
  '/yl_oradata/CAIWU/undotbs01.dbf',
  '/yl_oradata/CAIWU/sysaux01.dbf',
  '/yl_oradata/CAIWU/users01.dbf',
  '/yl_oradata/CAIWU/zwdata01.dbf',
  '/yl_oradata/CAIWU/zwdata02.dbf'
CHARACTER SET ZHS16GBK
;

-- Database can now be opened zeroing the online logs.
ALTER DATABASE OPEN RESETLOGS;

-- Commands to add tempfiles to temporary tablespaces.
-- Online tempfiles have complete space information.
-- Other tempfiles may require adjustment.
ALTER TABLESPACE TEMP ADD TEMPFILE '/yl_oradata/CAIWU/temp01.dbf'
     SIZE 2048M AUTOEXTEND ON NEXT 655360  MAXSIZE 32767M;
-- End of tempfile additions.
--

set echo off
prompt ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
prompt * Your database has been created successfully!
prompt * There are many things to think about for the new database. Here
prompt * is a checklist to help you stay on track:
prompt * 1. You may want to redefine the location of the directory objects.
prompt * 2. You may want to change the internal database identifier (DBID) 
prompt *    or the global database name for this database. Use the 
prompt *    NEWDBID Utility (nid).
prompt ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

SHUTDOWN IMMEDIATE 
STARTUP UPGRADE PFILE='/oracle/product/10.2.0/db_1/dbs/initCAIWU.ora'
@$ORACLE_HOME/rdbms/admin/utlirp.sql 
SHUTDOWN IMMEDIATE 
STARTUP PFILE='/oracle/product/10.2.0/db_1/dbs/initCAIWU.ora'
-- The following step will recompile all PL/SQL modules.
-- It may take serveral hours to complete.
@$ORACLE_HOME/rdbms/admin/utlrp.sql 
set feedback 6;

2.3 转换数据库
2.3.1 创建实例CAIWU执行转换脚本

SQL> STARTUP NOMOUNT PFILE='/oracle/product/10.2.0/db_1/dbs/initCAIWU.ora'
ORACLE instance started.

Total System Global Area 3439329280 bytes
Fixed Size                  2087840 bytes
Variable Size             671089760 bytes
Database Buffers         2717908992 bytes
Redo Buffers               48242688 bytes
SQL> CREATE CONTROLFILE REUSE SET DATABASE "CAIWU" RESETLOGS  ARCHIVELOG
  2      MAXLOGFILES 16
  3      MAXLOGMEMBERS 3
  4      MAXDATAFILES 100
  5      MAXINSTANCES 8
  6      MAXLOGHISTORY 292
  7  LOGFILE
  8    GROUP 1 '/yl_oradata/CAIWU/redo01.log'  SIZE 512M,
  9    GROUP 2 '/yl_oradata/CAIWU/redo02.log'  SIZE 512M,
 10    GROUP 3 '/yl_oradata/CAIWU/redo03.log'  SIZE 512M
 11  DATAFILE
 12    '/yl_oradata/CAIWU/system01.dbf',
 13    '/yl_oradata/CAIWU/undotbs01.dbf',
 14    '/yl_oradata/CAIWU/sysaux01.dbf',
 15    '/yl_oradata/CAIWU/users01.dbf',
 16    '/yl_oradata/CAIWU/zwdata01.dbf',
 17    '/yl_oradata/CAIWU/zwdata02.dbf'
 18  CHARACTER SET ZHS16GBK
 19  ;

Control file created.

SQL> ALTER DATABASE OPEN RESETLOGS;

Database altered.

SQL> ALTER TABLESPACE TEMP ADD TEMPFILE '/yl_oradata/CAIWU/temp01.dbf'
  2       SIZE 2048M AUTOEXTEND ON NEXT 655360  MAXSIZE 32767M;

Tablespace altered.

SQL> SHUTDOWN IMMEDIATE 
Database closed.
Database dismounted.
ORACLE instance shut down.

STARTUP UPGRADE PFILE='/oracle/product/10.2.0/db_1/dbs/initCAIWU.ora'
SQL>@$ORACLE_HOME/rdbms/admin/utlirp.sql

.....

SQL> 
SQL> Rem Continue even if there are SQL errors
SQL> WHENEVER SQLERROR CONTINUE;
SQL> 
SQL> Rem ===========================================================================
SQL> Rem END utlip.sql
SQL> Rem ===========================================================================
SQL> 
SQL> DOC
DOC>#######################################################################
DOC>#######################################################################
DOC>   utlirp.sql completed successfully. All PL/SQL objects in the
DOC>   database have been invalidated.
DOC>
DOC>   Shut down and restart the database in normal mode and run utlrp.sql to
DOC>   recompile invalid objects.
DOC>#######################################################################
DOC>#######################################################################
DOC># 

SQL> SHUTDOWN IMMEDIATE
Database closed.
Database dismounted.
ORACLE instance shut down

SQL> STARTUP PFILE='/oracle/product/10.2.0/db_1/dbs/initCAIWU.ora'
ORACLE instance started.

Total System Global Area 3439329280 bytes
Fixed Size                  2087840 bytes
Variable Size             671089760 bytes
Database Buffers         2717908992 bytes
Redo Buffers               48242688 bytes
Database mounted.
Database opened.


SQL>@$ORACLE_HOME/rdbms/admin/utlrp.sql
.....



SQL> 
SQL> 
SQL> Rem =====================================================================
SQL> Rem Run component validation procedure
SQL> Rem =====================================================================
SQL> 
SQL> SET serveroutput on
SQL> EXECUTE dbms_registry_sys.validate_components;

PL/SQL procedure successfully completed.

SQL> SET serveroutput off
SQL> 
SQL> 
SQL> Rem ===========================================================================
SQL> Rem END utlrp.sql
SQL> Rem ===========================================================================
SQL> 

2.3.2 用system用户和密码进行登录来验证

SQL> conn system/ufgov
Connected.

可以登录
2.3.3 表空间验证,查询迁移后的表空间及文件

SQL> select tablespace_name,file_id from dba_data_files;

TABLESPACE_NAME                   FILE_ID
------------------------------ ----------
ZWDATA                                  6
ZWDATA                                  5
USERS                                   4
SYSAUX                                  3
UNDOTBS1                                2
SYSTEM                                  1

6 rows selected.

2.3.4 对数据库进行DML操作验证

SQL> create table test (i int);

Table created.

SQL> insert into test values (1);

1 row created.

SQL> select * from test;

         I
----------
         1

SQL> delete from test;

1 row deleted.

SQL> drop table test;

Table dropped.

2.3.5 切换日志测试

SQL> alter system switch logfile;

System altered.

小结
1. 确保源库与目标数据库的字符集一致
2. 要确保源库与目标数据库的字节编码相同
3. 数据库迁移后要把无效的数据库对象重新编译,确保正确执行utlrp.sql,utlirp.sql
4. 数据库迁移后,确保应用系统用户可以正确连接,登陆有效
5. 监听要重新配置,检查临时表空间的属主与可用性
6. 迁移完成后要立即进行一次全备份

查询执行hash join出现ora-040300 (QERHJ hash-joi,kllcqas:kllsltba)的诊断案例

某公安系统开发人员在plsql中执行查询时报ORA-04030: 在尝试分配 123416 字节 (QERHJ hash-joi,kllcqas:kllsltba) 时进程内存不足。数据库是aix上的rac 11.2.0.4。当时朋友找到我,我当时给他的建议是直接把报错的查询直接在数据库服务器上执行看是否报错,因为在服务器上用sqlplus来执行是没有经过监听程序的,但后面给我的答复是在服务器上执行也报错。所以就远程连接来处理。

查看alert.log文件发现如下错误信息:

Errors in file /u01/app/oracle/diag/rdbms/test/test/incident/incdir_156708/test_ora_15859922_i156708.trc

ORA-04030: 在尝试分配 123416 字节 (QERHJ hash-joi,kllcqas:kllsltba) 时进程内存不足

从alert.log文件查看到的非缺省值参数信息如下 :

System parameters with non-default values:
  processes                = 1500
  sessions                 = 2272
  _realfree_heap_pagesize_hint= 256K
  _use_realfree_heap       = TRUE
  spfile                   = "+DATA_DG/xtrk/spfilextrk.ora"
  sga_target               = 14G
  control_files            = "+DATA_DG/xtrk/controlfile/current.261.851142581"
  control_files            = "+DATA_DG/xtrk/controlfile/current.260.851142581"
  db_block_size            = 8192
  _external_scn_rejection_threshold_hours= 1
  compatible               = "11.2.0.4.0"
  log_archive_format       = "%t_%s_%r.dbf"
  cluster_database         = TRUE
  db_create_file_dest      = "+DATA_DG"
  db_recovery_file_dest    = "+DATA_DG"
  db_recovery_file_dest_size= 400G
  thread                   = 1
  undo_tablespace          = "UNDOTBS1"
  instance_number          = 1
  remote_login_passwordfile= "EXCLUSIVE"
  db_domain                = ""
  dispatchers              = "(PROTOCOL=TCP) (SERVICE=xtrkXDB)"
  remote_listener          = "test-scan:1521"
  audit_file_dest          = "/u01/app/oracle/admin/test/adump"
  audit_trail              = "DB"
  db_name                  = "test"
  open_cursors             = 600
  pga_aggregate_target     = 10G
  diagnostic_dest          = "/u01/app/oracle"

如是查看错误跟踪文件:

Dump continued from file: /u01/app/oracle/diag/rdbms/xtrk/xtrk1/trace/xtrk1_ora_15859922.trc
ORA-04030: 在尝试分配 123416 字节 (QERHJ hash-joi,kllcqas:kllsltba) 时进程内存不足

========= Dump for incident 156708 (ORA 4030) ========
----- Beginning of Customized Incident Dump(s) -----
=======================================
TOP 10 MEMORY USES FOR THIS PROCESS
---------------------------------------

*** 2014-08-27 15:07:09.388
77%   86 MB, 727 chunks: "kllcqas:kllsltba          "  SQL
         QERHJ hash-joi  ds=11160a378  dsprt=110bcb0f8
 7% 8449 KB,  80 chunks: "QERHJ Bit vector          "  SQL
         QERHJ hash-joi  ds=11160a378  dsprt=110bcb0f8
 6% 6771 KB,   4 chunks: "kllcqc:kllcqslt           "  SQL
         QERHJ hash-joi  ds=11160a378  dsprt=110bcb0f8
 5% 5880 KB, 407 chunks: "free memory               "  SQL
         QERHJ hash-joi  ds=11160a378  dsprt=110bcb0f8
 2% 2056 KB,   2 chunks: "kllcqgf:kllsltba          "  SQL
         kxs-heap-w      ds=110bcb0f8  dsprt=1109f0120
 1%  651 KB,  28 chunks: "permanent memory          "  
         pga heap        ds=110005210  dsprt=0
 0%  376 KB,   1 chunk : "kfk_kfkio_freeq           "  
         KFK_IO_SUBHEAP  ds=1108dae38  dsprt=110005210
 0%  336 KB,   1 chunk : "kfkosd_p                  "  
         KFK_IO_SUBHEAP  ds=1108dae38  dsprt=110005210
 0%  259 KB,   8 chunks: "free memory               "  
         pga heap        ds=110005210  dsprt=0
 0%  253 KB,   2 chunks: "free memory               "  
         top call heap   ds=11011f360  dsprt=0

从上面报错进程内存分配信息来看报错的chunks: “kllcqas:kllsltba ” 大小是86M

=======================================
PRIVATE MEMORY SUMMARY FOR THIS PROCESS
---------------------------------------
******************************************************
PRIVATE HEAP SUMMARY DUMP
111 MB total:
   110 MB commented, 652 KB permanent
   606 KB free (0 KB in empty extents),
     109 MB,   1 heap:    "session heap   "            92 KB free held
------------------------------------------------------
Summary of subheaps at depth 1
110 MB total:
   110 MB commented, 96 KB permanent
    94 KB free (0 KB in empty extents),
     108 MB,   1 heap:    "kxs-heap-w     "            84 KB free held
------------------------------------------------------
Summary of subheaps at depth 2
109 MB total:
   108 MB commented, 45 KB permanent
    32 KB free (4 KB in empty extents),
     106 MB,   4 heaps:   "QERHJ hash-joi "            10 KB free held
------------------------------------------------------
Summary of subheaps at depth 3
106 MB total:
   100 MB commented, 41 KB permanent
  5880 KB free (0 KB in empty extents),
      91 MB, 727 chunks:  "kllcqas:kllsltba          " 5257 KB free held
    8786 KB,  80 chunks:  "QERHJ Bit vector          " 336 KB free held

从111 MB total可知出错进程只分配了111M的内存

=========================================
REAL-FREE ALLOCATOR DUMP FOR THIS PROCESS
-----------------------------------------
 
Dump of Real-Free Memory Allocator Heap [0x110959bf0]
mag=0xfefe0001 flg=0x5000003 fds=0x0 blksz=262144
blkdstbl=0x110959c00, iniblk=8192 maxblk=524288 numsegs=15
In-use num=402 siz=116391936, Freeable num=0 siz=0, Free num=0 siz=0
 
==========================================
INSTANCE-WIDE PRIVATE MEMORY USAGE SUMMARY
------------------------------------------
 
Dumping Work Area Table (level=1)
=====================================
 
  Global SGA Info
  ---------------
 
    global target:    10240 MB
    auto target:       8997 MB
    max pga:           2047 MB
    pga limit:         4095 MB
    pga limit known:  0
    pga limit errors:     0
 
    pga inuse:          275 MB
    pga alloc:          343 MB
    pga freeable:        11 MB
    pga freed:         4150 MB
    pga to free:          0 
    broker request:       0
 
    pga auto:            32 MB
    pga manual:           0 MB
 
    pga alloc  (max):  3043 MB
    pga auto   (max):    32 MB
    pga manual (max):     0 MB
 
    # workareas     :     4
    # workareas(max):     5
 
 

================================
PER-PROCESS PRIVATE MEMORY USAGE
--------------------------------
 
Private memory usage per Oracle process

当出现ora-04030时该实例分配的总PGA大小如下

-------------------------
Top 10 processes:
-------------------------
(percentage is of 420 MB total allocated memory)
27% pid 51: 111 MB used of 112 MB allocated  < = CURRENT PROC
 5% pid 13: 18 MB used of 22 MB allocated 
 5% pid 14: 18 MB used of 22 MB allocated 
 5% pid 36: 18 MB used of 22 MB allocated 
 5% pid 38: 18 MB used of 22 MB allocated 
 5% pid 39: 18 MB used of 22 MB allocated 
 5% pid 40: 18 MB used of 22 MB allocated 
 4% pid 10: 16 MB used of 17 MB allocated 
 4% pid 12: 13 MB used of 17 MB allocated 
 4% pid 20: 12 MB used of 15 MB allocated 

从上面的percentage is of 420 MB total allocated memory可知这个实例的PGA只分配了420M。从27% pid 51: 111 MB used of 112 MB allocated < = CURRENT PROC可知当前进程分配了112M,使用了111M。而pid=51,那么接下来查看pid=51的详细会话信息:

------------------------------------
Begin session detail for pid 51
  sid: 430 ser: 5 audsid: 400022 user: 239/TC_RKXT
    flags: (0x45) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
    flags2: (0x40009) -/-/INC
  pid: 51 O/S info: user: grid, term: UNKNOWN, ospid: 15859922
    image: oracle@xtrkdb1
  client details:
    O/S info: user: Administrator, term: RKXXWEB, ospid: 4120:2956
    machine: RKXX\RKXXWEB program: plsqldev.exe
    application name: PL/SQL Developer, hash value=1190136663
    action name: SQL Window - New, hash value=3399691616
  current SQL:
  SELECT
 b.master_relation 户关系, b.hu_id_new 户号,a.name 姓名,a.pid 身份证号码,b.when_in_logged 入户时间, b.in_category 入户理由, b.dob,
a.photo_flag 有无相片, substr(ORGNAME,7,10) 所属派出所,
b.quxcun_id,c.name 居委会名称
from t_person a, t_huji b, tc_jcyw.t_org o ,tc_jcyw.t_quxcun c
where a.person_id = b.person_id
      and b.org_id = o.sunit_code
      and b.quxcun_id=c.quxcun_id
      and b.zxbz = '0'
      and a.dob < =to_date( '1996.07.01','yyyy.mm.dd' )
and not Exists(select * from t_pid_accept m where m.iscancel is null and m.pid = a.pid )
End session detail for pid 51
------------------------------------
 
================
SWAP INFORMATION
----------------
swap info: free_mem = 22025.57M rsv = 128.00M 
           alloc = 44.48M avail = 32768.00M swap_free = 32723.52M
----- End of Customized Incident Dump(s) -----

*** 2014-08-27 15:07:09.417
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- Current SQL Statement for this session (sql_id=1gbsquwh58w4k) -----
SELECT
 b.master_relation 户关系, b.hu_id_new 户号,a.name 姓名,a.pid 身份证号码,b.when_in_logged 入户时间, b.in_category 入户理由, b.dob,
a.photo_flag 有无相片, substr(ORGNAME,7,10) 所属派出所,
b.quxcun_id,c.name 居委会名称
from t_person a, t_huji b, tc_jcyw.t_org o ,tc_jcyw.t_quxcun c
where a.person_id = b.person_id
      and b.org_id = o.sunit_code
      and b.quxcun_id=c.quxcun_id
      and b.zxbz = '0'
      and a.dob <=to_date( '1996.07.01','yyyy.mm.dd' )
and not Exists(select * from t_pid_accept m where m.iscancel is null and m.pid = a.pid )

从上面的信息可知swap大小为32768M,SGA为14G,PGA为10G,当这个实例的总PGA为分配了420M,当这个会话进程只分配111M内存时不应该会报告ora-04030错误.这时我还是亲自在服务器上执行上面的报错查询发现不报错,而开始我要朋友做测试,他是要别人做的,但给我的答案是报错,看来还是得亲自测试。这种情况在MOS有一篇文章(ID 758131.1)就是描述这种情况的并且说在aix下是一个特定的问题。造成这个问题的原因可能是因为操作系统资源限制,说当CRS启动它的资源时是使用root用户可能会出现ora-04030错误,如果是资源通过sqlplus,lsnrctl来启动则没有操作系统资源限制。如果我查看/etc/security/limits文件内容如下:

# cat limits

*
* Sizes are in multiples of 512 byte blocks, CPU time is in seconds
*
* fsize      - soft file size in blocks
* core       - soft core file size in blocks
* cpu        - soft per process CPU time limit in seconds
* data       - soft data segment size in blocks
* stack      - soft stack segment size in blocks
* rss        - soft real memory usage in blocks
* nofiles    - soft file descriptor limit
* fsize_hard - hard file size in blocks
* core_hard  - hard core file size in blocks
* cpu_hard   - hard per process CPU time limit in seconds
* data_hard  - hard data segment size in blocks
* stack_hard - hard stack segment size in blocks
* rss_hard   - hard real memory usage in blocks
* nofiles_hard - hard file descriptor limit
*
* The following table contains the default hard values if the
* hard values are not explicitly defined:
*
*   Attribute        Value
*   ==========    ============
*   fsize_hard    set to fsize
*   cpu_hard      set to cpu
*   core_hard         -1
*   data_hard         -1
*   stack_hard      8388608 
*   rss_hard          -1
*   nofiles_hard      -1
*
* NOTE:  A value of -1 implies "unlimited"
*

default:
        fsize = 2097151
        core = 2097151
        cpu = -1
        data = 262144
#       rss = 65536
        rss = -1
        stack = 65536
        nofiles = 2000

root:
        fsize = -1
        data = -1
        stack = -1
        core = -1
        rss = -1
        nofiles = -1
        stack = -1
        stack_hard = -1
        data_hard = -1
        cpu = -1
        core_hard = -1
        cpu_hard = -1
        fsize_hard = -1

oracle:
        fsize = -1
        data = -1
        stack = -1
        core = -1
        rss = -1
        nofiles = -1
        stack = -1
        stack_hard = -1
        data_hard = -1
        cpu = -1
        core_hard = -1
        cpu_hard = -1
        fsize_hard = -1

发现我朋友已经修改了root用户和oracle用户的权限但为什么还是报ora-04030错误了。原因为在oracle 11g rac安装不像以前的版本,现在安装Grid Infrastructure 和Oracle数据库软件推荐使用不同的用户来安装。通过查看这里用grid用户来安装Grid Infrastructure ,oracle用户来安装的oracle数据库软。下面的limits文件朋友只修改了root,oracle两个用户,但并没有修改grid用户对操作系统资源的使用限制。而在rac中,监听程序是集群软件来启动的也就是由grid用户来管理的。由于修改grid用户对操作系统资源的使用限制由于在limits文件中增加以下内容:

grid:
        fsize = -1
        data = -1
        stack = -1
        core = -1
        rss = -1
        nofiles = -1
        stack = -1
        stack_hard = -1
        data_hard = -1
        cpu = -1
        core_hard = -1
        cpu_hard = -1
        fsize_hard = -1

由于重启rac,在重启之后再执行上面报错语句一切正常了,造成这个问题的根本原因是没有修改oracle,grid用户对操作系统资源的使用限制所造成的,而这个问题应该是可以避免的,因为对这种操作系统资源的使用限制其实在安装rac之前是要做修改的在安装文档中也是有的,所以安装也是需要仔细阅读官方文档的。

oracle IO性能问题故障诊断案例

一业务系统在白天业务时间出现了严重了IO性能问题,下面是下午业务高峰时间(3-5)的awr报告
1

从等待事件来看主要都是与IO相关
2

3

4

从上面可以看到除了几个语句的逻辑读很高,其实物理不是很高,每秒产生的重做日志以及物理读也不高.

检查磁盘IO

rx6600-1:[/]#sar -d 1 10

HP-UX rx6600-1 B.11.23 U ia64    07/15/14

16:18:45   device   %busy   avque   r+w/s  blks/s  avwait  avserv
16:18:46  c39t0d3  100.00    0.50      18    1130    0.00  132.11
          c41t0d3   83.50    0.50       6     450    0.00  290.78
16:18:47   c3t0d0    0.99    0.50       2      63    0.00    7.12
          c39t0d3   91.09    0.50      10     982    0.00  115.53
          c41t0d3  100.00    0.50      12     586    0.00  291.67
16:18:48   c3t0d0    3.03    0.50       2      32    0.00   15.93
          c39t0d3  100.00    0.50       9    1034    0.00  139.76
          c41t0d3   92.93    0.50       7     388    0.00  310.07
16:18:49   c3t0d0    2.00    0.50       4      64    0.00   19.59
          c39t0d3  100.00    0.50      12    1088    0.00  127.33
          c41t0d3   86.00    0.50       8     416    0.00  251.32
16:18:50   c3t0d0    1.01    0.50       1       2    0.00    8.99
          c39t0d3  100.00    0.50      16     954    0.00  117.10
          c41t0d3  100.00    0.50       9     614    0.00  295.52
16:18:51   c3t0d0    0.99    0.50       1       8    0.00   10.60
          c39t0d3   93.07    0.50      17     913    0.00  110.59
          c41t0d3  100.00    0.50       9     350    0.00  326.92
16:18:52  c39t0d3  100.00    0.50      21    1168    0.00  127.22
          c41t0d3   88.00    0.50      11     544    0.00  252.08
16:18:53   c3t0d0    2.02    0.50       3      48    0.00   18.51
          c39t0d3   88.89    0.50      19    1164    0.00   98.25
          c41t0d3  100.00    0.50      11     630    0.00  324.39
16:18:54   c3t0d0    3.00    0.50       3      20    0.00   12.39
          c39t0d3   95.00    0.50      20     954    0.00  131.90
          c41t0d3   81.00    0.50       9     610    0.00  289.05
16:18:55   c3t0d0    9.00    0.50      11     134    0.00    8.62
          c39t0d3  100.00    0.50      19    1090    0.00  137.20
          c41t0d3  100.00    0.50      11     512    0.00  327.16

Average   c39t0d3   99.50    0.50      16    1048    0.00  123.38
Average   c41t0d3  100.00    0.50       9     510    0.00  296.44
Average    c3t0d0    2.20    0.50       3      37    0.00   12.28
rx6600-1:[/]#sar -d 1 10

HP-UX rx6600-1 B.11.23 U ia64    07/15/14

16:20:04   device   %busy   avque   r+w/s  blks/s  avwait  avserv
16:20:05   c3t0d0    1.00    0.50       1      16    0.00    8.33
          c39t0d3   98.00    0.50      16     928    0.00  114.86
          c41t0d3   98.00    0.50      10     684    0.00  266.43
16:20:06   c3t0d0    1.98    0.50       4      81    0.00    8.57
          c39t0d3   93.07    0.50      19    1251    0.00  128.81
          c41t0d3   91.09    0.50       6     475    0.00  365.83
16:20:07   c3t0d0    2.00    0.50       3      48    0.00    5.87
          c39t0d3   98.00    0.50      23    1216    0.00  113.66
          c41t0d3   98.00    0.50       8     576    0.00  307.92
16:20:08   c3t0d0    1.00    0.50       2      32    0.00    5.36
          c39t0d3  100.00    0.50      21    1132    0.00  118.47
          c41t0d3  100.00    0.50       7     592    0.00  300.71
16:20:09   c3t0d0    6.00    0.58      13     194    2.22   26.05
          c39t0d3   89.00    0.50      17    1152    0.00  123.54
          c41t0d3   87.00    0.50       8     512    0.00  298.26
16:20:10   c3t0d0    3.00    0.50       6      96    0.00   22.78
          c39t0d3   85.00    0.50      17    1136    0.00  114.79
          c41t0d3   98.00    0.50       9     592    0.00  252.52
16:20:11   c3t0d0    1.00    0.50       1       2    0.00    8.04
          c39t0d3  100.00    0.50      17    1216    0.00  138.04
          c41t0d3  100.00    0.50      12     672    0.00  291.69
16:20:12   c3t0d0    2.00    0.50       3      34    0.00    9.24
          c39t0d3   99.00    0.50      16    1024    0.00  122.11
          c41t0d3   88.00    0.50       9     476    0.00  299.79
16:20:13  c39t0d3   91.00    0.50      18    1024    0.00  111.77
          c41t0d3   92.00    0.50       3     384    0.00  396.25
16:20:14  c39t0d3   99.00    0.50      17     892    0.00  132.15
          c41t0d3  100.00    0.50      10     608    0.00  233.54

Average    c3t0d0    1.80    0.53       3      50    0.87   17.64
Average   c39t0d3   96.00    0.50      18    1097    0.00  121.54
Average   c41t0d3  100.00    0.50       8     557    0.00  290.35

在业务人员下班后重启的双机软件,但在启动数据库时停在了Completed redo application这一步

SQL> startup
ORACLE instance started.

Total System Global Area 1.0318E+10 bytes
Fixed Size                  2073176 bytes
Variable Size            3238006184 bytes
Database Buffers         7063207936 bytes
Redo Buffers               14700544 bytes
Database mounted.

从alert.log文件中可以看到如下信息:

Tue Jul 15 22:23:29 2014
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Picked latch-free SCN scheme 3
Autotune of undo retention is turned on. 
IMODE=BR
ILAT =61
LICENSE_MAX_USERS = 0
SYS auditing is disabled
ksdpec: called for event 13740 prior to event group initialization
Starting up ORACLE RDBMS Version: 10.2.0.4.0.
System parameters with non-default values:
  processes                = 500
  sessions                 = 555
  __shared_pool_size       = 3154116608
  __large_pool_size        = 16777216
  __java_pool_size         = 33554432
  __streams_pool_size      = 33554432
  sga_target               = 10317987840
  control_files            = /sx_data/ORCL/control01.ctl, /sx_data/ORCL/control02.ctl, /sx_data/ORCL/control03.ctl
  db_block_size            = 8192
  __db_cache_size          = 7063207936
  compatible               = 10.2.0.3.0
  log_archive_dest_1       = LOCATION=/sx_data/arch_ORCL/
  log_archive_format       = %t_%s_%r.dbf
  db_file_multiblock_read_count= 16
  db_recovery_file_dest    = /oracle/flash_recovery_area
  db_recovery_file_dest_size= 2147483648
  undo_management          = AUTO
  undo_tablespace          = UNDOTBS1
  undo_retention           = 39600
  fast_start_parallel_rollback= FALSE
  remote_login_passwordfile= EXCLUSIVE
  db_domain                = 
  dispatchers              = (PROTOCOL=TCP) (SERVICE=ORCLXDB)
  local_listener           = ORCL
  job_queue_processes      = 10
  background_dump_dest     = /oracle/admin/ORCL/bdump
  user_dump_dest           = /oracle/admin/ORCL/udump
  core_dump_dest           = /oracle/admin/ORCL/cdump
  audit_file_dest          = /oracle/admin/ORCL/adump
  db_name                  = ORCL
  open_cursors             = 2000
  optimizer_index_cost_adj = 20
  optimizer_index_caching  = 90
  pga_aggregate_target     = 2576351232
PMON started with pid=2, OS id=13613
PSP0 started with pid=3, OS id=13615
MMAN started with pid=4, OS id=13617
DBW0 started with pid=5, OS id=13619
LGWR started with pid=6, OS id=13621
CKPT started with pid=7, OS id=13623
SMON started with pid=8, OS id=13625
RECO started with pid=9, OS id=13627
CJQ0 started with pid=10, OS id=13629
MMON started with pid=11, OS id=13631
Tue Jul 15 22:23:30 2014
starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
MMNL started with pid=12, OS id=13635
Tue Jul 15 22:23:30 2014
starting up 1 shared server(s) ...
Tue Jul 15 22:23:31 2014
ALTER DATABASE   MOUNT
Tue Jul 15 22:23:39 2014
Setting recovery target incarnation to 2
Tue Jul 15 22:23:42 2014
Successful mount of redo thread 1, with mount id 1380841571
Tue Jul 15 22:23:42 2014
Database mounted in Exclusive Mode
Completed: ALTER DATABASE   MOUNT
Tue Jul 15 22:23:42 2014
ALTER DATABASE OPEN
Tue Jul 15 22:23:47 2014
Beginning crash recovery of 1 threads
 parallel recovery started with 7 processes
Tue Jul 15 22:23:50 2014
Started redo scan
Tue Jul 15 22:23:52 2014
Completed redo scan
 336597 redo blocks read, 78835 data blocks need recovery
Tue Jul 15 22:23:52 2014
Started redo application at
 Thread 1: logseq 2270, block 29
Tue Jul 15 22:23:53 2014
Recovery of Online Redo Log: Thread 1 Group 4 Seq 2270 Reading mem 0
  Mem# 0: /sx_data/ORCL/redo04.log
Tue Jul 15 22:23:58 2014
Completed redo application

一直停在Completed redo application这,而这时的等待事件是checkpoint complete开始以为是并行恢复慢造成的,就查询了v$transaction,v$fast_start_transactions但视图中并没有进行恢复操作的事务存在.后来咨询了老熊,老熊说检查一下IO情况看是不是存储出问题了,如是再次检查存储IO性能:

rx6600-1:[/]#sar 1 10

HP-UX rx6600-1 B.11.23 U ia64    07/15/14

22:36:41    %usr    %sys    %wio   %idle
22:36:42       2       2      12      84
22:36:43       1       0      12      87
22:36:44       0       0      17      83
22:36:45       0       0      13      87
22:36:46       0       1      12      87
22:36:47       2       1      13      84
22:36:48       1       1      16      82
22:36:49       0       0      12      88
22:36:50       0       0      12      88
22:36:51       0       0      22      78

Average        1       0      14      85

从上面可以看到现在实际上并没有业务在跑居然还存在IO等待这是不正常的

rx6600-2:[/]#bdf
Filesystem          kbytes    used   avail %used Mounted on
/dev/vg00/lvol3     983040  422504  556176   43% /
/dev/vg00/lvol1    1835008  135048 1686776    7% /stand
/dev/vg00/lvol8    8912896 8535352  374824   96% /var
/dev/vg00/lvol7    7962624 2762312 5159704   35% /usr
/dev/vg00/lvol4     524288   83192  437784   16% /tmp
/dev/vg00/tmplv    2064384   93512 1847942    5% /oratmp
/dev/vg00/orasoft  10256384 3144652 6668390   32% /orasoft
/dev/vg00/oracle   20480000 5480497 14062042   28% /oracle
/dev/vg00/lvol6    9076736 5206384 3840128   58% /opt
/dev/vg00/lvol5     131072   25472  104824   20% /home
/dev/cwjcvg/cwjc_datalv
                   414973952 134188114 263239842   34% /cwjc_data
/dev/sxvg/sx_datalv
                   624689152 298665485 305654147   49% /sx_data

rx6600-2:[/]#time dd if=/dev/zero of=/var/test bs=8k count=100000

下面对小机自身的磁盘进行IO测试写800M的数据只要12秒左右

msgcnt 2 vxfs: mesg 001: vx_nospace - /dev/vg00/lvol8 file system full (1 block extent)
I/O error 
47185+0 records in
47184+1 records out

real       11.7
user        0.0
sys         0.8

但是对EMC存储进行IO测试写800M的数据只要30多分还没有完成

rx6600-2:[/]#time dd if=/dev/zero of=/sx_data/test bs=8k count=100000
711856+0 records in
711855+0 records out

real    30:58.4
user        0.5
sys        13.0

这明显的是存储出了问题,后面得知管理人员早上10点多发现了存储有一个磁盘损坏了,存储做的raid 5,有热备盘.而且还有上百G的数据进行存储级的同步.与出现性能问题的时间一至。

到此问题原因找到了解决起来也就简单了.幸亏问题解决了,第二天有大领导来检查要不就…….哈哈

TNS-01190故障的处理

由于监听程序原来是使用的是端口1532.现在修改成1521,结果不能启动说监听已经启动了.于是停止监听报错
TNS-01190: The user is not authorized to execute the requested listener command

[oracle@jyrac1 admin]$ lsnrctl start

LSNRCTL for Linux: Version 11.2.0.1.0 - Production on 03-JUL-2014 11:23:07

Copyright (c) 1991, 2009, Oracle.  All rights reserved.

TNS-01106: Listener using listener name LISTENER has already been started

[oracle@jyrac1 admin]$ lsnrctl stop

LSNRCTL for Linux: Version 11.2.0.1.0 - Production on 03-JUL-2014 11:23:24

Copyright (c) 1991, 2009, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=jyrac1)(PORT=1521)))
TNS-01190: The user is not authorized to execute the requested listener command

查看监听状态:

[grid@jyrac1 ~]$ ps -ef | grep -i listener
grid      4180     1  0 11:28 ?        00:00:00 /grid/11.2.0/grid/bin/tnslsnr LISTENER -inherit
grid      4517  4138  0 11:56 pts/1    00:00:00 grep -i listener
[oracle@jyrac1 ~]$ lsnrctl status

LSNRCTL for Linux: Version 11.2.0.1.0 - Production on 03-JUL-2014 11:24:25

Copyright (c) 1991, 2009, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=jyrac1)(PORT=1521)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 11.2.0.1.0 - Production
Start Date                03-JUL-2014 10:40:26
Uptime                    0 days 0 hr. 43 min. 59 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /grid/11.2.0/grid/network/admin/listener.ora
Listener Log File         /grid/11.2.0/grid/log/diag/tnslsnr/jyrac1/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=jyrac1)(PORT=1521)))
Services Summary...
Service "PLSExtProc" has 1 instance(s).
  Instance "PLSExtProc", status UNKNOWN, has 1 handler(s) for this service...
Service "jycs" has 1 instance(s).
  Instance "jycs", status READY, has 1 handler(s) for this service...
Service "jycs (ORACLE_HOME =/u01/app/oracle/11.2.0/db" has 1 instance(s).
  Instance "jycs", status UNKNOWN, has 1 handler(s) for this service...
Service "jycsXDB" has 1 instance(s).
  Instance "jycs", status READY, has 1 handler(s) for this service...
The command completed successfully

其中:Security ON: Local OS Authentication 此条提示信息表明监听处于Local OS Authentication认证模式
Oracle 10g版本以及之后的版本中推出了监听的本地操作系统认证安全特性.若监听程序是在当前用户下启动的,则当前用户具有
管理监听的所有权利,其他用户对监听的管理将受到限制

因为数据库是11.2.0.1而且使用了oracle restart特性且用户为grid.注册了listener服务且只对默认端口1521有效.之前是1532所以
oracle restart不会自动重启监听.由于将端口修改成了1521所以oracle restart自动重启了listener

[grid@jyrac1 ~]$ srvctl status listener
Listener LISTENER is enabled
Listener LISTENER is running on node(s): jyrac1

由于oracle restart 以grid用户自动启动了监听所以oracle用户不能重动由grid用户所启动的监听.

配置静态听sid_name大小写造成无法登录

配置静态监听时SID_NAME名字大小写造成登录失败.对于oracle数据库来说同样的名字不一样的大小写表示完全不同的数据库实例。一旦静态监听的实例名字与对应的数据库实例不一致时,便会出现无法连接数据库的问题。

由于原来的1521端口要给另一个实例使用,现在的这个实例要使用另外的端口客户的就使用静态监听在设置完重启监听后登录出错.

SQL> conn test/test@127
ERROR:
ORA-01034: ORACLE not available
ORA-27101: shared memory realm does not exist
Linux-x86_64 Error: 2: No such file or directory

查看oracle home目录和oracle_sid

[oracle@ggfwweb admin]$ echo $ORACLE_HOME
/u01/app/oracle/10gR2/db
[oracle@ggfwweb admin]$ echo $ORACLE_SID
hygeia
[oracle@ggfwweb admin]$ ps -ef | grep pmon
oracle   25830     1  0 09:39 ?        00:00:00 ora_pmon_hygeia
oracle   25950 25585  0 09:59 pts/1    00:00:00 grep pmon

查看监听文件文件

[oracle@ggfwweb admin]$ cd $ORACLE_HOME/network/admin
[oracle@ggfwweb admin]$ ls
listener.ora  listener.ora.bak  samples  shrept.lst  tnsnames.ora
[oracle@ggfwweb admin]$ cat listener.ora
# listener.ora Network Configuration File: /u01/app/oracle/10gR2/db/network/admin/listener.ora
# Generated by Oracle configuration tools.

SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (SID_NAME = PLSExtProc)
      (ORACLE_HOME = /u01/app/oracle/10gR2/db)
      (PROGRAM = extproc)
  )
    (SID_DESC = 
      (GLOBAL_DBNAME = HYGEIA)
      (ORACLE_HOME = /u01/app/oracle/10gR2/db)
      (SID_NAME = HYGEIA)
    )
  )

LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST =10.142.11.108)(PORT = 1568))
      (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC0))
    )
  )

可以看到监听文件中使用的是HYGEIA,而ORACLE_SID是hygeia

将SID_NAME=HYGEIA修改为SID_NAME=hygeia后重启监听

[oracle@ggfwweb admin]$ lsnrctl stop

LSNRCTL for Linux: Version 10.2.0.5.0 - Production on 03-JUL-2014 10:03:32

Copyright (c) 1991, 2010, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=10.142.11.108)(PORT=1568)))
The command completed successfully

[oracle@ggfwweb admin]$ lsnrctl start

LSNRCTL for Linux: Version 10.2.0.5.0 - Production on 03-JUL-2014 10:03:53

Copyright (c) 1991, 2010, Oracle.  All rights reserved.

Starting /u01/app/oracle/10gR2/db/bin/tnslsnr: please wait...

TNSLSNR for Linux: Version 10.2.0.5.0 - Production
System parameter file is /u01/app/oracle/10gR2/db/network/admin/listener.ora
Log messages written to /u01/app/oracle/10gR2/db/network/log/listener.log
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.142.11.108)(PORT=1568)))
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC0)))

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=10.142.11.108)(PORT=1568)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 10.2.0.5.0 - Production
Start Date                03-JUL-2014 10:03:53
Uptime                    0 days 0 hr. 0 min. 0 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/oracle/10gR2/db/network/admin/listener.ora
Listener Log File         /u01/app/oracle/10gR2/db/network/log/listener.log
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.142.11.108)(PORT=1568)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC0)))
Services Summary...
Service "PLSExtProc" has 1 instance(s).
  Instance "PLSExtProc", status UNKNOWN, has 1 handler(s) for this service...
Service "hygeia" has 1 instance(s).
  Instance "hygeia", status UNKNOWN, has 1 handler(s) for this service...
The command completed successfully
[oracle@ggfwweb admin]$ sqlplus /nolog

SQL*Plus: Release 10.2.0.5.0 - Production on Thu Jul 3 10:04:12 2014

Copyright (c) 1982, 2010, Oracle.  All Rights Reserved.

SQL> conn / as sysdba
Connected.
SQL> show parameter local_listener

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
local_listener                       string
SQL> conn test/test@127
Connected.

在配置静态监听时要注意数据库实例名本身是区分大小写的,因此在配置静态监听配置SID_NAME时一定要注意大小写

oracle 11g health monitor健康监控

health monitor健康监控
从oracle 11g开始,oracle数据库包含对数据库运行诊断检查的健康监控框架

health check健康检查是对数据库的各层和组件乾地检查.健康检查会检测文件错误,物理和逻辑块错误,undo和redo错误,数据字典错误等等.健康检查会生成一个检查报告在许多情况下包含了问题解决的建议.健康检查有两种运行方式:
.reactive—故障诊断架构为了响应一个严重错误会自动运行健康检查.

.manual—–作为一个dba,可以使用dbms_hm包或企业管理器接口来手工运行健康检查.

健康检查会存储调查结果,建议并在ADR中存储其它的信息

健康检查能以以下两种方式来运行:
DB-online模式意味着当数据库处于打开状态(open或mount)时可以运行健康检查

DB-offline模式意味着当实例不可用且数据库处于关闭状态(nomount)时也能运行健康检查

所有的健康检查都能在DB-online模式下运行.只有重做日志完整性检查和数据库结构完整检查可以在DB-offline模式下运行.

健康检查的类型
健康监控运行以下检查:
DB Structure Integrity Check(数据库结构完整性检查)–这种检查验证数据文件的完整性,如果这些文件不能访问,文件错误
或者不一致时会报告这些错误信息.如果数据库在mount或者open状态,这种检查会检查控制文件中所列出的重做日志文件和数据文件.如果数据库在nomount状态,只会检查控制文件.

Data Block Integrity Check(数据块完整性检查)–这种检查会检查磁盘镜像块错误比如checksum故障,head/tail mismatch和数据块的逻辑不一致性.大多数的错误可以通过使用block media recovery来进行修复.错误块信息也会被v$database_block_corruption视图所捕获.这种检查不会检测inter-block或inter-segment错误.

Redo Integrity Check(重做完整性检查)–这种检查将会扫描重做日志内容的可访问性和错误信息,也能对归档日志文件进行检查.
这种检查会报告归档日志或重做日志的错误信息.

Undo Segment Integrity Check(回滚段完整性检查)–这种检查会发现逻辑undo错误.在定位一个undo错误之后,这种检查将使用PMON和SMON来尝试恢复这个错误事务.如果恢复失败,health monitor将会存储关于这个错误的信息到v$corrupt_xid_list中.大多数undo错误都可以通过强制提交来解决.

Transaction Interity Check–这种检查与undo segment integrity check是相同的只是它只检查一特定的事务.

Dictionary Integrity Check–这种检查会检查核心字典对象比如tab$或col$的完整性.它将执行以下操作:
.验证每一个字典对象的字典条目内容

.执行cross-row级别的检查.它将对字典中的行强制执行逻辑约束验证

.执行对象的关系检查,它将在字典对象之间强制执行父子关系验证

Dictionary Integrity Check操作会检查以下字典对象:
tab$, clu$, fet$, uet$, seg$, undo$, ts$, file$, obj$, ind$, icol$, col$, user$,
con$, cdef$, ccol$, bootstrap$, objauth$, ugroup$, tsq$, syn$, view$, typed_
view$, superobj$, seq$, lob$, coltype$, subcoltype$, ntab$, refcon$, opqtype$,
dependency$, access$, viewcon$, icoldep$, dual$, sysauth$, objpriv$, defrole$,ecol$.

手工运行health check
健康监控提供了两种方式来手工运行健康检查:
.使用dbms_hm包

.使用企业管理器接口

使用dbms_hm来运行健康检查
dbms_hm包中有一个run_check过程用来运行健康检查.为了调用run_check需要提供检查的名称和运行的名字比如:

SQL> exec dbms_hm.run_check('Dictionary Integrity Check','jy_dict_run');

PL/SQL procedure successfully completed.

为了获得一个健康检查名称列表执行以下查询:

SQL> select name from v$hm_check where internal_check='N';
NAME
----------------------------------------------------------------
DB Structure Integrity Check

CF Block Integrity Check

Data Block Integrity Check

Redo Integrity Check

Transaction Integrity Check

Undo Segment Integrity Check

Dictionary Integrity Check

ASM Allocation Check

大多数的健康检查接受输入参数.可以通过查询v$hm_check_param来查看参数名和描述.一些参数是强制性的其它的是可选的.
如果一个可选参数被忽略将会使用缺省值.下面的查询将会显示所有健康检查的参数信息:

SQL> SELECT c.name check_name, p.name parameter_name, p.type,
  2  p.default_value, p.description
  3  FROM v$hm_check_param p, v$hm_check c
  4  WHERE p.check_id = c.id and c.internal_check = 'N'
  5  ORDER BY c.name;

CHECK_NAME                       PARAMETER_NAME       TYPE                 DEFAULT_VALUE    DESCRIPTION
-------------------------------- -------------------- -------------------- ---------------- ------------------------------
ASM Allocation Check             ASM_DISK_GRP_NAME    DBKH_PARAM_TEXT                       ASM 组名
CF Block Integrity Check         CF_BL_NUM            DBKH_PARAM_UB4                        控制文件块号
Data Block Integrity Check       BLC_DF_NUM           DBKH_PARAM_UB4                        文件号
Data Block Integrity Check       BLC_BL_NUM           DBKH_PARAM_UB4                        块号
Dictionary Integrity Check       CHECK_MASK           DBKH_PARAM_TEXT      ALL              检查掩码
Dictionary Integrity Check       TABLE_NAME           DBKH_PARAM_TEXT      ALL_CORE_TABLES  表名
Redo Integrity Check             SCN_TEXT             DBKH_PARAM_TEXT      0                最新良好重做的 SCN (如果已知)
Transaction Integrity Check      TXN_ID               DBKH_PARAM_TEXT                       事务处理 ID
Undo Segment Integrity Check     USN_NUMBER           DBKH_PARAM_TEXT                       还原段号

输入参数通过input_params参数以name/value对用分号来分隔进行传递.下面的例子用事务ID作为参数进行事务完整性检查:

SQL>BEGIN
DBMS_HM.RUN_CHECK (
check_name => 'Transaction Integrity Check',
run_name => 'my_trans_run',
input_params => 'TXN_ID=8.31.4');
END;
/

PL/SQL procedure successfully completed.

查看检查报告
在一个检查运行完后可以查看它的执行报告.这个可报告包括了调查结果,建议和其它的信息.也可以使用企业管理器,ADRCI工具,或者
dbms_hm包.其中企业管理器查看的报告格式为html,dbms_hm包查看的报告格式为html,xml和text,检查DRCL工个查看报告的格式为XML

检查运行的结果被存储在ADR,但报告不会立即生成.当你请求查看报告时可以使用dbms_hm或企业管理器来生成.如果报告不存在,首先
得用ADR中的检查结果数据来生成并以xml或html格式来存储.如果使用ADRCI工具如果报告文件不存必须首先运行命令来生成报告文件然后运行其它的命令来显示它的内容.

使用dbms_hm来查看检查报告
使用dbms_hm.get_run_report函数可以查看健康检查报告.这个函数可以以html,xml或text格式来显示.缺省格式为text:

SQL> SET LONG 100000
SQL> SET LONGCHUNKSIZE 1000
SQL> SET PAGESIZE 1000
SQL> SET LINESIZE 512
SQL> select dbms_hm.get_run_report('jy_dict_run') from dual;

DBMS_HM.GET_RUN_REPORT('JY_DICT_RUN')
--------------------------------------------------------------------------------
Run Name : JY_DICT_RUN
Run Id : 1061
Check Name : Data Block Integrity Check
Mode : REACTIVE
Status : COMPLETED
Start Time : 2007-05-12 22:11:02.032292 -07:00
End Time : 2007-05-12 22:11:20.835135 -07:00
Error Encountered : 0
Source Incident Id : 7418
Number of Incidents Created : 0
Input Paramters for the Run
BLC_DF_NUM=1
BLC_BL_NUM=64349
Run Findings And Recommendations
Finding
Finding Name : Media Block Corruption
Finding ID : 1065
Type : FAILURE
Status : OPEN
Priority : HIGH
Message : Block 64349 in datafile 1:
'/u01/app/oracle/dbs/t_db1.f' is media corrupt
Message : Object BMRTEST1 owned by SYS might be unavailable
Finding
Finding Name : Media Block Corruption
Finding ID : 1071
Type : FAILURE
Status : OPEN
Priority : HIGH
Message : Block 64351 in datafile 1:
'/u01/app/oracle/dbs/t_db1.f' is media corrupt
Message : Object BMRTEST2 owned by SYS might be unavailable

使用ADRCI工具来查看检查报告
1.确保操作系统环境变量(比如 ORACLE_HOME)已经被设置.并输入以下命令:

[oracle@jyrac1 ~]$ adrci

ADRCI: Release 11.2.0.1.0 - Production on Wed Jun 11 17:18:53 2014

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

ADR base = "/u01/app/oracle"

工具启动后会显示以下提示符
adrci>

可以改变当前的ADR home目录.使用show homes命令可以列出所有的ADR home目录,set homepath命令用来改变当前ADR home目录.

adrci> show homes
ADR Homes:
diag/tnslsnr/jyrac1/listener
diag/rdbms/cs/cs
diag/rdbms/jytest/jytest
diag/rdbms/jy/jy
diag/rdbms/jycs/jycs
adrci> set homepath diag/rdbms/jycs/jycs

2.输入以下命令:
show hm_run
这个命令会列出所有运行过并(存储在v$hm_run)注册到ADR档案库中的检查

adrci> show hm_run

ADR Home = /u01/app/oracle/diag/tnslsnr/jyrac1/listener:
*************************************************************************
0 rows fetched
<adr_relation name="">
<adr_home name="/u01/app/oracle/diag/tnslsnr/jyrac1/listener">

ADR Home = /u01/app/oracle/diag/rdbms/cs/cs:
*************************************************************************
0 rows fetched

<adr_home name="/u01/app/oracle/diag/rdbms/cs/cs">

ADR Home = /u01/app/oracle/diag/rdbms/jytest/jytest:
*************************************************************************
0 rows fetched

<adr_home name="/u01/app/oracle/diag/rdbms/jytest/jytest">

ADR Home = /u01/app/oracle/diag/rdbms/jy/jy:
*************************************************************************
0 rows fetched

<adr_home name="/u01/app/oracle/diag/rdbms/jy/jy">

ADR Home = /u01/app/oracle/diag/rdbms/jycs/jycs:
*************************************************************************

**********************************************************
HM RUN RECORD 1
**********************************************************
   RUN_ID                        1
   RUN_NAME                      HM_RUN_1
   CHECK_NAME                    DB Structure Integrity Check
   NAME_ID                       2
   MODE                          2
   START_TIME                    2014-03-24 13:32:58.872509 +08:00
   RESUME_TIME                   
   END_TIME                      2014-03-24 13:33:01.710518 +08:00
   MODIFIED_TIME                 2014-03-24 13:33:01.710518 +08:00
   TIMEOUT                       0
   FLAGS                         0
   STATUS                        5
   SRC_INCIDENT_ID               0
   NUM_INCIDENTS                 0
   ERR_NUMBER                    0
   REPORT_FILE                   

**********************************************************
HM RUN RECORD 2
**********************************************************
   RUN_ID                        61
   RUN_NAME                      HM_RUN_61
   CHECK_NAME                    DB Structure Integrity Check
   NAME_ID                       2
   MODE                          2
   START_TIME                    2014-04-08 10:22:43.295203 +08:00
   RESUME_TIME                   
   END_TIME                      2014-04-08 10:22:43.723241 +08:00
   MODIFIED_TIME                 2014-04-08 10:22:43.723241 +08:00
   TIMEOUT                       0
   FLAGS                         0
   STATUS                        5
   SRC_INCIDENT_ID               0
   NUM_INCIDENTS                 0
   ERR_NUMBER                    0
   REPORT_FILE                   

**********************************************************
HM RUN RECORD 3
**********************************************************
   RUN_ID                        81
   RUN_NAME                      jy_dict_run
   CHECK_NAME                    Dictionary Integrity Check
   NAME_ID                       24
   MODE                          0
   START_TIME                    2014-06-11 16:42:00.675293 +08:00
   RESUME_TIME                   
   END_TIME                      2014-06-11 16:42:02.950141 +08:00
   MODIFIED_TIME                 2014-06-11 17:14:33.658642 +08:00
   TIMEOUT                       0
   FLAGS                         0
   STATUS                        5
   SRC_INCIDENT_ID               0
   NUM_INCIDENTS                 0
   ERR_NUMBER                    0
   REPORT_FILE                   /u01/app/oracle/diag/rdbms/jycs/jycs/hm/HMREPORT_jy_dict_run.hm

3.定位你要创建报告的检查并注意检查运行的名称.如果检查报告已经存在那么report_file字段就会包含一个文件名比如上面的
/u01/app/oracle/diag/rdbms/jycs/jycs/hm/HMREPORT_jy_dict_run.hm.否则使用下面的命令来生成一个报告.

adrci> create report hm_run jy_dict_run_1
DIA-48448: This command does not support multiple ADR homes

这是因为当前存在多个ADR home目录

adrci> show homes
ADR Homes:
diag/tnslsnr/jyrac1/listener
diag/rdbms/cs/cs
diag/rdbms/jytest/jytest
diag/rdbms/jy/jy
diag/rdbms/jycs/jycs
adrci> set homepath diag/rdbms/jycs/jycs
adrci> create report hm_run jy_dict_run_1

4.执行以下命令来查看报告:

adrci> show report hm_run jy_dict_run_1
< ?xml version="1.0" encoding="US-ASCII"?>HM Report: jy_dict_run_1
    <run_info>
        <check_name>Dictionary Integrity Check
        <run_id>101
        <run_name>jy_dict_run_1
        <run_mode>MANUAL
        <run_status>COMPLETED
        <run_error_num>0
        <source_incident_id>0
        <num_incidents_created>0
        <run_start_time>2014-06-11 17:27:13.477462 +08:00
        <run_end_time>2014-06-11 17:27:14.734166 +08:00
    
    <run_parameters>
        <run_parameter>TABLE_NAME=ALL_CORE_TABLES
        <run_parameter>CHECK_MASK=ALL
    
    

health monitor视图
可以用视图来代替检查报告来查看特定检查的结果.可用的视图有v$hm_run,v$hm_finding,v$hm_recommendation
下面查询v$hm_run来判断运行检查的确良历史信息:

SQL> SELECT run_id, name, check_name, run_mode, src_incident FROM v$hm_run;

    RUN_ID NAME                             CHECK_NAME                       RUN_MODE SRC_INCIDENT
---------- -------------------------------- -------------------------------- -------- ------------
        61 HM_RUN_61                        DB Structure Integrity Check     REACTIVE            0
        81 jy_dict_run                      Dictionary Integrity Check       MANUAL              0
       101 jy_dict_run_1                    Dictionary Integrity Check       MANUAL              0
         1 HM_RUN_1                         DB Structure Integrity Check     REACTIVE            0

下面查询v$hm_finding来获得RUN_ID 61的详细信息:

SELECT type, description FROM v$hm_finding WHERE run_id = 61;
TYPE          DESCRIPTION
------------- -----------------------------------------------------------------------------
FAILURE      Block 64349 in datafile 1: '/u01/app/oracle/dbs/t_db1.f' is media corrupt
FAILURE      Block 64351 in datafile 1: '/u01/app/oracle/dbs/t_db1.f' is media corrupt

health check参数

Table 9–6 Parameters for Data Block Integrity Check
------------------------------------------------------------------------------------------------------------------
Parameter Name                Type                  Default Value           Description
------------------------------------------------------------------------------------------------------------------
BLC_DF_NUM                    Number                (none)                  Block datafile number
BLC_BL_NUM                    Number                (none)                  Data block number
------------------------------------------------------------------------------------------------------------------


Table 9–7 Parameters for Redo Integrity Check
------------------------------------------------------------------------------------------------------------------
Parameter Name                Type                   Default Value          Description
------------------------------------------------------------------------------------------------------------------
SCN_TEXT                      Text                   0                      SCN of the latest good redo (if known)
------------------------------------------------------------------------------------------------------------------


Table 9–8 Parameters for Undo Segment Integrity Check
------------------------------------------------------------------------------------------------------------------
Parameter Name                Type                   Default Value          Description
------------------------------------------------------------------------------------------------------------------
USN_NUMBER                    Text                   (none)                 Undo segment number
------------------------------------------------------------------------------------------------------------------


Table 9–9 Parameters for Transaction Integrity Check
------------------------------------------------------------------------------------------------------------------
Parameter Name                Type                   Default Value          Description
------------------------------------------------------------------------------------------------------------------              
TXN_ID                        Text                   (none)                 Transaction ID
------------------------------------------------------------------------------------------------------------------

Table 9–10 Parameters for Dictionary Integrity Check
------------------------------------------------------------------------------------------------------------------
Parameter Name                Type                   Default Value          Description
------------------------------------------------------------------------------------------------------------------
CHECK_MASK                    Text                   ALL                    Possible values are:
                                                                            COLUMN_CHECKS—Run column
                                                                            checks only. Verify column-level
                                                                            constraints in the core tables.

                                                                            ROW_CHECKS—Run row checks only.
                                                                            Verify row-level constraints in the
                                                                            core tables.

                                                                            REFERENTIAL_CHECKS—Run
                                                                            referential checks only. Verify
                                                                            referential constraints in the core
                                                                            tables.
                       
                                                                            ALL—Run all checks.

TABLE_NAME                    Text                   ALL_CORE_TABLES        Name of a single core table to check. If
                                                                            omitted, all core tables are checked.
------------------------------------------------------------------------------------------------------------------