ASM Rebalance
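With a traditional logical volume manager, growing or shrinking a striped file system is usually difficult. With ASM, such disk changes invoke a redistribution (rebalance) that restripes the data seamlessly, and the operation can be performed online. Any change to the storage configuration (adding, dropping, or resizing a disk) triggers a rebalance operation. ASM does not dynamically move data around "hot areas" or "hot extents". Because ASM distributes extents across all disks, and the database buffer cache keeps small chunks of data from becoming hot areas on disk, hot disks and hot extents are eliminated.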
Rebalance Operation
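A rebalance operation always yields an even distribution of file extents and space usage across all disks in the disk group. Rebalancing is performed per file, which ensures that every file is spread evenly across all disks; this is the key to how ASM balances the I/O load. The ASM background process RBAL manages the rebalance operation. RBAL examines the extent map of each file, and the extents are redistributed evenly according to the new storage configuration. For example, take a disk group of eight disks holding a data file with 40 extents (five extents per disk). When two disks of the same size are added, the data file is rebalanced across all 10 disks so that each disk holds four extents. Only eight extents need to move to complete the rebalance: a complete redistribution is unnecessary, and only the minimum number of extents is moved to reach an even distribution.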
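The arithmetic in the example above can be checked with a short Python sketch. It is a toy model, not ASM's actual algorithm: the function name plan_rebalance is hypothetical, and it assumes all disks are the same size and that new disks start empty.

```python
def plan_rebalance(extents_per_disk, new_disks):
    """Return (target_layout, extents_moved) after adding empty disks.

    Toy model only: equal-sized disks, one file, and every extent that
    exceeds a disk's target is moved exactly once.
    """
    layout = list(extents_per_disk) + [0] * new_disks
    base, extra = divmod(sum(layout), len(layout))
    # Leave any leftover extents on the fullest disks so fewer extents move.
    order = sorted(range(len(layout)), key=lambda i: layout[i], reverse=True)
    target = [base] * len(layout)
    for i in order[:extra]:
        target[i] += 1
    moves = sum(max(0, cur - tgt) for cur, tgt in zip(layout, target))
    return target, moves


# Eight disks with five extents each, then two empty disks are added.
target, moves = plan_rebalance([5] * 8, new_disks=2)
print(target)  # [4, 4, 4, 4, 4, 4, 4, 4, 4, 4]
print(moves)   # 8
```

Each of the eight original disks gives up one extent, which matches the eight moves stated in the text.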
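Disk size and file size are the weighting factors for a rebalance: a larger disk receives more extents. An ASM rebalance operation proceeds through the following workflow:
1. On the ASM instance, the DBA adds a disk to, or drops a disk from, a disk group.
2. The RBAL process is invoked to create a rebalance plan and then begins scheduling the redistribution.
3. RBAL estimates the time and work required for the task, then sends processing requests to the ASM rebalance (ARBx) processes. The number of ARBx processes invoked is determined directly by the init.ora parameter asm_power_limit or by the power level specified in the add, drop, or rebalance command.
4. The Continuing Operations Directory (COD) is updated to reflect the rebalance activity. The COD matters when an in-flux rebalance fails: instance recovery finds the outstanding COD entry for the rebalance and restarts it.
5. RBAL distributes the plan to the ARBs. In general, RBAL generates one plan per file; a large file, however, may be split among several ARBs.
6. The ARBx processes rebalance the extents. Each extent is locked, relocated, and unlocked. A locked extent can still be read. Writes can also proceed, but may have to be reissued against the new location. While this runs, v$asm_operation shows the operation as REBAL.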
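The size weighting mentioned before the workflow (a larger disk receives more extents) can be illustrated with a small Python sketch. This assumes a purely proportional split, which is a simplification chosen for illustration; the function name weighted_targets and the disk sizes are hypothetical.

```python
def weighted_targets(total_extents, disk_sizes):
    """Split total_extents across disks in proportion to disk size."""
    total_size = sum(disk_sizes)
    targets = [total_extents * size // total_size for size in disk_sizes]
    # Integer division can drop a few extents; give them to the largest disks.
    leftover = total_extents - sum(targets)
    by_size = sorted(range(len(disk_sizes)),
                     key=lambda i: disk_sizes[i], reverse=True)
    for i in by_size[:leftover]:
        targets[i] += 1
    return targets


# Two 100 GB disks and one 200 GB disk sharing 40 extents.
print(weighted_targets(40, [100, 100, 200]))  # [10, 10, 20]
```

The disk that is twice as large ends up with twice as many extents, so all disks fill at the same relative rate.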
The test procedure is as follows:
1. Check the asm_power_limit parameter setting
SQL> show parameter asm_power

NAME                                 TYPE                   VALUE
------------------------------------ ---------------------- ------------------------------
asm_power_limit                      integer                1
2. Add a disk to disk group datadg
SQL> alter diskgroup datadg add disk '/dev/raw/raw5';

Diskgroup altered.
3. Check alert_+ASM1.log
SQL> alter diskgroup datadg add disk '/dev/raw/raw5'
Thu Dec 01 15:39:18 CST 2016
NOTE: reconfiguration of group 1/0x489bd291 (DATADG), full=1
Thu Dec 01 15:39:18 CST 2016
NOTE: initializing header on grp 1 disk DATADG_0001
NOTE: cache opening disk 1 of grp 1: DATADG_0001 path:/dev/raw/raw5
NOTE: requesting all-instance disk validation for group=1
Thu Dec 01 15:39:18 CST 2016
NOTE: disk validation pending for group 1/0x489bd291 (DATADG)
SUCCESS: validated disks for 1/0x489bd291 (DATADG)
Thu Dec 01 15:39:21 CST 2016
NOTE: PST update: grp = 1
NOTE: requesting all-instance membership refresh for group=1
Thu Dec 01 15:39:21 CST 2016
NOTE: membership refresh pending for group 1/0x489bd291 (DATADG)
SUCCESS: refreshed membership for 1/0x489bd291 (DATADG)
Thu Dec 01 15:39:27 CST 2016
NOTE: starting rebalance of group 1/0x489bd291 (DATADG) at power 1
Starting background process ARB0
ARB0 started with pid=19, OS id=21560
Thu Dec 01 15:39:27 CST 2016
NOTE: assigning ARB0 to group 1/0x489bd291 (DATADG)
Thu Dec 01 15:39:31 CST 2016
NOTE: X->S down convert bast on F1B3 bastCount=2
NOTE: X->S down convert bast on F1B3 bastCount=3
NOTE: X->S down convert bast on F1B3 bastCount=4
NOTE: X->S down convert bast on F1B3 bastCount=5
NOTE: X->S down convert bast on F1B3 bastCount=6
NOTE: X->S down convert bast on F1B3 bastCount=7
Thu Dec 01 15:40:34 CST 2016
NOTE: stopping process ARB0
Thu Dec 01 15:40:37 CST 2016
SUCCESS: rebalance completed for group 1/0x489bd291 (DATADG)
Thu Dec 01 15:40:37 CST 2016
SUCCESS: rebalance completed for group 1/0x489bd291 (DATADG)
NOTE: PST update: grp = 1
NOTE: PST update: grp = 1
Thu Dec 01 15:48:29 CST 2016
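With the default asm_power_limit=1, the alert log shows that adding one disk to the disk group and rebalancing took about 1 minute 19 seconds: the command was logged at 2016-12-01 15:39:18 and the rebalance completed at 2016-12-01 15:40:37. (The final 15:48:29 timestamp in the log belongs to the next operation, not to this rebalance.)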
To specify the rebalance power manually, proceed as follows:
1. Add a disk to disk group datadg
SQL> alter diskgroup datadg add disk '/dev/raw/raw6' rebalance power 4;

Diskgroup altered.
2. Check alert_+ASM1.log
SQL> alter diskgroup datadg add disk '/dev/raw/raw6' rebalance power 4
Thu Dec 01 15:48:30 CST 2016
NOTE: reconfiguration of group 1/0x489bd291 (DATADG), full=1
Thu Dec 01 15:48:30 CST 2016
NOTE: initializing header on grp 1 disk DATADG_0002
NOTE: cache opening disk 2 of grp 1: DATADG_0002 path:/dev/raw/raw6
NOTE: requesting all-instance disk validation for group=1
Thu Dec 01 15:48:30 CST 2016
NOTE: disk validation pending for group 1/0x489bd291 (DATADG)
SUCCESS: validated disks for 1/0x489bd291 (DATADG)
Thu Dec 01 15:48:33 CST 2016
NOTE: PST update: grp = 1
NOTE: requesting all-instance membership refresh for group=1
Thu Dec 01 15:48:33 CST 2016
NOTE: membership refresh pending for group 1/0x489bd291 (DATADG)
SUCCESS: refreshed membership for 1/0x489bd291 (DATADG)
Thu Dec 01 15:48:39 CST 2016
NOTE: starting rebalance of group 1/0x489bd291 (DATADG) at power 4
Starting background process ARB0
Starting background process ARB1
ARB0 started with pid=19, OS id=25110
Thu Dec 01 15:48:39 CST 2016
Starting background process ARB2
ARB1 started with pid=21, OS id=25114
Thu Dec 01 15:48:39 CST 2016
Starting background process ARB3
ARB2 started with pid=22, OS id=25119
Thu Dec 01 15:48:40 CST 2016
NOTE: assigning ARB0 to group 1/0x489bd291 (DATADG)
ARB3 started with pid=23, OS id=25121
Thu Dec 01 15:48:40 CST 2016
NOTE: assigning ARB1 to group 1/0x489bd291 (DATADG)
NOTE: assigning ARB2 to group 1/0x489bd291 (DATADG)
NOTE: assigning ARB3 to group 1/0x489bd291 (DATADG)
Thu Dec 01 15:48:47 CST 2016
NOTE: X->S down convert bast on F1B3 bastCount=8
NOTE: X->S down convert bast on F1B3 bastCount=9
NOTE: X->S down convert bast on F1B3 bastCount=10
NOTE: X->S down convert bast on F1B3 bastCount=11
NOTE: X->S down convert bast on F1B3 bastCount=12
NOTE: X->S down convert bast on F1B3 bastCount=13
Thu Dec 01 15:49:21 CST 2016
NOTE: stopping process ARB1
NOTE: stopping process ARB2
NOTE: stopping process ARB0
NOTE: stopping process ARB3
Thu Dec 01 15:49:25 CST 2016
SUCCESS: rebalance completed for group 1/0x489bd291 (DATADG)
Thu Dec 01 15:49:25 CST 2016
SUCCESS: rebalance completed for group 1/0x489bd291 (DATADG)
NOTE: PST update: grp = 1
NOTE: PST update: grp = 1
With the rebalance power set manually to 4, adding one disk to the disk group and rebalancing took just under 1 minute (from 2016-12-01 15:48:30 to 2016-12-01 15:49:25).
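The elapsed time can be read straight off the alert-log timestamps. The Python sketch below is one way to do that; the helper name elapsed_seconds is hypothetical, and it assumes the timestamps use the 10.2-style format seen in this log, with the literal time zone string CST and English day and month names.

```python
from datetime import datetime

# Format of the timestamp lines in this alert log, e.g.
# "Thu Dec 01 15:48:30 CST 2016". "CST" is matched as literal text.
ALERT_TS_FORMAT = "%a %b %d %H:%M:%S CST %Y"


def elapsed_seconds(start, end):
    """Return whole seconds between two alert-log timestamp lines."""
    t0 = datetime.strptime(start, ALERT_TS_FORMAT)
    t1 = datetime.strptime(end, ALERT_TS_FORMAT)
    return int((t1 - t0).total_seconds())


# First and last timestamps of the power 4 run shown above.
print(elapsed_seconds("Thu Dec 01 15:48:30 CST 2016",
                      "Thu Dec 01 15:49:25 CST 2016"))  # 55
```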
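For guidance on choosing the number of rebalance processes, see the document "Oracle Sun Database Machine High Availability Best Practices (Doc ID 1069521.1)".
Each ARB process invoked for a rebalance operation creates its own ARB trace file. These trace files are located in a subdirectory of the diagnostic (DIAG) directory; in this 10.2.0.5 environment that is the bdump directory. The contents of a trace file look like this: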
/u01/app/oracle/admin/+ASM/bdump/+asm1_arb0_25110.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/10.2.0/db
System name: Linux
Node name: jyrac3
Release: 2.6.18-164.el5PAE
Version: #1 SMP Tue Aug 18 15:59:11 EDT 2009
Machine: i686
Instance name: +ASM1
Redo thread mounted by this instance: 0
Oracle process number: 19
Unix process pid: 25110, image: oracle@jyrac3 (ARB0)
*** SERVICE NAME:() 2016-12-01 15:48:40.086
*** SESSION ID:(34.29) 2016-12-01 15:48:40.086
ARB0 relocating file +DATADG.2.1 (1 entries)
ARB0 relocating file +DATADG.256.926895041 (34 entries)
*** 2016-12-01 15:48:58.473
ARB0 relocating file +DATADG.257.926895043 (1 entries)
ARB0 relocating file +DATADG.258.926895047 (16 entries)
ARB0 relocating file +DATADG.259.926895047 (1 entries)
ARB0 relocating file +DATADG.260.926895413 (6 entries)
ARB0 relocating file +DATADG.261.926895419 (18 entries)
*** 2016-12-01 15:49:08.569
ARB0 relocating file +DATADG.262.926895423 (19 entries)
ARB0 relocating file +DATADG.263.926895443 (8 entries)
ARB0 relocating file +DATADG.264.926895475 (33 entries)
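Changing the ASM rebalance power after a rebalance has started
The default rebalance power is taken from the asm_power_limit parameter, whose default value is 1. The higher the rebalance power, the faster the rebalance is likely to complete. A lower power may make the rebalance take much longer, but it consumes fewer CPU and I/O resources.
The power value ranges from 0 to 11: 0 stops the rebalance and 11 is the fastest. Starting with Oracle 11.2.0.2, if the disk group attribute compatible.asm is set to 11.2.0.2 or higher, the range is 0 to 1024. The asm_power_limit parameter can be changed dynamically, but a new value only affects subsequent rebalance operations, not one that is already running. To change the power of a rebalance that has already started, run a command of the following form: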
alter diskgroup <diskgroup_name> rebalance power <n>;
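The power ranges described above can be captured in a small validation helper, for example in a script that builds the command. This is a minimal Python sketch: the names max_power and check_power are hypothetical, and it assumes the ASM software itself is at least 11.2.0.2 whenever compatible.asm is.

```python
def max_power(compatible_asm):
    """Return the highest rebalance power allowed for a compatible.asm value."""
    version = tuple(int(part) for part in compatible_asm.split("."))
    # Pad short values so that "11.2" compares like "11.2.0.0".
    version += (0,) * (4 - len(version))
    return 1024 if version >= (11, 2, 0, 2) else 11


def check_power(power, compatible_asm):
    """Return power unchanged if it is in range, else raise ValueError."""
    limit = max_power(compatible_asm)
    if not 0 <= power <= limit:
        raise ValueError(f"power {power} is outside 0..{limit}")
    return power


print(max_power("10.2.0.5"))       # 11
print(max_power("11.2.0.2"))       # 1024
print(check_power(8, "10.2.0.5"))  # 8
```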
The test went as follows:
SQL> show parameter asm_power_limit

NAME                                 TYPE                   VALUE
------------------------------------ ---------------------- ------------------------------
asm_power_limit                      integer                1

SQL> alter diskgroup datadg add disk '/dev/raw/raw5','/dev/raw/raw6';

Diskgroup altered.

SQL> select * from v$asm_operation;

GROUP_NUMBER OPERATION  STATE         POWER     ACTUAL      SOFAR   EST_WORK   EST_RATE EST_MINUTES
------------ ---------- -------- ---------- ---------- ---------- ---------- ---------- -----------
           1 REBAL      RUN               1          1          2        772         60          12
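The EST_MINUTES value in this output is consistent with dividing the remaining work by the estimated rate. The Python sketch below reproduces that arithmetic; it assumes EST_WORK and SOFAR count the same units, that EST_RATE is per minute, and that the result is truncated. The helper name est_minutes is hypothetical.

```python
def est_minutes(sofar, est_work, est_rate):
    """Estimate remaining minutes as (EST_WORK - SOFAR) / EST_RATE, truncated."""
    if est_rate <= 0:
        return None  # no rate measured yet, so no estimate
    return (est_work - sofar) // est_rate


# Values from the v$asm_operation row above.
print(est_minutes(sofar=2, est_work=772, est_rate=60))  # 12
```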
Check alert_+ASM1.log
Thu Dec 01 16:47:43 CST 2016
NOTE: reconfiguration of group 1/0x489bd291 (DATADG), full=1
Thu Dec 01 16:47:43 CST 2016
NOTE: initializing header on grp 1 disk DATADG_0001
NOTE: initializing header on grp 1 disk DATADG_0002
NOTE: cache opening disk 1 of grp 1: DATADG_0001 path:/dev/raw/raw5
NOTE: cache opening disk 2 of grp 1: DATADG_0002 path:/dev/raw/raw6
NOTE: requesting all-instance disk validation for group=1
Thu Dec 01 16:47:43 CST 2016
NOTE: disk validation pending for group 1/0x489bd291 (DATADG)
SUCCESS: validated disks for 1/0x489bd291 (DATADG)
Thu Dec 01 16:47:47 CST 2016
NOTE: PST update: grp = 1
NOTE: requesting all-instance membership refresh for group=1
Thu Dec 01 16:47:48 CST 2016
NOTE: membership refresh pending for group 1/0x489bd291 (DATADG)
SUCCESS: refreshed membership for 1/0x489bd291 (DATADG)
Thu Dec 01 16:47:54 CST 2016
NOTE: starting rebalance of group 1/0x489bd291 (DATADG) at power 1
Starting background process ARB0
ARB0 started with pid=18, OS id=16007
Thu Dec 01 16:47:54 CST 2016
NOTE: assigning ARB0 to group 1/0x489bd291 (DATADG)
Thu Dec 01 16:48:00 CST 2016
NOTE: X->S down convert bast on F1B3 bastCount=27
NOTE: X->S down convert bast on F1B3 bastCount=28
NOTE: X->S down convert bast on F1B3 bastCount=29
NOTE: X->S down convert bast on F1B3 bastCount=30
NOTE: X->S down convert bast on F1B3 bastCount=31
NOTE: X->S down convert bast on F1B3 bastCount=32
NOTE: X->S down convert bast on F1B3 bastCount=33
NOTE: X->S down convert bast on F1B3 bastCount=34
NOTE: X->S down convert bast on F1B3 bastCount=35
NOTE: X->S down convert bast on F1B3 bastCount=36
Thu Dec 01 16:48:09 CST 2016
The message "Starting background process ARB0" shows that only one ARB process was started, because asm_power_limit is 1.
SQL> alter diskgroup datadg rebalance power 8;

Diskgroup altered.
Check alert_+ASM1.log
Thu Dec 01 16:48:09 CST 2016
ERROR: ORA-1013 thrown in ARB0 for group number 1
Thu Dec 01 16:48:09 CST 2016
Errors in file /u01/app/oracle/admin/+ASM/bdump/+asm1_arb0_16007.trc:
ORA-01013: user requested cancel of current operation
Thu Dec 01 16:48:09 CST 2016
NOTE: stopping process ARB0
Thu Dec 01 16:48:12 CST 2016
NOTE: rebalance interrupted for group 1/0x489bd291 (DATADG)
Thu Dec 01 16:48:12 CST 2016
NOTE: PST update: grp = 1
NOTE: requesting all-instance membership refresh for group=1
Thu Dec 01 16:48:12 CST 2016
NOTE: membership refresh pending for group 1/0x489bd291 (DATADG)
SUCCESS: refreshed membership for 1/0x489bd291 (DATADG)
Thu Dec 01 16:48:18 CST 2016
NOTE: starting rebalance of group 1/0x489bd291 (DATADG) at power 8
Starting background process ARB0
Starting background process ARB1
ARB0 started with pid=18, OS id=16133
Thu Dec 01 16:48:19 CST 2016
Starting background process ARB2
ARB1 started with pid=19, OS id=16135
Thu Dec 01 16:48:19 CST 2016
Starting background process ARB3
ARB2 started with pid=21, OS id=16142
Thu Dec 01 16:48:19 CST 2016
Starting background process ARB4
ARB3 started with pid=22, OS id=16144
Thu Dec 01 16:48:19 CST 2016
Starting background process ARB5
ARB4 started with pid=23, OS id=16146
Thu Dec 01 16:48:19 CST 2016
Starting background process ARB6
ARB5 started with pid=24, OS id=16148
Thu Dec 01 16:48:20 CST 2016
Starting background process ARB7
ARB6 started with pid=25, OS id=16150
Thu Dec 01 16:48:20 CST 2016
NOTE: assigning ARB0 to group 1/0x489bd291 (DATADG)
ARB7 started with pid=26, OS id=16157
Thu Dec 01 16:48:20 CST 2016
NOTE: assigning ARB1 to group 1/0x489bd291 (DATADG)
NOTE: assigning ARB2 to group 1/0x489bd291 (DATADG)
NOTE: assigning ARB3 to group 1/0x489bd291 (DATADG)
NOTE: assigning ARB4 to group 1/0x489bd291 (DATADG)
NOTE: assigning ARB5 to group 1/0x489bd291 (DATADG)
NOTE: assigning ARB6 to group 1/0x489bd291 (DATADG)
NOTE: assigning ARB7 to group 1/0x489bd291 (DATADG)
Thu Dec 01 16:48:48 CST 2016
NOTE: stopping process ARB5
NOTE: stopping process ARB2
NOTE: stopping process ARB7
NOTE: stopping process ARB1
NOTE: stopping process ARB6
Thu Dec 01 16:49:01 CST 2016
NOTE: stopping process ARB0
Thu Dec 01 16:49:11 CST 2016
NOTE: stopping process ARB4
NOTE: stopping process ARB3
Thu Dec 01 16:49:14 CST 2016
SUCCESS: rebalance completed for group 1/0x489bd291 (DATADG)
Thu Dec 01 16:49:14 CST 2016
SUCCESS: rebalance completed for group 1/0x489bd291 (DATADG)
NOTE: PST update: grp = 1
NOTE: PST update: grp = 1
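The message "NOTE: stopping process ARB0", logged together with ORA-01013 and "rebalance interrupted", shows that running alter diskgroup datadg rebalance power 8 terminated the rebalance process started earlier; eight ARB processes were then started to complete the rebalance.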
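Counting the ARB processes per rebalance can also be done mechanically. The Python sketch below scans alert-log text for the two message types quoted in this article; the function name arb_counts is hypothetical, the sample is an abbreviated copy of the log lines above, and the code assumes one log entry per line.

```python
import re


def arb_counts(alert_log_text):
    """Return [(power, ARB processes started), ...] in log order."""
    runs = []
    for raw in alert_log_text.splitlines():
        line = raw.strip()
        started = re.search(r"starting rebalance of group .* at power (\d+)", line)
        if started:
            # A new rebalance begins; count the ARB starts that follow it.
            runs.append([int(started.group(1)), 0])
        elif runs and re.match(r"Starting background process ARB\w+", line):
            runs[-1][1] += 1
    return [tuple(run) for run in runs]


# Abbreviated excerpt: the power 1 run, the interruption, then the power 8 run.
sample = "\n".join(
    ["NOTE: starting rebalance of group 1/0x489bd291 (DATADG) at power 1",
     "Starting background process ARB0",
     "NOTE: rebalance interrupted for group 1/0x489bd291 (DATADG)",
     "NOTE: starting rebalance of group 1/0x489bd291 (DATADG) at power 8"]
    + [f"Starting background process ARB{n}" for n in range(8)]
)
print(arb_counts(sample))  # [(1, 1), (8, 8)]
```

In both runs the number of ARB processes equals the power level, which matches the observation in the text.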