health monitor健康监控
从oracle 11g开始,oracle数据库包含对数据库运行诊断检查的健康监控框架
health check健康检查是对数据库的各层和组件乾地检查.健康检查会检测文件错误,物理和逻辑块错误,undo和redo错误,数据字典错误等等.健康检查会生成一个检查报告在许多情况下包含了问题解决的建议.健康检查有两种运行方式:
.reactive—故障诊断架构为了响应一个严重错误会自动运行健康检查.
.manual—–作为一个dba,可以使用dbms_hm包或企业管理器接口来手工运行健康检查.
健康检查会存储调查结果,建议并在ADR中存储其它的信息
健康检查能以以下两种方式来运行:
DB-online模式意味着当数据库处于打开状态(open或mount)时可以运行健康检查
DB-offline模式意味着当实例不可用且数据库处于关闭状态(nomount)时也能运行健康检查
所有的健康检查都能在DB-online模式下运行.只有重做日志完整性检查和数据库结构完整检查可以在DB-offline模式下运行.
健康检查的类型
健康监控运行以下检查:
DB Structure Integrity Check(数据库结构完整性检查)–这种检查验证数据文件的完整性,如果这些文件不能访问,文件错误
或者不一致时会报告这些错误信息.如果数据库在mount或者open状态,这种检查会检查控制文件中所列出的重做日志文件和数据文件.如果数据库在nomount状态,只会检查控制文件.
Data Block Integrity Check(数据块完整性检查)–这种检查会检查磁盘镜像块错误比如checksum故障,head/tail mismatch和数据块的逻辑不一致性.大多数的错误可以通过使用block media recovery来进行修复.错误块信息也会被v$database_block_corruption视图所捕获.这种检查不会检测inter-block或inter-segment错误.
Redo Integrity Check(重做完整性检查)–这种检查将会扫描重做日志内容的可访问性和错误信息,也能对归档日志文件进行检查.
这种检查会报告归档日志或重做日志的错误信息.
Undo Segment Integrity Check(回滚段完整性检查)–这种检查会发现逻辑undo错误.在定位一个undo错误之后,这种检查将使用PMON和SMON来尝试恢复这个错误事务.如果恢复失败,health monitor将会存储关于这个错误的信息到v$corrupt_xid_list中.大多数undo错误都可以通过强制提交来解决.
Transaction Interity Check–这种检查与undo segment integrity check是相同的只是它只检查一特定的事务.
Dictionary Integrity Check–这种检查会检查核心字典对象比如tab$或col$的完整性.它将执行以下操作:
.验证每一个字典对象的字典条目内容
.执行cross-row级别的检查.它将对字典中的行强制执行逻辑约束验证
.执行对象的关系检查,它将在字典对象之间强制执行父子关系验证
Dictionary Integrity Check操作会检查以下字典对象:
tab$, clu$, fet$, uet$, seg$, undo$, ts$, file$, obj$, ind$, icol$, col$, user$,
con$, cdef$, ccol$, bootstrap$, objauth$, ugroup$, tsq$, syn$, view$, typed_
view$, superobj$, seq$, lob$, coltype$, subcoltype$, ntab$, refcon$, opqtype$,
dependency$, access$, viewcon$, icoldep$, dual$, sysauth$, objpriv$, defrole$,ecol$.
手工运行health check
健康监控提供了两种方式来手工运行健康检查:
.使用dbms_hm包
.使用企业管理器接口
使用dbms_hm来运行健康检查
dbms_hm包中有一个run_check过程用来运行健康检查.为了调用run_check需要提供检查的名称和运行的名字比如:
SQL> exec dbms_hm.run_check('Dictionary Integrity Check','jy_dict_run'); PL/SQL procedure successfully completed.
为了获得一个健康检查名称列表执行以下查询:
SQL> select name from v$hm_check where internal_check='N'; NAME ---------------------------------------------------------------- DB Structure Integrity Check CF Block Integrity Check Data Block Integrity Check Redo Integrity Check Transaction Integrity Check Undo Segment Integrity Check Dictionary Integrity Check ASM Allocation Check
大多数的健康检查接受输入参数.可以通过查询v$hm_check_param来查看参数名和描述.一些参数是强制性的其它的是可选的.
如果一个可选参数被忽略将会使用缺省值.下面的查询将会显示所有健康检查的参数信息:
SQL> SELECT c.name check_name, p.name parameter_name, p.type, 2 p.default_value, p.description 3 FROM v$hm_check_param p, v$hm_check c 4 WHERE p.check_id = c.id and c.internal_check = 'N' 5 ORDER BY c.name; CHECK_NAME PARAMETER_NAME TYPE DEFAULT_VALUE DESCRIPTION -------------------------------- -------------------- -------------------- ---------------- ------------------------------ ASM Allocation Check ASM_DISK_GRP_NAME DBKH_PARAM_TEXT ASM 组名 CF Block Integrity Check CF_BL_NUM DBKH_PARAM_UB4 控制文件块号 Data Block Integrity Check BLC_DF_NUM DBKH_PARAM_UB4 文件号 Data Block Integrity Check BLC_BL_NUM DBKH_PARAM_UB4 块号 Dictionary Integrity Check CHECK_MASK DBKH_PARAM_TEXT ALL 检查掩码 Dictionary Integrity Check TABLE_NAME DBKH_PARAM_TEXT ALL_CORE_TABLES 表名 Redo Integrity Check SCN_TEXT DBKH_PARAM_TEXT 0 最新良好重做的 SCN (如果已知) Transaction Integrity Check TXN_ID DBKH_PARAM_TEXT 事务处理 ID Undo Segment Integrity Check USN_NUMBER DBKH_PARAM_TEXT 还原段号
输入参数通过input_params参数以name/value对用分号来分隔进行传递.下面的例子用事务ID作为参数进行事务完整性检查:
SQL>BEGIN DBMS_HM.RUN_CHECK ( check_name => 'Transaction Integrity Check', run_name => 'my_trans_run', input_params => 'TXN_ID=8.31.4'); END; / PL/SQL procedure successfully completed.
查看检查报告
在一个检查运行完后可以查看它的执行报告.这个可报告包括了调查结果,建议和其它的信息.也可以使用企业管理器,ADRCI工具,或者
dbms_hm包.其中企业管理器查看的报告格式为html,dbms_hm包查看的报告格式为html,xml和text,检查DRCL工个查看报告的格式为XML
检查运行的结果被存储在ADR,但报告不会立即生成.当你请求查看报告时可以使用dbms_hm或企业管理器来生成.如果报告不存在,首先
得用ADR中的检查结果数据来生成并以xml或html格式来存储.如果使用ADRCI工具如果报告文件不存必须首先运行命令来生成报告文件然后运行其它的命令来显示它的内容.
使用dbms_hm来查看检查报告
使用dbms_hm.get_run_report函数可以查看健康检查报告.这个函数可以以html,xml或text格式来显示.缺省格式为text:
SQL> SET LONG 100000 SQL> SET LONGCHUNKSIZE 1000 SQL> SET PAGESIZE 1000 SQL> SET LINESIZE 512 SQL> select dbms_hm.get_run_report('jy_dict_run') from dual; DBMS_HM.GET_RUN_REPORT('JY_DICT_RUN') -------------------------------------------------------------------------------- Run Name : JY_DICT_RUN Run Id : 1061 Check Name : Data Block Integrity Check Mode : REACTIVE Status : COMPLETED Start Time : 2007-05-12 22:11:02.032292 -07:00 End Time : 2007-05-12 22:11:20.835135 -07:00 Error Encountered : 0 Source Incident Id : 7418 Number of Incidents Created : 0 Input Paramters for the Run BLC_DF_NUM=1 BLC_BL_NUM=64349 Run Findings And Recommendations Finding Finding Name : Media Block Corruption Finding ID : 1065 Type : FAILURE Status : OPEN Priority : HIGH Message : Block 64349 in datafile 1: '/u01/app/oracle/dbs/t_db1.f' is media corrupt Message : Object BMRTEST1 owned by SYS might be unavailable Finding Finding Name : Media Block Corruption Finding ID : 1071 Type : FAILURE Status : OPEN Priority : HIGH Message : Block 64351 in datafile 1: '/u01/app/oracle/dbs/t_db1.f' is media corrupt Message : Object BMRTEST2 owned by SYS might be unavailable
使用ADRCI工具来查看检查报告
1.确保操作系统环境变量(比如 ORACLE_HOME)已经被设置.并输入以下命令:
[oracle@jyrac1 ~]$ adrci ADRCI: Release 11.2.0.1.0 - Production on Wed Jun 11 17:18:53 2014 Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved. ADR base = "/u01/app/oracle"
工具启动后会显示以下提示符
adrci>
可以改变当前的ADR home目录.使用show homes命令可以列出所有的ADR home目录,set homepath命令用来改变当前ADR home目录.
adrci> show homes ADR Homes: diag/tnslsnr/jyrac1/listener diag/rdbms/cs/cs diag/rdbms/jytest/jytest diag/rdbms/jy/jy diag/rdbms/jycs/jycs adrci> set homepath diag/rdbms/jycs/jycs
2.输入以下命令:
show hm_run
这个命令会列出所有运行过并(存储在v$hm_run)注册到ADR档案库中的检查
adrci> show hm_run ADR Home = /u01/app/oracle/diag/tnslsnr/jyrac1/listener: ************************************************************************* 0 rows fetched <adr_relation name=""> <adr_home name="/u01/app/oracle/diag/tnslsnr/jyrac1/listener"> ADR Home = /u01/app/oracle/diag/rdbms/cs/cs: ************************************************************************* 0 rows fetched <adr_home name="/u01/app/oracle/diag/rdbms/cs/cs"> ADR Home = /u01/app/oracle/diag/rdbms/jytest/jytest: ************************************************************************* 0 rows fetched <adr_home name="/u01/app/oracle/diag/rdbms/jytest/jytest"> ADR Home = /u01/app/oracle/diag/rdbms/jy/jy: ************************************************************************* 0 rows fetched <adr_home name="/u01/app/oracle/diag/rdbms/jy/jy"> ADR Home = /u01/app/oracle/diag/rdbms/jycs/jycs: ************************************************************************* ********************************************************** HM RUN RECORD 1 ********************************************************** RUN_ID 1 RUN_NAME HM_RUN_1 CHECK_NAME DB Structure Integrity Check NAME_ID 2 MODE 2 START_TIME 2014-03-24 13:32:58.872509 +08:00 RESUME_TIME END_TIME 2014-03-24 13:33:01.710518 +08:00 MODIFIED_TIME 2014-03-24 13:33:01.710518 +08:00 TIMEOUT 0 FLAGS 0 STATUS 5 SRC_INCIDENT_ID 0 NUM_INCIDENTS 0 ERR_NUMBER 0 REPORT_FILE ********************************************************** HM RUN RECORD 2 ********************************************************** RUN_ID 61 RUN_NAME HM_RUN_61 CHECK_NAME DB Structure Integrity Check NAME_ID 2 MODE 2 START_TIME 2014-04-08 10:22:43.295203 +08:00 RESUME_TIME END_TIME 2014-04-08 10:22:43.723241 +08:00 MODIFIED_TIME 2014-04-08 10:22:43.723241 +08:00 TIMEOUT 0 FLAGS 0 STATUS 5 SRC_INCIDENT_ID 0 NUM_INCIDENTS 0 ERR_NUMBER 0 REPORT_FILE ********************************************************** HM RUN RECORD 3 ********************************************************** RUN_ID 81 RUN_NAME jy_dict_run CHECK_NAME Dictionary Integrity Check NAME_ID 24 MODE 0 START_TIME 2014-06-11 16:42:00.675293 +08:00 RESUME_TIME END_TIME 2014-06-11 16:42:02.950141 +08:00 MODIFIED_TIME 2014-06-11 17:14:33.658642 +08:00 TIMEOUT 0 FLAGS 0 STATUS 5 SRC_INCIDENT_ID 0 NUM_INCIDENTS 0 ERR_NUMBER 0 REPORT_FILE /u01/app/oracle/diag/rdbms/jycs/jycs/hm/HMREPORT_jy_dict_run.hm
3.定位你要创建报告的检查并注意检查运行的名称.如果检查报告已经存在那么report_file字段就会包含一个文件名比如上面的
/u01/app/oracle/diag/rdbms/jycs/jycs/hm/HMREPORT_jy_dict_run.hm.否则使用下面的命令来生成一个报告.
adrci> create report hm_run jy_dict_run_1 DIA-48448: This command does not support multiple ADR homes
这是因为当前存在多个ADR home目录
adrci> show homes ADR Homes: diag/tnslsnr/jyrac1/listener diag/rdbms/cs/cs diag/rdbms/jytest/jytest diag/rdbms/jy/jy diag/rdbms/jycs/jycs adrci> set homepath diag/rdbms/jycs/jycs adrci> create report hm_run jy_dict_run_1
4.执行以下命令来查看报告:
adrci> show report hm_run jy_dict_run_1 < ?xml version="1.0" encoding="US-ASCII"?>HM Report: jy_dict_run_1 <run_info> <check_name>Dictionary Integrity Check <run_id>101 <run_name>jy_dict_run_1 <run_mode>MANUAL <run_status>COMPLETED <run_error_num>0 <source_incident_id>0 <num_incidents_created>0 <run_start_time>2014-06-11 17:27:13.477462 +08:00 <run_end_time>2014-06-11 17:27:14.734166 +08:00 <run_parameters> <run_parameter>TABLE_NAME=ALL_CORE_TABLES <run_parameter>CHECK_MASK=ALL
health monitor视图
可以用视图来代替检查报告来查看特定检查的结果.可用的视图有v$hm_run,v$hm_finding,v$hm_recommendation
下面查询v$hm_run来判断运行检查的确良历史信息:
SQL> SELECT run_id, name, check_name, run_mode, src_incident FROM v$hm_run; RUN_ID NAME CHECK_NAME RUN_MODE SRC_INCIDENT ---------- -------------------------------- -------------------------------- -------- ------------ 61 HM_RUN_61 DB Structure Integrity Check REACTIVE 0 81 jy_dict_run Dictionary Integrity Check MANUAL 0 101 jy_dict_run_1 Dictionary Integrity Check MANUAL 0 1 HM_RUN_1 DB Structure Integrity Check REACTIVE 0
下面查询v$hm_finding来获得RUN_ID 61的详细信息:
SELECT type, description FROM v$hm_finding WHERE run_id = 61; TYPE DESCRIPTION ------------- ----------------------------------------------------------------------------- FAILURE Block 64349 in datafile 1: '/u01/app/oracle/dbs/t_db1.f' is media corrupt FAILURE Block 64351 in datafile 1: '/u01/app/oracle/dbs/t_db1.f' is media corrupt
health check参数
Table 9–6 Parameters for Data Block Integrity Check ------------------------------------------------------------------------------------------------------------------ Parameter Name Type Default Value Description ------------------------------------------------------------------------------------------------------------------ BLC_DF_NUM Number (none) Block datafile number BLC_BL_NUM Number (none) Data block number ------------------------------------------------------------------------------------------------------------------ Table 9–7 Parameters for Redo Integrity Check ------------------------------------------------------------------------------------------------------------------ Parameter Name Type Default Value Description ------------------------------------------------------------------------------------------------------------------ SCN_TEXT Text 0 SCN of the latest good redo (if known) ------------------------------------------------------------------------------------------------------------------ Table 9–8 Parameters for Undo Segment Integrity Check ------------------------------------------------------------------------------------------------------------------ Parameter Name Type Default Value Description ------------------------------------------------------------------------------------------------------------------ USN_NUMBER Text (none) Undo segment number ------------------------------------------------------------------------------------------------------------------ Table 9–9 Parameters for Transaction Integrity Check ------------------------------------------------------------------------------------------------------------------ Parameter Name Type Default Value Description ------------------------------------------------------------------------------------------------------------------ TXN_ID Text (none) Transaction ID ------------------------------------------------------------------------------------------------------------------ Table 9–10 Parameters for Dictionary Integrity Check ------------------------------------------------------------------------------------------------------------------ Parameter Name Type Default Value Description ------------------------------------------------------------------------------------------------------------------ CHECK_MASK Text ALL Possible values are: COLUMN_CHECKS—Run column checks only. Verify column-level constraints in the core tables. ROW_CHECKS—Run row checks only. Verify row-level constraints in the core tables. REFERENTIAL_CHECKS—Run referential checks only. Verify referential constraints in the core tables. ALL—Run all checks. TABLE_NAME Text ALL_CORE_TABLES Name of a single core table to check. If omitted, all core tables are checked. ------------------------------------------------------------------------------------------------------------------