控制文件
ckpt的一项任务是更新数据文件头和控制文件,记录检查点信息,这些信息对于数据库的恢复和完整性校验都至关重要.下面来看一下控制文件和数据文件头都记录了哪些信息.通过以下内部命令可以转储oracle的数据文件头信息:
alter session set events ‘immediate trace name file_hdrs level 10’;
首先以immediate方式关闭数据库,在mount状态下执行该命令,研究一下此时转储的文件头信息:
SQL> shutdown immediate; Database closed. Database dismounted. ORACLE instance shut down. SQL> startup mount ORACLE instance started. Total System Global Area 167772160 bytes Fixed Size 1218316 bytes Variable Size 67111156 bytes Database Buffers 96468992 bytes Redo Buffers 2973696 bytes Database mounted. SQL> alter session set events 'immediate trace name file_hdrs level 10'; Session altered. SQL> select 2 d.value||'/'||lower(rtrim(i.instance, 3 chr(0)))||'_ora_'||p.spid||'.trc' trace_file_name 4 from ( select p.spid 5 from v$mystat m, 6 v$session s,v$process p 7 where m.statistic# = 1 and s.sid = m.sid and p.addr = s.paddr) p, 8 ( select t.instance from v$thread t,v$parameter v 9 where v.name = 'thread' and 10 (v.value = 0 or t.thread# = to_number(v.value))) i, 11 ( select value from v$parameter 12 where name = 'user_dump_dest') d 13 / TRACE_FILE_NAME -------------------------------------------------------------------------------- /u01/app/oracle/admin/jingyong/udump/jingyong_ora_4606.trc
查看跟踪文件信息,选取一个文件的信息(这里选择userso1.dbf文件),这类trace文件的信息包含两个部分,一部分来自控制文件,另一部分来自数据文件:
DATA FILE #4: (name #5) /u01/app/oracle/product/10.2.0/oradata/jingyong/users01.dbf creation size=0 block size=8192 status=0xe head=5 tail=5 dup=1 tablespace 4, index=4 krfil=4 prev_file=0 unrecoverable scn: 0x0000.00000000 01/01/1988 00:00:00 Checkpoint cnt:192 scn: 0x0000.0011601b 01/18/2013 06:31:30 Stop scn: 0x0000.0011601b 01/18/2013 06:31:30 Creation Checkpointed at scn: 0x0000.00002946 06/30/2005 19:10:40 ...... V10 STYLE. FILE HEADER: Compatibility Vsn = 169869568=0xa200100 Db ID=3172629284=0xbd1a7b24, Db Name='JINGYONG' Activation ID=0=0x0 Control Seq=4511=0x119f, File size=6400=0x1900 File Number=4, Blksiz=8192, File Type=3 DATA Tablespace #4 - USERS rel_fn:4 Creation at scn: 0x0000.00002946 06/30/2005 19:10:40 Backup taken at scn: 0x0000.000d3f79 01/06/2013 09:20:39 thread:1 reset logs count:0x2fecc5c7 scn: 0x0000.000e487c reset logs terminal rcv data:0x0 scn: 0x0000.00000000 prev reset logs count:0x2fecb1fc scn: 0x0000.000df5e6 prev reset logs terminal rcv data:0x0 scn: 0x0000.00000000 recovered at 01/14/2013 13:30:37 status:0x0 root dba:0x00000000 chkpt cnt: 192 ctl cnt:191 begin-hot-backup file size: 640 Checkpointed at scn: 0x0000.0011601b 01/18/2013 06:31:30 thread:1 rba:(0x9.1a0.10) .....
其中”FILE HEADER”开始的信息就是来自数据文件头,之前的相关内容来自控制文件,在mount状态下将users01.dbf文件移除,重新转储数据文件头:
With the Partitioning, OLAP and Data Mining options [oracle@jingyong ~]$ sqlplus / as sysdba SQL*Plus: Release 10.2.0.1.0 - Production on Fri Jan 18 06:37:49 2013 Copyright (c) 1982, 2005, Oracle. All rights reserved. Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production With the Partitioning, OLAP and Data Mining options SQL> ! mv /u01/app/oracle/product/10.2.0/oradata/jingyong/users01.dbf /u01/app/oracle/product/10.2.0/oradata/jingyong/users01.dbf.bak SQL> alter session set events 'immediate trace name file_hdrs level 10'; Session altered. SQL> select 2 d.value||'/'||lower(rtrim(i.instance, 3 chr(0)))||'_ora_'||p.spid||'.trc' trace_file_name 4 from ( select p.spid 5 from v$mystat m, 6 v$session s,v$process p 7 where m.statistic# = 1 and s.sid = m.sid and p.addr = s.paddr) p, 8 ( select t.instance from v$thread t,v$parameter v 9 where v.name = 'thread' and 10 (v.value = 0 or t.thread# = to_number(v.value))) i, 11 ( select value from v$parameter 12 where name = 'user_dump_dest') d 13 / TRACE_FILE_NAME -------------------------------------------------------------------------------- /u01/app/oracle/admin/jingyong/udump/jingyong_ora_4620.trc
检查现在生成的跟踪文件可以看到,由于users01.dbf文件丢失,”FILE HEADER”部分信息将无法获得:
DATA FILE #4: (name #5) /u01/app/oracle/product/10.2.0/oradata/jingyong/users01.dbf creation size=0 block size=8192 status=0xe head=5 tail=5 dup=1 tablespace 4, index=4 krfil=4 prev_file=0 unrecoverable scn: 0x0000.00000000 01/01/1988 00:00:00 Checkpoint cnt:192 scn: 0x0000.0011601b 01/18/2013 06:31:30 Stop scn: 0x0000.0011601b 01/18/2013 06:31:30 Creation Checkpointed at scn: 0x0000.00002946 06/30/2005 19:10:40 thread:0 rba:(0x0.0.0) ...... ORA-01157: cannot identify/lock data file 4 - see DBWR trace file ORA-01110: data file 4: '/u01/app/oracle/product/10.2.0/oradata/jingyong/users01.dbf' *** Error 1157 in open/read file # 4 ***
此时报出的错误信息是,文件无法找到,也就是说当执行trace file_hdrs时需要读取数据文件头,获得相关信息,回过头来看一下来自控制文件部分的信息,其中包含:
Checkpoint cnt:192 scn: 0x0000.0011601b 01/18/2013 06:31:30
在”FILE HEADER”部分信息中包含了如下部分:
status:0x0 root dba:0x00000000 chkpt cnt: 192 ctl cnt:191
begin-hot-backup file size: 640
Checkpointed at scn: 0x0000.0011601b 01/18/2013 06:31:30
其中控制文件中记录的scn指最后一次成功完成的检查点scn;数据文件头中记录的checkpointed at scn指数据文件头中记录的最后一次成功完成的检查点scn;这两者在正常情况下是相等的.此外在控制文件和数据文件头都记录一个检查点计数(chkpt cnt或checkpoint cnt)而且数据文件头还记录了一个控制文件检查点计数(ctl cnt),在以上输出中ctl cnt:191比控制文件中的checkpoint cnt192要小1,这是因为当检查点更新控制文件和数据文件头上的chkpt cnt/checkpoint cnt信息时,在更新控制文件之前,可以获得当前的控制文件的clt cnt,这个信息被记入到数据文件头中,也就是ctlcnt:191,为什么要写这个ctl cnt到数据文件头了.是因为不能保证当前更新控制文件上的checkpoint cnt一定会成功(数据库可能突然crash了),记录之前成功的ctl cnt可以确保上一次的checkpoint是成功完成的,从而节了校验步骤.
数据库的启动验证
在数据库启动过程中的检验包含以下两个步骤;
第一步检查数据文件头中的checkpoint cnt是否与对应的控制文件中的checkpoint cnt一致.如果相等,则进行第二步检查.
第二步检查数据文件头的开始scn和对应的控制文件中的结束scn是否一致,如果控制文件中的结束scn等于数据文件头中的开始scn,则不需要对那个文件进行恢复.
对于每个数据文件都要完成检查后才打开数据库,同时将每个数据文件的结束的scn设置为无穷大也就是0xFFFFFFF.FFFFF
当使用alter session set events ‘immediate trace name file_hdrs level 10’来转储数据文件头信息时,oracle会转储两部分信息,一部分来自控制文件,另一部分来自数据文件,在数据库启动过程中,这两部分信息要用来进行启动验证.通过以下过程来进一步深入探讨一下这部分内容.
首先来看一下来自mount状态控制文件部分转储;
DATA FILE #4: (name #5) /u01/app/oracle/product/10.2.0/oradata/jingyong/users01.dbf creation size=0 block size=8192 status=0xe head=5 tail=5 dup=1 tablespace 4, index=4 krfil=4 prev_file=0 unrecoverable scn: 0x0000.00000000 01/01/1988 00:00:00 Checkpoint cnt:192 scn: 0x0000.0011601b 01/18/2013 06:31:30 Stop scn: 0x0000.0011601b 01/18/2013 06:31:30 Creation Checkpointed at scn: 0x0000.00002946 06/30/2005 19:10:40 thread:0 rba:(0x0.0.0) 这部分中包含的重要信息有检查点计数(ckeckpoint cnt:192),检查点scn (scn: 0x0000.0011601b 01/18/2013 06:31:30)和数据文件Stop scn( top scn: 0x0000.0011601b 01/18/2013 06:31:30).
接下来再来看来自数据文件头的信息:
V10 STYLE. FILE HEADER: Compatibility Vsn = 169869568=0xa200100 Db ID=3172629284=0xbd1a7b24, Db Name='JINGYONG' Activation ID=0=0x0 Control Seq=4511=0x119f, File size=6400=0x1900 File Number=4, Blksiz=8192, File Type=3 DATA Tablespace #4 - USERS rel_fn:4 Creation at scn: 0x0000.00002946 06/30/2005 19:10:40 Backup taken at scn: 0x0000.000d3f79 01/06/2013 09:20:39 thread:1 reset logs count:0x2fecc5c7 scn: 0x0000.000e487c reset logs terminal rcv data:0x0 scn: 0x0000.00000000 prev reset logs count:0x2fecb1fc scn: 0x0000.000df5e6 prev reset logs terminal rcv data:0x0 scn: 0x0000.00000000 recovered at 01/14/2013 13:30:37 status:0x0 root dba:0x00000000 chkpt cnt: 192 ctl cnt:191 begin-hot-backup file size: 640 Checkpointed at scn: 0x0000.0011601b 01/18/2013 06:31:30 thread:1 rba:(0x9.1a0.10)
这部分中包含的重要信息有检查点scn(Checkpointed at scn: 0x0000.0011601b 01/18/2013 06:31:30)
和检查点计数库(chkpt cnt: 192 ctl cnt:191),这两者都和控制文件中所记录的一致.如果这两者一致.数据库启动时就能通过验证,启动数据库.
那么如果不一致,oracle则会请求进行恢复;以下是从崩溃中进行恢复users01.dbf文件.首先第一部分从控制文件中获得的信息是相同的:
先复制users01.dbf
[oracle@jingyong udump]$ cp /u01/app/oracle/product/10.2.0/oradata/jingyong/users01.dbf /u01/app/oracle/product/10.2.0/oradata/jingyong/users01.dbf.bak
使用shutdown abort来模拟崩溃故障
SQL> shutdown abort; ORACLE instance shut down.
再来删除users01.dbf文件
[oracle@jingyong udump]$ mv /u01/app/oracle/product/10.2.0/oradata/jingyong/users01.dbf /u01/app/oracle/product/10.2.0/oradata/jingyong/users01.dbf.del
再从备份的users01.dbf.bak文件中还原users01.dbf文件
[oracle@jingyong udump]$ mv /u01/app/oracle/product/10.2.0/oradata/jingyong/users01.dbf.bak /u01/app/oracle/product/10.2.0/oradata/jingyong/users01.dbf
再启动到mount状态下转储数据文件头信息
SQL> startup mount ORACLE instance started. Total System Global Area 167772160 bytes Fixed Size 1218316 bytes Variable Size 67111156 bytes Database Buffers 96468992 bytes Redo Buffers 2973696 bytes Database mounted.
再业转储数据文件头信息
Database mounted. SQL> alter session set events 'immediate trace name file_hdrs level 10'; Session altered. SQL> select 2 d.value||'/'||lower(rtrim(i.instance, 3 chr(0)))||'_ora_'||p.spid||'.trc' trace_file_name 4 from ( select p.spid 5 from v$mystat m, 6 v$session s,v$process p 7 where m.statistic# = 1 and s.sid = m.sid and p.addr = s.paddr) p, 8 ( select t.instance from v$thread t,v$parameter v 9 where v.name = 'thread' and 10 (v.value = 0 or t.thread# = to_number(v.value))) i, 11 ( select value from v$parameter 12 where name = 'user_dump_dest') d 13 / TRACE_FILE_NAME -------------------------------------------------------------------------------- /u01/app/oracle/admin/jingyong/udump/jingyong_ora_4923.trc DATA FILE #4: (name #5) /u01/app/oracle/product/10.2.0/oradata/jingyong/users01.dbf creation size=0 block size=8192 status=0xe head=5 tail=5 dup=1 tablespace 4, index=4 krfil=4 prev_file=0 unrecoverable scn: 0x0000.00000000 01/01/1988 00:00:00 Checkpoint cnt:201 scn: 0x0000.00120188 01/18/2013 07:15:10 Stop scn: 0x0000.00120188 01/18/2013 07:15:10 Creation Checkpointed at scn: 0x0000.00002946 06/30/2005 19:10:40 thread:0 rba:(0x0.0.0)
得到的控制文件中记录检查点计数(Checkpoint cnt:201),检查点scn
(scn: 0x0000.00120188 01/18/2013 07:15:10),和数据文件Stop scn (Stop scn: 0x0000.00120188 01/18/2013 07:15:10) V10 STYLE. FILE HEADER: Compatibility Vsn = 169869568=0xa200100 Db ID=3172629284=0xbd1a7b24, Db Name='JINGYONG' Activation ID=0=0x0 Control Seq=4545=0x11c1, File size=6400=0x1900 File Number=4, Blksiz=8192, File Type=3 DATA Tablespace #4 - USERS rel_fn:4 Creation at scn: 0x0000.00002946 06/30/2005 19:10:40 Backup taken at scn: 0x0000.000d3f79 01/06/2013 09:20:39 thread:1 reset logs count:0x2fecc5c7 scn: 0x0000.000e487c reset logs terminal rcv data:0x0 scn: 0x0000.00000000 prev reset logs count:0x2fecb1fc scn: 0x0000.000df5e6 prev reset logs terminal rcv data:0x0 scn: 0x0000.00000000 recovered at 01/18/2013 07:12:48 status:0x4 root dba:0x00000000 chkpt cnt: 199 ctl cnt:198 begin-hot-backup file size: 640 Checkpointed at scn: 0x0000.00120027 01/18/2013 07:12:49 thread:1 rba:(0xb.2.10)
而从文件头中获得的备份文件信息则是:检查点是Checkpointed at scn: 0x0000.00120027 01/18/2013 07:12:49
检查点计数为:chkpt cnt: 199 ctl cnt:198
数据文件头中的检查计数为chkpt cnt: 199小于控制文件中的记录的Checkpoint cnt:201oracle可以判断文件是从备份中恢复的,或者文件故障,需要进行介质恢复.如果此时试图打开数据库,则oracle提示文件需要介质恢复.而且控制文件中的检查点scn: 0x0000.00120188与数据文件头中的Checkpointed at scn: 0x0000.00120027也不相同
SQL> alter database open; alter database open * ERROR at line 1: ORA-01113: file 4 needs media recovery ORA-01110: data file 4: '/u01/app/oracle/product/10.2.0/oradata/jingyong/users01.dbf'
执行介质恢复
SQL> recover datafile 4; Media recovery complete.
再来对数据文件头信息进行转储,来看一下恢复完成之后,控制文件和数据文件头的变化.
SQL> alter session set events 'immediate trace name file_hdrs level 10'; Session altered. SQL> select 2 d.value||'/'||lower(rtrim(i.instance, 3 chr(0)))||'_ora_'||p.spid||'.trc' trace_file_name 4 from ( select p.spid 5 from v$mystat m, 6 v$session s,v$process p 7 where m.statistic# = 1 and s.sid = m.sid and p.addr = s.paddr) p, 8 ( select t.instance from v$thread t,v$parameter v 9 where v.name = 'thread' and 10 (v.value = 0 or t.thread# = to_number(v.value))) i, 11 ( select value from v$parameter 12 where name = 'user_dump_dest') d 13 / TRACE_FILE_NAME -------------------------------------------------------------------------------- /u01/app/oracle/admin/jingyong/udump/jingyong_ora_4940.trc
首先看控制文件的变化:
DATA FILE #4: (name #5) /u01/app/oracle/product/10.2.0/oradata/jingyong/users01.dbf creation size=0 block size=8192 status=0xe head=5 tail=5 dup=1 tablespace 4, index=4 krfil=4 prev_file=0 unrecoverable scn: 0x0000.00000000 01/01/1988 00:00:00 Checkpoint cnt:202 scn: 0x0000.00120188 01/18/2013 07:15:10 Stop scn: 0x0000.00120187 01/18/2013 07:15:10 Creation Checkpointed at scn: 0x0000.00002946 06/30/2005 19:10:40 thread:0 rba:(0x0.0.0)
检查点计数据Checkpoint cnt:202,执行了恢复之后,检查点计数较前增加了1,此时检查点scn是scn: 0x0000.00120188 01/18/2013 07:15:10,数据文件的Stop scn为
Stop scn: 0x0000.00120187 01/18/2013 07:15:10,说明数据文件stop scn和数据文件进行了同步.
以下是数据文件头信息:
V10 STYLE. FILE HEADER: Compatibility Vsn = 169869568=0xa200100 Db ID=3172629284=0xbd1a7b24, Db Name='JINGYONG' Activation ID=0=0x0 Control Seq=4554=0x11ca, File size=6400=0x1900 File Number=4, Blksiz=8192, File Type=3 DATA Tablespace #4 - USERS rel_fn:4 Creation at scn: 0x0000.00002946 06/30/2005 19:10:40 Backup taken at scn: 0x0000.000d3f79 01/06/2013 09:20:39 thread:1 reset logs count:0x2fecc5c7 scn: 0x0000.000e487c reset logs terminal rcv data:0x0 scn: 0x0000.00000000 prev reset logs count:0x2fecb1fc scn: 0x0000.000df5e6 prev reset logs terminal rcv data:0x0 scn: 0x0000.00000000 recovered at 01/18/2013 07:24:36 status:0x0 root dba:0x00000000 chkpt cnt: 202 ctl cnt:201 begin-hot-backup file size: 640 Checkpointed at scn: 0x0000.00120187 01/18/2013 07:15:10 thread:1 rba:(0xb.15a.10)
此时数据文件头信息显示检查点(Checkpointed at scn: 0x0000.00120187 01/18/2013 07:15:10)
和控制文件中记录的Stop scn(Stop scn: 0x0000.00120187 01/18/2013 07:15:10)一致,数据库启动可以顺利进行.检查点计数为(chkpt cnt: 202 ctl cnt:201)
打开数据库,看一看open阶段的变化:
SQL> alter database open; Database altered. SQL> alter session set events 'immediate trace name file_hdrs level 10'; Session altered. SQL> select 2 d.value||'/'||lower(rtrim(i.instance, 3 chr(0)))||'_ora_'||p.spid||'.trc' trace_file_name 4 from ( select p.spid 5 from v$mystat m, 6 v$session s,v$process p 7 where m.statistic# = 1 and s.sid = m.sid and p.addr = s.paddr) p, 8 ( select t.instance from v$thread t,v$parameter v 9 where v.name = 'thread' and 10 (v.value = 0 or t.thread# = to_number(v.value))) i, 11 ( select value from v$parameter 12 where name = 'user_dump_dest') d 13 / TRACE_FILE_NAME -------------------------------------------------------------------------------- /u01/app/oracle/admin/jingyong/udump/jingyong_ora_4968.trc
此时数据库恢复正常运行,控制文件信息如下:
DATA FILE #4: (name #5) /u01/app/oracle/product/10.2.0/oradata/jingyong/users01.dbf creation size=0 block size=8192 status=0xe head=5 tail=5 dup=1 tablespace 4, index=4 krfil=4 prev_file=0 unrecoverable scn: 0x0000.00000000 01/01/1988 00:00:00 Checkpoint cnt:203 scn: 0x0000.00120189 01/18/2013 07:28:46 Stop scn: 0xffff.ffffffff 01/18/2013 07:15:10 Creation Checkpointed at scn: 0x0000.00002946 06/30/2005 19:10:40 thread:0 rba:(0x0.0.0)
此时Stop scn被置为无穷大(Stop scn: 0xffff.ffffffff)
数据文件头信息如下,其中检查点信息和控制文件中记录的checkpoint信息一致:
V10 STYLE. FILE HEADER: Compatibility Vsn = 169869568=0xa200100 Db ID=3172629284=0xbd1a7b24, Db Name='JINGYONG' Activation ID=0=0x0 Control Seq=4557=0x11cd, File size=6400=0x1900 File Number=4, Blksiz=8192, File Type=3 DATA Tablespace #4 - USERS rel_fn:4 Creation at scn: 0x0000.00002946 06/30/2005 19:10:40 Backup taken at scn: 0x0000.000d3f79 01/06/2013 09:20:39 thread:1 reset logs count:0x2fecc5c7 scn: 0x0000.000e487c reset logs terminal rcv data:0x0 scn: 0x0000.00000000 prev reset logs count:0x2fecb1fc scn: 0x0000.000df5e6 prev reset logs terminal rcv data:0x0 scn: 0x0000.00000000 recovered at 01/18/2013 07:24:36 status:0x4 root dba:0x00000000 chkpt cnt: 203 ctl cnt:202 begin-hot-backup file size: 640 Checkpointed at scn: 0x0000.00120189 01/18/2013 07:28:46 thread:1 rba:(0xb.15a.10)