同事对rac集群生成性能报告时发现rac集群有一个实例没有生成awr快照,另一个实例快照正常。下面是具体处理步骤。
1号实例没有生成awr快照
SQL> select SNAP_ID,END_INTERVAL_TIME,instance_number from dba_hist_snapshot where instance_number=1; no rows selected
2号实快照正常
SQL> set long 200 SQL> set linesize 200 SQL> select * from ( select SNAP_ID,END_INTERVAL_TIME,instance_number from dba_hist_snapshot where instance_number=2 order by SNAP_ID desc) where rownum < =10; SNAP_ID END_INTERVAL_TIME INSTANCE_NUMBER ---------- --------------------------------------------------------------------------- --------------- 24405 17-AUG-19 07.00.47.595 PM 2 24404 17-AUG-19 06.00.42.150 PM 2 24403 17-AUG-19 05.00.37.041 PM 2 24402 17-AUG-19 04.00.31.774 PM 2 24401 17-AUG-19 03.00.26.414 PM 2 24400 17-AUG-19 02.00.21.176 PM 2 24399 17-AUG-19 01.00.16.316 PM 2 24398 17-AUG-19 12.00.10.997 PM 2 24397 17-AUG-19 11.00.05.446 AM 2 24396 17-AUG-19 10.00.59.801 AM 2 10 rows selected.
mmon进程与awr快照相关,mmnl与ash相关,如是查看两个实例的mmon与mmnl进程
2号实例
[root@db2 ~]# ps -ef | grep mmon root 128329 127956 0 18:11 pts/2 00:00:00 grep mmon oracle 201527 1 0 2018 ? 17:17:11 ora_mmon_RLZY2 [root@db2 ~]# ps -ef | grep mmnl root 131772 127956 0 18:17 pts/2 00:00:00 grep mmnl oracle 201531 1 0 2018 ? 1-06:06:24 ora_mmnl_RLZY2
1号实例
[root@db1 ~]# ps -ef | grep mmon root 239020 238963 0 18:52 pts/2 00:00:00 grep mmon [root@db1 ~]# ps -ef | grep mmnl root 239052 238963 0 18:52 pts/2 00:00:00 grep mmnl
可以看到1号实例没有mmon与mmnl进程了。
如是查看1号实例的mmon进程的跟踪文件
[root@db1 trace]# ls -lrt *mmon*.trc -rw-r----- 1 oracle asmadmin 1351052 Jan 19 2018 RLZY1_mmon_20073.trc -rw-r----- 1 oracle asmadmin 173031 Jan 22 2018 RLZY1_mmon_49119.trc [root@db1 trace]# more RLZY1_mmon_49119.trc Trace file /u01/app/oracle/diag/rdbms/rlzy/RLZY1/trace/RLZY1_mmon_49119.trc Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP, Data Mining and Real Application Testing options ORACLE_HOME = /u01/app/oracle/product/11.2.0/db_1 System name: Linux Node name: db1 Release: 3.8.13-68.3.4.el6uek.x86_64 Version: #2 SMP Tue Jul 14 15:03:36 PDT 2015 Machine: x86_64 Instance name: RLZY1 Redo thread mounted by this instance: 1 Oracle process number: 36 Unix process pid: 49119, image: oracle@db1 (MMON) *** 2018-01-19 13:55:20.030 *** SESSION ID:(1369.1) 2018-01-19 13:55:20.030 *** CLIENT ID:() 2018-01-19 13:55:20.030 *** SERVICE NAME:() 2018-01-19 13:55:20.030 *** MODULE NAME:() 2018-01-19 13:55:20.030 *** ACTION NAME:() 2018-01-19 13:55:20.030 minact-scn slave-status: grec-scn:0x0000.00000000 gmin-scn:0x0000.00000000 gcalc-scn:0x0000.00000000 *** 2018-01-19 14:00:20.643 minact-scn master-status: grec-scn:0x0e0e.55ad96ef gmin-scn:0x0e0e.55abf256 gcalc-scn:0x0e0e.55ac2a0a .......... KEBM: MMON action policy violation. 'Block Cleanout Optim, Undo Segment Scan' viol=1; err=12751 minact-scn master-status: grec-scn:0x0e0e.5f0ebf8c gmin-scn:0x0e0e.5f0eac2e gcalc-scn:0x0e0e.5f0ead91 DDE rules only execution for: ORA 12751 *** 2018-01-22 07:06:04.060 ----- START Event Driven Actions Dump ---- ---- END Event Driven Actions Dump ---- ----- START DDE Actions Dump ----- Executing SYNC actions Executing ASYNC actions ----- START DDE Action: 'ORA_12751_DUMP' (Sync) ----- Runtime exceeded 300 seconds Time limit violation detected at: ksedsts()+465< -kspol_12751_dump()+145<-dbgdaExecuteAction()+1065<-dbgerRunAction()+109<-dbgerRunActions()+4134<-dbgexPhaseII()+1873<-dbgexProcessError()+2680<-dbgeExecuteForError()+88<-dbgePostErrorKGE()+2136<-dbkePostKGE_kgsf()+71<-kge selv()+276<-ksesecl0()+162 <-ksucin()+147<-kcbzwb()+2727<-kcbgtcr()+31325<-ktucloUsMinScn()+539<-ktucloUsegScan()+992<-ksb_run_managed_action()+384<-ksbcti()+2490<-ksbabs()+1735<-kebm_mmon_main()+209<-ksbrdp()+1045<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai _real()+250<-ssthrdmain()+265 <-main()+201<-__libc_start_main()+253Current Wait Stack: 0: waiting for 'gc buffer busy acquire' file#=0x5, block#=0x278, class#=0x49 wait_id=255378 seq_num=59358 snap_id=1 wait times: snap=5 min 5 sec, exc=5 min 5 sec, total=5 min 5 sec wait times: max=infinite, heur=5 min 5 sec wait counts: calls=358 os=358 in_wait=1 iflags=0x15a2 There is at least one session blocking this session. Dumping 1 direct blocker(s): inst: 1, sid: 990, ser: 1 Dumping final blocker: inst: 1, sid: 990, ser: 1 Wait State: fixed_waits=0 flags=0x22 boundary=(nil)/-1 Session Wait History: elapsed time of 0.000061 sec since current wait 0: waited for 'gc cr block 2-way' =0x5, =0x258, =0x47 wait_id=255377 seq_num=59357 snap_id=1 wait times: snap=0.000478 sec, exc=0.000478 sec, total=0.000478 sec wait times: max=infinite wait counts: calls=1 os=1 occurred after 0.000122 sec of elapsed time 1: waited for 'gc cr block 2-way' =0x5, =0x228, =0x45 wait_id=255376 seq_num=59356 snap_id=1 wait times: snap=0.000741 sec, exc=0.000741 sec, total=0.000741 sec wait times: max=infinite wait counts: calls=1 os=1 occurred after 0.000120 sec of elapsed time 2: waited for 'gc cr block 2-way' =0x5, =0x138, =0x43 wait_id=255375 seq_num=59355 snap_id=1 wait times: snap=0.000528 sec, exc=0.000528 sec, total=0.000528 sec wait times: max=infinite wait counts: calls=1 os=1 occurred after 0.000111 sec of elapsed time 3: waited for 'gc cr block 2-way' =0x5, =0xb8, =0x41 wait_id=255374 seq_num=59354 snap_id=1 wait times: snap=0.000583 sec, exc=0.000583 sec, total=0.000583 sec wait times: max=infinite wait counts: calls=1 os=1 occurred after 0.000139 sec of elapsed time 4: waited for 'gc cr block 2-way' =0x5, =0x110, =0x37 wait_id=255373 seq_num=59353 snap_id=1 wait times: snap=0.000541 sec, exc=0.000541 sec, total=0.000541 sec wait times: max=infinite wait counts: calls=1 os=1 occurred after 0.000110 sec of elapsed time 5: waited for 'gc cr block 2-way' =0x5, =0x100, =0x35 wait_id=255372 seq_num=59352 snap_id=1 wait times: snap=0.000629 sec, exc=0.000629 sec, total=0.000629 sec wait times: max=infinite wait counts: calls=1 os=1 occurred after 0.000158 sec of elapsed time 6: waited for 'gc cr block 2-way' =0x5, =0xf0, =0x33 wait_id=255371 seq_num=59351 snap_id=1 wait times: snap=0.000617 sec, exc=0.000617 sec, total=0.000617 sec wait times: max=infinite wait counts: calls=1 os=1 occurred after 0.000128 sec of elapsed time 7: waited for 'gc cr block 2-way' =0x5, =0xe0, =0x31 wait_id=255370 seq_num=59350 snap_id=1 wait times: snap=0.000561 sec, exc=0.000561 sec, total=0.000561 sec wait times: max=infinite wait counts: calls=1 os=1 occurred after 0.000124 sec of elapsed time 8: waited for 'gc cr block 2-way' =0x5, =0xd0, =0x2f wait_id=255369 seq_num=59349 snap_id=1 wait times: snap=0.000565 sec, exc=0.000565 sec, total=0.000565 sec wait times: max=infinite wait counts: calls=1 os=1 occurred after 0.000128 sec of elapsed time 9: waited for 'gc cr block 2-way' =0x5, =0xc0, =0x2d wait_id=255368 seq_num=59348 snap_id=1 wait times: snap=0.000555 sec, exc=0.000555 sec, total=0.000555 sec wait times: max=infinite wait counts: calls=1 os=1 occurred after 0.000125 sec of elapsed time Sampled Session History of session 1369 serial 1 --------------------------------------------------- The sampled session history is constructed by sampling the target session every 1 second. The sampling process captures at each sample if the session is in a non-idle wait, an idle wait, or not in a wait. If the session is in a non-idle wait then one interval is shown for all the samples the session was in the same non-idle wait. If the session is in an idle wait or not in a wait for consecutive samples then one interval is shown for all the consecutive samples. Though we display these consecutive samples in a single interval the session may NOT be continuously idle or not in a wait (the sampling process does not know). The history is displayed in reverse chronological order. sample interval: 1 sec, max history 120 sec --------------------------------------------------- [121 samples, 07:04:03 - 07:06:03] waited for 'gc buffer busy acquire', seq_num: 59358 p1: 'file#'=0x5 p2: 'block#'=0x278 p3: 'class#'=0x49 time_waited: >= 120 sec (still in wait) --------------------------------------------------- Sampled Session History Summary: longest_non_idle_wait: 'gc buffer busy acquire' [121 samples, 07:04:03 - 07:06:03] time_waited: >= 120 sec (still in wait) --------------------------------------------------- ----- END DDE Action: 'ORA_12751_DUMP' (SUCCESS, 1 csec) ----- ----- END DDE Actions Dump (total 1 csec) ----- KEBM: MMON action policy violation. 'Block Cleanout Optim, Undo Segment Scan' viol=1; err=12751 minact-scn master-status: grec-scn:0x0e0e.5f0ec4ce gmin-scn:0x0e0e.5f0eac2e gcalc-scn:0x0e0e.5f0ead91 *** 2018-01-22 07:11:11.071 DDE rules only execution for: ORA 12751 ----- START Event Driven Actions Dump ---- ---- END Event Driven Actions Dump ----
ORA12751的错误原因是陈旧的SYS对象统计数据会导致生成次优执行计划,从而使AWR自动刷新从操作的语句运行更长时间和超时。
解决方法就是收集新的SYS对象统计信息,为优化器提供更好的统计信息,并生成更高效的执行计划
SQL> EXEC DBMS_STATS.GATHER_FIXED_OBJECTS_STATS; PL/SQL procedure successfully completed. SQL> EXEC DBMS_STATS.GATHER_SCHEMA_STATS ('SYS'); PL/SQL procedure successfully completed.
下面就是重启mmon和mmnl进程
SQL> alter system enable restricted session; System altered. SQL> alter system disable restricted session; System altered.
查看alert日志可以看到mmon和mmnl进程已经重启了
Sat Aug 17 19:18:22 2019 Starting background process MMON Sat Aug 17 19:18:22 2019 Starting background process MMNL MMON started with pid=399, OS id=10373 Sat Aug 17 19:18:22 2019 MMNL started with pid=405, OS id=10375 ALTER SYSTEM enable restricted session; Sat Aug 17 19:18:25 2019 Some DDE async actions failed or were cancelled Sat Aug 17 19:18:25 2019 Sweep [inc][48021]: completed Sweep [inc][48011]: completed Sweep [inc][48002]: completed Sweep [inc][35010]: completed Sweep [inc][34706]: completed Sweep [inc][34242]: completed Sweep [inc][33546]: completed Sweep [inc][33394]: completed Sweep [inc2][48021]: completed Sweep [inc2][48011]: completed Sweep [inc2][48002]: completed Sweep [inc2][35010]: completed Sweep [inc2][34706]: completed Sweep [inc2][34242]: completed Sweep [inc2][33546]: completed Sweep [inc2][33394]: completed minact-scn: Inst 1 is a slave inc#:30 mmon proc-id:10373 status:0x2 minact-scn status: grec-scn:0x0e0e.61cb2e9c gmin-scn:0x0e0e.5f0eac2e gcalc-scn:0x0e0e.5f0ead91 Sat Aug 17 19:18:29 2019 db_recovery_file_dest_size of 10240 MB is 20.87% used. This is a user-specified limit on the amount of space that will be used by this database for recovery-related files, and does not reflect the amount of space available in the underlying filesystem or ASM diskgroup. Sat Aug 17 19:18:42 2019 ALTER SYSTEM disable restricted session;
再查看1号实例的mmon与mmnl进程状态
[root@db1 ~]# ps -ef | grep mmnl oracle 10375 1 0 19:18 ? 00:00:00 ora_mmnl_RLZY1 root 10611 238963 0 19:18 pts/2 00:00:00 grep mmnl [root@db1 ~]# ps -ef | grep mmon oracle 10373 1 7 19:18 ? 00:00:02 ora_mmon_RLZY1 root 10630 238963 0 19:18 pts/2 00:00:00 grep mmon
过了两个小时去查看1号实例已经生成了两条快照信息
SQL> set long 200 SQL> set linesize 200 SQL> select * from ( select SNAP_ID,END_INTERVAL_TIME,instance_number from dba_hist_snapshot where instance_number=1 order by SNAP_ID desc) where rownum < =10; SNAP_ID END_INTERVAL_TIME INSTANCE_NUMBER ---------- --------------------------------------------------------------------------- --------------- 24407 17-AUG-19 09.00.58.595 PM 1 24406 17-AUG-19 08.00.40.244 PM 1
到此问题解决了。