今天上班同事说医保数据库expdp导出没有完成,同时医保业务人员报告登录系统有时能登录,有时不能登录。Expdp导出日志如下:
[IBMP740-1:root:/yb_oradata/RLZYbak/dpdump]#cat insur_changde_150921_2330.log Export: Release 10.2.0.4.0 - 64bit Production on Monday, 21 September, 2015 23:30:00 Copyright (c) 2003, 2007, Oracle. All rights reserved. ;;; Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production With the Partitioning, OLAP, Data Mining and Real Application Testing options Starting "INSUR_CHANGDE"."SYS_EXPORT_SCHEMA_07": insur_changde/******** directory=dump_RLZY dumpfile=insur_changde_150921_2330.dmp logfile=insur_changde_150921_2330.log Estimate in progress using BLOCKS method... Processing object type SCHEMA_EXPORT/TABLE/TABLE_DATA Total estimation using BLOCKS method: 492.0 GB Processing object type SCHEMA_EXPORT/USER Processing object type SCHEMA_EXPORT/SYSTEM_GRANT Processing object type SCHEMA_EXPORT/ROLE_GRANT Processing object type SCHEMA_EXPORT/DEFAULT_ROLE Processing object type SCHEMA_EXPORT/PRE_SCHEMA/PROCACT_SCHEMA Processing object type SCHEMA_EXPORT/SYNONYM/SYNONYM Processing object type SCHEMA_EXPORT/TYPE/TYPE_SPEC Processing object type SCHEMA_EXPORT/DB_LINK Processing object type SCHEMA_EXPORT/SEQUENCE/SEQUENCE Processing object type SCHEMA_EXPORT/SEQUENCE/GRANT/OWNER_GRANT/OBJECT_GRANT Processing object type SCHEMA_EXPORT/TABLE/TABLE Processing object type SCHEMA_EXPORT/TABLE/GRANT/OWNER_GRANT/OBJECT_GRANT Processing object type SCHEMA_EXPORT/TABLE/INDEX/INDEX Processing object type SCHEMA_EXPORT/TABLE/CONSTRAINT/CONSTRAINT Processing object type SCHEMA_EXPORT/TABLE/INDEX/STATISTICS/INDEX_STATISTICS Processing object type SCHEMA_EXPORT/TABLE/COMMENT Processing object type SCHEMA_EXPORT/PACKAGE/PACKAGE_SPEC Processing object type SCHEMA_EXPORT/FUNCTION/FUNCTION Processing object type SCHEMA_EXPORT/FUNCTION/GRANT/OWNER_GRANT/OBJECT_GRANT Processing object type SCHEMA_EXPORT/PROCEDURE/PROCEDURE Processing object type SCHEMA_EXPORT/PROCEDURE/GRANT/OWNER_GRANT/OBJECT_GRANT Processing object type SCHEMA_EXPORT/PACKAGE/COMPILE_PACKAGE/PACKAGE_SPEC/ALTER_PACKAGE_SPEC Processing object type SCHEMA_EXPORT/FUNCTION/ALTER_FUNCTION Processing object type SCHEMA_EXPORT/PROCEDURE/ALTER_PROCEDURE Processing object type SCHEMA_EXPORT/VIEW/VIEW Processing object type SCHEMA_EXPORT/VIEW/GRANT/OWNER_GRANT/OBJECT_GRANT Processing object type SCHEMA_EXPORT/PACKAGE/PACKAGE_BODY Processing object type SCHEMA_EXPORT/TYPE/TYPE_BODY Processing object type SCHEMA_EXPORT/TABLE/CONSTRAINT/REF_CONSTRAINT Processing object type SCHEMA_EXPORT/TABLE/TRIGGER Processing object type SCHEMA_EXPORT/TABLE/INDEX/FUNCTIONAL_AND_BITMAP/INDEX Processing object type SCHEMA_EXPORT/TABLE/INDEX/STATISTICS/FUNCTIONAL_AND_BITMAP/INDEX_STATISTICS Processing object type SCHEMA_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS Processing object type SCHEMA_EXPORT/JOB . . exported "INSUR_CHANGDE"."MT_BIZ_SCENE_FIN" 51.86 GB 1111873243 rows . . exported "INSUR_CHANGDE"."MT_FEE_FIN" 22.76 GB 133817090 rows
从上面的expdp日志信息来看并没有错误,更像理导出进程停止,如果查看dba_datapump_job视图来查看有没有被异常终止的epxdp导出job
从上面的信息可以看到insur_changde用户出现了多个expdp导出异常终止的job。从job命名规则可以看到最近的异常终止job是sys_export_schema_07,而且状态是空闲的。那么重新连接sys_export_schema_07这个job来查看job状态。
[IBMP740-1:oracle:/yb_oradata/RLZYbak]$expdp 'insur_changde/"power$20140224"' attach=SYS_EXPORT_SCHEMA_07 Export: Release 10.2.0.4.0 - 64bit Production on Tuesday, 22 September, 2015 16:51:51 Copyright (c) 2003, 2007, Oracle. All rights reserved. Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production With the Partitioning, OLAP, Data Mining and Real Application Testing options Job: SYS_EXPORT_SCHEMA_07 Owner: INSUR_CHANGDE Operation: EXPORT Creator Privs: FALSE GUID: 20448E2327C5015EE053C0A80201015E Start Time: Tuesday, 22 September, 2015 16:51:56 Mode: SCHEMA Instance: RLZY Max Parallelism: 1 EXPORT Job Parameters: Parameter Name Parameter Value: CLIENT_COMMAND insur_changde/******** directory=dump_RLZY dumpfile=insur_changde_150921_2330.dmp logfile=insur_changde_150921_2330.log State: IDLING Bytes Processed: 80,139,523,792 Percent Done: 41 Current Parallelism: 1 Job Error Count: 0 Dump File: /yb_oradata/RLZYbak/dpdump/insur_changde_150921_2330.dmp bytes written: 80,145,354,752 Worker 1 Status: State: UNDEFINED Object Schema: INSUR_CHANGDE Object Name: LV_INDIPAR Object Type: SCHEMA_EXPORT/TABLE/TABLE_DATA Completed Objects: 3 Total Objects: 1,225 Completed Rows: 288,824,659 Worker Parallelism: 1 Worker 1 Status: State: UNDEFINED Object Schema: INSUR_CHANGDE Object Name: LV_INDIPAR Object Type: SCHEMA_EXPORT/TABLE/TABLE_DATA Completed Objects: 3 Total Objects: 1,225 Completed Rows: 288,824,659 Worker Parallelism: 1
从上面的信息可以看出现在expdp job正在导出的表为LV_INDIPAR表,状态为UNDEFINED,也没有其它有用信息。那么为什么expdp job会异常终止了。检查alert.log文件在执行expdp导出时出现了以下错误信息。
Starting control autobackup Control autobackup written to SBT_TAPE device comment 'API Version 2.0,MMS Version 1.2.0.0', media 'backup_nw.023.RO' handle 'c-1589671076-20150921-00' Mon Sep 21 23:30:02 2015 The value (30) of MAXTRANS parameter ignored. kupprdp: master process DM00 started with pid=188, OS id=23527444 to execute - SYS.KUPM$MCP.MAIN('SYS_EXPORT_SCHEMA_07', 'INSUR_CHANGDE', 'KUPC$C_1_20150921233002', 'KUPC$S_1_20150921233002', 0); kupprdp: worker process DW01 started with worker id=1, pid=189, OS id=1704856 to execute - SYS.KUPW$WORKER.MAIN('SYS_EXPORT_SCHEMA_07', 'INSUR_CHANGDE'); Tue Sep 22 00:24:18 2015 ksvcreate: Process(q001) creation failed Tue Sep 22 00:24:38 2015 Process startup failed, error stack: Tue Sep 22 00:24:39 2015 Errors in file /oracle/admin/RLZY/bdump/rlzy_psp0_7471450.trc: ORA-27300: OS system dependent operation:fork failed with status: 12 ORA-27301: OS failure message: Not enough space ORA-27302: failure occurred at: skgpspawn3 Tue Sep 22 00:24:39 2015 Process q001 died, see its trace file Tue Sep 22 00:24:39 2015 ksvcreate: Process(q001) creation failed Tue Sep 22 00:24:51 2015
从上面信息可以看到expdp job是在21号的23:30开始执行,在22号的00:24:39出现了故障并在/oracle/admin/RLZY/bdump/rlzy_psp0_7471450.trc文件中生成了错误信息如下。
*** 2015-09-20 00:24:36.347 Process startup failed, error stack: ORA-27300: OS system dependent operation:fork failed with status: 12 ORA-27301: OS failure message: Not enough space ORA-27302: failure occurred at: skgpspawn3
根据MOS文章Troubleshooting ORA-27300 ORA-27301 ORA-27302 errors (Doc ID 579365.1),出现这种错误信息主要是因为内存或交换区被用尽的原因,如是检查系统内存与交换区的使用情况
[IBMP740-1:root:/]#topas_nmon lqtopas_nmonqqh=HelpqqqqqqqqqqqqqHost=IBMP740-1qqqqqqRefresh=2 secsqqq16:58.25 x Memory x x Physical PageSpace | pages/sec In Out | FileSystemCache x x% Used 99.8% 68.0% | to Paging Space 0.0 0.0 | (numperm) 49.6% x x% Free 0.2% 32.0% | to File System 586.6 11.7 | Process 42.7% x xMB Used 63572.0MB 11142.9MB | Page Scans 126.5 | System 7.6% x xMB Free 108.0MB 5241.1MB | Page Cycles 0.0 | Free 0.2% x xTotal(MB) 63680.0MB 16384.0MB | Page Steals 126.5 | ------ x x | Page Faults 1317.1 | Total 100.0% x x------------------------------------------------------------ | numclient 49.6% x xMin/Maxperm 1853MB( 3%) 55589MB( 90%) < --% of RAM | maxclient 90.0% x xMin/Maxfree 960 1088 Total Virtual 78.2GB | User 89.3% x xMin/Maxpgahead 2 8 Accessed Virtual 33.6GB 43.0%| Pinned 9.4% x x | lruable pages 15811872.0
从上面的信息可以看到物理内存为63680.0MB,交换区为16384.0M了,物理内存使用了63572.0M,交换区使用了11142.9M,物理内存了可用内存只有108.0M占总物理内存的0.2%,交换区是5241.1M占总交换区的32%。FileSystemCache (numperm) 49.6% 说明AIX 文件系统缓存占用了物理内存的49.6%,Process 42.7%说明进程占用了物理内存的42.7%,System 7.6%说明系统占用了物理内存的7.6%, Free 0.2%说明了可用的物理内存只有0.2%。并且可以看到Maxperm=90%,maxclient=90%,说明文件系统缓存使用物理内存的最大限制为物理内存的90%。
检查AIX系统中消耗内存前10的进程,如下所示大部分是Oracle相关进程
[IBMP740-1:root:/]#ps -ealf | head -1 ; ps -ealf | sort -rn +9 | head F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD 240001 A oracle 6553662 1 0 60 20 a31123590 115936 Jun 27 - 22:05 ora_lgwr_RLZY 240001 A oracle 57671750 1 0 60 20 c41744590 111768 f1000e0004ee48c8 Sep 16 - 28:48 oracleRLZY (LOCAL=NO) 240001 A oracle 61735218 1 0 60 20 c44fc4590 109912 f1000e00100440c8 Sep 16 - 31:40 oracleRLZY (LOCAL=NO) 240001 A oracle 58982776 1 0 60 20 fb447b590 109528 f1000e0004a0b8c8 Sep 16 - 12:57 oracleRLZY (LOCAL=NO) 240001 A oracle 26935684 1 0 60 20 f416f4590 108264 Jun 27 - 2:07 ora_arc1_RLZY 240001 A oracle 26870144 1 0 60 20 cf16cf590 108264 Jun 27 - 2:37 ora_arc0_RLZY 240001 A oracle 7536818 1 0 60 20 a71127590 108248 Jun 27 - 15:59 ora_cjq0_RLZY 240001 A oracle 7733430 1 0 60 20 8a0e0a590 106096 Jun 27 - 8:54 ora_dbw0_RLZY 240001 A oracle 8913722 1 24 72 20 864c86590 104764 Sep 16 - 18:14 oracleRLZY (LOCAL=NO) 240001 A oracle 26214712 1 0 60 20 944194590 104584 16:51:55 - 0:00 ora_dm00_RLZY
[IBMP740-1:root:/]#topas -M Topas Monitor for host: IBMP740-1 Interval: 2 Tue Sep 22 17:13:05 2015 ================================================================================ REF1 SRAD TOTALMEM INUSE FREE FILECACHE HOMETHRDS CPUS -------------------------------------------------------------------------------- 0 0 60.4G 60.3G 106.5 30.8G 748 0-31 1 1 0.0 0.0 0.0 0.0 625 32-63 ================================================================================ CPU SRAD TOTALDISP LOCALDISP% NEARDISP% FARDISP% ------------------------------------------------------------ 36 1 439 100.0 0.0 0.0 60 1 345 100.0 0.0 0.0 56 1 184 100.0 0.0 0.0 0 0 144 100.0 0.0 0.0 32 1 93 100.0 0.0 0.0 16 0 88 100.0 0.0 0.0 8 0 54 100.0 0.0 0.0 40 1 43 100.0 0.0 0.0 12 0 36 100.0 0.0 0.0 20 0 28 100.0 0.0 0.0 4 0 28 100.0 0.0 0.0 28 0 21 100.0 0.0 0.0 44 1 18 100.0 0.0 0.0 24 0 12 100.0 0.0 0.0 52 1 11 100.0 0.0 0.0 48 1 1 100.0 0.0 0.0 17 0 0 0.0 0.0 0.0 18 0 0 0.0 0.0 0.0 19 0 0 0.0 0.0 0.0 10 0 0 0.0 0.0 0.0 21 0 0 0.0 0.0 0.0 22 0 0 0.0 0.0 0.0 23 0 0 0.0 0.0 0.0 9 0 0 0.0 0.0 0.0 25 0 0 0.0 0.0 0.0 26 0 0 0.0 0.0 0.0 27 0 0 0.0 0.0 0.0 7 0 0 0.0 0.0 0.0 29 0 0 0.0 0.0 0.0 30 0 0 0.0 0.0 0.0 31 0 0 0.0 0.0 0.0 6 0 0 0.0 0.0 0.0 33 1 0 0.0 0.0 0.0 5 0 0 0.0 0.0 0.0
从上面的信息可知除了系统所用的物理内存之外,总的可用物理内存是60.4G,使用了60.3G,可用106.5M,文件系统缓存是30.8G。
使用操作系统命令vmo -a –F来查看操作系统参数
[IBMP740-1:root:/]#vmo -a -F ame_cpus_per_pool = n/a ame_maxfree_mem = n/a ame_min_ucpool_size = n/a ame_minfree_mem = n/a ams_loan_policy = n/a enhanced_affinity_affin_time = 1 enhanced_affinity_vmpool_limit = 10 esid_allocator = 0 force_relalias_lite = 0 kernel_heap_psize = 65536 lgpg_regions = 0 lgpg_size = 0 low_ps_handling = 1 maxfree = 1088 maxperm = 14230680 maxpin = 13137354 maxpin% = 80 memory_frames = 16302080 memplace_data = 0 memplace_mapped_file = 0 memplace_shm_anonymous = 0 memplace_shm_named = 0 memplace_stack = 0 memplace_text = 0 memplace_unmapped_file = 0 minfree = 960 minperm = 474353 minperm% = 3 nokilluid = 0 npskill = 32768 npswarn = 131072 num_locks_per_semid = 1 numpsblks = 4194304 pinnable_frames = 14750156 relalias_percentage = 0 scrub = 0 v_pinshm = 0 vmm_default_pspa = 0 vmm_klock_mode = 1 wlm_memlimit_nonpg = 1 ##Restricted tunables ame_sys_memview = n/a cpu_scale_memp = 8 data_stagger_interval = 161 defps = 1 enhanced_affinity_attach_limit = 100 enhanced_affinity_balance = 100 enhanced_affinity_private = 40 enhanced_memory_affinity = 1 framesets = 2 htabscale = n/a kernel_psize = 65536 large_page_heap_size = 0 lru_file_repage = 0 lru_poll_interval = 10 lrubucket = 131072 maxclient% = 90 maxperm% = 90 mbuf_heap_psize = 65536 memory_affinity = 1 multiple_semid_lists = 0 munmap_npages = 16384 npsrpgmax = 262144 npsrpgmin = 196608 npsscrubmax = 262144 npsscrubmin = 196608 num_sem_undo_lists = 0 num_sems_per_lock = 1 num_spec_dataseg = 0 numperm_global = 1 page_steal_method = 1 psm_timeout_interval = 20000 relalias_lockmode = 1 rpgclean = 0 rpgcontrol = 2 scrubclean = 0 shm_1tb_shared = 12 shm_1tb_unsh_enable = 1 shm_1tb_unshared = 256 soft_min_lgpgs_vmpool = 0 spec_dataseg_int = 512 strict_maxclient = 1 strict_maxperm = 0 sync_npages = 0 thrpgio_inval = 1024 thrpgio_npages = 1024 vm_mmap_areload = 0 vm_modlist_threshold = -1 vm_pvlist_dohard = 0 vm_pvlist_szpcnt = 0 vmm_fork_policy = 1 vmm_mpsize_support = 2 vmm_vmap_policy = 0 vtiol_avg_ms = 200 vtiol_minreq = 25 vtiol_minth_active = 1 vtiol_mode = 0 vtiol_pgin_mode = 2 vtiol_pgout_mode = 2 vtiol_q_cpu_pct = 2500 vtiol_thread_cpu_pct = 5000
主要是maxclient% = 90,maxperm% = 90参数,说明文件系统缓存使用物理内存的最大限制为物理内存的90%。所以这里只需要将maxclient%与maxperm%参数调小,让系统有空闲内存来分配给新产生的进程来执行特定操作。调整maxclient%与maxperm%参数。
[IBMP740-1:root:/]#vmo -p -o maxclient%=20 Modification to restricted tunable maxclient%, confirmation required yes/no yes Setting maxclient% to 20 in nextboot file Setting maxclient% to 20 Warning: a restricted tunable has been modified [IBMP740-1:root:/]#vmo -p -o maxperm%=20 Modification to restricted tunable maxperm%, confirmation required yes/no yes Setting maxperm% to 20 in nextboot file Setting maxperm% to 20 Warning: a restricted tunable has been modified
调整后再次查看操作系统参数
[IBMP740-1:root:/]#vmo -a -F ame_cpus_per_pool = n/a ame_maxfree_mem = n/a ame_min_ucpool_size = n/a ame_minfree_mem = n/a ams_loan_policy = n/a enhanced_affinity_affin_time = 1 enhanced_affinity_vmpool_limit = 10 esid_allocator = 0 force_relalias_lite = 0 kernel_heap_psize = 65536 lgpg_regions = 0 lgpg_size = 0 low_ps_handling = 1 maxfree = 1088 maxperm = 3162370 maxpin = 13137354 maxpin% = 80 memory_frames = 16302080 memplace_data = 0 memplace_mapped_file = 0 memplace_shm_anonymous = 0 memplace_shm_named = 0 memplace_stack = 0 memplace_text = 0 memplace_unmapped_file = 0 minfree = 960 minperm = 790590 minperm% = 5 nokilluid = 0 npskill = 32768 npswarn = 131072 num_locks_per_semid = 1 numpsblks = 4194304 pinnable_frames = 14770780 relalias_percentage = 0 scrub = 0 v_pinshm = 0 vmm_default_pspa = 0 vmm_klock_mode = 1 wlm_memlimit_nonpg = 1 ##Restricted tunables ame_sys_memview = n/a cpu_scale_memp = 8 data_stagger_interval = 161 defps = 1 enhanced_affinity_attach_limit = 100 enhanced_affinity_balance = 100 enhanced_affinity_private = 40 enhanced_memory_affinity = 1 framesets = 2 htabscale = n/a kernel_psize = 65536 large_page_heap_size = 0 lru_file_repage = 0 lru_poll_interval = 10 lrubucket = 131072 maxclient% = 20 maxperm% = 20 mbuf_heap_psize = 65536 memory_affinity = 1 multiple_semid_lists = 0 munmap_npages = 16384 npsrpgmax = 262144 npsrpgmin = 196608 npsscrubmax = 262144 npsscrubmin = 196608 num_sem_undo_lists = 0 num_sems_per_lock = 1 num_spec_dataseg = 0 numperm_global = 1 page_steal_method = 1 psm_timeout_interval = 20000 relalias_lockmode = 1 rpgclean = 0 rpgcontrol = 2 scrubclean = 0 shm_1tb_shared = 12 shm_1tb_unsh_enable = 1 shm_1tb_unshared = 256 soft_min_lgpgs_vmpool = 0 spec_dataseg_int = 512 strict_maxclient = 1 strict_maxperm = 0 sync_npages = 0 thrpgio_inval = 1024 thrpgio_npages = 1024 vm_mmap_areload = 0 vm_modlist_threshold = -1 vm_pvlist_dohard = 0 vm_pvlist_szpcnt = 0 vmm_fork_policy = 1 vmm_mpsize_support = 2 vmm_vmap_policy = 0 vtiol_avg_ms = 200 vtiol_minreq = 25 vtiol_minth_active = 1 vtiol_mode = 0 vtiol_pgin_mode = 2 vtiol_pgout_mode = 2 vtiol_q_cpu_pct = 2500 vtiol_thread_cpu_pct = 5000
从上面的结果看到修改生效了,文件系统缓存最大可以使用20%的物理内存。
[IBMP740-1:root:/]#topas_nmon lqtopas_nmonqqh=HelpqqqqqqqqqqqqqHost=IBMP740-1qqqqqqRefresh=2 secsqqq17:44.52 x Memory x x Physical PageSpace | pages/sec In Out | FileSystemCache x x% Used 69.8% 67.2% | to Paging Space 0.5 0.0 | (numperm) 19.3% x x% Free 30.2% 32.8% | to File System 9455.5 8.4 | Process 42.9% x xMB Used 44476.3MB 11010.5MB | Page Scans 9562.1 | System 7.6% x xMB Free 19203.7MB 5373.5MB | Page Cycles 0.0 | Free 30.2% x xTotal(MB) 63680.0MB 16384.0MB | Page Steals 9510.6 | ------ x x | Page Faults 7478.9 | Total 100.0% x x------------------------------------------------------------ | numclient 19.3% x xMin/Maxperm 3088MB( 5%) 12353MB( 20%) < --% of RAM | maxclient 20.0% x xMin/Maxfree 960 1088 Total Virtual 78.2GB | User 59.3% x xMin/Maxpgahead 2 8 Accessed Virtual 33.6GB 43.0%| Pinned 9.4% x x | lruable pages 15811872.0
可以看到现在空闲物理内存为30.2%,文件系统缓存(FileSystemCache) 19.2%,maxperm,maxclient为20%。现在执行expdp导出正常。
通过这个问题可以看出,AIX为了提高系统IO能力将空闲的物理内存作为文件系统缓存来使用,而且缺省参数可以使用物理内存的90%,这个缺省值在实际的生产环境中是很容易将内存耗尽的,所以AIX推荐的缺省值也是有问题的。