A data recovery case of 5TB RAC database - Enmotech, A global supplier of end-to-end solutions of data

　　A customer's core database failed to open due to the storage problem after the restart operation. According to customer feedback, when starting the database, the IO error was reported as follows:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21

Sun Mar 15 11:59:37 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_arc1_630876.trc:
ORA-00204: error in reading (block 1, # blocks 1) of control file
ORA-00202: control file: '/xxx/xxxx/control01.ctl'
ORA-17500: ODM err:ODM ERROR V-41-4-2-43-6 No such device or address
Sun Mar 15 11:59:37 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_arc1_630876.trc:
ORA-00204: error in reading (block 1, # blocks 1) of control file
ORA-00202: control file: '/xxx/xxxx/control01.ctl'
ORA-17500: ODM err:ODM ERROR V-41-4-2-43-6 No such device or address
Sun Mar 15 11:59:37 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_arc1_630876.trc:
ORA-00204: error in reading (block 1, # blocks 1) of control file
ORA-00202: control file: '/xxx/xxxx/control01.ctl'
ORA-17500: ODM err:ODM ERROR V-41-4-2-43-6 No such device or address
Sun Mar 15 11:59:37 2015
Master background archival failure: 204
Sun Mar 15 11:59:49 2015
Termination issued to instance processes. Waiting for the processes to exit
Sun Mar 15 15:40:09 2015
Starting ORACLE instance (normal)

　　The above problems are essentially related to the control file, and the general suggestion is replacing the damaged control file, and we did it., After that, the following error was reported in the open database:

2
3
4
5
6
7
8
9
10
11
12
13
14
15

Sun Mar 15 16:10:48 2015
Errors in file /oracle/app1/oracle/admin/xxxx/udump/xxxx2_ora_626734.trc:
ORA-00600: internal error code, arguments: [kcrfr_update_nab_2], [0x70000038F8C94E0], [2], [], [], [], [], []
Abort recovery for domain 0
Sun Mar 15 16:10:49 2015
Aborting crash recovery due to error 600
Sun Mar 15 16:10:49 2015
Errors in file /oracle/app1/oracle/admin/xxxx/udump/xxxx2_ora_626734.trc:
ORA-00600: internal error code, arguments: [kcrfr_update_nab_2], [0x70000038F8C94E0], [2], [], [], [], [], []
ORA-600 signalled during: ALTER DATABASE OPEN...
Sun Mar 15 16:10:49 2015
Trace dumping is performing id=[cdmp_20150315161049]
Sun Mar 15 16:12:35 2015
Shutting down instance: further logons disabled
Sun Mar 15 16:12:35 2015

　　This error is essentially due to the redo problem, that is, redo log damages. When I tried to recover redo log in RMAN, we find similar errors as follows:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33

Sun Mar 15 16:47:59 2015
Beginning crash recovery of 2 threads
parallel recovery setup failed: using serial mode
Sun Mar 15 16:47:59 2015
Started redo scan
Sun Mar 15 16:47:59 2015
Errors in file /oracle/app1/oracle/admin/xxxx/udump/xxxx2_ora_299470.trc:
ORA-00313: open failed for members of log group 5 of thread 2
ORA-00312: online log 5 thread 2: '/xxx/xxxx/redo05a.log'
ORA-17503: ksfdopn:4 Failed to open file /xxx/xxxx/redo05a.log
ORA-17500: ODM err:File does not exist
Sun Mar 15 16:47:59 2015
Errors in file /oracle/app1/oracle/admin/xxxx/udump/xxxx2_ora_299470.trc:
ORA-00313: open failed for members of log group 4 of thread 1
ORA-00312: online log 4 thread 1: '/xxx/xxxx/redo04a.log'
ORA-17503: ksfdopn:4 Failed to open file /xxx/xxxx/redo04a.log
ORA-17500: ODM err:File does not exist
Sun Mar 15 17:03:03 2015
Errors in file /oracle/app1/oracle/admin/xxxx/udump/xxxx2_ora_299470.trc:
ORA-00354: corrupt redo log block header
ORA-00353: log corruption near block 2009344 change 14160745159583 time 03/15/2015 11:56:29
ORA-00334: archived log: '/xxx/xxxx/redo04b.log'
Sun Mar 15 17:03:03 2015
Abort recovery for domain 0
Sun Mar 15 17:03:03 2015
Aborting crash recovery due to error 354
Sun Mar 15 17:03:03 2015
Errors in file /oracle/app1/oracle/admin/xxxx/udump/xxxx2_ora_299470.trc:
ORA-00354: corrupt redo log block header
ORA-00353: log corruption near block 2009344 change 14160745159583 time 03/15/2015 11:56:29
ORA-00312: online log 4 thread 1: '/xxx/xxxx/redo04b.log'
ORA-354 signalled during: ALTER DATABASE OPEN...
Sun Mar 15 17:08:02 2015

　　The above process is how our customer tried to fix it before we get to that case.

　　I intervened about 18 o'clock. First of all, I tried to use RMAN for recovery and found an error:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21

RMAN> recover database;

Starting recover at 15-MAR-15
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=3268 instance=xxxx2 devtype=DISK

starting media recovery

media recovery failed
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 03/15/2015 18:39:26
ORA-00283: recovery session canceled due to errors
RMAN-11003: failure during parse/execution of SQL statement: alter database recover if needed
start
ORA-00283: recovery session canceled due to errors
ORA-00354: corrupt redo log block header
ORA-00353: log corruption near block 1788672 change 14160744248478 time 03/15/2015 11:54:46
ORA-00312: online log 4 thread 1: '/xxx/xxxx/redo04a.log'

　　From the above error, it can be seen that the redo04a.log file has been corrupted, Specifically speaking it is a block 1788672. In order to verify if the block is corrupted, I dumped it with a dump command similar to the following and found an error:

　　alter system dump logfile 'xxx' scn min xxxx scn max xxxx;

　　yes, the block is undoubtedly damaged.

　　Because the customer's demand us is to pull up the database as fast as possible, due to the redo damage.I can only process completely restored and forced open, here I use the following parameters:

*._allow_resetlogs_corruption=TRUE
*._allow_error_simulation=TRUE
Before open the database with resetlogs, I had backed up redo.
When database was opened, I found that the database was reported as follows:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20

Sun Mar 15 19:43:36 2015
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Sun Mar 15 19:43:36 2015
SMON: enabling cache recovery
Sun Mar 15 19:43:37 2015
Instance recovery: looking for dead threads
Instance recovery: lock domain invalid but no dead threads
Sun Mar 15 19:43:37 2015
ORA-01555 caused by SQL statement below (SQL ID: 5wc2915k44m38, Query Duration=0 sec, SCN: 0x0ce1.0e2d8971):
Sun Mar 15 19:43:37 2015
select user#,type# from user$ where name=:1
Sun Mar 15 19:43:37 2015
Errors in file /oracle/app1/oracle/admin/xxxx/udump/xxxx2_ora_667814.trc:
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 25 with name "_SYSSMU25$" too small
Error 704 happened during db open, shutting down database
USER: terminating instance due to error 704
Instance terminated by USER, pid = 667814
ORA-1092 signalled during: alter database open resetlogs...

From the log, the wild guess is that there may be a problem with the _SYSSMU25$ rollback segment, so I tried to mask the rollback segment with the following implicit parameters:
_corrupted_rollback_segments=_SYSSMU25$
_offline_rollback_segments=_SYSSMU25$
After blocking the rollback segment, I tried again to open the database and find that the error persists. Trace through 10046 trace and find that recursive SQL fails on the following block:

1
2
3
4
5
6
7
8
9
10
11
12
13
14

PARSING IN CURSOR #3 len=43 dep=1 uid=0 oct=3 lid=0 tim=37951056727245 hv=1682066536 ad='8cb74a90'
select user#,type# from user$ where name=:1
END OF STMT
PARSE #3:c=0,e=372,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=37951056727243
BINDS #3:
kkscoacd
Bind#0
oacdty=01 mxl=32(03) mxlc=00 mal=00 scl=00 pre=00
oacflg=18 fl2=0001 frm=01 csi=852 siz=32 off=0
kxsbbbfp=1126d4b70bln=32avl=03flg=05
value="XDB"
EXEC #3:c=0,e=465,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=37951056727780
WAIT #3: nam='db file sequential read' ela= 1582 file#=1 block#=282 blocks=1 obj#=44 tim=37951056729421
WAIT #3: nam='db file sequential read' ela= 6642 file#=1 block#=91 blocks=1 obj#=22 tim=37951056736126

By dump file 1 block 91, it is found that the 2nd ITL on the block confirms that there is an active transaction. I planned to submit the transaction directly bbed, but after I compiled bbed, I realized that the block is a cluster block, using bbed to recover a cluster block would face certain risk, so I didn’t do that. Instead I used the undo_management parameter to start the database, then force the open database and find that the error becomes as follows:

2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19

Errors in file /oracle/app1/oracle/admin/xxxx/udump/xxxx2_ora_778430.trc:
ORA-00600: internal error code, arguments: [kclchkblk_4], [3297], [238525189], [3297], [238091117], [], [], []
Sun Mar 15 20:50:52 2015
Instance recovery: looking for dead threads
Instance recovery: lock domain invalid but no dead threads
Sun Mar 15 20:50:53 2015
Redo thread 1 internally disabled at seq 1 (CKPT)
Sun Mar 15 20:50:53 2015
ARC1: Archiving disabled thread 1 sequence 1
Sun Mar 15 20:50:54 2015
Trace dumping is performing id=[cdmp_20150315205054]
Sun Mar 15 20:50:54 2015
Errors in file /oracle/app1/oracle/admin/xxxx/udump/xxxx2_ora_778430.trc:
ORA-00600: internal error code, arguments: [kclchkblk_4], [3297], [238525189], [3297], [238091117], [], [], []
Sun Mar 15 20:50:54 2015
Error 600 happened during db open, shutting down database
USER: terminating instance due to error 600
Instance terminated by USER, pid = 778430
ORA-1092 signalled during: alter database openresetlogs...

From the error information, we know that this should be the issue of SCN. If you want to manually advance the SCN, then the level should be treated as 3297*4, because in this case 238091117/1024/1024/1024 is less than 1, so when promoting scn, level=3297*4+2 is almost the same. Here I conducted 10046 trace again and found the following information:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48

WAIT #5: nam='db file sequential read' ela= 1021 file#=1 block#=400 blocks=1 obj#=0 tim=37953716489772
EXEC #5:c=0,e=2969,p=1,cr=2,cu=3,mis=1,r=1,dep=1,og=4,tim=37953716490098
STAT #5 id=1 cnt=1 pid=0 pos=1 obj=0 op='UPDATEUNDO$ (cr=2 pr=1 pw=0 time=1542 us)'
STAT #5 id=2 cnt=1 pid=1 pos=1 obj=34 op='INDEX UNIQUE SCAN I_UNDO1 (cr=2 pr=0 pw=0 time=11 us)'
WAIT #1: nam='row cache lock' ela= 71 cache id=3 mode=0 request=5 obj#=0 tim=37953716490369
WAIT #1: nam='db file sequential read' ela= 5783 file#=2 block#=25 blocks=1 obj#=0 tim=37953716496201
........
........
GLOBAL CACHE ELEMENT DUMP (address: 700000011f91498):
id1: 0x19 id2: 0x20000 obj: US#2 block: (2/25)
lock: SL rls: 0x0000 acq: 0x0000 latch: 3
flags: 0x41 fair: 0 recovery: 0 fpin: 'ktuwh02: ktugus'
bscn: 0x0.0 bctx: 0 write: 0 scan: 0x0 xflg: 0 xid: 0x0.0.0
lcp: 0 lnk: [NULL] lch: [70000023bf4bc20,70000023bf4bc20]
seq: 3 hist: 143:0 208 352
LIST OF BUFFERS LINKED TO THIS GLOBAL CACHE ELEMENT:
flg: 0x00080000 state: READING mode: EXCL
pin: 'ktuwh02: ktugus'
addr: 70000023bf4bb10 obj: INVALID cls: UNDO HEAD bscn: 0xce1.e379b05---这里的bscn即scn值
GCS SHADOW 700000011f91508,1 sq[70000035fb339f8,70000035fb339f8] resp[70000035fb339d0,0x19.20000] pkey 4294950914
grant 1 cvt 0 mdrole 0x21 st 0x40 GRANTQ rl LOCAL
master 1 owner 1 sid 0 remote[0,0] hist 0x106
history 0x6.0x4.0x0.0x0.0x0.0x0. cflag 0x0 sender 0 flags 0x0 replay# 0
disk: 0x0000.00000000 write request: 0x0000.00000000
pi scn: 0x0000.00000000
msgseq 0x0 updseq 0x0 reqids[1,0,0] infop 0x0
GCS RESOURCE 70000035fb339d0 hashq [70000038cbc6658,70000038cbc6658] name[0x19.20000] pkey 4294950914
grant 700000011f91508 cvt 0 send 0,0 write 0,0@65535
flag 0x0 mdrole 0x1 mode 1 scan 0 role LOCAL
disk: 0x0000.00000000 write: 0x0000.00000000 cnt 0x0 hist 0x0
xid 0x0000.000.00000000 sid 0 pkwait 59s
pkey 4294950914
hv 0 [stat 0x0, 1->1, wm 32767, RMno 0, remxxx 0, dom 0]
kjga st 0x4, step 0.0.0, cxxx 2, rmno 0, flags 0x0
lb 0, hb 0, myb 6147, drmb 6147, apifrz 0
GCS SHADOW 700000011f91508,1 sq[70000035fb339f8,70000035fb339f8] resp[70000035fb339d0,0x19.20000] pkey 4294950914
grant 1 cvt 0 mdrole 0x21 st 0x40 GRANTQ rl LOCAL
master 1 owner 1 sid 0 remote[0,0] hist 0x106
history 0x6.0x4.0x0.0x0.0x0.0x0. cflag 0x0 sender 0 flags 0x0 replay# 0
disk: 0x0000.00000000 write request: 0x0000.00000000
pi scn: 0x0000.00000000
msgseq 0x0 updseq 0x0 reqids[1,0,0] infop 0x0
kjbmbassert [0x19.20000]
*** 2015-03-15 20:54:54.385
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kclchkblk_4], [3297], [238525189], [3297], [238091117], [], [], []
Current SQL statement for this session:
alter database open resetlogs

Bscn: 0xce1.e379b05 Convert the scn, we can find: 0xce1 is 3297, e379b05 is 238525189. Consistent with the above error message. At the same time, I found that the number 2 rollback segment was used here, as follows:

1	id1: 0x19 id2: 0x20000 obj: US#2 block: (2/25)

Therefore, we try to mask this second rollback segment with implicit parameters and try to open the database, but the error persists. It seems that it is still necessary to adjust the SCN to act as follows:
Sun Mar 15 21:23:20 2015
Errors in file /oracle/app1/oracle/admin/xxxx/udump/xxxx2_ora_774222.trc:
ORA-00600: internal error code, arguments: [kclchkblk_4], [3297], [238958669], [3297], [238091118], [], [], []
Sun Mar 15 21:23:20 2015
Errors in file /oracle/app1/oracle/admin/xxxx/udump/xxxx2_ora_774222.trc:
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/xxx/xxxx/redo02a.log'
ORA-00600: internal error code, arguments: [kclchkblk_4], [3297], [238958669], [3297], [238091118], [], [], []
Sun Mar 15 21:23:21 2015
Errors in file /oracle/app1/oracle/admin/xxxx/udump/xxxx2_ora_774222.trc:
ORA-00600: internal error code, arguments: [kclchkblk_4], [3297], [238958669], [3297], [238091118], [], [], []
Sun Mar 15 21:23:21 2015
Error 600 happened during db open, shutting down database
USER: terminating instance due to error 600
Instance terminated by USER, pid = 774222

first of all I tried setting at the session level:
alter session set events '10015 trace name adjust_scn level 13190';
alter database open failed, then I tried to use the *._minimum_giga_scn parameter, but at startup, there is a prompt information says that this parameter is not supported. So we figure the environment may be installed with a relatively new PSU, Oracle discarded the parameters, so that the previous 10015 event does not work at all. it can only be manually modified by oradebug SCN to start the database, as follows:

3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20

SQL> startup mount pfile='/tmp/gb.ora';
ORACLE instance started.

Total System Global Area 1.5032E+10 bytes
Fixed Size2110096 bytes
Variable Size5704256880 bytes
Database Buffers 9311354880 bytes
Redo Buffers 14663680 bytes
Database mounted.
SQL> oradebug setmypid
Statement processed.
SQL> oradebug DUMPvar SGA kcsgscn_
kcslf kcsgscn_ [7000000100122A8, 7000000100122D8) = 00000000 00000005 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 07000000 ...
SQL>
SQL> oradebug poke 0x7000000100122A8 4 3300
BEFORE: [7000000100122A8, 7000000100122AC) = 00000000
AFTER:[7000000100122A8, 7000000100122AC) = 00000CE4
SQL> oradebug DUMPvar SGA kcsgscn_
kcslf kcsgscn_ [7000000100122A8, 7000000100122D8) = 00000CE4 00000005 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 07000000 ...
SQL>

After modifying the SCN, the database was open successfully, but the database crashed quickly. The following is the log information:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139

Sun Mar 15 21:47:31 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_smon_774386.trc:
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/xxx/xxxx/redo02a.log'
......
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_smon_774386.trc:
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/xxx/xxxx/redo02a.log'
Sun Mar 15 21:47:33 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_smon_774386.trc:
ORA-00600: internal error code, arguments: [6006], [1], [], [], [], [], [], []
QMNC started with pid=32, OS id=520520
Sun Mar 15 21:47:35 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_smon_774386.trc:
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/xxx/xxxx/redo02a.log'
ORA-00600: internal error code, arguments: [6006], [1], [], [], [], [], [], []
Sun Mar 15 21:47:35 2015
ORACLE Instance xxxx2 (pid = 22) - Error 600 encountered while recovering transaction (44, 26) on object 47098.
Sun Mar 15 21:47:35 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_smon_774386.trc:
ORA-00600: internal error code, arguments: [6006], [1], [], [], [], [], [], []
Sun Mar 15 21:47:36 2015
LOGSTDBY: Validating controlfile with logical metadata
Sun Mar 15 21:47:36 2015
LOGSTDBY: Validation complete
Sun Mar 15 21:47:36 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_smon_774386.trc:
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/xxx/xxxx/redo02a.log'
Sun Mar 15 21:47:36 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_smon_774386.trc:
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/xxx/xxxx/redo02a.log'
Sun Mar 15 21:47:36 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_smon_774386.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
Sun Mar 15 21:47:37 2015
ORACLE Instance xxxx2 (pid = 22) - Error 600 encountered while recovering transaction (48, 25).
Sun Mar 15 21:47:37 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_smon_774386.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
Sun Mar 15 21:47:39 2015
Completed: alter database open
Sun Mar 15 21:47:39 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_smon_774386.trc:
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/xxx/xxxx/redo02a.log'
Sun Mar 15 21:47:39 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_smon_774386.trc:
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/xxx/xxxx/redo02a.log'
Sun Mar 15 21:47:39 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_smon_774386.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
Sun Mar 15 21:47:40 2015
ORACLE Instance xxxx2 (pid = 22) - Error 600 encountered while recovering transaction (65, 7).
Sun Mar 15 21:47:40 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_smon_774386.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
Sun Mar 15 21:47:40 2015
Trace dumping is performing id=[cdmp_20150315214740]
Sun Mar 15 21:47:41 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_j009_790778.trc:
ORA-12012: error on auto execute of job 524
ORA-01552: cannot use system rollback segment for non-system tablespace 'xxx_ADMIN'
ORA-06512: at "SYS.xxx_LOGINHISTORY", line 3
ORA-06512: at line 1
Sun Mar 15 21:47:41 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_j009_790778.trc:
ORA-12012: error on auto execute of job 524
ORA-01552: cannot use system rollback segment for non-system tablespace 'xxx_ADMIN'
ORA-06512: at "SYS.xxx_LOGINHISTORY", line 3
ORA-06512: at line 1
Sun Mar 15 21:47:41 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_j002_475534.trc:
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], []
Sun Mar 15 21:47:41 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_smon_774386.trc:
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/xxx/xxxx/redo02a.log'
Sun Mar 15 21:47:41 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_smon_774386.trc:
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/xxx/xxxx/redo02a.log'
Sun Mar 15 21:47:41 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_smon_774386.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
Sun Mar 15 21:47:42 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_j008_586068.trc:
ORA-12012: error on auto execute of job 526
ORA-01552: cannot use system rollback segment for non-system tablespace 'xxx_ADMIN'
ORA-06512: at "SYS.xxx_SEG_xxx", line 3
ORA-06512: at line 1
Sun Mar 15 21:47:42 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_j009_790778.trc:
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/xxx/xxxx/redo02a.log'
ORA-12012: error on auto execute of job 524
ORA-01552: cannot use system rollback segment for non-system tablespace 'xxx_ADMIN'
ORA-06512: at "SYS.xxx_LOGINHISTORY", line 3
ORA-06512: at line 1
Sun Mar 15 21:47:42 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_j002_475534.trc:
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/xxx/xxxx/redo02a.log'
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], []
Sun Mar 15 21:47:43 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_j009_790778.trc:
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/xxx/xxxx/redo02a.log'
ORA-12012: error on auto execute of job 524
ORA-01552: cannot use system rollback segment for non-system tablespace 'XXXX_ADMIN'
ORA-06512: at "SYS.XXXX_LOGINHISTORY", line 3
ORA-06512: at line 1
Sun Mar 15 21:47:43 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_j009_790778.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-08102: index key not found, obj# 239, file 1, block 1674 (2)
Sun Mar 15 21:47:43 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_j002_475534.trc:
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '/xxx/xxxx/redo02a.log'
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], []
Sun Mar 15 21:47:43 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_j002_475534.trc:
ORA-00600: internal error code, arguments: [], [], [], [], [], [], [], []
ORA-06512: at "xxxx.PKG_XXXXX", line 126
ORA-06512: at line 3
Sun Mar 15 21:47:43 2015
Errors in file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_pmon_565700.trc:
ORA-00474: SMON process terminated with error
Sun Mar 15 21:47:43 2015
PMON: terminating instance due to error 474
Sun Mar 15 21:47:47 2015
Dump system state for local instance only
System State dumped to trace file /oracle/app1/oracle/admin/xxxx/bdump/xxxx2_diag_377122.trc
Sun Mar 15 21:47:48 2015
Instance terminated by PMON, pid = 565700

According to the above log information, some major errors have occurred:
ORA-00600 [6006]，ORA-00600 [4137]，ORA-00600 [kdsgrp1]
For the first two errors, it is obvious that the Oracle SMON process failed when using the rollback segment to perform transaction rollback, as follows:

ORACLE Instance xxxx2 (pid = 22) – Error 600 encountered while recovering transaction (44, 26) on object 47098.
ORACLE Instance xxxx2 (pid = 22) – Error 600 encountered while recovering transaction (48, 25).

it is easy to see that there are still some rollback segments in the database with active transactions.

For the ORA-00600 [kdsgrp1] error, it usually appears on the Index, that is, the Index data and the table data are inconsistent. Generally, they can be resolved through reconstruction.
Second, for the ORA-08102: index key not found, obj# 239, file 1, block 1674 (2) errors, the main job call appears, so we can temporarily disable job scheduling.
For the ORA-08102 error, my blog also wrote related articles a few years ago. Essentially, there is no related key value in the Index block.

In the end I decided that it would be the best that all the rollback segments of the database were masked and the database undo tablespace was rebuilt. Here is the command to get the rollback segment:
strings system01.dbf | grep _SYSSMU | cut -d $ -f 1 | sort -u
I found out that the inventory was in about 2600 rollback segments.So after restart the instance, I rebuilt the undo tablespace.

3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27

SQL> conn /as sysdba
Connected to an idle instance.
SQL> startup upgrade pfile='/tmp/gb2.ora';
ORACLE instance started.

Total System Global Area 1.5032E+10 bytes
Fixed Size2110096 bytes
Variable Size5704256880 bytes
Database Buffers 9311354880 bytes
Redo Buffers 14663680 bytes
Database mounted.
Database opened.
SQL> create undo tablespace undotbs11 datafile '/xxx/xxxx/undotbs11_01.dbf' size 100m;
Tablespace created.
SQL> create undo tablespace undotbs22 datafile '/xxx/xxxx/undotbs22_01.dbf' size 100m;
Tablespace created.
SQL> drop tablespace undotbs1 xxxluding contents and datafiles;
Tablespace dropped.
SQL> drop tablespace undotbs2 xxxluding contents and datafiles;
Tablespace dropped.

Finally, restart the database instance, let the client export the core configuration table and then perform application recovery first. If data is needed, then just directly extracted from the database.
Here to add that the library is about a little more than 5TB, although there is a backup, but the recovery time is too long, if only there is a dataguard!