Learn. Share. Repeat.

Flashback: Guaranteed Restore Point

Oracle Flashback Database and restore points enable us to rewind the database back in time to correct problems caused by logical data corruption or user errors, without requiring any restoration from backup. There are 2 types of restore points – 1. Normal Restore Point –> assigns a restore point name to an […]
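For a quick flavour of the guaranteed variant, the usual lifecycle looks roughly like this (the restore point name is just an example):

CREATE RESTORE POINT before_app_change GUARANTEE FLASHBACK DATABASE;

SELECT name, guarantee_flashback_database, time FROM v$restore_point;

-- to rewind, mount the database (not open) and flash back:
-- FLASHBACK DATABASE TO RESTORE POINT before_app_change;

DROP RESTORE POINT before_app_change;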

Identifying Bloated Index in Oracle

Indexes have always been a topic of interest for DBAs/developers. When it comes to index rebuilds, there have been many opinions floating across the internet on when to rebuild these indexes. Many say that when the BLEVEL is > 3 one should rebuild the index. I don’t believe in that, and I think I have never […]
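For reference, the BLEVEL check people usually quote, plus a more direct look at deleted rows in one index, would look roughly like this (schema and index names are examples; ANALYZE ... VALIDATE STRUCTURE locks the underlying table while it runs):

SELECT owner, index_name, blevel, leaf_blocks
FROM   dba_indexes
WHERE  owner = 'SCOTT'
AND    blevel > 3;

ANALYZE INDEX scott.emp_pk VALIDATE STRUCTURE;

-- percentage of deleted leaf rows in the index just validated
SELECT lf_rows, del_lf_rows,
       ROUND(del_lf_rows / NULLIF(lf_rows, 0) * 100, 2) pct_deleted
FROM   index_stats;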

2013 in review

The WordPress.com stats helper monkeys prepared a 2013 annual report for this blog. Here’s an excerpt: The Louvre Museum has 8.5 million visitors per year. This blog was viewed about 140,000 times in 2013. If it were an exhibit at the Louvre Museum, it would take about 6 days for that many people to see […]

Few Scripts for Identifying Performance Issues using DBA_HIST Views

It has been pretty long since I last blogged. The past year was a little busy on both the personal and professional fronts. But this year I am planning to be more active in sharing and learning, not only with Oracle DBMS but possibly a few others too. Now, coming back to this blog, I wanted to share […]

Plan change using load_plans_from_cursor_cache

This post is more of a note for myself and might be helpful to a few others. Assuming the DB is 11gR2 and baselines/SPM are used: when a new query is introduced in the DB, it might run with the good plan, but sometimes it picks up a wrong plan. It could be that an Index Range […]
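Once the good plan is spotted in the cursor cache, loading it as a fixed baseline is roughly the following (the sql_id and plan_hash_value are example values only):

DECLARE
  l_plans PLS_INTEGER;
BEGIN
  -- load the known-good plan from the cursor cache into a SQL plan baseline
  l_plans := DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE(
               sql_id          => '9babjv8yq8ru3',   -- example sql_id
               plan_hash_value => 1388734953,        -- example good plan
               fixed           => 'YES');
END;
/

SELECT sql_handle, plan_name, enabled, accepted, fixed
FROM   dba_sql_plan_baselines;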

ORA-26723: user “XXXXX” requires the role “DV_GOLDENGATE_REDO_ACCESS”


While starting the extract on a UAT DB environment which had recently been moved to Exadata, we got the below error. As we had redo and archive log files on ASM, we used “TRANLOGOPTIONS DBLOGREADER” in the extract parameter file.

2012-10-24 22:35:48  ERROR   OGG-00446  Oracle GoldenGate Capture for Oracle, emos_cc.prm:  Opening ASM file +RECO_UMO1/archivelog/2012_10_24/thread_1_seq_224.955.797517005 in DBLOGREADER mode: (26723) ORA-26723: user "GGATE" requires the role "DV_GOLDENGATE_REDO_ACCESS"

The first thing we did was check whether the role exists.


22:41:53 SYS@xxxxx1 > select role from dba_roles where role like 'DV_%';

no rows selected

Ahh, no roles starting with DV_ exist in the DB. Then why is GoldenGate asking for this role? Some searching on tahiti.oracle.com pointed to a document which mentioned:

Grant the DV_GOLDENGATE_REDO_ACCESS role to any user who is responsible for using the Oracle GoldenGate TRANLOGOPTIONS DBLOGREADER method to access redo logs in an Oracle Database Vault environment. This enables the management of Oracle GoldenGate processes to be tightly controlled by Database Vault, but does not change or restrict the way an administrator would normally configure Oracle GoldenGate.
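So in an environment where Database Vault is genuinely configured, the fix would presumably be just a grant of that role to the GoldenGate user (run by the Database Vault owner), something like:

GRANT DV_GOLDENGATE_REDO_ACCESS TO ggate;

SELECT grantee, granted_role
FROM   dba_role_privs
WHERE  granted_role = 'DV_GOLDENGATE_REDO_ACCESS';

But that wasn't our situation, as the role itself didn't even exist.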

So, now we have a clue: it's something to do with Database Vault. The UAT env had recently been moved to an Exadata box, prior to which it was on a normal server where the extract was running fine.

22:42:05 SYS@xxxxx1 > SELECT * FROM V$OPTION WHERE PARAMETER = 'Oracle Database Vault';

PARAMETER                                                        VALUE
---------------------------------------------------------------- ----------------------------------------------------------------
Oracle Database Vault                                            TRUE

The above shows the Database Vault option enabled, but as the database was restored from a backup of the DB on the normal server, we didn’t have the DVSYS and DVF schemas.

Oracle Database Vault has the following schemas:

DVSYS Schema: Owns the Oracle Database Vault schema and related objects

DVF Schema: Owns the Oracle Database Vault functions that are created to retrieve factor identities
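A quick way to confirm the schemas were indeed missing is a check against dba_users, something like:

SELECT username, account_status
FROM   dba_users
WHERE  username IN ('DVSYS', 'DVF');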

As Vault wasn’t required, we used the CHOPT utility, available from 11.2, for enabling/disabling database options.


After shutting down the db, we ran chopt on all the nodes --

abcde0025: (abncu1) /u01/abncu/admin> chopt disable dv

Writing to /u01/app/oracle/product/11.2.0.3/dbhome_1/install/disable_dv.log...
/usr/bin/make -f /u01/app/oracle/product/11.2.0.3/dbhome_1/rdbms/lib/ins_rdbms.mk dv_off ORACLE_HOME=/u01/app/oracle/product/11.2.0.3/dbhome_1
/usr/bin/make -f /u01/app/oracle/product/11.2.0.3/dbhome_1/rdbms/lib/ins_rdbms.mk ioracle ORACLE_HOME=/u01/app/oracle/product/11.2.0.3/dbhome_1

abcde0025: (abncu1) /u01/abncu/admin>

Started the DB and checked the value again, which now showed FALSE, and the GoldenGate extract started working.

SYS@xxxxx1 > SELECT * FROM V$OPTION WHERE PARAMETER = 'Oracle Database Vault';

PARAMETER                                                        VALUE
---------------------------------------------------------------- ----------------------------------------------------------------
Oracle Database Vault                                            FALSE

References
http://docs.oracle.com/cd/E11882_01/server.112/e23090/db_objects.htm#DVADM71151

http://docs.oracle.com/cd/E11882_01/install.112/e17214/postinst.htm#CHDBDCGE


Filed under: 11gR2, GoldenGate, Oracle Tagged: 11gR2, chopt, DV_GOLDENGATE_REDO_ACCESS, GoldenGate, ORA-26723

UPGRADE CHECKPOINTTABLE – Goldengate


We have a GoldenGate setup wherein 3 different GoldenGate clients connect and replicate to one target database. Below are the versions currently in use: 11.1.1.1, 11.2.1.0.0 and 11.2.1.0.3.

The version 11.2.1.0.3 was recently added, and below are the steps performed

GGSCI (myhost) 2> obey ./dirprm/add_rep.oby

GGSCI (myhost) 3>

GGSCI (myhost) 3> DBLOGIN USERID ggate@test PASSWORD 'xxxxxxxx'

ERROR: Unable to connect to database using user ggate@test. Please check privileges.
ORA-12170: TNS:Connect timeout occurred.

GGSCI (myhost) 4>

GGSCI (myhost) 4> ADD REPLICAT rep, extTrail /app/trail/rep/rp, checkpointTable ggate.OGG_CHECKPOINT

REPLICAT added.

dblogin failed, but the replicat still got added. Now what happens if we try to delete it?

GGSCI (myhost) 5> delete replicat rep
ERROR: Could not delete DB checkpoint for REPLICAT rep (Database login required to delete database checkpoint).

GGSCI (host) 6> info all

Program     Status      Group   Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING
EXTRACT     RUNNING     ext     00:00:00      00:00:05
REPLICAT    STOPPED     rep     00:00:00      00:00:38

So, what can be done to delete it? It's simple:

In GG_HOME/dirchk

TEST:/u01/app/oracle/product/ggate/dirchk->ls -lrt
total 20
-rw-rw-r-- 1 ggate dba 4096 Nov 17 01:51 EXT.cpb
-rw-rw-r-- 1 ggate dba 2048 Nov 17 03:39 REP.cpr
-rw-rw-r-- 1 ggate dba   52 Nov 17 03:59 EXT.cps
-rw-rw-r-- 1 ggate dba 8192 Nov 17 03:59 EXT.cpe

Remove "REP.cpr" file.

TEST:/u01/app/oracle/product/ggate/dirchk->rm REP.cpr


TEST:/u01/app/oracle/product/ggate->ggsci

Oracle GoldenGate Command Interpreter for Oracle
Version 11.2.1.0.3 14400833 OGGCORE_11.2.1.0.3_PLATFORMS_120823.1258_FBO
Linux, x64, 64bit (optimized), Oracle 11g on Aug 23 2012 20:20:21

Copyright (C) 1995, 2012, Oracle and/or its affiliates. All rights reserved.



GGSCI (host) 1> info all

Program     Status      Group  Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING
EXTRACT     RUNNING     EXT     00:00:01      00:00:07

Once the port issue was resolved, we added the replicat “rep” again successfully.

GGSCI (myhost) 2>  DBLOGIN USERID ggate@test PASSWORD "xxxxxxxx"
Successfully logged into database.

GGSCI (myhost) 3> ADD REPLICAT REP, extTrail /app/trail/rep/rp, checkpointTable ggate.OGG_CHECKPOINT
REPLICAT added.


GGSCI (myhost) 4> info all

Program     Status      Group   Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING
EXTRACT     RUNNING     EXT     00:00:01      00:00:02
REPLICAT    STOPPED     REP     00:00:00      00:00:20


GGSCI (myhost) 2> start REP

Sending START request to MANAGER ...
REPLICAT REP starting


GGSCI (myhost) 3> info all

Program     Status      Group  Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING
EXTRACT     RUNNING     EXT     00:00:00      00:00:06
REPLICAT    STOPPED     REP     00:00:00      00:04:15

Why is the status ‘STOPPED’? ggserr.log shows

2012-11-17 04:04:46  ERROR   OGG-00446  Oracle GoldenGate Delivery for Oracle, rep.prm:  Supplemental Checkpoint table does not exist.  Create a supplemental checkpoint table with the UPGRADE CHECKPOINTTABLE command in GGSCI if you have upgraded from release 11.2.1.0.0 or earlier.
2012-11-17 04:04:46  ERROR   OGG-01668  Oracle GoldenGate Delivery for Oracle, rep.prm:  PROCESS ABENDING.

The target DB is on version 11.2.0.2.0. Checking the checkpoint table:

SYS@test > desc ggate.ogg_checkpoint
 Name                                                                       Null?    Type
 -------------------------------------------------------------------------- -------- --------------------------------------------------
 GROUP_NAME                                                                 NOT NULL VARCHAR2(8)
 GROUP_KEY                                                                  NOT NULL NUMBER(19)
 SEQNO                                                                               NUMBER(10)
 RBA                                                                        NOT NULL NUMBER(19)
 AUDIT_TS                                                                            VARCHAR2(29)
 CREATE_TS                                                                  NOT NULL DATE
 LAST_UPDATE_TS                                                             NOT NULL DATE
 CURRENT_DIR                                                                NOT NULL VARCHAR2(255)

04:00:55 SYS@test1 > /

GROUP_NA  GROUP_KEY      SEQNO        RBA AUDIT_TS                      CREATE_TS                   LAST_UPDATE_TS
-------- ---------- ---------- ---------- ----------------------------- --------------------------- ---------------------------
CURRENT_DIR
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
RABC     2253238980          0  230112172 2012-11-17 00:18:32.000000    16-NOV-12-13:56:33          17-NOV-12-04:08:45
/u01/app/oracle/product/ggate

RDEF     451369050        174   13320072 2012-11-17 04:08:48.000000    08-OCT-12-22:02:16          17-NOV-12-04:08:51
/app/ggate

RGBAA    3979228817         85   30933007 2012-11-17 03:03:33.000000    26-OCT-12-10:06:50          17-NOV-12-04:04:09
/app/ggate

RDONE    3150503361        365  276978037 2012-11-17 04:05:33.000000    14-OCT-12-10:34:54          17-NOV-12-04:05:40
/app/ggate

Though we have added and started REP, we don’t see any row for it in the checkpoint table. Let's try running the UPGRADE CHECKPOINTTABLE command:

GGSCI (myhost) 2> DBLOGIN USERID ggate@test PASSWORD "xxxxxxx"
Successfully logged into database.

GGSCI (myhost) 3> UPGRADE CHECKPOINTTABLE ggate.OGG_CHECKPOINT

Successfully upgraded checkpoint table ggate.OGG_CHECKPOINT.

GGSCI (myhost) 4> start rep

Sending START request to MANAGER ...
REPLICAT REP starting


GGSCI (myhost) 5> info all

Program     Status      Group  Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING
EXTRACT     RUNNING     EXT     00:00:02      00:00:26
REPLICAT    RUNNING     REP     02:35:52      00:00:16

From the database:

SYS@test1 > desc ggate.ogg_checkpoint
 Name                                                                       Null?    Type
 -------------------------------------------------------------------------- -------- --------------------------------------------------
 GROUP_NAME                                                                 NOT NULL VARCHAR2(8)
 GROUP_KEY                                                                  NOT NULL NUMBER(19)
 SEQNO                                                                               NUMBER(10)
 RBA                                                                        NOT NULL NUMBER(19)
 AUDIT_TS                                                                            VARCHAR2(29)
 CREATE_TS                                                                  NOT NULL DATE
 LAST_UPDATE_TS                                                             NOT NULL DATE
 CURRENT_DIR                                                                NOT NULL VARCHAR2(255)
 LOG_CSN                                                                             VARCHAR2(129)
 LOG_XID                                                                             VARCHAR2(129)
 LOG_CMPLT_CSN                                                                       VARCHAR2(129)
 LOG_CMPLT_XIDS                                                                      VARCHAR2(2000)
 VERSION                                                                             NUMBER(3)

04:12:24 SYS@test1 > select * from ggate.ogg_checkpoint;

GROUP_NA  GROUP_KEY      SEQNO        RBA AUDIT_TS                      CREATE_TS                   LAST_UPDATE_TS
-------- ---------- ---------- ---------- ----------------------------- --------------------------- ---------------------------
CURRENT_DIR
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
LOG_CSN
---------------------------------------------------------------------------------------------------------------------------------
LOG_XID
---------------------------------------------------------------------------------------------------------------------------------
LOG_CMPLT_CSN
---------------------------------------------------------------------------------------------------------------------------------
LOG_CMPLT_XIDS
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   VERSION
----------
RABC     2253238980          3  119752892 2012-11-20 02:12:30.000000    16-NOV-12-13:56:33          20-NOV-12-04:12:35
/u01/app/oracle/product/ggate






RDEF   451369050        183  229511885 2012-11-20 04:12:22.000000    08-OCT-12-22:02:16          20-NOV-12-04:12:24
/app/ggate






RGBAA   3979228817         97   80060759 2012-11-20 10:07:02.000000    26-OCT-12-10:06:50          20-NOV-12-04:10:09
/app/ggate






REP   2249640216         12  211547890 2012-11-20 04:12:32.000000    17-NOV-12-04:00:48          20-NOV-12-04:12:34
/u01/app/oracle/product/ggate
12957879573832
780.26.260408
12957879573832
780.26.260408
         1

RDONE    3150503361        404  290793846 2012-11-20 04:12:19.000000    14-OCT-12-10:34:54          20-NOV-12-04:12:23
/app/ggate



Filed under: 11gR2, GoldenGate, Oracle Tagged: checkpointtable, GoldenGate

Internals of Active Dataguard – A must read


DBCA failing with “Diskgroup XXX is not compatible for database usage”


Today a friend of mine pinged me an error he hit while trying to create a database using DBCA from an 11.2.0.3 RDBMS home on an Exadata box. A screenshot of the error is below.

[Screenshot: dbca_error]

The error was easy to understand: the compatible parameter set in the database initialization parameters was lower than the compatible.rdbms attribute set for the diskgroup in ASM. The DB compatible parameter was set to 11.2.0.0.0, whereas for the diskgroup compatible.rdbms was set to 11.2.0.2.0.

So checking the ASM instance showed

SYS@+ASM7 > col COMPATIBILITY form a10
SYS@+ASM7 > col DATABASE_COMPATIBILITY form a10
SYS@+ASM7 > col NAME form a20
SYS@+ASM7 > select group_number, name, compatibility, database_compatibility from v$asm_diskgroup;

GROUP_NUMBER NAME               COMPATIBIL DATABASE_C
------------ ------------------ ---------- ----------
           1 DATA_O1            11.2.0.2.0 11.2.0.2.0
           2 RECO_O1            11.2.0.2.0 11.2.0.2.0

The compatible.asm diskgroup attribute controls the format of the ASM diskgroup metadata. Only ASM instances with a software version equal to or greater than compatible.asm can mount the diskgroup.

The compatible.rdbms diskgroup attribute determines the format of ASM files themselves. The diskgroup can be accessed by any database instance with a compatible init.ora parameter set equal to or higher than the compatible.rdbms attribute.
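The same information is visible per attribute in v$asm_attribute; a minimal sketch, run from the ASM instance:

col attribute form a25
col value form a12
SELECT dg.name diskgroup, a.name attribute, a.value
FROM   v$asm_diskgroup dg, v$asm_attribute a
WHERE  dg.group_number = a.group_number
AND    a.name LIKE 'compatible.%'
ORDER  BY 1, 2;

Keep in mind that these diskgroup attributes can only ever be advanced, never lowered, which is why the fix here has to happen on the database side.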

DBCA (with the General Purpose template) doesn’t provide any screen where we can change the parameter value. But as we want to create the database using DBCA, we need to change the parameter value in the template stored in ORACLE_HOME/assistants/dbca/templates.


xxxxxx: (test1) /u01/oraadmin/test/admin> cd $ORACLE_HOME
xxxxxx: (test1) /u01/app/oracle/product/11.2.0.3/dbhome_1> cd assistants/dbca/templates/
xxxxxx: (test1) /u01/app/oracle/product/11.2.0.3/dbhome_1/assistants/dbca/templates> ls -lrt
total 292272
-rwxrw-r-- 1 oracle oinstall        83 Oct  4 07:25 create_bct.sql
-rwxrw-r-- 1 oracle oinstall       718 Oct  4 07:25 crt_cluster_interconnect.sql
-rwxrw-r-- 1 oracle oinstall      5104 Oct  4 07:25 Data_Warehouse.dbc
-rwxrw-r-- 1 oracle oinstall       122 Oct  4 07:25 drop_cluster_interconnect.sql
-rwxrw-r-- 1 oracle oinstall     13756 Oct  4 07:25 dw_x2_2.dbt
-rwxrw-r-- 1 oracle oinstall       178 Oct  4 07:25 exadata_miscellaneous.sql
-rwxrw-r-- 1 oracle oinstall   1507328 Oct  4 07:25 example.dmp
-rwxrw-r-- 1 oracle oinstall  21889024 Oct  4 07:25 example01.dfb
-rwxrw-r-- 1 oracle oinstall      4984 Oct  4 07:25 General_Purpose.dbc
-rwxrw-r-- 1 oracle oinstall       803 Oct  4 07:25 logs.sql
-rwxrw-r-- 1 oracle oinstall      4104 Oct  4 07:25 logs.wk1
-rwxrw-r-- 1 oracle oinstall       320 Oct  4 07:25 logs_to_add.lst
-rwxrw-r-- 1 oracle oinstall       311 Oct  4 07:25 logs_to_add.sql
-rwxrw-r-- 1 oracle oinstall      8208 Oct  4 07:25 logs_to_add.wk1
-rwxrw-r-- 1 oracle oinstall     11489 Oct  4 07:25 New_Database.dbt
-rwxrw-r-- 1 oracle oinstall     13558 Oct  4 07:25 oltp_x2_2.dbt
-rwxrw-r-- 1 oracle oinstall       369 Oct  4 07:25 recreate_temp.sql
-rwxrw-r-- 1 oracle oinstall   9748480 Oct  4 07:26 Seed_Database.ctl
-rwxrw-r-- 1 oracle oinstall 265691136 Oct  4 07:26 Seed_Database.dfb
-rwxrw-r-- 1 oracle oinstall       761 Oct  4 07:26 set_cluster_interconnect.sql
-rwxrw-r-- 1 oracle oinstall        18 Oct  4 07:26 set_fra_size.lst
-rwxrw-r-- 1 oracle oinstall       806 Oct  4 07:26 set_fra_size.sql
-rwxrw-r-- 1 oracle oinstall       513 Oct  4 07:26 set_fra_size.wk1
-rwxrw-r-- 1 oracle oinstall       199 Oct  4 07:26 set_use_large_pages_false.sql
xxxxxx: (test1) /u01/app/oracle/product/11.2.0.3/dbhome_1/assistants/dbca/templates>

As we chose the General Purpose template, we need to edit the value in General_Purpose.dbc. Search for the compatible parameter and you will see:

..............
  <initParam name="compatible" value="11.2.0.0.0"/>
..............

Edit the value to be either equal to the diskgroup's compatible.rdbms or higher. In our case, we set it to “11.2.0.3” and DBCA didn’t throw the error the next time :)

To know more on ASM diskgroup compatibility read

http://www.pythian.com/news/1078/oracle-11g-asm-diskgroup-compatibility/


Filed under: 11gR2, ASM, dbca, General, Oracle Tagged: 11gR2, asm, DBCA, diskgroup, Diskgroup is not compatible for database usage

2012 in review


The WordPress.com stats helper monkeys prepared a 2012 annual report for this blog.

Here’s an excerpt:

19,000 people fit into the new Barclays Center to see Jay-Z perform. This blog was viewed about 140,000 times in 2012. If it were a concert at the Barclays Center, it would take about 7 sold-out performances for that many people to see it.

Click here to see the complete report.


Filed under: Oracle

Let's welcome 2013


Wishing everyone a very happy and prosperous New Year 2013. In this New Year, may you always be blessed with contentment, peace and abundance.

With 12c releasing in 2013, it should be a happening year, with lots of new things to learn and play around with :)



Filed under: Oracle

ORA-38500: Unsupported operation: Oracle XML DB not present


While trying to import using impdp, we got the below error:

Table "ANAND"."TEST" exists and has been truncated. Data will be loaded but all dependent metadata will be skipped due to table_exists_action of truncate
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
ORA-31693: Table data object "ANAND"."TEST":"Y2012_Q2_M06" failed to load/unload and is being skipped due to error:
ORA-38500: Unsupported operation: Oracle XML DB not present
..............

As it said XML DB is not present, check the XDB status in dba_registry:
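The status check is basically a query along these lines:

SELECT comp_name, version, status
FROM   dba_registry
WHERE  comp_id = 'XDB';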


COMP_NAME                            VERSION      STATUS
------------------------------------ ------------ --------
Oracle XML Database                  11.2.0.3.0   VALID

03:15:37 SYS > select owner, object_name, object_type, status from dba_objects where status = 'INVALID' and owner = 'XDB';

no rows selected

As per Doc ID 1424643.1, the error can be generated when the metadata of the exported table in the dumpfile and the existing table at the target DB are different. On checking the table structure, there wasn’t any difference.

After spending some more time on it, we finally decided to deinstall/reinstall XDB.

Deinstall -- @?/rdbms/admin/catnoqm.sql
Install --@?/rdbms/admin/catqm.sql {XDB pwd} {XDB default tbs} {XDB temporary tbs} {SecureFiles = YES/NO}

For more details one can refer to Doc ID 1292089.1.

After this, impdp completed successfully. :)


Filed under: 11gR2, DATAPUMP, EXPDP, IMPDP, Oracle Tagged: impdp, ORA-38500, xdb, xml database

ASM Diskgroup shows USABLE_FILE_MB value in Negative


Today while working on an ASM diskgroup I noticed a negative value for USABLE_FILE_MB. I was a little surprised, as it had been pretty long since I had worked on ASM. So I started looking around for blogs and MOS docs and found a few really nice ones.

A negative value for USABLE_FILE_MB means that you do not have sufficient free space to tolerate a disk failure. If a disk were to fail, the subsequent rebalance would run out of space before full redundancy could be restored to all files.

I would really recommend reading :-

http://prutser.wordpress.com/2013/01/03/demystifying-asm-required_mirror_free_mb-and-usable_file_mb/

The box I was working on was an Exadata quarter rack, so it had 3 storage servers. Each storage server on an Exadata machine has 12 cell disks. Grid disks are created within cell disks. In a simple configuration, one grid disk can be created per cell disk, and grid disks are what the storage cell presents to the DB servers. So basically

GRID DISK = ASM DISK.

When creating disk groups, ASM automatically puts all grid disks from the same storage cell into the same failgroup. The failgroup is then named after the storage cell.


[oracle@test~]$ asmcmd lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  N         512   4096  4194304  40697856  8464936         13565952        -2550508              0             N  DATA1/
MOUNTED  NORMAL  N         512   4096  4194304    415296   367220           138432          114394              0             Y  DBFS_DG/
MOUNTED  NORMAL  N         512   4096  4194304  10176480  9018276          3392160         2813058              0             N  RECO1/

compute sum Label total_FG of total_mb on FAILGROUP
compute sum Label total of total_mb on report
col diskgroup for a20
col failgroup for a30
col name for a30
select g.name diskgroup, d.failgroup,  d.name, d.total_mb from v$asm_disk d, v$asm_diskgroup g where g.name = 'DATA1' and d.GROUP_NUMBER = g.GROUP_NUMBER order by g.name, d.failgroup;

DISKGROUP            FAILGROUP                      NAME                                   TOTAL_MB
-------------------- ------------------------------ ------------------------------ ----------------
DATA1               CELL01                         DATA1_CD_00_CELL01             2260992
DATA1                                              DATA1_CD_05_CELL01             2260992
DATA1                                              DATA1_CD_03_CELL01             2260992
DATA1                                              DATA1_CD_04_CELL01             2260992
DATA1                                              DATA1_CD_01_CELL01             2260992
DATA1                                              DATA1_CD_02_CELL01             2260992
                     ******************************                                ----------------
                     total_FG                                                              13565952
DATA1               CELL02                         DATA1_CD_01_CELL02             2260992
DATA1                                              DATA1_CD_05_CELL02             2260992
DATA1                                              DATA1_CD_02_CELL02             2260992
DATA1                                              DATA1_CD_03_CELL02             2260992
DATA1                                              DATA1_CD_00_CELL02             2260992
DATA1                                              DATA1_CD_04_CELL02             2260992
                     ******************************                                ----------------
                     total_FG                                                              13565952
DATA1               CELL03                         DATA1_CD_02_CELL03             2260992
DATA1                                              DATA1_CD_05_CELL03             2260992
DATA1                                              DATA1_CD_01_CELL03             2260992
DATA1                                              DATA1_CD_04_CELL03             2260992
DATA1                                              DATA1_CD_03_CELL03             2260992
DATA1                                              DATA1_CD_00_CELL03             2260992
                     ******************************                                ----------------
                     total_FG                                                              13565952
                                                                                   ----------------
total                                                                                      40697856

For DATA1 diskgroup the USABLE_FILE_MB shows value in Negative (-2550508 MB).

SQL> select name, state, type, total_mb, free_mb, required_mirror_free_mb req_free,  usable_file_mb use_mb from v$asm_diskgroup where name = 'DATA1';

NAME                      STATE       TYPE     TOTAL_MB    FREE_MB   REQ_FREE     USE_MB
------------------------- ----------- ------ ---------- ---------- ---------- ----------
DATA1                      MOUNTED     NORMAL   40697856    8464936   13565952   -2550508
                                                                                                              ----------
total                                                                                                           40697856

TOTAL_MB:- Refers to total capacity of the diskgroup
FREE_MB :- Refers to raw free space available in diskgroup in MB.

FREE_MB = (TOTAL_MB – (HOT_USED_MB + COLD_USED_MB))

REQUIRED_MIRROR_FREE_MB :- Indicates how much free space is required in an ASM disk group to restore redundancy after the failure of an ASM disk or ASM failure group. In Exadata it is the disk capacity of one failure group.

USABLE_FILE_MB :- Indicates how much space is available in an ASM disk group considering the redundancy level of the disk group.

It's calculated as :-

USABLE_FILE_MB=(FREE_MB – REQUIRED_MIRROR_FREE_MB ) / 2 –> For Normal Redundancy
USABLE_FILE_MB=(FREE_MB – REQUIRED_MIRROR_FREE_MB ) / 3 –> For High Redundancy

Also note that ASM diskgroups do not set aside space based on required_mirror_free_mb. It is merely calculated and used to derive usable_file_mb.
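The formula is easy to verify directly against v$asm_diskgroup; a small sketch (normal redundancy divides by 2, high by 3):

SELECT name, type, free_mb, required_mirror_free_mb,
       usable_file_mb,
       ROUND((free_mb - required_mirror_free_mb) /
             DECODE(type, 'NORMAL', 2, 'HIGH', 3, 1)) calc_usable_mb
FROM   v$asm_diskgroup;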

While reading MOS Doc ID 1551288.1 I came across some interesting terms and a script which I wanted to share with everyone (at least those of you who might not be familiar with them).

Failure coverage refers to the amount of space in a disk group that will be used to re-mirror data in the event of some storage failure.

1. Disk Failure Coverage :- Refers to having enough free space to allow data to be re-mirrored (rebalanced) after a single disk failure in Normal redundancy.

2. Cell Failure Coverage :- Refers to having enough free space to allow data to be re-mirrored after the loss of one entire storage cell.

Reserving space in the disk group means that you monitor the disk group to ensure that FREE_MB never goes below minimum amount needed for disk or cell failure coverage.

I ran the script provided in MOS Doc ID 1551288.1 and below was the output :-

Description of Derived Values:
One Cell Required Mirror Free MB : Required Mirror Free MB to permit successful rebalance after losing largest CELL regardless of redundancy type
Disk Required Mirror Free MB     : Space needed to rebalance after loss of single or double disk failure (for normal or high redundancy)
Disk Usable File MB              : Usable space available after reserving space for disk failure and accounting for mirroring
Cell Usable File MB              : Usable space available after reserving space for SINGLE cell failure and accounting for mirroring
.  .  .
ASM Version: 11.2.0.4
.  .  .
----------------------------------------------------------------------------------------------------------------------------------------------------
|          |         |     |          |            |            |            |Cell Req'd  |Disk Req'd  |            |            |    |    |       |
|          |DG       |Num  |Disk Size |DG Total    |DG Used     |DG Free     |Mirror Free |Mirror Free |Disk Usable |Cell Usable |    |    |PCT    |
|DG Name   |Type     |Disks|MB        |MB          |MB          |MB          |MB          |MB          |File MB     |File MB     |DFC |CFC |Util   |
----------------------------------------------------------------------------------------------------------------------------------------------------
|DATA1    |NORMAL   |   18| 2,260,992|  40,697,856|  32,233,944|   8,463,912|  14,922,547|   2,761,008|   2,851,452|  -3,229,318|PASS|FAIL|  79.2%|
|DBFS_DG  |NORMAL   |   12|    34,608|     415,296|      48,076|     367,220|     152,275|      59,425|     153,898|     107,472|PASS|PASS|  11.6%|
|RECO1    |NORMAL   |   18|   565,360|  10,176,480|   1,171,220|   9,005,260|   3,731,376|     703,460|   4,150,900|   2,636,942|PASS|PASS|  11.5%|
----------------------------------------------------------------------------------------------------------------------------------------------------
Cell Failure Coverage Freespace Failures Detected. Warning Message Follows.
Enough Free Space to Rebalance after loss of ONE cell: WARNING (However, cell failure is very rare)
.  .  .
Script completed.

So here I am good with one disk failure but not with one cell failure. Basically I need to either add disks to the disk group or free up some space.

This post is more of a note for myself to refer back to. Hope it is useful for some of you too :)


Filed under: 11gR2, ASM, Oracle Tagged: +ASM, Exadata, Negative usable_file_mb, required_mirror_free_mb, usable_file_mb

SYSAUX Growing rapidly!!! What can be done


Recently I have been working on cleaning up the SYSAUX tablespace for a few clients, so I thought to put down my steps, which might be helpful to some of you out there.

Why does the SYSAUX tablespace grow much larger than expected?

There could be a number of potential reasons:

1. ASH data has grown too large (SM/AWR)
2. High Retention Period
3. Segment Advisor has grown too large
4. Increase in older version of Optimizer Statistics (SM/OPTSTAT)
5. Bugs Bugs Bugs!!!!!

How do we identify the SYSAUX space usage?

There are basically 2 ways that I know of:

1. Running @?/rdbms/admin/awrinfo.sql –> detailed info like schema breakdown, SYSAUX occupants' space usage etc.

2. Querying v$sysaux_occupants and dba_segments directly:

COLUMN "Item" FORMAT A25
COLUMN "Space Used (GB)" FORMAT 999.99
COLUMN "Schema" FORMAT A25
COLUMN "Move Procedure" FORMAT A40

SELECT  occupant_name "Item",
   space_usage_kbytes/1048576 "Space Used (GB)",
   schema_name "Schema",
   move_procedure "Move Procedure"
   FROM v$sysaux_occupants
   ORDER BY 2
   /

 col owner for a6
 col segment_name for a50
  select * from
 (select owner,segment_name||'~'||partition_name segment_name,bytes/(1024*1024) size_m
 from dba_segments
 where tablespace_name = 'SYSAUX' ORDER BY BLOCKS desc) where rownum < 11;

In my case, the below 2 were occupying most of the space :-

1. SM/AWR
2. SM/OPTSTAT

SM/AWR :- Refers to the Automatic Workload Repository. Data in this section is retained for a certain amount of time (default 8 days). The setting can be checked through DBA_HIST_WR_CONTROL.

SM/OPTSTAT :- Stores older versions of optimizer statistics. The setting can be checked through dbms_stats.get_stats_history_retention. This is not a part of AWR and is not controlled by the AWR retention.
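Both retentions are easy to check, and to reduce if that is what you decide to do; a rough sketch (the 15-day and 10-day values are examples only):

SELECT snap_interval, retention FROM dba_hist_wr_control;

SELECT dbms_stats.get_stats_history_retention FROM dual;

-- AWR retention is specified in minutes, stats history retention in days
EXEC dbms_workload_repository.modify_snapshot_settings(retention => 15*24*60);
EXEC dbms_stats.alter_stats_history_retention(10);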

When looking at the top segments, I saw WRH$_ACTIVE_SESSION_HISTORY occupying most of the space. Sometimes AWR tables are not purged according to the settings in sys.wrm$_wr_control.

As per Oracle :-

Oracle decides what rows need to be purged based on the retention policy. There is a special mechanism which is used in the case of the large AWR tables where we store the snapshot data in partitions. One method of purging data from these tables is by removing partitions that only contain rows that have exceeded the retention criteria. During the nightly purge task, we only drop the partition if all the data in the partition has expired. If the partition contains at least one row which, according to the retention policy shouldn’t be removed, then the partition won’t be dropped and as such the table will contain old data.

If partition splits do not occur (for whatever reason), then we can end up with a situation where we have to wait for the latest entries to expire before the partition that they sit in can be removed. This can mean that some of the older entries can be retained significantly past their expiry date. The result of this is that the data is not purged as expected.

Diagnose and Reduce Used Space of SYSAUX.

Once the major occupants and top segments are identified as discussed above, we can start with the steps to rectify it.

With SM/AWR occupying most of the space, I think we can follow 3 methods. In this blog I will be posting only one of the methods :)

To check Orphaned ASH rows :-

 SELECT COUNT(1) Orphaned_ASH_Rows FROM wrh$_active_session_history a
  WHERE NOT EXISTS
  (SELECT 1
  FROM wrm$_snapshot
  WHERE snap_id       = a.snap_id
  AND dbid            = a.dbid
  AND instance_number = a.instance_number
  );

Check the minimum snap_id in the ASH table and then compare it to the minimum snap_id in dba_hist_snapshot.

select min(snap_id) from WRH$_ACTIVE_SESSION_HISTORY;
select min(snap_id) from dba_hist_snapshot;

Example :-

select min(snap_id),MAX(snap_id) from dba_hist_snapshot;

MIN(SNAP_ID) MAX(SNAP_ID)
------------ ------------
       17754        18523

select min(snap_id),MAX(snap_id) from WRH$_ACTIVE_SESSION_HISTORY;

MIN(SNAP_ID) MAX(SNAP_ID)
------------ ------------
           1        18523

Above as per the retention period, we should have data from snap_id 17754 till 18523, but the WRH$_ASH table has data from snap_id 1.

From Oracle MOS Doc :-

A potential solution to this issue is to manually split the partitions of the partitioned AWR objects such that there is more chance of the split partition being purged. You will still have to wait for all the rows in the new partitions to reach their retention time, but with split partitions there is more chance of this happening. You can manually split the partitions using the following undocumented command:

alter session set "_swrf_test_action" = 72;



select table_name, count(*) from dba_tab_partitions where table_name like 'WRH$%' and table_owner = 'SYS'
group by table_name order by 1;

TABLE_NAME                                           COUNT(*)
-------------------------------------------------- ----------
WRH$_ACTIVE_SESSION_HISTORY                                 2
WRH$_DB_CACHE_ADVICE                                        2
WRH$_DLM_MISC                                               2
WRH$_EVENT_HISTOGRAM                                        2
WRH$_FILESTATXS                                            11
WRH$_INST_CACHE_TRANSFER                                    2
WRH$_INTERCONNECT_PINGS                                     2
........................
25 rows selected.

SQL>  alter session set "_swrf_test_action"=72; 

Session altered.

SQL>  select table_name,partition_name from dba_tab_partitions where table_name = 'WRH$_ACTIVE_SESSION_HISTORY';

TABLE_NAME                                         PARTITION_NAME
------------------------------  -------------------------------------------------------
WRH$_ACTIVE_SESSION_HISTORY                        WRH$_ACTIVE_1798927129_0
WRH$_ACTIVE_SESSION_HISTORY                        WRH$_ACTIVE_1798927129_18531  --> New Partition created 
WRH$_ACTIVE_SESSION_HISTORY                        WRH$_ACTIVE_SES_MXDB_MXSN

col table_name for a80
select table_name, count(*) from dba_tab_partitions where table_name like 'WRH$%' and table_owner = 'SYS' group by table_name order by 1

TABLE_NAME                                   COUNT(*)
------------------------------------------- ----------
WRH$_ACTIVE_SESSION_HISTORY                     3
WRH$_DB_CACHE_ADVICE                            3
WRH$_DLM_MISC                                   3
WRH$_EVENT_HISTOGRAM                            3
......................

25 rows selected.

In the above example, WRH$_ACTIVE_1798927129_18531 is the new partition created, where 1798927129 is the DBID and 18531 is the max(snap_id) when it was partitioned. So now we can start dropping the snapshot range, which in my case is from 1 to 17753, as 17754 is the min(snap_id) in dba_hist_snapshot.

SQL> EXEC dbms_workload_repository.drop_snapshot_range(1,17753,1798927129);

It can generate a good amount of redo and undo. So keep monitoring the undo tablespace and make sure you have sufficient space.

So, what happens when we run the above :-

SQL> @sqlid ft7m07stk3dws
old   9:        sql_id = ('&1')
new   9:        sql_id = ('ft7m07stk3dws')

SQL_ID                                  HASH_VALUE SQL_TEXT
--------------------------------------- ---------- ------------------------------------------------------------------------------------------------------------------------------------------------------
ft7m07stk3dws                            857847704 delete from WRH$_SYSTEM_EVENT tab where (:beg_snap <= tab.snap_id and         tab.snap_id = b.start_snap_id) and
                                                   (tab.snap_id <= b.end_snap_id))

SQL> @sqlid 854knbb15976z
old   9:        sql_id = ('&1')
new   9:        sql_id = ('854knbb15976z')

SQL_ID                                  HASH_VALUE SQL_TEXT
--------------------------------------- ---------- ------------------------------------------------------------------------------------------------------------------------------------------------------
854knbb15976z                           3260325087 delete from WRH$_SQLSTAT tab where (:beg_snap <= tab.snap_id and         tab.snap_id = b.start_snap_id) and
                                                   (tab.snap_id <= b.end_snap_id))

So internally Oracle runs delete commands, which cause the high redo and undo generation :)

Once the procedure has completed successfully, check the min(snap_id) in WRH$_ACTIVE_SESSION_HISTORY and perform a shrink space cascade.


select owner,segment_name,round(sum(bytes/1024/1024),2) MB, tablespace_name from dba_segments where segment_name = upper('WRH$_ACTIVE_SESSION_HISTORY') group by owner,segment_name,tablespace_name;

OWNER       SEGMENT_NAME                      MB           TABLESPACE_NAME
-------  ---------------------------------- -----------  -------------------
SYS        WRH$_ACTIVE_SESSION_HISTORY        3538.06          SYSAUX

SQL> alter table WRH$_ACTIVE_SESSION_HISTORY shrink space cascade;

Table altered.


OWNER       SEGMENT_NAME                      MB           TABLESPACE_NAME
-------  ---------------------------------- -----------  -------------------
SYS        WRH$_ACTIVE_SESSION_HISTORY        46.75          SYSAUX

In a similar fashion, other WRH$ tables can be shrunk to free up space in SYSAUX.

Hope this helps!!!

Reference :-

WRH$_ACTIVE_SESSION_HISTORY Does Not Get Purged Based Upon the Retention Policy (Doc ID 387914.1)
Suggestions if Your SYSAUX Tablespace Grows Rapidly or Too Large (Doc ID 1292724.1)


Filed under: 10gR2, 11gR2, Oracle, SYSAUX Tagged: awrinfo.sql, dba_hist_snapshot, drop_snapshot_range, Orphaned_ASH_Rows, sysaux, sysaux growing, sysaux purging, v$sysaux_occupants, WRH$, WRH$_active_session_history, _swrf_test_action

Purging SYSAUX


In continuation of my previous post “SYSAUX Growing Rapidly”, here I wanted to present the second method of purging SYSAUX.

Basically, I tried to perform the steps mentioned in the previous post, and drop_snapshot_range was taking too long (> 24hrs) and was still running on the test DB. Again, WRH$_ACTIVE_SESSION_HISTORY was at the top of the list, occupying most of the SYSAUX space.

SYS01> EXEC dbms_workload_repository.drop_snapshot_range(25155,26155,3179571572);


From Another session after some time 

SYS01> @asw

USERNAME		      SID    SERIAL# SPID	EVENT			       LAST_CALL_ET  WAIT_TIME SECONDS_IN_WAIT STATE		   SQL_ID	 PLAN_HASH_VALUE
------------------------- ------- ---------- ---------- ------------------------------ ------------ ---------- --------------- ------------------- ------------- ---------------
SYS			     7654	8641 47879	db file sequential read 		 28	    -1		     0 WAITED SHORT TIME   fqq01wmb4hgt8       763705880

SYS01> @orax fqq01wmb4hgt8
old   7:       sql_id  = '&&1'
new   7:       sql_id  = 'fqq01wmb4hgt8'

SQL_ID
-------------
SQL_FULLTEXT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
fqq01wmb4hgt8
delete from WRH$_FILESTATXS tab where (:beg_snap <= tab.snap_id and	    tab.snap_id = b.start_snap_id) and			  (tab.snap_id <= b.end_snap_id))

SELECT * FROM table(DBMS_XPLAN.DISPLAY_CURSOR('fqq01wmb4hgt8',NULL,'typical +peeked_binds allstats last'))

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID	fqq01wmb4hgt8, child number 0
-------------------------------------
delete from WRH$_FILESTATXS tab where (:beg_snap <= tab.snap_id and
    tab.snap_id =
b.start_snap_id) and			      (tab.snap_id <=
b.end_snap_id))

Plan hash value: 763705880

---------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation			 | Name 	      | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | Pstart| Pstop |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | DELETE STATEMENT		 |		      |        |       |   325K(100)|	       |       |       |       |       |	  |
|   1 |  DELETE 			 | WRH$_FILESTATXS    |        |       |	    |	       |       |       |       |       |	  |
|*  2 |   FILTER			 |		      |        |       |	    |	       |       |       |       |       |	  |
|   3 |    MERGE JOIN ANTI		 |		      |    494M|    23G|   325K  (1)| 01:05:08 |       |       |       |       |	  |
|   4 |     PARTITION RANGE ITERATOR	 |		      |    494M|  8957M|   325K  (1)| 01:05:08 |   KEY |   KEY |       |       |	  |
|*  5 |      INDEX RANGE SCAN		 | WRH$_FILESTATXS_PK |    494M|  8957M|   325K  (1)| 01:05:08 |   KEY |   KEY |       |       |	  |
|*  6 |     FILTER			 |		      |        |       |	    |	       |       |       |       |       |	  |
|*  7 |      SORT JOIN			 |		      |      1 |    33 |     2	(50)| 00:00:01 |       |       | 73728 | 73728 |	  |
|*  8 |       TABLE ACCESS BY INDEX ROWID| WRM$_BASELINE      |      1 |    33 |     1	 (0)| 00:00:01 |       |       |       |       |	  |
|*  9 |        INDEX RANGE SCAN 	 | WRM$_BASELINE_PK   |      1 |       |     1	 (0)| 00:00:01 |       |       |       |       |	  |
---------------------------------------------------------------------------------------------------------------------------------------------------

Peeked Binds (identified by position):
--------------------------------------

   1 - (NUMBER): 0
   2 - (NUMBER): 95781
   3 - (NUMBER): 3179571572

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(:BEG_SNAP=:BEG_SNAP AND "TAB"."SNAP_ID"="B"."START_SNAP_ID" AND "TAB"."SNAP_ID"=:BEG_SNAP AND "B"."START_SNAP_ID"<=:END_SNAP))
   9 - access("B"."DBID"=:DBID)

SYS01> col name format a10;
col VALUE_STRING format a30;
select name, position, datatype_string, was_captured, value_string,
anydata.accesstimestamp(value_anydata) from v$sql_bind_capture where sql_id = '&sqlid';
SYS01>   2  Enter value for sqlid: fqq01wmb4hgt8
old   2: anydata.accesstimestamp(value_anydata) from v$sql_bind_capture where sql_id = '&sqlid'
new   2: anydata.accesstimestamp(value_anydata) from v$sql_bind_capture where sql_id = 'fqq01wmb4hgt8'

NAME	     POSITION DATATYPE_STRING						   WAS VALUE_STRING
---------- ---------- ------------------------------------------------------------ --- ------------------------------
ANYDATA.ACCESSTIMESTAMP(VALUE_ANYDATA)
---------------------------------------------------------------------------
:BEG_SNAP	    1 NUMBER							   YES 0


:END_SNAP	    2 NUMBER							   YES 95781


:DBID		    3 NUMBER							   YES 3179571572

Interestingly, the bind values show a value_string of 0 and 95781 for BEG_SNAP and END_SNAP respectively, though the input range for the drop snapshot was between 25155 and 26155.

The database was refreshed by the client (so my session was gone), and so I decided not to take the drop_snapshot_range approach. After going through a few blogs and MOS documents, I thought we had 2 approaches :-

1. “Recreate the AWR tables as in MOS note 782974.1”, which would basically drop all WRH$* tables and then recreate them. The AWR tables contain a wealth of important performance data, which can be very useful in performance tuning trend analysis and also in comparing performance between two separate periods of time. Hence recreating AWR, I believe, should be the last resort. The activity needs to be done in startup restrict mode, so it requires downtime.

And if you plan to go forward with it, I would recommend exporting the AWR snapshot data using @?/rdbms/admin/awrextr.sql and keeping the dump. In future it can be used by simply importing it into some other repository DB to get back the AWR data.

2. Simply delete the orphaned rows from the WRH$_ACTIVE_SESSION_HISTORY table and perform a shrink space cascade.

I went ahead with the 2nd approach and performed the below steps (Note: the DB was a single instance DB).


SYS01> SELECT COUNT(1) Orphaned_ASH_Rows
FROM wrh$_active_session_history a
WHERE NOT EXISTS
  (SELECT 1
  FROM wrm$_snapshot
  WHERE snap_id       = a.snap_id
  AND dbid            = a.dbid
  AND instance_number = a.instance_number
  );

ORPHANED_ASH_ROWS
-----------------
        301206452

SYS01> alter table wrh$_active_session_history parallel 4;

Table altered.

SYS01> alter session force parallel dml;

Session altered.

SYS01> DELETE /*+ PARALLEL(a,4) */
FROM wrh$_active_session_history a
WHERE NOT EXISTS
  (SELECT 1
  FROM wrm$_snapshot
  WHERE snap_id       = a.snap_id
  AND dbid            = a.dbid
  AND instance_number = a.instance_number
  );


From Another session :-

SYS01> @asw

USERNAME		      SID    SERIAL# SPID	EVENT			       LAST_CALL_ET  WAIT_TIME SECONDS_IN_WAIT STATE
------------------------- ------- ---------- ---------- ------------------------------ ------------ ---------- --------------- -------------------
SQL_ID	      PLAN_HASH_VALUE
------------- ---------------
SYS			      921	1329 107213	db file sequential read 		 60	     0		     0 WAITING
144bpj4qg68m1	   2217072169

SYS			     1227	 889 107215	db file sequential read 		 60	     0		     0 WAITING
144bpj4qg68m1	   2217072169

SYS			     9181	3277 107211	db file sequential read 		 60	     1		     0 WAITED KNOWN TIME
144bpj4qg68m1	   2217072169

SYS			     3370	 455 107727	SQL*Net message to client		  0	    -1		     0 WAITED SHORT TIME
8tfjp8cd2xtd1	    193683216

SYS			     1840	 809 107217	PX Deq Credit: need buffer		 60	     0		     0 WAITING
144bpj4qg68m1	   2217072169

SYS			     8875	3889 107209	db file sequential read 		 60	     1		     0 WAITED KNOWN TIME
144bpj4qg68m1	   2217072169

SYS			     8266	3139 90257	PX Deq: Execute Reply	 60	     0		    60 WAITING
144bpj4qg68m1	   2217072169


SYS01> @parallel_sess

Username     QC/Slave SlaveSet SID					Slave INS STATE    WAIT_EVENT			  QC SID QC INS Req. DOP Actual DOP
------------ -------- -------- ---------------------------------------- --------- -------- ------------------------------ ------ ------ -------- ----------
SYS	     QC 	       8266					1	  WAIT	   PX Deq: Execute Reply	  8266
 - p000      (Slave)  1        8875					1	  WAIT	   db file sequential read	  8266	 1	       4	  4
 - p003      (Slave)  1        1227					1	  WAIT	   db file sequential read	  8266	 1	       4	  4
 - p001      (Slave)  1        9181					1	  WAIT	   db file sequential read	  8266	 1	       4	  4
 - p002      (Slave)  1        921					1	  WAIT	   db file sequential read	  8266	 1	       4	  4
 - p004      (Slave)  2        1840					1	  WAIT	   PX Deq Credit: send blkd	  8266	 1	       4	  4
 - p007      (Slave)  2        2757					1	  WAIT	   PX Deq: Execution Msg	  8266	 1	       4	  4
 - p006      (Slave)  2        2450					1	  WAIT	   PX Deq: Execution Msg	  8266	 1	       4	  4
 - p005      (Slave)  2        2147					1	  WAIT	   PX Deq: Execution Msg	  8266	 1	       4	  4

SYS01> SELECT * FROM table(DBMS_XPLAN.DISPLAY_CURSOR('144bpj4qg68m1',NULL,'typical +peeked_binds allstats last'))

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID	144bpj4qg68m1, child number 0
-------------------------------------
DELETE /*+ PARALLEL(a,4) */ FROM wrh$_active_session_history a WHERE
NOT EXISTS   (SELECT 1	 FROM wrm$_snapshot   WHERE snap_id	  =
a.snap_id   AND dbid		= a.dbid   AND instance_number =
a.instance_number   )

Plan hash value: 2217072169

-----------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation			 | Name 			  | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | Pstart| Pstop |	TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | DELETE STATEMENT		 |				  |	   |	   |   106K(100)|	   |	   |	   |	    |	   |		|
|   1 |  PX COORDINATOR 		 |				  |	   |	   |		|	   |	   |	   |	    |	   |		|
|   2 |   PX SEND QC (RANDOM)		 | :TQ10001			  |    298M|	11G|   106K  (1)| 00:21:14 |	   |	   |  Q1,01 | P->S | QC (RAND)	|
|   3 |    DELETE			 | WRH$_ACTIVE_SESSION_HISTORY	  |	   |	   |		|	   |	   |	   |  Q1,01 | PCWP |		|
|   4 |     PX RECEIVE			 |				  |    298M|	11G|   106K  (1)| 00:21:14 |	   |	   |  Q1,01 | PCWP |		|
|   5 |      PX SEND HASH (BLOCK ADDRESS)| :TQ10000			  |    298M|	11G|   106K  (1)| 00:21:14 |	   |	   |  Q1,00 | P->P | HASH (BLOCK|
|   6 |       NESTED LOOPS ANTI 	 |				  |    298M|	11G|   106K  (1)| 00:21:14 |	   |	   |  Q1,00 | PCWP |		|
|   7 |        PX PARTITION RANGE ALL	 |				  |    298M|  7404M|   106K  (1)| 00:21:14 |	 1 |	 3 |  Q1,00 | PCWC |		|
|   8 | 	INDEX FULL SCAN 	 | WRH$_ACTIVE_SESSION_HISTORY_PK |    298M|  7404M|   106K  (1)| 00:21:14 |	 1 |	 3 |  Q1,00 | PCWP |		|
|*  9 |        INDEX UNIQUE SCAN	 | WRM$_SNAPSHOT_PK		  |	 1 |	15 |	 0   (0)|	   |	   |	   |  Q1,00 | PCWP |		|
-----------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   9 - access("DBID"="A"."DBID" AND "SNAP_ID"="A"."SNAP_ID" AND "INSTANCE_NUMBER"="A"."INSTANCE_NUMBER")

The deletion of rows (301206452 rows) completed with elapsed time of 12:59:38.44.


301206452 rows deleted.

Elapsed: 12:59:38.44

SYS01> alter table wrh$_active_session_history noparallel ;

Table altered.

SYS01> select degree from dba_tables where table_name=upper('wrh$_active_session_history');

DEGREE
----------
         1

SYS01> SELECT COUNT(1) Orphaned_ASH_Rows FROM wrh$_active_session_history a
WHERE NOT EXISTS
  (SELECT 1
  FROM wrm$_snapshot
  WHERE snap_id       = a.snap_id
  AND dbid            = a.dbid
  AND instance_number = a.instance_number
  );

ORPHANED_ASH_ROWS
-----------------
	   309984

SYS01> DELETE /*+ PARALLEL(a,4) */ FROM wrh$_active_session_history a
WHERE NOT EXISTS
  (SELECT 1
  FROM wrm$_snapshot
  WHERE snap_id       = a.snap_id
  AND dbid            = a.dbid
  AND instance_number = a.instance_number
  );  

309984 rows deleted.

Elapsed: 00:00:19.08
SYS01> commit;

Commit complete.

Elapsed: 00:00:00.07

SYS01> SELECT COUNT(1) Orphaned_ASH_Rows FROM wrh$_active_session_history a
WHERE NOT EXISTS
  (SELECT 1
  FROM wrm$_snapshot
  WHERE snap_id       = a.snap_id
  AND dbid            = a.dbid
  AND instance_number = a.instance_number
  );

ORPHANED_ASH_ROWS
-----------------
		0

SYS01> alter table wrh$_active_session_history shrink space cascade;

Table altered.

Elapsed: 06:47:21.36

Before this activity SM/AWR was occupying 339GB, which reduced to 209GB. SM/OPTSTAT was also occupying 143GB, and after confirmation from the client I purged the stats, as it was a test DB.

SYS01> exec DBMS_STATS.PURGE_STATS(DBMS_STATS.PURGE_ALL);

PL/SQL procedure successfully completed.

Elapsed: 00:00:02.54

SYS01> COLUMN "Item" FORMAT A25
 COLUMN "Space Used (GB)" FORMAT 999.99
 COLUMN "Schema" FORMAT A25
 COLUMN "Move Procedure" FORMAT A40

    SELECT  occupant_name "Item",
    space_usage_kbytes/1048576 "Space Used (GB)",
    schema_name "Schema",
    move_procedure "Move Procedure"
    FROM v$sysaux_occupants
    WHERE occupant_name in  ('SM/AWR','SM/OPTSTAT')
    ORDER BY 1
    /23:47:31 EMTSYS01> 

Item                      Space Used (GB) Schema                    Move Procedure
------------------------- --------------- ------------------------- ----------------------------------------
SM/AWR                             209.16 SYS
SM/OPTSTAT                          19.72 SYS

Savings in SYSAUX:


TABLESPACE_NAME                  TSP_SIZE USED_SPACE FREE_SPACE   PCT_FREE
------------------------------ ---------- ---------- ---------- ----------
SYSAUX                             505856     496310       9546       1.89  --> Before Size

SYSAUX                             505856     237833     268023      52.98  --> After Size

Hope this helps :)


Filed under: 10gR2, 11gR2, Oracle, SYSAUX Tagged: awrextr.sql, awrinfo.sql, dba_hist_snapshot, drop_snapshot_range, Orphaned_ASH_Rows, purge, sysaux, sysaux growing, sysaux purging, v$sysaux_occupants, WRH$, WRH$_active_session_history, _swrf_test_action

Sangam 2014


I am back from the AIOUG meet “SANGAM 14 – Meeting of Minds” and it was a wonderful experience. Had a really nice time meeting some old friends and making a few new ones :). I should mention here that I finally met Amit Bansal, the mind behind http://askdba.org/. It was fun meeting him :)

It was a 3-day conference:

1st day :- Optimizer Master Class by Tom Kyte. It was a full-day seminar. Learned some new optimizer features of 12c. We were all very thankful to Tom for being in India for the 3rd time to present at Sangam.

2nd day :- The day started with “What You Need To Know About Oracle Database In-Memory Option” by Maria Colgan. If I had to describe it in a word, I would say “Awesome!!!!”. I loved every minute of it; it was well presented and very informative. The rest of the day I moved from room to room, attending some good sessions on 12c and optimization.

3rd day :- I wasn’t too serious on the 3rd day :D . I mostly spent time meeting people and discussing Oracle. What I loved on the 3rd day was an hour-long session, “Time to Reinvent Yourself – Through Learning, Leading, and Failing” by Dr. Rajdeep Manwani. He shared his life experiences and some truths about human nature. It was amazing.

Overall it was nice to be part of Sangam 14 and learn some genuinely new things, mostly on 12c. Thanks to all the speakers and the organizing committee for the effort and valuable time they put in.

Thanks all!!!!!


Filed under: AIOUG, Oracle, Sangam Tagged: AIOUG, Oracle, Sangam 14

ORA-00600 – [kfnsInstanceReg00]


An interesting case of ORA-600 :)

I was paged for a standby lag and started looking into the issue. The database version was 11.1.0.7.0. The standby was lagging behind by 36 archives.

SQL> @mrp

   INST_ID PROCESS   STATUS          THREAD#  SEQUENCE#     BLOCK# DELAY_MINS
---------- --------- ------------ ---------- ---------- ---------- ----------
         1 MRP0      WAIT_FOR_LOG          1     568330          0          0 


   INST_ID PROCESS   STATUS          THREAD#  SEQUENCE#     BLOCK# DELAY_MINS
---------- --------- ------------ ---------- ---------- ---------- ----------
         1 RFS       IDLE                  0          0          0          0
         1 RFS       IDLE                  0          0          0          0
         1 RFS       WRITING               1     568330          1          0 
         1 RFS       IDLE                  0          0          0          0
         1 RFS       IDLE                  1     568366     309529          0
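
The @mrp script used above isn't included in the post; a minimal sketch that would produce similar output, assuming it simply queries gv$managed_standby, could look like this (mrp.sql is a hypothetical reconstruction):

-- mrp.sql (reconstruction): MRP and RFS status from gv$managed_standby
SELECT inst_id, process, status, thread#, sequence#, block#, delay_mins
FROM   gv$managed_standby
WHERE  process LIKE 'MRP%'
ORDER  BY inst_id;

SELECT inst_id, process, status, thread#, sequence#, block#, delay_mins
FROM   gv$managed_standby
WHERE  process LIKE 'RFS%'
ORDER  BY inst_id;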

After some time I checked back again and it was still on the same sequence#. Status of MRP and RFS:

SQL> @mrp

   INST_ID PROCESS   STATUS          THREAD#  SEQUENCE#     BLOCK# DELAY_MINS
---------- --------- ------------ ---------- ---------- ---------- ----------
         1 MRP0      WAIT_FOR_LOG          1     568330          0          0


   INST_ID PROCESS   STATUS          THREAD#  SEQUENCE#     BLOCK# DELAY_MINS
---------- --------- ------------ ---------- ---------- ---------- ----------
         1 RFS       IDLE                  0          0          0          0
         1 RFS       IDLE                  0          0          0          0
         1 RFS       WRITING               1     568330          1          0 
         1 RFS       IDLE                  0          0          0          0
         1 RFS       IDLE                  1     568366     795006          0

No error was reported in the alert log or in v$archive_dest_status. Archive log generation on the primary was usual, like any other day; nothing had changed. Seq# 568330 was present on the primary with a size of 497MB, whereas the same seq# on the standby was 513MB. I am not sure why!!!

So, I thought to copy the archive from the primary, stop the MRP, perform manual recovery using ‘recover standby database’ (as the other 35 archives were present on disk) and then start real-time apply.

Once the copy completed, I stopped the MRP and started the manual recovery:

SQL> alter database recover managed standby database cancel;

Database altered.

SQL> select name,open_mode,database_role from v$database;

NAME      OPEN_MODE  DATABASE_ROLE
--------- ---------- ----------------
TEST      MOUNTED    PHYSICAL STANDBY

SQL> recover standby database;
ORA-00279: change 69956910076 generated at 10/26/2014 21:46:36 needed for
thread 1
ORA-00289: suggestion :
/oraclebackup/fra/TEST/archivelog/2014_10_26/o1_mf_1_568330_b4vdl546_.arc
ORA-00280: change 69956910076 for thread 1 is in sequence #568330


Specify log: {=suggested | filename | AUTO | CANCEL}
AUTO
ORA-16145: archival for thread# 1 sequence# 568330 in progress 
SQL> @mrp

no rows selected


   INST_ID PROCESS   STATUS          THREAD#  SEQUENCE#     BLOCK# DELAY_MINS
---------- --------- ------------ ---------- ---------- ---------- ----------
         1 RFS       IDLE                  0          0          0          0
         1 RFS       IDLE                  0          0          0          0
         1 RFS       WRITING               1     568330          1          0 
         1 RFS       IDLE                  0          0          0          0
         1 RFS       IDLE                  1     568372     563688          0

At this point I thought to bounce the standby and then perform the recovery. I issued shutdown immediate and it hung. After almost 10 minutes the alert log showed:

Active call for process 45660 user 'oracle' program 'oracle@test.com'
SHUTDOWN: waiting for active calls to complete.

I checked PID 45660 and, as it showed LOCAL=NO, issued kill -9:

[oracle@test:~/app/oracle (test)]$ ps -ef | grep 45660
oracle   41463 16050  0 00:43 pts/0    00:00:00 grep 45660
oracle   45660     1  1 Oct23 ?        01:09:22 oracletest (LOCAL=NO)
[oracle@test:~/app/oracle (test)]$ kill -9 45660 
[oracle@test:~/app/oracle (test)] ps -ef | grep 45660
oracle   42181 16050  0 00:44 pts/0    00:00:00 grep 45660
oracle   45660     1  1 Oct23 ?        01:09:22 oracletest (LOCAL=NO)

kill -9 didn’t kill the process. As it was a standby, I thought to go ahead and abort the instance, and this is what happened when I tried startup mount:

SQL> startup mount
ORACLE instance started.

Total System Global Area 6.1466E+10 bytes
Fixed Size                  2174600 bytes
Variable Size            2.0535E+10 bytes
Database Buffers         4.0802E+10 bytes
Redo Buffers              126640128 bytes

ORA-03113: end-of-file on communication channel
Process ID: 47331
Session ID: 995 Serial number: 3

Alert log showed

Mon Oct 27 00:48:09 2014
ALTER DATABASE MOUNT
Mon Oct 27 00:53:10 2014
System State dumped to trace file /home/oracle/app/oracle/diag/rdbms/test_TEST_db/test/trace/test_ckpt_47252.trc
Mon Oct 27 00:53:11 2014
Errors in file /home/oracle/app/oracle/diag/rdbms/test_TEST_db/test/trace/test_asmb_47260.trc (incident=491160):
ORA-00600: internal error code, arguments: [600], [ORA_NPI_ERROR], [ORA-00600: internal error code, arguments: [kfnsInstanceReg00], [test:test_TEST_DB], [30000], [], [], [], [], [], [], [], [], []
], [], [], [], [], [], [], [], [], []
Incident details in: /home/oracle/app/oracle/diag/rdbms/test_TEST_db/test/incident/incdir_491160/test_asmb_47260_i491160.trc
Errors in file /home/oracle/app/oracle/diag/rdbms/test_TEST_db/test/trace/test_asmb_47260.trc:
ORA-15064: communication failure with ASM instance

ORA-00600: internal error code, arguments: [600], [ORA_NPI_ERROR], [ORA-00600: internal error code, arguments: [kfnsInstanceReg00], [test:test_TEST_DB], [30000], [], [], [], [], [], [], [], [], []
], [], [], [], [], [], [], [], [], []
Mon Oct 27 00:53:12 2014
Mon Oct 27 00:53:12 2014
Sweep Incident[491160]: completed
Mon Oct 27 00:53:12 2014
Trace dumping is performing id=[cdmp_20141027005312]
Errors in file /home/oracle/app/oracle/diag/rdbms/test_TEST_db/prod/trace/test_ckpt_47252.trc:
ORA-15083: failed to communicate with ASMB background process
CKPT (ospid: 47252): terminating the instance due to error 15083
Mon Oct 27 00:53:20 2014
Instance terminated by CKPT, pid = 47252

The CKPT trace showed


15083: Timeout waiting for ASMB to be ready (state=1)
ASMB is ALIVE
----- Abridged Call Stack Trace -----

SYSTEM STATE (level=10, with short stacks)
------------
System global information:
processes: base 0xe90ea0088, size 900, cleanup 0xea0ec6598
allocation: free sessions 0xeb8f53d68, free calls (nil)
control alloc errors: 0 (process), 0 (session), 0 (call)
PMON latch cleanup depth: 0
seconds since PMON's last scan for dead processes: 1
system statistics:
22 logons cumulative
19 logons current
5 opened cursors cumulative
1 opened cursors current
0 user commits
0 user rollbacks
51 user calls
19 recursive calls
0 recursive cpu usage
0 session logical reads
0 session stored procedure space
0 CPU used when call started
0 CPU used by this session
1 DB time

Looking at the ASM instance alert log

System State dumped to trace file /home/oracle/app/oracle/diag/asm/+asm/+ASM/trace/+ASM_ora_33024.trc
Errors in file /home/oracle/app/oracle/diag/asm/+asm/+ASM/trace/+ASM_ora_33024.trc:

more /home/oracle/app/oracle/diag/asm/+asm/+ASM/trace/+ASM_ora_33024.trc


*** ACTION NAME:() 2014-10-27 02:18:39.197

kfnmsg_wait: KSR failure reaping message (0xffffff83)

*** 2014-10-27 02:18:39.197
Communication failure sending message to UFGs (0xffffff83)
kfnmsg_wait: opcode=35 msg=0x6b3f0928
2014-10-27 02:18:39.197762 :kfnmsg_Dump(): kfnmsg_SendtoUFGs: (kfnmsg) op=35 (KFNMS_GROUP_RLS) (gnum) [1.0]
CommunicationFailure 0xffffff83 after 300s

The server was hosting 2 standby db instances and 1 ASM instance. We decided to bring down the other standby db instance, then the ASM instance, and finally restart everything. The shutdown immediate for the 2nd standby instance hung for a long time so we aborted it. Even after ASM was brought down, we could still see the PIDs below:

$ ps -ef | grep oracle
oracle 12032 16050 0 02:44 pts/0 00:00:00 grep arc
oracle 23135 1 0 Oct18 ? 00:02:15 ora_arc5_test
oracle 25390 1 0 Oct18 ? 00:01:19 ora_arc1_test2
oracle 25392 1 0 Oct18 ? 00:00:50 ora_arc2_test2

oracle 45660 1 1 Oct23 ? 01:09:22 oracleprod (LOCAL=NO)

Even after all the instances on the server were down, we could still see a few archiver PIDs and a PID with LOCAL=NO. kill -9 also didn’t kill that PID.

I started the ASM instance, and then when I started the standby instance TEST, ASM crashed:

NOTE: check client alert log.
NOTE: Trace records dumped in trace file /home/oracle/app/oracle/diag/asm/+asm/+ASM/trace/+ASM_ora_57380.trc
Errors in file /home/oracle/app/oracle/diag/asm/+asm/+ASM/trace/+ASM_ora_57380.trc (incident=280159):
ORA-00600: internal error code, arguments: [kfupsRelease01], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /home/oracle/app/oracle/diag/asm/+asm/+ASM/incident/incdir_280159/+ASM_ora_57380_i280159.trc

At this point we escalated to the client for a server reboot. After the server reboot was done, all the instances came up properly. I started the MRP and the standby came in sync.
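
For reference, restarting managed recovery with real-time apply after the reboot would be along these lines (a sketch of the usual command, not necessarily the exact one run here):

SQL> -- start real-time apply and return control to the session
SQL> alter database recover managed standby database using current logfile disconnect from session;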

You can also refer to the below Metalink documents:

Database startup Reports ORA-00600:[kfnsInstanceReg00] (Doc ID 1552422.1)
Bug 6934636 – Hang possible for foreground processes in ASM instance (Doc ID 6934636.8)
ORA-600 [kfnsInstanceReg00] and long waits on “ASM FILE METADATA OPERATION” (Doc ID 1480320.1)


Filed under: 11gR1, Oracle Tagged: ASMB is ALIVE, kfnsInstanceReg00, kfupsRelease01, ORA_NPI_ERROR, UFGs

Upgrading to 11.2.0.4 – Dictionary View Performing Poor


Just a quick blog post on things you might see after upgrading to 11.2.0.4. We recently upgraded a database from 11.2.0.3 to 11.2.0.4 and queries on some data dictionary views ran too slowly.

1. Performance of queries on dba_free_space degraded
2. Performance of queries involving dba_segments is slow

DEV01> select ceil(sum(b.bytes)/1024/1024) b from sys.dba_free_space b;

Elapsed: 01:31:45.78

Searching MOS pointed to these Doc Ids :-

Insert Statement Based Upon a Query Against DBA_SEGMENTS is Slow After Applying 11.2.0.4 Patchset (Doc ID 1915426.1)

Query Against DBA_FREE_SPACE is Slow After Applying 11.2.0.4 (Doc ID 1904677.1)
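
The post doesn't record which of the documented fixes was applied; a hedged sketch of the remediation those notes typically point to (purging the recyclebin and refreshing fixed object / dictionary statistics) would be:

SQL> -- dropped objects sitting in the recyclebin are a known contributor to slow DBA_FREE_SPACE queries
SQL> purge dba_recyclebin;
SQL> -- refresh fixed object (X$) and dictionary statistics
SQL> exec dbms_stats.gather_fixed_objects_stats
SQL> exec dbms_stats.gather_dictionary_stats

Whichever fix was applied, the same query then came back in under two seconds: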

DEV01> select ceil(sum(b.bytes)/1024/1024) b from sys.dba_free_space b;

Elapsed: 00:00:01.38

Filed under: 11gR2, Oracle, Upgrade Tagged: 11.2.0.4, dba_free_space, dba_segments, dictionary views, upgrade

ORA-00600: [ktbdchk1: bad dscn]


Last week I performed a switchover of a database on version 11.2.0.3. The switchover was performed using the dgmgrl “switchover to standby” command. After some time we started receiving “ORA-00600: [ktbdchk1: bad dscn]” on the primary database.

Tue Dec 16 10:33:26 2014
Errors in file /ora_software/diag/rdbms/db02_dbv/dbv/trace/db02_ora_16271.trc  (incident=434103):
ORA-00600: internal error code, arguments: [ktbdchk1: bad dscn], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /ora_software/diag/rdbms/db02_dbv/dbv/incident/incdir_434103/db02_ora_16271_i434103.trc

The trace file showed

*** ACTION NAME:() 2014-12-16 10:33:26.857

Dump continued from file: /ora_software/diag/rdbms/db02_dbv/dbv/trace/db02_ora_16271.trc
ORA-00600: internal error code, arguments: [ktbdchk1: bad dscn], [], [], [], [], [], [], [], [], [], [], []

========= Dump for incident 434103 (ORA 600 [ktbdchk1: bad dscn]) ========
----- Beginning of Customized Incident Dump(s) -----
[ktbdchk] -- ktbgcl4 -- bad dscn
dependent scn: 0x0008.f197f24e recent scn: 0x0008.c7313a4c current scn: 0x0008.c7313a4c
----- End of Customized Incident Dump(s) -----
*** 2014-12-16 10:33:26.961
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- Current SQL Statement for this session (sql_id=dmrpvzvbbsuy5) -----
INSERT INTO MSG_OPEN( SITE_ID, CLIENT_ID, CAMP_ID, MESSAGE_ID, ID, USER_ID, ADDRESS, DATE_OPENED) VALUES( :B7 , :B6 , :B5 , :B4 , LOG_SEQ.NEXTVAL, :B3 , :B2 , :B1 )
----- PL/SQL Stack -----
----- PL/SQL Call Stack -----
  object      line  object
  handle    number  name
0x32276ee18       611  package body TP.EM_PKG
0x31cbac388         3  anonymous block
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
skdstdst()+36        call     kgdsdst()            000000000 ? 000000000 ?
                                                   7FFFFDA13028 ? 000000001 ?
                                                   000000001 ? 000000002 ?
ksedst1()+98         call     skdstdst()           000000000 ? 000000000 ?
                                                   7FFFFDA13028 ? 000000001 ?
                                                   000000000 ? 000000002 ?
ksedst()+34          call     ksedst1()            000000000 ? 000000001 ?
                                                   7FFFFDA13028 ? 000000001 ?
                                                   000000000 ? 000000002 ?
dbkedDefDump()+2741  call     ksedst()             000000000 ? 000000001 ?
                                                   7FFFFDA13028 ? 000000001 ?
                                                   000000000 ? 000000002 ?
........................

Searching for the issue on Metalink pointed to the below Document:-

ALERT Description and fix for Bug 8895202: ORA-1555 / ORA-600 [ktbdchk1: bad dscn] ORA-600 [2663] in Physical Standby after switchover (Doc ID 1608167.1)

As per metalink

In a Data Guard environment with Physical Standby (including Active Data Guard), invalid SCNs can be introduced in index blocks after a switchover.

Symptoms: ORA-1555 / ORA-600 [ktbdchk1: bad dscn] / ktbGetDependentScn / dependent SCN violations, as the block ITL has a higher COMMIT SCN than the block SCN. DBVERIFY reports the following error when the fix of Bug 7517208 is present (reference Note 7517208.8 for interim patches):

itl[] has higher commit scn(aaa.bbb) than block scn (xx.yy)
Page failed with check code 6056

There is NO DATA CORRUPTION in the block.

To Resolve

The fix of Bug 8895202 is the workaround.

Although the fix of Bug 8895202 is included in patchset 11.2.0.2 and later, the fix needs to be enabled by setting parameter _ktb_debug_flags = 8.

SQL> alter system set "_ktb_debug_flags"=8 scope=both sid='*';

System altered.

SQL> exit

If you are using an Oracle version lower than 11.2.0.2, then rebuilding the affected index is the option, as we did for one of the clients on 11.1.0.7.
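
A sketch of such a rebuild, assuming the affected index has already been identified from the ORA-600 trace (the owner and index name below are placeholders):

SQL> -- rebuild the index flagged in the ktbdchk1 trace; ONLINE keeps it usable during the rebuild
SQL> -- TP.MSG_OPEN_IDX is a placeholder, not the real object name
SQL> alter index TP.MSG_OPEN_IDX rebuild online;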

One thing to note is –

In rare cases blocks healed by this fix may cause queries to fail with an ORA-600 [ktbgcl1_KTUCLOMINSCN_1] as described in Note 13513004.8 / Bug 13513004.

For more detail, see Metalink Doc 1608167.1.


Filed under: 11gR1, 11gR2, ORa-600, Oracle Tagged: 11.2.0.3, ORA - 600, [ktbdchk1: bad dscn]

Primary on FileSystem and Standby on ASM


For one of the clients, the standby server went down. We had another standby server which had been kept down for more than a month. The decision was taken to start that server and apply an incremental SCN-based backup to the standby database.

The standby was on ASM and the primary on a filesystem. The incremental backup was started from the SCN reported by the below query:

select min(fhscn) from x$kcvfh;
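
The backup command run on the primary isn't shown in the post; a minimal sketch, with the SCN value and output location as placeholders, would be:

RMAN> # back up all changes since the SCN reported on the standby (SCN and path are placeholders)
RMAN> BACKUP INCREMENTAL FROM SCN 1234567890 DATABASE
      FORMAT '/backup/incr_standby_%U' TAG 'INCR_STANDBY';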

Once the backup completed, it was transferred to the standby, the standby was mounted (using the old controlfile), the backup pieces were cataloged and recovery was performed using ‘recover database noredo’.
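
On the standby side those steps boil down to something like the following (the backup location matches where the pieces were later cataloged, but treat it as illustrative):

RMAN> # make the transferred incremental backup pieces known to the standby controlfile
RMAN> CATALOG START WITH '+BACKUP/ABCD_ORACLE3/backupset/restore/';
RMAN> # apply the incremental backup; NOREDO because no archived logs are applied here
RMAN> RECOVER DATABASE NOREDO;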

The recovery was in progress when it was handed over to me. After the recovery completed, I restored the latest controlfile from the primary and mounted the standby. At this point the controlfile carried the filesystem paths as they were on the primary side. The next step was to register everything we had on the standby side:

[oracle@oracle3:~ (db)]$ rman target /

Recovery Manager: Release 11.2.0.3.0 - Production on Thu Jan 8 04:57:01 2015

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

connected to target database: ABCD (DBID=1463580380, not open)

RMAN> catalog  start with '+DATA/adbc_oracle3/DATAFILE/';

searching for all files that match the pattern +DATA/adbc_oracle3/DATAFILE/

List of Files Unknown to the Database
=====================================
File Name: +data/adbc_oracle3/DATAFILE/TEST.256.844395985
File Name: +data/adbc_oracle3/DATAFILE/TEST.257.844397067
.................
.................
File Name: +data/adbc_oracle3/DATAFILE/TEST.416.865953683

Do you really want to catalog the above files (enter YES or NO)? YES
cataloging files...
cataloging done

List of Cataloged Files
=======================
File Name: +data/adbc_oracle3/DATAFILE/TEST.256.844395985
File Name: +data/adbc_oracle3/DATAFILE/TEST.257.844397067
................
...............
File Name: +data/adbc_oracle3/DATAFILE/TEST.416.865953683

RMAN> report schema;

RMAN-06139: WARNING: control file is not current for REPORT SCHEMA
Report of database schema for database with db_unique_name ABCD_ORACLE3

List of Permanent Datafiles
===========================
File Size(MB) Tablespace           RB segs Datafile Name
---- -------- -------------------- ------- ------------------------
1    0        SYSTEM               ***     /oradata/abcd_oracle2/datafile/o1_mf_system_825xkscr_.dbf
2    0        SYSAUX               ***     /oradata/abcd_oracle2/datafile/o1_mf_sysaux_825y451r_.dbf
3    0        TEST                 ***     /oradata/abcd_oracle2/datafile/o1_mf_test_825s84mw_.dbf
4    0        TEST                 ***     /oradata2/abcd_oracle2/datafile/o1_mf_test_8dr1v332_.dbf
................
................
................
147  0        TEST                   ***     /oradata4/abcd_oracle2/datafile/o1_mf_test_b8k8hcrh_.dbf
148  0        TEST                   ***     /oradata4/abcd_oracle2/datafile/o1_mf_test_b8k8hdhf_.dbf
149  0        TEST                   ***     /oradata4/abcd_oracle2/datafile/o1_mf_test_b8k8hf6o_.dbf
150  0        TEST                   ***     /oradata4/abcd_oracle2/datafile/o1_mf_test_b8k8hg1j_.dbf
151  0        TEST                   ***     /oradata4/abcd_oracle2/datafile/o1_mf_test_bb318bhs_.dbf
152  0        TEST                   ***     /oradata4/abcd_oracle2/datafile/o1_mf_test_bb318cff_.dbf
153  0        TEST_INDEX             ***     /oradata4/abcd_oracle2/datafile/o1_mf_test_index_bb318pmy_.dbf
154  0        TEST_NOLOGGING         ***     /oradata3/abcd_oracle2/datafile/o1_mf_test_nolog_bbm2s7vk_.dbf
155  0        TESTINDEX             ***     /oradata3/abcd_oracle2/datafile/o1_mf_test_index_bbm2z7nv_.dbf
156  0        PERFSTAT             ***     /oradata3/abcd_oracle2/datafile/o1_mf_perfstat_bbm312pf_.dbf

List of Temporary Files
=======================
File Size(MB) Tablespace           Maxsize(MB) Tempfile Name
---- -------- -------------------- ----------- --------------------
3    15104    TEMP                 32767       /oradata4/abcd_oracle2/datafile/o1_mf_temp_b633ppbr_.tmp
4    25600    TEMP                 32767       /oradata4/abcd_oracle2/datafile/o1_mf_temp_b633ppcf_.tmp

After the catalog completed, it was time to switch the database to copy, and it failed with the below error:

RMAN> switch database to copy;

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of switch to copy command at 01/08/2015 05:00:25
RMAN-06571: datafile 151 does not have recoverable copy

After some analysis I found it was due to missing datafiles on the standby. The datafiles had been created on the primary after the standby went down, and the recovery using the incremental backup was done with the older controlfile, which had no information about the new datafiles.

Datafiles 151 – 156 were missing, so I cataloged the backup pieces again (as the controlfile had been restored) and started restoring the datafiles:

RMAN> restore datafile 151;

Starting restore at 08-JAN-15
using channel ORA_DISK_1

channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00151 to /oradata4/ABCD_ORACLE2/datafile/o1_mf_ct_bb318bhs_.dbf
channel ORA_DISK_1: reading from backup piece +BACKUP/ABCD_ORACLE3/backupset/restore/incr_standby_5sps58rd_1_1

channel ORA_DISK_1: piece handle=+BACKUP/ABCD_ORACLE3/backupset/restore/incr_standby_5sps58rd_1_1 tag=TAG20150107T192825
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:19:05
Finished restore at 08-JAN-15

RMAN>

After the restoration completed, report schema showed

RMAN> report schema;

RMAN-06139: WARNING: control file is not current for REPORT SCHEMA
Report of database schema for database with db_unique_name adbc_oracle3

List of Permanent Datafiles
===========================
File Size(MB) Tablespace           RB segs Datafile Name
---- -------- -------------------- ------- ------------------------
1    0        SYSTEM               ***     /oradata/ABCD_ORACLE2/datafile/o1_mf_system_825xkscr_.dbf
2    0        SYSAUX               ***     /oradata/ABCD_ORACLE2/datafile/o1_mf_sysaux_825y451r_.dbf
.................
151  30720    TEST                  ***     +DATA/adbc_oracle3/datafile/test.417.868424659
152  30720    TEST                  ***     +DATA/adbc_oracle3/datafile/test.418.868424659
................

I tried to perform “switch database to copy”, which again failed with the same error “RMAN-06571: datafile 151 does not have recoverable copy”. At this point I thought to use “switch datafile to copy”, for which I generated dynamic SQL from the primary and ran it on the standby.

Switch command SQL generated from the primary :-

SQL> select 'switch datafile '||file#||' to copy;' from v$datafile;

[oracle@oracle3:~ (db)]$ vi swtch_copy.rman
[oracle@oracle3:~ (db)]$ rman target /

Recovery Manager: Release 11.2.0.3.0 - Production on Thu Jan 8 06:09:46 2015

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

connected to target database: ABCD (DBID=1463580380, not open)

RMAN> @swtch_copy.rman

RMAN> switch datafile 2 to copy;
using target database control file instead of recovery catalog
datafile 2 switched to datafile copy "+DATA/adbc_oracle3/datafile/sysaux.277.844418721"

............................
...........................
...........................
RMAN> switch datafile 149 to copy;
datafile 149 switched to datafile copy "+DATA/adbc_oracle3/datafile/ct.415.865953681"

RMAN> switch datafile 156 to copy;
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of switch to copy command at 01/08/2015 06:28:54
RMAN-06571: datafile 156 does not have recoverable copy

RMAN>
RMAN> **end-of-file**

I performed ‘recover database noredo’ again and this time it was pretty quick, and then tried ‘recover standby database’:

[oracle@oracle3:~/working/anand (abcd)]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.3.0 Production on Thu Jan 8 06:31:38 2015

Copyright (c) 1982, 2011, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Automatic Storage Management, OLAP, Data Mining
and Real Application Testing options

SQL> recover standby database;
ORA-00279: change 38328244436 generated at 01/07/2015 19:28:32 needed for thread 1
ORA-00289: suggestion : +FRA
ORA-00280: change 38328244436 for thread 1 is in sequence #98501


Specify log: {=suggested | filename | AUTO | CANCEL}
^C
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01152: file 1 was not restored from a sufficiently old backup
ORA-01110: data file 1: '+DATA/adbc_oracle3/datafile/system.357.844454911'


I made a few changes to the DGMGRL parameters and enabled the standby configuration. After a few hours, the standby was in sync with the primary. I then stopped the MRP, dropped the standby redo logfiles as they still pointed to the filesystem, and recreated them on ASM. Finally I opened the standby in read-only mode and started the MRP.
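
Dropping and recreating the standby redo logs on ASM was done roughly as below (group number and size are placeholders and should match the online redo log size; the MRP must be stopped first):

SQL> -- repeat for each standby redo log group that still points to the filesystem
SQL> alter database drop standby logfile group 5;
SQL> alter database add standby logfile group 5 ('+DATA') size 500m;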


Filed under: 10gR2, 11gR1, 11gR2, ASM, Data Guard, General, Oracle, standby Tagged: datafile does not have recoverable copy, Primary on Filesystem, recover database noredo, RMAN -06571, standby on ASM, switch database to copy