Thursday, December 27, 2012

How to recreate the AWR Repository

last week we were upgrading our test databases from 11.2.0.2 to 11.2.0.3
All most all of them went fine but one wouldn't complete the upgrade. It didn't show
an error it just hung and that was it.
We stop the upgrade and downgrade the database back to 11.2.0.2

The name of database got me thinking as this was a database where awr reports
took forever or couldn't even be started. Also purging the awr repository cause the
database to hang.
Oracle suggested to upgrade to 11.2.0.3 as this would fix te problem but we got the problem that
the database hung during upgrade.

The real solution was to recreate the awr reporsitory.
and this is how it goes in an Rac cluster.

First diabled AWR statistics gathering parameter
SQL> alter system set statistics_level=BASIC scope=both sid='*';
ERROR at line 1:
ORA-02097: parameter cannot be modified because specified value is invalid
ORA-00830: cannot set statistics_level to BASIC with auto-tune SGA enabled

Yep you have to set sga_target or memory_target to 0.
so create a pfile from spfile and adjust the following parameters
SQL>create pfile from spfile;
this create as pfile in $ORACLE_HOME/dbs

set the following parameters to a appropriate value
shared_pool_size =
db_cache_size =
java_pool_size =
large_pool_size =
sga_target=0
memory_target=0
statistics_level=basic
cluster_database=false

shutdown database and startup in resticted mode and with the pfile in $ORACLE_HOME/dbs
SQL>shutdown immediate;
SQL>startup restrict;

Now drop an recreate the awr objects.

SQl>@/rdbms/admin/catnoawr.sql
SQL>alter system flush shared_pool;
SQl>@/rdbms/admin/catawrtb.sql
SQL>@/rdbms/admin/execsvrm.sql  # Only if database is 11g

recompile the invalid objects
SQL>@/rdbms/admin/utlrp.sql 

SQL>shutdown immediate;

Now start the database with the spfile with the old values;

After that we started the upgrade to 11.2.0.3 and that went fine without any problem.

That it.

CRS-5005: IP Address: is already in use in the network

After a change in ASM process parameter i restared the crs stack.
Everything went fine... almost the following resource are offline

ora.srvtst01.vip
ora.LISTENER.lsnr
ora.srvtst01.LISTENER_SRVTST01.LSNR

when i tried start the ora.srvtst01.vip i got the following error
$ crsctl start res ora.srvtst01.vip
CRS-2672: Attempting to start 'ora.srvtst01.vip' on 'srvtst01'
CRS-5017: The resource action "ora.srvtst01.vip start" encountered the following error:
CRS-5005: IP Address: 10.110.140.173 is already in use in the network
 CRS-2674: Start of 'ora.srvtst01.vip' on 'srvtst01' failed
CRS-2679: Attempting to clean 'ora.srvtst01.vip' on 'srvtst01'
CRS-2681: Clean of 'ora.srvtst01.vip' on 'srvtst01' succeeded
CRS-2632: There are no more servers to try to place resource 'ora.srvtst01.vip' on that would satisfy its placement policy
CRS-4000: Command Start failed, or completed with errors.
That is strange the vip ip address should not be in use off course
$ nslookup srvtst01
Server:         145.72.247.10
Address:        145.72.247.10#53

Name:   srvtst01.italie.nl
Address: 10.110.140.73

root@oesv7040:CRS:/root
$ nslookup srvtst01-vip.italie.nl.
Server:         145.72.247.10
Address:        145.72.247.10#53

Name:   srvtst01-vip.italie.nl
Address: 10.110.140.173

As you see the ip address for the host and vip are different. When a do a SSH whith the vip ip 10.110.140.173  i  connect to the host srvtst01 what is good.When i look with ifconfig command i don't see the  IP address of the vip.
$ ifconfig
eth0      Link encap:Ethernet  HWaddr 00:16:3E:70:40:01
          inet addr:10.110.140.73  Bcast:10.233.240.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:43716169 errors:0 dropped:0 overruns:0 frame:0
          TX packets:10090483 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:8036689632 (7.4 GiB)  TX bytes:98860268867 (92.0 GiB)

eth1      Link encap:Ethernet  HWaddr 00:16:3E:70:40:02
          inet addr:172.29.96.64  Bcast:172.29.96.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:19477710 errors:0 dropped:0 overruns:0 frame:0
          TX packets:9399 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2589277692 (2.4 GiB)  TX bytes:9266153 (8.8 MiB)

eth1:1    Link encap:Ethernet  HWaddr 00:16:3E:70:40:02
          inet addr:169.254.168.11  Bcast:169.254.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:25295206 errors:0 dropped:0 overruns:0 frame:0
          TX packets:25295206 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:37308951844 (34.7 GiB)  TX bytes:37308951844 (34.7 GiB)
  As you see i'm missing the vip ip address. I only see the host ip address.
To correct this i've to make a network alias and adjust it to ETH0 as it shoud be.

$ ifconfig eth0:1 10.110.140.173 up
$ ifconfig
eth0      Link encap:Ethernet  HWaddr 00:16:3E:70:40:01
          inet addr:10.233.240.73  Bcast:10.233.240.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:43718610 errors:0 dropped:0 overruns:0 frame:0
          TX packets:10090883 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:8036957710 (7.4 GiB)  TX bytes:98860318155 (92.0 GiB)

eth0:1    Link encap:Ethernet  HWaddr 00:16:3E:70:40:01
          inet addr:10.110.140.173  Bcast:10.255.255.255  Mask:255.0.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

As you see there is now a eth0:1 with the ip address of the vip.
Now i start the vip
$ crsctl start res ora.srvtst01.vip
CRS-2672: Attempting to start 'ora.srvtst.vip' on 'srvtst01'
CRS-2676: Start of 'ora.srvtst01.vip' on 'srvtst7040' succeeded
Why O why the eth0:1 was missing is still a ridle.

Thats it

RMAN-20052: invalid datafile create SCN

I ran last week into a bug in rman. Some backup's didn't run anymore.
The error was

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03015: error occurred in stored script arch_backup
RMAN-03002: failure of sql command at 12/24/2012 11:10:08
RMAN-03014: implicit resync of recovery catalog failed
RMAN-03009: failure of full resync command on default channel at 12/24/2012 11:10:08
RMAN-20052: invalid datafile create SCN

There is a bug written for this.
Resync Catalog returns RMAN-20052: invalid datafile create SCN error [ID 343213.1]

The solution that is given missed 1 step.
They suggest you should unregister the database and then register the database.
When more backups are in the controlfile than in the catalog you will wipe information with this action.

How to check this

1. Look what the oldest backup entry is in the controlfile.
   start rman
   rman> connect target;
   rman> list backup of summary;

   look for the oldest backup entry.

2.check what the oldest backup entry is in the catalog
            sqlplus owner_rmn@rman_catalog
select incr_level, min(to_char(completion_time,'DD-MM-YYYY/HH24:mi:ss')) as end_time_backup
from owner_rmn.bs where status='A'
and db_key in (select db_key from owner_rmn.dbinc where db_name = UPPER('&dbnaam') )
group by incr_level;
è Look what the oldest backup entry is.
 
3. Compare these entries,
    if the catalog contains older backup entries than the control file
    then action 3a
    else action 3b.

Action 3a:
Buy time by making archive backup and full backup connected to the rman catalog only.
The solution is to make insert commando’s before the unregister, based on the current catalog info to re-insert the backupsets you will lose with the unregister and register action.

Action 3b:
Else: do an unregister and a register action:
SQL> select 'EXECUTE dbms_rcvcat.unregisterdatabase(' || db_key||','||dbid||');' from rc_database where  name = '&dbname';
è You will be asked for the database name and you will execute the outcome of the query.
exit

<SID of the database, thus on the host were the database resides>
rman target / catalog owner_rmn@rman_catalog
register database;
show all;
è Check if everything is ok

That's it.