Monday, October 8, 2007

Oracle Enterprise Manager 10g Diagnostics : EMDiag

Recently I had an issue where a lot of targets were reporting "Status Pending" after the blackout for a scheduled maintenance had ended. This can happen for various reasons. If it's only 1 or 2 targets, you can use the method I described in an earlier post.

But if many targets are affected, it is recommended to use the EMDiag kit provided by Oracle Support to diagnose and fix the issues. It is also good to know this utility if you manage a large EM 10g environment, and it comes in handy when you log a ticket with Oracle Support about an issue with the EM repository.

What is EMDiag
The EMDiag kit is a troubleshooting tool that will enable you to extract necessary troubleshooting data from the EM Repository Schema.

If you are using this script for the first time, go through Note:421638.1:EMDiagkit - Overview on Metalink.

In my case more than 100 targets were reporting status issues. To fix this I applied the following action plan:

EMDiag Installation
===============
1. Set ORACLE_HOME to the Oracle home of the repository database.
2. Set ORACLE_SID=sid_repository
3. Unzip emdiag.zip (the file provided by Oracle Support) into the directory /emdiag
4. cd to /emdiag/cfg and create the file repvfy.cfg by copying the template
cd /emdiag/cfg
cp repvfy.cfg.template repvfy.cfg
5. Edit repvfy.cfg and change the following lines:

#ora_tns=my_tns_alias
level=2
to
ora_tns=
level=9

If you don't have an alias, you can enter the value of the following property from the
OMS_HOME/sysman/config/emoms.properties file: oracle.sysman.eml.mntr.emdRepConnectDescriptor

Make sure you remove all the escape characters: \

6. Make the files in /emdiag/bin executable

chmod +x /emdiag/bin/*

7. Run the install command:

cd /emdiag/bin
./repvfy install
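
Steps 4 and 5 can also be scripted. Here is a minimal sketch, assuming the template contains the two lines exactly as shown above (#ora_tns=... and level=...). The function names rep_connect_descriptor and make_repvfy_cfg are my own and are not part of the EMDiag kit.

```shell
# Hypothetical helpers; not part of the EMDiag kit.

# Print the repository connect descriptor from an emoms.properties file ($1)
# with the backslash escape characters removed.
rep_connect_descriptor() {
    sed -n 's/^oracle\.sysman\.eml\.mntr\.emdRepConnectDescriptor=//p' "$1" | tr -d '\\'
}

# Create repvfy.cfg from repvfy.cfg.template in directory $1, setting
# ora_tns to $2 (a TNS alias or a connect descriptor) and level to 9.
# Assumes $2 contains no "|", "&" or "\", which are special to sed here.
make_repvfy_cfg() {
    sed -e "s|^#ora_tns=.*|ora_tns=$2|" \
        -e "s|^level=.*|level=9|" \
        "$1/repvfy.cfg.template" > "$1/repvfy.cfg"
}

# Example (paths as used in this post):
#   make_repvfy_cfg /emdiag/cfg my_tns_alias
#   make_repvfy_cfg /emdiag/cfg \
#       "$(rep_connect_descriptor "$OMS_HOME/sysman/config/emoms.properties")"
```

Writing the edited template straight to repvfy.cfg avoids in-place editing, so the template itself stays untouched and the step can be repeated safely.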

Verify the Installation
==================
SQL> set serveroutput on
SQL> exec mgmt_diag.validate;
Repository version : 10.2.0.2.0
Repository type : CENTRAL
MGMT_DIAG version : 2007.0331
Number of repository tests : 285
Total enabled repository tests: 265
Number of object tests : 135
Total enabled object tests : 124

PL/SQL procedure successfully completed.

SQL> SELECT mgmt_diag.version FROM dual;

VERSION
----------
2007.0331
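
If you want to repeat this check later without retyping the statements, they can be kept in a small script file. This is only a sketch: the function name and the file name emdiag_check.sql are my own, and my_tns_alias is the same placeholder as in repvfy.cfg.

```shell
# Hypothetical helper: print the verification statements used above
# as a SQL*Plus script, ending with "exit" so the session terminates.
emdiag_check_sql() {
    cat <<'SQL'
set serveroutput on
exec mgmt_diag.validate;
SELECT mgmt_diag.version FROM dual;
exit
SQL
}

# Example:
#   emdiag_check_sql > emdiag_check.sql
#   sqlplus sysman@my_tns_alias @emdiag_check.sql
```

The script is written to a file rather than piped into sqlplus, so that the password prompt still reads from the terminal.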

Run the Repository verification script to diagnose the issues
===========================================
[oracle@myserver bin]$ ./repvfy verify -level 9

Please enter the SYSMAN password:

Following is the output of the above command
==============================
verifyAGENTS
101. Active Agents with clock-skew problems: 1
105. Agents not uploading any data: 3
600. Agents running in the future: 1
verifyASLM
100. Beacons with tests running behind schedule: 17
verifyBLACKOUTS
101. Active blackouts with no more targets in blackout: 1
verifyCREDENTIALS
100. Credential sets not pointing to the latest host metadata version: 2
verifyECM
100. Missing ECM snapshot metadata: 4
702. Generic snapshot delete backlog: 3
verifyJOBS
100. Job backlog: 1
105. Job Executions with no valid steps: 6
111. Stale DiscardState jobs: 91
112. Active executions without active steps: 6
201. Duplicate DiscardState jobs: 13
202. System jobs running for more than 24hours: 24
701. Orphaned job output records: 67777
verifyMETRICS
002. Disabled repository collections: 1
004. Duplicate metric threshold definitions: 1
700. Outstanding cleared metric errors: 15
704. Metric errors for mismatched category properties: 4
800. Cleared cluster repository metric failures: 2
verifyPOLICIES
002. Mismatches between Violations and Availability: 2
201. Oscillating Policy/Metric violations: 1
verifyPROVISIONING
600. Unconfigured software library: 1
verifyREPOSITORY
002. Missing DBMS_JOBS: 3
100. Missing RAW partitions: 6
101. Invalid objects in repository schema: 1
601. Database Timezone mismatch: 1
700. Partitioned tables with too many partitions: 1
804. PL/SQL packages without a package body: 1
805. Unanalyzed repository tables: 4
verifySYSTEM
700. Duplicate OMS parameters: 1
verifyTARGETS
102. Targets with Response Metric Errors: 1
106. Targets in questionnable state: 117 - This is the problem
109. Target types without host credential sets defined: 4
111. Targets not uploading any data: 11
602. Groups without members: 4
701. Duplicate targets from decomissioned Agents: 1
703. Unresolved deleted targets left-overs: 169
801. Unconfigured targets: 3
verifyUSERS
200. EM Accounts not granted MGMT_USER: 1
700. Non existing DB users for GC administrators: 1
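
With this many lines it helps to pick out the tests with the largest counts first. Here is a small sketch, assuming the output above was saved to a file in exactly the format shown (a verifyXXX line followed by "NNN. description: count" lines). The function name summarize_repvfy and the file name repvfy_verify.log are my own.

```shell
# Hypothetical helper: read repvfy verify output on stdin and print
# "module test-id count" for every test whose count is >= $1 (default 100).
summarize_repvfy() {
    awk -v min="${1:-100}" '
        # A line such as "verifyTARGETS" starts a new module.
        /^verify[A-Z]+$/ { module = $1; next }
        # A line such as "106. Targets in ...: 117" is one test result.
        /^[0-9]+\. / {
            id = $1
            sub(/\.$/, "", id)        # "106." -> "106"
            n = $0
            sub(/^.*: /, "", n)       # keep what follows the last ": "
            count = n + 0             # leading digits become the number
            if (count >= min) print module, id, count
        }'
}

# Example:
#   summarize_repvfy 100 < repvfy_verify.log
```

Run against the listing above with a threshold of 100, this leaves only three lines: the orphaned job output records (701), the targets in questionable state (106) and the deleted target left-overs (703).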

Now we want to fix the issue reported by test 106:
========================================
[oracle@myserver bin]$ ./repvfy verify targets -test 106 -fix

Please enter the SYSMAN password

Following is the output of the above command
===================================
verifyTARGETS
106. Targets in questionnable state: 117
Fix: 108 (Difference=9)

Wait a few minutes and check the Enterprise Manager 10g console; you should see that the "Status" has been fixed for the 108 targets the script reports as fixed. Before you run the fix, however, make sure that you review the detailed diagnostics output by running the following command:

./repvfy verify -level 9 -detail

This will give you a list of all targets under each category for you to review before you decide which fix to apply.

It is a good idea to get familiar with the EMDiag kit in a TEST environment before using it on a production server.