Showing posts with label Oracle Enterprise Manager 10g. Show all posts

Monday, October 8, 2007

Oracle Enterprise Manager 10g Diagnostics : EMDiag

Recently I had an issue where a lot of targets were reporting "Status Pending" after a blackout ended during scheduled maintenance. This can happen for various reasons. If it's only 1 or 2 targets, you can use the method I gave in my earlier post.

But if you have many targets with issues, it is recommended to use the EMDiag kit provided by Oracle Support to diagnose and fix them. It is also good to know this utility if you manage a large EM 10g environment, and it comes in handy when you log a ticket with Oracle Support about any issue with the EM repository.

What is EMDiag
The EMDiag kit is a troubleshooting tool that will enable you to extract necessary troubleshooting data from the EM Repository Schema.

If you are using this script for the first time, go through Note:421638.1: EMDiagkit - Overview on Metalink.

I had more than 100 targets reporting status issues. To fix this, I applied the following action plan:

EMDiag Installation
===============
1. Set ORACLE_HOME= the path to repository database.
2. Set ORACLE_SID=sid_repository
3. Unzip the file I sent you, emdiag.zip, into the directory /emdiag
4. cd to /emdiag/cfg and create the file repvfy.cfg by copying the template:
cd /emdiag/cfg
cp repvfy.cfg.template repvfy.cfg
5. Edit repvfy.cfg and change the following lines:

#ora_tns=my_tns_alias
level=2
to
ora_tns=
level=9

If you don't have an alias, you can enter the value of the following property from the
OMS_HOME/sysman/config/emoms.properties file:
oracle.sysman.eml.mntr.emdRepConnectDescriptor

Make sure you remove all the escape characters: \

6. Make the files in /emdiag/bin executable

chmod +x /emdiag/bin/*

7. Run the install command:

cd /emdiag/bin
./repvfy install
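
Step 5 is easy to get wrong by hand, so here is a minimal sketch of pulling the connect descriptor out of emoms.properties with the backslash escapes already removed. The function name get_rep_connect_descriptor is my own (it is not part of EMDiag), and the sketch assumes the property sits on a single line in name=value form.

```shell
#!/bin/sh
# Hypothetical helper (not part of EMDiag): print the repository connect
# descriptor from an emoms.properties file with the "\" escapes removed.
# Assumes the property is stored on one line as name=value.
get_rep_connect_descriptor() {
    props_file="$1"
    # Print only the matching line with the property name stripped,
    # then delete every backslash from what is left.
    sed -n 's/^oracle\.sysman\.eml\.mntr\.emdRepConnectDescriptor=//p' "$props_file" |
        tr -d '\\'
}

# Example (OMS_HOME is a placeholder for your own OMS home):
# get_rep_connect_descriptor "$OMS_HOME/sysman/config/emoms.properties"
```

The printed value is what you would paste after ora_tns= in repvfy.cfg when you have no TNS alias.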

VERIFY the Installation
==================
SQL> set serveroutput on
SQL> exec mgmt_diag.validate;
Repository version : 10.2.0.2.0
Repository type : CENTRAL
MGMT_DIAG version : 2007.0331
Number of repository tests : 285
Total enabled repository tests: 265
Number of object tests : 135
Total enabled object tests : 124

PL/SQL procedure successfully completed.

SQL> SELECT mgmt_diag.version FROM dual;

VERSION
----------
2007.0331

Run the Repository verification script to diagnose the issues
===========================================
[oracle@myserver bin]$ ./repvfy verify -level 9

Please enter the SYSMAN password:

Following is the output of the above command
==============================
verifyAGENTS
101. Active Agents with clock-skew problems: 1
105. Agents not uploading any data: 3
600. Agents running in the future: 1
verifyASLM
100. Beacons with tests running behind schedule: 17
verifyBLACKOUTS
101. Active blackouts with no more targets in blackout: 1
verifyCREDENTIALS
100. Credential sets not pointing to the latest host metadata version: 2
verifyECM
100. Missing ECM snapshot metadata: 4
702. Generic snapshot delete backlog: 3
verifyJOBS
100. Job backlog: 1
105. Job Executions with no valid steps: 6
111. Stale DiscardState jobs: 91
112. Active executions without active steps: 6
201. Duplicate DiscardState jobs: 13
202. System jobs running for more than 24hours: 24
701. Orphaned job output records: 67777
verifyMETRICS
002. Disabled repository collections: 1
004. Duplicate metric threshold definitions: 1
700. Outstanding cleared metric errors: 15
704. Metric errors for mismatched category properties: 4
800. Cleared cluster repository metric failures: 2
verifyPOLICIES
002. Mismatches between Violations and Availability: 2
201. Oscillating Policy/Metric violations: 1
verifyPROVISIONING
600. Unconfigured software library: 1
verifyREPOSITORY
002. Missing DBMS_JOBS: 3
100. Missing RAW partitions: 6
101. Invalid objects in repository schema: 1
601. Database Timezone mismatch: 1
700. Partitioned tables with too many partitions: 1
804. PL/SQL packages without a package body: 1
805. Unanalyzed repository tables: 4
verifySYSTEM
700. Duplicate OMS parameters: 1
verifyTARGETS
102. Targets with Response Metric Errors: 1
106. Targets in questionnable state: 117 - This is the problem
109. Target types without host credential sets defined: 4
111. Targets not uploading any data: 11
602. Groups without members: 4
701. Duplicate targets from decomissioned Agents: 1
703. Unresolved deleted targets left-overs: 169
801. Unconfigured targets: 3
verifyUSERS
200. EM Accounts not granted MGMT_USER: 1
700. Non existing DB users for GC administrators: 1
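
With this many findings it helps to see the biggest counts first. The following is a rough sketch, assuming you have saved the summary above to a text file and that it keeps the layout shown here (a verifyXXX header line followed by "id. description: count" lines). The function name repvfy_top_findings and the default threshold of 100 are my own choices, not anything that ships with EMDiag.

```shell
#!/bin/sh
# Hypothetical helper: read a saved repvfy summary on stdin and print
# "count module test-id description" for every finding whose count is at or
# above a threshold, largest count first.
repvfy_top_findings() {
    min_count="${1:-100}"
    awk -v min="$min_count" '
        /^verify[A-Z]+/ { module = $1; next }   # module header, e.g. verifyJOBS
        /^[0-9]+\. .*: [0-9]+/ {
            id = $1
            sub(/\.$/, "", id)                  # strip the dot after the test id
            line = $0
            sub(/^[0-9]+\. /, "", line)         # drop the leading test id
            count = line
            sub(/^.*: /, "", count)             # keep the text after the last ": "
            count += 0                          # keep only the leading number
            desc = line
            sub(/: [0-9]+.*$/, "", desc)        # drop the count and any trailing note
            if (count >= min) print count, module, id, desc
        }
    ' | sort -rn
}

# Example (saved_output.txt is a placeholder for wherever you kept the summary):
# repvfy_top_findings 100 < saved_output.txt
```

Run against the summary in this post with a threshold of 100, it lists the orphaned job output records (67777), the unresolved deleted target left-overs (169) and the targets in questionable state (117).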

Now we want to fix the issue reported under test 106:
========================================
[oracle@myserver bin]$ ./repvfy verify targets -test 106 -fix

Please enter the SYSMAN password

Following is the output of the above command
===================================
verifyTARGETS
106. Targets in questionnable state: 117
Fix: 108 (Difference=9)

Wait a few minutes and check the Enterprise Manager 10g console; you should see that the "Status" has been fixed for the 108 targets reported by the script. Before you run the fix, though, make sure you review the detailed diagnostics output by running the following command:

./repvfy verify -level 9 -detail

This gives you a list of all targets under each category for you to review before you decide which fix to apply.

It is good to get familiar with the EMDiag kit in a TEST environment before using it on a production server.


Thursday, September 20, 2007

Resolving Issues from Oracle Enterprise Manager 10g Blackouts

There are times when you put all or some targets in blackout using EM Grid Control 10g. But sometimes, even after the blackout has ended, either manually or by expiring on its own, some or all targets show "Status Pending" in EM Grid Control. When you check the status of the Agent on those targets, it seems to be running fine.

Here is a workaround to fix this issue if only a few targets have the problem:

  • Log in to the Target whose "Status Pending" status you want to fix
  • Source the required environment variables
  • Run the command "emctl stop agent"
  • Go to $ORACLE_HOME/sysman/emd
  • mv or rm lastupld.xml and agntstmp.txt (moving them to a different file name is fine if you don't want to rm them)
  • Run "emctl clearstate agent"; it will show the following:
    • Oracle Enterprise Manager 10g Release 10.2.0.2.0.
      Copyright (c) 1996, 2006 Oracle Corporation. All rights reserved.
      EMD clearstate completed successfully
  • Start the agent using the command "emctl start agent"; it will show the following:
    • Oracle Enterprise Manager 10g Release 10.2.0.2.0.
      Copyright (c) 1996, 2006 Oracle Corporation. All rights reserved.
      Starting agent ............... started.
  • Go to EM Grid Control and check the Status of the Target for which you performed the above actions. The Target should show status "Up" with a green up arrow :)
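
If you repeat these steps on more than one host, they can be wrapped in a small function. This is only a sketch under a few assumptions: emctl is already on your PATH (the environment is sourced), the argument is the agent's ORACLE_HOME, and the function name reset_agent_state plus the .bak.<timestamp> suffix are my own inventions. It moves the two files aside rather than deleting them, and it stops at the first command that fails.

```shell
#!/bin/sh
# Sketch of the manual steps above as one function (hypothetical name).
# Usage: reset_agent_state <agent ORACLE_HOME>
reset_agent_state() {
    agent_home="$1"
    emd_dir="$agent_home/sysman/emd"
    stamp=$(date +%Y%m%d%H%M%S)

    # Refuse to continue if the emd directory is not where we expect it.
    [ -d "$emd_dir" ] || { echo "No such directory: $emd_dir" >&2; return 1; }

    emctl stop agent || return 1

    # Move the two state files aside instead of removing them.
    for f in lastupld.xml agntstmp.txt; do
        if [ -f "$emd_dir/$f" ]; then
            mv "$emd_dir/$f" "$emd_dir/$f.bak.$stamp" || return 1
        fi
    done

    emctl clearstate agent || return 1
    emctl start agent
}

# Example: reset_agent_state "$ORACLE_HOME"
```

After it finishes, check the Target in Grid Control exactly as in the last step of the list.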
The above procedure generally holds good for Agent status issues after blackouts. But what if you have 100+ targets with this issue? Then doing the above will take a lot of time and effort.

For that, there is a handy utility from Oracle called the EMDiag Kit.

EMDiag Kit :
Available to download from Metalink Note : 421053.1 - Downloading/installing/Using the EM Diagnostic Kit

I will post about my experience with the EMDiag Kit and how well it can help you resolve issues with EM Grid Control. Especially when you are logging a Service Request with Oracle Support, it will save a lot of time if you already have it installed!

As always: "Please use caution when applying any information published in this blog to your production or test databases/environments. Make sure you understand what actions you are performing. Always have backups before performing any actions."