Grid control therapy

I've had a lot of bad experiences installing and managing the OEM grid control.  Today has been the worst day ever and - as an alternative to self-mutilation - I decided that I would document my woes as they occur as a sort of therapy.


Here's the background to my latest woes:

On Monday I realized that the database that holds my test grid control repository was severely corrupt: a bunch of ORA-600s with internal codes that were associated with rollback segment corruption.  Since the DB had no "real" data and I was lacking a recent backup I decided to rebuild the (RAC cluster) database.

The rebuild was no drama, but since this DB had my grid control repository I decided I'd better re-install grid control as well.  That's when everything went south.

The installation would proceed normally without errors until it reported errors configuring the agent. 

Logs showed that the agent was installed apparently OK, but that the management server was nowhere to be found.  opmnctl status showed every other friggin process but not even an entry for the OMS.  The SYSMAN schema had not even been properly installed.  Somehow, installing the OMS for the second time was screwing things up.

[oracle@mel601416 ~]$ emctl status oms
Oracle Enterprise Manager 10g Release 10.2.0.1.0
Copyright (c) 1996, 2005 Oracle Corporation.  All rights reserved.
Oracle Management Server is not functioning because of the following reason:
Unexpected error occurred. Check error and log files.

There are no trace files indicating OMS has ever really started (no emoms.trc, for instance).

The only logs are emctl logs - created when I do a status request - and all they contain is "AgentStatus.pm: Unknown command encountered".

So next I tried tearing everything down and installing grid control with a new database, thinking that surely this would avoid the apparent "I don't need to install a DB" problem.  No luck.  No database created - no attempt to start OMS.

So I tried renaming the /etc/oraInst.loc file so my installer was unaware of all previous installs. Sigh.  Same result.   

I spent many hours scouring technet, metalink and googling.... eventually found http://forums.oracle.com/forums/thread.jspa?threadID=337544&start=15&tstart=0 in which a bunch of poor souls who have encountered this problem commiserate.  Suggestions from there include:

     
  • Remove any 'legacy' listeners (9i)
  • This mysterious sequence:

echo `find / -name libdb.so.2 -print` >> /etc/ld.so.conf
find . -name libdb.so.2 -print
.......
vi /etc/ld.so.conf
- add result from find
ldconfig -v  (run as root)

     
  • clean up /etc/hosts (remove ipv6 and check hostname is exactly right)
  • turn off SELinux
  • remove all symlinks in the installation directory path
     

The last one really rang a bell with me, since only a few weeks ago I had in fact symlinked the oracle home to a new filesystem.  So....
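A check like the following would have flagged the problem before I wasted a day on it. This is only a minimal sketch, not anything from Oracle or the forum thread: the function name `symlinked_components` is my own invention, the fallback path `/u01/app/oracle` is a hypothetical example, and it assumes a Linux-style filesystem. It walks the installation path one directory at a time and reports every component that is a symbolic link.

```python
import os


def symlinked_components(path):
    """Return each prefix of `path` that is itself a symbolic link.

    os.path.abspath() normalises the path but does NOT resolve links,
    so any symlink along the way is still visible to os.path.islink().
    """
    links = []
    current = os.sep
    for part in os.path.abspath(path).split(os.sep):
        if not part:
            continue  # skip the empty piece before the leading separator
        current = os.path.join(current, part)
        if os.path.islink(current):
            links.append(current)
    return links


if __name__ == "__main__":
    # ORACLE_HOME is the usual environment variable; the fallback
    # path is a hypothetical example, not a recommendation.
    home = os.environ.get("ORACLE_HOME", "/u01/app/oracle")
    found = symlinked_components(home)
    if found:
        for link in found:
            print("symlink in install path: %s -> %s" % (link, os.readlink(link)))
    else:
        print("no symlinks found in %s" % home)
```

Run it as the oracle user before starting the installer; if it lists anything, point the installer at the real directory instead.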

Hooray!  Avoiding the symlinks allowed the OMS configuration to proceed.  Still had many hours of trying to install:  first had to remove the SYSMAN and MGMT_VIEW schemas (see Metalink note 358627.1).  Then I had mysterious communication failures during the final phase of opmn setup.  Sigh.  Being as it was close to midnight I blew it all away in the hope that a straightforward install would now work and (at about 1am) .... Kapla!  (Klingon for Victory: see http://www.khemorex-klinzhai.de/e/Hol/).

All's well that ends well?

This isn't the first time I've struggled with a grid control install. My original attempt to install on a Windows system never succeeded, and while my previous Linux-based grid control worked fine most of the time, I still must have spent many hours fiddling with configuration files to make everything talk together properly.

I'm hardly an unbiased source - our Spotlight and Foglight products are sometimes seen as competitors, so you might want to take my opinions with a grain of salt.  However, I've become convinced that while Grid control and OEM undoubtedly reduce DBA overheads (though I think in some areas - particularly diagnostics and RAC - OEM is way short of acceptable), the overhead of managing and configuring OEM/Grid is just too high.  And the reason is that the OEM stack is just way too complex a solution for the task at hand.  If you do an opmnctl status you'll see a list of at least eight separate entities that interact to perform OMS services - and I bet that very few DBAs know what each of them is.   And that's before you add in the agents and internal web apps that perform the various OEM functions.

Compare the OEM/Grid implementation to MySQL's far simpler LAMP-based solution: the MySQL solution has far fewer moving parts than OEM, and yet it appears to provide virtually the same functionality (for the DBA at least).

I can't help thinking that in the future every Oracle DBA is going to need to be an expert in the OEM software stack - and consequently an expert in J2EE, Apache, OC4J as well as OEM specifically - to be able to manage enterprise deployments. That's good news for the Oracle DBA job market - as automation reduces the overhead of managing the database, the overhead of managing the automation itself keeps us all in work :-).