Northern California Oracle Users Group


Solve the Oracle Database murder mystery and win two free tickets to the first ever YesSQL Summit

You may remember this children’s song from kindergarten or you can listen to this YouTube video:

“Ten green bottles hanging on the wall
Ten green bottles hanging on the wall
And if one green bottle should accidentally fall
There’ll be nine green bottles hanging on the wall.”

In this Oracle Database murder mystery, there were no green bottles left hanging on the wall after the first bottle fell. Post your solution in the Comments below. The prize for the best solution that fits all the facts is two free tickets to the first ever YesSQL Summit at the Oracle conference center in Redwood City on January 26 and 27.

It was a beautiful spring day. Popcorn was popping on the apricot tree. The time was exactly 9:12:00 AM PST. A database user noticed that her favorite database was down and called her favorite DBA—let’s call him Jack—for help.

Jack jumped to it and restarted the database lickety-split. Then disaster struck! The nine other databases on that database server—a Linux box with NetApp storage—crashed like bowling pins!

An unseen hand restarted all the databases immediately but the damage was done. Jack was dragged to the DBA interrogation chamber—the DBA manager’s office—and made to sit on the hot seat. It was a sunny day and the sun was streaming in through the plate glass windows, which explains why the seat was so hot. Besides, the air-conditioning was not working that day.

“WHAT HAVE YOU DONE,” bellowed the furious DBA manager. “I was only trying to help,” said poor Jack.

“HELP!? DO YOU CALL THAT HELPING!?” bellowed the furious DBA manager. The database alert logs were examined. The first database log showed that someone had used the command “STARTUP FORCE” at precisely 9:12:00 AM PST.

“DID YOU DO THAT!? DID YOU DO THAT!?” bellowed the furious DBA manager. “Yes, I did that,” said poor Jack, “but I was only trying to help.” A single tear slowly streamed down Jack’s cheek.

“HELP!? DO YOU CALL THAT HELPING!?” bellowed the furious DBA manager, unmoved by Jack’s obvious distress. The remaining database alert logs were examined. Each of them showed that someone had used the command “SHUTDOWN IMMEDIATE” followed by “STARTUP” right after the first database was restarted by Jack. “DID YOU DO THAT!? DID YOU DO THAT!?” bellowed the furious DBA manager.

“I didn’t do any of that,” protested poor Jack.

Let us draw the curtain of charity over the rest of the scene. If you believe Jack’s protestations of innocence, figure out how and why nine databases were mysteriously stopped and restarted. Post your solution in the Comments below. The prize for the best solution that fits all the facts is two free tickets to the first ever YesSQL Summit at the Oracle conference center in Redwood City on January 26 and 27.

Happy sleuthing and best wishes for 2016.


34 Comments

  1. Narendra says:

    Could Jack have used a bespoke script, incorrectly, to restart the database? That script could have been created to restart all databases, which could explain why all databases were restarted.


  2. Narendra says:

    Another possibility (and I have not tested this) could be that all these databases were configured in Oracle Restart and registered as dependent resources on the one database that Jack restarted.


    • nocoug says:

      The remaining database alert logs were examined. Each of them showed that someone had used the command “SHUTDOWN IMMEDIATE” followed by “STARTUP” right after the first database was restarted by Jack.


  3. Jian says:

    “STARTUP FORCE” will trigger “SHUTDOWN IMMEDIATE” first and then “STARTUP”. That means Jack accidentally shut down all 10 databases and restarted only one.


  4. Is it possible that, when Jack issued “STARTUP FORCE”, OS-level or storage-level clustering recognized that as a failure, and the whole package (all the databases) was restarted to relocate it or fix the issue?


    • nocoug says:

      A database user noticed that her favorite database was down and called her favorite DBA—let’s call him Jack—for help. Jack jumped to it and restarted the database lickety-split.


      • Rajan Jagtap says:

        Jack did not follow protocol. When he found the database down, he should have informed the DBA manager first and then investigated what made the database suddenly crash, and whether it was a database issue or an OS issue. Instead, he issued the commands without investigating.


  5. Maheshwar says:

    One possibility is that the NetApp storage system LUNs were unavailable as disks on the Linux machine for some time and were brought back online after 9:12 AM.


  6. Maheshwar says:

    Please provide the alert logs of all the databases running on the Linux machine 🙂


    • nocoug says:

      The first database log showed that someone had used the command “STARTUP FORCE” at precisely 9:12:00 AM PST. The remaining database alert logs were examined. Each of them showed that someone had used the command “SHUTDOWN IMMEDIATE” followed by “STARTUP” right after the first database was restarted by Jack.


  7. Anil says:

    If the STARTUP FORCE command is given while the instances are up, it leads to an immediate shutdown of the database.


  8. Chaitanyag says:

    Which database version is that? Is it 12c? If it is 12c, then the CDB might have been closed instead of a PDB.


    • nocoug says:

      The prize for the best solution that fits all the facts is two free tickets to the first ever YesSQL Summit at the Oracle conference center in Redwood City on January 26 and 27.


  9. Elamaran says:

    Jack probably shut down the container DB…


  10. Elamaran says:

    He should have saved the state of the pluggable databases (SAVE STATE), so that when the CDB restarted, the PDBs would also open after the CDB came up. No manual intervention would be needed in that case.


  11. Rajan Jagtap says:

    Jack did not follow protocol. When he found the database down, he should have informed the DBA manager first and then investigated what made the database suddenly crash, and whether it was a database issue or an OS issue. Instead, he issued the commands without investigating.


  12. John says:

    The database that Jack restarted is the NetApp SMO repository. Once it restarted, the scheduled nightly cold backups for the other databases kicked in.


  13. DBA RJ says:

    In my opinion, all those databases share the same Grid Infrastructure.

    Each DB is a separate resource, and each of the other 9 databases has a HARD dependency, for both START and STOP, on the DB that Jack shut down.
    Let’s name the DBs DB1, DB2, DB3, …, DB10. Let’s say Jack touched DB1 and the other 9 DBs had this configuration:

    START_DEPENDENCIES=hard(ora.DB1.db)
    STOP_DEPENDENCIES=hard(ora.DB1.db)

    So when Jack issued STARTUP FORCE on DB1, the GI would run “SHUTDOWN IMMEDIATE” on the others as soon as DB1 was down and then “STARTUP” on the others as soon as DB1 was up.

    Regards,

    RJ


    • nocoug says:

      In this scenario, all databases should have been unavailable at 9:12 AM not just DB1. However, the story implies that DB2,DB3,…,DB10 were available at 9:12 AM even though DB1 was unavailable at the time.


      • DBA RJ says:

        Well, in this case CRS may have been slow to notice that the DB1 resource was offline. This is controlled by the CHECK_INTERVAL parameter. It would need to be a number of seconds large enough to create a gap between those outages.
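
The hard-dependency theory in comment 13 can be illustrated with a toy model. This is only a sketch under stated assumptions: it does not call Oracle Clusterware or any real API, the function name `cascade_log` and the `hard_deps` mapping are hypothetical, and it simply lists the commands a dependency manager might issue if DB2 through DB10 each depended on DB1.

```python
# Toy model only: no Oracle Clusterware calls, all names are hypothetical.
def cascade_log(hard_deps, bounced):
    """List the commands that might appear if `bounced` is restarted
    and every resource with a hard dependency on it is cycled too."""
    # Resources that declare a hard dependency on the bounced database.
    dependents = [name for name, deps in hard_deps.items() if bounced in deps]
    log = [f"{bounced}: STARTUP FORCE"]
    # Dependents are stopped once the parent goes down ...
    log += [f"{name}: SHUTDOWN IMMEDIATE" for name in dependents]
    # ... and started again once the parent is back up.
    log += [f"{name}: STARTUP" for name in dependents]
    return log

# Assumed layout from comment 13: DB2..DB10 each depend on DB1.
hard_deps = {f"DB{i}": ["DB1"] for i in range(2, 11)}
hard_deps["DB1"] = []

log = cascade_log(hard_deps, "DB1")
for line in log:
    print(line)
```

The model deliberately ignores timing, so it says nothing about the CHECK_INTERVAL delay discussed in the reply above.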


  14. Narendra says:

    Could the database that Jack restarted have an AFTER STARTUP database trigger that explicitly restarts all the other databases (using SHUTDOWN IMMEDIATE, which is seen in the alert logs of the individual databases)?


    • nocoug says:

      The Occam’s Razor principle can be paraphrased as follows: “We consider it a good principle to explain the phenomena by the simplest hypothesis possible.” If Jack was not lying, there must have been some “trigger” (for lack of a better word). The question is what was the trigger? Could the databases have been shut down by “seven funny little men, each one not more than three feet ten, ex horse-race jockeys, all of them” who were trying to get Jack into trouble? That’s an unlikely scenario, to say the least, because we don’t see it happen often. By the same token, how often is it that database administrators create AFTER STARTUP triggers to restart other databases?

      The best solution to the murder mystery would not only be simple and plausible but would also take into account all the clues in the story. For example, what, if anything, does NetApp storage have to do with this?


  15. Lisandro Fernigrini says:

    As someone else mentioned, he was probably logged on to the CDB and did a STARTUP FORCE, thus shutting down the CDB and all nine remaining PDBs, and then starting the CDB and 10 PDBs.


  16. Was it a 12c container database that Jack was playing with? A CDB startup will cause all the PDBs to be restarted.


  17. Arian says:

    Since STARTUP FORCE crashes the database (SHUTDOWN ABORT), NFS locks were left behind. So Jack had to run a script to release the locks. This script included shutting down and starting up all the databases on the host.


  18. […] noticing which means that they were not even being monitored correctly. In fact, the yet unsolved NoCOUG murder mystery is based on my time there. Every time we tested disaster recovery there were frustrating glitches. […]


  19. Anju Garg says:

    It could be that these databases are in a cascaded standby configuration in maximum protection mode. Jack executed the STARTUP FORCE command on the standby database lowest in the chain, which caused its primary to shut down and restart. Since that primary is itself a standby to the next database in the chain, its shutdown and restart in turn caused the next database in the chain to shut down and restart, and so on.


  20. Chuck Firment says:

    I’m surprised no one has mentioned the possibility of a hardware failure caused by excessive heat. AC was not working, it was a hot day, and sunlight was streaming in through the plate-glass windows.

    It leads one to wonder… if a NetApp device overheats, what could happen?

