Northern California Oracle Users Group

Simplicity Is Good!

by James Morle

This article is about the importance of appropriately simple architectures. I frequently get involved with the creation of full-stack architectures—in particular the architecture of the database platform. There are some golden rules when designing such systems, but one of the most important is to keep the design as simple as possible. This isn’t a performance enhancement; it is an availability enhancement. Complexity, after all, is the enemy of availability.

Despite it being a sensible goal, it is incredibly common to come up against quite stubborn resistance to simplicity. Frequently, the objections will be based upon the principles of the complex solution being a “better way” to do things. I have two closely linked examples of this in action.

Case 1: Real Application Cluster Interconnects

A cluster interconnect is an incredibly important component of the architecture. The cluster exists, after all, as an availability feature (and possibly a scalability feature), and so the foundations of the cluster must be robust in order for it to deliver that availability. The cluster interconnect is the lifeblood of the cluster. And yet, it has such a very simple set of requirements:

  • Point-to-point communication between all nodes of the cluster
  • Low latency
  • n+1 availability of network paths
  • Multicast support between the nodes
  • (optionally) Jumbo frames support

It explicitly does not need any of the more “fancy” networking features, such as:

  • Routing of any kind
  • Spanning tree support
  • VLANs
  • Access to any other networks

It just needs a dedicated pair (or more) of discrete layer-2 networks. They don’t need to be bonded; the networks do not even need to be aware of each other—they are completely independent (at least, that is certainly the case since the HAIP functionality of Oracle 11g Release 2). They do need real switches, though—crossover cables fail ungracefully in the event of a peer host losing power. But they don’t need anything high end—just something better than crossover cables and with enough bandwidth for the required traffic rates. The latency difference between the majority of switches is barely a consideration. The switches don’t really even need redundant power supplies, though it’s not a terrible idea to insulate yourself from this type of failure, and it brings no detriment apart from a marginal cost increase.
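
If you want to confirm what the database is actually using for its interconnect, here is a minimal sketch, assuming an Oracle 11g Release 2 or later RAC database (where HAIP is in play):

-- Show which private interconnect addresses each RAC instance is using,
-- and where that configuration came from (clusterware, init parameter, etc.).
select inst_id, name, ip_address, is_public, source
  from gv$cluster_interconnects
 order by inst_id, name;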

So, something like a pair of unmanaged layer-2 GbE Ethernet switches is the perfect solution. Something like a Netgear JGS516 would probably do the job, from a brief scan of the specification. They are about $166 (£100) each, for a net cost of $332 (£200) for a nice, robust solution. Or if you wanted to really push the boat out, something like a fully managed L2 switch with redundant power, such as the HP E2810-24G, will set you back all of $1160 (£700) each. Cisco shops might spend a bit more and go for something like a 3750G at about $4650 (£2800) each.

But… Somebody will always push back on this. They will plumb the cluster nodes into the full core/edge corporate dream stack topology with fully active failover between a pair of core switches. Surely, at a cost more than four orders of magnitude greater than the bargain-basement Netgear solution, this must be better, right? Wrong.

This assumption rests on several incorrect beliefs:

  1. Higher cost means better.
  2. There will be an increase in availability.
  3. Every networking requirement is the same as every other one.

First of all, these network topologies are not designed for cluster interconnects. They are designed for corporate networks, connecting thousands of ports into a flexible and secure network. RAC interconnects are tiny closed networks and need none of that functionality. More precisely, they need none of that complexity. Corporate networks also have a different level of failure sensitivity to cluster interconnects; if a user’s PC goes offline for a couple of minutes, or even half an hour, the recovery from that failure is instant once the fault is rectified—the user is immediately back in action. Cluster interconnects are not so forgiving; if a cluster’s networks go AWOL for a few minutes, the best you can hope for is a single node of the cluster still standing when the fault is rectified. That is how clusters are designed to operate: if the network disappears, the cluster must assume it is unsafe to allow multiple nodes to access the shared storage. The net result of this failure behavior is that a relatively short network outage can result in a potentially lengthy full (and manual) restart of the whole cluster, restart of the application, balancing of services, warming of caches, and so on. It would not be an exaggeration for this to be a one-hour or greater outage. Not terrific for a highly available cluster.

But hang on a minute—this über-expensive networking technology never goes down, right? Not true. What exactly is this active/active core switch topology? Think about it. It’s a kind of cluster itself, with each switch running complex software to determine the health of its peer and managing a ton of state information between them. The magic word in that sentence was “software”—anything that is running software has a great deal of failure potential. Not only that, but clustered software has a great deal of potential to fail on all nodes concurrently. This is a unique attribute of distributed software and one that does not exist in discrete hardware designs. In discrete hardware designs it is incredibly unlikely that more than one component will fail concurrently. Software is great at catastrophic failure, most particularly when it is combined with some element of human error during upgrades, reconfiguration, or just plain tinkering. Not even humans can make two independent hardware switches fail concurrently, unless they are being creative with the power supply.

Just to highlight this point, I should state here that I have personally witnessed failures of entire core/edge switch topologies on three occasions in the last five years. It does not matter that the cluster nodes are connected to the same edge switches when this kind of failure occurs, because every component in the network is a logical contributor to the larger entity and will become unavailable as part of a larger meltdown. If you are a Blackberry user, you have experienced one yourself recently. The Blackberry issue proves the potential, but in their case the topology was at least appropriate—they have a requirement to interconnect thousands of devices. In our clusters, we have no such requirement, and we should not be implementing overly complex and thus unreliable network topologies accordingly.

Case 2: The Great SAN Splurge

Now let’s think about Storage Area Networking. And let’s not restrict this thought to Fibre Channel, because the same principles apply to an Ethernet-based SAN. In fact, let me just clear off the Ethernet SAN piece first: Don’t use your corporate network for storage connectivity. It’s the wrong thing to do for all the reasons stated in the first case above.

So, now we can focus on Fibre Channel SANs. Fibre Channel has become the backbone of the data center, allowing storage devices to be located in sensible locations, perhaps in different rooms to the servers, and for everything to be able to be connected to everything else with optimized structured cabling. The zoning of the fabric then determines which devices are allowed to see other devices. All very well and good, but how is this implemented? Unsurprisingly, it is implemented using an exactly analogous solution to the core/edge Ethernet network design in the previous case. Two active core switches lie at the heart of a multi-tier network and provide failover capability for each other. A cluster. This cluster can (and does) fail for exactly the same reasons given in the former case, and yes, I have also seen this occur in real life—twice in the last five years.

The failure implications for a SAN meltdown can be even more serious than a cluster meltdown. All I/O will stop and, if the outage goes on long enough, all databases in the data center will crash and need to be restarted.

There are a few other implications with this topology in large data centers. Notably, it is common for the storage arrays to be connected via different physical switches than the servers, implying that there are a number of Inter-Switch Links (ISLs) to go through. These ISLs can become congested and cause severe bottlenecks in throughput that can be extremely tricky to track down. In extreme cases, ISLs can be the cause of multi-minute I/O response times, which will also cause clusters and databases to crash.

So that preamble paints the SAN picture and sets the stage for the following questions:

Why are all devices in the SAN connected to all other devices? Why are the handful of nodes that make up your critical database part of a SAN of thousands of other devices? Why are they not just connected via simple switches to the storage array?

There is only one reason, and that is data-center cabling. But it doesn’t really follow: If your database servers are in a rack, or a few racks next to each other, put a pair of physically and logically discrete switches into the top of the rack, attach all the nodes, and then connect the storage array using the same number of ports that you would have connected to the switches if they had been edge switches. The destination of those cables would be the storage array rather than the core switches, but the number of cable runs is pretty much the same, and the result is a more robust solution. There is no exposure to catastrophic loss of service in the SAN, because there are two completely discrete SANs between the servers and the storage.

Fibre Channel networks are vertical in nature: server nodes do not communicate with other server nodes over the SAN; they only communicate with the storage array. Server nodes do not need to be connected to thousands of storage arrays, either. The connectivity requirement for a given platform is actually rather simple.

Note: I am writing from the viewpoint of a typical RDBMS implementation, not from the viewpoint of massively parallel HPC or big data systems. Clearly, if there truly are thousands of devices that do need to be connected, this argument does not apply.

Conclusion

The common theme between these two cases is this: Don’t connect things that don’t need to be connected. Yes, it is easier to cable up, and arguably easier to manage, but it has a knock-on effect of dictating an implementation that does not suit the requirement. It results in a less reliable, more complex solution, with the cart very much before the horse. Don’t trade architectural simplicity for administrative simplicity: it will sneak up and bite you.

As Albert Einstein said, “Make things as simple as possible, but not simpler.” Wise words indeed. ▲

James Morle is a specialist large-scale Oracle-based systems consultant with over 20 years’ experience in IT consulting. His success is based on two key factors: deep domain expertise in the full Oracle stack (he speaks every language) together with huge credibility with senior IT and business leaders. James is one of the co-founders of the OakTable Network and an Oracle ACE Director. He is also the founder of Scale Abilities Ltd., based in the UK and serving clients across the world. Scale Abilities solves implementation, storage, and performance problems with large, complex Oracle-based systems. International companies rely on its extensive full-stack expertise, which successfully solves the wide range of cross-boundary problems that single-discipline consultancies and experts struggle to handle. Scale Abilities consultants are packed with Oracle-based experience ranging from application architecture to the mechanics of physical hardware and multivendor relationship management. They’re known for their candid, business-focused approach. They’ll tell you what others are afraid to share, and they understand how to unpick politics from performance.


Interview with Tim Gorman: How I accidentally dropped the 16TB main index

Tim Gorman is a technical consultant for Delphix (http://www.Delphix.com), which enables database and storage virtualization to increase the agility of IT development and testing operations. He has co-authored six books, tech-reviewed eight more, and written articles for RMOUG SQL_Update and IOUG SELECT magazines. He has been an Oracle ACE since 2007, an Oracle ACE Director since 2012, and a member of the Oak Table Network since 2002, and he has presented at Oracle OpenWorld, Collaborate, KScope, Hotsos, and local Oracle users groups in a lot of wonderful places around the world. Tim lives in Westminster, Colo., with his partner, Kellyn Pot’Vin, and their children.

You are old, father Gorman (as the young man said) and your hair has become very white. You must have lots of stories. Tell us a story!

Well, in the first place, it is not my hair that is white. In point of fact, I’m as bald as a  cue ball, and it is my skin that is pale from a youth misspent in data centers and fluorescent-lit office centers.

It is a mistake to think of wisdom as something that simply accumulates over time. Wisdom accumulates due to one’s passages through the world, and no wisdom accumulates if one remains stationary. It has been said that experience is what one receives soon after they need it, and experience includes both success and failure. So wisdom accumulates with experience, but it accumulates fastest as a result of failure.

About four years ago, or 26 years into my IT career, I dropped an index on a 60 TB table with 24,000 hourly partitions; the index was over 15 TB in size. It was the main table in that production application, of course.

Over a quarter-century of industry experience as a developer, production support, systems administrator, and database administrator: if that’s not enough time to have important lessons pounded into one’s head, then how much time is needed?

My supervisor at the time was amazing. After the shock of watching it all happen and still not quite believing it had happened, I called him at about 9:00 p.m. local time and told him what occurred. I finished speaking and waited for the axe to fall—for the entirely justified anger to crash down on my head. He was silent for about 3 seconds, and then said calmly, “Well, I guess we need to fix it.”

And that was it.

No anger, no recriminations, no humiliating micro-management. We launched straight into planning what needed to happen to fix it.

He got to work notifying the organization about what had happened, and I got started on the rebuild, which eventually took almost 2 weeks to complete.

It truly happens to all of us. And anyone who pretends otherwise simply hasn’t been doing anything important.

How did I come to drop this index? Well, I wasn’t trying to drop it; it resulted from an accident. I was processing an approved change during an approved production outage. I was trying to disable a unique constraint that was supported by the index. I wanted to do this so that a system-maintenance package I had written could perform partition exchange operations (which were blocked by an enabled constraint) on the table. When I tested the disabling of the constraint in the development environment, I used the command ALTER TABLE … DISABLE CONSTRAINT and it indeed disabled the unique constraint without affecting the unique index. Then I tested the same operation again in the QA/Test environment successfully. But when it came time to do so in production, it dropped the index as well.

Surprise!

I later learned that the unique constraint and the supporting unique index had been created out of line in the development and QA/test environments. That is, first the table was created, then the unique index was created, and finally the table was altered to create the unique constraint on the already-existing unique index.

But in production, the unique constraint and the supporting unique index had been created in-line. When the table was created, the CREATE TABLE statement had the unique constraint within, along with the USING INDEX clause to create the unique index.

So when I altered the table in production, disabling the constraint also caused the index to be dropped.
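
A minimal sketch of the two creation styles, with hypothetical table and constraint names, makes the difference concrete; only in the second form does disabling the constraint silently take the index with it:

-- Out-of-line (as in development and QA/test): the unique index is created
-- first and then attached to the constraint, so disabling the constraint
-- leaves the index in place.
create table ool_demo (txn_id number, txn_ts date);
create unique index ool_demo_uk_ix on ool_demo (txn_id);
alter table ool_demo add constraint ool_demo_uk unique (txn_id)
  using index ool_demo_uk_ix;
alter table ool_demo disable constraint ool_demo_uk;  -- index survives

-- In-line (as in production): the constraint creates its supporting index as
-- part of CREATE TABLE (Tim's production DDL used a USING INDEX sub-clause to
-- the same effect), so disabling the constraint drops the index by default.
create table inl_demo (
  txn_id number,
  txn_ts date,
  constraint inl_demo_uk unique (txn_id)
);
alter table inl_demo disable constraint inl_demo_uk;  -- index is dropped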

After the mishap, I found the additional syntax for KEEP INDEX, which could have been added to the end of the ALTER TABLE … DISABLE CONSTRAINT command because Oracle recognized the difference in default behaviors.
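
For reference, this is the clause in question, shown against the same hypothetical in-line table as above; it is what would have preserved the 15 TB index:

-- Disable the constraint but explicitly keep its supporting unique index,
-- regardless of how that index was originally created.
alter table inl_demo disable constraint inl_demo_uk keep index;

The same KEEP INDEX clause is also accepted by ALTER TABLE … DROP CONSTRAINT.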

But that was a discovery I experienced after I needed it.

As to why my supervisor was so calm and matter-of-fact throughout this disaster, I was not surprised; he was always that way, it seemed. What I learned over beers long after this incident is that, in his early life, he learned the true meaning of the words “emergency” and “catastrophe.” He was born in Afghanistan, and he was a young child during the 1980s after the Soviets invaded. His family decided to take refuge in Pakistan, so they sought the help of professional smugglers, similar to what we call “coyotes” on the Mexican-American border. These smugglers moved through the mountains bordering Afghanistan and Pakistan at night on foot, using camels to carry baggage and the very old, the sick and injured, and the very young.

My supervisor was about 9 years old at the time, so the smugglers put him on a camel so he would not slow them down. During the night, as they were crossing a ridge, they were spotted by the Soviets, who opened fire on them using machine guns with tracer bullets. Everyone in the caravan dove to the ground to take cover. Unfortunately, they all forgot about the 9-year-old boy on top of the 8-foot-high camel. My supervisor said he saw the bright tracer bullets arching up toward him from down below in the valley, passing over his head so close that he felt he could just reach up and grab them. He wanted to jump down, but he was so high off the ground he was terrified. Finally, someone realized that he was exposed and they pulled him down off the camel.

As he told this story, he laughed and commented that practically nothing he encountered in IT rose to the level of what he defined as an emergency. The worst that could happen was no more catastrophic than changing a tire on a car.

I’ve not yet been able to reach this level of serenity, but it is still something to which I aspire.

We love stories! Tell us another story!

A little over 10 years ago, I was working in downtown L.A. and arrived in the office early (5:00 a.m.) to start a batch job. I had a key card that got me into the building and into the office during the day, but I was unaware that at night they were locking doors in the elevator lobby. I banged on the doors and tried calling people, to no avail. Finally, after a half-hour, out of frustration, I grabbed one of the door handles and just yanked hard.

It popped open.

I looked at it in surprise, thought “sweet!”, walked into the cubicle farm, sat down, and started my batch job. All was good.

Around 7:00 a.m., the LAPD showed up. There were about a dozen people in the office now, so the two officers began questioning folks nearest the door. From the opposite side of the room, I stood up, called out “Over here,” and ’fessed up.

They told me that if I hadn’t called them over immediately, they would have arrested me by the time they got to me. Have a nice day, sir.

The NoCOUG Blues

NoCOUG membership and conference attendance have been declining for years. Are user groups still relevant in the age of Google? Do you see the same trends in other user groups? What are we doing wrong? What can we do to reverse the dismal trend? Give away free stuff like T-shirts and baseball caps? Bigger raffles? Better food? 

Yes, the same trends are occurring in other users groups. IT organizations are lean and can’t spare people to go to training. The IT industry is trending older as more and more entry-level functions are sent offshore.

Users groups are about education. Education in general has changed over the past 20 years as online searches, blogs, and webinars have become readily available.

The key to users groups is the quality of educational content that is offered during live events as opposed to online events or written articles. Although online events are convenient, we all know that we, as attendees, get less from them than we do from face-to-face live events. They’re better than nothing, but communities like NoCOUG have the ability to provide the face-to-face live events that are so effective.

One of the difficulties users groups face is fatigue. It is difficult to organize events month after month, quarter after quarter, year after year. There is a great deal of satisfaction in running such an organization, especially one with the long and rich history enjoyed by NoCOUG. But it is exhausting. Current volunteers have overriding work and life conflicts. New volunteers are slow to come forward.

One thing to consider is reaching out to the larger national and international Oracle users groups, such as ODTUG, IOUG, OAUG, Quest, and OHUG. These groups have similar missions and most have outreach programs. ODTUG and IOUG in particular organize live onsite events in some cities, and have webinar programs as well. They have content, and NoCOUG has the membership and audience. NoCOUG members should encourage the board to contact these larger Oracle users groups for opportunities to partner locally.

Another growing trend is meet-ups, specifically through Meetup.com. This is a resource that has been embraced by all manner of tech-savvy people, from all points on the spectrum of the IT industry. I strongly urge all NoCOUG members to join Meetup.com, indicate your interests, and watch the flow of announcements arrive in your inbox. The meet-ups run the gamut from Hadoop to Android to Oracle Exadata to In-Memory to Big Data to Raspberry Pi to vintage Commodore. I think the future of local technical education lies in using online organization, through services like Meetup.com, to facilitate local face-to-face interaction.

Four conferences per year puts a huge burden on volunteers. There have been suggestions from multiple quarters that we organize just one big conference a year like some other user groups. That would involve changing our model from an annual membership fee of less than $100 for four single-day conferences (quarterly) to more than $300 for a single multiple-day conference (annual), but change is scary and success is not guaranteed. What are your thoughts on the quarterly vs. annual models?

I disagree with the idea that changing the conference format requires increasing annual dues. For example, RMOUG in Colorado (http://rmoug.org/) has one large annual conference with three smaller quarterly meetings, and annual dues are $75 and have been so for years. RMOUG uses the annual dues to pay for the three smaller quarterly education workshops (a.k.a. quarterly meetings) and the quarterly newsletter; the single large annual “Training Days” conference pays for itself with its own separate registration fees, which of course are discounted for members.

Think of a large annual event as a self-sufficient, self-sustaining organization within the organization, open to the public with a discount for dues-paying members.

Other Oracle users groups, such as UTOUG in Utah (http://utoug.org/), hold two large conferences annually (in March and November), and this is another way to distribute scarce volunteer resources. This offers a chance for experimentation as well, by hiring one conference-coordinator company to handle one event and another to handle the other, so that not all eggs are in one basket.

The primary goal of larger conferences is ongoing technical education of course, but a secondary goal is to raise funds for the continued existence of the users group and to help subsidize and augment the website, the smaller events, and the newsletter, if necessary.

It costs a fortune to produce and print the NoCOUG Journal, but we take a lot of pride in our unbroken 28-year history, in our tradition of original content, and in being one of the last printed publications by Oracle user groups. Needless to say it also takes a lot of effort. But is there enough value to show for the effort and the cost? We’ve been called a dinosaur. Should we follow the other dinosaurs into oblivion?

I don’t think so. There are all kinds of formats for publication, from tweets to LinkedIn posts to blogs to magazines to books. Magazines like the NoCOUG Journal are an important piece of the educational ecosystem. I don’t think that any of the Oracle users groups who no longer produce newsletters planned to end up this way. They ceased publishing because the organization could no longer sustain them.

I think today the hurdle is that newsletters can no longer be confined within the users group. Both NoCOUG and RMOUG have independently come to the realization that the newsletter must be searchable and findable online by the world, which provides the incentive for authors to submit content. Today, if it cannot be verified online, it isn’t real. If it isn’t real, then there is little incentive for authors to publish.

So making the NoCOUG Journal available online has been key to its own viability, and NoCOUG membership entitles one to a real hard-copy issue, which is a rare and precious bonus in this day and age.

Oracle Database 12c

Mogens Norgaard (co-founder of the Oak Table Network) claims that “since Oracle 7.3, that fantastic database has had pretty much everything normal customers need,” but the rest of us are not confirmed Luddites. What are the must-have features of Oracle 12c that give customers the incentive to upgrade from 11g to 12c? We’ve heard about pluggable databases and the in-memory option, but they are extra-cost options, aren’t they?

I know for a fact that the Automatic Data Optimization (ADO) feature obsolesces about 3,000 lines of complex PL/SQL code that I had written for Oracle 8i, 9i, 10g, and 11g databases. The killer feature within ADO is the ability to move partitions online, without interrupting query operations. Prior to Oracle 12c, accomplishing that alone consumed hundreds of hours of code development, testing, debugging, and release management. Combining ADO with existing features like OLTP compression and HCC compression truly makes transparent “tiers” of storage within an Oracle database feasible and practical. The ADO feature alone is worth the effort of upgrading to Oracle 12c for an organization with longer data retention requirements for historical analytics or regulatory compliance.
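
As a flavor of what that hand-written code is replaced by, here is a minimal, hedged sketch of the 12c syntax; the table, partition, and tablespace names are hypothetical, and the compression levels available depend on licensing and storage:

-- ADO: compress rows automatically once they have gone cold
-- (segment-level and storage-tiering policies follow the same pattern).
alter table sales ilm add policy
  row store compress advanced row
  after 90 days of no modification;

-- The capability that used to consume hundreds of hours of custom code:
-- relocate (and recompress) a partition without interrupting queries or DML.
alter table sales move partition sales_2013q1
  tablespace archive_ts
  row store compress advanced
  update indexes online;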

What’s not to love about pluggable databases? How different is the pluggable database architecture from the architecture of SQL Server, DB2, and MySQL?

I think that first, in trying to explain Oracle pluggable databases, most people make it seem more confusing than it should be.

Stop thinking of an Oracle database as consisting of software, a set of processes, and a set of database files.

Instead, think of a database server as consisting of an operating system (OS) and the Oracle 12c container database (CDB) software; a set of Oracle processes; and the basic control files, log files, and a minimal set of data files. When “gold images” of Oracle database servers are created, whether for jumpstart servers or for virtual machines, the Oracle 12c CDB should be considered part of that base operating system image.

Pluggable databases (PDBs), then, are the data files installed along with the application software they support. PDBs are just tablespaces that plug into the working processes and infrastructure of a CDB.

When PDBs are plugged in, all operational activities involving data protection—such as backups or redundancy like Data Guard replication—are performed at the higher CDB level.

Thus, all operational concerns and the operational infrastructure are handled at the CDB level, separate from the PDBs and the applications they support.
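
To make the plug-in metaphor concrete, here is a minimal sketch of moving a PDB from one CDB to another; the PDB name and manifest path are hypothetical:

-- On the source CDB: close the PDB and unplug it, writing its manifest to XML.
alter pluggable database salespdb close immediate;
alter pluggable database salespdb unplug into '/u01/app/oracle/manifests/salespdb.xml';

-- On the target CDB: plug it in (NOCOPY reuses the data files in place) and open it.
create pluggable database salespdb
  using '/u01/app/oracle/manifests/salespdb.xml'
  nocopy tempfile reuse;
alter pluggable database salespdb open;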

Once the discussion is shifted to that high level, the similarities become more visible between the Oracle 12c database and other multitenant databases, such as SQL Server and MySQL. Of course there will always be syntactic and formatting differences, but functionally Oracle 12c has been heavily influenced by the multitenant databases that preceded it, such as SQL Server and MySQL.

Bonus Question

Do you have any career advice for the younger people reading this interview so that they can be like you some day? Other than actively participating in user groups!

This sounds corny and trite, but there is no such thing as a useless experience, and while it may be frustrating, it presents the opportunity to build. Understand that everyone starts at the bottom, and enjoy the climb.

Understand that learning causes stress. Stress is stress and too much can be unhealthy, but if it is a result of learning something new, then recognize it for what it is, know it is temporary and transitory, tough it out, and enjoy knowing the outcome when it arrives.

Also, don’t voice a complaint unless you are prepared to present at least one viable solution, if not several. Understand what makes each solution truly viable and what makes it infeasible. If you can’t present a solution to go with the complaint, then more introspection is needed. The term “introspection” is used deliberately, as it implies looking within rather than around.

Help people. Make an impact. Can we go wrong in pursuing either of those as goals? Sometimes I wish I had done more along these lines. Never do I wish I had done less. ▲

 

We Have a Winner!

Kim Berg Hansen from Denmark wins the SQL mini-challenge for his UNION ALL materialized view with fast refresh on commit. An analysis of his solution has been published in the NoCOUG Journal. Judge’s statement: “Kim won on participation, efficiency, and accuracy. He kept submitting refinements, lowering the bar until he reached the theoretical lower limit of 1. His solutions survived the test cases written by the judging committee.”

SQL Mini-challenge entry: Craig Martin v4

by Craig Martin

So from my last post, Brendan Furey pointed out one minor detail… that my code was seriously flawed. It only returned the results for one of the criteria. Picky, picky 🙂 So this is my last effort… Back up to 12 buffers, but it still works when you change the parameters:

select /*+ ordered index_ffs(j job_id_pk) use_nl(jh) use_nl(e) use_nl(e2) use_nl(e3) use_nl(d) use_nl(l) */  distinct l.location_id, l.city
from dual
    left outer join jobs j
        on j.job_title = 'President'
    left outer join job_history jh
        on jh.job_id = j.job_id
    left outer join employees e
        on (e.first_name = 'Steven' and e.last_name = 'King')
    left outer join employees e2
        on e2.job_id = j.job_id
        and e2.rowid <> e.rowid
    left outer join employees e3
        on e3.employee_id = jh.employee_id
        and e3.rowid <> e.rowid
        and e3.rowid <> e2.rowid
    inner join employees e4
        on e4.rowid = e.rowid
        or e4.rowid = e2.rowid
        or e4.rowid = e3.rowid
    inner join departments d
        on d.department_id = e4.department_id
    inner join locations l
        on d.location_id = l.location_id;
PLAN_TABLE_OUTPUT
SQL_ID  0rt22nxkn3b1w, child number 0
-------------------------------------
select /*+ ordered index_ffs(j job_id_pk) use_nl(jh) use_nl(e) 
use_nl(e2) use_nl(e3) use_nl(d) use_nl(l) */  distinct l.location_id, 
l.city from dual     left outer join jobs j         on j.job_title = 
'Sales Manager'     left outer join job_history jh         on jh.job_id 
= j.job_id     left outer join employees e         on (e.first_name = 
'Steven' and e.last_name = 'King')     left outer join employees e2     
    on e2.job_id = j.job_id         and e2.rowid <> e.rowid     left 
outer join employees e3         on e3.employee_id = jh.employee_id      
   and e3.rowid <> e.rowid         and e3.rowid <> e2.rowid     inner 
join employees e4         on e4.rowid = e.rowid         or e4.rowid = 
e2.rowid         or e4.rowid = e3.rowid     inner join departments d    
     on d.department_id = e4.department_id     inner join locations l   
      on d.location_id = l.location_id
 
Plan hash value: 267334121
 
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name              | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers | Reads  |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                   |      1 |        |       |    46 (100)|          |      2 |00:00:00.01 |      61 |      2 |
|   1 |  HASH UNIQUE                             |                   |      1 |     18 |  1404 |    46   (3)| 00:00:01 |      2 |00:00:00.01 |      61 |      2 |
|   2 |   NESTED LOOPS                           |                   |      1 |        |       |            |          |     15 |00:00:00.01 |      61 |      2 |
|   3 |    NESTED LOOPS                          |                   |      1 |     18 |  1404 |    45   (0)| 00:00:01 |     15 |00:00:00.01 |      46 |      2 |
|   4 |     NESTED LOOPS                         |                   |      1 |     18 |  1188 |    27   (0)| 00:00:01 |     15 |00:00:00.01 |      42 |      2 |
|   5 |      NESTED LOOPS                        |                   |      1 |     18 |  1062 |     9   (0)| 00:00:01 |     15 |00:00:00.01 |      23 |      2 |
|   6 |       NESTED LOOPS OUTER                 |                   |      1 |      6 |   264 |     6   (0)| 00:00:01 |      5 |00:00:00.01 |       8 |      2 |
|   7 |        VIEW                              |                   |      1 |      6 |   168 |     6   (0)| 00:00:01 |      5 |00:00:00.01 |       6 |      1 |
|   8 |         NESTED LOOPS OUTER               |                   |      1 |      6 |   318 |     6   (0)| 00:00:01 |      5 |00:00:00.01 |       6 |      1 |
|   9 |          VIEW                            |                   |      1 |      1 |    32 |     6   (0)| 00:00:01 |      1 |00:00:00.01 |       5 |      1 |
|  10 |           MERGE JOIN OUTER               |                   |      1 |      1 |    32 |     6   (0)| 00:00:01 |      1 |00:00:00.01 |       5 |      1 |
|  11 |            NESTED LOOPS OUTER            |                   |      1 |      1 |    20 |     5   (0)| 00:00:01 |      1 |00:00:00.01 |       4 |      1 |
|  12 |             NESTED LOOPS OUTER           |                   |      1 |      1 |     7 |     4   (0)| 00:00:01 |      1 |00:00:00.01 |       2 |      0 |
|  13 |              FAST DUAL                   |                   |      1 |      1 |       |     2   (0)| 00:00:01 |      1 |00:00:00.01 |       0 |      0 |
|  14 |              VIEW                        |                   |      1 |      1 |     7 |     2   (0)| 00:00:01 |      1 |00:00:00.01 |       2 |      0 |
|* 15 |               TABLE ACCESS BY INDEX ROWID| JOBS              |      1 |      1 |    27 |     2   (0)| 00:00:01 |      1 |00:00:00.01 |       2 |      0 |
|  16 |                INDEX FULL SCAN           | JOB_ID_PK         |      1 |     19 |       |     1   (0)| 00:00:01 |     19 |00:00:00.01 |       1 |      0 |
|  17 |             TABLE ACCESS BY INDEX ROWID  | JOB_HISTORY       |      1 |      1 |    13 |     1   (0)| 00:00:01 |      1 |00:00:00.01 |       2 |      1 |
|* 18 |              INDEX RANGE SCAN            | JHIST_JOB_IX      |      1 |      1 |       |     0   (0)|          |      1 |00:00:00.01 |       1 |      0 |
|  19 |            BUFFER SORT                   |                   |      1 |      1 |    12 |     5   (0)| 00:00:01 |      1 |00:00:00.01 |       1 |      0 |
|  20 |             VIEW                         |                   |      1 |      1 |    12 |     1   (0)| 00:00:01 |      1 |00:00:00.01 |       1 |      0 |
|* 21 |              INDEX RANGE SCAN            | EMP_NAME_IX       |      1 |      1 |    27 |     1   (0)| 00:00:01 |      1 |00:00:00.01 |       1 |      0 |
|* 22 |          INDEX RANGE SCAN                | EMP_JOB_IX        |      1 |      6 |   126 |     0   (0)|          |      5 |00:00:00.01 |       1 |      0 |
|* 23 |        INDEX UNIQUE SCAN                 | EMP_EMP_ID_PK     |      5 |      1 |    16 |     0   (0)|          |      5 |00:00:00.01 |       2 |      1 |
|* 24 |       INDEX FAST FULL SCAN               | EMP_DEPARTMENT_IX |      5 |      3 |    45 |     1   (0)| 00:00:01 |     15 |00:00:00.01 |      15 |      0 |
|  25 |      TABLE ACCESS BY INDEX ROWID         | DEPARTMENTS       |     15 |      1 |     7 |     1   (0)| 00:00:01 |     15 |00:00:00.01 |      19 |      0 |
|* 26 |       INDEX UNIQUE SCAN                  | DEPT_ID_PK        |     15 |      1 |       |     0   (0)|          |     15 |00:00:00.01 |       4 |      0 |
|* 27 |     INDEX UNIQUE SCAN                    | LOC_ID_PK         |     15 |      1 |       |     0   (0)|          |     15 |00:00:00.01 |       4 |      0 |
|  28 |    TABLE ACCESS BY INDEX ROWID           | LOCATIONS         |     15 |      1 |    12 |     1   (0)| 00:00:01 |     15 |00:00:00.01 |      15 |      0 |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
  15 - filter("J"."JOB_TITLE"='Sales Manager')
  18 - access("JH"."JOB_ID"="J"."JOB_ID")
  21 - access("E"."LAST_NAME"='King' AND "E"."FIRST_NAME"='Steven')
  22 - access("E2"."JOB_ID"="J"."JOB_ID")
       filter("E2".ROWID<>"E"."ROWID")
  23 - access("E3"."EMPLOYEE_ID"="JH"."EMPLOYEE_ID")
       filter(("E3".ROWID<>"E2"."ROWID" AND "E3".ROWID<>"E"."ROWID"))
  24 - filter(("E4".ROWID="E"."QCSJ_C000000000900017" OR "E4".ROWID="E2"."QCSJ_C000000000900016" OR "E4".ROWID="E3".ROWID))
  26 - access("D"."DEPARTMENT_ID"="E4"."DEPARTMENT_ID")
  27 - access("D"."LOCATION_ID"="L"."LOCATION_ID")

Thanks again to NoCOUG for putting on this challenge!

SQL Mini-Challenge Update

The SQL Mini-Challenge is now closed. The judges will be Kyle Hailey, Tim Gorman, and Iggy Fernandez. The winner will be announced in the fall issue of the NoCOUG Journal.

Correctness is the primary screening criterion that will be used by the judges. Submissions are expected to produce the same results as the original query even if the data changes. The following script has been constructed to test submissions:

update employees
set
  first_name='Stevie',
  last_name='Wonder',
  hire_date=sysdate,
  job_id='AD_VP',
  department_id=50
where employee_id=100;

update employees
set
  hire_date='16-MAR-97'
where employee_id=101;

update employees
set
  job_id='AD_PRES',
  hire_date=sysdate,
  department_id=60
where employee_id=101;

update employees
set
  first_name='Steven',
  last_name='King'
where employee_id=102;

ALS Ice Bucket Challenge at NoCOUG Summer Conference


NoCOUG president Hanan Hit and NoCOUG Journal editor Iggy Fernandez perform the ALS ice bucket challenge at the NoCOUG summer conference.

SQL Mini-challenge entry: Kim Berg Hansen v6

by Kim Berg Hansen

Okay, okay, Iggy, you want one buffer get? I’ll do it 😉

Actually two different versions that might be useful in different circumstances. First one goes like this:

create materialized view log on employees with rowid, commit scn
   (employee_id, job_id, first_name, last_name, department_id)
   including new values;
create materialized view log on departments with rowid, commit scn
   (department_id, location_id)
   including new values;
create materialized view log on locations with rowid, commit scn
   (location_id, city)
   including new values;
create materialized view log on jobs with rowid, commit scn
   (job_id, job_title)
   including new values;
create materialized view log on job_history with rowid, commit scn
   (job_id, employee_id)
   including new values;

create materialized view emp_jobs_mv
build immediate
refresh fast on commit
enable query rewrite
as
select e.first_name, e.last_name, j.job_title, l.location_id, l.city, 'HIST' record_type
     , j.rowid j_rowid, jh.rowid jh_rowid, e.rowid e_rowid, d.rowid d_rowid, l.rowid l_rowid
  from jobs j, job_history jh, employees e, departments d, locations l
 where jh.job_id = j.job_id
   and e.employee_id = jh.employee_id
   and d.department_id = e.department_id
   and l.location_id = d.location_id
union all
select e.first_name, e.last_name, j.job_title, l.location_id, l.city, 'CURR' record_type
     , j.rowid j_rowid, null jh_rowid, e.rowid e_rowid, d.rowid d_rowid, l.rowid l_rowid
  from jobs j, employees e, departments d, locations l
 where e.job_id = j.job_id
   and d.department_id = e.department_id
   and l.location_id = d.location_id;

create index emp_jobs_kings_presidents on emp_jobs_mv (
   case
      when first_name = 'Steven' and last_name = 'King' then 1
      when job_title = 'President' then 1
      else 0
   end
 , location_id
 , city
);

begin
   dbms_stats.gather_schema_stats(
      'HR'
    , estimate_percent => null
    , cascade => true
   );
end;
/

We make a single fast-refresh-on-commit mview that contains both names and job titles (current as well as historical). And then, to make it possible to access just a single index block, we create a very specialized index with a “boolean” indicating whether the row is a King or a president.

Now we select the data where the “boolean” expression is “true”:

select /*+ gather_plan_statistics */
       distinct location_id, city
  from (
   select e.first_name, e.last_name, j.job_title, l.location_id, l.city
     from jobs j, job_history jh, employees e, departments d, locations l
    where jh.job_id = j.job_id
      and e.employee_id = jh.employee_id
      and d.department_id = e.department_id
      and l.location_id = d.location_id
   union all
   select e.first_name, e.last_name, j.job_title, l.location_id, l.city
     from jobs j, employees e, departments d, locations l
    where e.job_id = j.job_id
      and d.department_id = e.department_id
      and l.location_id = d.location_id
  )
 where case
          when first_name = 'Steven' and last_name = 'King' then 1
          when job_title = 'President' then 1
          else 0
       end = 1;

And we get the desired single buffer get:

-----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation         | Name                      | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                           |      1 |        |       |     2 (100)|          |      1 |00:00:00.01 |       1 |
|   1 |  HASH UNIQUE      |                           |      1 |      1 |    19 |     2  (50)| 00:00:01 |      1 |00:00:00.01 |       1 |
|*  2 |   INDEX RANGE SCAN| EMP_JOBS_KINGS_PRESIDENTS |      1 |      1 |    19 |     1   (0)| 00:00:01 |      1 |00:00:00.01 |       1 |
-----------------------------------------------------------------------------------------------------------------------------------------
                                                                                                                                         
Predicate Information (identified by operation id):                                                                                      
---------------------------------------------------                                                                                      
                                                                                                                                         
   2 - access("EMP_JOBS_MV"."SYS_NC00012$"=1)                                                                                            

That will scale and keep queries down to a few buffer gets even for much more data – as long as there are few Kings and presidents 😉

The alternative solution puts the predicates inside the mview, so the mview only contains the desired data and nothing more. The mview becomes less generally usable, but for specialized cases it might be OK (in a way it is similar to creating indexes on expressions that are NULL for non-desired data to make a very small index containing only desired data).

create materialized view kings_presidents_mv
build immediate
refresh fast on commit
enable query rewrite
as
select e.first_name, e.last_name, j.job_title, l.location_id, l.city, 'HIST' record_type
     , j.rowid j_rowid, jh.rowid jh_rowid, e.rowid e_rowid, d.rowid d_rowid, l.rowid l_rowid
  from jobs j, job_history jh, employees e, departments d, locations l
 where jh.job_id = j.job_id
   and e.employee_id = jh.employee_id
   and d.department_id = e.department_id
   and l.location_id = d.location_id
   and j.job_title = 'President'
union all
select e.first_name, e.last_name, j.job_title, l.location_id, l.city, 'CURR' record_type
     , j.rowid j_rowid, null jh_rowid, e.rowid e_rowid, d.rowid d_rowid, l.rowid l_rowid
  from jobs j, employees e, departments d, locations l
 where e.job_id = j.job_id
   and d.department_id = e.department_id
   and l.location_id = d.location_id
   and ( (e.first_name = 'Steven' and e.last_name = 'King')
         or j.job_title = 'President' );

alter table kings_presidents_mv modify
(
  first_name   not null,
  last_name    not null,
  job_title    not null,
  location_id  not null,
  city         not null,
  record_type  not null,
  j_rowid      not null,
  e_rowid      not null,
  d_rowid      not null,
  l_rowid      not null
);

create index kings_presidents_ix on kings_presidents_mv (
   location_id, city
);

begin
   dbms_stats.gather_schema_stats(
      'HR'
    , estimate_percent => null
    , cascade => true
   );
end;
/

And we can just select the entire mview index via query rewrite:

select /*+ gather_plan_statistics */
       distinct location_id, city
  from (
   select l.location_id, l.city
     from jobs j, job_history jh, employees e, departments d, locations l
    where jh.job_id = j.job_id
      and e.employee_id = jh.employee_id
      and d.department_id = e.department_id
      and l.location_id = d.location_id
      and j.job_title = 'President'
   union all
   select l.location_id, l.city
     from jobs j, employees e, departments d, locations l
    where e.job_id = j.job_id
      and d.department_id = e.department_id
      and l.location_id = d.location_id
      and ( (e.first_name = 'Steven' and e.last_name = 'King')
            or j.job_title = 'President' )
  );
------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name                | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                     |      1 |        |       |     2 (100)|          |      1 |00:00:00.01 |       1 |
|   1 |  SORT UNIQUE NOSORT|                     |      1 |      1 |    11 |     2  (50)| 00:00:01 |      1 |00:00:00.01 |       1 |
|   2 |   INDEX FULL SCAN  | KINGS_PRESIDENTS_IX |      1 |      1 |    11 |     1   (0)| 00:00:01 |      1 |00:00:00.01 |       1 |
------------------------------------------------------------------------------------------------------------------------------------

No predicates are used at all – just return the contents of the index and done.

Setting the mview columns to NOT NULL to match the nullability of the master tables is one way to make that query use the index; if the columns had been nullable, we would have gotten a full table scan of the mview instead, which would be 3 buffer gets on my 12.1 instance.

An alternative to setting the mview columns to NOT NULL would be to write the query like this:

select /*+ gather_plan_statistics */
       distinct location_id, city
  from (
   select l.location_id, l.city
     from jobs j, job_history jh, employees e, departments d, locations l
    where jh.job_id = j.job_id
      and e.employee_id = jh.employee_id
      and d.department_id = e.department_id
      and l.location_id = d.location_id
      and j.job_title = 'President'
   union all
   select l.location_id, l.city
     from jobs j, employees e, departments d, locations l
    where e.job_id = j.job_id
      and d.department_id = e.department_id
      and l.location_id = d.location_id
      and ( (e.first_name = 'Steven' and e.last_name = 'King')
            or j.job_title = 'President' )
  )
 where location_id is not null
   and city is not null;

Which also gets us an index full scan:

------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name                | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                     |      1 |        |       |     2 (100)|          |      1 |00:00:00.01 |       1 |
|   1 |  SORT UNIQUE NOSORT|                     |      1 |      1 |    11 |     2  (50)| 00:00:01 |      1 |00:00:00.01 |       1 |
|*  2 |   INDEX FULL SCAN  | KINGS_PRESIDENTS_IX |      1 |      1 |    11 |     1   (0)| 00:00:01 |      1 |00:00:00.01 |       1 |
------------------------------------------------------------------------------------------------------------------------------------
                                                                                                                                    
Predicate Information (identified by operation id):                                                                                 
---------------------------------------------------                                                                                 
                                                                                                                                    
   2 - filter(("KINGS_PRESIDENTS_MV"."LOCATION_ID" IS NOT NULL AND "KINGS_PRESIDENTS_MV"."CITY" IS NOT NULL))                       

But that requires evaluating a filter predicate 😉

Next NoCOUG meeting
August 20 at Chevron, San Ramon
