The first step in any tuning project should be obvious, but it is often overlooked: you need to clearly understand the problem as well as the desired result. More specifically, the problem and desired result need to be defined from the perspective of an end user and measured in wall time. Anything else leads to what is commonly known as CTD: compulsive tuning disorder.
"It takes 45 seconds from the time I click this button to when I see the report. I need it to return in 5 seconds or less."
That is a perfect problem and requirements statement.
"Query x runs for 45 seconds. Please make it faster."
That leads to disaster. If I get the query to return in 42 seconds, is that a success? Probably not. If I get the query to return in 4 seconds, do I spend time looking into different options that might get it to return in 3.5 seconds? And once I get there, do I keep going until all options are exhausted? (Hint: they never are!) This is the definition of CTD!
So after you have a good problem and requirements statement, you are ready for step 2: figuring out where the time is going. Back to our example: from the time the user clicks a button on a webpage, it takes 45 seconds. So what does that button do? For our example, we will say clicking the button:
- Client browser makes a POST request to an application server
- Application server processes the request and submits a query to the database
- Database processes query and returns results to application server
- Application server processes the results and returns the data to the client
- Client browser processes and displays the data
Note that this is a very simplistic representation and is missing what could be key steps (network hops between each layer, the application server might go back to the database several times to fetch more data, etc.). Also, in reality, each of these steps will typically have several sub-steps of its own. However, it is good enough for our simple example.
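If you can wrap each stage of a request like this, even crudely, you can see where the wall time goes. Here is a minimal timing-harness sketch; the stage functions are hypothetical stand-ins (not real application code), and the `sleep` calls just simulate work:

```python
import time

def timed(label, fn, timings):
    """Run fn, record its wall time under label, and return its result."""
    start = time.perf_counter()
    result = fn()
    timings[label] = time.perf_counter() - start
    return result

# Hypothetical stages standing in for the real request flow.
def post_to_app_server():
    time.sleep(0.01)   # simulated app-server round trip

def run_database_query():
    time.sleep(0.04)   # simulated database time

def process_results():
    time.sleep(0.01)   # simulated result processing

timings = {}
timed("app server round trip", post_to_app_server, timings)
timed("database query", run_database_query, timings)
timed("result processing", process_results, timings)

# Print the stages, biggest time consumer first.
for stage, seconds in sorted(timings.items(), key=lambda kv: -kv[1]):
    print(f"{stage:25s} {seconds:6.3f}s")
```

Real instrumentation is rarely this tidy, but even a rough version of this at each layer tells you which team needs to be in the room.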
If you are extremely lucky, your entire application stack will be well instrumented, and you can easily look at the timings for each of these steps. I have never been that lucky in a real-world case, so I am usually stuck piecing together what I do have into a good-enough representation. The fortunate thing is that Oracle is extremely well instrumented, which makes it a great place to start.
For the purposes of this exercise, we will say that we have no instrumentation on anything except Oracle. We trace our query and see that it takes 43 seconds to execute, which leaves 2 seconds for the other steps of the process. I now know that to meet the requirement (5-second response time), I need to get the query to return in 3 seconds (assuming the other 2 seconds do not change). If I get the query down to 4 seconds and would have to do something drastic to reduce it further, it would make more sense to start investigating where exactly in the stack the 2 non-Oracle seconds are going and see whether there is a better solution there.
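The arithmetic behind that budget is trivial, but it is worth making explicit, because it is the arithmetic that ends every "is this fast enough yet?" debate (numbers from the example above):

```python
total_observed = 45.0   # wall time the user sees (seconds)
query_observed = 43.0   # time Oracle reports for the query
target_total   = 5.0    # the requirement

# Time spent outside Oracle, assumed constant for now.
other_layers = total_observed - query_observed

# What the query must hit for the user to see 5 seconds.
query_budget = target_total - other_layers

print(f"non-Oracle time: {other_layers:.1f}s, query budget: {query_budget:.1f}s")
# -> non-Oracle time: 2.0s, query budget: 3.0s
```

Once the budget is written down, "done" is a number, not a feeling, and that is exactly what keeps CTD at bay.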
However, if Oracle reported that the query was completing in anything less than 40 seconds, I would know I needed to get other teams involved immediately to find where the performance issue actually lies. Otherwise, no matter how much I tune the query, I will never be able to meet the 5-second requirement.
The moral of the story is that when you are looking at a performance issue, you need to understand where your time is actually going. This also applies when looking only at the query: of the 43 seconds, where is Oracle spending its time? Is it wasted I/O? Is it contention for a resource that another process is using? You don’t really know where to start with the query until, again, you know exactly where your time is going. I could write a series of posts on this subject alone, but I would never be able to do justice to the work that Cary Millsap has already done. So I will just urge you to read his work on the topic. A few must-reads are:
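As a toy illustration of that within-query breakdown, here is a sketch that aggregates time by component, the way a profile of an Oracle trace groups time by CPU and wait event. The event names are real Oracle wait events, but the samples and timings are entirely made up for the example:

```python
from collections import defaultdict

# Hypothetical (component, seconds) samples, as might come from a trace profile.
trace_samples = [
    ("db file sequential read", 18.2),
    ("CPU", 12.4),
    ("db file sequential read", 7.9),
    ("enq: TX - row lock contention", 3.6),
    ("CPU", 0.9),
]

# Sum the time attributed to each component.
by_component = defaultdict(float)
for component, seconds in trace_samples:
    by_component[component] += seconds

total = sum(by_component.values())

# Print a profile: biggest contributor first, with percentage of total.
for component, seconds in sorted(by_component.items(), key=lambda kv: -kv[1]):
    print(f"{component:32s} {seconds:6.1f}s  {100 * seconds / total:5.1f}%")
```

A profile like this is the point: it tells you whether to chase I/O, CPU, or locking first, instead of guessing.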