RAC local parallel query internals
RAC tuning tips | January 10, 2015
One of the greatest benefits of running parallel SQL statements in an Oracle RAC environment is the ability to distribute the parallel slaves over multiple nodes. However, doing so incurs more overhead than processing all of the parallel slaves on the local node. Sometimes a SQL statement executes more quickly with multiple nodes at work, and sometimes it executes more quickly on a single node, avoiding that extra overhead. Every SQL statement is different, as is the system configuration it runs on, so parallel SQL statements should be tested to determine whether single-node or multi-node execution works best in the specific environment. You should also consider the impact of multiple parallel processes all running on the same node, where they compete with other processes for the same resources.
The parallel_force_local initialization parameter controls whether parallel SQL in an Oracle RAC environment can be distributed over nodes other than the local, originating host. The default value for this parameter is FALSE, and it is a best practice to leave it at the default. If this parameter is set to TRUE, all parallel operations run only on the local node. If this parameter does need to be set, it is best to do so at the session level with the following command.
alter session set parallel_force_local=true;
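To verify where the slaves actually land, one option (a sketch, assuming you can query the GV$ views from a second session while the parallel statement is running) is gv$px_session:
-- Sketch: count the PX slaves on each instance for each query
-- coordinator. server_group is null for the QC row itself, so it is
-- filtered out; qcinst_id is the coordinator's instance.
select inst_id,
       qcinst_id,
       count(*) as px_slaves
from   gv$px_session
where  server_group is not null
group  by inst_id, qcinst_id
order  by inst_id;
With parallel_force_local set to TRUE, every row should report the same inst_id as the query coordinator.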
To illustrate whether forcing the parallel slaves to a single node is beneficial, consider the following query.
select /*+ parallel */
       o.order_id,
       max(d.order_date) as max_date,
       count(*) as num_details
  from orders o
  join order_details d
    on o.order_id = d.order_id
 group by o.order_id;
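The statement was traced both when running across the cluster and when all of the slaves were forced local. The exact tracing commands are not shown here; a minimal sketch using DBMS_MONITOR to capture wait events for the current session, followed by tkprof, might look like this:
-- Sketch: enable wait-event tracing for the current session,
-- run the parallel statement, then turn tracing back off.
exec dbms_monitor.session_trace_enable(waits => true, binds => false);
-- ... run the parallel query here ...
exec dbms_monitor.session_trace_disable;
-- Then format the trace file from the diagnostic trace directory:
-- tkprof <tracefile>.trc <report>.txt waits=yes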
The degree of parallelism (DOP) for this statement is 8. The statement completed in 35 seconds when executed across the cluster and in 20 seconds when forced to run locally. The wait events from the cross-cluster execution, as captured by tkprof from the session trace file, are shown below.
Event waited on                             Times   Max. Wait  Total Waited
----------------------------------------   Waited  ----------  ------------
reliable message                                2        0.01          0.02
enq: KO - fast object checkpoint                2        0.00          0.00
KJC: Wait for msg sends to complete             4        0.00          0.00
PX Deq: reap credit                           620        0.01          0.02
PX Deq: Join ACK                               12        0.00          0.01
IPC send completion sync                        8        0.00          0.00
PX Deq: Parse Reply                             8       12.35         12.35
resmgr:cpu quantum                              2        0.01          0.01
SQL*Net message to client                    1315        0.00          0.00
PX Deq: Execute Reply                         250        1.13         19.77
SQL*Net message from client                  1314        0.02          2.40
PX Deq: Signal ACK EXT                          8        0.00          0.00
PX Deq: Slave Session Stats                     8        0.00          0.00
enq: PS - contention                            4        0.00          0.00
When executed entirely within a single node, the wait events look like the following.
Event waited on                             Times   Max. Wait  Total Waited
----------------------------------------   Waited  ----------  ------------
reliable message                                1        0.00          0.00
enq: KO - fast object checkpoint                1        0.00          0.00
KJC: Wait for msg sends to complete             4        0.00          0.00
PX Deq: Join ACK                                4        0.00          0.00
enq: PS - contention                            1        0.00          0.00
PX Deq: Parse Reply                             4        0.00          0.00
SQL*Net message to client                    1315        0.00          0.00
PX Deq: Execute Reply                         115        2.52         20.57
resmgr:cpu quantum                              1        0.00          0.00
SQL*Net message from client                  1314        0.01          2.93
PX Deq: Signal ACK EXT                          4        0.00          0.00
PX Deq: Slave Session Stats                     4        0.00          0.00
Notice that the PX Deq: Execute Reply wait events have very similar timings; this event is essentially the time the query coordinator (QC) waits for the slave processes to finish their work. The PX Deq: Parse Reply wait event, however, has a total time of effectively zero when executed locally and a total wait time of 12.35 seconds when executed across the nodes, which accounts for almost all of the 15-second difference in runtimes. This wait event occurs while the QC waits for the parallel slaves to receive and parse their work from the QC. When the processes are all local, the wait time is hardly noticeable, and it makes sense that when some slave processes are on another node, this operation takes additional time to complete.
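To see how the work was actually spread across instances, one option (a sketch; v$pq_tqstat is populated only in the session that ran the statement, and only for its most recent parallel statement) is:
-- Sketch: rows and bytes handled by each PX process, along with
-- the instance each process ran on.
select dfo_number, tq_id, server_type, process, instance, num_rows, bytes
from   v$pq_tqstat
order  by dfo_number, tq_id, server_type, process;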
If the local node has plenty of CPU resources and I/O bandwidth available, you may want the parallel SQL statement to run locally; doing so is no different than running a parallel SQL statement on a single-instance database. If the statement needs the scale-out capability of multiple nodes, the overhead of distributing the parallel slaves can be outweighed by the quicker completion time of each parallel slave. In the end, the only way to know whether running the parallel SQL statement locally or across the cluster is faster is to run the statement both ways.
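A simple way to run that comparison from a single session (a sketch, using the SQL*Plus timing facility) is:
-- Sketch: time the statement with the slaves forced local, then
-- again with distribution across the cluster, and compare.
set timing on
alter session set parallel_force_local = true;
-- ... run the parallel query above; note the elapsed time ...
alter session set parallel_force_local = false;
-- ... run it again and compare ...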