Execute Queries in Parallel
This document describes the use cases, methods, limitations, and common issues for the parallel query feature in SynxDB Cloud. Parallel query aims to improve query performance by utilizing multiple CPU cores to process a single query.
Use cases
You can deploy a small number of segments on a single physical machine and use dynamic parallelism adjustment as an alternative to deploying a large number of segments, thereby improving performance.
Enabling operator parallelism provides a performance advantage when the host CPU and disk loads are not high.
How to use
SynxDB Cloud supports parallel query on heap tables.
Parallel query on heap tables
Before enabling the parallel query feature, you need to disable the GPORCA optimizer.
SET enable_parallel = ON; SET optimizer = OFF;
Set the maximum degree of parallelism.
-- This setting should take into account the number of CPU cores and segments. SET max_parallel_workers_per_gather = 4;
Query example:
CREATE TABLE t1 (c1 int,c2 int, c3 int, c4 box);
INSERT INTO t1 SELECT x, 2*x, 3*x, box('6,6,6,6') FROM generate_series(1,1000000) AS x;
SELECT count(*) from t1;
Parallel execution for window function queries
SynxDB Cloud supports parallel processing for window functions, which can greatly improve query efficiency, especially when a PARTITION BY clause is used. It works by redistributing data based on the PARTITION BY columns, ensuring that all rows with the same partition key are processed by the same worker process.
For example, for the following query:
SELECT sum(salary) OVER w, rank() OVER w FROM empsalary
WINDOW w AS (PARTITION BY depname ORDER BY salary DESC);
With parallel query enabled, the execution plan will show Parallel Seq Scan and Redistribute Motion, indicating that the underlying table scan and data distribution are performed in parallel.
Note
For window functions to be executed in parallel, make sure that your segment count is 3 or more.
Example of a non-parallel execution plan:
QUERY PLAN
----------------------------------------------
Gather Motion 3:1 (slice1; segments: 3)
-> WindowAgg
Partition By: depname
Order By: salary
-> Sort
Sort Key: depname, salary DESC
-> Seq Scan on empsalary
Example of a parallel execution plan:
QUERY PLAN
---------------------------------------------------------------------
Gather Motion 12:1 (slice1; segments: 12)
-> WindowAgg
Partition By: depname
Order By: salary
-> Sort
Sort Key: depname, salary DESC
-> Redistribute Motion 12:12 (slice2; segments: 12)
Hash Key: depname
-> Parallel Seq Scan on empsalary
Even if a query does not include a PARTITION BY clause, the optimizer might still partially leverage parallelism by performing a parallel scan on the underlying tables.
Parameter description
Parameter name |
Description |
Default value |
Required to set |
Example |
|---|---|---|---|---|
|
Enables or disables the parallel feature. |
|
Yes |
|
|
Enables or disables the GPORCA optimizer. |
|
Yes |
|
Frequently asked questions
Currently, parallel execution is supported for queries containing the following operators. SynxDB Cloud does not yet support queries with other operators.
sequence scan bitmap heap scan append hash join nestloop join merge join WindowAgg
Parallel query does not improve query performance in all situations. An excessively high degree of parallelism can cause excessive load, leading to a decrease in performance.
Enabling parallelism means a multiplied memory overhead, which may lead to “out of memory” errors.
Parallel processing is not currently supported for window functions that include a
CASE WHENexpression.