Execute Queries in Parallel

This document describes the use cases, methods, limitations, and common issues for the parallel query feature in SynxDB Cloud. Parallel query aims to improve query performance by utilizing multiple CPU cores to process a single query.

Use cases

You can deploy a small number of segments on a single physical machine and use dynamic parallelism adjustment as an alternative to deploying a large number of segments, thereby improving performance.

Enabling operator parallelism provides a performance advantage when the host CPU and disk loads are not high.

How to use

SynxDB Cloud supports parallel query on heap tables.

Parallel query on heap tables

Before enabling the parallel query feature, you need to disable the GPORCA optimizer.
```
SET enable_parallel = ON;
SET optimizer = OFF;
```

Set the maximum degree of parallelism.

-- This setting should take into account the number of CPU cores and segments.
SET max_parallel_workers_per_gather = 4;

Query example:

CREATE TABLE t1 (c1 int,c2 int, c3 int, c4 box);
INSERT INTO t1 SELECT x, 2*x, 3*x, box('6,6,6,6') FROM generate_series(1,1000000) AS x;
SELECT count(*) from t1;

Parallel execution for window function queries

SynxDB Cloud supports parallel processing for window functions, which can greatly improve query efficiency, especially when a PARTITION BY clause is used. It works by redistributing data based on the PARTITION BY columns, ensuring that all rows with the same partition key are processed by the same worker process.

For example, for the following query:

SELECT sum(salary) OVER w, rank() OVER w FROM empsalary 
WINDOW w AS (PARTITION BY depname ORDER BY salary DESC);

With parallel query enabled, the execution plan will show Parallel Seq Scan and Redistribute Motion, indicating that the underlying table scan and data distribution are performed in parallel.

Note

For window functions to be executed in parallel, make sure that your segment count is 3 or more.

Example of a non-parallel execution plan:

                  QUERY PLAN
----------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)
   ->  WindowAgg
         Partition By: depname
         Order By: salary
         ->  Sort
               Sort Key: depname, salary DESC
               ->  Seq Scan on empsalary

Example of a parallel execution plan:

                             QUERY PLAN
---------------------------------------------------------------------
 Gather Motion 12:1  (slice1; segments: 12)
   ->  WindowAgg
         Partition By: depname
         Order By: salary
         ->  Sort
               Sort Key: depname, salary DESC
               ->  Redistribute Motion 12:12  (slice2; segments: 12)
                     Hash Key: depname
                     ->  Parallel Seq Scan on empsalary

Even if a query does not include a PARTITION BY clause, the optimizer might still partially leverage parallelism by performing a parallel scan on the underlying tables.

Parameter description

Parameter name	Description	Default value	Required to set	Example
`enable_parallel`	Enables or disables the parallel feature.	`OFF`	Yes	`SET enable_parallel = ON;`
`optimizer`	Enables or disables the GPORCA optimizer.	`ON`	Yes	`SET optimizer = OFF;`

Frequently asked questions

Currently, parallel execution is supported for queries containing the following operators. SynxDB Cloud does not yet support queries with other operators.
```
sequence scan      
bitmap heap scan      
append
hash join      
nestloop join      
merge join
WindowAgg
```
Parallel query does not improve query performance in all situations. An excessively high degree of parallelism can cause excessive load, leading to a decrease in performance.
Enabling parallelism means a multiplied memory overhead, which may lead to “out of memory” errors.
Parallel processing is not currently supported for window functions that include a CASE WHEN expression.