Table Storage Format

SynxDB Cloud uses the Cloud storage format by default.

The Cloud storage format is a database storage format that combines the benefits of row-based storage (NSM, N-ary Storage Model) and column-based storage (DSM, Decomposition Storage Model). It is designed to improve query performance, particularly cache efficiency. In OLAP scenarios, Cloud offers batch write performance similar to row-based storage and read performance comparable to column-based storage. The Cloud format adapts both to cloud environments built on object storage and to traditional deployments that store data in local physical files.

Compared to traditional storage formats, the Cloud format has the following features:

  • Data updates and deletions: The Cloud format uses a mark-and-delete approach for data updates and deletions. This effectively manages changes in physical files without immediately rewriting the entire data file.

  • Concurrency control and read-write isolation: The Cloud format uses Multi-Version Concurrency Control (MVCC) to achieve efficient concurrency control and read-write isolation. The control granularity reaches the level of individual data files, enhancing operation safety and efficiency.

  • Data encoding and compression: The Cloud format offers multiple data encoding methods (such as run-length encoding) and compression options (such as zstd and zlib), with various compression levels. These features help reduce storage space requirements while optimizing read performance.

  • Statistics: Data files contain detailed statistics that are used for quick filtering and query optimization, reducing unnecessary data scanning and speeding up query processing.

  • Vectorized engine: The Cloud format also supports a vectorized engine. It aims to further enhance data processing capabilities and query performance, especially in applications like data analysis and report generation.

Applicable scenarios

The hybrid storage capability of the Cloud format makes it suitable for complex OLAP applications that need to handle large amounts of data writes and frequent queries. Whether you are looking for a high-performance data analysis solution in a cloud infrastructure or dealing with large datasets in a traditional data center environment, Cloud can provide strong support.

Usage

Create a Cloud table

In SynxDB Cloud, the Cloud format is the default storage format for newly created tables. You do not need any special configuration.

The syntax for creating a table is the same as the standard SQL syntax:

-- t1 will automatically use the Cloud storage format
CREATE TABLE t1(a int, b int, c text);

If you specify another storage format in the CREATE TABLE statement using the USING clause (for example, heap or hybrid), the system will ignore the clause and automatically use the Cloud format. At the same time, the system will return a WARNING message indicating that the USING clause is kept only for grammar compatibility.

For example:

postgres=# CREATE TABLE my_test_heap (id int) USING heap;
WARNING:  table AM has beed auto defined, `using` clause is only for grammar compatible.
CREATE TABLE

When creating a table, you can also specify minimum and maximum value information for certain columns to speed up queries:

-- Use WITH(minmax_columns='b,c') to specify that columns b and c
-- should record min and max statistics.
-- This helps optimize queries involving these two columns,
-- because the system can quickly determine which data blocks might contain matching data.
CREATE TABLE p2(a INT, b INT, c INT) WITH(minmax_columns='b,c');

INSERT INTO p2 SELECT i, i * 2, i * 3 FROM generate_series(1, 10) i;

-- because column b has minmax statistics,
-- the system can quickly locate the data blocks that might contain the value, speeding up the query.
SELECT * FROM p2 WHERE b = 4;

-- Similarly, the minmax information on column b shows that no data block
-- can satisfy this condition (all generated values are positive),
-- so the query can return immediately and avoid unnecessary data scans.
SELECT * FROM p2 WHERE b < 0;
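The block-skipping behavior behind these queries can be sketched in a few lines of illustrative Python (a simplified model, not the actual SynxDB Cloud implementation; the file layout shown is hypothetical):

```python
# Illustrative sketch of min/max block skipping (not the actual
# SynxDB Cloud implementation): each data file carries per-column
# min/max statistics, and a file is scanned only if the predicate
# range overlaps the file's recorded range.

def may_contain(file_stats, column, lo, hi):
    """Return True if the file might hold rows with lo <= column <= hi."""
    col_min, col_max = file_stats[column]
    return not (hi < col_min or lo > col_max)

# Two hypothetical data files for table p2 (column b = i * 2, i = 1..10)
files = [
    {"b": (2, 10)},   # rows for i = 1..5
    {"b": (12, 20)},  # rows for i = 6..10
]

# WHERE b = 4: only the first file can match, the second is skipped
to_scan = [f for f in files if may_contain(f, "b", 4, 4)]

# WHERE b < 0: no file can match, so no data is scanned at all
empty = [f for f in files if may_contain(f, "b", float("-inf"), -1)]
```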

View the format of an existing table

To check whether a table is in Cloud format, you can use one of the following methods:

  • Use the psql command \d+:

    gpadmin=# \d+ t1
                                                Table "public.t1"
     Column |  Type   | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
    --------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
     a      | integer |           |          |         | plain    |             |              |
     b      | integer |           |          |         | plain    |             |              |
     c      | text    |           |          |         | extended |             |              |
    Access method: cloud
    
  • Query the system catalog tables pg_class and pg_am:

    SELECT relname, amname FROM pg_class, pg_am WHERE relam = pg_am.oid AND relname = 't1';
    
     relname | amname
    ---------+--------
     t1      | cloud
    (1 row)
    

Clustering support

The CLUSTER feature in SynxDB Cloud is a mechanism for physically sorting and optimizing table data. Its primary purpose is to improve query performance by reorganizing the physical storage order of data.

This process is particularly effective for tables with minmax_columns set. When the sorting columns specified in cluster_columns are also present in minmax_columns, their statistics are automatically recalculated after the data is rearranged. Because the data becomes more physically clustered, the newly generated min/max ranges are more compact than before. This allows the min/max-based “block skipping” to bypass more irrelevant data during queries, significantly improving efficiency.

Cloud tables support Z-order and lexical sorting, both of which are clustering methods configured through reloptions.

Trigger clustering

The CLUSTER feature for Cloud tables is triggered manually. Executing the CLUSTER table_name; command immediately performs a complete, resource-intensive re-sorting of the entire table. This is suitable for a one-time reorganization of existing data.

Clustering policies

Cloud tables support the following clustering policies:

  • reloptions-based Cluster: Cloud tables support CLUSTER operations based on cluster_columns specified in reloptions, offering two sorting methods:

    • Z-order Cluster: Encodes the values of multiple columns into a single byte array and then sorts this array.

      • Use Cases: Ideal for scenarios with multi-column query conditions (where the order of columns in the query is not fixed), significantly improving performance for multidimensional data queries.

      • Limitations: Not suitable for string columns with common prefixes. Additionally, during Z-order encoding, string types are padded if they are less than 8 bytes and truncated if they exceed 8 bytes, using only the first 8 bytes for encoding.

      • Syntax: When cluster_type is not specified, the Cloud format defaults to Z-order sorting as the default cluster policy:

        CREATE TABLE t2(c1 int, c2 float, c3 text) WITH (cluster_columns='c1,c2');
        INSERT INTO t2 SELECT i, i, i::text FROM generate_series(1, 100000) i;
        CLUSTER t2;
        DROP TABLE t2;
        

        In this example, the data in table t2 is sorted using Z-order based on columns c1 and c2. This sorting method significantly enhances performance for multidimensional query scenarios.

    • Lexical Cluster: Sorts the data sequentially based on the order of columns specified in cluster_columns.

      • Use Cases: Primarily used for sorting and query optimization on string columns with common prefixes. Because the data is stored in a fully physically ordered manner, this method can also greatly improve the performance of aggregation and sorting queries like GROUP BY and ORDER BY.

      • Cost Consideration: While the performance benefits are significant, lexical sorting has a higher computational cost and can become a bottleneck with very large datasets.

      • Syntax:

        CREATE TABLE t2(c1 int, c2 float, c3 text) WITH (cluster_columns='c1,c2', cluster_type='lexical');
        INSERT INTO t2 SELECT i, i, i::text FROM generate_series(1, 100000) i;
        CLUSTER t2;
        DROP TABLE t2;
        

        In this example, the data in table t2 is sorted using the Lexical method based on columns c1 and c2. This sorting method is suitable for columns with common prefixes.
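The Z-order encoding described above can be illustrated with a minimal bit-interleaving sketch (a simplified model using small integers; the real implementation works on fixed 8-byte encodings of each column):

```python
# Illustrative Z-order (Morton) encoding sketch, not the actual
# SynxDB Cloud implementation: the bits of each column value are
# interleaved so that rows close in both dimensions sort near
# each other in the combined key.

def zorder_key(x, y, bits=8):
    """Interleave the low `bits` bits of x and y into one integer."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)      # even bit positions: x
        key |= ((y >> i) & 1) << (2 * i + 1)  # odd bit positions: y
    return key

# Sorting by the interleaved key groups points that are close in
# BOTH columns, which is why Z-order helps multidimensional filters
# regardless of which column the query constrains.
points = [(7, 0), (0, 7), (0, 0), (1, 1)]
ordered = sorted(points, key=lambda p: zorder_key(*p))
```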

Clustering considerations

To ensure normal business operations, the background tasks for CLUSTER follow these concurrency rules:

  • CLUSTER operations can be performed concurrently with INSERT operations.

  • CLUSTER operations are executed serially with UPDATE and DELETE operations and have a lower priority to avoid blocking data updates.

When using ALTER TABLE to modify a table that already has cluster_columns set, note the following rules:

  • Modify sorting columns: If the modified cluster_columns are not a sequential subset of the original sorting columns, the current physical sort order will be invalidated. The system will schedule the next background CLUSTER task based on the new column settings.

  • Delete sorting columns: If the deleted column is the last one in cluster_columns, the current physical sort order remains valid. However, if all sorting columns or an intermediate column is deleted, the physical sort order will be invalidated.

To preserve the sorting effect, the system maintains the data automatically. For example, when more than 50% of a data block (SegManifest) is modified by UPDATE or DELETE operations, the system may trigger an automatic re-clustering of that block.

Bloom filter support

SynxDB Cloud’s Cloud tables support bloom filters, allowing users to generate and maintain bloom filter information at the column level. This feature helps to quickly filter data blocks and improve query performance, especially when using IN conditions with multiple values, significantly reducing unnecessary data scans.

Option description

You can specify the columns for which you want to record bloom filter information by setting the bloomfilter_columns option. For example:

CREATE TABLE p1 (
    a int,
    b int,
    c text
) WITH (bloomfilter_columns='b,c,a');

In this example, bloom filter information will be generated for columns b, c, and a of table p1.

The size of the bloom filter is controlled by two GUCs (Grand Unified Configuration parameters):

  • pax_max_tuples_per_file: Controls the maximum number of tuples stored in each data file.

  • pax_bloom_filter_work_memory_bytes: Controls the maximum memory usage allowed for the bloom filter.

Example:

-- Set the maximum number of tuples per file
SET pax_max_tuples_per_file = 131073;

-- Set the maximum working memory for the bloom filter
SET pax_bloom_filter_work_memory_bytes = 1048576;  -- 1MB

-- Create the table and specify the bloom filter option for the columns
CREATE TABLE p1 (
    a int,
    b int,
    c text
) WITH (bloomfilter_columns='b,c,a');

The size of the generated bloom filter is calculated as:

min(pax_max_tuples_per_file * 2, pax_bloom_filter_work_memory_bytes)

with the result rounded up to the next power of two, so the generated bloom filter size is always a power of 2.
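The sizing rule can be sketched as follows (an illustrative reading of the formula above, assuming the rounding means "up to the next power of two" as the text states; this is not the actual implementation):

```python
# Illustrative sketch of the bloom filter sizing rule described
# above: take the smaller of (tuples-per-file * 2) and the allowed
# working memory, then round up to the next power of two.

def next_power_of_two(n):
    """Smallest power of two that is >= n."""
    p = 1
    while p < n:
        p *= 2
    return p

def bloom_filter_size(max_tuples_per_file, work_memory_bytes):
    return next_power_of_two(min(max_tuples_per_file * 2,
                                 work_memory_bytes))

# With the GUC settings from the example above:
# min(131073 * 2, 1048576) = 262146, rounded up to 524288
size = bloom_filter_size(131073, 1048576)
```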

Expression support examples

The Cloud format supports basic conditional expressions. For example:

-- Create a test table
CREATE TABLE a(v1 int, v2 int, v3 int) WITH(minmax_columns='v1, v2, v3');

-- Basic conditions
SELECT * FROM a WHERE v1 <= 3;                                    -- Full condition support. Root is OpExpr.
SELECT * FROM a WHERE v1 IS NOT NULL;                             -- Full condition support. Root is NullTest.

-- Multiple conditions
SELECT * FROM a WHERE v1 <= 3 AND v2 >= 3;                        -- Full condition support. Tree structure has only one level.
SELECT * FROM a WHERE v1 <= 3 AND v2 >= 3 AND v3 >= 3;            -- Full condition support.
SELECT * FROM a WHERE v1 <= 3 AND v2 >= 3 AND v3 IS NOT NULL;     -- Full condition support.

The Cloud format supports nested expression structures and certain operators. For example:

-- Create a test table
CREATE TABLE a(v1 int, v2 int, v3 int) WITH(minmax_columns='v1, v2, v3');

-- Nested expressions
SELECT * FROM a WHERE (v1 <= 3 OR v2 > 3) AND v3 >= 10;         -- Full support for all conditions.
SELECT * FROM a WHERE (v1 <= 3 AND v2 > 3) OR v3 >= 10;         -- Full support for nested expressions.

-- Operators
SELECT * FROM a WHERE v1 + v2 <= 3;
SELECT * FROM a WHERE v1 + 10 <= 3;

The Cloud format can process complete expression trees, including nested AND/OR conditions. This means that all the above queries can be optimized, and SynxDB Cloud will use all available filter conditions to improve query efficiency.

Supported expression types

Sparse filtering in the Cloud format supports the following expression types:

  • Arithmetic Operators (OpExpr)

    • Supports addition (+), subtraction (-), and multiplication (*)

    • Division is not supported (due to difficulty in estimating ranges with negative numbers)

    • Examples: a + 1, 1 - a, a * b

  • Comparison Operators (OpExpr)

    • Supports <, <=, =, >=, >

    • Examples: a < 1, a + 1 < 10, a = b

  • Logical Operators (BoolExpr)

    • Supports AND, OR, NOT

    • Example: a < 10 AND b > 10

  • NULL Value Tests (NullTest)

    • Supports IS NULL, IS NOT NULL

    • Examples: a IS NULL, a IS NOT NULL

  • Type Casting (FuncExpr)

    • Only supports basic type casting

    • Example: a::float8 < 1.1

    • Custom functions are not supported, e.g., func(a) < 10

  • IN Operator (ScalarArrayOpExpr)

    • Supports IN expressions

    • Example: a IN (1, 2, 3)
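The range estimation underlying arithmetic-operator support can be sketched with simple interval arithmetic (an illustration, not the actual implementation), which also shows why division is excluded:

```python
# Illustrative interval arithmetic behind sparse filtering of
# arithmetic expressions (not the actual SynxDB implementation).
# Given per-file min/max stats for a column, the possible range of
# an expression like a + 1 or a * b can be bounded, and a file is
# skipped when that range cannot satisfy the comparison.

def add_range(r1, r2):
    return (r1[0] + r2[0], r1[1] + r2[1])

def mul_range(r1, r2):
    # With negative numbers, any corner product can be the extreme.
    products = [x * y for x in r1 for y in r2]
    return (min(products), max(products))

# a in [2, 10] within this file; predicate: a + 1 < 10
a_range = (2, 10)
expr_range = add_range(a_range, (1, 1))  # range of a + 1 is (3, 11)
file_may_match = expr_range[0] < 10      # min of a + 1 is 3, so scan

# Division is excluded because when the divisor range spans zero
# (e.g. b in [-1, 1]), a / b has no finite bounded range at all.
```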

Partial condition support

When a query contains unsupported expressions (like custom functions), the Cloud format employs a partial condition support strategy:

  • It identifies and extracts the supported conditions.

  • It uses the supported conditions for sparse filtering.

For example:

-- Create a custom function func
CREATE OR REPLACE FUNCTION func(v2 double precision)
RETURNS double precision AS $$
BEGIN
  RETURN v2 * 2;
END;
$$ LANGUAGE plpgsql;

-- Use the custom function in a query
SELECT * FROM a WHERE v1 < 3 AND func(v2) < 10;  -- Cloud will use v1 < 3 for filtering

In the example above, although func(v2) < 10 cannot be used for sparse filtering, SynxDB Cloud still uses v1 < 3 to optimize query performance. This approach ensures that partial performance optimization is achieved even in complex queries.
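The extraction step can be sketched as a small tree walk (a simplified illustration using tuples; the real planner works on PostgreSQL expression nodes):

```python
# Simplified sketch of partial-condition extraction (the real
# implementation walks PostgreSQL expression trees, not tuples).
# An AND node keeps whichever children are usable for sparse
# filtering; unsupported leaves (e.g. custom function calls) are
# simply dropped rather than failing the whole predicate.

SUPPORTED = {"opexpr", "nulltest", "scalararrayop"}

def extract_supported(node):
    kind = node[0]
    if kind == "and":
        kept = [c for c in map(extract_supported, node[1]) if c]
        return ("and", kept) if kept else None
    if kind == "or":
        # For OR, every branch must be supported, or the whole
        # disjunction is unusable for filtering.
        kids = [extract_supported(c) for c in node[1]]
        return ("or", kids) if all(kids) else None
    return node if kind in SUPPORTED else None

# WHERE v1 < 3 AND func(v2) < 10  ->  only v1 < 3 survives
pred = ("and", [("opexpr", "v1 < 3"), ("funcexpr", "func(v2) < 10")])
usable = extract_supported(pred)
```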

Note

  • The effectiveness of sparse filtering depends on the query conditions.

  • It is recommended to enable statistics on columns that are frequently used as filter conditions.

  • Certain conditions (like custom functions) will be ignored.

Limitations of Cloud tables

  • The Cloud format does not support Write-Ahead Logging (WAL), so data is not replicated between the primary and mirror servers.

SQL options for the Cloud format

The Cloud format supports SQL options to control its behavior. You can use these options in the WITH() clause, for example, WITH(minmax_columns='b,c', storage_format=porc).

  • storage_format (string): Valid values are porc and porc_vec. Controls the internal storage format. porc (the default) stores data in a normal format and skips null values; porc_vec stores data in a vector format, always saving nulls for fixed-length fields.

  • compresstype (string): Valid values are none, rle, zstd, zlib, and dict. Specifies the compression method for column values; only one can be selected, and the default is none.

    • none: No compression is applied.

    • rle: Uses run-length encoding for consecutive duplicate data.

    • zstd: A fast algorithm with a high compression ratio.

    • zlib: A general-purpose algorithm suitable for most data.

    • dict: Uses dictionary encoding for numerous duplicate strings (experimental, not for production use).

  • compresslevel (int): An integer from 0 to 19 (the default is 0). A lower value is faster, while a higher value provides a better compression ratio. It is only effective when used with compresstype.

  • minmax_columns (string): A comma-separated list of valid column names. Records min/max statistics for the specified columns to accelerate queries. Statistics are no longer recorded for a column after it is renamed. Changes made via ALTER TABLE apply only to new data files, not existing ones.

  • cluster_columns (string): A comma-separated list of valid column names. Hints the Cloud format to store data in a clustered manner. When you run the CLUSTER command, data is sorted by these columns; the sorting method is controlled by the cluster_type option.

  • bloomfilter_columns (string): A comma-separated list of valid column names. Computes a bloom filter for the data in the specified columns, which is used for data filtering.

  • parallel_workers (int): An integer in the range [0, 64]. A PostgreSQL option that specifies the number of parallel workers for a parallel scan.

  • cluster_type (string): Valid values are lexical and zorder. Specifies the clustering method when sorting is based on columns defined in reloptions. lexical sorts data based on the values and sequence of the specified columns; zorder encodes the values of multiple columns into a single byte array and then sorts the byte array.

These option values only affect newly inserted and updated data and do not change existing data.

Cloud storage configuration parameters (GUC)

In addition to the SQL options above, you can tune Cloud storage behavior through a set of cloud.* GUC parameters (such as cloud.pax_enable_sparse_filter, cloud.pax_bloom_filter_work_memory_bytes, and cloud.pax_max_tuples_per_group). You can set these parameters at the session level using the SET command. For the full list and detailed descriptions, see Configuration Parameters.

Aggregate query optimization

The Cloud format provides advanced aggregate query optimization capabilities that can significantly improve the performance of aggregate operations like MIN, MAX, COUNT, and SUM. This optimization leverages the summary statistics collected during data writes to avoid full table scans for aggregate queries.

Write policies

The Cloud format supports two write policies that control how data modifications are handled:

  • COW (Copy on Write): When data is modified, all data is rewritten to ensure summary statistics remain accurate. This policy provides the most accurate aggregate results but may have higher write overhead for small modifications.

  • MOR (Merge on Read): The default policy that handles data modifications more efficiently but may not maintain perfectly accurate summary statistics for aggregate optimization.
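The difference in statistics accuracy between the two policies can be shown with a toy sketch (a hypothetical in-memory layout, not the real Cloud data format):

```python
# Toy illustration of why the COW policy keeps summary statistics
# exact while MOR can let them go stale. The file layout here is
# hypothetical, not the real Cloud data format.

def delete_mor(data_file, row_index):
    """Merge on Read: mark the row deleted; stats stay untouched."""
    data_file["deleted"].add(row_index)

def delete_cow(data_file, row_index):
    """Copy on Write: rewrite the file without the row and recompute
    stats, so aggregates like MAX() can still be answered from them."""
    rows = [v for i, v in enumerate(data_file["rows"]) if i != row_index]
    return {"rows": rows, "deleted": set(),
            "stats": {"min": min(rows), "max": max(rows)}}

mor_file = {"rows": [5, 9, 42], "deleted": set(),
            "stats": {"min": 5, "max": 42}}
delete_mor(mor_file, 2)
# The stats still claim max = 42 even though that row is deleted,
# so an aggregate answered from stats alone could be wrong.

cow_file = delete_cow({"rows": [5, 9, 42], "deleted": set(),
                       "stats": {"min": 5, "max": 42}}, 2)
# Stats were recomputed: max is now 9 and matches the live data.
```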

To create a table with a specific write policy:

-- Create a table with COW write policy for optimal aggregate performance
CREATE TABLE sales_data (
    id int,
    amount decimal(10,2),
    date date
) WITH (
    write_policy='COW'
);

Aggregate query optimization methods

When using Cloud tables with appropriate write policies, the following aggregate queries can be optimized:

Simple aggregate queries without WHERE conditions:

-- These queries can be optimized to read from summary statistics
SELECT MAX(amount) FROM sales_data;
SELECT MIN(amount) FROM sales_data;

Aggregate queries with WHERE conditions:

-- Queries with filter conditions use min-max filtering for optimization
SELECT MAX(amount) FROM sales_data WHERE date >= '2024-01-01';
SELECT MIN(amount) FROM sales_data WHERE amount > 1000;

The optimization works by:

  1. Using min-max statistics to quickly identify data files that might contain relevant data

  2. For files that completely satisfy the WHERE conditions, reading aggregate results directly from summary statistics

  3. For files that partially satisfy conditions, performing selective scans only on relevant data
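The three steps above can be sketched as a per-file classification (illustrative only, not the actual executor; the file stats shown are hypothetical):

```python
# Illustrative sketch of the per-file aggregate strategy described
# above (not the actual SynxDB executor). Each file carries min/max
# stats for the filter column and a precomputed max for the target.

def file_strategy(file_stats, lo, hi):
    """Classify a file against the predicate lo <= col <= hi."""
    fmin, fmax = file_stats["col_range"]
    if fmax < lo or fmin > hi:
        return "skip"          # step 1: cannot contain matching rows
    if lo <= fmin and fmax <= hi:
        return "use_stats"     # step 2: fully covered, read summary only
    return "scan"              # step 3: partial overlap, scan the file

files = [
    {"col_range": (1, 4),  "max_amount": 90},
    {"col_range": (5, 9),  "max_amount": 70},
    {"col_range": (8, 20), "max_amount": 99},
]

# MAX(amount) WHERE col BETWEEN 5 AND 9:
# one file skipped, one answered from stats, one scanned
plan = [file_strategy(f, 5, 9) for f in files]
```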

Note

  • Aggregate optimization is most effective when using the COW write policy

  • The optimization currently supports MIN, MAX, COUNT, SUM, and AVG aggregate functions

  • The effectiveness depends on the accuracy of summary statistics maintained by the write policy

Best practices

  • Using Partitioning options:

    • It is recommended to use partitioning options when data is imported based on a specific integer column and the following conditions are met:

      • The data is evenly distributed over this column, with a wide range and no extreme concentrations.

      • The column is frequently used as a query filter condition or as a join key.

    • Note that the partition key for the Cloud format applies only within a single batch import; it does not reorganize data across multiple writes. The partition key setting takes effect only for future inserts and updates, so after you change the partition key, newly imported data is processed according to the new key while existing data remains unchanged.

  • Using minmax statistics:

    • For columns with a wide data range that are frequently used in query filters, setting minmax values can significantly speed up the query process.

    • By using minmax statistics, if a column in a data file does not match the minmax value or a null value test, the entire file can be quickly skipped, avoiding unnecessary data scans.

    • Important note: The effectiveness of minmax depends on how the data is inserted. If data in a Cloud table is inserted in batches (for example, using batch insert or copy), and the range of each batch is contiguous, minmax will be very effective. However, if data is inserted randomly, the effectiveness of minmax filtering might be reduced.

  • The Cloud format is well integrated with the vectorized execution engine. You can control whether a vectorized query plan is generated by setting session parameters, which helps improve query performance.

    The vectorized plan leverages the columnar storage format of Cloud tables, effectively utilizes memory, and accelerates query execution. By processing data in batches, Cloud tables can better support high-throughput queries.

    For example:

    SET vector.enable_vectorization = on;  -- Enable vectorized query plans.
    SET vector.max_batch_size = 16384;  -- Set the maximum number of rows per batch.