v0.2.0 Release Notes

Release date: October 2025

Release version: v0.2.0

SynxDB Cloud v0.2.0 is an enterprise data warehouse with decoupled storage and compute, based on the Apache Cloudberry™ (Incubating) 2.0 kernel and deployed via containers.

This release introduces several new features and improvements across the platform:

  • Resource management: Integrates with the SynxML AI platform (experimental), enables warehouse creation via SQL for better automation, and allows suspending and resuming accounts from the console for cost control.

  • Configuration and monitoring: Centralizes administration by adding console-based management for regions, clusters, data sources (s3.conf), and database parameters (GUCs). It also lays the foundation for future observability by collecting service metrics.

  • Query processing: Enhances query performance with parallel execution for window functions and enables runtime filters by default for faster joins. It also introduces in-database AI capabilities (experimental), allowing machine learning tasks to be executed directly via SQL.

  • Storage enhancements: Improves data lifecycle management with support for Dynamic Tables and reduces storage footprint through new optimizations. Security is also strengthened with IAM role-based access for cloud storage.

  • Lakehouse integration: Provides advanced parameters for fine-tuning connections to HDFS, improving performance and reliability for data lake queries.

  • Metadata management: Improves scalability and availability with a redesigned metadata service that supports multiple coordinators, multi-node deployment, and persistence on FoundationDB.

New features

Resource management

  • Supports integration with SynxML clusters (experimental)

    You can now associate your database with a SynxML cluster, our enterprise-grade AI platform. This deep integration creates a unified environment for both data management and machine learning, breaking down silos between data and AI teams. Data scientists gain seamless access to data for model development, while data analysts can leverage powerful AI capabilities directly within their workflows.

  • Supports creating warehouses with SQL

    You can now create compute warehouses directly using the CREATE WAREHOUSE SQL command. This feature provides a programmatic way to provision resources, complementing the existing management console interface. It enables automation and integration with DevOps workflows, allowing you to manage your compute infrastructure as code for greater efficiency and reproducibility. See Create a warehouse. A sketch of the command appears after this list.

  • Supports suspending and resuming accounts from the console

    Administrators can now suspend and resume accounts directly from the O&M Platform. Suspending an account temporarily deactivates all its associated compute resources, providing an effective way to control costs without deleting the account or its data. This feature is particularly useful for managing non-production environments or for temporarily revoking access for security purposes. See Suspend and resume an account.
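
The example below is a minimal sketch of the SQL-based warehouse provisioning described above. The warehouse name is a placeholder, and sizing or placement options are omitted because they are deployment-specific; see Create a warehouse for the exact CREATE WAREHOUSE syntax and supported clauses.

    -- Minimal sketch: provision a compute warehouse from a SQL session.
    -- The name "etl_wh" is a placeholder; size and placement options are
    -- omitted here and are documented in "Create a warehouse".
    CREATE WAREHOUSE etl_wh;

    -- Because provisioning is plain SQL, the statement can be checked into
    -- version control and run from CI/CD pipelines like any other DDL.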

Configuration and monitoring

  • Enhances system observability with service metrics collection

    This release adds backend support for collecting key performance and health metrics from all core services, including UnionStore, catalog, warehouses, and coordinators. This is achieved through integration with Prometheus, establishing a foundation for comprehensive system monitoring. While a monitoring dashboard is not yet available in the console, this data collection is a crucial first step for future observability features, enabling easier troubleshooting and proactive system management.

  • Supports region and cluster management in the console

    The O&M Platform now provides a centralized interface for managing foundational infrastructure resources, including regions and clusters. Administrators can easily configure cloud provider details, associate storage locations, and manage the Kubernetes clusters that power the data platform. This simplifies the initial setup and ongoing administration of your deployment from a single control plane. See Manage regions and clusters.

  • Simplifies data source configuration in the console

    You can now configure connections to external S3-compatible object storage directly through the management console. This feature provides a user-friendly interface for managing the s3.conf file, eliminating the need for manual, command-line edits on cluster nodes. This centralized approach simplifies data lake integration, reduces configuration errors, and accelerates the process of making external data sources available for querying. See Configure an Iceberg OSS connection.

  • Centralizes database configuration management

    The management console now includes a powerful interface for managing database configuration parameters (GUCs). This feature eliminates the need for manual file editing and provides a user-friendly way to view, modify, and validate settings. Administrators can apply configurations at different scopes—account, coordinator, or warehouse—for granular control, simplifying performance tuning and ensuring consistent system behavior. See Manage GUC configurations.

  • Supports LDAP authentication for database and console users

    This release adds support for LDAP-based authentication for both database users and O&M Platform users. This allows for centralized user management and integrates with existing enterprise directory services, enhancing security and simplifying administration. A new UI for LDAP configuration is also provided in the console. See Configure LDAP authentication.

  • Supports skipping Kubernetes API server certificate verification

    Supports bypassing the certificate verification of the Kubernetes API server. This is particularly useful in development or testing environments that use self-signed certificates, simplifying deployment and debugging.

  • Enhances troubleshooting and debugging capabilities

    This release introduces enhanced support for diagnosing and debugging issues. New tools and improved logging capabilities help identify root causes of failures more efficiently, reducing downtime and improving system reliability for complex customer scenarios.

Query processing

  • Supports in-database AI with SQL (experimental)

    A powerful new extension allows you to invoke a wide range of machine learning and AI functions directly through SQL. You can perform tasks like model training and prediction, interact with large language models (LLMs), generate text embeddings, and conduct semantic searches without moving data out of the database. This empowers data analysts and developers to build AI-driven applications and derive insights more efficiently using the language of data they already know. See SynxML SQL.

  • Supports parallel execution for window functions

    This release introduces parallel execution for window functions, an important enhancement over standard PostgreSQL. In our cloud-native architecture, window function computations can now be processed in parallel across multiple nodes. This greatly speeds up analytical queries that rely heavily on window functions, such as ranking and moving averages, and can reduce query execution time by over 50% for certain complex queries. See Parallel execution for window function queries. An example query appears after this list.

  • Enables Runtime Filter Pushdown by default for faster joins

    The Runtime Filter Pushdown feature (gp_enable_runtime_filter_pushdown) is now enabled by default. This optimization can greatly improve join query performance, particularly for partitioned tables. When this feature is enabled, the executor builds a bloom filter from a hash join’s inner table and pushes it down to the outer table’s scan node. This technique filters out tuples that do not meet join conditions early during the data scanning phase, thereby reducing data movement and subsequent computing overhead. See Runtime filter pushdown. A usage sketch follows this list.
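
The query below illustrates the kind of window-function workload that benefits from the parallel execution described above. The sales table and its columns are hypothetical, and no special syntax is required to obtain a parallel plan.

    -- Hypothetical table and columns: ranking and moving-average window
    -- functions like these can now be computed in parallel across nodes.
    SELECT
        region,
        order_date,
        amount,
        RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS amount_rank,
        AVG(amount) OVER (PARTITION BY region
                          ORDER BY order_date
                          ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS moving_avg_7d
    FROM sales
    ORDER BY region, order_date;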
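
Runtime filter behavior is controlled by the gp_enable_runtime_filter_pushdown parameter named above. The following sketch uses standard SHOW/SET semantics to inspect the new default and to opt out for a single session if a specific workload regresses.

    -- gp_enable_runtime_filter_pushdown is on by default in this release.
    SHOW gp_enable_runtime_filter_pushdown;

    -- Disable the optimization for the current session only, if needed:
    SET gp_enable_runtime_filter_pushdown = off;

    -- Return to the server default:
    RESET gp_enable_runtime_filter_pushdown;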

Storage enhancements

  • Supports dynamic tables

    Supports dynamic tables, a new type of database object similar to materialized views that can automatically refresh data from various sources based on a schedule. This feature accelerates queries, especially in lakehouse setups, and automates data pipelines, reducing the need for manual data updates. See Dynamic tables. An illustrative definition appears after this list.

  • Incorporates storage optimization

    This release introduces significant storage optimizations to reduce disk space usage. Support for LZ4 compression is now available for table columns, offering another option that balances high compression speed with a good compression ratio. In addition, the storage for variable-length column data is optimized by using delta encoding for offsets, which can cut disk space usage by more than half for certain datasets, helping to lower storage costs. A table definition sketch using LZ4 appears after this list.

  • Adjusts the default gpfdist compression level

    The default compression level of the gpfdist tool has been adjusted from 1 to 3. The new default value aims to achieve a better balance between CPU overhead and network traffic, thereby improving the performance of most network-intensive ETL tasks. See Compression settings.

  • Enhances security for cloud storage access with IAM role support

    You can now access cloud storage such as AWS S3 by assuming an IAM role instead of using static access keys. By specifying a roleArn in the user mapping, the system dynamically acquires temporary credentials to access data, removing the need to store long-lived access keys in the database. This greatly enhances security by minimizing the risk of credential leakage and simplifies credential management. See Access cloud storage with IAM role. An illustrative user mapping appears after this list.
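
The sketch below illustrates the dynamic tables feature. It assumes a CREATE DYNAMIC TABLE statement with a cron-style SCHEDULE clause, in line with the Apache Cloudberry kernel; the table definition and schedule are placeholders, and the exact clauses are described in Dynamic tables.

    -- Illustrative only: a dynamic table that refreshes its result set on a
    -- schedule. Confirm the exact SCHEDULE syntax in the Dynamic tables docs.
    CREATE DYNAMIC TABLE daily_sales_summary
        SCHEDULE '0 2 * * *'          -- refresh every day at 02:00
        AS
        SELECT region, SUM(amount) AS total_amount
        FROM sales
        GROUP BY region;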
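
For the LZ4 column compression noted in the storage optimizations item, the following sketch uses the conventional Greenplum/Cloudberry table options (the same COMPRESSTYPE=lz4 option referenced in the change list below). The table itself is hypothetical, and the option names applicable to SynxDB Cloud's own table format may differ.

    -- Sketch: an append-optimized, column-oriented table with LZ4 compression.
    -- Table and column names are illustrative.
    CREATE TABLE clickstream (
        event_time  timestamp,
        user_id     bigint,
        url         text
    )
    WITH (appendoptimized = true, orientation = column, compresstype = lz4)
    DISTRIBUTED BY (user_id);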
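
The IAM role support above is configured through a user mapping, matching the roleArn option mentioned in the change list. In the sketch below, the foreign server name and ARN are placeholders, and the full option set (and exact option spelling) is documented in Access cloud storage with IAM role.

    -- Sketch: map the current user to an existing S3 foreign server using an
    -- IAM role instead of static access keys. "s3_lake" and the ARN are
    -- placeholders; see the linked documentation for the complete option set.
    CREATE USER MAPPING FOR CURRENT_USER
        SERVER s3_lake
        OPTIONS (roleArn 'arn:aws:iam::123456789012:role/synxdb-s3-reader');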

Lakehouse integration

  • Adds advanced HDFS connection tuning parameters

    This release supports new GUC parameters for fine-tuning connections to HDFS data sources. These settings provide advanced control over behaviors like load balancing across multiple HDFS routers and other connection-level optimizations. This allows administrators to improve the performance and reliability of queries in data lake environments. See HDFS and OSS-related configuration parameters.

Metadata management

  • Supports multiple coordinators

    Allows multiple coordinator nodes in a compute cluster to concurrently read from and write to the metadata service.

  • Supports multi-node metadata service deployment

    Enhances service capability by allowing the metadata service to be deployed across multiple nodes. This version supports a single-writer, multiple-reader model.

  • Supports FoundationDB-backed metadata persistence

    Persists metadata to FoundationDB, leveraging its high availability and scalability to support ultra-large-scale compute clusters.

Product change information

  • The default value of the gp_enable_runtime_filter_pushdown parameter has been changed from off to on.

  • The environment script greenplum_path.sh has been officially renamed to cloudberry-env.sh to be consistent with Apache Cloudberry™ (Incubating).

Bug fixes and other improvements

  • Ensure that the coordinator and catalog are deleted when an account is deleted.

  • Fix an issue where the horizontal scaling of a coordinator cannot be modified when using FoundationDB (FDB) as metadata storage.

  • Fix an issue where the alert rule expressions were identical for FoundationDB (FDB) and UnionStore when they had the same name in the same namespace.

  • Fix an issue that caused errors when refreshing secondary UI pages.

  • Fix a bug that incorrectly set the number of coordinator replicas to 0 when updating catalog specifications.

  • Add a check to prevent the deletion of a profile if it is referenced by a warehouse.

  • Add a check to prevent the deletion of a version if it is referenced by a warehouse or a FoundationDB (FDB) sidecar.

  • Fix an issue where selecting a warehouse in the SQL Editor fails.

  • Set the pod timezone automatically based on the Kubernetes cluster configuration.

  • Fix an issue that caused login page errors when accessed under the same domain.

  • Fix an issue that caused errors when opening the SQL Editor in multiple browser tabs.

  • Fix a bug where the authority method was missing for console principals.

  • Correct the account validation logic used when updating a warehouse.

  • Resolve issues with internationalization (i18n) support for the integration module.

  • Fix a bug in the LDAP health check mechanism.

  • Address an issue that prevented access to the H2 console.

  • Ensure resource names are now correctly validated against RFC 1123 standards.

  • Fix a bug that incorrectly allowed the coordinator count to be set when the metadata type is FDB.

  • Address issues within the UpdateWarehouse gRPC interface.

  • Fix a bug where a mandatory value was missing in requests to update a catalog.

  • Resolve performance and display issues in the organization tree.

  • Add LDAP authentication support for operations (Ops) users.

  • Add LDAP authentication support for database (DB) users.

  • Add a new frontend UI for LDAP configuration.

  • Add support for configuring Iceberg Object Storage (OSS).

  • Enable Prometheus metrics scraping for DBaaS and add an activation condition for its ServiceMonitor.

  • Optimize various local scripts for better performance.

  • Fix failures encountered during the cherry-pick process.

  • Update the Gopher version to v4.0.20.

  • Optimize the execution of Datalake list operations.

  • Add a GUC (Grand Unified Configuration) option for HDFS in Gopher.

  • Update the cherry-pick-cloudberry.history file.

  • Fix an issue to ensure relation_acquire_sample_rows is used when implemented by the table access method.

  • Fix compilation issues when using the --disable-orca flag.

  • Fix the partition_append test case.

  • Implement the pg_get_expr() function for subpartition templates.

  • Fix an issue to preserve relid and cdbpolicy within the make_grouping_rel function.

  • Add support for LZ4 compression for table columns in Cloud.

  • Fix a hang issue caused by a single node process.

  • Fix an issue in dumptuples to ensure a quick exit after query execution is complete.

  • Set ColumnEncoding_Kind_DIRECT_DELTA as the default encoding for the offset stream.

  • Enable Cloud TOAST by default.

  • Remove the size check performed when altering a warehouse’s size.

  • Improve the performance and reliability of warehouse operations.

  • Remove the legacy script for managing dependencies.

  • Extend the deadline for the SQL forwarder check to prevent timeouts.

  • Add the ‘COMPRESSTYPE=lz4’ table option for cloud environments.

  • Change the CI rules for ICW (Integration Continuous Workflow) to trigger on_success.

  • Update the Gopher version to v4.0.22.

  • Set the default value for gopher_local_capacity_mb to 1024000.

  • Add support for roleArn in CREATE USER MAPPING for AWS environments.

  • Hard-code the base version number in the configuration.

  • Revert the version number to 2.0.0 in the configuration to align with the community edition.

  • Ensure grammar compatibility for the CREATE TABLESPACE command on remote storage.

  • Add support for CREATE, ALTER, and DROP MLCLUSTER commands.

  • Remove the configuration check for libseccomp and fix related autoheader issues.

  • Update the cherry-pick-lightning.history file.

  • Fix an issue where the max_worker_processes GUC parameter was not applied correctly.

  • Fix an issue to correctly report an error during CREATE STORAGE USER MAPPING when permissions are insufficient.

  • Modify an internal interface for improved functionality.

  • Fix an issue with the ExtendProtocol implementation in the proxy.

  • Rewrite a Greenplum system view to fix an underlying issue.

  • Adapt a system view to ensure compatibility after a cherry-pick.

  • Add Greenplum summary system views for better monitoring.

  • Add several gp_stat_progress_*_summary system views.

  • Fix the names of the pg_stat_all_tables and pg_stat_all_indexes views.

  • Implement a fallback to the standard execution path for partitioned tables in cloud environments.

  • Optimize the status maintenance process for materialized views on partitioned tables.

  • Optimize materialized view (MV) invalidation overhead by implementing reference counting.

  • Update the PACKAGE_VERSION in the configuration file.

  • Add support for signing JSON Web Tokens (JWT) within the database.

  • Add new tests for the proxy component.

  • Fix the regression test for the FoundationDB (FDB) catalog.

  • Adapt existing code to a new list interface.

  • Update the Gopher version to v4.0.21.

  • Add a GUC (Grand Unified Configuration) option for HDFS in Gopher.

  • Fix an issue that caused the proxy to crash in cloud environments.

  • Fix an issue with autovacuum fetching data files on the Query Dispatcher (QD) node.

  • Update the default value of gopher_local_capacity_mb to 1000GB.

  • Move the CREATE TPSERVER statement to its correct position in the grammar.

  • Recover a previous commit in the Arrow submodule.

  • Fix several test cases that were failing in cloud environments.

  • Fix newly introduced merge conflicts.

  • Add test cases for the Dynamic Tables feature in cloud environments.

  • Implement the Dynamic Tables feature.

  • Enable the query planner to use Materialized Views to answer queries on external tables.

  • Fix an issue to ensure manifest tuples are vacuumed in the order of the hot chain.

  • Fix the cluster management tool for configurations that do not use unionstore.

  • Add support for building with the unionstore component.

  • Fix the warehouse shrink/expand check by correctly handling invalid state messages.

  • Fix a compilation issue caused by an uninitialized variable.

  • Set the default value of gp_enable_runtime_filter_pushdown to true.

  • Update the cherry-pick history file.

  • Fix an incorrect column number in datalake_fdw.c.

  • Adapt the capacity of Cloud columns using a GUC parameter.

  • Fix a dangling pointer issue when mixing data from different ORCA caches.

  • Fix an issue where reading a text file in Datalake does not call closefile.

  • Update the cherry-pick history file.

  • Fix a double-free issue in alterResgroupCallback during the I/O limit cleanup process.

  • Fix a misuse of move semantics and unhide an Equals() method overload.

  • Configure ORCA to reject functions with prosupport during DXL translation.

  • Fix an issue by initializing FuncExpr.is_tablefunc to false.

  • Add the TPC-DS Query 04 test case to verify a bug in CTE sharing.

  • Fix a crash in EXPLAIN when showing append info for ShareInputScan nodes.

  • Align scan-related terminology to consistently use “Shared Scan” and “ShareInputScan”.

  • Enable support for hot-standby Disaster Recovery (DR) clusters.

  • Fix the handling of interconnect_address and the parallel worker check in a single-node setup.

  • Fix an issue by using the correct offset to access members of Serialized Snapshot Data.

  • Add support for directory tables in CDC (Change Data Capture).

  • Optimize the CI job cache for faster builds.

  • Add support for storing CDC replication slots in local storage.

  • Fix an issue where CDC stops working after dropping a tablespace.

  • Update the cherry-pick history file.

  • Adapt the codebase for a recent cherry-pick.

  • Adjust the planner_hook_wrapper interface.

  • Allow different strategies in ORCA to control the redistribution key below an aggregate node.

  • Add support in ORCA to push down partial aggregates below a join.

  • Add support in ORCA for creating plans in single-node mode.

  • Introduce hash window aggregation in ORCA when using the vectorized executor.

  • Correct the behavior of parallel window functions within a CASE WHEN statement.

  • Fix the row estimation logic for parallel subquery paths.

  • Fix an invalid relcache leak warning logged during autovacuum.

  • Prevent excessive sampling on Query Executor (QE) nodes by restricting ComputeExtStatisticsRows to the Query Dispatcher (QD) node.

  • Revert the commit that banned enums as distribution and partition keys.

  • Add an option to avoid generating additional EquivalenceClasses for RelabelType in cdb_pull_up_eclass.

  • Fix a “dispath” typo in the direct dispatch code.

  • Fix the getversion script in a submodule.

  • Fix a memory leak related to bitmaps in Cloud.

  • Enable the auto-cancellation of redundant CI pipelines.

  • Fix a bug where the configure command would hang if Cloud was not enabled.

  • Modify the plan diff for the vectorized result set.

  • Add macro documentation comments to AC_DEFINE for FTS and OpenSSL options.

  • Fix the fast analyze feature for Cloud tables and simplify the acquisition function selection.

  • Implement a fast ANALYZE for append-optimized tables.

  • Replace pip3 download with curl for fetching Python dependencies.

  • Fix a CI pipeline issue specific to Cloudberry.

  • Fix pg_dump for tables with hash partitioning on enum columns.

  • Fix an issue in Datalake where blocks were not being assigned correctly.

  • Update the cherry-pick-cloudberry.history file.

  • Fix failures in the isolation2 test suite.

  • Add a FIXME comment to note potential failures in other files.

  • Disable the offset number assertion in ginPostingListDecodeAllSegments().

  • Implement parallel processing for window functions.

  • Replace nested foreach loops with a backtrace in ORCA for optimization.

  • Fix several flaky test cases.

  • Fix issues in system_views_gp.in and the query_conflict test.

  • Generate Greenplum (gp_) views for corresponding PostgreSQL (pg_) system views.

  • Fix the hot_standby isolation2 and regression tests.

  • Ensure query conflict handling on a standby node works as expected.

  • Enable the upstream hot standby test suite.

  • Add support for repeatable-read distributed transaction (DTX) isolation on hot standby nodes.

  • Add support for read-committed distributed transaction (DTX) isolation on hot standby nodes.

  • Add the XLOG_LATESTCOMPLETED_GXID flag.

  • Refactor the restore point pausing logic for continuous archive recovery.

  • Re-revert the change to include the distributed transaction ID (XID) in transaction commit WAL records.

  • Enable query dispatching on hot standby nodes.

  • Disable the iceberg_s3 and gphdfs_read_partition_table regression tests in Datalake.

  • Prevent HiveAutoSync from restarting when it receives a SIGHUP signal.

  • Update the cherry-pick-cloudberry.history file.

  • Revert the change that used pip3 download for Python packages.

  • Rename greenplum_path.sh to cloudberry-env.sh.

  • Fix compilation issues in Cloud when vectorization is enabled.

  • Remove unused variables in Cloud.

  • Remove a macro guard from a struct definition in Cloud.

  • Fix the gprecoverseg utility by removing incorrect backported code.

  • Fix a dead link in the README file.

  • Set the default value of gp_appendonly_insert_files to 0.

  • Add a compatibility check for libevent 2.0+ to the configure script for gpfdist.

  • Use event_base with libevent 2.0+ to avoid the thread-unsafe event_init function in gpfdist.

  • Fix a status reporting issue in the CI’s RAT (Release Audit Tool) check.

  • Add an Apache RAT (Release Audit Tool) audit workflow to the CI pipeline for license checks.

  • Add and clean up license headers and files across the codebase.

  • Print a stack trace when a writer gang process is lost.

  • Use pip3 download for fetching Python packages during the release process.

  • Add support for AO/AOCS tables in pg_dump.

  • Increase the socket buffer size in gpfdist when compression is enabled.

  • Set the default compression level in gpfdist to 3.

  • Use const references in Cloud to avoid unnecessary shared_ptr constructor/destructor calls.

  • Wrap the bms_* (BitmapSet) functions in C++ within the ORCA optimizer.

  • Remove support for QuickLZ compression.

  • Inline several basic wrapper functions in ORCA for performance.

  • Replace the cpp-stub submodule with local sources in Cloud.

  • Add the cloudberry-env.sh file and rename all instances of “greenplum_path”.

  • Fix the database version check within the pg_upgrade process.

  • Replace std::shared_ptr<File> with std::unique_ptr<File> for better ownership semantics.

  • Configure Datalake to read normal table blocks sequentially.

  • Optimize the writing process for Parquet files in Datalake.

  • Disable a flaky test in the Perfmon suite.

  • Fix an invalid cast from a GArrow scalar type to GArrowNullScalar.

  • Suppress “maybe-uninitialized” warnings when compiling Cloud.

  • Add support for the Arrow plan merger with SubqueryScan nodes.

  • Introduce a FALLBACK_LOG macro to replace direct elog(DEBUG2, "Fallback to ...") calls.

  • Fix an un-reference error in the vectorization engine.

  • Print the Arrow plan in addition to the standard plan when debug_print_plan is enabled.

  • Fix a port bug in gpperfmon by using sizeof instead of a hardcoded value.

  • Format the code style of the random_segment file in Datalake.

  • Fix an issue where reading an archived text file in Datalake does not call closefile.

  • Fix a coredump in datalakeExecSegment in a coordinator-only configuration.

  • Fix an issue with text encodings that are valid in the database but not supported by ICU.

  • Update the Makefile for the Datalake component.

  • Resolve library conflicts between libjansson and json-c in Datalake.

  • Rename a Datalake function to avoid naming conflicts.

  • Fix a bug in the parser for zipped text format in Datalake.

  • Fix an issue to ensure manifest tuples are vacuumed in the order of the hot chain.

  • Add the GUC (Grand Unified Configuration) parameter cloud.pax_max_tuples_per_group.

  • Change the file path for a directory table’s directory to use the relation ID (relid) instead of the relfilenode.