v0.2.0 Release Notes
Release date: October 2025
Release version: v0.2.0
SynxDB Cloud v0.2.0 is an enterprise data warehouse with decoupled storage and compute, based on the Apache Cloudberry™ (Incubating) 2.0 kernel and deployed via containers.
This release introduces several new features and improvements across the platform:
Resource management: Integrates with the SynxML AI platform (experimental), enables warehouse creation via SQL for better automation, and allows suspending and resuming accounts from the console for cost control.
Configuration and monitoring: Centralizes administration by adding console-based management for regions, clusters, data sources (s3.conf), and database parameters (GUCs). It also lays the foundation for future observability by collecting service metrics.
Query processing: Enhances query performance with parallel execution for window functions and enables runtime filters by default for faster joins. It also introduces in-database AI capabilities (experimental), allowing machine learning tasks to be executed directly via SQL.
Storage enhancements: Improves data lifecycle management with support for Dynamic Tables and reduces storage footprint through new optimizations. Security is also strengthened with IAM role-based access for cloud storage.
Lakehouse integration: Provides advanced parameters for fine-tuning connections to HDFS, improving performance and reliability for data lake queries.
Metadata management: Improves scalability and availability with a redesigned metadata service that supports multiple coordinators, multi-node deployment, and persistence on FoundationDB.
New features
Resource management
Supports integration with SynxML clusters (experimental)
You can now associate your database with a SynxML cluster, our enterprise-grade AI platform. This deep integration creates a unified environment for both data management and machine learning, breaking down silos between data and AI teams. Data scientists gain seamless access to data for model development, while data analysts can leverage powerful AI capabilities directly within their workflows.
Supports creating warehouses with SQL
You can now create compute warehouses directly using the CREATE WAREHOUSE SQL command. This feature provides a programmatic way to provision resources, complementing the existing management console interface. It enables automation and integration with DevOps workflows, allowing you to manage your compute infrastructure as code for greater efficiency and reproducibility. See Create a warehouse.
Supports suspending and resuming accounts from the console
Administrators can now suspend and resume accounts directly from the O&M Platform. Suspending an account temporarily deactivates all its associated compute resources, providing an effective way to control costs without deleting the account or its data. This feature is particularly useful for managing non-production environments or for temporarily revoking access for security purposes. See Suspend and resume an account.
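To illustrate the SQL-based warehouse provisioning introduced in this release: the CREATE WAREHOUSE statement name comes from these notes, but any options shown below are hypothetical placeholders, so consult Create a warehouse for the actual grammar.

```sql
-- Minimal form of the new provisioning command (statement name from this release).
CREATE WAREHOUSE etl_wh;

-- A sized variant for infrastructure-as-code scripts; the WITH option
-- is a hypothetical placeholder, not confirmed SynxDB syntax.
-- CREATE WAREHOUSE etl_wh WITH (size = 'medium');
```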
Configuration and monitoring
Enhances system observability with service metrics collection
This release adds backend support for collecting key performance and health metrics from all core services, including UnionStore, catalog, warehouses, and coordinators. This is achieved through integration with Prometheus, establishing a foundation for comprehensive system monitoring. While a monitoring dashboard is not yet available in the console, this data collection is a crucial first step toward future observability features, enabling easier troubleshooting and proactive system management.
Supports region and cluster management in the console
The O&M Platform now provides a centralized interface for managing foundational infrastructure resources, including regions and clusters. Administrators can easily configure cloud provider details, associate storage locations, and manage the Kubernetes clusters that power the data platform. This simplifies the initial setup and ongoing administration of your deployment from a single control plane. See Manage regions and clusters.
Simplifies data source configuration in the console
You can now configure connections to external S3-compatible object storage directly through the management console. This feature provides a user-friendly interface for managing the s3.conf file, eliminating the need for manual, command-line edits on cluster nodes. This centralized approach simplifies data lake integration, reduces configuration errors, and accelerates the process of making external data sources available for querying. See Configure an Iceberg OSS connection.
Centralizes database configuration management
The management console now includes a powerful interface for managing database configuration parameters (GUCs). This feature eliminates the need for manual file editing and provides a user-friendly way to view, modify, and validate settings. Administrators can apply configurations at different scopes—account, coordinator, or warehouse—for granular control, simplifying performance tuning and ensuring consistent system behavior. See Manage GUC configurations.
Supports LDAP authentication for database and console users
This release adds support for LDAP-based authentication for both database users and O&M Platform users. This allows for centralized user management and integrates with existing enterprise directory services, enhancing security and simplifying administration. A new UI for LDAP configuration is also provided in the console. See Configure LDAP authentication.
Supports skipping Kubernetes API server certificate verification
Supports bypassing the certificate verification of the Kubernetes API server. This is particularly useful in development or testing environments that use self-signed certificates, simplifying deployment and debugging.
Enhances troubleshooting and debugging capabilities
This release introduces enhanced support for diagnosing and debugging issues. New tools and improved logging capabilities help identify root causes of failures more efficiently, reducing downtime and improving system reliability for complex customer scenarios.
Query processing
Supports in-database AI with SQL (experimental)
A powerful new extension allows you to invoke a wide range of machine learning and AI functions directly through SQL. You can perform tasks like model training and prediction, interact with large language models (LLMs), generate text embeddings, and conduct semantic searches without moving data out of the database. This empowers data analysts and developers to build AI-driven applications and derive insights more efficiently using the language of data they already know. See SynxML SQL.
Supports parallel execution for window functions
This release supports parallel execution for window functions, an important enhancement over standard PostgreSQL. In our cloud-native architecture, window function computations can now be processed in parallel across multiple nodes. This greatly speeds up analytical queries that rely heavily on window functions, such as ranking and moving averages, and can reduce query execution time by over 50% for certain complex queries. See Parallel execution for window function queries.
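For context, the queries that benefit most are the ranking and moving-average patterns mentioned above; the table and columns below are hypothetical.

```sql
-- Window-heavy analytics of the kind that can now run in parallel:
-- a per-region rank plus a 7-row moving average over a sales table.
SELECT region,
       sale_date,
       amount,
       RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS amount_rank,
       AVG(amount) OVER (PARTITION BY region
                         ORDER BY sale_date
                         ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS moving_avg
FROM sales;
```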
Enables Runtime Filter Pushdown by default for faster joins
The Runtime Filter Pushdown feature (gp_enable_runtime_filter_pushdown) is now enabled by default. This optimization can greatly improve join query performance, particularly for partitioned tables. When this feature is enabled, the executor builds a Bloom filter from a hash join’s inner table and pushes it down to the outer table’s scan node. This technique filters out tuples that do not meet join conditions early during the data scanning phase, thereby reducing data movement and subsequent computing overhead. See Runtime filter pushdown.
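Because this is a standard GUC, it can be inspected or toggled per session with the usual PostgreSQL commands; a minimal sketch:

```sql
-- Confirm the new default (on as of v0.2.0).
SHOW gp_enable_runtime_filter_pushdown;

-- Temporarily disable the optimization for the current session,
-- e.g. when comparing plans for a specific join.
SET gp_enable_runtime_filter_pushdown = off;
```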
Storage enhancements
Supports dynamic tables
Supports dynamic tables, a new type of database object similar to materialized views that can automatically refresh data from various sources based on a schedule. This feature accelerates queries, especially in lakehouse setups, and automates data pipelines, reducing the need for manual data updates. See Dynamic tables.
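As a rough sketch of the idea: the DDL below is illustrative only, assuming a CREATE DYNAMIC TABLE grammar with a refresh schedule; the actual SynxDB syntax may differ, so follow the Dynamic tables documentation.

```sql
-- Illustrative only: the statement shape and SCHEDULE clause are
-- assumptions, not confirmed SynxDB grammar.
CREATE DYNAMIC TABLE recent_orders
    SCHEDULE '5 minutes'          -- hypothetical refresh interval
AS
SELECT order_id, status, updated_at
FROM lake.orders                  -- e.g. an external lakehouse source
WHERE updated_at > now() - interval '1 day';
```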
Incorporates storage optimization
This release introduces significant storage optimizations to reduce disk space usage. Support for LZ4 compression is now available for table columns, offering an option that balances high compression speed with a good compression ratio. In addition, the storage for variable-length column data is optimized by using delta encoding for offsets, which can cut disk space usage by more than half for certain datasets, helping to lower storage costs.
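For example, LZ4 can be requested per column with the COMPRESSTYPE=lz4 option noted in this release; the column-encoding clause below follows the usual Greenplum/Cloudberry append-optimized, column-oriented convention, and the table itself is hypothetical.

```sql
-- Hypothetical table using the new LZ4 column compression.
CREATE TABLE events (
    id      bigint,
    payload text ENCODING (compresstype=lz4)
)
WITH (appendoptimized=true, orientation=column)
DISTRIBUTED BY (id);
```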
Adjusts the default gpfdist compression level
The default compression level of the gpfdist tool has been adjusted from 1 to 3. The new default value aims to achieve a better balance between CPU overhead and network traffic, thereby improving the performance of most network-intensive ETL tasks. See Compression settings.
Enhances security for cloud storage access with IAM role support
You can now access cloud storage such as AWS S3 by assuming an IAM role instead of using static access keys. By specifying a roleArn in the user mapping, the system dynamically acquires temporary credentials to access data, removing the need to store long-lived access keys in the database. This greatly enhances security by minimizing the risk of credential leakage and simplifies credential management. See Access cloud storage with IAM role.
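A minimal sketch of the roleArn option in a user mapping: the roleArn key comes from this release, while the server, user, and ARN below are hypothetical placeholders.

```sql
-- Assumes an existing foreign server (here 's3_server') for the
-- S3-backed data source; names and the ARN are placeholders.
CREATE USER MAPPING FOR analyst
    SERVER s3_server
    OPTIONS (roleArn 'arn:aws:iam::123456789012:role/synxdb-s3-reader');
```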
Lakehouse integration
Enhances advanced HDFS connection tuning
This release supports new GUC parameters for fine-tuning connections to HDFS data sources. These settings provide advanced control over behaviors like load balancing across multiple HDFS routers and other connection-level optimizations. This allows administrators to improve the performance and reliability of queries in data lake environments. See HDFS and OSS-related configuration parameters.
Metadata management
Supports multiple coordinators
Allows multiple coordinator nodes in a compute cluster to concurrently read from and write to the metadata service.
Supports multi-node metadata service deployment
Enhances service capability by allowing the metadata service to be deployed across multiple nodes. This version supports a one-write, multiple-reads model.
Supports FoundationDB-backed metadata persistence
Persists metadata to FoundationDB, leveraging its high availability and scalability to support ultra-large-scale compute clusters.
Product change information
The default value of the gp_enable_runtime_filter_pushdown parameter has been changed from off to on.
The environment script greenplum_path.sh has been officially renamed to cloudberry-env.sh to be consistent with Apache Cloudberry™ (Incubating).
Bug fixes and other improvements
Ensure that the coordinator and catalog are deleted when an account is deleted.
Fix an issue where the horizontal scaling of a coordinator cannot be modified when using FoundationDB (FDB) as metadata storage.
Fix an issue where the alert rule expressions were identical for FoundationDB (FDB) and UnionStore when they had the same name in the same namespace.
Fix an issue that caused errors when refreshing secondary UI pages.
Fix a bug that incorrectly set the number of coordinator replicas to 0 when updating catalog specifications.
Add a check to prevent the deletion of a profile if it is referenced by a warehouse.
Add a check to prevent the deletion of a version if it is referenced by a warehouse or a FoundationDB (FDB) sidecar.
Fix an issue where selecting a warehouse in the SQL Editor fails.
Set the pod timezone automatically based on the Kubernetes cluster configuration.
Fix an issue that caused login page errors when accessed under the same domain.
Fix an issue that caused errors when opening the SQL Editor in multiple browser tabs.
Fix a bug where the authority method was missing for console principals.
Correct the account validation logic used when updating a warehouse.
Resolve issues with internationalization (i18n) support for the integration module.
Fix a bug in the LDAP health check mechanism.
Address an issue that prevented access to the H2 console.
Ensure resource names are now correctly validated against RFC 1123 standards.
Fix a bug that incorrectly allowed the coordinator count to be set when the metadata type is FDB.
Address issues within the UpdateWarehouse gRPC interface.
Fix a bug where a mandatory value was missing in requests to update a catalog.
Resolve performance and display issues in the organization tree.
Add LDAP authentication support for operations (Ops) users.
Add LDAP authentication support for database (DB) users.
Add a new frontend UI for LDAP configuration.
Add support for configuring Iceberg Object Storage (OSS).
Enable Prometheus metrics scraping for DBaaS and add an activation condition for its ServiceMonitor.
Optimize various local scripts for better performance.
Fix failures encountered during the cherry-pick process.
Update the Gopher version to v4.0.20.
Optimize the execution of Datalake list operations.
Add a GUC (Grand Unified Configuration) option for HDFS in Gopher.
Update the cherry-pick-cloudberry.history file.
Fix an issue to ensure relation_acquire_sample_rows is used when implemented by the table access method.
Fix compilation issues when using the --disable-orca flag.
Fix the partition_append test case.
Implement the pg_get_expr() function for subpartition templates.
Fix an issue to preserve relid and cdbpolicy within the make_grouping_rel function.
Add support for LZ4 compression for table columns in Cloud.
Fix a hang issue caused by a single node process.
Fix an issue in dumptuples to ensure a quick exit after query execution is complete.
Set ColumnEncoding_Kind_DIRECT_DELTA as the default encoding for the offset stream.
Enable Cloud TOAST by default.
Remove the size check performed when altering a warehouse’s size.
Improve the performance and reliability of warehouse operations.
Remove the legacy script for managing dependencies.
Extend the deadline for the SQL forwarder check to prevent timeouts.
Add the ‘COMPRESSTYPE=lz4’ table option for cloud environments.
Change the CI rules for ICW (Integration Continuous Workflow) to trigger on_success.
Update the Gopher version to v4.0.22.
Set the default value for gopher_local_capacity_mb to 1024000.
Add support for roleArn in CREATE USER MAPPING for AWS environments.
Hard-code the base version number in the configuration.
Revert the version number to 2.0.0 in the configuration to align with the community edition.
Ensure grammar compatibility for the CREATE TABLESPACE command on remote storage.
Add support for CREATE, ALTER, and DROP MLCLUSTER commands.
Remove the configuration check for libseccomp and fix related autoheader issues.
Update the cherry-pick-lightning.history file.
Fix an issue where the max_worker_processes GUC parameter was not applied correctly.
Fix an issue to correctly report an error during CREATE STORAGE USER MAPPING when permissions are insufficient.
Modify an internal interface for improved functionality.
Fix an issue with the ExtendProtocol implementation in the proxy.
Rewrite a Greenplum system view to fix an underlying issue.
Adapt a system view to ensure compatibility after a cherry-pick.
Add Greenplum summary system views for better monitoring.
Add several gp_stat_progress_*_summary system views.
Fix the names of the pg_stat_all_tables and pg_stat_all_indexes views.
Implement a fallback to the standard execution path for partitioned tables in cloud environments.
Optimize the status maintenance process for materialized views on partitioned tables.
Optimize materialized view (MV) invalidation overhead by implementing reference counting.
Update the PACKAGE_VERSION in the configuration file.
Add support for signing JSON Web Tokens (JWT) within the database.
Add new tests for the proxy component.
Fix the regression test for the FoundationDB (FDB) catalog.
Adapt existing code to a new list interface.
Update the Gopher version to v4.0.21.
Add a GUC (Grand Unified Configuration) option for HDFS in Gopher.
Fix an issue that caused the proxy to crash in cloud environments.
Fix an issue with autovacuum fetching data files on the Query Dispatcher (QD) node.
Update the default value of gopher_local_capacity_mb to 1000 GB.
Move the CREATE TPSERVER statement to its correct position in the grammar.
Recover a previous commit in the Arrow submodule.
Fix several test cases that were failing in cloud environments.
Fix newly introduced merge conflicts.
Add test cases for the Dynamic Tables feature in cloud environments.
Implement the Dynamic Tables feature.
Enable the query planner to use Materialized Views to answer queries on external tables.
Fix an issue to ensure manifest tuples are vacuumed in the order of the hot chain.
Fix the cluster management tool for configurations that do not use unionstore.
Add support for building with the unionstore component.
Fix the warehouse shrink/expand check by correctly handling invalid state messages.
Fix a compilation issue caused by an uninitialized variable.
Set the default value of gp_enable_runtime_filter_pushdown to true.
Update the cherry-pick history file.
Fix an incorrect column number in datalake_fdw.c.
Adapt the capacity of Cloud columns using a GUC parameter.
Fix a dangling pointer issue when mixing data from different ORCA caches.
Fix an issue where reading a text file in Datalake does not call closefile.
Update the cherry-pick history file.
Fix a double-free issue in alterResgroupCallback during the I/O limit cleanup process.
Fix a misuse of move semantics and unhide an Equals() method overload.
Configure ORCA to reject functions with prosupport during DXL translation.
Fix an issue by initializing FuncExpr.is_tablefunc to false.
Add the TPC-DS Query 04 test case to verify a bug in CTE sharing.
Fix a crash in EXPLAIN when showing append info for ShareInputScan nodes.
Align scan-related terminology to consistently use “Shared Scan” and “ShareInputScan”.
Enable support for hot-standby Disaster Recovery (DR) clusters.
Fix the handling of interconnect_address and the parallel worker check in a single-node setup.
Fix an issue by using the correct offset to access members of Serialized Snapshot Data.
Add support for directory tables in CDC (Change Data Capture).
Optimize the CI job cache for faster builds.
Add support for storing CDC replication slots in local storage.
Fix an issue where CDC stops working after dropping a tablespace.
Update the cherry-pick history file.
Adapt the codebase for a recent cherry-pick.
Adjust the planner_hook_wrapper interface.
Allow different strategies in ORCA to control the redistribution key below an aggregate node.
Add support in ORCA to push down partial aggregates below a join.
Add support in ORCA for creating plans in single-node mode.
Introduce hash window aggregation in ORCA when using the vectorized executor.
Correct the behavior of parallel window functions within a CASE WHEN statement.
Fix the row estimation logic for parallel subquery paths.
Fix an invalid relcache leak warning logged during autovacuum.
Prevent excessive sampling on Query Executor (QE) nodes by restricting ComputeExtStatisticsRows to the Query Dispatcher (QD) node.
Revert the commit that banned enums as distribution and partition keys.
Add an option to avoid generating additional EquivalenceClasses for RelabelType in cdb_pull_up_eclass.
Fix a “dispath” typo in the direct dispatch code.
Fix the getversion script in a submodule.
Fix a memory leak related to bitmaps in Cloud.
Enable the auto-cancellation of redundant CI pipelines.
Fix a bug where the configure command would hang if Cloud was not enabled.
Modify the plan diff for the vectorized result set.
Add macro documentation comments to AC_DEFINE for FTS and OpenSSL options.
Fix the fast analyze feature for Cloud tables and simplify the acquisition function selection.
Implement a fast ANALYZE for append-optimized tables.
Replace pip3 download with curl for fetching Python dependencies.
Fix a CI pipeline issue specific to Cloudberry.
Fix pg_dump for tables with hash partitioning on enum columns.
Fix an issue in Datalake where blocks were not being assigned correctly.
Update the cherry-pick-cloudberry.history file.
Fix failures in the isolation2 test suite.
Add a FIXME comment to note potential failures in other files.
Disable the offset number assertion in ginPostingListDecodeAllSegments().
Implement parallel processing for window functions.
Replace nested foreach loops with a backtrace in ORCA for optimization.
Fix several flaky test cases.
Fix issues in system_views_gp.in and the query_conflict test.
Generate Greenplum (gp_) views for corresponding PostgreSQL (pg_) system views.
Fix the hot_standby isolation2 and regression tests.
Ensure query conflict handling on a standby node works as expected.
Enable the upstream hot standby test suite.
Add support for repeatable-read distributed transaction (DTX) isolation on hot standby nodes.
Add support for read-committed distributed transaction (DTX) isolation on hot standby nodes.
Add the XLOG_LATESTCOMPLETED_GXID flag.
Refactor the restore point pausing logic for continuous archive recovery.
Re-revert the change to include the distributed transaction ID (XID) in transaction commit WAL records.
Enable query dispatching on hot standby nodes.
Disable the iceberg_s3 and gphdfs_read_partition_table regression tests in Datalake.
Prevent HiveAutoSync from restarting when it receives a SIGHUP signal.
Update the cherry-pick-cloudberry.history file.
Revert the change that used pip3 download for Python packages.
Rename greenplum_path.sh to cloudberry-env.sh.
Fix compilation issues in Cloud when vectorization is enabled.
Remove unused variables in Cloud.
Remove a macro guard from a struct definition in Cloud.
Fix the gprecoverseg utility by removing incorrect backported code.
Fix a dead link in the README file.
Set the default value of gp_appendonly_insert_files to 0.
Add a compatibility check for libevent 2.0+ to the configure script for gpfdist.
Use event_base with libevent 2.0+ to avoid the thread-unsafe event_init function in gpfdist.
Fix a status reporting issue in the CI’s RAT (Release Audit Tool) check.
Add an Apache RAT (Release Audit Tool) audit workflow to the CI pipeline for license checks.
Add and clean up license headers and files across the codebase.
Print a stack trace when a writer gang process is lost.
Use pip3 download for fetching Python packages during the release process.
Add support for AO/AOCS tables in pg_dump.
Increase the socket buffer size in gpfdist when compression is enabled.
Set the default compression level in gpfdist to 3.
Use const references in Cloud to avoid unnecessary shared_ptr constructor/destructor calls.
Wrap the bms_* (BitmapSet) functions in C++ within the ORCA optimizer.
Remove support for QuickLZ compression.
Inline several basic wrapper functions in ORCA for performance.
Replace the cpp-stub submodule with local sources in Cloud.
Add the cloudberry-env.sh file and rename all instances of “greenplum_path”.
Fix the database version check within the pg_upgrade process.
Replace std::shared_ptr&lt;File&gt; with std::unique_ptr&lt;File&gt; for better ownership semantics.
Configure Datalake to read normal table blocks sequentially.
Optimize the writing process for Parquet files in Datalake.
Disable a flaky test in the Perfmon suite.
Fix an invalid cast from a GArrow scalar type to GArrowNullScalar.
Suppress “maybe-uninitialized” warnings when compiling Cloud.
Add support for the Arrow plan merger with SubqueryScan nodes.
Introduce a FALLBACK_LOG macro to replace direct elog(DEBUG2, "Fallback to ...") calls.
Fix an un-reference error in the vectorization engine.
Print the Arrow plan in addition to the standard plan when debug_print_plan is enabled.
Fix a port bug in gpperfmon by using sizeof instead of a hardcoded value.
Format the code style of the random_segment file in Datalake.
Fix an issue where reading an archived text file in Datalake does not call closefile.
Fix a coredump in datalakeExecSegment in a coordinator-only configuration.
Fix an issue with text encodings that are valid in the database but not supported by ICU.
Update the Makefile for the Datalake component.
Resolve library conflicts between libjansson and json-c in Datalake.
Rename a Datalake function to avoid naming conflicts.
Fix a bug in the parser for zipped text format in Datalake.
Fix an issue to ensure manifest tuples are vacuumed in the order of the hot chain.
Add the GUC (Grand Unified Configuration) parameter cloud.pax_max_tuples_per_group.
Change the file path for a directory table’s directory to use the relation ID (relid) instead of the relfilenode.