v0.3.0 Release Notes

Release date: November 2025

Release version: v0.3.0

Guided by a “One Data, One Platform” vision, SynxDB Cloud v0.3.0 is an enterprise-grade, unified, open, and intelligent data platform that redefines the enterprise data warehouse. Built on the Apache Cloudberry™ (Incubating) 2.0 kernel and deployed via containers, it moves beyond decoupled storage and compute to unify transactional data, advanced analytics, and AI workflows on a single platform. This empowers customers and partners to accelerate the transition from data-driven decisions to AI-driven intelligence.

This release introduces several new features and improvements across the platform:

  • Unified data engine for hybrid workloads

    • Transactional and streaming ingestion: Introduces TpServer for high-concurrency OLTP and real-time streaming data ingestion, ensuring fresh, reliable data is always available for analytics and AI.

    • HTAP capabilities: Seamlessly combines transactional and analytical processing (HTAP) on a single platform, eliminating data silos and enabling real-time, data-driven decisions.

  • Integrated AI and machine learning platform

    • Generative AI (GenAI) enablement: Introduces native support for pgvector indexes, enabling high-performance vector similarity search directly within the data platform. This provides the critical foundation for building advanced Generative AI and semantic search applications.

    • End-to-end ML lifecycle: Integrates comprehensive ML capabilities, including dedicated ML cluster management and in-database machine learning via intuitive SQL and Python SDKs, allowing data scientists to build, train, and deploy models faster.

  • Unified lakehouse engine

    • Robust metadata management: Improves reliability and automation with Hive metadata auto-sync, a 2-phase commit for the FoundationDB catalog, and optimized metadata storage for cloud tables.

    • Flexible storage integration: Supports undelegated object storage buckets, providing greater flexibility and control over your data lake architecture.

    • Advanced indexing: Supports a comprehensive set of PostgreSQL-native indexes (B-tree, Hash, GiST, SP-GiST, GIN, BRIN) to accelerate query performance across diverse workloads.

    • Rich plugin ecosystem: Supports extending functionality with a rich ecosystem of PostgreSQL plugins, including PostGIS for advanced geospatial analytics and other custom extensions.

  • Cloud-native operations

    • Enhanced observability: Provides detailed CPU, memory, and network monitoring metrics for critical components (for example, UnionStore and FDB) directly in the DBaaS Admin Console, enabling proactive troubleshooting and ensuring platform stability.

    • Intelligent resource management: Introduces Environment Specs for precise, policy-based scheduling of components, optimizing resource utilization and cost in multi-tenant environments.

New features

Unified data engine for hybrid workloads

Supports TpServer, a stateful compute node designed for high-concurrency transactional processing (OLTP) and streaming data ingestion. A core scenario is enabling users to leverage indexes and advanced components such as pgvector and PostGIS within SynxDB Cloud. By vertically separating transactional workloads from analytical queries, TpServer ensures that resource-intensive analysis does not impact the stability of critical business transactions. This hybrid architecture enables SynxDB Cloud to handle mixed workloads (HTAP) efficiently within a single system, simplifying the data infrastructure stack. See Using TpServer Nodes.

Previously, SynxDB Cloud was primarily an OLAP platform, optimized for analytical queries but limited in transactional processing and indexing capabilities. With the introduction of TpServer, the platform now supports high-concurrency OLTP workloads and comprehensive indexing. This evolution seamlessly combines transactional and analytical processing (HTAP) on a single platform, eliminating data silos and enabling real-time, data-driven decisions.

Integrated AI and machine learning platform

Moving beyond traditional analytics, SynxDB Cloud now integrates a complete AI and machine learning stack directly into the database. From vector similarity search for GenAI to a full-lifecycle ML platform, SynxDB Cloud enables data scientists and developers to build and deploy intelligent applications without moving data, greatly reducing complexity and latency.

Generative AI (GenAI) enablement

Introduces native support for pgvector indexes, enabling high-performance vector similarity search directly within the data platform. This feature allows storing and querying vector embeddings alongside structured data, supporting exact and approximate nearest neighbor searches (L2 distance, inner product, cosine distance). By integrating vector search capabilities, SynxDB Cloud provides the critical foundation for building advanced Generative AI and semantic search applications, such as RAG (Retrieval-Augmented Generation) systems. See Use pgvector for Vector Similarity Search.
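
The three distance metrics listed above can be sketched in plain Python. In SQL, pgvector exposes them as the `<->` (L2), `<#>` (negative inner product), and `<=>` (cosine distance) operators; the function names and sample embeddings below are illustrative, not part of the product API.

```python
import math

def l2_distance(a, b):
    # pgvector's "<->" operator: Euclidean (L2) distance
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def inner_product_distance(a, b):
    # pgvector's "<#>" operator: negative inner product (smaller = more similar)
    return -sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    # pgvector's "<=>" operator: 1 - cosine similarity
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Nearest-neighbor search: rank stored embeddings against a query vector,
# as an ORDER BY ... <=> query would.
embeddings = {"doc1": [1.0, 0.0], "doc2": [0.6, 0.8], "doc3": [0.0, 1.0]}
query = [1.0, 0.1]
nearest = min(embeddings, key=lambda k: cosine_distance(query, embeddings[k]))
```

A RAG pipeline would typically run such a ranking over embeddings of document chunks to fetch the most relevant context for an LLM prompt.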

End-to-end ML lifecycle

  • ML cluster management

    Supports managing ML Clusters directly from the DBaaS Admin Console, enabling administrators to provision and scale dedicated computing resources for AI and machine learning workloads. This feature allows creating, monitoring, and managing the lifecycle of clusters tailored for tasks such as model training, LLM fine-tuning, and inference services. It also supports configuring object storage and shared nodes during creation to facilitate flexible data access and resource sharing. Integrating ML cluster management into the console ensures that data scientists have reliable, scalable infrastructure while simplifying operations for platform administrators. See Manage ML Clusters.

  • SynxML SQL

    Supports SynxML SQL, a powerful extension that brings machine learning and natural language processing capabilities directly into the database. This feature enables users to perform tasks such as text summarization, translation, XGBoost model training, and interaction with LLMs using standard SQL commands. By allowing in-database execution of complex AI operations, it streamlines workflows for data analysts and developers, eliminating the need to move data to external ML environments. See SynxML SQL.

  • SynxML Python

    Supports SynxML Python, a Python SDK that provides a powerful interface for data manipulation and machine learning on the SynxML platform. This feature supports both Ray and DataFrame interfaces, enabling data scientists to seamlessly load data, perform preprocessing, and train traditional machine learning models (such as SVM and Random Forest) using familiar libraries like pandas and scikit-learn. By integrating these capabilities directly into the platform, it simplifies the development lifecycle and accelerates the deployment of AI applications. See SynxML Python.

Unified lakehouse engine

To realize the vision of a unified lakehouse, this release strengthens the core engine’s ability to manage diverse data sources and optimize performance at scale. By enhancing metadata reliability, expanding storage flexibility, and introducing advanced indexing, SynxDB Cloud ensures that both internal and external data are treated as first-class citizens, accessible with high performance and unified governance.

Robust metadata management

  • Hive metadata auto sync

    Supports automatically synchronizing Hive metadata via Kafka. This feature listens for Hive Metastore change events and automatically updates external table definitions in SynxDB Cloud, ensuring metadata consistency across systems without manual intervention and greatly simplifying data management in hybrid environments. See Configure Hive Metadata auto sync.

  • FoundationDB catalog 2PC

    Supports 2-phase commit in FoundationDB Catalog to enhance transaction atomicity and consistency for metadata operations. This improvement includes new transaction commands, hook implementations, and robust handling of distributed transactions, ensuring greater reliability for catalog operations in distributed environments.

  • Main manifest removal

    Removes the main_manifest catalog table to optimize metadata storage for cloud tables. Instead of a dedicated catalog table, manifest tuples are now stored directly in the physical file (relfile) of the table in unionstore. This change is designed to prevent potential performance bottlenecks caused by excessive metadata accumulation in a single catalog table, ensuring scalable and efficient metadata management.
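
The Hive auto-sync flow described above can be sketched conceptually: consume Metastore change events (plain dicts stand in for Kafka messages here) and apply them to a catalog of external table definitions. The event schema and field names below are illustrative assumptions, not the actual Hive Metastore event format.

```python
# Catalog of external table definitions: table name -> column definitions.
catalog = {}

def apply_event(event):
    # Apply one Metastore change event to the local catalog (illustrative).
    table = event["table"]
    if event["type"] in ("CREATE_TABLE", "ALTER_TABLE"):
        catalog[table] = event["columns"]
    elif event["type"] == "DROP_TABLE":
        catalog.pop(table, None)

# Simulated stream of change events arriving from Kafka.
events = [
    {"type": "CREATE_TABLE", "table": "sales",
     "columns": ["id int", "amount decimal"]},
    {"type": "ALTER_TABLE", "table": "sales",
     "columns": ["id int", "amount decimal", "region text"]},
]
for e in events:
    apply_event(e)
```

The point of the feature is that this loop runs inside the platform, so the external table definition tracks the Hive schema without any operator action.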
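
The 2-phase commit guarantee described for the FoundationDB catalog can be illustrated with a minimal protocol simulation: participants first vote in a prepare phase, and the coordinator commits only if every vote is yes, otherwise it aborts everywhere. All class and method names are illustrative; this is the generic protocol, not SynxDB Cloud internals.

```python
class Participant:
    """A stand-in for one node holding part of a distributed metadata write."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.state = "idle"

    def prepare(self):
        # Phase 1: durably promise to commit if asked; vote yes/no.
        self.state = "prepared" if self.healthy else "aborted"
        return self.healthy

    def commit(self):
        # Phase 2a: make the change visible.
        self.state = "committed"

    def abort(self):
        # Phase 2b: roll the change back.
        self.state = "aborted"

def two_phase_commit(participants):
    # Commit only if every participant voted yes in the prepare phase,
    # so the metadata operation is atomic across all of them.
    if all(p.prepare() for p in participants):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"
```

The atomicity property is that after the protocol runs, either every participant is committed or every participant is aborted; no mixed outcome is possible.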

Flexible storage integration

  • Undelegated buckets

    Supports registering and managing undelegated buckets, allowing users to integrate existing cloud storage resources created outside the platform into the SynxDB Cloud ecosystem. This feature is essential for organizations with specific compliance or security requirements that mandate using pre-provisioned buckets. It provides flexibility by enabling the use of external buckets for accounts and UnionStores while maintaining centralized visibility in the console. See Manage Buckets.

  • Gopher version update

    Updates the Gopher component to version v4.0.23, which introduces refined monitoring capabilities and supports writing files without caching in gopherOpenFile.

Advanced indexing

Supports a comprehensive set of PostgreSQL-native indexes (B-tree, Hash, GiST, SP-GiST, GIN, BRIN) to accelerate query performance across diverse workloads. This includes support for index-only scans and covering indexes, which significantly reduce I/O by retrieving data directly from the index. In addition, dynamic index-only scans optimize queries on partitioned tables by combining index access with partition pruning. See Create and Manage Indexes.
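
The index-only scan idea above can be sketched in a few lines: a covering index stores extra ("included") columns in its entries, so a query can be answered from the index alone, with no fetch from the table's row storage. The sorted list below is a toy stand-in for a B-tree; the schema and data are illustrative.

```python
import bisect

# Covering index on user_id, including email: entries kept sorted by key,
# the way CREATE INDEX ... (user_id) INCLUDE (email) would store them.
index = [(1, "a@example.com"), (3, "b@example.com"), (7, "c@example.com")]

def index_only_lookup(key):
    # Binary search on the key; the email comes straight from the index
    # entry, so no heap/table access is needed (the I/O saving the text
    # above describes).
    i = bisect.bisect_left(index, (key,))
    if i < len(index) and index[i][0] == key:
        return index[i][1]
    return None
```

A plain index on `user_id` alone would locate the row but still require a table fetch to retrieve `email`; the included column is what removes that second I/O.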

Rich plugin ecosystem

Supports extending functionality with a rich ecosystem of PostgreSQL plugins, including PostGIS for advanced geospatial analytics and other custom extensions. PostGIS enables storing and querying Geographic Information System (GIS) objects, supporting GiST-based R-Tree spatial indexes and seamless integration of vector and raster geospatial data. This allows users to perform complex spatial queries and analysis directly within the database. See Geospatial Data Analysis with PostGIS.
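
The two-step filtering that an R-tree spatial index enables (as in PostGIS's GiST indexes) can be sketched as follows: a cheap bounding-box pass narrows the candidates, then an exact geometric test runs only on the survivors. Shapes are simplified to circles here, and all names and data are illustrative.

```python
import math

# Spatial objects: name -> (center_x, center_y, radius).
circles = {
    "park":  (0.0, 0.0, 1.0),
    "lake":  (5.0, 5.0, 2.0),
    "plaza": (0.5, 0.5, 0.3),
}

def bbox(circle):
    # Axis-aligned bounding box, the summary an R-tree stores per object.
    x, y, r = circle
    return (x - r, y - r, x + r, y + r)

def bbox_contains(box, px, py):
    return box[0] <= px <= box[2] and box[1] <= py <= box[3]

def shapes_containing(px, py):
    # Step 1: index-style bounding-box filter (fast, may over-select).
    candidates = [n for n, c in circles.items()
                  if bbox_contains(bbox(c), px, py)]
    # Step 2: exact geometric test on the remaining candidates only.
    return [n for n in candidates
            if math.hypot(px - circles[n][0],
                          py - circles[n][1]) <= circles[n][2]]
```

In PostGIS the bounding-box pass is what the GiST index accelerates, while operators and functions such as `ST_Contains` perform the exact test on the filtered set.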

Query optimization

Optimizes COUNT(DISTINCT) performance for integer data types (int2, int4, int8) by introducing an Aggregate Bitmap Scan using Roaring Bitmaps. This optimization replaces expensive data redistribution with efficient bitmap operations, reducing network overhead and improving query response times for distinct counting scenarios.
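
The bitmap-based COUNT(DISTINCT) idea can be sketched as follows: each segment builds a bitmap of the nonnegative integer values it holds (one bit per value), the segments' bitmaps are OR-ed together instead of redistributing rows over the network, and a final popcount yields the distinct count. Plain Python ints serve as the bitmaps here; Roaring Bitmaps are a compressed representation of the same structure, and the data is illustrative.

```python
def build_bitmap(values):
    # One bit per distinct nonnegative integer value seen by this segment.
    bm = 0
    for v in values:
        bm |= 1 << v
    return bm

# Rows of the counted column, as distributed across three segments.
segment_rows = [
    [1, 4, 4, 9],    # segment 0
    [4, 9, 16],      # segment 1
    [1, 25],         # segment 2
]

# Merge with cheap bitwise ORs -- no row redistribution between segments.
merged = 0
for rows in segment_rows:
    merged |= build_bitmap(rows)

# Popcount of the merged bitmap = number of distinct values.
distinct_count = bin(merged).count("1")
```

Duplicates within and across segments set the same bit, so the OR-merge deduplicates for free; only the small bitmaps cross the network, which is the redistribution saving described above.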

Cloud-native operations

As data platforms grow in complexity, operational efficiency becomes paramount. This release introduces cloud-native enhancements focused on observability and intelligent resource scheduling. These features empower administrators to proactively monitor system health and optimize resource allocation in multi-tenant environments, ensuring a stable and cost-effective platform.

Enhanced observability

Enhances system observability by providing comprehensive monitoring metrics directly within the DBaaS Admin Console. Administrators can now track the health and performance of critical components, including UnionStore, FoundationDB, Coordinators, and Warehouses, through detailed dashboards. This feature offers visibility into key indicators such as resource usage, database concurrency, and cache performance, enabling proactive troubleshooting and ensuring the stability of the data platform. See Check Monitor Metrics.

Intelligent resource management

Supports Environment Specifications, which define Kubernetes Node Selectors to precisely control the deployment of system components. This capability supports advanced scheduling scenarios such as fault isolation and dedicated hardware allocation directly from the console. By applying these specifications to resources like warehouses and accounts, users can optimize resource utilization and performance without direct Kubernetes cluster access. See Manage Environment Specifications.

Product change information

GUC parameters

The following configuration parameter is added in this release:

  • cloud.default_tpserver: Specifies the current default TP server. This parameter is a string type with no default value.

Metadata changes

  • Updated gp_storage_server and added gp_tpserver to enable the new TpServer functionality.

  • Removed main_manifest and refactored the data organization model to improve efficiency.

Bug fixes

  • Fix an issue where the hashtable was not re-created if its memory usage exceeded the limit.

  • Fix an assertion failure in the virtual catalog.

  • Fix table open issues in auto clusters.

  • Fix an issue where table files were not properly cleaned up when dropping a database.

  • Fix the logic for parsing file clean manifests.

  • Fix external table checks and error reporting.

  • Enhance gpstop for tpserver and fix related issues.

  • Correct a typo from EXITS to EXISTS in SQL and code.

  • Fix the calculation of database or tablespace size.

  • Fix cloud_get_manifests to skip non-existent tuples when gp_select_invisible is enabled.

  • Fix compilation issues with higher versions of gcc/g++.

  • Fix inaccurate judgment by ORCA regarding whether a relation is empty.

  • Fix instability in cte_prune.

  • Fix a bug where backend panic occurred when TDE feature was enabled.

  • Fix \d+ to correctly display reloptions including write_policy.

  • Fix an issue to ensure dropping a database removes cloud table files.

  • Fix an issue where vacuum would always add the current manifest to dead_manifests.

  • Fix warehouse/tpserver GUC check and prevent collecting catalog for tpserver.

  • Fix duplicate distribution keys from subqueries.

  • Fix readable CTE with SELECT INTO clause.

  • Fix a segmentation fault in ORCA when appending group statistics.

  • Fix NULL locus in Shared Scan.

  • Fix CDC to store replication slots in default_dfs_tablespace.

  • Fix CDC xlog reader buffer LSN to not use int type.

  • Fix to only create default_cloud_tablespace in local mode.

  • Fix CDC DDL for unlogged tables with serial columns.

  • Fix partition table aggregation to use meta statistics.

  • Fix CDC support for creating, altering, and dropping directory tables.

  • Fix forwarding of external protocol data from proxy to QD.

  • Fix merging of segment manifest add/remove lists to meta manifest.

  • Fix memory leak due to missing calls to FreeVecExecuteState.

  • Fix no response when altering io_limit of resource group to -1.

  • Fix double free issue in IO limit.

  • Fix parallel worker assignment for partial paths.

  • Fix writable CTE on replicated tables (except partition tables).

  • Fix segmentation fault when reading Iceberg tables in Datalake.