Configure Hive Metadata Auto Sync

Warning

Hive Metadata Auto Sync is an experimental feature in the current version. Do not use it in production environments.

SynxDB Cloud can synchronize Hive metadata in real time through Kafka. The feature listens for Hive Metastore change events and updates the matching external table definitions in SynxDB Cloud without operator action. It complements the manual synchronization functions.

Hive Metadata Auto Sync runs as an independent component, managed separately from the database cluster. You configure it through the Hive Meta Sync tab on the Database Config page of the DBaaS Admin Console, but the feature also needs preparation on the Hive cluster and inside the target SynxDB Cloud database. This document covers the full setup end to end.

How it works

The synchronization pipeline has four components:

  • Hive Metastore with the SynxDB Cloud listener plugin installed. The plugin intercepts metadata change events such as CREATE TABLE, ALTER TABLE, and DROP TABLE, and publishes them to a Kafka topic.

  • Kafka broker that transports the metadata change events.

  • Meta Sync component running in the SynxDB Cloud cluster. It consumes events from Kafka and translates each event into a matching CREATE FOREIGN TABLE or DROP FOREIGN TABLE statement.

  • Target database in SynxDB Cloud (typically named hivedb), where the foreign tables are created against a pre-provisioned foreign server named __hive_auto_sync_server.
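
As an illustration of the consume-and-translate step, here is a minimal sketch in Python. The event field names ("type", "table", "columns") are hypothetical; the real event schema and generated DDL are internal to the Meta Sync component:

```python
# Hypothetical sketch of the Meta Sync translation step. The event field
# names ("type", "table", "columns") are illustrative, not the real schema.
def translate(event):
    table = 'public."{}"'.format(event["table"])
    if event["type"] == "CREATE_TABLE":
        cols = ", ".join('"{}" {}'.format(c, t) for c, t in event["columns"])
        return ("CREATE FOREIGN TABLE {} ({}) SERVER __hive_auto_sync_server"
                .format(table, cols))
    if event["type"] == "DROP_TABLE":
        return "DROP FOREIGN TABLE {}".format(table)
    return None  # other event types are ignored in this sketch
```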

The Kafka topic name uses the format <hdw.catalog.name>_fdb-<catalog>_hms, where:

  • hdw.catalog.name is the value of cn.cbdb.apiary.kafka.hdw.catalog.name configured in hive-site.xml on the Hive side. The same value must appear as hdw.catalog.name in the Meta Sync configuration.

  • <catalog> is each entry in cn.cbdb.apiary.kafka.autosync.catalogs. The same value must appear as hive.catalog.name in the Meta Sync configuration so that the consumer subscribes to the correct topic.
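
For example, with hdw.catalog.name set to hdw_catalog and a single catalog named hive, the derived topic name can be computed as follows (a small illustration of the naming rule, not part of the product):

```python
# Derive the Kafka topic name from the two catalog settings:
# <hdw.catalog.name>_fdb-<catalog>_hms
def topic_name(hdw_catalog_name, catalog):
    return "{}_fdb-{}_hms".format(hdw_catalog_name, catalog)

print(topic_name("hdw_catalog", "hive"))  # hdw_catalog_fdb-hive_hms
```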

Prerequisites

Before configuring Hive Metadata Auto Sync, confirm the following:

  • The listener plugin is installed on the Hive Metastore. Contact SynxDB Cloud technical support to obtain the kafka-metastore-listener-<version>-all.jar file, then install it as described in Step 1.

  • A Kafka cluster is reachable from both the Hive Metastore and the SynxDB Cloud cluster. Both PLAINTEXT and SASL_PLAINTEXT security protocols are supported. When SASL_PLAINTEXT is used, the supported SASL mechanism is SCRAM-SHA-256.

  • The HDFS connection and Hive Connector are already configured in the DBaaS Admin Console. Complete Configure an HDFS connection and Configure a Hive connection first.

  • A Hive Meta Sync profile is available on the Profile page. If none exists, create one before proceeding.
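
The Kafka reachability prerequisite can be checked with a plain TCP probe run from a host inside the relevant network. This sketch only verifies that the broker port accepts connections; it does not test Kafka-level authentication:

```python
import socket

def broker_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```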

Step 1. Install the listener on the Hive Metastore

You perform this step on the Hive cluster, not in the SynxDB Cloud console.

  1. Place the listener jar in the Hive Metastore classpath (typically $HIVE_HOME/lib/). If clients connect through HiveServer2, install the jar there as well.

  2. Add the following properties to hive-site.xml on every Hive Metastore node:

    <property>
      <name>hive.metastore.event.listeners</name>
      <value>cn.cbdb.apiary.kafka.listener.HiveMetaStoreEventListener</value>
    </property>
    <property>
      <name>cn.cbdb.apiary.kafka.bootstrap.servers</name>
      <value>kafka-host:9092</value>
    </property>
    <property>
      <name>cn.cbdb.apiary.kafka.hdw.catalog.name</name>
      <value>hdw_catalog</value>
    </property>
    <property>
      <name>cn.cbdb.apiary.kafka.hive.cluster.name</name>
      <value>cluster-1</value>
    </property>
    <property>
      <name>cn.cbdb.apiary.kafka.autosync.catalogs</name>
      <value>hive</value>
    </property>
    <property>
      <name>cn.cbdb.apiary.kafka.autosync.databases.hive</name>
      <value>*</value>
    </property>
    <property>
      <name>cn.cbdb.apiary.kafka.sync.catalog.wise</name>
      <value>true</value>
    </property>
    

    For Kafka brokers that require SASL authentication, also add:

    <property>
      <name>cn.cbdb.apiary.kafka.security.protocol</name>
      <value>SASL_PLAINTEXT</value>
    </property>
    <property>
      <name>cn.cbdb.apiary.kafka.sasl.mechanism</name>
      <value>SCRAM-SHA-256</value>
    </property>
    <property>
      <name>cn.cbdb.apiary.kafka.sasl.jaas.config</name>
      <value>org.apache.kafka.common.security.scram.ScramLoginModule required username="kafka-user" password="kafka-password";</value>
    </property>
    

    Property reference:

    • hive.metastore.event.listeners: Fully qualified class name of the SynxDB Cloud listener. Must be cn.cbdb.apiary.kafka.listener.HiveMetaStoreEventListener.

    • cn.cbdb.apiary.kafka.bootstrap.servers: Kafka broker address or addresses, comma-separated.

    • cn.cbdb.apiary.kafka.hdw.catalog.name: Catalog identifier used as the topic prefix. Must match hdw.catalog.name in the Meta Sync configuration.

    • cn.cbdb.apiary.kafka.hive.cluster.name: Logical name of this Hive cluster.

    • cn.cbdb.apiary.kafka.autosync.catalogs: Comma-separated list of Hive catalogs to publish. Each value in this list must also appear as hive.catalog.name in the Meta Sync configuration that consumes from it.

    • cn.cbdb.apiary.kafka.autosync.databases.<catalog>: Comma-separated list of Hive databases to publish for the given catalog. Use * to publish all databases.

    • cn.cbdb.apiary.kafka.sync.catalog.wise: Whether a separate topic is used per catalog. Must match sync.catalog.wise in the Meta Sync configuration.

    • cn.cbdb.apiary.kafka.security.protocol: PLAINTEXT or SASL_PLAINTEXT.

    • cn.cbdb.apiary.kafka.sasl.mechanism: SCRAM-SHA-256 when SASL is used.

    • cn.cbdb.apiary.kafka.sasl.jaas.config: JAAS login string for the Kafka client used by the listener.

  3. Restart the Hive Metastore process to load the listener. To confirm the listener is publishing events, run CREATE TABLE in Hive and check that an event appears in the matching Kafka topic:

    kafka-console-consumer.sh \
        --bootstrap-server <broker-address> \
        --topic <hdw.catalog.name>_fdb-<catalog>_hms \
        --from-beginning --max-messages 1
    

Step 2. Prepare the target database

The Meta Sync component writes foreign tables into a designated database in SynxDB Cloud. You need to create this database and provision a foreign server inside it before any sync events arrive.

  1. Create the target database. Run the following statement from the DBaaS User Console worksheet or through psql. The database name must match the value you set as hdw.database in the Meta Sync configuration.

    CREATE DATABASE hivedb;
    
  2. Create the foreign server. Switch to the target database and create the foreign server that Meta Sync uses:

    \c hivedb
    
    SELECT public.create_foreign_server(
        '__hive_auto_sync_server',     -- exact name required by Meta Sync; do not change
        'gpadmin',                      -- existing role for the initial user mapping
        'datalake_fdw',                 -- foreign data wrapper
        'hdfs-cluster-1'                -- must match hdfs.gp.name in the Meta Sync configuration
    );
    

    Note

    The Meta Sync component requires the server name __hive_auto_sync_server. Use this exact name.
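
    To confirm the server exists, you can optionally query the standard pg_foreign_server catalog:

    SELECT srvname FROM pg_foreign_server
    WHERE srvname = '__hive_auto_sync_server';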

  3. Grant privileges to the Meta Sync database user. When the Meta Sync pod connects to SynxDB Cloud, it logs in as a database user that belongs to the account you will select in Step 4. Look up that user in the DBaaS Admin Console under Organizations > your organization > your account > Users; the username is shown in the Name column (for example, 123123).

    In the target database, grant privileges to that user and create the user mapping. The example uses <sync_user> as a placeholder; substitute the real username:

    GRANT USAGE ON FOREIGN SERVER __hive_auto_sync_server TO "<sync_user>";
    GRANT ALL   ON SCHEMA public                          TO "<sync_user>";
    CREATE USER MAPPING FOR "<sync_user>" SERVER __hive_auto_sync_server;
    

    Warning

    Double-quote the username in SQL to avoid identifier parsing errors.
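
    To confirm the grant and mapping took effect, you can optionally query the standard pg_user_mappings view:

    SELECT srvname, usename FROM pg_user_mappings
    WHERE srvname = '__hive_auto_sync_server';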

Step 3. Access the Hive Meta Sync tab

  1. Log in to the DBaaS Admin Console.

  2. In the left navigation pane, click Database Config.

  3. Click the Hive Meta Sync tab at the top of the page. This page lists all current Hive Metadata Auto Sync configurations.

Step 4. Create a configuration (basic information)

  1. Click + Create in the upper-right corner of the list.

  2. In the Basic Information step, provide the following details:

    • Organization: Select the organization for this configuration.

    • Account: Select the account for this configuration.

    • Service Configuration Template: Select the appropriate template (for example, Hive Meta Sync Template).

  3. Click Next.

Step 5. Configure sync parameters

  1. Select a Profile for the Hive Meta Sync component. The profile determines the resource allocation (CPU, memory, and storage) for the sync service. This field is required.

  2. In the Hive Meta Sync Content input area, choose Manual Input and provide the YAML body. The body must not be indented as a whole: top-level keys start at column 0 and nested entries keep only their relative indentation. See the YAML formatting caveat below.

    Template for a PLAINTEXT Kafka broker:

    bootstrap.servers:
      - kafka-host:9092
    hdw.catalog.name: hdw_catalog
    security.protocol: PLAINTEXT
    prometheus.port: 15888
    sync.catalog.wise: true
    
    hive.clusters:
      - hive.gp.name: hive-cluster-1
        hive.cluster.name: hive
        hive.catalog.list:
          - hive.catalog.name: hive
            hdw.database: hivedb
            hive.partition.prov_id: 001
            hdw.auth.user:
              - <sync_user>
        hdfs.gp.name: hdfs-cluster-1
    

    Template for a SASL_PLAINTEXT Kafka broker:

    bootstrap.servers:
      - kafka-host:9092
    hdw.catalog.name: hdw_catalog
    security.protocol: SASL_PLAINTEXT
    sasl.mechanism: SCRAM-SHA-256
    sasl.jaas.config: 'org.apache.kafka.common.security.scram.ScramLoginModule required username="kafka-user" password="kafka-password";'
    prometheus.port: 15888
    sync.catalog.wise: true
    
    hive.clusters:
      - hive.gp.name: hive-cluster-1
        hive.cluster.name: hive
        hive.catalog.list:
          - hive.catalog.name: hive
            hdw.database: hivedb
            hive.partition.prov_id: 001
            hdw.auth.user:
              - <sync_user>
        hdfs.gp.name: hdfs-cluster-1
    

    Key parameter descriptions:

    • bootstrap.servers: Kafka broker address or addresses. Must be reachable from inside the SynxDB Cloud cluster network.

    • hdw.catalog.name: Must match cn.cbdb.apiary.kafka.hdw.catalog.name set on the Hive side. Otherwise the consumer subscribes to a topic that no producer writes to.

    • security.protocol, sasl.mechanism, sasl.jaas.config: Authentication settings. Must match the Kafka broker’s configuration.

    • sync.catalog.wise: Must match cn.cbdb.apiary.kafka.sync.catalog.wise on the Hive side.

    • hive.gp.name: Must match the connection name created on the Hive Connector tab (for example, hive-cluster-1).

    • hive.cluster.name: Logical cluster name for display purposes.

    • hive.catalog.name (inner): Must appear in cn.cbdb.apiary.kafka.autosync.catalogs on the Hive side. The topic name is derived as <hdw.catalog.name>_fdb-<hive.catalog.name>_hms.

    • hdw.database: Target database name in SynxDB Cloud. Must already exist (see Step 2).

    • hdw.auth.user: List of SynxDB Cloud users that automatically receive SELECT permission on synchronized schemas.

    • hdfs.gp.name: Must match both the connection name created on the HDFS tab and the hdfsClusterName argument passed to create_foreign_server in Step 2.
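
    Several of these values must agree with the Hive-side listener properties. As an illustration (the helper and its mapping are not part of the product), the matching rules can be expressed as key pairs and checked mechanically:

```python
# Settings that must agree across the two sides: Hive-side hive-site.xml
# property -> Meta Sync YAML key. Illustrative helper, not part of the product.
MATCHED_KEYS = {
    "cn.cbdb.apiary.kafka.hdw.catalog.name": "hdw.catalog.name",
    "cn.cbdb.apiary.kafka.security.protocol": "security.protocol",
    "cn.cbdb.apiary.kafka.sync.catalog.wise": "sync.catalog.wise",
}

def mismatches(hive_props, sync_conf):
    """Return the Hive-side property names whose counterparts differ."""
    return [hive_key for hive_key, sync_key in MATCHED_KEYS.items()
            if str(hive_props.get(hive_key)).lower()
               != str(sync_conf.get(sync_key)).lower()]
```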

  3. Optionally, select an Environment Spec to specify the Kubernetes runtime environment for the sync component.

  4. Click Next.

YAML formatting caveat

Follow these two rules when filling in the Hive Meta Sync Content field:

  • No extra indentation of the block as a whole. Top-level keys, including bootstrap.servers: and hive.clusters:, must start at column 0; nested entries keep their normal relative indentation.

  • Single-quote the sasl.jaas.config value. It contains double quotes, semicolons, and equals signs that can be misparsed without single quotes.

Breaking either rule causes the Meta Sync pod to fail to start with a YAML parse error. If this happens, edit the configuration to satisfy both rules and resubmit.
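
Both rules can be checked mechanically before submitting. The following function is a minimal sketch (illustrative, not part of the console); the whole-block indentation check only inspects the first non-empty line:

```python
def check_sync_content(text):
    """Return a list of rule violations in a Hive Meta Sync YAML body."""
    problems = []
    lines = text.splitlines()
    # Rule 1: top-level keys must start at column 0, so the first
    # non-empty line must not be indented.
    first = next((l for l in lines if l.strip()), "")
    if first and first[0].isspace():
        problems.append("block is indented; top-level keys must start at column 0")
    # Rule 2: the sasl.jaas.config value must be single-quoted.
    for l in lines:
        if l.lstrip().startswith("sasl.jaas.config:"):
            value = l.split(":", 1)[1].strip()
            if not (value.startswith("'") and value.endswith("'")):
                problems.append("sasl.jaas.config value must be single-quoted")
    return problems
```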

Step 6. Preview and submit

  1. In the Configuration Preview step, review the following sections:

    • Basic Information: Confirms the account and service configuration template.

    • Hive Meta Sync Content: Shows the selected profile, environment spec, and a full preview of the sync configuration content.

  2. If everything is correct, click Submit to create the synchronization configuration. SynxDB Cloud provisions the Meta Sync pod automatically, and the pod begins consuming from the configured Kafka topic.

Verify the synchronization

  1. From a Hive client, create a test table in a database listed in cn.cbdb.apiary.kafka.autosync.databases.<catalog>:

    -- in beeline
    CREATE TABLE default.sync_test (id INT, name STRING) STORED AS PARQUET;
    
  2. After a few seconds, list foreign tables in the target SynxDB Cloud database. The new table appears in the public schema:

    -- in psql, connected to hivedb
    \det+
    

    Expected output:

     Schema |   Table   |         Server          | FDW options
    --------+-----------+-------------------------+-------------
     public | sync_test | __hive_auto_sync_server | ...
    
  3. Verify drop synchronization as well:

    -- in beeline
    DROP TABLE default.sync_test;
    

    After a few seconds, \det in hivedb no longer lists the table.

If the expected foreign table does not appear, check the Meta Sync pod log for errors.

Manage sync tasks

After a configuration is submitted, you can manage the corresponding synchronization task from the list view on the Hive Meta Sync tab. The list displays the following columns: ID, Account Name, Status, Created, Active/Deactivate, and Action.

Status

Each Hive Meta Sync task has one of the following statuses:

  • Pending: The task has been created but is not yet running.

  • Running: The task is actively synchronizing Hive metadata.

  • Suspended: The task has been deactivated and is not processing metadata changes.

Available operations

  • Active/Deactivate toggle: Activates or deactivates the sync task. When you deactivate the task, SynxDB Cloud stops synchronizing metadata changes and the status changes to Suspended.

  • Edit: Opens the edit form, where you can modify the profile, sync configuration content, and environment spec. Click Submit to save your changes.

  • Delete: Permanently removes the sync task.