SynxDB AI Bot - Doc

SynxDB AI Bot - Doc (or AI Bot - Doc) is the native AI engine for SynxDB Cloud. It is an enterprise-grade multimodal data intelligence platform that embodies the “One Platform, All Data” philosophy. Because its storage and compute are tightly integrated with the SynxDB Cloud ecosystem, AI Bot - Doc lets organizations run AI/ML workloads directly where their data resides, turning the Cloud Data Warehouse into a unified engine for both analytics and intelligence.

Note

  • AI Bot - Doc is only available when enable-doc-mind is set to true in the deployment configuration. In addition, the AI Bot - Doc entry appears only on the detail page of accounts whose metadata_type is UnionStore, so you must also enable enable-union-store and select UnionStore as the backend service when creating the account (see the hedged configuration sketch after this note). For details, see Deploy SynxDB Cloud.

  • For instructions on creating and managing SynxML instances via the DBaaS Admin Console, see Manage AI Platforms.
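
For orientation, the two flags from the first note might appear together in a deployment configuration as follows. This is a hedged sketch: the file name, key style, and surrounding structure depend on your deployment package, so treat Deploy SynxDB Cloud as the authority.

```yaml
# Hypothetical excerpt of a SynxDB Cloud deployment configuration.
# Key names match the flags referenced above; everything else is illustrative.
enable-doc-mind: true     # exposes the AI Bot - Doc engine
enable-union-store: true  # allows accounts to select UnionStore as the backend service
```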

Core scenarios & user value

AI Bot - Doc extends the core value of the Unified Lakehouse by bridging the gap between traditional data warehousing and modern AI workflows. It empowers enterprises to achieve:

  • Enterprise knowledge base & RAG: Transform siloed unstructured data into actionable intelligence. Build secure knowledge bases and deploy RAG applications directly within the platform using built-in vectorization.

  • End-to-end data science lifecycle: Eliminate data movement. Run high-performance training jobs (powered by Ray) directly on warehouse storage, ensuring single-source-of-truth accuracy and elastic scaling.

  • Intelligent data applications: Democratize data access. Enable any user to build intelligent search agents and chatbots using built-in “Chat with Data” capabilities (Text-to-SQL, Document Q&A).

Key capabilities

  • Data management: Centralize and organize multimodal assets (PDFs, images) as directory tables and build knowledge bases for semantic search.

  • Development & experiment: A complete workshop for data scientists to explore data, develop with native Notebooks, manage workflows, and train/deploy models.

  • AI applications: Ready-to-use intelligent interfaces for semantic search and conversational analytics (Chat) across your enterprise data.

Underlying architecture

Beneath the AI Bot - Doc interface, the platform leverages the powerful SynxML engine:

  • Compute foundation: Powered by Ray on Kubernetes, providing elastic scalability for compute-intensive ML tasks while keeping them isolated from SQL workloads.

  • Storage: Uses the same object storage as the data warehouse, ensuring a single source of truth.

  • Dual-interface access:

    • SQL interface: For analysts to run predictions and LLM calls via simple SQL functions.

    • Python SDK: For data scientists to leverage the full power of Ray and common ML libraries (PyTorch, XGBoost). A hedged example of both interfaces follows this list.
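
As a rough illustration of the two access paths, the sketch below first issues a SQL call and then fans work out with Ray. It assumes the warehouse speaks the PostgreSQL wire protocol; the ml_predict function name, host, and credentials are placeholders rather than confirmed API surface.

```python
# Hedged sketch of the dual-interface access paths; ml_predict and the
# connection details are illustrative placeholders, not confirmed names.
import psycopg2
import ray

# SQL interface: an analyst runs a model inline from plain SQL.
conn = psycopg2.connect(host="synxdb.example.com", dbname="sales", user="analyst")
with conn.cursor() as cur:
    cur.execute("SELECT id, ml_predict('churn_model', features) FROM customers LIMIT 10")
    print(cur.fetchall())

# Python SDK: a data scientist attaches to the Ray-on-Kubernetes cluster
# and scales a scoring function across workers.
ray.init(address="auto")

@ray.remote
def score_partition(rows):
    return len(rows)  # stand-in for real model scoring logic

print(ray.get([score_partition.remote(list(range(n))) for n in (10, 20, 30)]))
```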

How to use

Manage data assets

The Data Management module allows you to centralize and organize all your enterprise assets in one place. It bridges the gap between raw files (like PDFs and images) and structured analytics through two core components: Document Management and Knowledge Management.

Organize unstructured files (Document Management)

This module enables you to break the traditional boundary between file systems and databases. By utilizing Directory Tables, you can map unstructured files stored in object storage directly to database tables. This allows you to manage file metadata using SQL while retaining the flexibility of the underlying storage.

Manage unstructured files as database tables

Instead of managing loose files in disparate folders, you can create Directory Tables to establish a logical link between your storage layer and the database. This provides a structured view of your assets, automatically tracking metadata such as file paths, sizes, and timestamps.
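
Once a directory table exists, its metadata can be queried like any other table. A minimal sketch, assuming PostgreSQL wire compatibility and the metadata columns surfaced in Data Exploration (relative_path, size, last_modified); the table name and connection details are placeholders.

```python
# Minimal sketch: list the largest files tracked by a directory table.
# Assumes PostgreSQL wire compatibility; connection details are placeholders.
import psycopg2

conn = psycopg2.connect(host="synxdb.example.com", dbname="docs", user="analyst")
with conn.cursor() as cur:
    cur.execute(
        "SELECT relative_path, size, last_modified "
        "FROM contract_documents ORDER BY size DESC LIMIT 20"
    )
    for path, size, modified in cur.fetchall():
        print(path, size, modified)
```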

To create a directory table:

  1. Navigate to Data Management > Document Management in the sidebar.

  2. Click Add Directory Table in the top right corner.

  3. Enter a name for your table (for example, contract_documents or product_images) and confirm. The system will create the table and mapping automatically.

Centralize and organize multimodal assets

You can build a single source of truth for your enterprise data by importing diverse datasets into these tables. The platform supports a wide range of formats, including PDF, Word (.docx), Excel (.xlsx), PowerPoint (.pptx), images, and videos.

To import and tag data:

  1. Click on the name of a created Directory Table to enter its details view. You can switch between List and Grid views to best display your content.

  2. Click Upload Files to open the configuration panel.

  3. Drag and drop files directly into the upload area or click Choose File to select from your local machine.

  4. Before confirming, you can enrich the files’ metadata:

    • Tag: Enter keywords (for example, tag1, tag2) to categorize the batch.

    • Relative Path Prefix: Define a virtual folder structure (for example, prefix) to organize files within the storage bucket without physically moving them.

  5. Click Confirm to start the upload.

Note: For large-scale data engineering, the platform backend also supports importing datasets directly from the Gravitino Virtual File System (GVFS).
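
For reference, Apache Gravitino’s Python client exposes GVFS as an fsspec-style file system; a hedged sketch follows, with the server URI, metalake, and fileset path as placeholders (how the platform backend invokes GVFS internally is deployment-specific).

```python
# Hedged sketch of reading files through the Gravitino Virtual File System.
# Server URI, metalake, and fileset path are placeholders.
from gravitino import gvfs

fs = gvfs.GravitinoVirtualFileSystem(
    server_uri="http://gravitino.example.com:8090",
    metalake_name="my_metalake",
)
print(fs.ls("gvfs://fileset/my_catalog/my_schema/my_fileset/"))
with fs.open("gvfs://fileset/my_catalog/my_schema/my_fileset/data.csv", "rb") as f:
    print(f.read(1024))
```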

Preview content

Before using data for AI training or RAG, you often need to verify its quality. Click on any file name within the list to open the online preview. This allows you to inspect the content instantly in your browser without downloading it.

Build knowledge bases (Knowledge Management)

The Knowledge Management module allows you to transform static documents into active intelligence. It serves as the backbone for Retrieval-Augmented Generation (RAG) applications, enabling your AI agents to understand, cite, and reason with your enterprise data.

Create domain-specific knowledge bases

You can segregate your enterprise knowledge by business domain (for example, “HR Policies,” “Technical Manuals,” or “Legal Contracts”) to ensure that AI responses are precise and contextually relevant.

To create a knowledge base:

  1. Navigate to Data Management > Knowledge Management.

  2. Click Add Knowledge Base.

  3. In the pop-up window, provide a Name and a Description to clearly define the scope of this knowledge base (for example, Name: kb002, Description: Product Manuals for V2.0).

  4. Click Confirm. The new knowledge base will appear in the list, showing its creation time and current document count.

Zero-copy data synchronization

Unlike traditional systems that require you to re-upload files specifically for AI processing, AI Bot - Doc leverages the “One Platform” architecture to sync data directly from your existing directory tables.

To populate your knowledge base:

  1. Click on the name of your newly created knowledge base to enter the details view.

  2. Click Add Documents.

  3. In the “Select documents to add” modal, select the existing Directory Tables or folders you wish to include (for example, dir001, ssj).

  4. Click Confirm (the button may show a count of selected items, for example Confirm (0)). The system automatically triggers a background pipeline to parse, chunk, and embed (vectorize) the documents, conceptually sketched below.
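
Conceptually, that pipeline resembles the following. This is an illustration only: the platform’s actual parser, chunking strategy, and embedding model are internal, and the sentence-transformers model named here is just an example.

```python
# Illustration only: a minimal parse -> chunk -> embed flow, not the
# platform's internal pipeline. The embedding model is an arbitrary example.
from sentence_transformers import SentenceTransformer

def chunk(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks so context survives chunk boundaries."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

model = SentenceTransformer("all-MiniLM-L6-v2")
chunks = chunk(open("manual.txt", encoding="utf-8").read())
vectors = model.encode(chunks)  # one vector per chunk, ready for a vector index
print(len(chunks), vectors.shape)
```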

Incremental updates

As your business data evolves, your knowledge base needs to stay current. The platform provides tools to keep your AI synchronized with your files.

To maintain your knowledge base:

  1. When new files are added to a source Directory Table, click Update from Documents in the Knowledge Base details view.

  2. Select the directory tables you wish to refresh. The system intelligently identifies new or modified files (status shown as Processing or Succeeded) and updates the vector index, saving compute resources.

Develop and experiment

The Development & Experiment module provides a comprehensive workshop for data scientists and engineers. Whether you prefer interactive coding or visual low-code tools, this module offers the necessary environments to explore data, build pipelines, and refine algorithms directly on the platform.

Analyze with notebooks (Notebook)

For data scientists and algorithm engineers who prefer a code-first approach, the platform integrates a native Notebook environment. This feature allows you to perform interactive data analysis, write Python code, and combine executable code with Markdown documentation in a single document, without complex local environment configuration.

Launch your coding workspace

You can rapidly spin up a new environment to start your experiments. Since the compute is decoupled from storage, these notebooks can access your data warehouse assets directly while running on independent resources.

To create a new notebook:

  1. Navigate to Development & Experiment > Notebook in the sidebar.

  2. Click + Create located in the top right corner.

  3. In the pop-up window, enter a unique Name for your notebook (for example, sales_forecast_v1) and an optional Description to help team members understand its purpose.

  4. Click Confirm. Your new notebook will appear in the list immediately, displaying its owner and creation timestamp.
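
Inside a notebook cell, warehouse tables are then reachable directly. A minimal sketch, assuming PostgreSQL wire compatibility; the DSN, table, and column names are placeholders.

```python
# Minimal notebook cell: pull a warehouse table into pandas and plot it.
# The DSN, table, and column names are placeholders.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql+psycopg2://analyst:secret@synxdb.example.com/sales")
df = pd.read_sql("SELECT region, SUM(amount) AS total FROM orders GROUP BY region", engine)
df.plot.bar(x="region", y="total")  # renders inline in the notebook output
```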

Manage and organize your experiments

As your team generates more analysis scripts, efficient management becomes crucial. The platform allows you to maintain a clean workspace or bring in external work.

To manage your notebooks:

  1. Open: Click the Open folder icon next to a notebook to launch the editor in a new tab. Here you can write code, run cells, and visualize outputs.

  2. Import: If you have existing work from other Jupyter environments, click Import next to Create to upload standard .ipynb files directly.

  3. Delete: To remove obsolete experiments, click the red Trash icon in the actions column.

Explore and process data (Data Workshop)

The Data Workshop provides a visual interface for exploring data and managing data processing workflows. It consists of three main modules: Data Exploration, Workflow Management, and Task Management.

Explore data

The Data Exploration module allows you to inspect the structure and content of your databases and tables without writing SQL queries. You can view table metadata, manage directory tables, and interact with your data using AI.

To explore data:

  1. Navigate to Development & Experiment > Data Workshop > Data Exploration.

  2. Select the target Database and Schema from the dropdown menus. The system will display a list of available tables.

  3. In the list view, you can:

    • View table details: Click the table name (for example, dir001) to open the details panel. For directory tables, this view lists the contained files along with their metadata, such as relative_path, size, and last_modified time.

    • Chat with data: Click the Chat icon in the Actions column to start an AI-driven conversation with the specific table.

    • Delete table: Click the Trash icon to remove the table from the database.

Manage workflows

Workflow Management enables you to design, organize, and deploy complex data processing pipelines.

To manage workflows:

  1. Navigate to Development & Experiment > Data Workshop > Workflow Management.

  2. You can perform the following actions:

    • Create workflow: Click + New Workflow to start designing a new pipeline.

    • Import workflow: Click Import to upload an existing workflow definition.

    • Switch views: Use the Grid and List toggles to change how your workflows are displayed.

Note

For detailed instructions on workflow management, refer to AI Bot - Doc’s built-in documentation at http://<ip>:3000/<doc-ai-platform>/docs. Replace <doc-ai-platform> with the actual service path provided in your deployment package.

Train and deploy models (Model Workshop)

The Model Workshop offers a complete environment for training and managing machine learning models. It supports the entire lifecycle from task creation to model deployment.

Start a training task

You can configure and launch new model training jobs with customizable algorithms and hyperparameters.

To create a training task:

  1. Navigate to Development & Experiment > Model Workshop > Create Training Task.

  2. Basic Information:

    • Task Name: Enter a unique name for the training task.

    • Task Description: Provide a brief description of the task’s purpose.

    • Model Type: Select the algorithm family (for example, Sklearn Regression).

    • Model Name: Choose the specific model algorithm (for example, ARDRegression).

  3. Model Configuration:

    • Adjust the model-specific hyperparameters such as max_iter, tol, and regularization parameters (alpha, lambda) to optimize performance; these correspond to scikit-learn parameters (see the hedged sketch after these steps).

    • Configure other settings like fit_intercept (whether to calculate the intercept for this model) and verbose (logging level).

  4. Data Configuration:

    • Database Name: Select the source database containing your training data.

    • Training Dataset: Choose the table or dataset to be used for training.

    • Validation Dataset: Choose a separate dataset for validating the model’s performance.

    • Variable Mapping: Map the columns from your dataset to the model’s inputs.

      • Target column: Select the column you want to predict.

      • Feature columns: Select one or more columns that serve as input features for the prediction.

  5. Click Create to launch the training task. The system will create the task and switch to the Model Tasks view where you can monitor its progress.
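
Since the task above selects the Sklearn Regression family, the configuration maps naturally onto scikit-learn’s ARDRegression. A minimal offline sketch of the equivalent training run, with placeholder datasets and column names (whether the platform uses exactly these defaults is an assumption):

```python
# Offline sketch of a "Sklearn Regression / ARDRegression" training task.
# Datasets and the feature/target mapping are placeholders.
import pandas as pd
from sklearn.linear_model import ARDRegression
from sklearn.metrics import r2_score

train = pd.read_csv("train.csv")  # stands in for the Training Dataset table
valid = pd.read_csv("valid.csv")  # stands in for the Validation Dataset table
features, target = ["f1", "f2", "f3"], "y"

model = ARDRegression(max_iter=300, tol=1e-3, fit_intercept=True, verbose=False)
model.fit(train[features], train[target])
print("validation R^2:", r2_score(valid[target], model.predict(valid[features])))
```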

Monitor training progress

The Training Task List provides a central view to manage and monitor all your model training activities.

To manage model tasks:

  1. Navigate to Development & Experiment > Model Workshop > Model Tasks.

  2. Search and filter: Use the search bar to find tasks by name, or use the filter dropdown to view tasks by status (for example, All Status, Success, Running).

  3. Monitor progress: The list displays key information for each task, including:

    • Task Info: Name and description.

    • Model Info: The algorithm used.

    • Dataset: Training and validation datasets.

    • Status: Current state of the training job.

    • Creator and Create Time.

View model registry

The Model List serves as a registry for all available machine learning models in your environment.

To view models:

  1. Navigate to Development & Experiment > Model Workshop > Model List.

  2. You can search for specific models by name using the search bar. This view provides a quick overview of your model assets, whether they were trained internally or imported.

Import external models

You can register external models into the platform for deployment and management.

To import a model:

  1. Navigate to Development & Experiment > Model Workshop > Import Model (or click Import Model from the Model List view).

  2. Model Description: Enter a description for the model.

  3. Model Type: Select the appropriate algorithm family (for example, Sklearn Regression).

  4. Model Name: Specify the model name (for example, ARDRegression).

  5. Model File: Drag and drop your model file into the upload area, or click to select a file from your local machine (see the sketch after these steps for one way to produce such a file).

  6. Click Submit to register the model.
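
One common way to produce an uploadable file for a scikit-learn family model is joblib serialization, sketched below; whether the platform expects joblib, pickle, or another format is deployment-specific, so verify before uploading.

```python
# Hedged sketch: serialize a fitted ARDRegression for upload as a model file.
# The accepted serialization format is an assumption; confirm for your deployment.
import joblib
from sklearn.linear_model import ARDRegression

model = ARDRegression().fit([[0.0], [1.0], [2.0]], [0.0, 1.0, 2.0])
joblib.dump(model, "ard_regression.joblib")
```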

Deploy and monitor services

The Model Services module allows you to deploy trained models as accessible services.

To manage model services:

  1. Navigate to Development & Experiment > Model Workshop > Model Services.

  2. Deploy service: Click + Deploy to configure and launch a new model service.

  3. Monitor services: You can view the status of deployed services, search for services by task name, and filter by status. The list displays active deployments and lets you manage their lifecycle. A hedged client-call sketch follows this list.
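
Once a service is up, clients typically call it over HTTP. The sketch below is hypothetical: the endpoint path, port, and JSON schema are placeholders, so take the real values from the service’s details page.

```python
# Hypothetical client call to a deployed model service; endpoint and payload
# schema are placeholders, not a documented SynxDB API.
import requests

resp = requests.post(
    "http://synxdb.example.com:8000/models/churn_model/predict",
    json={"instances": [[0.1, 0.2, 0.3]]},
    timeout=10,
)
print(resp.json())
```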

Use AI applications

The AI Applications module provides ready-to-use intelligent interfaces that leverage your indexed data for search and conversational analytics.

Search enterprise knowledge

The Search application enables you to perform semantic and keyword-based searches across your indexed directory tables.

To use Search:

  1. Navigate to AI Applications > Search.

  2. Enter query: Type your query in the search bar.

  3. Configure search scope:

    • Mode: Toggle between All, Text, or Multimedia to filter the type of content you want to retrieve.

    • Similarity Threshold: Adjust the slider (for example, a value below 1.00) to control how strict the semantic matching is (see the illustration after these steps).

    • Scope: Select All Documents or specific subsets to narrow down the search field.

  4. Click Search. The system returns relevant sections from your documents, images, or files that match your query contextually.
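
To build intuition for the Similarity Threshold: semantic search compares embedding vectors, and the threshold bounds how close a document must be to the query to count as a hit. The computation below is illustrative only and is not the platform’s internal scoring.

```python
# Illustrative cosine-similarity filtering; not the platform's internal scoring.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query = np.array([0.2, 0.9, 0.1])
docs = {"doc_a": np.array([0.1, 0.8, 0.2]), "doc_b": np.array([0.9, 0.1, 0.0])}
threshold = 0.80  # tightening the threshold admits only closer matches
hits = {name: score for name, vec in docs.items()
        if (score := cosine(query, vec)) >= threshold}
print(hits)  # doc_a passes; doc_b is filtered out
```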

Chat with your data

The Chat application provides an interactive conversational interface to query your data using natural language. It supports multiple modes to target specific knowledge sources.

To start a chat:

  1. Navigate to AI Applications > Chat.

  2. Select a conversation mode:

    • Knowledge Base Q&A: Interacts with your curated knowledge bases.

    • Document Q&A: Limits the context to specific documents you select.

    • Table Q&A: Enables you to query structured data in database tables using natural language (Text-to-SQL).

  3. Start intelligent chat:

    • Click on a mode to open the chat interface.

    • Type your question in the input box at the bottom.

    • Use New to start a fresh conversation or History to review past interactions.

    • You can also toggle settings such as the underlying model using the controls in the input area.