SynxML SQL

SynxML SQL is a powerful SQL extension that enables machine learning and natural language processing operations directly within your database using SQL commands.

Features

  • Natural language processing: Text summarization, translation, classification, question answering, and text generation.

  • Chat operations: Interactive chat conversations with LLM services.

  • Model management: Register and manage ML service endpoints.

  • XGBoost integration: Train and deploy XGBoost models directly in PostgreSQL.

  • Security: Row-level security and proper privilege management.

  • Help system: Built-in help functions for all operations.

Prerequisites

Before using SynxML SQL in SynxDB Cloud, ensure the following resources are set up:

  • Create organization, account, and user/role via the DBaaS Admin Console.

  • Create a warehouse.

  • Create an ML cluster.

For details, see Use DBaaS Admin Console to create resources.

Prepare database and extension

  1. Create a database (for example, testdb) and connect to it:

    CREATE DATABASE IF NOT EXISTS testdb;
    \c testdb;
    
  2. Create and configure the extension (for example, using the synxml_auth role):

    CREATE EXTENSION IF NOT EXISTS synxml CASCADE;
    SELECT synxml.configure_auth_role('synxml_auth');
    

Manage LLM services

Register LLM services

You can register locally deployed LLM services or third-party services.

-- Register a chat service
SELECT synxml.register_customer_service(
    'LLM', 
    'http://10.14.10.1:8800/vllm/v1',
    'zhipu/glm4-9b-chat',
    '123'
);

-- Register a multimodal embedding service
SELECT synxml.register_customer_service(
    'MM_EMBED',
    'http://10.14.10.1:8000/vl_embedding/v1/embeddings',
    'bge-vl-base'
);

-- Register an Embedding service
SELECT synxml.register_customer_service(
    'EMBED',
    'http://10.14.10.1:8000/embedding/v1/embeddings',
    'jina-embeddings-v2-base-zh'
);

-- Register a rerank service
SELECT synxml.register_customer_service(
    'RERANK',
    'https://api.siliconflow.cn/v1/rerank',
    'BAAI/bge-reranker-v2-m3',
    'sk-xxx'    -- Replace with your actual API key
);

Use registered LLM services

Text transformation:

SELECT synxml.transform(
    task => 'summarization',
    params => '{"text":"Artificial intelligence (AI) is intelligence demonstrated by machines, as opposed to the natural intelligence displayed by animals including humans. AI research has been defined as the field of study of intelligent agents, which refers to any system that perceives its environment and takes actions that maximize its chance of achieving its goals.", "max_length": 50}'::JSONB
);

Text summarization:

SELECT
  synxml.summarize(
    text => 'Artificial intelligence (AI) is intelligence demonstrated by machines, as opposed to the natural intelligence displayed by animals including humans. AI research has been defined as the field of study of intelligent agents, which refers to any system that perceives its environment and takes actions that maximize its chance of achieving its goals.',
    max_length => 20
  );

Translation:

SELECT synxml.translate(
    text => 'Hello',
    source_language => 'en',
    target_language => 'fr'
);

Load data

Example of creating a table and loading data:

CREATE TABLE iris (
  id SERIAL,
  sepal_length DECIMAL,
  sepal_width DECIMAL,
  petal_length DECIMAL,
  petal_width DECIMAL,
  species VARCHAR(50)
);
\copy iris from 'iris.csv' delimiter ',' csv header;

Manage user and permission

SynxML applies row-level security (RLS) for model, service, and job management.

Create a test user and grant permissions. For example:

CREATE ROLE testuser NOLOGIN;
GRANT testuser TO synxml_auth;
GRANT USAGE ON MLCLUSTER ray4synxml TO testuser;
GRANT SELECT ON iris TO testuser;

SET ROLE testuser;

Train model

Example of training an XGBoost model:

SELECT
    synxml.xgboost_train(
        train_tblname => 'iris'::TEXT,          
        model_name => 'xgb_iris_classifier'::TEXT,
        train_config => '{"objective":"multi:softmax", "num_class":3, "y":"target", "num_boost_round":10, "num_workers":4}'::JSONB
);

Note

This UDF call is asynchronous. It will return a job ID. You can check the job status later.

Manage jobs

List jobs

List all submitted jobs:

SELECT * FROM synxml.jobs;

Check job status

-- Replace 'JOB_ID' with the actual job ID you received from the training call
SELECT synxml.job_status('JOB_ID');

Check job logs

-- Replace 'JOB_ID' with the actual job ID you received from the training call
SELECT synxml.job_logs('JOB_ID', 100);

Manage models

Check the models table once the job is completed:

SELECT * FROM synxml.models;
SELECT * FROM synxml.dirtable_models;

Make predictions

Make predictions using the trained model:

SELECT 
    synxml.xgboost_predict(
        test_tblname => 'iris'::TEXT,
        model_name => 'xgb_iris_classifier'::TEXT,
        predict_config => '{"keep_columns":["sepal length (cm)", "sepal width (cm)", "petal length (cm)", "petal width (cm)"], "drop_columns":["target"], "max_num_workers":4}'::JSONB,
        output_tblname => 'prediction_results'::TEXT
    );

Check the prediction results once the job is completed:

SELECT * FROM prediction_results LIMIT 10;

Available UDFs

The following lists all available user defined functions (UDFs) in the synxml schema:

  • synxml._get_db_uri

    • Result data type: text

    • Type: func

  • synxml._has_read_privilege

    • Result data type: boolean

    • Argument data types: table_name text

    • Type: func

  • synxml.answer

    • Result data type: text

    • Argument data types: question text, context text DEFAULT ''::text, temperature double precision DEFAULT 0.3, max_new_tokens integer DEFAULT 1024, llm_service_name text DEFAULT 'LLM'::text

    • Type: func

  • synxml.chat

    • Result data type: text

    • Argument data types: input text, history jsonb DEFAULT '[]'::jsonb, temperature double precision DEFAULT 0.3, max_new_tokens integer DEFAULT 256, llm_service_name text DEFAULT 'LLM'::text

    • Type: func

  • synxml.chunk

    • Result data type: text

    • Argument data types: text text, chunk_size integer, chunk_overlap integer

    • Type: func

  • synxml.classify

    • Result data type: text

    • Argument data types: text text, categories text[], temperature double precision DEFAULT 0.3, llm_service_name text DEFAULT 'LLM'::text

    • Type: func

  • synxml.configure_auth_role

    • Result data type: void

    • Argument data types: auth_role text

    • Type: func

  • synxml.cross_validate

    • Result data type: text

    • Argument data types: model_type text, train_config jsonb, train_tblname text, output_tblname text DEFAULT NULL::text, description text DEFAULT ''::text, jobman_service_name text DEFAULT 'JOBMAN'::text

    • Type: func

  • synxml.embed

    • Result data type: double precision[]

    • Argument data types: input text, embedding_service_name text DEFAULT 'EMBED'::text

    • Type: func

  • synxml.embed_size

    • Result data type: integer

    • Argument data types: embedding_service_name text DEFAULT 'EMBED'::text

    • Type: func

  • synxml.generate

    • Result data type: text

    • Argument data types: text text, style text DEFAULT 'default'::text, temperature double precision DEFAULT 0.3, max_new_tokens integer DEFAULT 256, llm_service_name text DEFAULT 'LLM'::text

    • Type: func

  • synxml.get_algorithms

    • Result data type: text[]

    • Type: func

  • synxml.get_model_by_name

    • Result data type: SETOF synxml.model_info

    • Argument data types: name text

    • Type: func

  • synxml.get_models

    • Result data type: SETOF synxml.model_info

    • Type: func

  • synxml.get_models_by_description

    • Result data type: SETOF synxml.model_info

    • Argument data types: description text

    • Type: func

  • synxml.get_service_setting

    • Result data type: SETOF synxml.service_settings

    • Type: func

  • synxml.get_service_setting

    • Result data type: SETOF synxml.service_settings

    • Argument data types: service_name text

    • Type: func

  • synxml.help

    • Result data type: text

    • Type: func

  • synxml.help

    • Result data type: text

    • Argument data types: function_name text

    • Type: func

  • synxml.hybrid_search

    • Result data type: SETOF synxml.search_result

    • Argument data types: query text, tablename text, text_column text DEFAULT 'chunk'::text, embedding_column text DEFAULT 'embedding'::text, top_k integer DEFAULT 5, embedding_service_name text DEFAULT 'EMBED'::text, rerank_service_name text DEFAULT 'RERANK'::text

    • Type: func

  • synxml.job_info

    • Result data type: TABLE(...)

    • Argument data types: jobid text, jobman_service_name text DEFAULT 'JOBMAN'::text

    • Type: func

  • synxml.job_list

    • Result data type: SETOF synxml.job_status

    • Argument data types: jobman_service_name text DEFAULT 'JOBMAN'::text

    • Type: func

  • synxml.job_logs

    • Result data type: SETOF text

    • Argument data types: jobid text, limits integer DEFAULT 100, jobman_service_name text DEFAULT 'JOBMAN'::text

    • Type: func

  • synxml.job_status

    • Result data type: text

    • Argument data types: jobid text, jobman_service_name text DEFAULT 'JOBMAN'::text

    • Type: func

  • synxml.job_stop

    • Result data type: text

    • Argument data types: jobid text, jobman_service_name text DEFAULT 'JOBMAN'::text

    • Type: func

  • synxml.mlp_predict

    • Result data type: text

    • Argument data types: test_tblname text, model_name text, predict_config jsonb, output_tblname text, description text DEFAULT ''::text, jobman_service_name text DEFAULT 'JOBMAN'::text

    • Type: func

  • synxml.mlp_train

    • Result data type: text

    • Argument data types: train_config jsonb, train_tblname text, model_name text DEFAULT NULL::text, valid_tblname text DEFAULT NULL::text, description text DEFAULT ''::text, jobman_service_name text DEFAULT 'JOBMAN'::text

    • Type: func

  • synxml.multimodal_embed

    • Result data type: double precision[]

    • Argument data types: input_text text DEFAULT ''::text, input_image text DEFAULT ''::text, multimodal_embedding_service_name text DEFAULT 'MM_EMBED'::text

    • Type: func

  • synxml.multimodal_embed_size

    • Result data type: integer

    • Argument data types: multimodal_embedding_service_name text DEFAULT 'MM_EMBED'::text

    • Type: func

  • synxml.param_tune

    • Result data type: text

    • Argument data types: model_type text, train_config jsonb, train_tblname text, output_tblname text DEFAULT NULL::text, description text DEFAULT ''::text, jobman_service_name text DEFAULT 'JOBMAN'::text

    • Type: func

  • synxml.predict

    • Result data type: text

    • Argument data types: model_type text, test_tblname text, model_name text, predict_config jsonb, output_tblname text, description text DEFAULT ''::text, jobman_service_name text DEFAULT 'JOBMAN'::text

    • Type: func

  • synxml.register_customer_service

    • Result data type: text

    • Argument data types: name text, endpoint text, model_name text DEFAULT ''::text, api_key text DEFAULT ''::text, service_provider text DEFAULT 'openai'::text, db_role text DEFAULT CURRENT_USER

    • Type: func

  • synxml.rerank

    • Result data type: SETOF synxml.search_result

    • Argument data types: query text, docs text[], top_n integer DEFAULT 3, rerank_service_name text DEFAULT 'RERANK'::text

    • Type: func

  • synxml.semantic_search

    • Result data type: SETOF synxml.search_result

    • Argument data types: query text, tablename text, text_column text DEFAULT 'chunk'::text, embedding_column text DEFAULT 'embedding'::text, top_k integer DEFAULT 5, embedding_service_name text DEFAULT 'EMBEDD'::text

    • Type: func

  • synxml.summarize

    • Result data type: text

    • Argument data types: text text, max_length integer DEFAULT 100, temperature double precision DEFAULT 0.3, llm_service_name text DEFAULT 'LLM'::text

    • Type: func

  • synxml.train

    • Result data type: text

    • Argument data types: model_type text, train_config jsonb, train_tblname text, model_name text DEFAULT NULL::text, valid_tblname text DEFAULT NULL::text, description text DEFAULT ''::text, jobman_service_name text DEFAULT 'JOBMAN'::text

    • Type: func

  • synxml.transform

    • Result data type: text

    • Argument data types: task text, params jsonb, max_new_tokens integer DEFAULT 512, temperature double precision DEFAULT 0.3, llm_service_name text DEFAULT 'LLM'::text

    • Type: func

  • synxml.translate

    • Result data type: text

    • Argument data types: text text, source_language text, target_language text, temperature double precision DEFAULT 0.3, llm_service_name text DEFAULT 'LLM'::text

    • Type: func

  • synxml.unregister_customer_service

    • Result data type: void

    • Argument data types: service_name text

    • Type: func

  • synxml.xgboost_predict

    • Result data type: text

    • Argument data types: test_tblname text, model_name text, predict_config jsonb, output_tblname text, description text DEFAULT ''::text, jobman_service_name text DEFAULT 'JOBMAN'::text

    • Type: func

  • synxml.xgboost_train

    • Result data type: text

    • Argument data types: train_config jsonb, train_tblname text, model_name text DEFAULT NULL::text, valid_tblname text DEFAULT NULL::text, description text DEFAULT ''::text, jobman_service_name text DEFAULT 'JOBMAN'::text

    • Type: func

Supported algorithms

You can list all supported algorithms using the following query:

SELECT * FROM unnest(synxml.get_algorithms()) AS algorithm;

Supported algorithms include:

  • Regression:

    • ARDRegression

    • AdaBoostRegressor

    • BaggingRegressor

    • BayesianRidge

    • DLinearRegressor

    • DecisionTreeRegressor

    • ElasticNet

    • ExtraTreesRegressor

    • GaussianProcessRegressor

    • GradientBoostingRegressor

    • HuberRegressor

    • KNeighborsRegressor

    • KernelRidge

    • Lars

    • Lasso

    • LassoLars

    • LinearRegression

    • MLP

    • MeanShift

    • PassiveAggressiveRegressor

    • QuantileRegressor

    • RANSACRegressor

    • RNNBlockRegressor

    • RandomForestRegressor

    • Ridge

    • SGDRegressor

    • SVR

    • TheilSenRegressor

    • XGBoost

  • Classification:

    • AdaBoostClassifier

    • BaggingClassifier

    • CatBoost

    • DecisionTreeClassifier

    • ExtraTreesClassifier

    • GradientBoostingClassifier

    • KNeighborsClassifier

    • LightGBM

    • LogisticRegression

    • MLPClassifier

    • RandomForestClassifier

    • SVC

  • Clustering:

    • AffinityPropagation

    • AgglomerativeClustering

    • Birch

    • DBSCAN

    • KMeans

    • MiniBatchKMeans

    • OPTICS

    • SpectralClustering

  • Others:

    • Bert

    • CausalLM

    • OrthogonalMatchingPursuit

    • ResNet

    • TextEmbedder

Get help

To get help for a specific UDF:

-- Replace 'UDF' with the actual UDF name you want to get help for
SELECT synxml.help('UDF');