Vertex AI

Vertex AI is Google Cloud's unified ML platform — a fully managed environment for building, training, deploying, and monitoring machine learning models at scale. It covers the complete MLOps lifecycle: from AutoML and custom training jobs through to model serving, batch prediction, feature management, and pipeline orchestration. For data engineers, the most relevant components are Vertex Pipelines (KFP-based ML workflow orchestration), Batch Predictions (large-scale inference against BigQuery or Cloud Storage), Feature Store (reusable feature serving), and Vertex AI Search & Conversation (enterprise RAG). It integrates natively with BigQuery, Cloud Storage, Dataflow, and Google ADK.

Tags: GCP · Vertex Pipelines (KFP) · Feature Store · AutoML · Custom Training · Model Registry

Table of Contents

  1. Core Concepts
  2. Industry Use Cases
  3. Code Examples
  4. Comparison Table
  5. Gotchas & Pitfalls
  6. Exercises
  7. Quiz
  8. Further Reading

Core Concepts

1. AutoML

AutoML provides point-and-click model training for tabular, image, text, and video data with no code. Vertex AutoML Tabular runs NAS (Neural Architecture Search) and ensemble methods to find the best model automatically. Export the model as a Docker container or deploy directly to a managed endpoint. Best for teams without ML expertise who need a baseline model quickly.

2. Custom Training

Package your training code as a Docker container or Python module and submit it as a CustomTrainingJob. Vertex manages the compute (choose from CPU, GPU, TPU), streams logs to Cloud Logging, and writes outputs to Cloud Storage. Supports distributed training (TF ParameterServer, PyTorch DDP) and hyperparameter tuning via Vertex Vizier.

3. Model Registry & Artifacts

Every trained model is registered in the Model Registry with version tracking, evaluation metrics, and lineage to its training dataset. The ML Metadata Store tracks artifact lineage — which datasets, training runs, and parameters produced each model version. Essential for reproducibility and audit trails.

4. Endpoints & Serving

| Serving Type | Latency | Scale | Use Case |
| --- | --- | --- | --- |
| Online Endpoints | ~10–100 ms | Auto-scaling (min 1 replica) | Real-time inference, APIs |
| Batch Predictions | Minutes–hours | Parallelised across managed workers | Offline scoring, BigQuery tables |
| Optimised Endpoints (vLLM) | <50 ms | GPU-backed | LLM serving, high-throughput generation |

5. Vertex Pipelines

Vertex Pipelines is a serverless orchestration service for ML workflows, based on Kubeflow Pipelines (KFP) v2. Pipelines are defined as Python functions decorated with @dsl.component and assembled with @dsl.pipeline. Each component runs in an isolated container. Pipelines are compiled to YAML and submitted to the Vertex API. Each run produces a DAG visualisation, artifact tracking, and full re-runnability.

6. Feature Store

Vertex AI Feature Store is a managed repository for ML features. Stores feature values in Bigtable (online, low-latency) and BigQuery (offline, bulk export). Key concepts: Feature Group (logical group of features backed by a BigQuery table), Feature View (subset of features served to a model), online serving (single-entity lookup <10 ms), batch serving (bulk export for training).

↑ Back to top

Industry Use Cases

1. Automated ML Retraining Pipeline

A Vertex Pipeline runs weekly: extract training data from BigQuery → validate with TFX → train model → evaluate against champion → if challenger wins, register in Model Registry and deploy to endpoint. Fully automated CI/CD for ML models triggered by Cloud Scheduler.
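
The champion/challenger gate in this flow is plain comparison logic outside any Vertex API; a minimal sketch, assuming an AUC evaluation metric and an illustrative 0.005 promotion margin (the function name and threshold are hypothetical):

```python
# Hypothetical promotion gate for the weekly retraining pipeline: promote the
# challenger only when it beats the champion by a minimum margin, so noise in
# the evaluation metric does not trigger constant redeployments.
def should_promote(champion_auc: float, challenger_auc: float,
                   min_improvement: float = 0.005) -> bool:
    """Return True when the challenger clears the improvement margin."""
    return (challenger_auc - champion_auc) >= min_improvement

print(should_promote(0.84, 0.85))   # True: +0.01 clears the 0.005 margin
print(should_promote(0.84, 0.841))  # False: +0.001 does not
```

In a real pipeline this check would live in an evaluation component whose boolean output gates the register-and-deploy steps.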

2. Large-Scale Batch Prediction on BigQuery

Score millions of customer records daily for churn risk. A deployed model processes a BigQuery input table via Batch Prediction, writes scores to an output BigQuery table. Downstream dbt models join scores to business tables for BI dashboards. No servers to manage — scales horizontally.

3. Feature Reuse Across Multiple Models

A company builds customer lifetime value, churn, and upsell propensity models. By storing shared features (recency, frequency, monetary value, tenure) in Feature Store, each model fetches consistent, point-in-time correct features — preventing training-serving skew and reducing redundant feature engineering work.
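
Point-in-time correctness is the property that makes these exports safe: a training row may only see feature values that were already known at its label timestamp. A stdlib sketch of the lookup rule (data and names are illustrative, not the Feature Store API):

```python
import bisect

# Hypothetical point-in-time lookup: return the latest feature value known at
# or before the label timestamp, never a later one, which would leak the
# future into training and cause training-serving skew.
def point_in_time_value(history, as_of):
    """history: list of (timestamp, value) pairs sorted by timestamp."""
    timestamps = [ts for ts, _ in history]
    idx = bisect.bisect_right(timestamps, as_of)
    if idx == 0:
        return None  # no value was known yet at as_of
    return history[idx - 1][1]

spend_history = [(1, 10.0), (5, 42.0), (9, 77.0)]
print(point_in_time_value(spend_history, 6))  # 42.0 (the t=9 value is ignored)
```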

4. Enterprise RAG with Vertex AI Search

Vertex AI Search ingests documents from Cloud Storage and BigQuery, builds a managed vector index with Gemini embeddings, and serves semantic search + grounded generation via API. For data engineers: connect internal data sources (BigQuery exports, GCS documents) to a zero-infra RAG system accessible to business users.

↑ Back to top

Code Examples

1. Custom Training Job

from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

job = aiplatform.CustomTrainingJob(
    display_name="churn-model-training",
    script_path="trainer/train.py",
    container_uri="us-docker.pkg.dev/vertex-ai/training/sklearn-cpu.1-3:latest",
    requirements=["scikit-learn==1.4.0", "pandas==2.2.0"],
    model_serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-3:latest",
)

model = job.run(
    dataset=aiplatform.TabularDataset("projects/my-project/locations/us-central1/datasets/1234"),
    args=["--target-column=churned"],  # CustomTrainingJob has no target_column; pass it to train.py as a CLI arg
    training_fraction_split=0.8,
    validation_fraction_split=0.1,
    test_fraction_split=0.1,
    machine_type="n1-standard-4",
    replica_count=1,
    sync=True,
)

print(f"Model: {model.resource_name}")

2. Deploy Model & Run Batch Prediction

from google.cloud import aiplatform

# Deploy to online endpoint
endpoint = model.deploy(
    machine_type="n1-standard-2",
    min_replica_count=1,
    max_replica_count=5,
    traffic_split={"0": 100},
    sync=True,
)

# Online prediction
prediction = endpoint.predict(instances=[
    {"recency_days": 45, "num_purchases": 3, "total_spend": 120.5},
])
print(prediction.predictions)

# Batch prediction from BigQuery
batch_job = model.batch_predict(
    job_display_name="churn-batch-score-2025-01",
    bigquery_source="bq://my-project.ml_features.customers_to_score",
    bigquery_destination_prefix="bq://my-project.ml_predictions",
    instances_format="bigquery",
    predictions_format="bigquery",
    machine_type="n1-standard-4",
    starting_replica_count=2,
    max_replica_count=10,
    sync=True,
)

3. Vertex Pipelines (KFP v2)

from kfp import dsl
from kfp.dsl import component, Dataset, Input, Model, Output
from google.cloud import aiplatform

@component(base_image="python:3.11", packages_to_install=["pandas", "db-dtypes", "google-cloud-bigquery"])
def extract_features(
    project: str,
    table: str,
    output_dataset: Output[Dataset],
):
    """Extract training features from BigQuery."""
    from google.cloud import bigquery

    client = bigquery.Client(project=project)
    df = client.query(f"SELECT * FROM `{table}`").to_dataframe()
    df.to_csv(output_dataset.path, index=False)

@component(base_image="python:3.11", packages_to_install=["pandas", "scikit-learn"])
def train_model(
    input_dataset: Input[Dataset],
    output_model: Output[Model],
    target_column: str,
):
    """Train a scikit-learn classifier."""
    import pandas as pd, joblib
    from sklearn.ensemble import GradientBoostingClassifier
    df = pd.read_csv(input_dataset.path)
    X = df.drop(columns=[target_column])
    y = df[target_column]
    model = GradientBoostingClassifier(n_estimators=100)
    model.fit(X, y)
    joblib.dump(model, output_model.path)

@dsl.pipeline(name="churn-training-pipeline")
def churn_pipeline(project: str, bq_table: str, target_column: str):
    extract_task = extract_features(project=project, table=bq_table)
    train_task = train_model(
        input_dataset=extract_task.outputs["output_dataset"],
        target_column=target_column,
    )

# Compile and submit
from kfp import compiler
compiler.Compiler().compile(churn_pipeline, "pipeline.yaml")

aiplatform.init(project="my-project", location="us-central1")
job = aiplatform.PipelineJob(
    display_name="churn-pipeline-run",
    template_path="pipeline.yaml",
    parameter_values={
        "project": "my-project",
        "bq_table": "my-project.ml.training_features",
        "target_column": "churned",
    },
)
job.submit()

4. Feature Store Online Serving

from google.cloud.aiplatform_v1beta1 import FeatureOnlineStoreServiceClient
from google.cloud.aiplatform_v1beta1.types import FetchFeatureValuesRequest

# Assume Feature Group / View already provisioned via Terraform or console
PROJECT = "my-project"
LOCATION = "us-central1"
STORE_ID = "customer_features_store"
FEATURE_VIEW_ID = "churn_features"

client = FeatureOnlineStoreServiceClient(
    client_options={"api_endpoint": f"{LOCATION}-aiplatform.googleapis.com"}
)

# Fetch features for a single entity in <10 ms
response = client.fetch_feature_values(
    request=FetchFeatureValuesRequest(
        feature_view=(
            f"projects/{PROJECT}/locations/{LOCATION}"
            f"/featureOnlineStores/{STORE_ID}/featureViews/{FEATURE_VIEW_ID}"
        ),
        data_key=FetchFeatureValuesRequest.DataKey(key="customer_123"),
    )
)
print(response.key_values)

↑ Back to top

Comparison Table

| Platform | Cloud | Pipelines | Feature Store | LLM / GenAI Hosting | Best For |
| --- | --- | --- | --- | --- | --- |
| Vertex AI | GCP | KFP v2 (managed) | Yes (Bigtable + BigQuery) | Gemini + OSS via Model Garden | GCP-native MLOps & GenAI |
| AWS SageMaker | AWS | SageMaker Pipelines | Yes (online + offline) | Bedrock (Claude, Titan, Llama) | AWS-native MLOps |
| Azure ML | Azure | Azure ML Pipelines | Yes | Azure OpenAI + OSS | Azure-native, enterprise security |
| Databricks ML | Multi-cloud | MLflow Projects / Jobs | Feature Engineering (Unity Catalog) | DBRX, Llama via Model Serving | Lakehouse-centric ML |
| MLflow (self-hosted) | Any | Manual (Airflow/Prefect) | No | Via serving plugins | Open-source experiment tracking |
↑ Back to top

Gotchas & Pitfalls

↑ Back to top

Exercises

  1. End-to-End Pipeline: Implement a 3-component Vertex Pipeline (extract from BigQuery, train a scikit-learn model, evaluate and register in Model Registry). Use the Vertex AI Workbench notebook to develop and test each component locally before compiling.
  2. Batch Scoring Automation: Deploy any registered model and set up a Cloud Scheduler job that triggers a batch prediction daily against a BigQuery source table. Write the results to a separate BigQuery table and verify with a dbt test checking that output row count = input row count.
  3. Feature Store Integration: Create a Feature Group backed by a BigQuery table with at least 5 customer features. Provision an online serving endpoint and write a Python script that fetches features for a given customer ID and passes them to a deployed model for real-time scoring.
↑ Back to top

Quiz

  1. What is the difference between an Online Endpoint and Batch Prediction in Vertex AI?
    Answer: Online Endpoints serve synchronous, low-latency predictions (one or a few records at a time, ~10–100 ms). Batch Prediction processes large datasets asynchronously (BigQuery tables, Cloud Storage files), at high throughput but higher latency (minutes to hours). Online is for real-time APIs; Batch for offline scoring.
  2. What problem does Vertex AI Feature Store solve?
    Answer: It prevents training-serving skew by storing feature values in a single system used for both training data export (offline/batch) and low-latency online serving. It also enables feature reuse across multiple models, reducing duplicated feature engineering.
  3. Why does Vertex Pipelines component caching sometimes cause stale results?
    Answer: Components are cached by their input parameters and code hash. If the underlying data in BigQuery changes but the query string doesn't, the cached output is reused. To avoid this, disable caching on data-extraction components or include a time-based parameter (e.g. run date) in the component inputs.
  4. What is the relationship between Vertex AI and Kubeflow Pipelines (KFP)?
    Answer: Vertex Pipelines is the managed, serverless compute layer for running KFP v2 pipelines. Pipelines are written using the KFP SDK (@dsl.component, @dsl.pipeline) and compiled to YAML, then submitted to the Vertex Pipelines API which handles all infrastructure provisioning.
  5. How does Vertex AI Model Monitoring detect training-serving skew?
    Answer: It compares the statistical distribution of model inputs at serving time (sampled from online endpoint traffic) against a training baseline dataset. When a configurable drift threshold (e.g. Jensen-Shannon divergence > 0.1) is exceeded for a feature, it sends an alert to Cloud Monitoring / email.
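
The Jensen-Shannon divergence named in answer 5 can be sketched with the standard library; the histograms below are made up for illustration and stand in for a training baseline and serving traffic:

```python
import math

# Jensen-Shannon divergence (natural log) between two discrete distributions:
# JSD(P, Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M), where M = (P + Q) / 2.
def js_divergence(p, q):
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    def kl(a, b):
        return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

train = [0.7, 0.2, 0.1]  # baseline histogram of one feature
serve = [0.4, 0.4, 0.2]  # same feature observed at serving time
print(round(js_divergence(train, serve), 3))  # 0.046
```

A monitoring job would compute this per feature and alert when the value crosses the configured threshold.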
↑ Back to top

Further Reading

↑ Back to top