Vertex AI

Vertex AI is Google Cloud's unified ML platform — a fully managed environment for building, training, deploying, and monitoring machine learning models at scale. It covers the complete MLOps lifecycle: from AutoML and custom training jobs through to model serving, batch prediction, feature management, and pipeline orchestration. For data engineers, the most relevant components are Vertex Pipelines (KFP-based ML workflow orchestration), Batch Predictions (large-scale inference against BigQuery or Cloud Storage), Feature Store (reusable feature serving), and Vertex AI Search & Conversation (enterprise RAG). It integrates natively with BigQuery, Cloud Storage, Dataflow, and Google ADK.

Tags: GCP · Vertex Pipelines (KFP) · Feature Store · AutoML · Custom Training · Model Registry

Table of Contents

  1. Core Concepts
  2. Industry Use Cases
  3. Code Examples
  4. Comparison Table
  5. Gotchas & Pitfalls
  6. Exercises
  7. Quiz
  8. Further Reading

Core Concepts

1. AutoML

AutoML provides point-and-click model training for tabular, image, text, and video data with no code. Vertex AutoML Tabular runs NAS (Neural Architecture Search) and ensemble methods to find the best model automatically. Export the model as a Docker container or deploy directly to a managed endpoint. Best for teams without ML expertise who need a baseline model quickly.

2. Custom Training

Package your training code as a Docker container or Python module and submit it as a CustomTrainingJob. Vertex manages the compute (choose from CPU, GPU, TPU), streams logs to Cloud Logging, and writes outputs to Cloud Storage. Supports distributed training (TF ParameterServer, PyTorch DDP) and hyperparameter tuning via Vertex Vizier.

3. Model Registry & Artifacts

Every trained model is registered in the Model Registry with version tracking, evaluation metrics, and lineage to its training dataset. The ML Metadata Store tracks artifact lineage — which datasets, training runs, and parameters produced each model version. Essential for reproducibility and audit trails.

4. Endpoints & Serving

| Serving Type | Latency | Scale | Use Case |
| --- | --- | --- | --- |
| Online Endpoints | ~10–100 ms | Auto-scaling (min 1 replica) | Real-time inference, APIs |
| Batch Predictions | Minutes–hours | Parallelised across managed workers | Offline scoring, BigQuery tables |
| Optimised Endpoints (vLLM) | <50 ms | GPU-backed | LLM serving, high-throughput generation |

5. Vertex Pipelines

Vertex Pipelines is a serverless orchestration service for ML workflows, based on Kubeflow Pipelines (KFP) v2. Pipelines are defined as Python functions decorated with @dsl.component and assembled with @dsl.pipeline. Each component runs in an isolated container. Pipelines are compiled to YAML and submitted to the Vertex API. Each run produces a DAG visualisation, artifact tracking, and full re-runnability.

6. Feature Store

Vertex AI Feature Store is a managed repository for ML features. Stores feature values in Bigtable (online, low-latency) and BigQuery (offline, bulk export). Key concepts: Feature Group (logical group of features backed by a BigQuery table), Feature View (subset of features served to a model), online serving (single-entity lookup <10 ms), batch serving (bulk export for training).

↑ Back to top

Industry Use Cases

1. Automated ML Retraining Pipeline

A Vertex Pipeline runs weekly: extract training data from BigQuery → validate with TFX → train model → evaluate against champion → if challenger wins, register in Model Registry and deploy to endpoint. Fully automated CI/CD for ML models triggered by Cloud Scheduler.
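
The champion/challenger gate in this flow is plain comparison logic outside any Vertex API; a minimal sketch, assuming an AUC evaluation metric and an illustrative 0.005 promotion margin (the function name and threshold are hypothetical):

```python
# Hypothetical promotion gate for the weekly retraining pipeline: promote the
# challenger only when it beats the champion by a minimum margin, so noise in
# the evaluation metric does not trigger constant redeployments.
def should_promote(champion_auc: float, challenger_auc: float,
                   min_improvement: float = 0.005) -> bool:
    """Return True when the challenger clears the improvement margin."""
    return (challenger_auc - champion_auc) >= min_improvement

print(should_promote(0.84, 0.85))   # True: +0.01 clears the 0.005 margin
print(should_promote(0.84, 0.841))  # False: +0.001 does not
```

In a real pipeline this check would live in an evaluation component whose boolean output gates the register-and-deploy steps.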

2. Large-Scale Batch Prediction on BigQuery

Score millions of customer records daily for churn risk. A deployed model processes a BigQuery input table via Batch Prediction, writes scores to an output BigQuery table. Downstream dbt models join scores to business tables for BI dashboards. No servers to manage — scales horizontally.

3. Feature Reuse Across Multiple Models

A company builds customer lifetime value, churn, and upsell propensity models. By storing shared features (recency, frequency, monetary value, tenure) in Feature Store, each model fetches consistent, point-in-time correct features — preventing training-serving skew and reducing redundant feature engineering work.
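
Point-in-time correctness is the property that makes these exports safe: a training row may only see feature values that were already known at its label timestamp. A stdlib sketch of the lookup rule (data and names are illustrative, not the Feature Store API):

```python
import bisect

# Hypothetical point-in-time lookup: return the latest feature value known at
# or before the label timestamp, never a later one, which would leak the
# future into training and cause training-serving skew.
def point_in_time_value(history, as_of):
    """history: list of (timestamp, value) pairs sorted by timestamp."""
    timestamps = [ts for ts, _ in history]
    idx = bisect.bisect_right(timestamps, as_of)
    if idx == 0:
        return None  # no value was known yet at as_of
    return history[idx - 1][1]

spend_history = [(1, 10.0), (5, 42.0), (9, 77.0)]
print(point_in_time_value(spend_history, 6))  # 42.0 (the t=9 value is ignored)
```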

4. Enterprise RAG with Vertex AI Search

Vertex AI Search ingests documents from Cloud Storage and BigQuery, builds a managed vector index with Gemini embeddings, and serves semantic search + grounded generation via API. For data engineers: connect internal data sources (BigQuery exports, GCS documents) to a zero-infra RAG system accessible to business users.

↑ Back to top

Code Examples

1. Custom Training Job

from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

job = aiplatform.CustomTrainingJob(
    display_name="churn-model-training",
    script_path="trainer/train.py",
    container_uri="us-docker.pkg.dev/vertex-ai/training/sklearn-cpu.1-3:latest",
    requirements=["scikit-learn==1.4.0", "pandas==2.2.0"],
    model_serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-3:latest",
)

model = job.run(
    dataset=aiplatform.TabularDataset("projects/my-project/locations/us-central1/datasets/1234"),
    args=["--target-column=churned"],  # CustomTrainingJob has no target_column; pass it to train.py as a CLI arg
    training_fraction_split=0.8,
    validation_fraction_split=0.1,
    test_fraction_split=0.1,
    machine_type="n1-standard-4",
    replica_count=1,
    sync=True,
)

print(f"Model: {model.resource_name}")

2. Deploy Model & Run Batch Prediction

from google.cloud import aiplatform

# Deploy to online endpoint
endpoint = model.deploy(
    machine_type="n1-standard-2",
    min_replica_count=1,
    max_replica_count=5,
    traffic_split={"0": 100},
    sync=True,
)

# Online prediction
prediction = endpoint.predict(instances=[
    {"recency_days": 45, "num_purchases": 3, "total_spend": 120.5},
])
print(prediction.predictions)

# Batch prediction from BigQuery
batch_job = model.batch_predict(
    job_display_name="churn-batch-score-2025-01",
    bigquery_source="bq://my-project.ml_features.customers_to_score",
    bigquery_destination_prefix="bq://my-project.ml_predictions",
    instances_format="bigquery",
    predictions_format="bigquery",
    machine_type="n1-standard-4",
    starting_replica_count=2,
    max_replica_count=10,
    sync=True,
)

3. Vertex Pipelines (KFP v2)

from kfp import dsl
from kfp.dsl import component, Dataset, Input, Model, Output
from google.cloud import aiplatform

@component(base_image="python:3.11", packages_to_install=["pandas", "db-dtypes", "google-cloud-bigquery"])
def extract_features(
    project: str,
    table: str,
    output_dataset: Output[Dataset],
):
    """Extract training features from BigQuery."""
    from google.cloud import bigquery

    client = bigquery.Client(project=project)
    df = client.query(f"SELECT * FROM `{table}`").to_dataframe()
    df.to_csv(output_dataset.path, index=False)

@component(base_image="python:3.11", packages_to_install=["pandas", "scikit-learn"])
def train_model(
    input_dataset: Input[Dataset],
    output_model: Output[Model],
    target_column: str,
):
    """Train a scikit-learn classifier."""
    import pandas as pd, joblib
    from sklearn.ensemble import GradientBoostingClassifier
    df = pd.read_csv(input_dataset.path)
    X = df.drop(columns=[target_column])
    y = df[target_column]
    model = GradientBoostingClassifier(n_estimators=100)
    model.fit(X, y)
    joblib.dump(model, output_model.path)

@dsl.pipeline(name="churn-training-pipeline")
def churn_pipeline(project: str, bq_table: str, target_column: str):
    extract_task = extract_features(project=project, table=bq_table)
    train_task = train_model(
        input_dataset=extract_task.outputs["output_dataset"],
        target_column=target_column,
    )

# Compile and submit
from kfp import compiler
compiler.Compiler().compile(churn_pipeline, "pipeline.yaml")

aiplatform.init(project="my-project", location="us-central1")
job = aiplatform.PipelineJob(
    display_name="churn-pipeline-run",
    template_path="pipeline.yaml",
    parameter_values={
        "project": "my-project",
        "bq_table": "my-project.ml.training_features",
        "target_column": "churned",
    },
)
job.submit()

4. Feature Store Online Serving

from google.cloud.aiplatform_v1beta1 import FeatureOnlineStoreServiceClient
from google.cloud.aiplatform_v1beta1.types import FetchFeatureValuesRequest

# Assume Feature Group / View already provisioned via Terraform or console
PROJECT = "my-project"
LOCATION = "us-central1"
STORE_ID = "customer_features_store"
FEATURE_VIEW_ID = "churn_features"

client = FeatureOnlineStoreServiceClient(
    client_options={"api_endpoint": f"{LOCATION}-aiplatform.googleapis.com"}
)

# Fetch features for a single entity in <10 ms
response = client.fetch_feature_values(
    request=FetchFeatureValuesRequest(
        feature_view=(
            f"projects/{PROJECT}/locations/{LOCATION}"
            f"/featureOnlineStores/{STORE_ID}/featureViews/{FEATURE_VIEW_ID}"
        ),
        data_key=FetchFeatureValuesRequest.DataKey(key="customer_123"),
    )
)
print(response.key_values)

↑ Back to top

Comparison Table

| Platform | Cloud | Pipelines | Feature Store | LLM / GenAI Hosting | Best For |
| --- | --- | --- | --- | --- | --- |
| Vertex AI | GCP | KFP v2 (managed) | Yes (Bigtable + BigQuery) | Gemini + OSS via Model Garden | GCP-native MLOps & GenAI |
| AWS SageMaker | AWS | SageMaker Pipelines | Yes (online + offline) | Bedrock (Claude, Titan, Llama) | AWS-native MLOps |
| Azure ML | Azure | Azure ML Pipelines | Yes | Azure OpenAI + OSS | Azure-native, enterprise security |
| Databricks ML | Multi-cloud | MLflow Projects / Jobs | Feature Engineering (Unity Catalog) | DBRX, Llama via Model Serving | Lakehouse-centric ML |
| MLflow (self-hosted) | Any | Manual (Airflow/Prefect) | No | Via serving plugins | Open-source experiment tracking |
↑ Back to top

Gotchas & Pitfalls

↑ Back to top

Exercises

  1. End-to-End Pipeline: Implement a 3-component Vertex Pipeline (extract from BigQuery, train a scikit-learn model, evaluate and register in Model Registry). Use the Vertex AI Workbench notebook to develop and test each component locally before compiling.
  2. Batch Scoring Automation: Deploy any registered model and set up a Cloud Scheduler job that triggers a batch prediction daily against a BigQuery source table. Write the results to a separate BigQuery table and verify with a dbt test checking that output row count = input row count.
  3. Feature Store Integration: Create a Feature Group backed by a BigQuery table with at least 5 customer features. Provision an online serving endpoint and write a Python script that fetches features for a given customer ID and passes them to a deployed model for real-time scoring.
↑ Back to top

Quiz

  1. What is the difference between an Online Endpoint and Batch Prediction in Vertex AI?
    Answer: Online Endpoints serve synchronous, low-latency predictions (one or a few records at a time, ~10–100 ms). Batch Prediction processes large datasets asynchronously (BigQuery tables, Cloud Storage files), at high throughput but higher latency (minutes to hours). Online is for real-time APIs; Batch for offline scoring.
  2. What problem does Vertex AI Feature Store solve?
    Answer: It prevents training-serving skew by storing feature values in a single system used for both training data export (offline/batch) and low-latency online serving. It also enables feature reuse across multiple models, reducing duplicated feature engineering.
  3. Why does Vertex Pipelines component caching sometimes cause stale results?
    Answer: Components are cached by their input parameters and code hash. If the underlying data in BigQuery changes but the query string doesn't, the cached output is reused. To avoid this, disable caching on data-extraction components or include a time-based parameter (e.g. run date) in the component inputs.
  4. What is the relationship between Vertex AI and Kubeflow Pipelines (KFP)?
    Answer: Vertex Pipelines is the managed, serverless compute layer for running KFP v2 pipelines. Pipelines are written using the KFP SDK (@dsl.component, @dsl.pipeline) and compiled to YAML, then submitted to the Vertex Pipelines API which handles all infrastructure provisioning.
  5. How does Vertex AI Model Monitoring detect training-serving skew?
    Answer: It compares the statistical distribution of model inputs at serving time (sampled from online endpoint traffic) against a training baseline dataset. When a configurable drift threshold (e.g. Jensen-Shannon divergence > 0.1) is exceeded for a feature, it sends an alert to Cloud Monitoring / email.
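
The Jensen-Shannon divergence named in answer 5 can be sketched with the standard library; the histograms below are made up for illustration and stand in for a training baseline and serving traffic:

```python
import math

# Jensen-Shannon divergence (natural log) between two discrete distributions:
# JSD(P, Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M), where M = (P + Q) / 2.
def js_divergence(p, q):
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    def kl(a, b):
        return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

train = [0.7, 0.2, 0.1]  # baseline histogram of one feature
serve = [0.4, 0.4, 0.2]  # same feature observed at serving time
print(round(js_divergence(train, serve), 3))  # 0.046
```

A monitoring job would compute this per feature and alert when the value crosses the configured threshold.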
↑ Back to top

Further Reading

↑ Back to top