Vertex AI is Google Cloud's unified ML platform — a fully managed environment for building, training, deploying, and monitoring machine learning models at scale. It covers the complete MLOps lifecycle: from AutoML and custom training jobs through to model serving, batch prediction, feature management, and pipeline orchestration. For data engineers, the most relevant components are Vertex Pipelines (KFP-based ML workflow orchestration), Batch Predictions (large-scale inference against BigQuery), Feature Store (reusable feature serving), and Vertex AI Search & Conversation (enterprise RAG). It integrates natively with BigQuery, Cloud Storage, Dataflow, and Google ADK.
AutoML provides point-and-click model training for tabular, image, text, and video data with no code. Vertex AutoML Tabular runs NAS (Neural Architecture Search) and ensemble methods to find the best model automatically. Export the model as a Docker container or deploy directly to a managed endpoint. Best for teams without ML expertise who need a baseline model quickly.
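A minimal AutoML Tabular sketch with the Python SDK — the dataset ID, display name, and budget are placeholder assumptions:

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

dataset = aiplatform.TabularDataset(
    "projects/my-project/locations/us-central1/datasets/1234"  # hypothetical ID
)

# AutoML handles architecture search and ensembling; the training budget
# is expressed in milli-node-hours (1000 = one node-hour).
job = aiplatform.AutoMLTabularTrainingJob(
    display_name="churn-automl",
    optimization_prediction_type="classification",
)
model = job.run(
    dataset=dataset,
    target_column="churned",
    budget_milli_node_hours=1000,
)
print(model.resource_name)
```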
Package your training code as a Docker container or Python module and submit it as a CustomTrainingJob. Vertex manages the compute (CPU, GPU, or TPU), streams logs to Cloud Logging, and writes outputs to Cloud Storage. Supports distributed training (TensorFlow ParameterServerStrategy, PyTorch DDP) and hyperparameter tuning via Vertex Vizier.
Every trained model is registered in the Model Registry with version tracking, evaluation metrics, and lineage to its training dataset. The ML Metadata Store tracks artifact lineage — which datasets, training runs, and parameters produced each model version. Essential for reproducibility and audit trails.
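A quick registry lookup with the SDK (the display-name filter is a hypothetical example):

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# List registered versions of a model by display name.
for model in aiplatform.Model.list(filter='display_name="churn-model"'):
    print(model.resource_name, model.version_id, model.version_aliases)
```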
| Serving Type | Latency | Scale | Use Case |
|---|---|---|---|
| Online Endpoints | ~10–100 ms | Auto-scales (min 1 replica) | Real-time inference, APIs |
| Batch Predictions | Minutes–hours | Parallelised across managed workers | Offline scoring, BigQuery tables |
| Optimised Endpoints (vLLM) | <50 ms | GPU-backed | LLM serving, high-throughput generation |
Vertex Pipelines is a serverless orchestration service for ML workflows, based on Kubeflow Pipelines (KFP) v2. Pipelines are defined as Python functions decorated with @dsl.component and assembled with @dsl.pipeline. Each component runs in an isolated container. Pipelines are compiled to YAML and submitted to the Vertex API. Output: a DAG visualisation, artifact tracking, and full rerunability.
Vertex AI Feature Store is a managed repository for ML features. Stores feature values in Bigtable (online, low-latency) and BigQuery (offline, bulk export). Key concepts: Feature Group (logical group of features backed by a BigQuery table), Feature View (subset of features served to a model), online serving (single-entity lookup <10 ms), batch serving (bulk export for training).
A Vertex Pipeline runs weekly: extract training data from BigQuery → validate with TFX → train model → evaluate against champion → if the challenger wins, register in Model Registry and deploy to the endpoint. Fully automated CI/CD for ML models, triggered by Cloud Scheduler.
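If you'd rather keep the trigger inside Vertex rather than Cloud Scheduler, newer SDK versions can attach a cron schedule directly to a compiled pipeline — a sketch, assuming the `pipeline.yaml` from the pipeline example below and a hypothetical weekly cron:

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

job = aiplatform.PipelineJob(
    display_name="churn-pipeline-weekly",
    template_path="pipeline.yaml",
    parameter_values={"project": "my-project"},
)
# Runs every Monday at 06:00 UTC — equivalent to a Cloud Scheduler trigger.
schedule = job.create_schedule(
    cron="0 6 * * 1",
    display_name="churn-weekly-schedule",
)
```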
Score millions of customer records daily for churn risk. A deployed model processes a BigQuery input table via Batch Prediction and writes scores to an output BigQuery table. Downstream dbt models join scores to business tables for BI dashboards. No servers to manage — scales horizontally.
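Downstream consumption is plain BigQuery. A sketch of sanity-checking the scored output — the table name and the `prediction` column are hypothetical, since Vertex writes results to a timestamped table under the destination prefix and the output schema depends on the model:

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")
rows = client.query("""
    SELECT customer_id, prediction
    FROM `my-project.ml_predictions.predictions_2025_01_15`  -- hypothetical table
    ORDER BY customer_id
    LIMIT 10
""").result()
for row in rows:
    print(row.customer_id, row.prediction)
```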
A company builds customer lifetime value, churn, and upsell propensity models. By storing shared features (recency, frequency, monetary value, tenure) in Feature Store, each model fetches consistent, point-in-time correct features — preventing training-serving skew and reducing redundant feature engineering work.
Vertex AI Search ingests documents from Cloud Storage and BigQuery, builds a managed vector index with Gemini embeddings, and serves semantic search + grounded generation via API. For data engineers: connect internal data sources (BigQuery exports, GCS documents) to a zero-infra RAG system accessible to business users.
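Querying a provisioned search app goes through the Discovery Engine client. A sketch, assuming a data store named `internal-docs` and the default serving config:

```python
from google.cloud import discoveryengine_v1 as discoveryengine

client = discoveryengine.SearchServiceClient()

# Default serving config for a data store (project and IDs are hypothetical).
serving_config = (
    "projects/my-project/locations/global/collections/default_collection"
    "/dataStores/internal-docs/servingConfigs/default_search"
)

response = client.search(
    discoveryengine.SearchRequest(
        serving_config=serving_config,
        query="What is our churn-model retraining cadence?",
        page_size=5,
    )
)
for result in response.results:
    print(result.document.id)
```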
Submitting a custom training job with the Python SDK:

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

job = aiplatform.CustomTrainingJob(
    display_name="churn-model-training",
    script_path="trainer/train.py",
    container_uri="us-docker.pkg.dev/vertex-ai/training/sklearn-cpu.1-3:latest",
    requirements=["scikit-learn==1.4.0", "pandas==2.2.0"],
    model_serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-3:latest",
)

# The managed dataset is exported and passed to the training container via
# AIP_* environment variables (AIP_TRAINING_DATA_URI etc.); the target column
# is handled inside trainer/train.py.
model = job.run(
    dataset=aiplatform.TabularDataset(
        "projects/my-project/locations/us-central1/datasets/1234"
    ),
    training_fraction_split=0.8,
    validation_fraction_split=0.1,
    test_fraction_split=0.1,
    machine_type="n1-standard-4",
    sync=True,
)
print(f"Model: {model.resource_name}")
```
Deploying the trained model (continuing from the snippet above), then scoring online and in batch:

```python
# Deploy to an online endpoint (online endpoints don't scale to zero).
endpoint = model.deploy(
    machine_type="n1-standard-2",
    min_replica_count=1,
    max_replica_count=5,
    traffic_split={"0": 100},
    sync=True,
)

# Online prediction
prediction = endpoint.predict(instances=[
    {"recency_days": 45, "num_purchases": 3, "total_spend": 120.5},
])
print(prediction.predictions)

# Batch prediction from BigQuery
batch_job = model.batch_predict(
    job_display_name="churn-batch-score-2025-01",
    bigquery_source="bq://my-project.ml_features.customers_to_score",
    bigquery_destination_prefix="bq://my-project.ml_predictions",
    machine_type="n1-standard-4",
    starting_replica_count=2,
    max_replica_count=10,
    sync=True,
)
```
Defining, compiling, and submitting a Vertex Pipeline with KFP v2:

```python
from kfp import dsl
from kfp.dsl import Dataset, Input, Model, Output, component
from google.cloud import aiplatform

@component(
    base_image="python:3.11",
    packages_to_install=["pandas", "google-cloud-bigquery", "db-dtypes"],
)
def extract_features(
    project: str,
    table: str,
    output_dataset: Output[Dataset],
):
    """Extract training features from BigQuery."""
    from google.cloud import bigquery

    client = bigquery.Client(project=project)
    df = client.query(f"SELECT * FROM `{table}`").to_dataframe()
    df.to_csv(output_dataset.path, index=False)

@component(base_image="python:3.11", packages_to_install=["pandas", "scikit-learn"])
def train_model(
    input_dataset: Input[Dataset],
    output_model: Output[Model],
    target_column: str,
):
    """Train a scikit-learn classifier."""
    import joblib
    import pandas as pd
    from sklearn.ensemble import GradientBoostingClassifier

    df = pd.read_csv(input_dataset.path)
    X = df.drop(columns=[target_column])
    y = df[target_column]
    model = GradientBoostingClassifier(n_estimators=100)
    model.fit(X, y)
    joblib.dump(model, output_model.path)

@dsl.pipeline(name="churn-training-pipeline")
def churn_pipeline(project: str, bq_table: str, target_column: str):
    extract_task = extract_features(project=project, table=bq_table)
    train_task = train_model(
        input_dataset=extract_task.outputs["output_dataset"],
        target_column=target_column,
    )

# Compile and submit
from kfp import compiler

compiler.Compiler().compile(churn_pipeline, "pipeline.yaml")

aiplatform.init(project="my-project", location="us-central1")
job = aiplatform.PipelineJob(
    display_name="churn-pipeline-run",
    template_path="pipeline.yaml",
    parameter_values={
        "project": "my-project",
        "bq_table": "my-project.ml.training_features",
        "target_column": "churned",
    },
)
job.submit()
```
Fetching features from the online store:

```python
from google.cloud.aiplatform_v1beta1 import FeatureOnlineStoreServiceClient
from google.cloud.aiplatform_v1beta1.types import (
    FeatureViewDataKey,
    FetchFeatureValuesRequest,
)

# Assume Feature Group / View already provisioned via Terraform or console
PROJECT = "my-project"
LOCATION = "us-central1"
STORE_ID = "customer_features_store"
FEATURE_VIEW_ID = "churn_features"

client = FeatureOnlineStoreServiceClient(
    client_options={"api_endpoint": f"{LOCATION}-aiplatform.googleapis.com"}
)

# Fetch features for a single entity in <10 ms
response = client.fetch_feature_values(
    request=FetchFeatureValuesRequest(
        feature_view=(
            f"projects/{PROJECT}/locations/{LOCATION}"
            f"/featureOnlineStores/{STORE_ID}/featureViews/{FEATURE_VIEW_ID}"
        ),
        data_key=FeatureViewDataKey(key="customer_123"),
    )
)
print(response.key_values)
```
How Vertex AI compares with the other major ML platforms:
| Platform | Cloud | Pipelines | Feature Store | LLM / GenAI Hosting | Best For |
|---|---|---|---|---|---|
| Vertex AI | GCP | KFP v2 (managed) | Yes (Bigtable + BQ) | Gemini + OSS via Model Garden | GCP-native MLOps & GenAI |
| AWS SageMaker | AWS | SageMaker Pipelines | Yes (online + offline) | Bedrock (Claude, Titan, Llama) | AWS-native MLOps |
| Azure ML | Azure | Azure ML Pipelines | Yes | Azure OpenAI + OSS | Azure-native, enterprise security |
| Databricks ML | Multi-cloud | Databricks Workflows / Jobs | Yes (Unity Catalog Feature Engineering) | DBRX, Llama via Model Serving | Lakehouse-centric ML |
| MLflow (self-hosted) | Any | Manual (Airflow/Prefect) | No | Via serving plugins | Open-source experiment tracking |
Tips and gotchas:

- Enable Vertex Model Monitoring on deployed endpoints to catch prediction drift and training-serving skew.
- Pipeline caching keys on component inputs, not on the underlying data: set `enable_caching=False` on data-extraction components, or make cache keys data-content-sensitive (see the sketch below).
- Online endpoints do not scale to zero; the floor is `min_replica_count=1`, so an idle endpoint still bills for at least one replica.
- Use committed use discounts for predictable training workloads to lower cost and secure priority allocation.
- Pipelines are defined in Python (`@dsl.component`, `@dsl.pipeline`) and compiled to YAML, then submitted to the Vertex Pipelines API, which handles all infrastructure provisioning.
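A quick sketch of the caching fix, reusing the `extract_features` and `train_model` components from the pipeline example above — `set_caching_options(False)` disables the cache per task, while `PipelineJob(..., enable_caching=False)` disables it for an entire run:

```python
from kfp import dsl

@dsl.pipeline(name="churn-training-pipeline")
def churn_pipeline(project: str, bq_table: str, target_column: str):
    extract_task = extract_features(project=project, table=bq_table)
    # The cache key sees only the table *name*, not its contents — force a
    # fresh extraction on every run.
    extract_task.set_caching_options(False)
    train_model(
        input_dataset=extract_task.outputs["output_dataset"],
        target_column=target_column,
    )
```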