
Temporal Upgrade Guide

This guide provides detailed instructions for upgrading your Temporal deployment from version 1.27.x to 1.29.1, including prerequisites, step-by-step procedures, rollback strategies, and troubleshooting.

Overview

Upgrading Temporal requires careful planning and execution to ensure zero downtime and data integrity. This guide covers:

  • Pre-upgrade checklist and preparation
  • Database schema migrations
  • Server component upgrades
  • Worker and client updates
  • Validation and verification
  • Rollback procedures

Upgrade Path: 1.27.x → 1.29.1

Version Compatibility Matrix

Current Version | Target Version | Direct Upgrade     | Notes
----------------|----------------|--------------------|-----------------------------
1.27.x          | 1.29.1         | ✅ Yes             | Requires schema migration
1.26.x          | 1.29.1         | ✅ Yes             | Requires schema migration
1.25.x          | 1.29.1         | ⚠️ Not recommended | Upgrade to 1.27 first
<1.25           | 1.29.1         | ❌ No              | Multi-step upgrade required
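
A small pre-flight check can enforce this matrix before any migration runs. A minimal sketch, assuming the deployment and namespace names used throughout this guide (the script name is hypothetical):

#!/usr/bin/env bash
# check-upgrade-path.sh (hypothetical helper) - gate the upgrade on the
# currently deployed server version, per the compatibility matrix above.
set -euo pipefail

CURRENT=$(kubectl get deployment temporal-frontend -n temporal-backend \
  -o jsonpath='{.spec.template.spec.containers[0].image}' | cut -d: -f2)

case "$CURRENT" in
  1.27.*|1.26.*) echo "OK: direct upgrade from $CURRENT to 1.29.1 supported" ;;
  1.25.*)        echo "STOP: upgrade to 1.27 first (current: $CURRENT)"; exit 1 ;;
  *)             echo "STOP: multi-step upgrade required (current: $CURRENT)"; exit 1 ;;
esac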

Breaking Changes: 1.27 → 1.29

1. Schema Changes (1.28+)

  • PostgreSQL Schema: v1.16 → v1.17
  • MySQL Schema: v1.16 → v1.17
  • Cassandra Schema: v1.11 → v1.12
  • Visibility Schema: Updates for improved query performance

2. Docker Image Changes (1.29+)

  • Slimmed images with reduced dependencies
  • Separate admin-tools image required
  • Base image changes may affect custom Dockerfiles

3. Metrics Changes (1.29)

  • Some metrics renamed for consistency (see the snapshot sketch after this list)
  • Enhanced cardinality control
  • Legacy metrics format deprecated
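
Because the exact renames depend on your setup, a before/after snapshot of exported metric names makes them easy to spot. A minimal sketch, assuming the frontend metrics endpoint used later in this guide:

# Snapshot exported metric names before the upgrade
kubectl port-forward -n temporal-backend svc/temporal-frontend 9090:9090 &
sleep 2
curl -s http://localhost:9090/metrics | grep -o '^temporal_[a-z_]*' | sort -u \
  > metrics-before.txt
kill %1

# Repeat after the upgrade into metrics-after.txt, then:
# diff metrics-before.txt metrics-after.txt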

4. Configuration Changes

  • New eager workflow start settings (enabled by default)
  • Worker versioning configuration options
  • Enhanced authorization plugin API

Pre-Upgrade Checklist

1. Environment Assessment

# Check current Temporal version
kubectl get deployment temporal-frontend -n temporal-backend -o jsonpath='{.spec.template.spec.containers[0].image}'

# Check database schema version
kubectl exec -it deployment/temporal-admintools -n temporal-backend -- \
  temporal-sql-tool --plugin postgres12 \
  --ep postgresql.example.com \
  -u temporal \
  --db temporal \
  show-schema-version

# Check current resource usage
kubectl top pods -n temporal-backend
kubectl top nodes

2. Backup Strategy

Database Backup

# PostgreSQL backup (no -t/TTY flag, or the binary dump on stdout may be corrupted)
kubectl exec postgresql-primary-0 -n temporal-backend -- \
  pg_dump -U temporal -Fc temporal > temporal-backup-$(date +%Y%m%d-%H%M%S).dump

# Verify backup
pg_restore --list temporal-backup-*.dump | head -20

# Store backup securely
aws s3 cp temporal-backup-*.dump s3://your-backup-bucket/temporal/$(date +%Y%m%d)/

Configuration Backup

# Backup Helm values
helm get values temporal -n temporal-backend > temporal-values-backup-$(date +%Y%m%d).yaml

# Backup Kubernetes resources
kubectl get all,configmap,secret -n temporal-backend -o yaml > k8s-resources-backup-$(date +%Y%m%d).yaml

# Backup custom configurations
cp -r /etc/temporal/config /backup/temporal-config-$(date +%Y%m%d)

Workflow State Verification

# Export critical workflow states (no -it, so output redirects cleanly)
kubectl exec deployment/temporal-admintools -n temporal-backend -- \
  tctl workflow list --query 'ExecutionStatus="Running"' --more --pagesize 1000 \
  > running-workflows-$(date +%Y%m%d).txt

# Count running workflows by type (--print_json availability and the exact
# field path depend on your tctl version)
kubectl exec deployment/temporal-admintools -n temporal-backend -- \
  tctl workflow list --query 'ExecutionStatus="Running"' --print_json | \
  jq -r '.WorkflowType' | sort | uniq -c
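
After the upgrade, re-exporting and diffing the list helps confirm nothing was dropped (expect some churn from workflows that naturally started or completed in between):

# Re-export after the upgrade and compare against the pre-upgrade snapshot
kubectl exec deployment/temporal-admintools -n temporal-backend -- \
  tctl workflow list --query 'ExecutionStatus="Running"' --more --pagesize 1000 \
  > running-workflows-post.txt
diff running-workflows-$(date +%Y%m%d).txt running-workflows-post.txt || true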

3. Staging Environment Testing

Critical: Always test the upgrade in a staging environment first!

# Clone production data to staging (PostgreSQL example)
pg_dump -U temporal -h prod-postgres.example.com temporal | \
  psql -U temporal -h staging-postgres.example.com temporal

# Apply upgrade in staging
# (Follow upgrade procedures in staging first)

# Validate staging environment
./scripts/validate-staging.sh
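
The validation script referenced above is site-specific; a minimal sketch of what it might contain, assuming the staging endpoint and test workflow names used elsewhere in this guide:

#!/usr/bin/env bash
# validate-staging.sh (illustrative sketch) - basic post-upgrade checks
set -euo pipefail
NS=temporal-backend

# Cluster must report SERVING
kubectl exec deployment/temporal-admintools -n "$NS" -- tctl cluster health

# Schema must be at the expected post-migration version
kubectl exec deployment/temporal-admintools -n "$NS" -- \
  temporal-sql-tool --plugin postgres12 --ep staging-postgres.example.com \
  -u temporal --db temporal show-schema-version

# Smoke-test a workflow end to end
kubectl exec deployment/temporal-admintools -n "$NS" -- \
  tctl workflow run --taskqueue test-queue --workflow_type TestWorkflow \
  --execution_timeout 60 --input '"staging-smoke"'
echo "Staging validation passed"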

Upgrade Procedures

Step 1: Prepare the Upgrade

1.1. Download Schema Migration Files

# Download Temporal schema repository
git clone https://github.com/temporalio/temporal.git
cd temporal/schema

# Checkout the target version
git checkout v1.29.1

# Verify schema files
ls -la postgresql/v12/temporal/versioned/v1.17/
ls -la postgresql/v12/visibility/versioned/v1.17/

1.2. Review Migration Scripts

# Review schema changes
cat postgresql/v12/temporal/versioned/v1.17/*.sql
cat postgresql/v12/visibility/versioned/v1.17/*.sql

# Check for breaking changes
grep -i "drop\|alter\|rename" postgresql/v12/temporal/versioned/v1.17/*.sql

1.3. Schedule Maintenance Window

For production environments:

  • Recommended window: 2-4 hours
  • Low-traffic period: Preferred
  • Communication: Notify stakeholders
  • Rollback time: Reserve 50% of the window for potential rollback

Step 2: Database Schema Migration

2.1. Pre-Migration Validation

# Verify database connectivity
temporal-sql-tool --plugin postgres12 \
  --ep postgresql.example.com \
  -u temporal \
  --db temporal \
  validate

# Check current schema version
temporal-sql-tool --plugin postgres12 \
  --ep postgresql.example.com \
  -u temporal \
  --db temporal \
  show-schema-version

# Expected output for 1.27.x:
# Current database schema version: 1.16

2.2. Dry-Run Migration

# Dry-run schema update (no changes applied)
temporal-sql-tool --plugin postgres12 \
  --ep postgresql.example.com \
  -u temporal \
  -p 5432 \
  --db temporal \
  --tls \
  --tls-cert-file /path/to/client.crt \
  --tls-key-file /path/to/client.key \
  --tls-ca-file /path/to/ca.crt \
  update-schema \
  -d ./postgresql/v12/temporal/versioned \
  --dry-run

# Review the output carefully

2.3. Apply Schema Migration

# Migrate default store (temporal database)
temporal-sql-tool --plugin postgres12 \
  --ep postgresql.example.com \
  -u temporal \
  -p 5432 \
  --db temporal \
  --tls \
  --tls-cert-file /path/to/client.crt \
  --tls-key-file /path/to/client.key \
  --tls-ca-file /path/to/ca.crt \
  update-schema \
  -d ./postgresql/v12/temporal/versioned

# Migrate visibility store
temporal-sql-tool --plugin postgres12 \
  --ep postgresql.example.com \
  -u temporal \
  -p 5432 \
  --db temporal_visibility \
  --tls \
  --tls-cert-file /path/to/client.crt \
  --tls-key-file /path/to/client.key \
  --tls-ca-file /path/to/ca.crt \
  update-schema \
  -d ./postgresql/v12/visibility/versioned

2.4. Verify Schema Migration

# Verify new schema version
temporal-sql-tool --plugin postgres12 \
  --ep postgresql.example.com \
  -u temporal \
  --db temporal \
  show-schema-version

# Expected output:
# Current database schema version: 1.17

# Verify tables and indexes
psql -U temporal -h postgresql.example.com -d temporal -c "\dt"
psql -U temporal -h postgresql.example.com -d temporal -c "\di"
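
To fail fast in automation, the version check can be scripted. This assumes the output format shown above ("Current database schema version: X.Y"):

# Abort if the migration did not land on the expected version
VER=$(temporal-sql-tool --plugin postgres12 --ep postgresql.example.com \
  -u temporal --db temporal show-schema-version | grep -o '[0-9][0-9.]*$')
[ "$VER" = "1.17" ] || { echo "Unexpected schema version: $VER"; exit 1; }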

Step 3: Update Helm Values

3.1. Create Updated Values File

# values-1.29.yaml
server:
  image:
    repository: temporalio/server
    tag: 1.29.1
    pullPolicy: IfNotPresent

  replicaCount: 3

  config:
    # Enable eager workflow start (new default in 1.29)
    services:
      frontend:
        eagerWorkflowStartEnabled: true
        rateLimit:
          eagerWorkflowStart:
            maxPerSecond: 100
            burstSize: 200

    # Existing persistence configuration
    persistence:
      defaultStore: default
      visibilityStore: visibility
      numHistoryShards: 4096

      datastores:
        default:
          driver: "postgres12"
          host: "postgresql.example.com"
          port: 5432
          database: "temporal"
          user: "temporal"
          existingSecret: "temporal-default-store"
          maxConns: 50
          maxIdleConns: 10
          maxConnLifetime: "1h"

        visibility:
          driver: "postgres12"
          host: "postgresql.example.com"
          port: 5432
          database: "temporal_visibility"
          user: "temporal"
          existingSecret: "temporal-visibility-store"
          maxConns: 20
          maxIdleConns: 5
          maxConnLifetime: "1h"

    # Update TLS settings to 1.3
    global:
      tls:
        internode:
          server:
            minVersion: "1.3"
            certFile: /etc/temporal/certs/tls.crt
            keyFile: /etc/temporal/certs/tls.key
            clientCAFile: /etc/temporal/certs/ca.crt
          client:
            minVersion: "1.3"
            certFile: /etc/temporal/certs/tls.crt
            keyFile: /etc/temporal/certs/tls.key
            caFile: /etc/temporal/certs/ca.crt

admintools:
  image:
    repository: temporalio/admin-tools
    tag: 1.29.1-tctl-1.18.2-cli-1.3.0
    pullPolicy: IfNotPresent

web:
  image:
    repository: temporalio/ui
    tag: 2.40.0  # UI version paired with this upgrade
    pullPolicy: IfNotPresent

3.2. Diff Current and New Configuration

# Compare configurations
helm get values temporal -n temporal-backend > current-values.yaml
diff -u current-values.yaml values-1.29.yaml

# Review differences carefully

Step 4: Rolling Upgrade of Temporal Server

4.1. Update Admin Tools First

# Upgrade admin tools (safe, no impact on running workflows)
helm upgrade temporal temporalio/temporal \
  -n temporal-backend \
  --reuse-values \
  --set admintools.image.tag=1.29.1-tctl-1.18.2-cli-1.3.0 \
  --wait

# Verify admin tools
kubectl get pods -n temporal-backend -l app=temporal-admintools
kubectl logs -n temporal-backend -l app=temporal-admintools --tail=50

4.2. Upgrade Frontend Service

# Frontend handles client connections - upgrade carefully.
# Note: in the standard Helm chart, server.image.tag is shared by all server
# services, so this step may roll history/matching/worker as well; stage
# per service only if your chart exposes per-service image tags.
helm upgrade temporal temporalio/temporal \
  -n temporal-backend \
  --reuse-values \
  --set server.image.tag=1.29.1 \
  --set server.frontend.replicaCount=3 \
  --wait \
  --timeout 10m

# Monitor frontend rollout
kubectl rollout status deployment/temporal-frontend -n temporal-backend

# Verify frontend health
kubectl exec -it deployment/temporal-admintools -n temporal-backend -- \
  tctl cluster health

4.3. Upgrade History Service

# History service manages workflow state - critical component
helm upgrade temporal temporalio/temporal \
  -n temporal-backend \
  --reuse-values \
  --set server.image.tag=1.29.1 \
  --wait \
  --timeout 15m

# Monitor history rollout (this takes longest)
kubectl rollout status deployment/temporal-history -n temporal-backend

# Verify no workflow disruptions
kubectl logs -n temporal-backend -l app=temporal-history --tail=100 | \
  grep -i "error\|fatal"

4.4. Upgrade Matching Service

# Matching service handles task queues
helm upgrade temporal temporalio/temporal \
  -n temporal-backend \
  --reuse-values \
  --set server.image.tag=1.29.1 \
  --wait

# Monitor matching rollout
kubectl rollout status deployment/temporal-matching -n temporal-backend

4.5. Upgrade Worker Service

# Internal worker service
helm upgrade temporal temporalio/temporal \
  -n temporal-backend \
  --reuse-values \
  --set server.image.tag=1.29.1 \
  --wait

# Monitor worker rollout
kubectl rollout status deployment/temporal-worker -n temporal-backend

4.6. Complete Upgrade with Full Values

# Apply all configuration changes
helm upgrade temporal temporalio/temporal \
  -n temporal-backend \
  -f values-1.29.yaml \
  --wait \
  --timeout 20m

# Verify all components
kubectl get pods -n temporal-backend
kubectl get deployment -n temporal-backend
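
A quick way to confirm every server pod picked up the new tag (illustrative; this also matches non-server pods, so scope the selector to your chart's labels if needed):

# List pod images and flag any still on the old tag
kubectl get pods -n temporal-backend \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}{end}' \
  | grep -v 1.29.1 || echo "All pods on 1.29.1"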

Step 5: Validate Upgrade

5.1. Cluster Health Check

# Check cluster health
kubectl exec -it deployment/temporal-admintools -n temporal-backend -- \
  tctl cluster health

# Expected output:
# SERVING

# Check all services
kubectl exec -it deployment/temporal-admintools -n temporal-backend -- \
  tctl admin cluster describe

5.2. Workflow Validation

# Verify running workflows
kubectl exec -it deployment/temporal-admintools -n temporal-backend -- \
  tctl workflow list --query 'ExecutionStatus="Running"'

# Test workflow start (using eager start)
kubectl exec -it deployment/temporal-admintools -n temporal-backend -- \
  tctl workflow start \
    --taskqueue test-queue \
    --workflow_type TestWorkflow \
    --execution_timeout 300 \
    --input '"test-upgrade"'

# Verify workflow execution
kubectl exec -it deployment/temporal-admintools -n temporal-backend -- \
  tctl workflow show -w <workflow-id>

5.3. Metrics Verification

# Check Prometheus metrics
kubectl port-forward -n temporal-backend svc/temporal-frontend 9090:9090 &
sleep 2  # give the port-forward a moment to establish
curl -s http://localhost:9090/metrics | grep temporal_

# Verify new metrics are present
curl -s http://localhost:9090/metrics | grep "temporal_request_latency"
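
To script this, a loop over the metric families this guide relies on (names as used above; adjust to your scrape setup):

for m in temporal_request_latency temporal_request_errors_total; do
  curl -s http://localhost:9090/metrics | grep -q "^$m" \
    && echo "OK: $m" || echo "MISSING: $m"
done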

5.4. API Testing

# test_upgrade.py - Test client connectivity
import asyncio
import time

from temporalio.client import Client

from workflows import TestWorkflow  # assumed: your existing workflow definition

async def test_connection():
    client = await Client.connect("temporal.example.com:7233")

    # Test namespace access (list_workflows returns an async iterator)
    async for _ in client.list_workflows("WorkflowType='TestWorkflow'"):
        break

    # Test workflow start (validates eager start)
    handle = await client.start_workflow(
        TestWorkflow.run,
        id=f"test-upgrade-{int(time.time())}",
        task_queue="test-queue",
    )

    result = await handle.result()
    print(f"Upgrade validation successful: {result}")

asyncio.run(test_connection())

Step 6: Update Workers and Clients

6.1. Update Python SDK in Workers

# Update requirements.txt or pyproject.toml
# From: temporalio>=1.7.0
# To:   temporalio>=1.18.2

# Using uv (recommended)
cd /path/to/worker
uv pip install temporalio==1.18.2

# Or using pip
pip install --upgrade temporalio==1.18.2

# Rebuild worker images
docker build -t your-registry/temporal-worker:v2.0.0 .
docker push your-registry/temporal-worker:v2.0.0

6.2. Deploy Updated Workers

# k8s/worker-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: temporal-worker
  namespace: temporal-product
spec:
  replicas: 3
  selector:
    matchLabels:
      app: temporal-worker
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0  # Zero downtime
  template:
    metadata:
      labels:
        app: temporal-worker
    spec:
      containers:
      - name: worker
        image: your-registry/temporal-worker:v2.0.0
        env:
        - name: TEMPORAL_HOST
          value: "temporal-frontend.temporal-backend:7233"
        - name: SDK_VERSION
          value: "1.18.2"

# Deploy updated workers
kubectl apply -f k8s/worker-deployment.yaml

# Monitor rollout (zero downtime)
kubectl rollout status deployment/temporal-worker -n temporal-product

# Verify workers are processing tasks
kubectl logs -n temporal-product -l app=temporal-worker --tail=100

6.3. Update Client Applications

# Update client applications gradually
import asyncio

from temporalio.client import Client

from workflows import OrderWorkflow  # assumed: your existing workflow definition

# New features available in 1.18.2+
async def use_new_features():
    client = await Client.connect(
        "temporal.example.com:7233",
        namespace="production"
    )

    # Start the workflow...
    handle = await client.start_workflow(
        OrderWorkflow.run,
        id="order-12345",
        task_queue="orders"
    )

    # ...then execute a workflow update against the handle
    # (Workflow Update is GA on the server side in 1.28+)
    result = await handle.execute_update(
        OrderWorkflow.update_status,
        "processing"
    )
    print(f"Update result: {result}")

asyncio.run(use_new_features())

Step 7: Post-Upgrade Monitoring

7.1. Monitor for 24-48 Hours

# Setup monitoring dashboard
cat <<EOF > prometheus-queries.yaml
queries:
  # Request latency
  - temporal_request_latency_bucket

  # Error rates
  - rate(temporal_request_errors_total[5m])

  # Workflow execution rates
  - rate(temporal_workflow_execution_started[5m])

  # Worker polling
  - temporal_worker_task_slots_available

  # Database connections
  - temporal_persistence_requests_total
EOF

# Alert on anomalies
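
To make "alert on anomalies" concrete, a sample Prometheus alerting rule built on the error-rate query above (the threshold is illustrative; tune it to your baseline):

cat <<EOF > temporal-upgrade-alerts.yaml
groups:
  - name: temporal-upgrade
    rules:
      - alert: TemporalErrorRateHigh
        expr: sum(rate(temporal_request_errors_total[5m])) > 1
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: Elevated Temporal error rate after the 1.29.1 upgrade
EOF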

7.2. Performance Comparison

# Compare metrics before/after upgrade
# - Workflow start latency (should improve with eager start)
# - Task processing throughput
# - Database query performance
# - Resource utilization
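
One way to capture the before/after numbers is the Prometheus HTTP API. A sketch assuming a reachable Prometheus instance and the latency histogram named above (the operation label is an assumption; check your metric labels):

# p95 StartWorkflowExecution latency over the last 5 minutes
PROM=http://prometheus.example.com:9090
curl -sG "$PROM/api/v1/query" \
  --data-urlencode 'query=histogram_quantile(0.95, sum(rate(temporal_request_latency_bucket{operation="StartWorkflowExecution"}[5m])) by (le))' \
  | jq -r '.data.result[0].value[1]'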

Rollback Procedures

When to Rollback

Rollback if you encounter:

  • Persistent cluster health failures
  • High error rates (>5% increase)
  • Workflow execution failures
  • Database connectivity issues
  • Critical feature regressions

Rollback Steps

1. Rollback Temporal Server

# Rollback to previous version
helm rollback temporal -n temporal-backend

# Or specify specific revision
helm rollback temporal <revision-number> -n temporal-backend

# Verify rollback
kubectl get pods -n temporal-backend
kubectl exec -it deployment/temporal-admintools -n temporal-backend -- \
  tctl cluster health

2. Rollback Database Schema (If Necessary)

# Restore database from backup (--clean --if-exists drops existing objects first)
pg_restore -U temporal -h postgresql.example.com -d temporal \
  --clean --if-exists \
  temporal-backup-YYYYMMDD-HHMMSS.dump

# Verify schema version
temporal-sql-tool --plugin postgres12 \
  --ep postgresql.example.com \
  -u temporal \
  --db temporal \
  show-schema-version

3. Rollback Workers

# Revert to previous worker image
kubectl set image deployment/temporal-worker \
  worker=your-registry/temporal-worker:v1.0.0 \
  -n temporal-product

kubectl rollout status deployment/temporal-worker -n temporal-product

Troubleshooting Common Issues

Issue 1: Schema Migration Fails

Symptoms:

Error: schema version mismatch

Resolution:

# Check schema version
temporal-sql-tool --plugin postgres12 \
  --ep postgresql.example.com \
  -u temporal \
  --db temporal \
  show-schema-version

# Force schema update if stuck
temporal-sql-tool --plugin postgres12 \
  --ep postgresql.example.com \
  -u temporal \
  --db temporal \
  update-schema \
  -d ./postgresql/v12/temporal/versioned \
  --version 1.17

Issue 2: Frontend Connection Errors

Symptoms:

rpc error: code = Unavailable desc = connection error

Resolution:

# Check frontend pods
kubectl get pods -n temporal-backend -l app=temporal-frontend

# Check frontend logs
kubectl logs -n temporal-backend -l app=temporal-frontend --tail=200

# Verify service endpoints
kubectl get endpoints temporal-frontend -n temporal-backend

# Test connectivity
kubectl run -it --rm debug --image=busybox --restart=Never -- \
  nc -zv temporal-frontend.temporal-backend.svc.cluster.local 7233

Issue 3: Worker Version Mismatch

Symptoms:

Worker SDK version incompatible with server

Resolution:

# Update worker SDK
pip install --upgrade temporalio==1.18.2

# Rebuild and redeploy workers
docker build -t your-registry/temporal-worker:latest .
kubectl rollout restart deployment/temporal-worker -n temporal-product

Issue 4: Eager Workflow Start Issues

Symptoms:

Eager workflow start failed, falling back to normal start

Resolution:

# Adjust rate limits in Helm values
server:
  config:
    services:
      frontend:
        eagerWorkflowStartEnabled: true
        rateLimit:
          eagerWorkflowStart:
            maxPerSecond: 200  # Increase limit
            burstSize: 400

Issue 5: High Database Load

Symptoms:

  • Slow query performance
  • Connection pool exhaustion

Resolution:

# Adjust connection pool settings
server:
  config:
    persistence:
      datastores:
        default:
          maxConns: 100  # Increase connections
          maxIdleConns: 20
          maxConnLifetime: "30m"  # Reduce lifetime

Best Practices

1. Gradual Rollout

  • Upgrade staging first
  • Use canary deployments for workers (see the sketch after this list)
  • Monitor extensively at each stage
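
A minimal canary flow for the worker fleet (worker-canary.yaml is a hypothetical one-replica copy of the worker deployment pointing at the new image):

# Run one updated replica alongside the existing fleet
kubectl apply -f k8s/worker-canary.yaml -n temporal-product

# Watch the canary process tasks before committing
kubectl logs -n temporal-product -l app=temporal-worker-canary --tail=100

# If healthy, roll the main deployment and remove the canary
kubectl apply -f k8s/worker-deployment.yaml
kubectl delete deployment temporal-worker-canary -n temporal-product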

2. Communication

  • Notify stakeholders of maintenance window
  • Prepare rollback communications
  • Document all changes

3. Automation

#!/bin/bash
# Upgrade automation script
set -e

./scripts/backup-database.sh
./scripts/test-schema-migration.sh
./scripts/upgrade-temporal.sh
./scripts/validate-upgrade.sh
./scripts/monitor-health.sh

4. Testing

  • Test all critical workflows post-upgrade
  • Validate new features work as expected
  • Performance benchmark comparison

Additional Resources

Support

For upgrade assistance:

  • Temporal Community Forum
  • Temporal Slack
  • Professional support: support@temporal.io