Skip to content

Use Case: VM Backup and Recovery

Business Context

Ensuring data integrity and availability requires effective VM backup and recovery strategies. This use case explores how to implement comprehensive backup solutions and recovery processes to protect critical VM workloads, ensuring minimal downtime and data loss.

Technical Requirements

Infrastructure Requirements

  • OpenShift 4.12+ with KubeVirt enabled
  • Rubrik Cloud Data Management for comprehensive backup and recovery
  • Network connectivity for Rubrik management and data transfer

Resource Requirements

  • Storage: Adequate space for backup snapshots and archival
  • Network: Sufficient bandwidth for backup data transfer

Architecture Overview

graph TD
    subgraph "OpenShift Cluster"
        VM1["VM Instance 1"]
        VM2["VM Instance 2"]
        VM3["VM Instance 3"]
    end

    subgraph "Rubrik Ecosystem"
        RUBRIK_CDM["Rubrik CDM"]
        RUBRIK_POLARIS["Rubrik Polaris"]
        SLA_DOMAINS["SLA Domains"]
    end

    subgraph "Storage & Replication"
        LOCAL_STORAGE["Local Rubrik Storage"]
        REMOTE_RUBRIK["Remote Rubrik Cluster"]
        CLOUD_ARCHIVE["Cloud Archive (AWS/Azure)"]
    end

    VM1 --> RUBRIK_CDM
    VM2 --> RUBRIK_CDM
    VM3 --> RUBRIK_CDM

    RUBRIK_CDM --> SLA_DOMAINS
    RUBRIK_CDM --> LOCAL_STORAGE
    LOCAL_STORAGE --> REMOTE_RUBRIK
    LOCAL_STORAGE --> CLOUD_ARCHIVE

    RUBRIK_POLARIS --> RUBRIK_CDM
    RUBRIK_POLARIS --> REMOTE_RUBRIK

    style RUBRIK_CDM fill:#f9f,stroke:#333
    style RUBRIK_POLARIS fill:#ff9,stroke:#333
    style SLA_DOMAINS fill:#9ff,stroke:#333

Implementation Steps

Step 1: Setup Backup Infrastructure

Deploy Rubrik Kubernetes Connector

apiVersion: apps/v1
kind: Deployment
metadata:
  name: rubrik-connector
  namespace: rubrik-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: rubrik-connector
  template:
    metadata:
      labels:
        app: rubrik-connector
    spec:
      containers:
      - name: connector
        image: rubrik/kubernetes-connector:latest
        env:
        - name: RUBRIK_CDM_URL
          valueFrom:
            secretKeyRef:
              name: rubrik-credentials
              key: cdm-url
        - name: RUBRIK_USERNAME
          valueFrom:
            secretKeyRef:
              name: rubrik-credentials
              key: username
        - name: RUBRIK_PASSWORD
          valueFrom:
            secretKeyRef:
              name: rubrik-credentials
              key: password
        ports:
        - containerPort: 8080
        volumeMounts:
        - name: kubeconfig
          mountPath: /etc/kubeconfig
          readOnly: true
      volumes:
      - name: kubeconfig
        secret:
          secretName: rubrik-kubeconfig
---
apiVersion: v1
kind: Secret
metadata:
  name: rubrik-credentials
  namespace: rubrik-system
type: Opaque
data:
  cdm-url: <base64-encoded-rubrik-cdm-url>
  username: <base64-encoded-username>
  password: <base64-encoded-password>

Configure Rubrik SLA Domains

# Rubrik SLA Domain Configuration
apiVersion: rubrik.com/v1
kind: SLADomain
metadata:
  name: vm-backup-sla
  namespace: vm-workloads
spec:
  name: "VM Production Backup"
  frequencies:
  - timeUnit: "Daily"
    frequency: 1
    retention: 30  # days
  - timeUnit: "Weekly"
    frequency: 1
    retention: 12  # weeks
  - timeUnit: "Monthly"
    frequency: 1
    retention: 12  # months
  archivalSpecs:
  - locationId: "cloud-archive-location"
    archivalThreshold: 7  # days
  replicationSpecs:
  - locationId: "remote-rubrik-cluster"
    retentionLimit: 90  # days

Step 2: Configure VM Backup Policies

Assign VMs to SLA Domains

apiVersion: rubrik.com/v1
kind: VMBackupPolicy
metadata:
  name: vm-backup-policy
  namespace: vm-workloads
spec:
  selector:
    matchLabels:
      backup: "enabled"
  slaRef:
    name: vm-backup-sla
  backupOptions:
    consistencyType: "application-consistent"
    excludedDisks: []
    includeIndexing: true
---
# Apply backup policy to specific VMs
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: critical-app-vm
  namespace: vm-workloads
  labels:
    backup: "enabled"
    tier: "production"
spec:
  running: true
  template:
    metadata:
      labels:
        backup: "enabled"
    spec:
      domain:
        devices:
          disks:
          - disk:
              bus: virtio
            name: rootdisk
        memory:
          guest: 8Gi
        cpu:
          cores: 4
      volumes:
      - dataVolume:
          name: critical-app-vm-dv
        name: rootdisk

On-Demand Backup Configuration

# Trigger immediate backup using Rubrik CLI
rubrik backup vm --vm-name "critical-app-vm" --namespace "vm-workloads" --sla "VM Production Backup"

# Schedule backup using Rubrik API
curl -X POST "https://rubrik-cdm.example.com/api/internal/vmware/vm/{vm-id}/snapshot" \
  -H "Authorization: Bearer ${RUBRIK_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "slaId": "vm-backup-sla-id",
    "archivalPolicy": {
      "locationId": "cloud-archive-location",
      "retentionDays": 2555
    }
  }'

Step 3: Configure Backup Storage and Replication

Cloud Archive Configuration

apiVersion: rubrik.com/v1
kind: CloudArchive
metadata:
  name: aws-s3-archive
  namespace: rubrik-system
spec:
  provider: "AWS"
  bucketName: "rubrik-vm-backups"
  region: "us-west-2"
  encryptionType: "AWS_KMS"
  credentials:
    secretRef:
      name: aws-credentials
      namespace: rubrik-system
  storageClass: "GLACIER"
  retentionLock: true
  retentionLockDurationDays: 2555  # 7 years
---
apiVersion: v1
kind: Secret
metadata:
  name: aws-credentials
  namespace: rubrik-system
type: Opaque
data:
  access-key-id: <base64-encoded-access-key>
  secret-access-key: <base64-encoded-secret-key>

Replication Target Configuration

apiVersion: rubrik.com/v1
kind: ReplicationTarget
metadata:
  name: dr-site-rubrik
  namespace: rubrik-system
spec:
  targetClusterAddress: "rubrik-dr.example.com"
  credentials:
    secretRef:
      name: dr-rubrik-credentials
  replicationBandwidth: "100 Mbps"
  encryptionInTransit: true
  retentionOnTarget: 90  # days

Step 4: Implement Recovery Procedures

Instant Recovery Configuration

apiVersion: rubrik.com/v1
kind: VMRecovery
metadata:
  name: vm-instant-recovery
  namespace: vm-workloads
spec:
  sourceVM:
    name: "critical-app-vm"
    namespace: "vm-workloads"
  recoveryPoint:
    snapshotId: "snapshot-id-from-rubrik"
    # or use timestamp
    # timestamp: "2024-01-15T10:30:00Z"
  recoveryOptions:
    type: "instant"  # instant, full, or export
    targetNamespace: "vm-recovery"
    targetVMName: "critical-app-vm-recovered"
    powerOn: true
    preserveMAC: false
  networkMapping:
  - sourceNetwork: "production-network"
    targetNetwork: "recovery-network"
---
# Alternative: Full VM Recovery
apiVersion: rubrik.com/v1
kind: VMRecovery
metadata:
  name: vm-full-recovery
  namespace: vm-workloads
spec:
  sourceVM:
    name: "critical-app-vm"
    namespace: "vm-workloads"
  recoveryPoint:
    snapshotId: "latest"
  recoveryOptions:
    type: "full"
    targetNamespace: "vm-workloads"
    targetVMName: "critical-app-vm"
    powerOn: false
    overwriteExisting: true

Granular File Recovery

# Mount VM snapshot for file-level recovery
rubrik vm mount --vm-name "critical-app-vm" \
  --snapshot-id "snapshot-id" \
  --mount-path "/mnt/recovery" \
  --read-only

# Extract specific files
cp /mnt/recovery/important-file.txt /recovery/destination/

# Unmount when recovery is complete
rubrik vm unmount --mount-id "mount-id"

Cross-Platform Recovery

# Export VM to different platform (e.g., VMware to KubeVirt)
apiVersion: rubrik.com/v1
kind: VMExport
metadata:
  name: vm-cross-platform-recovery
  namespace: vm-workloads
spec:
  sourceVM:
    name: "legacy-vmware-vm"
    platform: "vmware"
  exportOptions:
    targetPlatform: "kubevirt"
    targetNamespace: "migrated-workloads"
    format: "ova"
    targetStorage:
      storageClass: "fast-ssd"
      size: "100Gi"
  conversionOptions:
    optimizeForKubernetes: true
    removeVMwareTools: true
    installKubernetesAgents: true

Troubleshooting Guide

Common Issues and Solutions

Rubrik Connector Issues

  • Issue: Rubrik connector cannot communicate with CDM
  • Solution:
  • Verify network connectivity between OpenShift cluster and Rubrik CDM
  • Check Rubrik credentials in the secret
  • Ensure proper RBAC permissions for the connector service account

Backup Failures

  • Issue: VM backups failing or incomplete
  • Solution:
  • Check VM quiesce settings and ensure VM tools are installed
  • Verify SLA domain configuration and policies
  • Monitor Rubrik cluster storage capacity
  • Review backup logs in Rubrik Polaris dashboard

Recovery Performance Issues

  • Issue: Slow recovery or instant recovery not working
  • Solution:
  • Verify network bandwidth between Rubrik and OpenShift cluster
  • Check storage performance on target environment
  • Use Rubrik's storage optimization features
  • Consider using local recovery points for faster access

Replication and Archive Issues

  • Issue: Replication to remote site or cloud archive failing
  • Solution:
  • Verify bandwidth allocation and network policies
  • Check cloud credentials and permissions
  • Review replication target configuration
  • Monitor Rubrik cluster bandwidth utilization

Best Practices

  • Regular Backups: Schedule regular backups and snapshots for critical VMs.
  • Test Recovery: Regularly test recovery procedures using Rubrik capabilities to ensure data integrity.
  • Security and Compliance: Leverage Rubrik's built-in encryption and compliance features.

Integration with RH OVE Ecosystem

Monitoring and Alerting Integration

# Rubrik Metrics ServiceMonitor
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: rubrik-metrics
  namespace: rubrik-system
spec:
  selector:
    matchLabels:
      app: rubrik-connector
  endpoints:
  - port: metrics
    interval: 30s
    path: /metrics
---
# Rubrik Backup Alerts
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: rubrik-backup-alerts
  namespace: rubrik-system
spec:
  groups:
  - name: rubrik.backup
    rules:
    - alert: RubrikBackupFailure
      expr: rubrik_backup_failures_total > 0
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "Rubrik backup failure detected"
        description: "VM backup has failed for {{ $labels.vm_name }}"

    - alert: RubrikClusterCapacity
      expr: rubrik_cluster_used_capacity_percentage > 85
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "Rubrik cluster capacity high"
        description: "Rubrik cluster is {{ $value }}% full"

    - alert: RubrikReplicationDelay
      expr: rubrik_replication_lag_minutes > 60
      for: 15m
      labels:
        severity: warning
      annotations:
        summary: "Rubrik replication lag detected"
        description: "Replication is {{ $value }} minutes behind"

GitOps Integration

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: rubrik-backup-policies
  namespace: argocd
spec:
  project: platform-services
  source:
    repoURL: https://git.example.com/rubrik-backup-config
    targetRevision: HEAD
    path: policies/production
  destination:
    server: https://kubernetes.default.svc
    namespace: vm-workloads
  syncPolicy:
    automated:
      prune: false
      selfHeal: true
    syncOptions:
    - CreateNamespace=true

Multi-Cluster Backup Management

  • Centralized Management: Use Rubrik Polaris to manage backup policies across multiple OpenShift clusters
  • Cross-Cluster Recovery: Implement recovery procedures that can restore VMs to different clusters
  • Global SLA Management: Define consistent backup policies across the entire RH OVE ecosystem

This guide provides the necessary steps and best practices for implementing VM backup and recovery processes in the RH OVE environment, ensuring data resilience and availability for enterprise workloads.