Use Case: VM Scaling and Performance Optimization

Business Context

Dynamic scaling and performance optimization keep VM workloads responsive while controlling resource costs. This use case demonstrates how to implement auto-scaling, resource optimization, and performance monitoring for virtual machines in the RH OVE ecosystem.

Technical Requirements

Infrastructure Requirements

  • OpenShift 4.12 or later with OpenShift Virtualization (KubeVirt) enabled
  • Horizontal Pod Autoscaler (HPA) support
  • Metrics server and custom metrics API
  • Persistent storage with performance monitoring capabilities
  • CPU and memory monitoring tools (Prometheus/Grafana)
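
A quick sanity check before proceeding (the namespaces shown are typical defaults; adjust to your installation):

# Confirm OpenShift Virtualization / KubeVirt is installed
kubectl get kubevirt -n openshift-cnv

# Confirm the resource metrics API backing HPA/VPA is available
kubectl get apiservices v1beta1.metrics.k8s.io
kubectl top nodes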

Resource Requirements

  • CPU: Variable based on workload demands
  • Memory: Dynamic allocation based on usage patterns
  • Storage: High-performance storage with IOPS monitoring
  • Network: Low-latency network for performance-sensitive applications

Architecture Overview

graph TD
    subgraph "Monitoring Stack"
        PROMETHEUS["Prometheus"]
        GRAFANA["Grafana"]
        ALERTMANAGER["AlertManager"]
    end

    subgraph "Scaling Infrastructure"
        HPA["Horizontal Pod Autoscaler"]
        VPA["Vertical Pod Autoscaler"]
        METRICS["Metrics Server"]
    end

    subgraph "VM Workloads"
        VM1["VM Instance 1"]
        VM2["VM Instance 2"]
        VM3["VM Instance 3"]
    end

    PROMETHEUS --> ALERTMANAGER
    GRAFANA --> PROMETHEUS
    METRICS --> HPA
    METRICS --> VPA
    HPA --> VM1
    HPA --> VM2
    VPA --> VM3

    VM1 --> PROMETHEUS
    VM2 --> PROMETHEUS
    VM3 --> PROMETHEUS

    style PROMETHEUS fill:#f9f,stroke:#333
    style HPA fill:#ff9,stroke:#333

Implementation Steps

Step 1: Enable VM Performance Monitoring

Deploy VM with Performance Monitoring

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: performance-vm
  namespace: vm-workloads
spec:
  running: true
  template:
    metadata:
      labels:
        app: performance-vm
        monitoring: "enabled"
    spec:
      domain:
        cpu:
          cores: 2
          model: host-passthrough
        devices:
          disks:
          - disk:
              bus: virtio
            name: rootdisk
          interfaces:
          - name: default
            bridge: {}
        memory:
          guest: 4Gi
        resources:
          requests:
            memory: 4Gi
            cpu: 2
          limits:
            memory: 8Gi
            cpu: 4
      networks:
      - name: default
        pod: {}
      volumes:
      - dataVolume:
          name: performance-vm-dv
        name: rootdisk
---
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: performance-vm-dv
  namespace: vm-workloads
spec:
  source:
    http:
      url: "https://vm-images.example.com/performance-vm.img"
  pvc:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 50Gi
    storageClassName: fast-ssd
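
With the manifests saved locally (performance-vm.yaml is an assumed filename), deploy and verify the VM:

kubectl apply -f performance-vm.yaml
kubectl get vm,dv -n vm-workloads
kubectl get vmi performance-vm -n vm-workloads -o wide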

Step 2: Configure Horizontal Pod Autoscaler for VMs

HPA Configuration

Note: a standalone VirtualMachine does not expose the scale subresource, so the HPA must target a scalable KubeVirt resource such as a VirtualMachineInstanceReplicaSet. The example below assumes the workload is managed by a replica set named performance-vm-replicaset.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: vm-hpa
  namespace: vm-workloads
spec:
  scaleTargetRef:
    apiVersion: kubevirt.io/v1
    kind: VirtualMachineInstanceReplicaSet
    name: performance-vm-replicaset
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 25
        periodSeconds: 60
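
Once applied, scaling activity and current utilization can be followed with:

kubectl describe hpa vm-hpa -n vm-workloads
kubectl get hpa vm-hpa -n vm-workloads --watch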

Step 3: Implement Vertical Pod Autoscaler

VPA Configuration

Note: VPA acts on the virt-launcher pods that run the VM (the compute container referenced below is the VM's container inside virt-launcher). Start with updateMode: "Off" or "Initial" to review recommendations first, since "Auto" applies them by evicting the pod, which restarts the VM.

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: vm-vpa
  namespace: vm-workloads
spec:
  targetRef:
    apiVersion: kubevirt.io/v1
    kind: VirtualMachine
    name: performance-vm
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: compute
      minAllowed:
        cpu: 100m
        memory: 1Gi
      maxAllowed:
        cpu: 8
        memory: 16Gi
      controlledResources: ["cpu", "memory"]
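
Recommendations are surfaced on the VPA object itself, so they can be reviewed before any automatic updates are enabled:

kubectl describe vpa vm-vpa -n vm-workloads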

Step 4: Performance Monitoring and Alerting

ServiceMonitor for VM Metrics

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: vm-performance-monitor
  namespace: vm-workloads
  labels:
    app: vm-monitor
spec:
  selector:
    matchLabels:
      monitoring: "enabled"
  endpoints:
  - port: metrics
    interval: 30s
    path: /metrics
    honorLabels: true
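
A ServiceMonitor selects Services rather than pods, so a Service carrying the monitoring label and a named metrics port must exist. A minimal sketch, assuming an exporter inside the guest (for example node_exporter) listening on port 9100; KubeVirt's own kubevirt_vmi_* metrics are scraped from virt-handler and do not need this Service:

apiVersion: v1
kind: Service
metadata:
  name: performance-vm-metrics
  namespace: vm-workloads
  labels:
    monitoring: "enabled"
spec:
  selector:
    app: performance-vm
  ports:
  - name: metrics
    port: 9100
    targetPort: 9100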

Performance Alerting Rules

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: vm-performance-alerts
  namespace: vm-workloads
spec:
  groups:
  - name: vm.performance
    rules:
    # The kubevirt_vmi_* series below are exposed by virt-handler; verify the
    # exact metric names against your KubeVirt version before relying on them.
    - alert: VMHighCPUUsage
      expr: rate(kubevirt_vmi_cpu_usage_seconds_total[5m]) > 0.8
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "VM {{ $labels.name }} has high CPU usage"
        description: "VM {{ $labels.name }} has consumed more than 0.8 CPU cores for more than 5 minutes; adjust the threshold to the VM's vCPU allocation."

    - alert: VMHighMemoryUsage
      expr: kubevirt_vmi_memory_available_bytes / kubevirt_vmi_memory_domain_bytes < 0.1
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "VM {{ $labels.name }} has high memory usage"
        description: "VM {{ $labels.name }} has had less than 10% of guest memory available for more than 5 minutes."

    - alert: VMDiskIOHigh
      expr: sum by (namespace, name) (rate(kubevirt_vmi_storage_iops_read_total[5m]) + rate(kubevirt_vmi_storage_iops_write_total[5m])) > 1000
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "VM {{ $labels.name }} has high disk I/O"
        description: "VM {{ $labels.name }} disk IOPS is above 1000 for more than 10 minutes."

Step 5: Performance Optimization Strategies

CPU Pinning for Performance-Critical VMs

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: high-performance-vm
  namespace: vm-workloads
spec:
  running: true
  template:
    metadata:
      annotations:
        cpu-load-balancing.crio.io: "disable"
        cpu-quota.crio.io: "disable"
        irq-load-balancing.crio.io: "disable"
    spec:
      domain:
        cpu:
          cores: 4
          sockets: 1
          threads: 1
          dedicatedCpuPlacement: true
          isolateEmulatorThread: true
          model: host-passthrough
        devices:
          disks:
          - disk:
              bus: virtio
            name: rootdisk
        memory:
          guest: 8Gi
          hugepages:
            pageSize: 1Gi
        resources:
          requests:
            memory: 8Gi
            cpu: 4
          limits:
            memory: 8Gi
            cpu: 4
      nodeSelector:
        node-role.kubernetes.io/worker: ""
        performance-node: "true"
      volumes:
      - dataVolume:
          name: high-performance-vm-dv  # defined by the DataVolume sketch in Step 6
        name: rootdisk
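
Dedicated CPU placement and 1Gi hugepages require prepared nodes: the kubelet CPU Manager must run with a static policy and 1Gi hugepages must be preallocated (on OpenShift, typically via a PerformanceProfile). The performance-node label is an assumed convention; label and verify candidate nodes with:

kubectl label node <node-name> performance-node=true
kubectl describe node <node-name> | grep -i hugepages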

Step 6: Storage Performance Optimization

High-Performance Storage Configuration

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: high-performance-ssd
provisioner: ebs.csi.aws.com  # example: AWS EBS CSI driver; substitute the CSI provisioner for your platform
parameters:
  type: gp3
  iops: "3000"
  throughput: "125"
  fsType: ext4
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
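
DataVolumes can then request the class by name. A minimal sketch following the Step 1 pattern (the image URL is a placeholder); this also provides the high-performance-vm-dv volume referenced in Step 5:

apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: high-performance-vm-dv
  namespace: vm-workloads
spec:
  source:
    http:
      url: "https://vm-images.example.com/performance-vm.img"
  pvc:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 50Gi
    storageClassName: high-performance-ssd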

Troubleshooting Guide

Common Issues and Solutions

Scaling Not Triggering

  • Issue: HPA/VPA not scaling VMs as expected
  • Solutions:
      • Check metrics server functionality: kubectl top nodes
      • Verify resource requests are set properly in the VM specification
      • Check HPA status: kubectl describe hpa vm-hpa

Performance Degradation

  • Issue: VM performance is not meeting expectations
  • Solutions (diagnostic commands follow this list):
      • Review CPU pinning configuration
      • Check for resource contention on nodes
      • Verify storage performance metrics
      • Analyze network latency and throughput
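
Diagnostic commands for the checks above (replace <node-name> with an affected node):

kubectl top pods -n vm-workloads
kubectl describe node <node-name> | grep -A 8 'Allocated resources'
kubectl get vmi -n vm-workloads -o wide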

Memory Issues

  • Issue: Out-of-memory errors or high memory pressure
  • Solutions:
      • Increase memory limits in the VM specification
      • Enable hugepages for better memory performance
      • Check for memory leaks in applications

Best Practices

Resource Management

  • Right-sizing: Start with conservative resource allocations and scale based on monitoring data
  • Resource Limits: Always set both requests and limits to prevent resource starvation
  • Node Selection: Use node selectors and taints to ensure VMs are scheduled on appropriate nodes

Performance Tuning

  • CPU Optimization: Use CPU pinning for performance-critical workloads
  • Memory Optimization: Configure hugepages for memory-intensive applications
  • Storage Optimization: Use high-performance storage classes for I/O intensive workloads
  • Network Optimization: Configure SR-IOV for network-intensive applications

Monitoring and Alerting

  • Proactive Monitoring: Set up comprehensive monitoring for all performance metrics
  • Alert Thresholds: Configure appropriate alert thresholds to prevent performance issues
  • Capacity Planning: Use historical data for capacity planning and resource allocation

Integration with RH OVE Ecosystem

GitOps Integration

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: vm-performance
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/vm-performance-config
    targetRevision: HEAD
    path: performance
  destination:
    server: https://kubernetes.default.svc
    namespace: vm-workloads
  syncPolicy:
    automated:
      prune: false
      selfHeal: true
    syncOptions:
    - CreateNamespace=true
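
Once the Application is created, its sync state can be checked from the Argo CD CLI:

argocd app get vm-performance
argocd app sync vm-performance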

Multi-Cluster Performance Management

  • Centralized Monitoring: Aggregate performance metrics across multiple clusters
  • Cross-Cluster Scaling: Implement scaling policies that consider cluster resource availability
  • Performance Benchmarking: Establish performance baselines across different cluster configurations

This guide provides the tools and strategies needed to implement effective VM scaling and performance optimization within the RH OVE ecosystem, ensuring efficient resource utilization and consistent application performance.