Use Case: VM Scaling and Performance Optimization

Business Context

Dynamic scaling and performance optimization keep VM workloads responsive while controlling resource costs. This use case demonstrates how to implement auto-scaling, resource optimization, and performance monitoring for virtual machines in the RH OVE ecosystem.

Technical Requirements

Infrastructure Requirements

  • OpenShift 4.12 or later with OpenShift Virtualization (KubeVirt) enabled
  • Horizontal Pod Autoscaler (HPA) support
  • Metrics server and custom metrics API
  • Persistent storage with performance monitoring capabilities
  • CPU and memory monitoring tools (Prometheus/Grafana)
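
A quick sanity check before proceeding (the namespaces shown are typical defaults; adjust to your installation):

# Confirm OpenShift Virtualization / KubeVirt is installed
kubectl get kubevirt -n openshift-cnv

# Confirm the resource metrics API backing HPA/VPA is available
kubectl get apiservices v1beta1.metrics.k8s.io
kubectl top nodes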

Resource Requirements

  • CPU: Variable based on workload demands
  • Memory: Dynamic allocation based on usage patterns
  • Storage: High-performance storage with IOPS monitoring
  • Network: Low-latency network for performance-sensitive applications

Architecture Overview

graph TD
    subgraph "Monitoring Stack"
        PROMETHEUS["Prometheus"]
        GRAFANA["Grafana"]
        ALERTMANAGER["AlertManager"]
    end

    subgraph "Scaling Infrastructure"
        HPA["Horizontal Pod Autoscaler"]
        VPA["Vertical Pod Autoscaler"]
        METRICS["Metrics Server"]
    end

    subgraph "VM Workloads"
        VM1["VM Instance 1"]
        VM2["VM Instance 2"]
        VM3["VM Instance 3"]
    end

    PROMETHEUS --> ALERTMANAGER
    GRAFANA --> PROMETHEUS
    METRICS --> HPA
    METRICS --> VPA
    HPA --> VM1
    HPA --> VM2
    VPA --> VM3

    VM1 --> PROMETHEUS
    VM2 --> PROMETHEUS
    VM3 --> PROMETHEUS

    style PROMETHEUS fill:#f9f,stroke:#333
    style HPA fill:#ff9,stroke:#333

Implementation Steps

Step 1: Enable VM Performance Monitoring

Deploy VM with Performance Monitoring

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: performance-vm
  namespace: vm-workloads
spec:
  running: true
  template:
    metadata:
      labels:
        app: performance-vm
        monitoring: "enabled"
    spec:
      domain:
        cpu:
          cores: 2
          model: host-passthrough
        devices:
          disks:
          - disk:
              bus: virtio
            name: rootdisk
          interfaces:
          - name: default
            bridge: {}
        memory:
          guest: 4Gi
        resources:
          requests:
            memory: 4Gi
            cpu: 2
          limits:
            memory: 8Gi
            cpu: 4
      networks:
      - name: default
        pod: {}
      volumes:
      - dataVolume:
          name: performance-vm-dv
        name: rootdisk
---
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: performance-vm-dv
  namespace: vm-workloads
spec:
  source:
    http:
      url: "https://vm-images.example.com/performance-vm.img"
  pvc:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 50Gi
    storageClassName: fast-ssd
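
With the manifests saved locally (performance-vm.yaml is an assumed filename), deploy and verify the VM:

kubectl apply -f performance-vm.yaml
kubectl get vm,dv -n vm-workloads
kubectl get vmi performance-vm -n vm-workloads -o wide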

Step 2: Configure Horizontal Pod Autoscaler for VMs

HPA Configuration

Note: a standalone VirtualMachine does not expose the scale subresource, so the HPA must target a scalable KubeVirt resource such as a VirtualMachineInstanceReplicaSet. The example below assumes the workload is managed by a replica set named performance-vm-replicaset.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: vm-hpa
  namespace: vm-workloads
spec:
  scaleTargetRef:
    apiVersion: kubevirt.io/v1
    kind: VirtualMachineInstanceReplicaSet
    name: performance-vm-replicaset
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 25
        periodSeconds: 60
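
Once applied, scaling activity and current utilization can be followed with:

kubectl describe hpa vm-hpa -n vm-workloads
kubectl get hpa vm-hpa -n vm-workloads --watch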

Step 3: Implement Vertical Pod Autoscaler

VPA Configuration

Note: VPA acts on the virt-launcher pods that run the VM (the compute container referenced below is the VM's container inside virt-launcher). Start with updateMode: "Off" or "Initial" to review recommendations first, since "Auto" applies them by evicting the pod, which restarts the VM.

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: vm-vpa
  namespace: vm-workloads
spec:
  targetRef:
    apiVersion: kubevirt.io/v1
    kind: VirtualMachine
    name: performance-vm
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: compute
      minAllowed:
        cpu: 100m
        memory: 1Gi
      maxAllowed:
        cpu: 8
        memory: 16Gi
      controlledResources: ["cpu", "memory"]
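
Recommendations are surfaced on the VPA object itself, so they can be reviewed before any automatic updates are enabled:

kubectl describe vpa vm-vpa -n vm-workloads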

Step 4: Performance Monitoring and Alerting

ServiceMonitor for VM Metrics

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: vm-performance-monitor
  namespace: vm-workloads
  labels:
    app: vm-monitor
spec:
  selector:
    matchLabels:
      monitoring: "enabled"
  endpoints:
  - port: metrics
    interval: 30s
    path: /metrics
    honorLabels: true
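
A ServiceMonitor selects Services rather than pods, so a Service carrying the monitoring label and a named metrics port must exist. A minimal sketch, assuming an exporter inside the guest (for example node_exporter) listening on port 9100; KubeVirt's own kubevirt_vmi_* metrics are scraped from virt-handler and do not need this Service:

apiVersion: v1
kind: Service
metadata:
  name: performance-vm-metrics
  namespace: vm-workloads
  labels:
    monitoring: "enabled"
spec:
  selector:
    app: performance-vm
  ports:
  - name: metrics
    port: 9100
    targetPort: 9100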

Performance Alerting Rules

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: vm-performance-alerts
  namespace: vm-workloads
spec:
  groups:
  - name: vm.performance
    rules:
    # The kubevirt_vmi_* series below are exposed by virt-handler; verify the
    # exact metric names against your KubeVirt version before relying on them.
    - alert: VMHighCPUUsage
      expr: rate(kubevirt_vmi_cpu_usage_seconds_total[5m]) > 0.8
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "VM {{ $labels.name }} has high CPU usage"
        description: "VM {{ $labels.name }} has consumed more than 0.8 CPU cores for more than 5 minutes; adjust the threshold to the VM's vCPU allocation."

    - alert: VMHighMemoryUsage
      expr: kubevirt_vmi_memory_available_bytes / kubevirt_vmi_memory_domain_bytes < 0.1
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "VM {{ $labels.name }} has high memory usage"
        description: "VM {{ $labels.name }} has had less than 10% of guest memory available for more than 5 minutes."

    - alert: VMDiskIOHigh
      expr: sum by (namespace, name) (rate(kubevirt_vmi_storage_iops_read_total[5m]) + rate(kubevirt_vmi_storage_iops_write_total[5m])) > 1000
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "VM {{ $labels.name }} has high disk I/O"
        description: "VM {{ $labels.name }} disk IOPS is above 1000 for more than 10 minutes."

Step 5: Performance Optimization Strategies

CPU Pinning for Performance-Critical VMs

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: high-performance-vm
  namespace: vm-workloads
spec:
  running: true
  template:
    metadata:
      annotations:
        cpu-load-balancing.crio.io: "disable"
        cpu-quota.crio.io: "disable"
        irq-load-balancing.crio.io: "disable"
    spec:
      domain:
        cpu:
          cores: 4
          sockets: 1
          threads: 1
          dedicatedCpuPlacement: true
          isolateEmulatorThread: true
          model: host-passthrough
        devices:
          disks:
          - disk:
              bus: virtio
            name: rootdisk
        memory:
          guest: 8Gi
          hugepages:
            pageSize: 1Gi
        resources:
          requests:
            memory: 8Gi
            cpu: 4
          limits:
            memory: 8Gi
            cpu: 4
      nodeSelector:
        node-role.kubernetes.io/worker: ""
        performance-node: "true"
      volumes:
      - dataVolume:
          name: high-performance-vm-dv  # defined by the DataVolume sketch in Step 6
        name: rootdisk
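
Dedicated CPU placement and 1Gi hugepages require prepared nodes: the kubelet CPU Manager must run with a static policy and 1Gi hugepages must be preallocated (on OpenShift, typically via a PerformanceProfile). The performance-node label is an assumed convention; label and verify candidate nodes with:

kubectl label node <node-name> performance-node=true
kubectl describe node <node-name> | grep -i hugepages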

Step 6: Storage Performance Optimization

High-Performance Storage Configuration

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: high-performance-ssd
provisioner: ebs.csi.aws.com  # example: AWS EBS CSI driver; substitute the CSI provisioner for your platform
parameters:
  type: gp3
  iops: "3000"
  throughput: "125"
  fsType: ext4
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
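
DataVolumes can then request the class by name. A minimal sketch following the Step 1 pattern (the image URL is a placeholder); this also provides the high-performance-vm-dv volume referenced in Step 5:

apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: high-performance-vm-dv
  namespace: vm-workloads
spec:
  source:
    http:
      url: "https://vm-images.example.com/performance-vm.img"
  pvc:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 50Gi
    storageClassName: high-performance-ssd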

Troubleshooting Guide

Common Issues and Solutions

Scaling Not Triggering

  • Issue: HPA/VPA not scaling VMs as expected
  • Solutions:
      • Check metrics server functionality: kubectl top nodes
      • Verify resource requests are set properly in the VM specification
      • Check HPA status: kubectl describe hpa vm-hpa

Performance Degradation

  • Issue: VM performance is not meeting expectations
  • Solutions (diagnostic commands follow this list):
      • Review CPU pinning configuration
      • Check for resource contention on nodes
      • Verify storage performance metrics
      • Analyze network latency and throughput
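
Diagnostic commands for the checks above (replace <node-name> with an affected node):

kubectl top pods -n vm-workloads
kubectl describe node <node-name> | grep -A 8 'Allocated resources'
kubectl get vmi -n vm-workloads -o wide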

Memory Issues

  • Issue: Out-of-memory errors or high memory pressure
  • Solutions:
      • Increase memory limits in the VM specification
      • Enable hugepages for better memory performance
      • Check for memory leaks in applications

Best Practices

Resource Management

  • Right-sizing: Start with conservative resource allocations and scale based on monitoring data
  • Resource Limits: Always set both requests and limits to prevent resource starvation
  • Node Selection: Use node selectors and taints to ensure VMs are scheduled on appropriate nodes

Performance Tuning

  • CPU Optimization: Use CPU pinning for performance-critical workloads
  • Memory Optimization: Configure hugepages for memory-intensive applications
  • Storage Optimization: Use high-performance storage classes for I/O intensive workloads
  • Network Optimization: Configure SR-IOV for network-intensive applications

Monitoring and Alerting

  • Proactive Monitoring: Set up comprehensive monitoring for all performance metrics
  • Alert Thresholds: Configure appropriate alert thresholds to prevent performance issues
  • Capacity Planning: Use historical data for capacity planning and resource allocation

Integration with RH OVE Ecosystem

GitOps Integration

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: vm-performance
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/vm-performance-config
    targetRevision: HEAD
    path: performance
  destination:
    server: https://kubernetes.default.svc
    namespace: vm-workloads
  syncPolicy:
    automated:
      prune: false
      selfHeal: true
    syncOptions:
    - CreateNamespace=true
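
Once the Application is created, its sync state can be checked from the Argo CD CLI:

argocd app get vm-performance
argocd app sync vm-performance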

Multi-Cluster Performance Management

  • Centralized Monitoring: Aggregate performance metrics across multiple clusters
  • Cross-Cluster Scaling: Implement scaling policies that consider cluster resource availability
  • Performance Benchmarking: Establish performance baselines across different cluster configurations

This guide provides the tools and strategies needed to implement effective VM scaling and performance optimization within the RH OVE ecosystem, ensuring efficient resource utilization and consistent application performance.