# Performance Tuning

## Overview
This document provides comprehensive performance tuning guidelines for the RH OVE ecosystem, covering optimization strategies for virtual machines, networking, storage, and cluster-wide performance enhancements.
## Performance Optimization Strategy

### Performance Monitoring Approach

```mermaid
graph TB
    A[Baseline Metrics] --> B[Identify Bottlenecks]
    B --> C[Apply Optimizations]
    C --> D[Measure Impact]
    D --> E{Performance Improved?}
    E -->|Yes| F[Document Changes]
    E -->|No| G[Try Alternative Approach]
    G --> C
    F --> H[Continuous Monitoring]
```
### Key Performance Indicators (KPIs)

- **VM Performance**: CPU utilization, memory usage, disk I/O, network throughput
- **Cluster Performance**: node utilization, pod scheduling latency, API response times
- **Network Performance**: latency, packet loss, bandwidth utilization
- **Storage Performance**: IOPS, throughput, latency
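
Capture a baseline for these KPIs before changing anything, so the impact of each optimization can be measured against it. A minimal sketch using standard `oc` tooling (assumes cluster metrics and KubeVirt are available):

```bash
#!/bin/bash
# Snapshot cluster-level KPIs into timestamped files for later comparison
ts=$(date +%Y%m%d-%H%M%S)
oc adm top nodes > "baseline-nodes-${ts}.txt"                  # node CPU/memory
oc adm top pods -A --sort-by=cpu > "baseline-pods-${ts}.txt"   # per-pod CPU
oc get vmi -A -o wide > "baseline-vmis-${ts}.txt"              # running VMIs and their nodes
```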
## Virtual Machine Performance Tuning

### CPU Optimization

#### CPU Pinning for High-Performance VMs
```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: high-performance-vm
spec:
  template:
    spec:
      domain:
        cpu:
          cores: 4
          dedicatedCpuPlacement: true
          isolateEmulatorThread: true
        resources:
          requests:
            cpu: 4
            memory: 8Gi
          limits:
            cpu: 4
            memory: 8Gi
      nodeSelector:
        node-role.kubernetes.io/worker: ""
        cpumanager: "true"
```
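
After the VM starts, confirm that dedicated placement actually took effect. One way to check (the node name is an example; virt-launcher pods carry the `kubevirt.io/domain` label):

```bash
# Label a worker so the nodeSelector above can match (node name is an example)
oc label node worker-0 cpumanager=true

# Dedicated CPU placement requires the virt-launcher pod to get Guaranteed QoS
oc get pods -l kubevirt.io/domain=high-performance-vm \
  -o jsonpath='{.items[0].status.qosClass}{"\n"}'
```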
#### CPU Manager Configuration
```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: cpumanager-enabled
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: ""
  kubeletConfig:
    cpuManagerPolicy: static
    cpuManagerReconcilePeriod: 5s
    reservedSystemCPUs: "0,1"
```
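
Applying the KubeletConfig triggers a rolling reboot of the worker pool. A quick way to follow the rollout and spot-check the result (node name is an example, and the kubelet config path is the usual OpenShift location):

```bash
# Watch the worker MachineConfigPool converge on the new rendered config
oc get mcp worker -w

# Confirm the kubelet picked up the static policy on a rebooted node
oc debug node/worker-0 -- chroot /host \
  grep -i cpuManager /etc/kubernetes/kubelet.conf
```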
#### NUMA Topology Awareness
```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: numa-optimized-vm
spec:
  template:
    spec:
      domain:
        cpu:
          cores: 8
          # Guest NUMA passthrough requires dedicated CPUs and hugepages
          dedicatedCpuPlacement: true
          numa:
            guestMappingPassthrough: {}
        memory:
          guest: 16Gi
          hugepages:
            pageSize: 1Gi
```
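
Guest NUMA passthrough only helps when the guest actually fits the host topology, so inspect the host before sizing the VM. A quick check (node name is an example):

```bash
# Show host NUMA nodes and the CPUs attached to each
oc debug node/worker-0 -- chroot /host lscpu | grep -i numa
```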
### Memory Optimization

#### Hugepages Configuration
```yaml
# Node configuration for hugepages
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: hugepages-worker
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
        - name: hugepages-1gi.service
          enabled: true
          contents: |
            [Unit]
            Description=Configure 1Gi hugepages

            [Service]
            Type=oneshot
            ExecStart=/bin/bash -c 'echo 8 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages'

            [Install]
            WantedBy=multi-user.target
```
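
Note that allocating 1 GiB hugepages at runtime can fail once node memory is fragmented; reserving them through kernel arguments (see the PerformanceProfile later in this guide) is generally more reliable. Either way, verify the reservation after the node reboots (node name is an example):

```bash
# Confirm the kernel actually reserved the pages
oc debug node/worker-0 -- chroot /host grep -i hugepages /proc/meminfo

# The same figure should appear in the node's allocatable resources
oc get node worker-0 -o jsonpath='{.status.allocatable.hugepages-1Gi}{"\n"}'
```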
#### VM Memory Configuration with Hugepages
```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: memory-optimized-vm
spec:
  template:
    spec:
      domain:
        memory:
          guest: 8Gi
          hugepages:
            pageSize: 1Gi
        resources:
          requests:
            memory: 8Gi
            hugepages-1Gi: 8Gi
          limits:
            memory: 8Gi
            hugepages-1Gi: 8Gi
```
### Storage Performance Optimization

#### High-Performance Storage Configuration
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: high-performance-ssd
provisioner: kubernetes.io/no-provisioner
parameters:
  type: ssd
  fsType: ext4
# Mount options that skip unnecessary access-time metadata writes
mountOptions:
  - noatime
  - nodiratime
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
```
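
Because `kubernetes.io/no-provisioner` performs no dynamic provisioning, matching local PersistentVolumes must be created by hand before any PVC against this class can bind. A sketch of one such PV (device path, node name, and capacity are examples):

```bash
oc apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-ssd-pv0
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: high-performance-ssd
  local:
    path: /mnt/local-ssd      # pre-mounted SSD on the node
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - worker-0
EOF
```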
#### VM Disk Performance Tuning
```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: storage-optimized-vm
spec:
  template:
    spec:
      domain:
        devices:
          disks:
            - name: rootdisk
              disk:
                bus: virtio
              # Writeback caching favors throughput; data still in flight
              # can be lost on host failure
              cache: writeback
            - name: datadisk
              disk:
                bus: virtio
              cache: none
              # Native I/O bypasses the host page cache for lower latency
              io: native
        resources:
          requests:
            cpu: 2
            memory: 4Gi
      volumes:
        - name: rootdisk
          dataVolume:
            name: vm-root-disk
        - name: datadisk
          dataVolume:
            name: vm-data-disk
```
#### Storage I/O Optimization
```yaml
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: optimized-datavolume
spec:
  pvc:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 100Gi
    storageClassName: high-performance-ssd
    # Raw block volumes skip the filesystem layer entirely
    volumeMode: Block
  source:
    blank: {}
```
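
With `volumeMode: Block`, the disk surfaces inside the guest as a raw device (typically `/dev/vdb` for the second virtio disk), which makes it straightforward to benchmark the tuning above with fio from inside the VM. A sketch, assuming fio is installed in the guest and the device name matches:

```bash
# 4K random read benchmark against the raw data disk (write tests such as
# randwrite would overwrite the device -- only run them on a scratch disk)
fio --name=randread --filename=/dev/vdb --rw=randread --bs=4k \
    --iodepth=32 --ioengine=libaio --direct=1 \
    --runtime=60 --time_based --group_reporting
```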
## Network Performance Tuning

### Cilium Performance Optimization

#### eBPF Optimization Configuration
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cilium-config
  namespace: kube-system
data:
  # Enable bandwidth manager for better QoS
  enable-bandwidth-manager: "true"
  # Enable local redirect policy for better performance
  enable-local-redirect-policy: "true"
  # Optimize datapath
  datapath-mode: "veth"
  # Enable XDP acceleration where supported
  enable-xdp-acceleration: "true"
  # kube-proxy replacement for better performance
  kube-proxy-replacement: "strict"
  # Reduce API churn with endpoint slices
  enable-cilium-endpoint-slice: "true"
```
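
Cilium agents read this ConfigMap at startup, so changes only take effect after the DaemonSet is restarted:

```bash
# Restart the agents so the new datapath settings are picked up
oc -n kube-system rollout restart daemonset/cilium
```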
#### Network Device Optimization
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cilium-config
  namespace: kube-system
data:
  # Pin the datapath to specific devices
  devices: "eth0"
  # Install direct routes between nodes on the same L2 segment
  auto-direct-node-routes: "true"
  # Disable tunneling in favor of native routing
  tunnel: "disabled"
  # Native routing does not need masquerading
  enable-ipv4-masquerade: "false"
  enable-ipv6-masquerade: "false"
```
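
A quick way to confirm the datapath options took effect is the agent's own status output (assumes the Cilium DaemonSet runs in `kube-system`, as configured above):

```bash
# KubeProxyReplacement, BandwidthManager, and routing mode all appear here
oc -n kube-system exec ds/cilium -- cilium status --verbose | \
  grep -iE 'kubeproxyreplacement|bandwidthmanager|routing'
```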
### VM Network Performance

#### SR-IOV Configuration for High-Performance Networking
```yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: high-performance-network
  namespace: openshift-sriov-network-operator
spec:
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  nicSelector:
    vendor: "15b3"    # Mellanox
    deviceID: "1017"  # ConnectX-5
  numVfs: 8
  priority: 99
  resourceName: "high_perf_nic"
```
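
The node policy only carves out the VFs and advertises them as the `high_perf_nic` resource; a `SriovNetwork` is still needed to render the NetworkAttachmentDefinition that the VM below references. A sketch (target namespace and IPAM are examples):

```bash
oc apply -f - <<'EOF'
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: high-performance-network
  namespace: openshift-sriov-network-operator
spec:
  resourceName: high_perf_nic
  networkNamespace: default
  ipam: |
    {"type": "host-local", "subnet": "10.10.0.0/24"}
EOF

# VF configuration is applied per node; wait for syncStatus to reach Succeeded
oc get sriovnetworknodestates -n openshift-sriov-network-operator \
  -o custom-columns=NODE:.metadata.name,SYNC:.status.syncStatus
```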
#### VM with SR-IOV Network Attachment
```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: sriov-vm
spec:
  template:
    spec:
      domain:
        devices:
          interfaces:
            - name: default
              masquerade: {}
            - name: sriov-network
              sriov: {}
        resources:
          requests:
            cpu: 4
            memory: 8Gi
      networks:
        - name: default
          pod: {}
        - name: sriov-network
          multus:
            networkName: high-performance-network
```
### Multi-Network Performance with Multus

#### Dedicated Network Interfaces for Different Traffic Types
```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: multi-interface-performance-vm
  namespace: high-performance-workloads
  annotations:
    k8s.v1.cni.cncf.io/networks: |
      [
        {
          "name": "management-network",
          "ips": ["192.168.1.5/24"]
        },
        {
          "name": "storage-network",
          "ips": ["192.168.2.5/24"]
        },
        {
          "name": "sriov-data-network",
          "ips": ["10.0.0.5/24"]
        }
      ]
spec:
  running: true
  template:
    spec:
      domain:
        cpu:
          cores: 16
          dedicatedCpuPlacement: true
          isolateEmulatorThread: true
        memory:
          guest: 32Gi
          hugepages:
            pageSize: 1Gi
        devices:
          interfaces:
            - name: default
              masquerade: {}
            - name: management
              bridge: {}
            - name: storage
              bridge: {}
            - name: sriov-data
              sriov: {}
          disks:
            - name: rootdisk
              disk:
                bus: virtio
              cache: writeback
        resources:
          requests:
            cpu: 16
            memory: 32Gi
            hugepages-1Gi: 32Gi
          limits:
            cpu: 16
            memory: 32Gi
            hugepages-1Gi: 32Gi
      networks:
        - name: default
          pod: {}
        - name: management
          multus:
            networkName: management-network
        - name: storage
          multus:
            networkName: storage-network
        - name: sriov-data
          multus:
            networkName: sriov-data-network
      volumes:
        - name: rootdisk
          dataVolume:
            name: multi-interface-vm-root
```
#### High-Performance NAD Configurations
```yaml
# High-performance management network
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: management-network
  namespace: high-performance-workloads
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "name": "management-network",
      "type": "macvlan",
      "master": "ens192",
      "mode": "bridge",
      "capabilities": {
        "ips": true
      },
      "ipam": {
        "type": "static"
      }
    }
---
# Dedicated storage network with jumbo frames (MTU 9000)
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: storage-network
  namespace: high-performance-workloads
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "name": "storage-network",
      "type": "macvlan",
      "master": "ens224",
      "mode": "bridge",
      "mtu": 9000,
      "capabilities": {
        "ips": true
      },
      "ipam": {
        "type": "static"
      }
    }
---
# SR-IOV high-performance data network
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: sriov-data-network
  namespace: high-performance-workloads
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "name": "sriov-data-network",
      "type": "sriov",
      "deviceID": "1017",
      "vf": 0,
      "spoofchk": "off",
      "trust": "on",
      "capabilities": {
        "ips": true
      },
      "ipam": {
        "type": "static"
      }
    }
```
#### Bond Network for High Availability
```yaml
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: bond-ha-network
  namespace: high-performance-workloads
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "name": "bond-ha-network",
      "type": "bond",
      "mode": "802.3ad",
      "miimon": "100",
      "updelay": "200",
      "downdelay": "200",
      "links": [
        {
          "name": "ens256"
        },
        {
          "name": "ens257"
        }
      ],
      "ipam": {
        "type": "static"
      }
    }
```
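
Before starting VMs that reference these networks, confirm that all the attachment definitions rendered in the namespace:

```bash
oc get network-attachment-definitions -n high-performance-workloads
```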
## Cluster Performance Optimization

### Node-Level Optimizations

#### Performance Profile for Worker Nodes
```yaml
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: high-performance-worker
spec:
  cpu:
    isolated: "2-47"
    reserved: "0-1"
  hugepages:
    defaultHugepagesSize: 1G
    pages:
      - count: 16
        size: 1G
  nodeSelector:
    node-role.kubernetes.io/worker-rt: ""
  realTimeKernel:
    enabled: true
  numa:
    topologyPolicy: "single-numa-node"
```
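
Applying a PerformanceProfile reboots the nodes in the matching pool (here, nodes labeled `worker-rt`). A minimal status check, assuming the Node Tuning Operator manages the profile and reports an `Available` condition:

```bash
# "True" means the tuned/kubelet/kernel pieces have all converged
oc get performanceprofile high-performance-worker \
  -o jsonpath='{.status.conditions[?(@.type=="Available")].status}{"\n"}'
```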
#### Machine Config for Kernel Tuning
```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: performance-tuning
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
        - path: /etc/sysctl.d/99-performance.conf
          mode: 0644
          contents:
            inline: |
              # Network performance tuning
              net.core.rmem_max = 268435456
              net.core.wmem_max = 268435456
              net.ipv4.tcp_rmem = 4096 131072 268435456
              net.ipv4.tcp_wmem = 4096 65536 268435456

              # Virtual memory tuning
              vm.swappiness = 1
              vm.dirty_ratio = 15
              vm.dirty_background_ratio = 5

              # CPU scheduler tuning
              kernel.sched_migration_cost_ns = 5000000
```
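
Once the MachineConfig has rolled out, spot-check the values on a node (node name is an example; `kernel.sched_migration_cost_ns` was dropped from sysctl in recent kernels, so it may report as unknown there):

```bash
oc debug node/worker-0 -- chroot /host \
  sysctl net.core.rmem_max net.core.wmem_max vm.swappiness vm.dirty_ratio
```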
### Resource Management Optimization

#### Cluster Resource Allocation
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: performance-quota
  namespace: high-performance-workloads
spec:
  hard:
    requests.cpu: "100"
    requests.memory: 200Gi
    limits.cpu: "200"
    limits.memory: 400Gi
    hugepages-1Gi: 64Gi
    persistentvolumeclaims: "50"
```
#### Priority Classes for Critical Workloads
```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-performance-priority
value: 1000
globalDefault: false
description: "Priority class for high-performance VMs"
---
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: critical-vm
spec:
  template:
    spec:
      priorityClassName: high-performance-priority
      domain:
        cpu:
          cores: 8
        memory:
          guest: 16Gi
```
## Monitoring Performance Optimizations

### Efficient Metrics Collection
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: performance-metrics
spec:
  selector:
    matchLabels:
      app: high-performance-app
  endpoints:
    - port: metrics
      interval: 15s        # reduced interval for better granularity
      scrapeTimeout: 10s
      path: /metrics
      metricRelabelings:
        - sourceLabels: [__name__]
          regex: 'go_.*|process_.*'
          action: drop     # drop unnecessary runtime metrics
```
### Performance Dashboard Configuration
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: performance-dashboard
data:
  dashboard.json: |
    {
      "dashboard": {
        "title": "RH OVE Performance Dashboard",
        "panels": [
          {
            "title": "VM CPU Usage",
            "type": "graph",
            "targets": [
              {
                "expr": "rate(kubevirt_vm_cpu_usage_seconds_total[5m]) * 100",
                "legendFormat": "{{name}} CPU %"
              }
            ]
          },
          {
            "title": "VM Memory Usage",
            "type": "graph",
            "targets": [
              {
                "expr": "kubevirt_vm_memory_usage_bytes / kubevirt_vm_memory_available_bytes * 100",
                "legendFormat": "{{name}} Memory %"
              }
            ]
          }
        ]
      }
    }
```
## Performance Testing and Benchmarking

### VM Performance Testing
```bash
#!/bin/bash
# VM Performance Test Script
VM_NAME="performance-test-vm"
NAMESPACE="testing"

# CPU Performance Test
virtctl console "$VM_NAME" -n "$NAMESPACE" << 'EOF'
# Install and run CPU benchmark
yum install -y stress-ng
stress-ng --cpu 0 --timeout 60s --metrics-brief
EOF

# Memory Performance Test
virtctl console "$VM_NAME" -n "$NAMESPACE" << 'EOF'
# Memory bandwidth test
stress-ng --vm 1 --vm-bytes 4G --timeout 60s --metrics-brief
EOF

# Disk I/O Performance Test
virtctl console "$VM_NAME" -n "$NAMESPACE" << 'EOF'
# Sequential write/read test with direct I/O
dd if=/dev/zero of=/tmp/testfile bs=1G count=1 oflag=direct
dd if=/tmp/testfile of=/dev/null bs=1G count=1 iflag=direct
rm /tmp/testfile
EOF
```
### Network Performance Testing
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: network-performance-test
spec:
  containers:
    - name: iperf-server
      image: networkstatic/iperf3
      command: ['iperf3', '-s']
      ports:
        - containerPort: 5201
    - name: iperf-client
      image: networkstatic/iperf3
      command: ['sleep', '3600']
```
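
With both containers in one pod, the client reaches the server over loopback, which only validates the tooling; for a meaningful number, point the client at the pod IP of a server running on another node. Example invocation (the target IP is a placeholder):

```bash
# 30-second test with 4 parallel streams; add -R to measure the reverse direction
oc exec network-performance-test -c iperf-client -- \
  iperf3 -c <server-pod-ip> -t 30 -P 4
```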
## Performance Troubleshooting

### Common Performance Issues

#### High CPU Usage
```bash
# Identify CPU-intensive pods
oc adm top pods --all-namespaces --sort-by=cpu

# Check node CPU utilization
oc adm top nodes

# Analyze CPU usage patterns inside the guest
virtctl console <vm-name>
top
```
#### Memory Pressure
```bash
# Check memory usage by pod
oc adm top pods --all-namespaces --sort-by=memory

# Verify hugepages allocation per node
oc get nodes -o custom-columns=NAME:.metadata.name,HUGEPAGES:.status.allocatable.hugepages-1Gi

# Inspect memory pressure inside the guest
virtctl console <vm-name>
free -h
cat /proc/meminfo
```
#### Storage Performance Issues
```bash
# Check PVC status and events
oc get pvc
oc describe pvc <pvc-name>

# Monitor I/O patterns inside the guest
virtctl console <vm-name>
iostat -x 1

# Check which nodes back the storage
oc get nodes -o wide
```
## Best Practices Summary

### VM Performance Best Practices

1. **CPU Optimization**
   - Use CPU pinning for latency-sensitive workloads
   - Enable NUMA topology awareness
   - Configure appropriate CPU limits and requests
2. **Memory Optimization**
   - Use hugepages for memory-intensive applications
   - Configure appropriate memory ballooning
   - Monitor memory usage patterns
3. **Storage Optimization**
   - Use high-performance storage classes for critical workloads
   - Optimize disk cache settings
   - Consider using block storage for high I/O workloads
4. **Network Optimization**
   - Use SR-IOV for high-bandwidth applications
   - Optimize Cilium configuration for performance
   - Consider DPDK for packet processing workloads
### Monitoring and Maintenance

1. **Regular Performance Reviews**
   - Monitor KPIs continuously
   - Perform regular performance testing
   - Document performance baselines
2. **Capacity Planning**
   - Plan for growth and scaling
   - Monitor resource utilization trends
   - Implement proper resource quotas
3. **Optimization Cycles**
   - Hold regular performance tuning reviews
   - Test optimizations in non-production environments
   - Document all performance changes
This guide provides strategies for optimizing the RH OVE ecosystem across compute, memory, storage, and networking. Applying these practices regularly, and measuring each change against a recorded baseline, helps keep virtualized workloads performing well while maintaining system stability and reliability.