ADR-005: Cilium CNI with Multus Multi-Network Strategy
Status
Accepted
Date
2024-12-01
Context
The RH OVE ecosystem requires advanced networking capabilities to support both container and VM workloads with enterprise-grade security, performance, and observability. Traditional iptables-based CNI solutions lack the performance and security features needed for modern hybrid workloads.
Decision
We will implement Cilium as the primary CNI with Multus for multi-network support, providing eBPF-powered networking with advanced security and observability capabilities.
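Rationale
Advantages of Cilium
- eBPF Performance: Service load balancing and policy lookups use eBPF maps instead of sequentially evaluated iptables rule chains, which matters most in clusters with many services and policies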
- Identity-Aware Security: Security policies based on workload identity, not IP addresses
- L7 Security: Application-layer security without sidecar proxies
- Service Mesh Capabilities: Built-in service mesh functionality
- Red Hat Certification: Certified CNI plugin for OpenShift
- Hubble Observability: Deep network visibility and monitoring
- Transparent Encryption: Built-in WireGuard and IPsec support
Advantages of Multus Integration
- Multi-Network Support: Attach multiple network interfaces to VMs
- Legacy Network Integration: Support for existing VLAN-based networks
- Performance Networks: SR-IOV for high-performance workloads
- Network Segmentation: Separate management, storage, and data networks
Alternatives Considered
1. OVN-Kubernetes (OpenShift Default)
- Pros: Native OpenShift integration, mature
- Cons: Limited eBPF features, performance overhead
- Rejected: Cilium provides superior performance and security
2. Calico
- Pros: Strong network policies, eBPF support
- Cons: No built-in service mesh, complex multi-network setup
- Rejected: Cilium offers better integrated solution
3. Flannel
- Pros: Simple, lightweight
- Cons: Limited security features, no eBPF support
- Rejected: Insufficient for enterprise requirements
Implementation Details
Cilium Configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: cilium-config
  namespace: kube-system
data:
  # Enable Cilium features
  enable-ipv4: "true"
  enable-ipv6: "false"
  # eBPF configuration
  enable-bpf-masquerade: "true"
  enable-host-reachable-services: "true"
  # Security features
  enable-l7-proxy: "true"
  # Policy enforcement mode: "default", "always", or "never"
  enable-policy: "default"
  # Service mesh capabilities
  enable-envoy-config: "true"
  # Encryption
  enable-wireguard: "true"
  enable-wireguard-userspace-fallback: "true"
  # Observability
  enable-hubble: "true"
  hubble-listen-address: ":4244"
  hubble-metrics-server: ":9091"
  hubble-metrics: |
    dns:query;ignoreAAAA
    drop
    tcp
    flow
    icmp
    http
  # Performance optimizations
  enable-bandwidth-manager: "true"
  enable-local-redirect-policy: "true"
  # Cilium 1.14 and later expect "true" or "false"; earlier releases used "strict"
  kube-proxy-replacement: "true"
Multus Installation
apiVersion: operator.openshift.io/v1
kind: Network
metadata:
  name: cluster
spec:
  additionalNetworks:
    - name: management-network
      namespace: default
      type: Raw
      rawCNIConfig: |
        {
          "cniVersion": "0.3.1",
          "name": "management-network",
          "type": "macvlan",
          "master": "ens192",
          "mode": "bridge",
          "ipam": {
            "type": "static"
          }
        }
Network Attachment Definitions
# Management Network
# Subnet 192.168.100.0/24, gateway 192.168.100.1. The subnet address itself is not
# a valid host address, so no address is fixed here. Each workload supplies its own
# address (for example 192.168.100.10/24) through the "ips" field of the networks
# annotation, which the "ips" capability below passes to the static IPAM plugin.
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: management-net
  namespace: vm-infrastructure
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "name": "management-net",
      "type": "macvlan",
      "master": "ens192",
      "mode": "bridge",
      "capabilities": { "ips": true },
      "ipam": {
        "type": "static"
      }
    }
---
# High-Performance SR-IOV Network
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: sriov-net
  namespace: vm-production
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "name": "sriov-net",
      "type": "sriov",
      "deviceID": "1017",
      "vf": 0,
      "ipam": {
        "type": "static"
      }
    }
Identity-Aware Network Policies
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: web-to-database-policy
  namespace: app-web-prod
spec:
  endpointSelector:
    matchLabels:
      app: web-frontend
  egress:
    - toEndpoints:
        - matchLabels:
            app: database
            environment: production
      toPorts:
        - ports:
            - port: "5432"
              protocol: TCP
          # No HTTP rules on this port: 5432 carries the PostgreSQL wire protocol,
          # not HTTP, so the rule is enforced at L3/L4 on workload identity alone.
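The policy above only governs what the web tier may send. Once the database namespace is also under default deny, the database side needs a matching ingress rule. The following is a minimal sketch, assuming the database pods run in a namespace called app-db-prod; that namespace and the policy name are hypothetical and do not appear elsewhere in this ADR.

```yaml
# Sketch only: "app-db-prod" and the policy name are assumptions.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: database-from-web-policy
  namespace: app-db-prod
spec:
  endpointSelector:
    matchLabels:
      app: database
      environment: production
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: web-frontend
            # Select peers in another namespace explicitly. Without this label
            # the selector only matches endpoints in the policy's own namespace.
            k8s:io.kubernetes.pod.namespace: app-web-prod
      toPorts:
        - ports:
            - port: "5432"
              protocol: TCP
```

If the database does live in a different namespace from the web tier, the toEndpoints selector in the egress policy needs the same k8s:io.kubernetes.pod.namespace label to reach it.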
L7 Security Policies
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: api-security-policy
  namespace: app-api-prod
spec:
  endpointSelector:
    matchLabels:
      app: api-server
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: web-frontend
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          rules:
            http:
              - method: "GET"
                path: "/api/v1/.*"
              - method: "POST"
                path: "/api/v1/users"
                headers:
                  - "Content-Type: application/json"
VM Multi-Network Configuration
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: multi-network-vm
  namespace: vm-infrastructure
  annotations:
    k8s.v1.cni.cncf.io/networks: |
      [
        {
          "name": "management-net",
          "ips": ["192.168.100.10/24"]
        },
        {
          "name": "storage-net",
          "ips": ["10.0.1.10/24"]
        }
      ]
spec:
  running: true
  template:
    spec:
      domain:
        devices:
          interfaces:
            - name: default
              masquerade: {}
            - name: management
              bridge: {}
            - name: storage
              bridge: {}
      networks:
        - name: default
          pod: {}
        - name: management
          multus:
            networkName: management-net
        - name: storage
          multus:
            networkName: storage-net
Security Implementation
Transparent Encryption
# WireGuard encryption configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: cilium-config
  namespace: kube-system
data:
  enable-wireguard: "true"
  enable-wireguard-userspace-fallback: "true"
  # Also encrypt node-to-node traffic, not only pod-to-pod traffic
  encrypt-node: "true"
Zero Trust Network Policies
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: default-deny-all
  namespace: app-web-prod
spec:
  endpointSelector: {}
  # A single empty rule enables ingress enforcement for every endpoint in the
  # namespace while allowing nothing, which puts ingress into default deny.
  ingress:
    - {}
  egress:
    # Allow DNS
    - toEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: kube-system
            k8s:k8s-app: kube-dns
      toPorts:
        - ports:
            - port: "53"
              protocol: UDP
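With this default-deny policy in place, the only egress left open in app-web-prod is DNS, so every other permitted flow needs its own narrowly scoped allow rule. Cilium policies are additive, so such rules layer on top of the deny baseline. As an illustration, the sketch below opens the web-frontend to api-server path on port 8080 that the L7 policy in this ADR expects on the receiving side; the policy name is hypothetical.

```yaml
# Sketch only: the policy name is an assumption; labels, namespaces, and the
# port are taken from the other policies in this ADR.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: web-frontend-to-api
  namespace: app-web-prod
spec:
  endpointSelector:
    matchLabels:
      app: web-frontend
  egress:
    - toEndpoints:
        - matchLabels:
            app: api-server
            # The API server runs in a different namespace, so name it explicitly.
            k8s:io.kubernetes.pod.namespace: app-api-prod
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
```

Keeping one allow policy per flow makes each exception easy to review and to remove without touching the baseline.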
Observability with Hubble
Hubble Relay Configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: hubble-config
  namespace: kube-system
data:
  config.yaml: |
    server:
      address: 0.0.0.0:4245
    relay:
      address: hubble-relay.kube-system.svc.cluster.local:80
    tls:
      enabled: false
Network Flow Monitoring
# Monitor network flows
hubble observe --namespace app-web-prod
# Check policy violations (Hubble reports denied traffic with the DROPPED verdict)
hubble observe --verdict DROPPED
# Monitor specific VM traffic
hubble observe --pod vm-database-xxx
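Performance Optimization
eBPF Host Routing
apiVersion: v1
kind: ConfigMap
metadata:
  name: cilium-config
  namespace: kube-system
data:
  # eBPF host routing is active when legacy (iptables-based) host routing is off.
  # It also depends on kube-proxy replacement, eBPF masquerading, and a
  # sufficiently recent kernel (5.10 or later).
  enable-host-legacy-routing: "false"
  enable-external-ips: "true"
  enable-node-port: "true"
  enable-host-port: "true"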
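Bandwidth Management
# Cilium does not ship a CiliumBandwidthPolicy resource. The Bandwidth Manager
# (enable-bandwidth-manager: "true") enforces the standard Kubernetes
# egress-bandwidth annotation set on each pod. The annotation sets a single
# egress rate limit per pod and does not express per-DSCP tiers.
# Pod template fragment for the web-frontend workload in app-web-prod:
metadata:
  labels:
    app: web-frontend
  annotations:
    kubernetes.io/egress-bandwidth: "100M"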
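Consequences
Positive
- Better Datapath Performance: The eBPF datapath avoids sequential iptables rule traversal; the gain grows with the number of services and policies and should be quantified with benchmarks on the target hardware
- Enhanced Security: Identity-aware policies and L7 security without sidecars
- Deep Observability: Hubble provides flow-level visibility across namespaces, pods, and VMs
- Future-Proof: Builds on eBPF, which is actively developed in the upstream Linux kernel
- Multi-Network Support: Multus attaches workloads to legacy VLAN and high-performance SR-IOV networks alongside the primary Cilium network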
Negative
- Learning Curve: Teams need to learn eBPF concepts and Cilium specifics
- Debugging Complexity: eBPF programs can be harder to debug than traditional networking
- Resource Requirements: Higher memory usage compared to simpler CNI solutions
- Compatibility Concerns: Some legacy applications may need network policy adjustments
Migration Strategy
Phase 1: Preparation
- Audit existing network policies and requirements
- Set up test clusters with Cilium/Multus
- Train operations team on eBPF and Cilium concepts
Phase 2: Non-Production Deployment
- Deploy Cilium in development and staging clusters
- Migrate network policies to Cilium format
- Implement Hubble monitoring and alerting
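As an illustration of the policy migration step, the pair below shows one ingress rule written first as a standard Kubernetes NetworkPolicy and then as an equivalent CiliumNetworkPolicy. Cilium also enforces standard NetworkPolicy objects directly, so conversion is mainly needed where identity-based selectors or L7 rules are wanted. The policy names are hypothetical; labels, namespaces, and the port are reused from the examples in this ADR.

```yaml
# Before: standard Kubernetes NetworkPolicy.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-from-web          # hypothetical name
  namespace: app-api-prod
spec:
  podSelector:
    matchLabels:
      app: api-server
  policyTypes:
    - Ingress
  ingress:
    - from:
        # namespaceSelector and podSelector in one entry must both match.
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: app-web-prod
          podSelector:
            matchLabels:
              app: web-frontend
      ports:
        - protocol: TCP
          port: 8080
---
# After: the same L3/L4 rule as a CiliumNetworkPolicy.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: api-from-web-cilium   # hypothetical name
  namespace: app-api-prod
spec:
  endpointSelector:
    matchLabels:
      app: api-server
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: web-frontend
            k8s:io.kubernetes.pod.namespace: app-web-prod
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
```

Note that the port is an integer in the Kubernetes form and a quoted string in the Cilium form.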
Phase 3: Production Migration
- Schedule maintenance window for CNI migration
- Deploy Cilium with careful monitoring
- Gradually enable advanced features (encryption, L7 policies)
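One way to lower the risk of the production cutover is Cilium's policy audit mode, in which traffic that a policy would deny is reported with an AUDIT verdict instead of being dropped. The fragment below is a minimal sketch, assuming the agent reads its settings from the cilium-config ConfigMap used elsewhere in this ADR; an operator-managed installation may expose the same setting through a different resource.

```yaml
# Fragment of the cilium-config ConfigMap (sketch, not a complete manifest).
apiVersion: v1
kind: ConfigMap
metadata:
  name: cilium-config
  namespace: kube-system
data:
  # Report would-be denials without dropping traffic. Set back to "false"
  # once the observed AUDIT verdicts match expectations.
  policy-audit-mode: "true"
```

While audit mode is on, `hubble observe --verdict AUDIT` lists the flows that enforcement would block. Cilium agents typically need a restart to pick up ConfigMap changes.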
Phase 4: Advanced Features
- Enable service mesh capabilities
- Implement advanced security policies
- Optimize performance settings based on workload patterns
Monitoring and Alerting
Key Metrics
- Network throughput per namespace/pod
- Policy enforcement latency
- eBPF program load and execution time
- Hubble flow processing rate
- Encryption overhead metrics
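Some of the metrics above can be precomputed as Prometheus recording rules so that dashboards and alerts share one definition. The sketch below covers only the flow-rate and drop-related items. The group and rule names are hypothetical, and the metric names and labels should be checked against the /metrics output of the deployed Cilium version, since they depend on the release and on which Hubble metrics are enabled.

```yaml
# Sketch only: group and record names are assumptions.
groups:
  - name: cilium-recording-rules
    rules:
      # Hubble flow processing rate, split by verdict
      - record: cilium:hubble_flows_processed:rate5m
        expr: sum by (verdict) (rate(hubble_flows_processed_total[5m]))
      # Packets dropped by the datapath, split by drop reason
      - record: cilium:drop_count:rate5m
        expr: sum by (reason) (rate(cilium_drop_count_total[5m]))
      # Packets forwarded by the datapath, split by direction
      - record: cilium:forward_count:rate5m
        expr: sum by (direction) (rate(cilium_forward_count_total[5m]))
```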
Alerting Rules
groups:
  - name: cilium-alerts
    rules:
      - alert: CiliumAgentDown
        expr: up{job="cilium-agent"} == 0
        for: 5m
        labels:
          severity: critical
      - alert: NetworkPolicyViolation
        # Packets dropped because a network policy denied them
        expr: sum(increase(cilium_drop_count_total{reason="Policy denied"}[5m])) > 10
        labels:
          severity: warning
This network architecture provides enterprise-grade performance, security, and observability for the RH OVE ecosystem while supporting both modern cloud-native applications and traditional VM workloads.