Day-2 Operations

Overview

This document covers the day-2 operational activities needed to maintain the multi-cluster RH OVE ecosystem. It provides guidelines for the management cluster and the application clusters: ongoing maintenance, upgrades, performance tuning, security audits, and reporting across the entire fleet.

Maintenance Tasks

Regular Cluster Health Checks

Node Status Monitoring: Regularly check node health and availability.
```
oc get nodes -o wide
```
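To turn the node listing into a quick pass/fail check, the output can be filtered for nodes whose status is anything other than Ready. The following is a minimal sketch, not a definitive health check: the function name `not_ready_nodes` is hypothetical, and it assumes the default column layout of `oc get nodes --no-headers` (name first, status second).

```shell
# Print the names of nodes whose STATUS column is not exactly "Ready".
# Reads `oc get nodes --no-headers` output on stdin.
# Note: cordoned nodes ("Ready,SchedulingDisabled") are reported too.
not_ready_nodes() {
  awk '$2 != "Ready" { print $1 }'
}

# Usage (against a live cluster):
#   oc get nodes --no-headers | not_ready_nodes
```

An empty result means every node reports Ready and is schedulable.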
Resource Usage Monitoring: Monitor CPU, memory, and storage utilization.
```
oc adm top nodes
oc adm top pods --all-namespaces
```
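The node-level figures can also be screened against a utilization threshold. The sketch below assumes the usual five-column output of `oc adm top nodes --no-headers` (name, CPU cores, CPU%, memory bytes, memory%); the function name `hot_nodes` and the default threshold of 80 are illustrative choices, not values from this document.

```shell
# Print nodes whose CPU% or MEMORY% exceeds a threshold (default 80).
# Reads `oc adm top nodes --no-headers` output on stdin.
# Non-numeric values (for example "<unknown>") are treated as 0.
hot_nodes() {
  threshold="${1:-80}"
  awk -v t="$threshold" '{
    cpu = $3; mem = $5
    sub(/%/, "", cpu); sub(/%/, "", mem)
    if (cpu + 0 > t || mem + 0 > t) print $1, $3, $5
  }'
}

# Usage (against a live cluster):
#   oc adm top nodes --no-headers | hot_nodes 85
```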

Backup Management

Review Backup Logs: Confirm that backup jobs completed and check the agent logs for anomalies.
```
oc logs -n rubrik rubrik-agent-
```
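Reading the agent logs by eye does not scale across a fleet, so a simple keyword filter can help surface suspicious lines first. This is only a sketch: the function name `backup_log_anomalies` is hypothetical, and the keyword list is an assumption rather than Rubrik's documented log format, so it should be adjusted to the messages the agent actually emits.

```shell
# Print log lines (prefixed with their line number) that match common
# failure keywords. Reads log text on stdin.
# ASSUMPTION: the keywords below are generic guesses, not a Rubrik spec.
backup_log_anomalies() {
  # `|| true` keeps the exit status at 0 when nothing matches.
  grep -niE 'error|fail|timed out' || true
}

# Usage: pipe the output of the `oc logs` command above into the function.
```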
Data Integrity Checks: Periodically verify backup integrity and accessibility.

Upgrades

OpenShift Cluster Upgrades

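Plan Your Upgrade: Evaluate the impact and schedule the upgrade during a maintenance window. Review the OpenShift upgrade guide before starting.
In-place Upgrades: Use OpenShift's built-in update mechanism to update cluster components. Run without arguments, the command below shows the current version and any available updates.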
```
oc adm upgrade
```
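Before starting an upgrade it is common to confirm that every cluster operator is healthy. The sketch below filters the operator listing for entries that are unavailable, still progressing, or degraded. The function name `unhealthy_cluster_operators` is hypothetical, and the column positions (name, version, available, progressing, degraded) are assumed from the default `oc get clusteroperators --no-headers` layout.

```shell
# Print cluster operators that are not Available=True, Progressing=False,
# Degraded=False. Reads `oc get clusteroperators --no-headers` on stdin.
# Columns assumed: NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE
unhealthy_cluster_operators() {
  awk '$3 != "True" || $4 != "False" || $5 != "False" { print $1 }'
}

# Usage (against a live cluster):
#   oc get clusteroperators --no-headers | unhealthy_cluster_operators
```

An empty result suggests the cluster is in a steady state; any name printed is worth investigating before the upgrade proceeds.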

Component Upgrades

Operator Lifecycle Manager (OLM): Upgrade operators through OLM. The command below lists the installed ClusterServiceVersions and their current phase.
```
oc get clusterserviceversions -n openshift-operators
```
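After an operator upgrade, each ClusterServiceVersion should settle in the Succeeded phase. The sketch below flags any that have not. The function name `csv_not_succeeded` is hypothetical; the input is assumed to be two columns (name and phase) produced with `-o custom-columns`, which avoids parsing display names that contain spaces.

```shell
# Print ClusterServiceVersions whose phase is not "Succeeded".
# Reads two-column "NAME PHASE" lines on stdin.
csv_not_succeeded() {
  awk '$2 != "Succeeded" { print $1 " (" $2 ")" }'
}

# Usage (against a live cluster):
#   oc get clusterserviceversions -n openshift-operators --no-headers \
#     -o custom-columns=NAME:.metadata.name,PHASE:.status.phase \
#     | csv_not_succeeded
```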
KubeVirt Upgrades: Follow the KubeVirt upgrade process for the virtualization components. Refer to the KubeVirt upgrade guide for details.

Performance Tuning

Resource Balancing

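Node Selector and Affinity Rules: Use node selectors and affinity rules to place workloads on suitable nodes and keep the load balanced across them. The example below requires the pod to be scheduled on nodes labeled disktype=ssd.
```
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
  containers:
  # Placeholder container so the manifest is valid; replace name and image.
  - name: example
    image: registry.example.com/example-app:latest
```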
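Whether placement rules actually result in an even spread can be checked by counting pods per node. The following sketch does that from a list of node names; the function name `pods_per_node` is hypothetical, and pods that are not yet scheduled will appear under whatever placeholder the CLI prints for an empty node name.

```shell
# Count pods per node and print "COUNT NODE", busiest node first.
# Reads one node name per line on stdin.
pods_per_node() {
  awk '{ n[$1]++ } END { for (k in n) print n[k], k }' | sort -rn
}

# Usage (against a live cluster):
#   oc get pods --all-namespaces --no-headers \
#     -o custom-columns=NODE:.spec.nodeName | pods_per_node
```

A large gap between the first and last lines hints that affinity rules or node labels are concentrating workloads on a few nodes.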

Vertical and Horizontal Scaling: Use the Horizontal Pod Autoscaler (HPA) and the Vertical Pod Autoscaler (VPA) to scale applications.
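An HPA can only add replicas up to its configured maximum, so an autoscaler that sits at that ceiling can no longer absorb extra load. The sketch below lists such autoscalers as candidates for a higher limit or a vertical resize. The function name `hpa_at_max` is hypothetical, and the four-column input (namespace, name, max replicas, current replicas) is assumed to come from `-o custom-columns`.

```shell
# Print "namespace/name" for HPAs whose current replica count has
# reached the configured maximum.
# Reads "NAMESPACE NAME MAX CURRENT" lines on stdin.
# A missing current count (for example "<none>") is treated as 0.
hpa_at_max() {
  awk '$4 + 0 >= $3 + 0 { print $1 "/" $2 }'
}

# Usage (against a live cluster):
#   oc get hpa --all-namespaces --no-headers \
#     -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,MAX:.spec.maxReplicas,CUR:.status.currentReplicas \
#     | hpa_at_max
```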

Network Optimization

Cilium Policy Management: Review and tune Cilium network policies for performance. The example below allows ingress to endpoints labeled app: myapp only from endpoints labeled app: trusted.
```
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: optimized-policy
spec:
  endpointSelector:
    matchLabels:
      app: myapp
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: trusted
```

Security and Compliance

Regular Security Audits

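Policy Compliance: Ensure adherence to Kyverno policies and security standards. The command below dumps the installed Kyverno ClusterPolicy resources (short name cpol) for review.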
```
kubectl get cpol -o yaml
```
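Listing the policies shows what is enforced, but not whether workloads comply. Assuming Kyverno's policy reporting is enabled and its PolicyReport resources expose a `summary.fail` count, the sketch below lists reports with at least one failed result. The function name `failing_policy_reports` is hypothetical.

```shell
# Print policy reports that record one or more failed results.
# Reads "NAMESPACE NAME FAIL" lines on stdin.
# ASSUMPTION: the FAIL column comes from .summary.fail of a PolicyReport.
failing_policy_reports() {
  awk '$3 + 0 > 0 { print $1 "/" $2 ": " $3 " failed" }'
}

# Usage (against a live cluster):
#   kubectl get policyreport --all-namespaces --no-headers \
#     -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,FAIL:.summary.fail \
#     | failing_policy_reports
```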
Vulnerability Scans: Run regular vulnerability assessments on container images and hosts.

Documentation and Reporting

Keeping Documentation Up-to-Date

Change Logs: Maintain a changelog for all configurations and updates.
Operational Runbooks: Create and update runbooks for standard operations.

Performance and Utilization Reports

Use Metrics Dashboards: Generate performance and utilization reports from Prometheus metrics and Grafana dashboards.

Conclusion

Following these day-2 operation guidelines helps maintain a stable, secure, and efficient RH OVE environment. Regular monitoring, upgrades, tuning, and up-to-date documentation help keep the platform reliable over the long term.