📋 k8s-analyzer Data Models

This document describes the data models and structures used by k8s-analyzer for representing Kubernetes resources, relationships, and analysis results.

Core Data Models

KubernetesResource

The primary model representing a Kubernetes resource.

class KubernetesResource(BaseModel):
    api_version: str
    kind: str
    metadata: ResourceMetadata
    spec: Dict[str, Any] = Field(default_factory=dict)
    status: Optional[Dict[str, Any]] = None
    relationships: List[ResourceRelationship] = Field(default_factory=list)
    health_status: ResourceStatus = ResourceStatus.UNKNOWN
    issues: List[str] = Field(default_factory=list)

Fields:
- api_version: Kubernetes API version (e.g., "v1", "apps/v1")
- kind: Resource type (e.g., "Pod", "Service", "Deployment")
- metadata: Resource metadata including name, namespace, labels, etc.
- spec: Resource specification as defined in Kubernetes
- status: Current status of the resource (optional)
- relationships: List of relationships to other resources
- health_status: Computed health status
- issues: List of identified configuration issues

Properties:
- ref: Returns a ResourceReference for this resource
- full_name: Returns namespace/name, or just name for cluster-scoped resources
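The two properties can be sketched as follows. This is a minimal illustration, not the project's implementation: `Ref` and `Resource` are hypothetical dataclass stand-ins for `ResourceReference` and `KubernetesResource`, reduced to the identifying fields so the example runs without Pydantic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Ref:
    """Hypothetical stand-in for ResourceReference."""
    api_version: str
    kind: str
    name: str
    namespace: Optional[str] = None


@dataclass
class Resource:
    """Hypothetical stand-in for KubernetesResource (identifying fields only)."""
    api_version: str
    kind: str
    name: str
    namespace: Optional[str] = None

    @property
    def ref(self) -> Ref:
        # Copy only the fields needed to identify the resource.
        return Ref(self.api_version, self.kind, self.name, self.namespace)

    @property
    def full_name(self) -> str:
        # Namespaced resources render as "namespace/name"; cluster-scoped ones as "name".
        return f"{self.namespace}/{self.name}" if self.namespace else self.name


pod = Resource("v1", "Pod", "my-pod", "default")
node = Resource("v1", "Node", "worker-1")
print(pod.full_name)   # default/my-pod
print(node.full_name)  # worker-1
```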

ResourceMetadata

Contains Kubernetes resource metadata.

class ResourceMetadata(BaseModel):
    name: str
    namespace: Optional[str] = None
    uid: Optional[str] = None
    resource_version: Optional[str] = None
    generation: Optional[int] = None
    creation_timestamp: Optional[datetime] = None
    deletion_timestamp: Optional[datetime] = None
    labels: Dict[str, str] = Field(default_factory=dict)
    annotations: Dict[str, str] = Field(default_factory=dict)
    owner_references: List[Dict[str, Any]] = Field(default_factory=list)
    finalizers: List[str] = Field(default_factory=list)

Key Fields:
- name: Resource name (required)
- namespace: Namespace (None for cluster-scoped resources)
- uid: Unique identifier assigned by Kubernetes
- creation_timestamp: When the resource was created
- labels: Key-value pairs for resource identification
- annotations: Key-value pairs for additional metadata
- owner_references: References to resources that own this resource

ResourceReference

Lightweight reference to a Kubernetes resource.

class ResourceReference(BaseModel):
    api_version: str
    kind: str
    name: str
    namespace: Optional[str] = None
    uid: Optional[str] = None

Usage:
- Used in relationships to reference target resources
- Provides string representation and hashing for collections
- Enables efficient lookups and comparisons
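To illustrate why hashing matters here, the sketch below uses a frozen dataclass as a stand-in for `ResourceReference`. Two references with equal fields collapse to one entry in a set and can be used interchangeably as dictionary keys. The `Ref` class and the `Kind/namespace/name` string format are assumptions made for this example; a Pydantic model would need an equivalent arrangement (for instance a frozen model or an explicit `__hash__`).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)  # frozen + default eq generates __hash__ from the fields
class Ref:
    api_version: str
    kind: str
    name: str
    namespace: Optional[str] = None
    uid: Optional[str] = None

    def __str__(self) -> str:
        # Hypothetical display format: "Kind/namespace/name" or "Kind/name".
        scope = f"{self.namespace}/" if self.namespace else ""
        return f"{self.kind}/{scope}{self.name}"


a = Ref("v1", "ConfigMap", "app-config", "default")
b = Ref("v1", "ConfigMap", "app-config", "default")

seen = {a, b}          # equal references deduplicate
index = {a: "first"}   # and work as dictionary keys
print(len(seen))       # 1
print(index[b])        # first
print(str(a))          # ConfigMap/default/app-config
```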

ResourceRelationship

Represents a relationship between two Kubernetes resources.

class ResourceRelationship(BaseModel):
    source: ResourceReference
    target: ResourceReference
    relationship_type: RelationshipType
    direction: RelationshipDirection = RelationshipDirection.OUTBOUND
    metadata: Dict[str, Any] = Field(default_factory=dict)

Fields:
- source: Source resource reference
- target: Target resource reference
- relationship_type: Type of relationship (see RelationshipType enum)
- direction: Direction of the relationship
- metadata: Additional relationship metadata

Enumerations

RelationshipType

Defines the types of relationships between resources.

class RelationshipType(str, Enum):
    OWNS = "owns"              # Resource owns another (Deployment owns ReplicaSet)
    USES = "uses"              # Resource uses another (Pod uses ConfigMap)
    EXPOSES = "exposes"        # Resource exposes another (Service exposes Pod)
    BINDS = "binds"            # Resource binds to another (PVC binds to PV)
    REFERENCES = "references"   # Resource references another (Pod references ServiceAccount)
    DEPENDS_ON = "depends_on"   # Resource depends on another (Pod depends on Node)
    MANAGES = "manages"         # Resource manages another (ReplicaSet manages Pods)
    SELECTS = "selects"         # Resource selects another (Service selects Pods)

RelationshipDirection

Indicates the direction of a relationship.

class RelationshipDirection(str, Enum):
    OUTBOUND = "outbound"       # Relationship points away from source
    INBOUND = "inbound"         # Relationship points toward source
    BIDIRECTIONAL = "bidirectional"  # Relationship works both ways
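One way to read the direction values is that the same edge can be described from either endpoint. The sketch below shows that idea with a hypothetical `flip` helper (not part of the documented API) that swaps the endpoints and inverts the direction. Endpoints are plain strings here to keep the example short.

```python
from enum import Enum
from typing import Tuple


class RelationshipDirection(str, Enum):
    OUTBOUND = "outbound"
    INBOUND = "inbound"
    BIDIRECTIONAL = "bidirectional"


_OPPOSITE = {
    RelationshipDirection.OUTBOUND: RelationshipDirection.INBOUND,
    RelationshipDirection.INBOUND: RelationshipDirection.OUTBOUND,
    RelationshipDirection.BIDIRECTIONAL: RelationshipDirection.BIDIRECTIONAL,
}


def flip(source: str, target: str,
         direction: RelationshipDirection) -> Tuple[str, str, RelationshipDirection]:
    """Describe the same edge from the other endpoint (hypothetical helper)."""
    return target, source, _OPPOSITE[direction]


edge = ("Deployment/web", "ReplicaSet/web-abc", RelationshipDirection.OUTBOUND)
flipped = flip(*edge)
print(flipped[0], flipped[1], flipped[2].value)

# Because the enum mixes in str, members compare equal to their plain string values,
# which is convenient when reading the value back from JSON or a database column.
print(RelationshipDirection("inbound") is RelationshipDirection.INBOUND)  # True
```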

ResourceStatus

Represents the health status of a resource.

class ResourceStatus(str, Enum):
    HEALTHY = "healthy"     # Resource is functioning correctly
    WARNING = "warning"     # Resource has issues but is functional
    ERROR = "error"         # Resource has critical issues
    UNKNOWN = "unknown"     # Resource status cannot be determined
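Because `ResourceStatus` mixes in `str`, comparing members compares their string values, and that does not follow severity. When several checks contribute to one status, an explicit ranking avoids surprises. The sketch below is one possible approach: the `SEVERITY` mapping and the `worst` helper are assumptions for illustration, and where `UNKNOWN` belongs in the ranking is a design choice.

```python
from enum import Enum


class ResourceStatus(str, Enum):
    HEALTHY = "healthy"
    WARNING = "warning"
    ERROR = "error"
    UNKNOWN = "unknown"


# Explicit ranking (assumed for this sketch); higher means more severe.
SEVERITY = {
    ResourceStatus.UNKNOWN: 0,
    ResourceStatus.HEALTHY: 1,
    ResourceStatus.WARNING: 2,
    ResourceStatus.ERROR: 3,
}


def worst(*statuses: ResourceStatus) -> ResourceStatus:
    """Return the most severe of the given statuses."""
    return max(statuses, key=SEVERITY.__getitem__)


# Plain max() compares "error" and "warning" as strings, so WARNING wins.
naive = max(ResourceStatus.ERROR, ResourceStatus.WARNING)
ranked = worst(ResourceStatus.ERROR, ResourceStatus.WARNING)
print(naive.value)   # warning
print(ranked.value)  # error
```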

Container Data Models

ClusterState

Represents the complete analyzed state of a Kubernetes cluster.

class ClusterState(BaseModel):
    resources: List[KubernetesResource] = Field(default_factory=list)
    relationships: List[ResourceRelationship] = Field(default_factory=list)
    analysis_timestamp: datetime = Field(default_factory=datetime.now)
    cluster_info: Dict[str, Any] = Field(default_factory=dict)
    summary: Dict[str, Any] = Field(default_factory=dict)

Methods:
- add_resource(resource): Add a resource to the cluster state
- get_resources_by_kind(kind): Get all resources of a specific kind
- get_resource_by_ref(ref): Find a resource by its reference
- get_namespaces(): Get all unique namespaces in the cluster
- generate_summary(): Generate cluster analysis summary
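A rough sketch of how some of these methods could behave is shown below. `ClusterStateSketch` and `Resource` are hypothetical dataclass stand-ins, not the real classes, and the summary contains only the two keys that appear in the usage examples in this document (`total_resources` and `resource_type_distribution`).

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Resource:
    """Hypothetical stand-in for KubernetesResource."""
    kind: str
    name: str
    namespace: Optional[str] = None


@dataclass
class ClusterStateSketch:
    resources: List[Resource] = field(default_factory=list)

    def add_resource(self, resource: Resource) -> None:
        self.resources.append(resource)

    def get_resources_by_kind(self, kind: str) -> List[Resource]:
        return [r for r in self.resources if r.kind == kind]

    def get_namespaces(self) -> List[str]:
        # Cluster-scoped resources have no namespace and are skipped.
        return sorted({r.namespace for r in self.resources if r.namespace})

    def generate_summary(self) -> Dict[str, Any]:
        kinds = Counter(r.kind for r in self.resources)
        return {
            "total_resources": len(self.resources),
            "resource_type_distribution": dict(kinds),
        }


state = ClusterStateSketch()
state.add_resource(Resource("Pod", "web-1", "default"))
state.add_resource(Resource("Pod", "dns-1", "kube-system"))
state.add_resource(Resource("Node", "worker-1"))

print(state.get_namespaces())                        # ['default', 'kube-system']
print(len(state.get_resources_by_kind("Pod")))       # 2
print(state.generate_summary()["total_resources"])   # 3
```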

Entity-Relationship Schema

The entity-relationship diagram below shows the primary data models in k8s-analyzer and how they connect: each resource has exactly one metadata record, can be the source or target of any number of relationships, and belongs to a cluster state that also tracks those relationships.

ER Diagram Overview

erDiagram
    KubernetesResource {
        string api_version
        string kind
        json spec
        json status
        string health_status
        string-array issues
    }

    ResourceMetadata {
        string name
        string namespace
        string uid
        string resource_version
        int generation
        datetime creation_timestamp
        datetime deletion_timestamp
        json labels
        json annotations
        json owner_references
        string-array finalizers
    }

    ResourceRelationship {
        string relationship_type
        string direction
        json metadata
        datetime created_at
    }

    ResourceReference {
        string api_version
        string kind
        string name
        string namespace
        string uid
    }

    ClusterState {
        datetime analysis_timestamp
        json cluster_info
        json summary
    }

    %% Relationships
    KubernetesResource ||--|| ResourceMetadata : has
    KubernetesResource ||--o{ ResourceRelationship : source
    KubernetesResource ||--o{ ResourceRelationship : target
    ResourceRelationship ||--|| ResourceReference : source_ref
    ResourceRelationship ||--|| ResourceReference : target_ref
    ClusterState ||--o{ KubernetesResource : contains
    ClusterState ||--o{ ResourceRelationship : tracks

Resources Table

Stores all Kubernetes resources with their metadata and analysis results.

CREATE TABLE resources (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    uid TEXT UNIQUE,
    name TEXT NOT NULL,
    namespace TEXT,
    kind TEXT NOT NULL,
    api_version TEXT NOT NULL,
    health_status TEXT NOT NULL,
    issues TEXT,  -- JSON array of issue descriptions
    labels TEXT,  -- JSON object of labels
    annotations TEXT,  -- JSON object of annotations
    spec TEXT,    -- JSON object of resource spec
    status TEXT,  -- JSON object of resource status
    creation_timestamp DATETIME,
    deletion_timestamp DATETIME,
    resource_version TEXT,
    generation INTEGER,
    owner_references TEXT,  -- JSON array of owner references
    finalizers TEXT,  -- JSON array of finalizers
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    updated_at DATETIME DEFAULT CURRENT_TIMESTAMP
);

Indexes:

CREATE INDEX idx_resources_kind ON resources (kind);
CREATE INDEX idx_resources_namespace ON resources (namespace);
CREATE INDEX idx_resources_health_status ON resources (health_status);
CREATE INDEX idx_resources_creation_timestamp ON resources (creation_timestamp);
CREATE UNIQUE INDEX idx_resources_uid ON resources (uid);
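Since list and dictionary fields are stored as JSON text, they need to be encoded on write and decoded on read. The sketch below shows that round trip with Python's built-in `sqlite3` and `json` modules against an in-memory database. The table is reduced to a subset of the columns above, and the row values are sample data invented for the example.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE resources (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        uid TEXT UNIQUE,
        name TEXT NOT NULL,
        namespace TEXT,
        kind TEXT NOT NULL,
        api_version TEXT NOT NULL,
        health_status TEXT NOT NULL,
        issues TEXT,  -- JSON array
        labels TEXT   -- JSON object
    )
""")

# Encode list/dict fields with json.dumps before inserting.
conn.execute(
    "INSERT INTO resources "
    "(uid, name, namespace, kind, api_version, health_status, issues, labels) "
    "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    ("uid-1", "my-pod", "default", "Pod", "v1", "warning",
     json.dumps(["Pod is in Pending state"]), json.dumps({"app": "my-app"})),
)

# Filter on the indexed plain-text columns, then decode the JSON columns.
row = conn.execute(
    "SELECT name, issues, labels FROM resources "
    "WHERE kind = ? AND health_status = ?",
    ("Pod", "warning"),
).fetchone()
name, issues, labels = row[0], json.loads(row[1]), json.loads(row[2])
print(name, issues, labels["app"])
```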

Relationships Table

Stores relationships between resources.

CREATE TABLE relationships (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    source_uid TEXT NOT NULL,
    source_kind TEXT NOT NULL,
    source_name TEXT NOT NULL,
    source_namespace TEXT,
    target_uid TEXT,
    target_kind TEXT NOT NULL,
    target_name TEXT NOT NULL,
    target_namespace TEXT,
    target_resource TEXT NOT NULL,  -- Full target reference
    relationship_type TEXT NOT NULL,
    direction TEXT NOT NULL DEFAULT 'outbound',
    strength REAL DEFAULT 1.0,
    metadata TEXT,  -- JSON object of relationship metadata
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (source_uid) REFERENCES resources (uid)
);

Indexes:

CREATE INDEX idx_relationships_source_uid ON relationships (source_uid);
CREATE INDEX idx_relationships_target_resource ON relationships (target_resource);
CREATE INDEX idx_relationships_type ON relationships (relationship_type);

Resource Health History Table

Tracks changes in resource health over time.

CREATE TABLE resource_health_history (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    resource_uid TEXT NOT NULL,
    health_status TEXT NOT NULL,
    issues TEXT,  -- JSON array of issues at this point in time
    timestamp DATETIME NOT NULL,
    analysis_run_id TEXT,  -- Optional: link to analysis run
    FOREIGN KEY (resource_uid) REFERENCES resources (uid)
);
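As an example of what the history table makes possible, the sketch below reads one resource's health history in time order and collapses it into status transitions. It uses `sqlite3` with a reduced table and invented sample rows; timestamps are stored as `YYYY-MM-DD HH:MM:SS` text, which sorts chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE resource_health_history (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        resource_uid TEXT NOT NULL,
        health_status TEXT NOT NULL,
        timestamp DATETIME NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO resource_health_history (resource_uid, health_status, timestamp) "
    "VALUES (?, ?, ?)",
    [
        ("pod-uid", "warning", "2024-01-01 10:00:00"),
        ("pod-uid", "error", "2024-01-01 10:05:00"),
        ("pod-uid", "healthy", "2024-01-01 10:10:00"),
    ],
)

history = [
    row[0]
    for row in conn.execute(
        "SELECT health_status FROM resource_health_history "
        "WHERE resource_uid = ? ORDER BY timestamp",
        ("pod-uid",),
    )
]

# Pair each status with its successor to get (previous, current) transitions.
transitions = list(zip(history, history[1:]))
print(transitions)  # [('warning', 'error'), ('error', 'healthy')]
```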

Analysis Summary Table

Stores high-level analysis metadata and statistics.

CREATE TABLE analysis_summary (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    analysis_timestamp DATETIME NOT NULL,
    analysis_duration_seconds REAL,
    total_resources INTEGER NOT NULL,
    total_relationships INTEGER NOT NULL,
    health_summary TEXT,  -- JSON object with health statistics
    resource_types TEXT,  -- JSON object with resource type counts
    namespace_summary TEXT,  -- JSON object with namespace statistics
    cluster_info TEXT,  -- JSON object with cluster metadata
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);

Relationship Detection Logic

Ownership Relationships

Detected through metadata.ownerReferences:

def detect_ownership_relationships(resource: KubernetesResource) -> List[ResourceRelationship]:
    relationships = []
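    for owner_ref in resource.metadata.owner_references:
        # The owner is the target. A namespaced owner shares the dependent's
        # namespace; for a cluster-scoped owner this value would need to be None.
        target = ResourceReference(
            api_version=owner_ref.get("apiVersion"),
            kind=owner_ref.get("kind"),
            name=owner_ref.get("name"),
            namespace=resource.metadata.namespace,
            uid=owner_ref.get("uid")
        )
        # INBOUND: the OWNS edge points from the owner (target) toward this resource (source)
        relationships.append(ResourceRelationship(
            source=resource.ref,
            target=target,
            relationship_type=RelationshipType.OWNS,
            direction=RelationshipDirection.INBOUND
        ))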
    return relationships
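Owner references form chains, for example Pod to ReplicaSet to Deployment, so one common follow-up is to walk upward to the top-level owner. The sketch below does this over plain dictionaries keyed by UID. The `top_level_owner` helper and the sample data are hypothetical and only illustrate the traversal.

```python
from typing import Any, Dict

# Sample resources keyed by UID, each with its owner references (invented data).
resources: Dict[str, Dict[str, Any]] = {
    "deploy-uid": {"kind": "Deployment", "name": "web", "owner_references": []},
    "rs-uid": {
        "kind": "ReplicaSet",
        "name": "web-abc",
        "owner_references": [{"kind": "Deployment", "name": "web", "uid": "deploy-uid"}],
    },
    "pod-uid": {
        "kind": "Pod",
        "name": "web-abc-123",
        "owner_references": [{"kind": "ReplicaSet", "name": "web-abc", "uid": "rs-uid"}],
    },
}


def top_level_owner(uid: str, by_uid: Dict[str, Dict[str, Any]]) -> str:
    """Follow owner references upward until no known owner remains."""
    seen = set()
    while uid not in seen:  # the seen set guards against reference cycles
        seen.add(uid)
        owners = by_uid[uid].get("owner_references", [])
        known = [o["uid"] for o in owners if o.get("uid") in by_uid]
        if not known:
            return uid
        uid = known[0]  # follow the first owner present in the data set
    return uid


root = top_level_owner("pod-uid", resources)
print(resources[root]["kind"], resources[root]["name"])  # Deployment web
```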

Usage Relationships

Detected through spec analysis:

def detect_usage_relationships(resource: KubernetesResource) -> List[ResourceRelationship]:
    relationships = []

    if resource.kind == "Pod":
        # ConfigMap usage
        for volume in resource.spec.get("volumes", []):
            if "configMap" in volume:
                configmap_name = volume["configMap"]["name"]
                target = ResourceReference(
                    api_version="v1",
                    kind="ConfigMap",
                    name=configmap_name,
                    namespace=resource.metadata.namespace
                )
                relationships.append(ResourceRelationship(
                    source=resource.ref,
                    target=target,
                    relationship_type=RelationshipType.USES
                ))

        # Secret usage
        for volume in resource.spec.get("volumes", []):
            if "secret" in volume:
                secret_name = volume["secret"]["secretName"]
                target = ResourceReference(
                    api_version="v1",
                    kind="Secret",
                    name=secret_name,
                    namespace=resource.metadata.namespace
                )
                relationships.append(ResourceRelationship(
                    source=resource.ref,
                    target=target,
                    relationship_type=RelationshipType.USES
                ))

    return relationships

Service Selection Relationships

Detected through label selectors:

def detect_service_relationships(service: KubernetesResource) -> List[ResourceRelationship]:
    relationships = []

    if service.kind == "Service":
        selector = service.spec.get("selector", {})
        if selector:
            # This creates a conceptual relationship
            # Actual pod matching would require cluster state
            relationships.append(ResourceRelationship(
                source=service.ref,
                target=ResourceReference(
                    api_version="v1",
                    kind="Pod",
                    name="*",  # Wildcard for selector-based relationship
                    namespace=service.metadata.namespace
                ),
                relationship_type=RelationshipType.SELECTS,
                metadata={"selector": selector}
            ))

    return relationships
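Once cluster state is available, the wildcard target can be resolved to concrete Pods by matching the selector against Pod labels: a Pod matches when every key-value pair in the selector is present in its labels. The sketch below shows that check on plain dictionaries. `selector_matches` and the sample Pods are assumptions for illustration, and all Pods are taken to be in the Service's namespace.

```python
from typing import Dict


def selector_matches(selector: Dict[str, str], labels: Dict[str, str]) -> bool:
    """True when every selector key/value pair is present in the labels.

    An empty selector matches nothing, mirroring a Service without a selector.
    """
    return bool(selector) and all(labels.get(k) == v for k, v in selector.items())


# Sample Pod labels keyed by Pod name (invented data).
pods = {
    "web-1": {"app": "web", "tier": "frontend"},
    "web-2": {"app": "web", "tier": "frontend"},
    "db-1": {"app": "db"},
}

selector = {"app": "web"}
matched = sorted(name for name, labels in pods.items() if selector_matches(selector, labels))
print(matched)  # ['web-1', 'web-2']
```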

Health Assessment Logic

Resource Health Evaluation

def assess_resource_health(resource: KubernetesResource) -> Tuple[ResourceStatus, List[str]]:
    issues = []
    status = ResourceStatus.HEALTHY

    # Check for deletion timestamp
    if resource.metadata.deletion_timestamp:
        issues.append("Resource is being deleted")
        status = ResourceStatus.WARNING

    # Pod-specific health checks
    if resource.kind == "Pod":
        pod_state = resource.status or {}  # status is Optional and may be None
        pod_status = pod_state.get("phase")
        if pod_status == "Failed":
            issues.append("Pod is in Failed state")
            status = ResourceStatus.ERROR
        elif pod_status == "Pending":
            issues.append("Pod is in Pending state")
            status = ResourceStatus.WARNING

        # Check container statuses
        container_statuses = pod_state.get("containerStatuses", [])
        for container_status in container_statuses:
            if not container_status.get("ready", False):
                issues.append(f"Container {container_status.get('name')} is not ready")
                # Escalate to WARNING without downgrading an earlier ERROR.
                # max() on a str-based Enum compares string values, so it would
                # rank "warning" above "error".
                if status != ResourceStatus.ERROR:
                    status = ResourceStatus.WARNING

            waiting_state = container_status.get("state", {}).get("waiting")
            if waiting_state:
                reason = waiting_state.get("reason", "Unknown")
                issues.append(f"Container {container_status.get('name')} is waiting: {reason}")
                if reason in ["ImagePullBackOff", "CrashLoopBackOff"]:
                    status = ResourceStatus.ERROR

    # Service-specific health checks
    elif resource.kind == "Service":
        if not resource.spec.get("selector"):
            issues.append("Service has no selector")
            status = ResourceStatus.WARNING

    # ConfigMap/Secret size checks
    elif resource.kind in ["ConfigMap", "Secret"]:
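        # KubernetesResource as documented above has no `data` field, so read it defensively
        data = getattr(resource, "data", None) or {}
        # Measure encoded bytes rather than characters
        total_size = sum(len(str(v).encode("utf-8")) for v in data.values())
        if total_size > 1048576:  # 1 MiB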
            issues.append(f"{resource.kind} is large ({total_size} bytes)")
            status = ResourceStatus.WARNING

    return status, issues

Data Validation

Pydantic Validators

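# Pydantic v1-style validators; Pydantic v2 deprecates `validator` in favor of `field_validator`
from pydantic import BaseModel, validator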

class KubernetesResource(BaseModel):
    # ... fields ...

    @validator('kind')
    def validate_kind(cls, v):
        valid_kinds = [
            'Pod', 'Service', 'ConfigMap', 'Secret', 'PersistentVolume',
            'PersistentVolumeClaim', 'Deployment', 'ReplicaSet', 'StatefulSet',
            'DaemonSet', 'Job', 'CronJob', 'Ingress', 'ServiceAccount',
            'Role', 'RoleBinding', 'ClusterRole', 'ClusterRoleBinding'
        ]
        if v not in valid_kinds:
            raise ValueError(f'Unsupported resource kind: {v}')
        return v

    @validator('api_version')
    def validate_api_version(cls, v):
        # Grouped versions such as "apps/v1" or "networking.k8s.io/v1" are accepted as-is;
        # ungrouped versions must belong to the core group ("v1", "v1beta1", ...).
        if not v or ('/' not in v and not v.startswith('v1')):
            raise ValueError(f'Invalid API version: {v}')
        return v
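For reference, a Kubernetes `apiVersion` is either a bare version for the core group ("v1") or has the form "group/version" ("apps/v1"). The sketch below splits the two parts; `split_api_version` is a hypothetical helper, not part of k8s-analyzer.

```python
from typing import Tuple


def split_api_version(api_version: str) -> Tuple[str, str]:
    """Split an apiVersion into (group, version); the core group is the empty string."""
    group, _, version = api_version.rpartition("/")
    return group, version


print(split_api_version("v1"))                            # ('', 'v1')
print(split_api_version("apps/v1"))                       # ('apps', 'v1')
print(split_api_version("rbac.authorization.k8s.io/v1"))  # ('rbac.authorization.k8s.io', 'v1')
```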

Usage Examples

Creating a Resource

from k8s_analyzer.models import KubernetesResource, ResourceMetadata, ResourceStatus

# Create a Pod resource
pod = KubernetesResource(
    api_version="v1",
    kind="Pod",
    metadata=ResourceMetadata(
        name="my-pod",
        namespace="default",
        labels={"app": "my-app"}
    ),
    spec={
        "containers": [{
            "name": "app",
            "image": "nginx:latest"
        }]
    },
    health_status=ResourceStatus.HEALTHY
)

Adding Relationships

from k8s_analyzer.models import ResourceRelationship, RelationshipType, ResourceReference

# Create a ConfigMap reference
configmap_ref = ResourceReference(
    api_version="v1",
    kind="ConfigMap",
    name="app-config",
    namespace="default"
)

# Add usage relationship
relationship = ResourceRelationship(
    source=pod.ref,
    target=configmap_ref,
    relationship_type=RelationshipType.USES
)

pod.relationships.append(relationship)

Querying Cluster State

# cluster_state is assumed to be a ClusterState that has already been populated
# Find all pods
pods = cluster_state.get_resources_by_kind("Pod")

# Find pods with issues
problematic_pods = [
    pod for pod in pods 
    if pod.health_status in [ResourceStatus.WARNING, ResourceStatus.ERROR]
]

# Get all namespaces
namespaces = cluster_state.get_namespaces()

# Generate summary
summary = cluster_state.generate_summary()
print(f"Total resources: {summary['total_resources']}")
print(f"Resource types: {summary['resource_type_distribution']}")

This data model documentation provides a comprehensive understanding of how k8s-analyzer structures and processes Kubernetes data for analysis and reporting.