Value Stream Mapping
Document Type: Business Architecture
Status: Draft
Version: 1.0
Last Updated: 2024-12-30
Owner: Architecture Team
Purpose
This document maps the end-to-end value streams that flow through Dokploy, from user need to delivered value. Value stream mapping helps identify bottlenecks, waste, and optimization opportunities in how the platform delivers value to stakeholders.
Value Stream Overview
Dokploy delivers value through five primary value streams:
- Application Deployment: From code to running application
- Infrastructure Provisioning: From resource need to available capacity
- Incident Response: From problem detection to resolution
- User Onboarding: From new user to productive user
- Feature Development: From idea to deployed feature
Value Stream 1: Application Deployment
Overview
Trigger: Developer has new code to deploy
Outcome: Application running in production, accessible to users
Frequency: 10-100+ times per day per team
Critical Success Factor: Speed and reliability
Current State Map
```mermaid
graph LR
    A[Code Change] -->|Push| B[Git Repository]
    B -->|Webhook| C[Dokploy Receives Event]
    C -->|Queue| D[Build Process]
    D -->|Image| E[Container Registry]
    E -->|Pull| F[Docker Swarm]
    F -->|Deploy| G[Running Container]
    G -->|Health Check| H[Traffic Routing]
    H -->|Live| I[Users Access App]
    style A fill:#e1f5ff
    style I fill:#c8e6c9
```
Detailed Steps
Step 1: Code Commit
- Actor: Developer
- Actions:
  - Write code
  - Run local tests
  - Commit to Git
  - Push to repository
- Duration: 5-60 minutes (variable)
- Value-Add: Yes (creating feature/fix)
- Wait Time: 0
Step 2: Git Webhook Trigger
- Actor: Git provider (GitHub, GitLab)
- Actions:
  - Detect push event
  - Call Dokploy webhook endpoint
- Duration: 1-5 seconds
- Value-Add: No (waiting)
- Wait Time: 1-5 seconds
- Automation: Fully automated
Step 3: Webhook Receipt & Validation
- Actor: Dokploy API
- Actions:
  - Verify webhook signature
  - Parse payload
  - Identify target application
  - Queue deployment job
- Duration: 100-500ms
- Value-Add: Yes (security, routing)
- Wait Time: 0
- Automation: Fully automated
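Signature verification can be illustrated with the scheme GitHub documents for webhooks. GitHub computes an HMAC-SHA256 over the raw request body, keyed with the shared webhook secret, and sends it in the `X-Hub-Signature-256` header as `sha256=<hex digest>`. The minimal sketch below assumes that scheme. GitLab uses a different mechanism (a plain `X-Gitlab-Token` header), and the sketch does not claim to reproduce Dokploy's own check. `verify_github_signature` is a hypothetical name.

```python
import hashlib
import hmac
from typing import Optional


def verify_github_signature(secret: bytes, raw_body: bytes,
                            signature_header: Optional[str]) -> bool:
    """Check a GitHub-style X-Hub-Signature-256 header against the raw request body.

    The header is expected to be "sha256=" followed by the hex HMAC-SHA256 digest
    of the exact bytes received, keyed with the shared webhook secret.
    """
    if not signature_header or not signature_header.startswith("sha256="):
        return False
    expected = "sha256=" + hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many leading characters matched
    return hmac.compare_digest(expected.encode(), signature_header.encode())


if __name__ == "__main__":
    secret = b"example-webhook-secret"  # hypothetical secret configured on both sides
    body = b'{"ref": "refs/heads/main"}'
    header = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    print(verify_github_signature(secret, body, header))         # True
    print(verify_github_signature(secret, body + b" ", header))  # False: body was altered
```

The check has to run against the raw bytes received. Parsing and re-serializing the JSON can change whitespace or key order and break the comparison.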
Step 4: Build Queuing
- Actor: Deployment queue
- Actions:
  - Add job to queue
  - Wait for worker availability
- Duration: 0-60 seconds (depends on queue depth)
- Value-Add: No (waiting)
- Wait Time: 0-60 seconds
- Bottleneck: High load periods
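A small simulation shows how queue wait depends on worker count. Jobs arrive at given times, each needs a build duration, and a fixed pool of workers serves them first-in, first-out. The job timings are invented for illustration and are not Dokploy measurements. `simulate_queue_waits` is a hypothetical helper.

```python
import heapq
from typing import List, Tuple


def simulate_queue_waits(jobs: List[Tuple[float, float]], workers: int) -> List[float]:
    """Return the queue wait (seconds) for each job under FIFO scheduling.

    jobs: (arrival_time, build_duration) pairs, sorted by arrival_time.
    workers: number of build workers running in parallel.
    """
    # Min-heap holding the time at which each worker next becomes free
    free_at = [0] * workers
    heapq.heapify(free_at)
    waits = []
    for arrival, duration in jobs:
        earliest_free = heapq.heappop(free_at)
        start = max(arrival, earliest_free)
        waits.append(start - arrival)
        heapq.heappush(free_at, start + duration)
    return waits


if __name__ == "__main__":
    # Four pushes arriving within 10 s, each build taking 60 s (illustrative numbers)
    burst = [(0, 60), (2, 60), (5, 60), (10, 60)]
    print(simulate_queue_waits(burst, workers=1))  # [0, 58, 115, 170]
    print(simulate_queue_waits(burst, workers=2))  # [0, 0, 55, 52]
```

In this burst, a second worker cuts the longest wait from 170 s to 55 s. That is the effect the "Increase worker pool" improvement targets.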
Step 5: Image Build
- Actor: Build worker
- Actions:
  - Clone repository
  - Detect build context (Dockerfile, Buildpack)
  - Build Docker image
  - Tag image
  - Push to registry
- Duration: 30 seconds - 10 minutes
- Value-Add: Yes (creating deployable artifact)
- Wait Time: 0
- Automation: Fully automated
- Bottleneck: Large dependencies, slow network
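Dependency caching, which addresses the large-dependencies bottleneck, relies on a cache key that changes only when dependencies change. A common approach is to hash the dependency manifest and lockfile contents. If the key matches the previous build, the cached dependency layer can be reused. This approach is an assumption, not a description of Dokploy's builder, and `dependency_cache_key` and the file names are illustrative.

```python
import hashlib
from typing import Dict, Optional


def dependency_cache_key(files: Dict[str, bytes]) -> str:
    """Combine dependency manifests/lockfiles into one SHA-256 cache key.

    Names are processed in sorted order, and each part is length-prefixed,
    so the key depends only on the file names and contents.
    """
    digest = hashlib.sha256()
    for name in sorted(files):
        for part in (name.encode(), files[name]):
            digest.update(len(part).to_bytes(8, "big"))
            digest.update(part)
    return digest.hexdigest()


def needs_dependency_rebuild(files: Dict[str, bytes], previous_key: Optional[str]) -> bool:
    """Rebuild the dependency layer only when there is no cached key or it changed."""
    return previous_key is None or dependency_cache_key(files) != previous_key


if __name__ == "__main__":
    deps = {
        "package.json": b'{"dependencies": {"express": "^4.19.0"}}',
        "package-lock.json": b"lockfile v1",  # placeholder content
    }
    cached = dependency_cache_key(deps)
    # Application source edits don't touch these files, so the cached layer is reusable
    print(needs_dependency_rebuild(deps, cached))  # False
    deps["package-lock.json"] = b"lockfile v2"
    print(needs_dependency_rebuild(deps, cached))  # True
```

Docker's layer cache gives a similar effect when a Dockerfile copies the dependency manifests and installs dependencies before copying the rest of the source.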
Step 6: Registry Push
- Actor: Docker Registry
- Actions:
  - Receive image layers
  - Store image
  - Confirm receipt
- Duration: 5-60 seconds
- Value-Add: No (storage)
- Wait Time: 0
- Automation: Fully automated
Step 7: Service Update
- Actor: Docker Swarm
- Actions:
  - Pull new image
  - Create new container
  - Start container
  - Wait for health check
  - Stop old container (rolling)
- Duration: 10-120 seconds
- Value-Add: Yes (deploying)
- Wait Time: Health check period (10-30s)
- Automation: Fully automated
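The health-check wait in this step is essentially a polling loop. The new container is probed at a fixed interval until it reports healthy or a deadline passes, and only then is the old one stopped. Docker Swarm handles this natively through a service's health check and update settings (for example `docker service update --update-order start-first`). The sketch below only illustrates the control flow, with the probe and clock injected as functions so it runs without Docker. `wait_until_healthy`, `start_first_update`, and their parameters are hypothetical.

```python
import time
from typing import Callable


def wait_until_healthy(probe: Callable[[], bool], interval_s: float = 2.0,
                       timeout_s: float = 30.0,
                       clock: Callable[[], float] = time.monotonic,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `probe` every `interval_s` seconds until it passes or `timeout_s` elapses."""
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


def start_first_update(start_new: Callable[[], None], probe_new: Callable[[], bool],
                       stop_old: Callable[[], None], stop_new: Callable[[], None],
                       **wait_kwargs) -> bool:
    """Start the new container, wait for health, then retire the old one.

    If the new container never becomes healthy it is stopped instead, so the
    old container keeps serving traffic.
    """
    start_new()
    if wait_until_healthy(probe_new, **wait_kwargs):
        stop_old()
        return True
    stop_new()
    return False


if __name__ == "__main__":
    # Simulated clock and a container that passes its 4th probe
    now = [0.0]
    probes = [False, False, False, True]
    ok = wait_until_healthy(lambda: probes.pop(0), interval_s=2.0, timeout_s=30.0,
                            clock=lambda: now[0],
                            sleep=lambda s: now.__setitem__(0, now[0] + s))
    print(ok, now[0])  # True 6.0
```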
Step 8: Traffic Routing
- Actor: Traefik
- Actions:
  - Detect new container
  - Update routing rules
  - Start routing traffic
- Duration: 1-5 seconds
- Value-Add: Yes (making available)
- Wait Time: 0
- Automation: Fully automated
Step 9: Verification
- Actor: Developer
- Actions:
  - Check deployment status
  - Verify application works
  - Monitor for errors
- Duration: 1-10 minutes
- Value-Add: Yes (quality assurance)
- Wait Time: 0
Metrics
| Metric | Target | Current | Gap |
|---|---|---|---|
| Lead Time (commit to live) | <5 minutes | 2-15 minutes | Optimize build |
| Process Time (actual work) | ~2 minutes | ~2 minutes | ✅ |
| Wait Time (queuing, health checks) | <30 seconds | 0-90 seconds | Reduce queue |
| Deployment Success Rate | >95% | ~92% | Improve health checks |
| Rollback Time | <2 minutes | 1-3 minutes | Partially met |
Value Stream Efficiency
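Value stream efficiency is commonly measured as process cycle efficiency (PCE), defined as value-added time divided by total lead time. The sketch below applies it to the Application Deployment metrics, setting the ~2 minutes of process time against the current 2-15 minute lead time. Treating process time as equal to value-added time is a simplifying assumption.

```python
def process_cycle_efficiency(value_added_min: float, lead_time_min: float) -> float:
    """PCE = value-added time / total lead time, as a fraction between 0 and 1."""
    if lead_time_min <= 0:
        raise ValueError("lead time must be positive")
    if not 0 <= value_added_min <= lead_time_min:
        raise ValueError("value-added time must lie between 0 and the lead time")
    return value_added_min / lead_time_min


if __name__ == "__main__":
    # ~2 minutes of process time against the current 2-15 minute lead time
    for lead in (2, 5, 15):
        print(f"lead {lead:>2} min -> PCE {process_cycle_efficiency(2, lead):.0%}")
    # lead  2 min -> PCE 100%
    # lead  5 min -> PCE 40%
    # lead 15 min -> PCE 13%
```

With ~2 minutes of process time, a lead time under 5 minutes yields a PCE above 40%. That is consistent with the Process Efficiency target in the Success Metrics Dashboard.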
Waste Identification
Type 1: Waiting
- Queue wait time (0-60s)
- Health check wait (10-30s)
- Improvement: Increase worker pool, optimize health checks
Type 2: Overprocessing
- Unchanged dependencies rebuilt on every build
- Improvement: Layer caching, dependency caching
Type 3: Defects
- Failed deployments due to config errors
- Improvement: Pre-deployment validation, config templates
Type 4: Transportation
- Pushing large images to registry
- Improvement: Use smaller base images, multi-stage builds
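Pre-deployment validation, the improvement for Type 3 defects, can start as a set of cheap checks run before a build is queued. The field names (`name`, `port`, `domain`, `env`) describe a hypothetical application config, not Dokploy's actual schema. Each rule returns a readable message instead of letting the deployment fail midway.

```python
import re
from typing import Any, Dict, List

ENV_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
HOSTNAME = re.compile(r"(?=.{1,253}\Z)(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}")


def preflight_errors(config: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in a (hypothetical) app config."""
    errors: List[str] = []
    if not config.get("name"):
        errors.append("name is required")
    port = config.get("port")
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(port, bool) or not isinstance(port, int) or not 1 <= port <= 65535:
        errors.append(f"port must be an integer in 1-65535, got {port!r}")
    domain = config.get("domain")
    if domain is not None and not HOSTNAME.fullmatch(str(domain).lower()):
        errors.append(f"domain {domain!r} is not a valid hostname")
    for key in config.get("env", {}):
        if not ENV_NAME.fullmatch(key):
            errors.append(f"environment variable name {key!r} is invalid")
    return errors


if __name__ == "__main__":
    cfg = {"name": "web", "port": 70000, "domain": "app.example.com",
           "env": {"DATABASE_URL": "postgres://db:5432/app", "1BAD": "x"}}
    for err in preflight_errors(cfg):
        print(err)
    # port must be an integer in 1-65535, got 70000
    # environment variable name '1BAD' is invalid
```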
Improvement Opportunities
Quick Wins (Implement in v1.5)
- Build caching: Target of 40% faster builds
- Parallel builds: Handle multiple simultaneous deployments
- Smarter health checks: Target of 50% less health-check wait time
- Pre-flight validation: Catch errors before deployment
Medium Term (v2.0)
- Predictive scaling: Pre-scale before traffic spikes
- Progressive delivery: Canary deployments for safer updates
- Build analytics: Identify slow build steps
Long Term (v3.0)
- Edge deployment: Deploy closer to users
- Smart caching: AI-powered cache optimization
Value Stream 2: Infrastructure Provisioning
Overview
Trigger: Need for new compute/storage capacity
Outcome: Resources available and ready for workloads
Frequency: 5-20 times per week per team
Critical Success Factor: Speed and cost-efficiency
Current State Map
```mermaid
graph LR
    A[Resource Need] -->|Request| B[Provision API]
    B -->|Create| C[Docker Service]
    C -->|Allocate| D[Container]
    D -->|Mount| E[Volumes]
    E -->|Configure| F[Network]
    F -->|Start| G[Ready]
    style A fill:#e1f5ff
    style G fill:#c8e6c9
```
Detailed Steps
Step 1: Identify Need
- Actor: Developer/Team Lead
- Actions: Determine resource requirements
- Duration: 5-30 minutes
- Value-Add: Yes (planning)
Step 2: Configure Resource
- Actor: Developer
- Actions:
  - Open Dokploy UI
  - Select resource type (database, application)
  - Configure settings (size, replicas, etc.)
  - Review estimated cost
- Duration: 2-10 minutes
- Value-Add: Yes (configuration)
Step 3: Submit Request
- Actor: Dokploy API
- Actions:
  - Validate configuration
  - Check quotas
  - Authorize request
- Duration: 100-500ms
- Value-Add: Yes (validation)
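The quota check in this step compares current usage plus the requested resources against per-team limits. The sketch assumes quotas on CPU, memory, and service count. Dokploy's actual quota model, if it has one, may differ, and `Resources` and `check_quota` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Resources:
    cpus: float
    memory_mb: int
    services: int


def check_quota(usage: Resources, request: Resources, limit: Resources) -> List[str]:
    """Return the names of limits the request would exceed (empty list means allowed)."""
    exceeded = []
    for field in ("cpus", "memory_mb", "services"):
        if getattr(usage, field) + getattr(request, field) > getattr(limit, field):
            exceeded.append(field)
    return exceeded


if __name__ == "__main__":
    limit = Resources(cpus=8, memory_mb=16384, services=20)
    usage = Resources(cpus=6.5, memory_mb=12288, services=12)
    request = Resources(cpus=2, memory_mb=2048, services=1)
    print(check_quota(usage, request, limit))  # ['cpus']
```

Returning every exceeded limit at once, rather than stopping at the first, lets the UI tell the user everything to adjust in one round trip.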
Step 4: Provision Service
- Actor: Docker Swarm
- Actions:
  - Pull image
  - Create service
  - Schedule containers
  - Allocate resources
- Duration: 10-60 seconds
- Value-Add: Yes (provisioning)
Step 5: Configure Networking
- Actor: Docker Swarm + Traefik
- Actions:
  - Assign IP address
  - Configure DNS
  - Set up load balancing
  - Configure TLS
- Duration: 5-20 seconds
- Value-Add: Yes (networking)
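In Docker Swarm mode, Traefik's Docker provider reads routing configuration from labels on the service (under `deploy.labels` in a Compose file). The sketch below builds such a label set with Traefik v2-style keys. The `websecure` entry point and `letsencrypt` certificate resolver are assumptions: they must match names defined in Traefik's static configuration. The sketch also does not claim these are exactly the labels Dokploy generates.

```python
from typing import Dict


def traefik_labels(name: str, domain: str, port: int,
                   entrypoint: str = "websecure",
                   cert_resolver: str = "letsencrypt") -> Dict[str, str]:
    """Build Traefik v2-style labels that route `domain` to `port` over TLS."""
    router = f"traefik.http.routers.{name}"
    return {
        "traefik.enable": "true",
        f"{router}.rule": f"Host(`{domain}`)",
        f"{router}.entrypoints": entrypoint,
        f"{router}.tls.certresolver": cert_resolver,
        # Swarm gives Traefik no port information, so the port is set explicitly
        f"traefik.http.services.{name}.loadbalancer.server.port": str(port),
    }


if __name__ == "__main__":
    for key, value in traefik_labels("web", "app.example.com", 3000).items():
        print(f"{key}={value}")
```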
Step 6: Storage Setup
- Actor: Docker Volumes
- Actions:
  - Create volume
  - Mount to container
  - Set permissions
- Duration: 1-10 seconds
- Value-Add: Yes (persistence)
Step 7: Health Verification
- Actor: Dokploy
- Actions:
  - Run health checks
  - Verify connectivity
  - Test access
- Duration: 5-30 seconds
- Value-Add: Yes (verification)
Step 8: Notify & Document
- Actor: Dokploy
- Actions:
  - Send notification to requester
  - Update inventory
  - Generate connection info
- Duration: 1-5 seconds
- Value-Add: Yes (communication)
Metrics
| Metric | Target | Current | Gap |
|---|---|---|---|
| Time to Available | <2 minutes | 1-3 minutes | Partially met |
| Configuration Errors | <5% | ~8% | Improve validation |
| Resource Utilization | 70-85% | ~65% | Better sizing |
| Cost per Resource | Minimize | Baseline | Optimize |
Improvement Opportunities
- Resource templates: Pre-configured common setups
- Smart sizing: ML-based resource recommendations
- Cost analytics: Real-time cost tracking
- Auto-cleanup: Remove unused resources
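Smart sizing could start without ML. A percentile heuristic sets the limit so that typical peak usage falls inside the 70-85% utilization target from the metrics table. This is a simple stand-in for the ML-based recommendations listed above. The 95th percentile and the 80% target are assumptions, and `recommend_limit` is a hypothetical helper.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile, with pct in the range 0-100."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def recommend_limit(samples: Sequence[float], target_utilization: float = 0.8,
                    pct: float = 95) -> float:
    """Limit at which the pct-th percentile of usage equals target_utilization."""
    return percentile(samples, pct) / target_utilization


if __name__ == "__main__":
    # Hourly memory usage (MiB) for a service currently limited to 2048 MiB (made-up data)
    usage = [700, 720, 760, 800, 810, 850, 900, 950, 980, 1000,
             1020, 1050, 1100, 1150, 1200, 1250, 1300, 1350, 1400, 1600]
    print(round(recommend_limit(usage)))  # 1750
```

In this example, the p95 usage of 1400 MiB sits at about 68% of the current 2048 MiB limit. The heuristic therefore suggests shrinking the limit to about 1750 MiB.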
Value Stream 3: Incident Response
Overview
Trigger: Application error or outage detected
Outcome: Service restored, root cause identified
Frequency: 1-10 times per week (varies)
Critical Success Factor: Mean time to resolution (MTTR)
Current State Map
```mermaid
graph LR
    A[Issue Occurs] -->|Detect| B[Alert Triggered]
    B -->|Notify| C[Team Notified]
    C -->|Investigate| D[Log Analysis]
    D -->|Identify| E[Root Cause]
    E -->|Fix| F[Deploy Fix]
    F -->|Verify| G[Resolved]
    style A fill:#ffccbc
    style G fill:#c8e6c9
```
Detailed Steps
Step 1: Issue Detection
- Actor: Monitoring system
- Actions:
  - Health check fails
  - Error rate spike detected
  - Resource exhaustion
- Duration: 30 seconds - 5 minutes (detection lag)
- Value-Add: Yes (detection)
- Bottleneck: Alert delay
Step 2: Alert Generation
- Actor: Alerting system
- Actions:
  - Evaluate alert rules
  - Determine severity
  - Route to appropriate channel
- Duration: 5-30 seconds
- Value-Add: Yes (notification)
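Rule evaluation is also where false positives are won or lost. A widely used technique is to fire only when a condition has held continuously for a minimum time; Prometheus alerting rules express this as a `for` duration. The sketch applies the idea to error-rate samples. The thresholds, severities, and channel names are assumptions for illustration.

```python
from typing import List, Optional, Tuple

# (threshold, severity, channel), checked from most to least severe; values are illustrative
ALERT_RULES = [(0.20, "critical", "pager"), (0.05, "warning", "slack")]


def evaluate(samples: List[Tuple[float, float]], hold_s: float) -> Optional[Tuple[str, str]]:
    """Return (severity, channel) if the newest samples stayed above a threshold for hold_s.

    samples: (timestamp_s, error_rate) pairs in ascending time order.
    """
    if not samples:
        return None
    latest_t = samples[-1][0]
    for threshold, severity, channel in ALERT_RULES:
        # Walk back from the newest sample while the condition keeps holding
        breach_start = None
        for t, rate in reversed(samples):
            if rate <= threshold:
                break
            breach_start = t
        if breach_start is not None and latest_t - breach_start >= hold_s:
            return severity, channel
    return None


if __name__ == "__main__":
    steady = [(0, 0.01), (30, 0.08), (60, 0.09), (90, 0.30), (120, 0.07)]
    spike = [(0, 0.01), (30, 0.01), (60, 0.50)]
    print(evaluate(steady, hold_s=60))  # ('warning', 'slack')
    print(evaluate(spike, hold_s=60))   # None: a single spike does not page anyone
```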
Step 3: Team Notification
- Actor: Notification system
- Actions:
  - Send email/Slack/webhook
  - Page on-call (critical issues)
- Duration: 1-5 minutes (includes human response time)
- Value-Add: No (waiting for human)
- Bottleneck: Human availability
Step 4: Initial Triage
- Actor: On-call engineer
- Actions:
  - Acknowledge alert
  - Assess severity
  - Determine if escalation needed
- Duration: 2-10 minutes
- Value-Add: Yes (assessment)
Step 5: Investigation
- Actor: Engineer
- Actions:
  - Review logs (Dokploy log viewer)
  - Check metrics (Grafana dashboards)
  - Review recent changes (deployment history)
  - Check resource utilization
- Duration: 5-30 minutes
- Value-Add: Yes (diagnosis)
- Bottleneck: Log accessibility, tool switching
Step 6: Root Cause Identification
- Actor: Engineer
- Actions:
  - Correlate symptoms
  - Identify root cause
  - Determine fix strategy
- Duration: 5-60 minutes (highly variable)
- Value-Add: Yes (diagnosis)
Step 7: Remediation
- Actor: Engineer
- Actions:
  - Immediate: Rollback, restart, scale up
  - Short-term: Config change, hotfix deployment
  - Long-term: Code fix, architecture change
- Duration: 1-30 minutes (immediate), hours-days (long-term)
- Value-Add: Yes (resolution)
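An immediate rollback needs a way to pick its target: the most recent successful deployment older than the one currently running. The one-click rollback planned for v1.5 has the same need. The record shape is a hypothetical deployment history, not Dokploy's data model. The sketch also assumes that deployment IDs increase over time.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Deployment:
    id: int      # assumed to increase with each deployment
    image: str
    status: str  # "done" or "error" in this sketch


def rollback_target(history: List[Deployment], current_id: int) -> Optional[Deployment]:
    """Newest successful deployment older than `current_id` (history is oldest-first)."""
    for dep in reversed(history):
        if dep.id < current_id and dep.status == "done":
            return dep
    return None


if __name__ == "__main__":
    history = [
        Deployment(1, "app:1.0", "done"),
        Deployment(2, "app:1.1", "error"),
        Deployment(3, "app:1.2", "done"),
        Deployment(4, "app:1.3", "done"),  # currently running, misbehaving
    ]
    target = rollback_target(history, current_id=4)
    print(target.image if target else "nothing to roll back to")  # app:1.2
```

Because the target image is already in the registry, redeploying it skips the build step. That is the main reason a rollback can be faster than deploying a fix.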
Step 8: Verification
- Actor: Engineer
- Actions:
  - Verify metrics recovered
  - Check error rates
  - Confirm user impact resolved
- Duration: 5-15 minutes
- Value-Add: Yes (verification)
Step 9: Post-Mortem
- Actor: Team
- Actions:
  - Document incident
  - Identify prevention measures
  - Create follow-up tasks
- Duration: 30-60 minutes
- Value-Add: Yes (learning)
Metrics
| Metric | Target | Current | Gap |
|---|---|---|---|
| MTTD (Mean Time to Detect) | <2 minutes | 1-5 minutes | Improve monitoring |
| MTTR (Mean Time to Resolve) | <30 minutes | 15-120 minutes | Varies widely |
| False Positive Rate | <10% | ~20% | Tune alerts |
| Repeat Incidents | <5% | ~12% | Better root cause analysis |
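MTTD and MTTR are averages, over incidents, of the detection delay and the resolution delay. Definitions vary. This sketch measures both from when the issue began, although MTTR is sometimes measured from detection instead. The incident timestamps are made up.

```python
from datetime import datetime, timedelta
from statistics import mean
from typing import List, NamedTuple, Tuple


class Incident(NamedTuple):
    started: datetime
    detected: datetime
    resolved: datetime


def mttd_mttr_minutes(incidents: List[Incident]) -> Tuple[float, float]:
    """Mean minutes from start to detection and from start to resolution."""
    minute = timedelta(minutes=1)
    mttd = mean((i.detected - i.started) / minute for i in incidents)
    mttr = mean((i.resolved - i.started) / minute for i in incidents)
    return mttd, mttr


if __name__ == "__main__":
    t0 = datetime(2024, 12, 1, 10, 0)
    incidents = [
        Incident(t0, t0 + timedelta(minutes=2), t0 + timedelta(minutes=20)),
        Incident(t0, t0 + timedelta(minutes=4), t0 + timedelta(minutes=100)),
    ]
    mttd, mttr = mttd_mttr_minutes(incidents)
    print(f"MTTD {mttd:.1f} min, MTTR {mttr:.1f} min")  # MTTD 3.0 min, MTTR 60.0 min
```

A single long incident dominates the mean, which is one reason MTTR can vary as widely as the table shows.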
Improvement Opportunities
v1.5 (Quick Wins)
- One-click rollback: Target of 50% less resolution time
- Integrated log viewer: Eliminate tool switching
- Smart alerts: Reduce false positives
- Runbooks: Guided troubleshooting
v2.0 (Medium Term)
- AIOps: Anomaly detection, predictive alerts
- Auto-remediation: Automatic scaling, restarts
- Correlation engine: Link related events
- Incident timeline: Automatic chronology
v3.0 (Long Term)
- Self-healing: Automatic issue resolution
- Chaos engineering: Proactive resilience testing
- AI assistant: Guided troubleshooting
Value Stream 4: User Onboarding
Overview
Trigger: New user signs up
Outcome: User successfully deploys first application
Frequency: Varies with growth (hundreds to thousands per month at scale)
Critical Success Factor: Time to first value
Current State Map
```mermaid
graph LR
    A[Sign Up] -->|Create| B[Account Created]
    B -->|Setup| C[Connect Git]
    C -->|Configure| D[Create App]
    D -->|Deploy| E[First Deployment]
    E -->|Verify| F[App Running]
    F -->|Use| G[Productive User]
    style A fill:#e1f5ff
    style G fill:#c8e6c9
```
Detailed Steps
Step 1: Discovery & Sign Up
- Actor: Potential user
- Actions:
  - Find Dokploy (search, referral, etc.)
  - Visit website
  - Read documentation
  - Decide to try
  - Sign up (email, OAuth)
- Duration: 5-60 minutes
- Value-Add: Yes (discovery)
- Bottleneck: Documentation clarity
Step 2: Initial Setup
- Actor: New user
- Actions:
  - Complete registration
  - Verify email
  - Set password
  - Configure profile
- Duration: 2-5 minutes
- Value-Add: No (necessary friction)
Step 3: Environment Setup
- Actor: User
- Actions:
  - Install Dokploy (if self-hosted)
  - Or provision a server
  - Configure DNS
  - Set up TLS
- Duration: 10-60 minutes
- Value-Add: Yes (preparation)
- Bottleneck: Technical complexity
Step 4: First Project Creation
- Actor: User
- Actions:
  - Create project
  - Invite team members (optional)
  - Set project settings
- Duration: 2-5 minutes
- Value-Add: Yes (organization)
Step 5: Connect Git Repository
- Actor: User
- Actions:
  - Authenticate with Git provider
  - Select repository
  - Configure webhook
- Duration: 3-10 minutes
- Value-Add: Yes (integration)
- Bottleneck: OAuth complexity
Step 6: Application Configuration
- Actor: User
- Actions:
  - Detect build settings (auto or manual)
  - Configure environment variables
  - Set resource limits
  - Configure domain
- Duration: 5-20 minutes
- Value-Add: Yes (configuration)
- Bottleneck: Too many options, unclear defaults
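Automatic detection of build settings can begin with simple rules based on which files are present. The same applies to the v1.5 "Auto-detection" item. The precedence here is an assumption for illustration: an explicit Dockerfile first, then language manifests, then static files. The returned labels are illustrative too. Dokploy's real detection may work differently.

```python
from typing import Iterable

# Ordered rules: the first marker file found wins (illustrative, not Dokploy's actual logic)
BUILD_RULES = [
    ("Dockerfile", "dockerfile"),
    ("package.json", "buildpack:node"),
    ("requirements.txt", "buildpack:python"),
    ("pyproject.toml", "buildpack:python"),
    ("go.mod", "buildpack:go"),
    ("index.html", "static"),
]


def detect_build_type(repo_files: Iterable[str]) -> str:
    """Pick a build type from the file names at the repository root."""
    present = set(repo_files)
    for marker, build_type in BUILD_RULES:
        if marker in present:
            return build_type
    return "manual"  # nothing recognised: fall back to asking the user


if __name__ == "__main__":
    print(detect_build_type(["README.md", "package.json", "src"]))  # buildpack:node
    print(detect_build_type(["Dockerfile", "package.json"]))        # dockerfile
```

Returning "manual" rather than guessing keeps the unclear-defaults problem from getting worse: the user sees an explicit prompt instead of a wrong build.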
Step 7: First Deployment
- Actor: User + Dokploy
- Actions:
  - Trigger deployment
  - Watch build logs
  - Wait for completion
- Duration: 2-10 minutes
- Value-Add: Yes (deployment)
- Bottleneck: Build time, unclear errors
Step 8: Verification & Success
- Actor: User
- Actions:
  - Visit deployed application
  - Verify it works
  - Celebrate! 🎉
- Duration: 1-5 minutes
- Value-Add: Yes (verification)
Step 9: Explore & Learn
- Actor: User
- Actions:
  - Explore other features
  - Read advanced docs
  - Join community
- Duration: Ongoing
- Value-Add: Yes (education)
Metrics
| Metric | Target | Current | Gap |
|---|---|---|---|
| Time to First Deployment | <30 minutes | 45-90 minutes | Simplify setup |
| Setup Success Rate | >80% | ~65% | Reduce friction |
| Activation Rate (deploy within 7 days) | >70% | ~55% | Improve onboarding |
| Retention (active after 30 days) | >60% | ~45% | Prove value faster |
Improvement Opportunities
v1.0 (Launch)
- Quick start guide: 5-minute deployment tutorial
- Sample applications: Pre-configured examples
- One-click deploy: Deploy from template
- Better error messages: Help users fix issues
v1.5 (Enhanced)
- Interactive tutorial: In-app guided setup
- Video walkthroughs: Visual learning
- Auto-detection: Detect framework, auto-configure
- Hosted demo: Try without installing
v2.0 (Advanced)
- AI setup assistant: Conversational setup
- Instant preview: Deploy to preview environment
- Migration tools: Import from Heroku, Vercel
- Onboarding analytics: Identify drop-off points
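Onboarding analytics largely reduce to a funnel: count the users who reach each step of the onboarding map, then find the transition where the largest share drops out. The step names follow the current state map for this value stream. The counts are invented for illustration.

```python
from typing import List, Tuple


def funnel_dropoff(steps: List[Tuple[str, int]]) -> List[Tuple[str, float]]:
    """For each consecutive pair of steps, return ("from -> to", fraction of users lost)."""
    result = []
    for (name_a, count_a), (name_b, count_b) in zip(steps, steps[1:]):
        lost = 1 - count_b / count_a if count_a else 0.0
        result.append((f"{name_a} -> {name_b}", lost))
    return result


if __name__ == "__main__":
    funnel = [("Sign Up", 1000), ("Connect Git", 650), ("Create App", 560),
              ("First Deployment", 420), ("App Running", 400)]
    for transition, lost in funnel_dropoff(funnel):
        print(f"{transition:<32} {lost:.0%} lost")
    worst = max(funnel_dropoff(funnel), key=lambda pair: pair[1])
    print("largest drop-off:", worst[0])  # largest drop-off: Sign Up -> Connect Git
```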
Value Stream 5: Feature Development
Overview
Trigger: User need or strategic initiative
Outcome: Feature deployed and adopted by users
Frequency: Continuous (sprints)
Critical Success Factor: Time to market, user adoption
Current State Map
```mermaid
graph LR
    A[Idea/Need] -->|Validate| B[Requirement]
    B -->|Design| C[Architecture]
    C -->|Develop| D[Code]
    D -->|Test| E[QA]
    E -->|Deploy| F[Production]
    F -->|Measure| G[Adoption]
    style A fill:#e1f5ff
    style G fill:#c8e6c9
```
Detailed Steps
Step 1: Ideation & Validation
- Actor: Product team + users
- Actions:
  - Gather user feedback
  - Analyze usage data
  - Identify pain points
  - Prioritize features
- Duration: 1-7 days
- Value-Add: Yes (validation)
Step 2: Requirements Definition
- Actor: Product manager
- Actions:
  - Write user stories
  - Define acceptance criteria
  - Create mockups
  - Review with stakeholders
- Duration: 1-3 days
- Value-Add: Yes (definition)
Step 3: Architecture & Design
- Actor: Architects + engineers
- Actions:
  - Design solution
  - Create technical spec
  - Review alternatives
  - Get approval
- Duration: 1-5 days
- Value-Add: Yes (planning)
Step 4: Development
- Actor: Engineers
- Actions:
  - Write code
  - Local testing
  - Code review
  - Merge to main
- Duration: 2-10 days
- Value-Add: Yes (building)
Step 5: Testing
- Actor: QA + engineers
- Actions:
  - Unit tests
  - Integration tests
  - Manual testing
  - Security scanning
- Duration: 1-3 days
- Value-Add: Yes (quality)
Step 6: Documentation
- Actor: Technical writer + engineers
- Actions:
  - Update user docs
  - Create API docs
  - Write release notes
  - Update examples
- Duration: 1-2 days
- Value-Add: Yes (enablement)
Step 7: Release
- Actor: Release manager
- Actions:
  - Deploy to staging
  - Smoke testing
  - Deploy to production
  - Monitor for issues
- Duration: 2-4 hours
- Value-Add: Yes (delivery)
Step 8: Announcement
- Actor: Marketing + product
- Actions:
  - Write blog post
  - Social media announcement
  - Email newsletter
  - Update website
- Duration: 1-2 days
- Value-Add: Yes (awareness)
Step 9: Adoption & Feedback
- Actor: Users + team
- Actions:
  - Users try feature
  - Collect feedback
  - Monitor usage metrics
  - Iterate based on learnings
- Duration: Ongoing (2-4 weeks active monitoring)
- Value-Add: Yes (learning)
Metrics
| Metric | Target | Current | Gap |
|---|---|---|---|
| Lead Time (idea to production) | <14 days | 14-30 days | Streamline process |
| Cycle Time (development to production) | <7 days | 5-14 days | Partially met |
| Feature Adoption (used within 30 days) | >40% | ~30% | Better communication |
| User Satisfaction | >4.5/5 | ~4.2/5 | Better quality |
Improvement Opportunities
- Feature flags: Gradual rollout, A/B testing
- Telemetry: Automatic usage tracking
- In-app announcements: Notify users of new features
- Feedback loops: In-app feedback collection
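Gradual rollout behind a feature flag is usually implemented by hashing a stable user identifier into a bucket from 0 to 99. The feature is enabled when the bucket falls below the rollout percentage, so each user's experience stays consistent as the percentage grows. The flag and user names are hypothetical, and SHA-256 is one hashing choice among several.

```python
import hashlib


def rollout_bucket(flag: str, user_id: str) -> int:
    """Deterministically map a (flag, user) pair to a bucket in 0-99."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(flag: str, user_id: str, rollout_percent: int) -> bool:
    """True if this user falls inside the current rollout percentage for the flag."""
    return rollout_bucket(flag, user_id) < rollout_percent


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10_000)]
    enabled = sum(is_enabled("new-dashboard", u, 10) for u in users)
    print(f"{enabled / len(users):.1%} of users see the flag at 10% rollout")
```

Including the flag name in the hash keeps different flags' rollouts independent. Any user enabled at 10% stays enabled at 25%, because a bucket below 10 is also below 25.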
Cross-Stream Patterns
Pattern 1: Automation
Benefit: Reduce manual steps, increase consistency
Implementation: Webhooks, CI/CD, auto-scaling
Expected impact: 50% reduction in manual tasks
Pattern 2: Observability
Benefit: Faster issue detection and resolution
Implementation: Metrics, logs, traces, alerts
Expected impact: 40% reduction in MTTR
Pattern 3: Self-Service
Benefit: Reduce bottlenecks, empower users
Implementation: UI, API, documentation
Expected impact: 10x increase in capacity
Pattern 4: Validation
Benefit: Catch errors early, reduce failures
Implementation: Pre-flight checks, configuration validation
Expected impact: 60% reduction in failed deployments
Value Stream Optimization Roadmap
Phase 1: v1.0 (Foundation)
Focus: Core value streams functional
- ✅ Basic deployment pipeline
- ✅ Manual provisioning
- ✅ Basic monitoring
- ✅ Documentation
Phase 2: v1.5 (Automation)
Focus: Reduce manual work
- Git webhooks
- Build caching
- Alert automation
- Quick-start templates
Phase 3: v2.0 (Intelligence)
Focus: Smart optimization
- Auto-scaling
- Predictive alerts
- Smart caching
- AI-powered troubleshooting
Phase 4: v3.0 (Self-Optimization)
Focus: Continuous improvement
- Self-healing systems
- Automatic optimization
- Proactive capacity management
- AI-driven onboarding
Success Metrics Dashboard
Overall Platform Health
- Deployment Lead Time: <5 minutes (avg)
- Deployment Success Rate: >95%
- Platform Uptime: >99.9%
- MTTR: <30 minutes
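An uptime target translates directly into a downtime budget, which is easier to reason about when planning maintenance and incident response. The sketch assumes a 30-day month.

```python
def downtime_budget_minutes(uptime_target: float, period_days: float = 30) -> float:
    """Minutes of downtime allowed in the period for a given uptime fraction."""
    return (1 - uptime_target) * period_days * 24 * 60


if __name__ == "__main__":
    for target in (0.999, 0.9995, 0.9999):
        print(f"{target:.2%} uptime -> {downtime_budget_minutes(target):.1f} min per 30 days")
    # 99.90% uptime -> 43.2 min per 30 days
    # 99.95% uptime -> 21.6 min per 30 days
    # 99.99% uptime -> 4.3 min per 30 days
```

At 99.9%, the monthly budget of about 43 minutes is shorter than the 120-minute upper end of the current MTTR range in Value Stream 3. A single full outage at that length would therefore exhaust it.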
User Experience
- Time to First Deployment: <30 minutes
- User Activation Rate: >70%
- User Retention (30 days): >60%
- NPS Score: >40
Efficiency
- Process Efficiency: >40%
- Resource Utilization: 70-85%
- Cost per Deployment: Minimize
- Support Ticket Volume: <10% of users/month
Related Documents
- Business Capability Model: Capabilities that enable these value streams
- Architecture Principles: Principles guiding optimization decisions
- Stakeholder Analysis: Stakeholders impacted by each value stream
- Deployment Diagram: Technical infrastructure supporting value streams
Next Review: 2025-03-30
Reviewed By: Architecture Team, Product Team, Operations Team