12 min read
Dillon Browne

CI/CD Orchestration Beyond Bash Scripts

Learn when bash scripts become technical debt in CI/CD pipelines. Discover orchestration patterns for complex deployments with Argo, Temporal, and more.

cicd devops orchestration automation infrastructure

The Bash Script Trap

I’ve been guilty of this more times than I’d like to admit. A deployment starts simple: clone the repo, run a few commands, deploy. A single bash script handles it all. Six months later, that script is 800 lines of conditional logic, error handling that only catches 60% of failures, and nobody wants to touch it.

The problem isn’t bash itself—it’s using bash for CI/CD orchestration when you need proper workflow management. In my experience working with multi-region Kubernetes deployments and complex CI/CD pipelines, I’ve learned the hard way when simple scripts become technical debt and proper orchestration becomes essential.

When Scripts Stop Scaling

The breaking point usually happens when your deployment needs any of these:

Parallel execution with dependencies: You need to deploy to three regions simultaneously, but only after the database migration succeeds. Bash can do this with background jobs and wait commands, but error handling becomes a nightmare.

Retry logic with exponential backoff: A flaky integration test fails intermittently. Your bash script retries, but implementing exponential backoff, jitter, and circuit breakers in bash is painful and error-prone.

Dynamic workflow based on runtime conditions: Deploy strategy changes based on feature flags, environment health checks, or canary metrics. Bash conditionals work, but they’re hard to test and maintain.

Observability and debugging: When a deployment fails at 3 AM, you need structured logs, execution traces, and the ability to replay specific steps. Bash scripts dump to stdout with inconsistent formatting.
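To make the retry pain concrete, here is roughly what hand-rolled exponential backoff with full jitter looks like when you implement it yourself in Python; the helper names and attempt limits are illustrative, not from any particular tool. Orchestrators give you this behavior declaratively, which is the point:

```python
import random
import time

def backoff_delays(base=1.0, factor=2.0, max_delay=60.0, attempts=5):
    """Yield one capped, jittered delay per retry attempt."""
    for attempt in range(attempts):
        capped = min(max_delay, base * (factor ** attempt))
        # Full jitter: pick uniformly in [0, capped] to avoid thundering herds
        yield random.uniform(0.0, capped)

def retry_with_backoff(operation, attempts=5):
    """Run operation, sleeping with exponential backoff between failures."""
    last_error = None
    for delay in backoff_delays(attempts=attempts):
        try:
            return operation()
        except Exception as exc:  # in real code, catch only retryable errors
            last_error = exc
            time.sleep(delay)
    raise last_error
```

And this sketch still has no circuit breaker, no persistence across process restarts, and no per-attempt visibility — each of which is more code again.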

I hit this wall on a project where we were deploying microservices across AWS, Azure, and GCP. Our bash-based deployment script grew to handle region-specific logic, cloud provider differences, and complex rollback scenarios. It worked—until it didn’t.

Adopt Workflow Orchestration Thinking

True orchestration tools treat workflows as data structures, not shell commands. This fundamental shift enables capabilities that are difficult or impossible with bash:

Build Workflows with DAGs

Instead of sequential or parallel execution, workflows become graphs where nodes represent tasks and edges represent dependencies. This makes complex workflows explicit and verifiable.

Here’s what this looks like with Argo Workflows (Kubernetes-native):

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: multi-region-deploy-
spec:
  entrypoint: deploy-app
  templates:
  - name: deploy-app
    dag:
      tasks:
      - name: run-migrations
        template: db-migrate
      
      - name: deploy-us-east
        dependencies: [run-migrations]
        template: deploy-region
        arguments:
          parameters:
          - name: region
            value: "us-east-1"
      
      - name: deploy-eu-west
        dependencies: [run-migrations]
        template: deploy-region
        arguments:
          parameters:
          - name: region
            value: "eu-west-1"
      
      - name: smoke-tests
        dependencies: [deploy-us-east, deploy-eu-west]
        template: run-tests
      
      - name: update-dns
        dependencies: [smoke-tests]
        template: dns-update

  # Referenced templates db-migrate, run-tests, and dns-update elided for brevity
  - name: deploy-region
    inputs:
      parameters:
      - name: region
    retryStrategy:
      limit: 3
      retryPolicy: "Always"
      backoff:
        duration: "1m"
        factor: 2
        maxDuration: "10m"
    container:
      image: deployment-image:latest
      command: ["/deploy.sh"]
      args: ["{{inputs.parameters.region}}"]

This workflow makes dependencies explicit, handles retries declaratively, and provides a clear execution graph. The equivalent bash script would need complex background job management and state tracking.

Implement Proven Orchestration Patterns

Pattern 1: State Management for Idempotency

Orchestrators maintain execution state, making workflows resumable. If a deployment fails halfway through, you can retry from the failed step—not from the beginning.

I implemented this pattern using Temporal for a fintech application where regulatory requirements demanded exact deployment reproducibility. Here’s the workflow structure:

import asyncio
from datetime import timedelta

from temporalio import workflow
from temporalio.common import RetryPolicy
from temporalio.exceptions import ApplicationError

# Activity functions (run_database_migration, deploy_to_region, etc.)
# are defined and registered elsewhere on the worker.

@workflow.defn
class DeploymentWorkflow:
    @workflow.run
    async def run(self, deployment_config: dict) -> str:
        # Database migration - critical path
        migration_result = await workflow.execute_activity(
            run_database_migration,
            deployment_config["db"],
            start_to_close_timeout=timedelta(minutes=30),
            retry_policy=RetryPolicy(
                maximum_attempts=3,
                initial_interval=timedelta(seconds=30),
                maximum_interval=timedelta(minutes=5),
            )
        )
        
        # Parallel regional deployments
        deploy_tasks = []
        for region in deployment_config["regions"]:
            task = workflow.execute_activity(
                deploy_to_region,
                region,
                start_to_close_timeout=timedelta(minutes=15)
            )
            deploy_tasks.append(task)
        
        # Wait for all deployments
        results = await asyncio.gather(*deploy_tasks)
        
        # Health checks before DNS cutover
        health_ok = await workflow.execute_activity(
            check_deployment_health,
            results,
            start_to_close_timeout=timedelta(minutes=5)
        )
        
        if not health_ok:
            # Automatic rollback
            await workflow.execute_activity(
                rollback_deployment,
                results,
                start_to_close_timeout=timedelta(minutes=15)
            )
            # ApplicationError fails the workflow; raising a bare Exception
            # would only fail the workflow task, which Temporal retries
            raise ApplicationError("Health checks failed, rolled back")
        
        # Final DNS update
        return await workflow.execute_activity(
            update_dns_records,
            deployment_config["dns"],
            start_to_close_timeout=timedelta(minutes=10)
        )

Temporal maintains workflow history, so if deploy_to_region fails for eu-west-1 but succeeds for us-east-1, the retry only redeploys to eu-west-1. With bash, you’d need external state management—usually a database or files that add complexity and failure modes.
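To make that contrast concrete, here is a minimal sketch of the external checkpoint tracking a script-based pipeline has to bolt on to get the same resumability; the state-file format and step names are illustrative, not a real tool's:

```python
import json
from pathlib import Path

def run_resumable(steps, state_file="deploy-state.json"):
    """Run (name, action) steps in order, skipping any already recorded as done.

    A crude stand-in for the execution history an orchestrator keeps for you.
    """
    path = Path(state_file)
    done = set(json.loads(path.read_text())) if path.exists() else set()
    for name, action in steps:
        if name in done:
            continue  # completed in a previous run; skip on retry
        action()
        done.add(name)
        # Checkpoint after every step so a crash resumes from here
        path.write_text(json.dumps(sorted(done)))
```

Even this toy version has the failure modes the paragraph above warns about: the file can be lost, corrupted, or shared between concurrent runs — problems Temporal's per-run history solves for you.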

Pattern 2: Long-Running Workflows with Human Gates

Some deployments need approval steps or wait for external events. Orchestrators handle these natively with workflow signals and timers.

In my work with healthcare infrastructure, we needed compliance approvals before production deployments. The orchestrator paused the workflow, sent notifications, and resumed only after explicit approval:

from datetime import timedelta

from temporalio import workflow
from temporalio.exceptions import ApplicationError

@workflow.defn
class ComplianceDeploymentWorkflow:
    def __init__(self):
        self.approval_received = False
    
    @workflow.run
    async def run(self, config: dict) -> str:
        # Deploy to staging
        staging_result = await workflow.execute_activity(
            deploy_to_staging,
            config,
            start_to_close_timeout=timedelta(minutes=15)
        )
        
        # Run automated compliance checks
        compliance_passed = await workflow.execute_activity(
            run_compliance_tests,
            staging_result,
            start_to_close_timeout=timedelta(minutes=30)
        )
        
        if not compliance_passed:
            raise ApplicationError("Compliance tests failed")
        
        # Request human approval
        await workflow.execute_activity(
            send_approval_request,
            staging_result,
            start_to_close_timeout=timedelta(minutes=5)
        )
        
        # Wait for the approval signal; raises asyncio.TimeoutError after 24 hours
        await workflow.wait_condition(
            lambda: self.approval_received,
            timeout=timedelta(hours=24)
        )
        
        # Proceed with production deployment
        return await workflow.execute_activity(
            deploy_to_production,
            config,
            start_to_close_timeout=timedelta(minutes=30)
        )
    
    @workflow.signal
    def approve_deployment(self):
        self.approval_received = True

This pattern is nearly impossible to implement cleanly with bash. You’d need to persist workflow state, poll for approval, and handle timeout scenarios—all while maintaining idempotency.

Pattern 3: Observability and Debugging

Production orchestrators provide structured execution histories, making post-mortem analysis straightforward. When a deployment fails, you get:

  • Complete execution timeline with step durations
  • Input/output data for each step
  • Retry attempts and failure reasons
  • Ability to replay workflows with different parameters

I use this extensively with Argo Workflows. After a failed deployment, I can inspect the workflow execution:

# Get workflow status
argo get multi-region-deploy-xyz123

# View logs (optionally pass a step's pod name to narrow to a single step)
argo logs multi-region-deploy-xyz123 -c main

# Get execution timeline
argo get multi-region-deploy-xyz123 -o json | jq '.status.nodes'

# Retry the failed workflow, reusing the steps that already succeeded
argo retry multi-region-deploy-xyz123

The structured output includes timestamps, resource usage, and step dependencies—critical for understanding what went wrong and when.

Select Your Orchestration Tool

Not every CI/CD pipeline needs orchestration. Here’s my decision framework:

Stick with bash if:

  • Deployment has fewer than 10 steps
  • No parallel execution or complex dependencies
  • Execution time under 10 minutes
  • Single environment or simple multi-region replication
  • Team is comfortable maintaining shell scripts

Consider orchestration when:

  • Workflows have complex DAG structures
  • Need retry logic, timeouts, or circuit breakers
  • Long-running workflows (over 30 minutes)
  • Require audit trails and compliance reporting
  • Multiple teams contribute to deployment logic
  • Need to pause workflows for human approval

For Kubernetes-native environments, I reach for Argo Workflows or Tekton. They integrate naturally with existing cluster infrastructure and don’t require additional services.

For language-agnostic orchestration with strong durability guarantees, Temporal is my choice. It handles complex state management and provides excellent visibility into workflow execution.

For cloud-specific deployments, provider-native tools work well: AWS Step Functions for AWS, Azure Durable Functions for Azure. They integrate with cloud services and don’t require infrastructure management.

Migrate From Bash to Orchestration

Moving from bash to orchestration doesn’t have to be all-or-nothing. I’ve successfully used this incremental approach:

Phase 1: Orchestrate the Orchestrator

Keep existing bash scripts but wrap them in an orchestrator that handles high-level workflow:

# Argo Workflow calling existing scripts
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: legacy-deploy-
spec:
  entrypoint: deploy
  templates:
  - name: deploy
    steps:
    - - name: migrate-db
        template: run-script
        arguments:
          parameters:
          - name: script
            value: "./scripts/migrate-db.sh"
    
    - - name: deploy-app
        template: run-script
        arguments:
          parameters:
          - name: script
            value: "./scripts/deploy-app.sh"
  
  - name: run-script
    inputs:
      parameters:
      - name: script
    script:
      image: deployment-image:latest
      command: [bash]
      source: |
        bash {{inputs.parameters.script}}

This gives you workflow observability and dependency management while keeping existing deployment logic intact.

Phase 2: Extract Critical Paths

Identify the most complex or failure-prone parts of your bash scripts and rewrite them as orchestrator activities. Database migrations, health checks, and rollback logic are good candidates.
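As a sketch of what "extracting a critical path" can look like, here is a health check pulled out of a deploy script into a plain function that an orchestrator activity can wrap; the injectable `probe` callable and the threshold parameter are my illustration, not part of any framework:

```python
def deployment_healthy(probe, regions, required_ratio=1.0):
    """Probe each region and report whether enough of them are healthy.

    `probe` is any callable taking a region name and returning True/False.
    Keeping it injectable makes the check unit-testable without a cluster,
    which is exactly what a 'curl in a loop' bash version cannot offer.
    """
    healthy = [region for region in regions if probe(region)]
    return len(healthy) / len(regions) >= required_ratio
```

Once logic lives in a function like this, wrapping it as a Temporal activity or an Argo container step is mechanical, and the retry and timeout behavior moves into the orchestrator where it belongs.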

Phase 3: Standardize Patterns

As you migrate more logic, patterns emerge. Create reusable workflow templates that teams can compose for their specific needs.

The Real Cost of Orchestration

Orchestration adds operational complexity. You’re trading bash script sprawl for workflow engine management. Is it worth it?

In my experience, the crossover point is around 50-100 deployments per week or when debugging deployment failures takes more than 30 minutes on average. Below that threshold, the operational overhead of running an orchestrator might outweigh the benefits.

For a startup with a dozen services deploying a few times per day, bash scripts with good error handling are often sufficient. For a platform team managing hundreds of microservices with complex deployment dependencies, orchestration becomes essential infrastructure.

The key is recognizing when you’ve crossed that threshold—usually when you start avoiding deployments because the process is too fragile, or when deployment failures create multi-hour debugging sessions.

Practical Lessons

After migrating several teams from bash-based deployments to orchestrated workflows, here’s what I’ve learned:

Start simple: Don’t over-engineer. If a bash script works reliably, leave it. Orchestrate when you feel the pain of complexity, not preemptively.

Observability first: The primary value of orchestration is visibility into workflow execution. If your orchestrator doesn’t provide clear execution histories and debugging tools, you’ve gained nothing.

Idempotency matters more than speed: Design activities to be safely retryable. A deployment that takes 20 minutes but can resume from any failure point is better than a 5-minute deployment that has to restart completely on errors.
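A small sketch of that principle: have the activity check whether its effect already exists before acting, so a retry becomes a no-op. Here the `released` set stands in for whatever real registry or API records completed releases:

```python
def release_version(released, service, version, do_deploy):
    """Deploy service@version only if it is not already recorded; safe to retry."""
    key = f"{service}@{version}"
    if key in released:
        return "skipped"  # a retry after a partial failure lands here harmlessly
    do_deploy(service, version)
    released.add(key)  # record only after the side effect actually succeeded
    return "deployed"
```

With activities shaped like this, the orchestrator's retry policy can be aggressive without risking double deployments.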

Test the unhappy path: Orchestration shines during failures. Test timeout scenarios, partial failures, and rollback logic. Your confidence in the system should come from knowing it handles failures gracefully.

Document workflow patterns: Teams need to understand common patterns—parallel execution, conditional logic, retry strategies. Invest in templates and examples.

The transition from bash scripts to CI/CD orchestration represents a maturity threshold in deployment automation. You’re not just running commands anymore—you’re managing complex state machines with error handling, observability, and reproducibility requirements that bash wasn’t designed to handle.

When you find yourself adding the third level of nested conditionals to your deployment script, or when debugging a failed deployment requires parsing through thousands of lines of unstructured logs, it’s time to implement proper orchestration. Your future self will thank you.
