Bash Scripting Best Practices for DevOps Engineers

Bash Is a Minefield (But You Can't Avoid It)

Every DevOps engineer writes bash scripts. Deployment scripts, CI/CD glue, cron jobs, bootstrapping automation -- bash is everywhere. The problem is that bash is also a language where a missing quote can delete your entire filesystem, an unset variable silently becomes an empty string, and a failing command in the middle of a pipeline goes unnoticed. Writing reliable bash requires discipline and knowing where the traps are.

This guide covers the practices that separate fragile scripts from production-grade ones: strict mode, proper quoting, modern test syntax, idempotent patterns, ShellCheck, and knowing when bash is the wrong tool for the job.

What Is set -euo pipefail?

Definition: set -euo pipefail is a bash "strict mode" that combines three safety options: -e exits on any command failure, -u treats unset variables as errors, and -o pipefail makes pipelines fail if any command in the pipe fails (not just the last one). Together they catch the most common classes of silent bash failures.

Every bash script should start with this:

#!/usr/bin/env bash
set -euo pipefail

Here's what each flag does:

Flag	Without It	With It
`-e`	Script continues after a failed command	Script exits immediately on failure
`-u`	Unset variables expand to empty string	Unset variables cause an error and exit
`-o pipefail`	Pipeline exits with status of last command	Pipeline exits with status of the last (rightmost) command that failed, or zero if every command succeeded

# Without -e: this silently continues after the failing command
cd /nonexistent/directory
rm -rf *  # This runs in whatever directory you were in. Dangerous.

# Without -u: typos in variable names silently become empty
echo "Deploying to $DEPLYOMENT_TARGET"  # Empty string, no error

# Without pipefail: pipeline hides the real failure
curl https://example.com/data | jq '.items' | head -5
# If curl fails, jq gets empty input, head succeeds, exit code 0
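
The pipefail difference can be reproduced without touching the network. The sketch below runs the same failing pipeline with the option off and on; `pipeline_status` is a hypothetical helper name introduced only for this illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a pipeline whose first stage fails and print
# the pipeline's exit status, with or without pipefail.
pipeline_status() {
  local mode=$1 status
  (
    # Set the option explicitly in a subshell so the caller is unaffected.
    if [[ $mode == pipefail ]]; then
      set -o pipefail
    else
      set +o pipefail
    fi
    false | cat   # 'false' fails, 'cat' succeeds
  ) && status=0 || status=$?
  echo "$status"
}

echo "default:  $(pipeline_status default)"    # prints "default:  0"
echo "pipefail: $(pipeline_status pipefail)"   # prints "pipefail: 1"
```

Without the option, only the status of `cat` counts, so the failure of the first stage is invisible to the caller.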

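Watch out: set -e has quirks. A failing command does not trigger an exit when it runs as the condition of if/while/until, as any command except the last in an &&/|| list, or after !. The same holds for everything inside a function called from one of those contexts: if a function is used as an if condition, set -e is ignored for its whole body. Outside those contexts, a function that returns non-zero exits the script like any other failing command, which trips people up. If a command is expected to fail, use command || true to explicitly suppress the exit.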
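
A minimal sketch of these quirks, assuming a hypothetical function named `step` whose first command fails:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical function: its first command fails, then it keeps going.
step() {
  false                 # would normally trigger set -e
  echo "after false"
}

# Called as an if condition, set -e is ignored for the whole function
# body, so the echo still runs and the function reports success.
if step; then
  echo "step reported success"
fi

# Explicitly tolerate an expected failure (grep finds nothing in /dev/null).
grep -q "needle" /dev/null || true
echo "still running"
```

The script prints all three messages. Calling `step` on a line of its own, outside any condition, would stop the script at `false` instead.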

Quoting: The Single Most Important Habit

Unquoted variables are the source of more bash bugs than anything else. Without quotes, variables undergo word splitting and glob expansion.

# WRONG: if filename has spaces, this breaks
cp $file /backup/

# RIGHT: always double-quote variables
cp "$file" /backup/

# WRONG: glob expansion can match files
if [ $status == * ]; then  # Expands * to filenames!

# RIGHT: quotes prevent expansion
if [ "$status" == "*" ]; then

# WRONG: unquoted command substitution
files=$(ls /tmp)
for f in $files; do  # Breaks on files with spaces

# RIGHT: use a glob instead
for f in /tmp/*; do
  echo "$f"
done
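
Word splitting is easy to observe directly. This sketch uses a hypothetical helper, `count_args`, that only reports how many arguments it received; it assumes the default IFS.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print how many arguments it received.
count_args() {
  echo "$#"
}

file="quarterly report.txt"

# Unquoted: the value is split on the space into two arguments.
# shellcheck disable=SC2086
count_args $file      # prints 2

# Quoted: the value is passed through as a single argument.
count_args "$file"    # prints 1
```

Any command that receives the unquoted form, such as cp or rm, sees two separate file names instead of one.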

When to use single vs double quotes in bash

Double quotes allow variable expansion and command substitution while preventing word splitting and globbing -- they're what you want 95% of the time. Single quotes prevent all expansion -- use them for literal strings where you don't want any variable or special character interpretation. Dollar-single-quotes ($'...') allow escape sequences like \n and \t.

# Double quotes: variable expands
echo "Hello $USER"    # Hello abhi

# Single quotes: literal string
echo 'Hello $USER'    # Hello $USER

# Dollar-single quotes: escape sequences work
echo $'first\nsecond'  # Prints on two lines

Modern Test Syntax: [[ ]] vs [ ]

Feature	`[ ]` (test)	`[[ ]]` (bash keyword)
Word splitting on variables	Yes (must quote)	No (safe unquoted)
Glob expansion	Yes (dangerous)	No
Pattern matching	No	Yes (`==` with globs)
Regex matching	No	Yes (`=~`)
Logical operators	`-a`, `-o` (deprecated)	`&&`, `||`
POSIX compliant	Yes	No (bash/zsh only)

# Prefer [[ ]] in bash scripts
if [[ -z "$variable" ]]; then
  echo "variable is empty"
fi

# Pattern matching (not available with [ ])
if [[ "$filename" == *.tar.gz ]]; then
  echo "It's a tarball"
fi

# Regex matching
if [[ "$email" =~ ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$ ]]; then
  echo "Valid email format"
fi

# Safe without quotes (but still quote by habit)
if [[ -n $possibly_empty ]]; then
  echo "not empty"
fi
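
Regex matches also expose capture groups through the BASH_REMATCH array. The following is a sketch under assumed inputs: `parse_version` and the "v1.4.2" tag format are hypothetical, chosen only to illustrate the mechanism.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: split a release tag like "v1.4.2" into its parts.
parse_version() {
  local tag=$1
  # Keeping the regex in a variable avoids quoting surprises inside [[ ]].
  local re='^v([0-9]+)\.([0-9]+)\.([0-9]+)$'
  if [[ $tag =~ $re ]]; then
    # BASH_REMATCH[0] is the whole match; [1] to [3] are the capture groups.
    echo "major=${BASH_REMATCH[1]} minor=${BASH_REMATCH[2]} patch=${BASH_REMATCH[3]}"
  else
    echo "not a version tag: $tag" >&2
    return 1
  fi
}

parse_version "v1.4.2"   # prints: major=1 minor=4 patch=2
```

Note that the regex variable is expanded unquoted on the right of =~; quoting it would make bash treat the pattern as a literal string.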

Writing Idempotent Scripts

An idempotent script produces the same result whether you run it once or ten times. This is critical for deployment scripts, provisioning, and any automation that might get re-run.

#!/usr/bin/env bash
set -euo pipefail

# Idempotent directory creation
mkdir -p /opt/myapp/logs

# Idempotent user creation
if ! id -u myapp >/dev/null 2>&1; then
  useradd -r -s /bin/false myapp
fi

# Idempotent symlink
ln -sfn /opt/myapp/current /opt/myapp/active

# Idempotent package install (Debian)
dpkg -l nginx 2>/dev/null | grep -q '^ii' || apt-get install -y nginx

# Idempotent config line addition
grep -qF 'net.core.somaxconn = 65535' /etc/sysctl.conf \
  || echo 'net.core.somaxconn = 65535' >> /etc/sysctl.conf

# Idempotent systemd enable
systemctl is-enabled --quiet nginx || systemctl enable nginx

Pro tip: Test idempotency by running your script twice in a row. If it fails or changes something on the second run that it shouldn't, it's not idempotent. Common offenders: mkdir without -p, useradd without an existence check, and appending to config files without checking if the line exists.
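
The "run it twice" check can be applied to a single operation as well. This sketch wraps the config-line pattern in a hypothetical helper, `ensure_line`, and exercises it against a temporary file rather than a real system config.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: append a line to a file only if that exact line
# is not already present.
ensure_line() {
  local line=$1 file=$2
  touch "$file"
  # -x matches whole lines only, -F treats the text as a fixed string.
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

conf=$(mktemp)
ensure_line 'net.core.somaxconn = 65535' "$conf"
ensure_line 'net.core.somaxconn = 65535' "$conf"   # second run: no change
wc -l < "$conf"    # reports 1 line, not 2
rm -f "$conf"
```

Adding -x is a design choice: it stops a longer line that merely contains the text from counting as a match.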

Functions and Error Handling

#!/usr/bin/env bash
set -euo pipefail

# Use functions for reusable logic
log() {
  echo "[$(date +'%Y-%m-%d %H:%M:%S')] $*" >&2
}

die() {
  log "ERROR: $*"
  exit 1
}

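# Create temp directory safely. Avoid the name TMPDIR: mktemp and other
# tools read that environment variable, and deleting it would be destructive.
work_dir=$(mktemp -d) || die "Failed to create temp directory"

# Cleanup trap -- runs on exit regardless of success/failure.
# Registered after work_dir is set, so it never sees an unset variable.
cleanup() {
  log "Cleaning up temp files..."
  rm -rf "$work_dir"
}
trap cleanup EXIT

# Validate required environment variables
: "${DATABASE_URL:?DATABASE_URL must be set}"
: "${DEPLOY_ENV:?DEPLOY_ENV must be set}"

# Main logic
main() {
  log "Starting deployment to $DEPLOY_ENV"

  # Download artifact
  curl -fsSL "https://artifacts.example.com/app.tar.gz" \
    -o "$work_dir/app.tar.gz" \
    || die "Failed to download artifact"

  # Extract (tar -C requires the target directory to exist)
  mkdir -p /opt/myapp/releases/new
  tar xzf "$work_dir/app.tar.gz" -C /opt/myapp/releases/new \
    || die "Failed to extract artifact"

  # Switch symlink. ln -sfn is not guaranteed to be atomic everywhere; a
  # strictly atomic swap creates a temporary link and renames it over the old one.
  ln -sfn /opt/myapp/releases/new /opt/myapp/current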

  log "Deployment complete"
}

main "$@"
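
To confirm that an EXIT trap fires on failure too, the sketch below runs a failing job in a subshell and then checks that its scratch directory is gone. `run_job` and the marker file are hypothetical names used only for this demo.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical demo job: creates a scratch directory, registers an EXIT
# trap to remove it, then fails partway through.
run_job() {
  local marker=$1
  (
    scratch=$(mktemp -d)
    trap 'rm -rf "$scratch"' EXIT
    printf '%s\n' "$scratch" > "$marker"   # record the path for the caller
    exit 3                                 # simulated failure
  )
}

marker=$(mktemp)
status=0
run_job "$marker" || status=$?
echo "job exited with status $status"      # prints: job exited with status 3

# The directory recorded by the job should no longer exist.
[[ -d $(<"$marker") ]] || echo "scratch directory was cleaned up"
rm -f "$marker"
```

The failure is simulated with an explicit exit rather than a bare failing command, because set -e is ignored inside a function called on the left of ||.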

Common Pitfalls

Word Splitting

# BUG: word splitting breaks on spaces
files="file one.txt file two.txt"
for f in $files; do  # Iterates 4 times, not 2
  echo "$f"
done

# FIX: use arrays
files=("file one.txt" "file two.txt")
for f in "${files[@]}"; do  # Iterates 2 times
  echo "$f"
done
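
Arrays are equally useful for building up command arguments piece by piece. In this sketch the flag names and the `verbose` switch are hypothetical; printf stands in for the real command so the result is visible.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical argument list, built incrementally. Each element stays
# intact, even when it contains a space.
args=(--exclude "build output" --exclude ".git")
verbose=true
if [[ $verbose == true ]]; then
  args+=(--verbose)
fi

# printf repeats its format for every argument, so this prints one
# element per line, wrapped in angle brackets.
printf '<%s>\n' "${args[@]}"
echo "argument count: ${#args[@]}"   # prints: argument count: 5
```

Expanding with "${args[@]}" hands each element to the command as exactly one argument, which a space-separated string cannot do.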

Subshell Variable Scoping

# BUG: pipe creates a subshell; variable change is lost
count=0
printf 'a\nb\nc\n' | while read -r line; do
  count=$((count + 1))
done
echo "$count"  # Prints 0, not 3!

# FIX: a here-string keeps the loop in the current shell
count=0
while read -r line; do
  count=$((count + 1))
done <<< "a b c"
echo "$count"  # Prints 1 (the here-string is a single line)

# FIX: for multiple lines, use process substitution
count=0
while read -r line; do
  count=$((count + 1))
done < <(printf 'a\nb\nc\n')
echo "$count"  # Prints 3

Globbing Surprises

# BUG: if /tmp has no .log files, the glob is literal
for f in /tmp/*.log; do
  echo "Processing $f"  # Prints "Processing /tmp/*.log"
done

# FIX: use nullglob
shopt -s nullglob
for f in /tmp/*.log; do
  echo "Processing $f"  # Loop body doesn't execute if no matches
done
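
Because shopt -s nullglob changes behavior for the rest of the script, one option is to confine it to a subshell. The sketch below does that inside a hypothetical helper, `count_logs`, and runs it against a temporary directory.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: count the *.log files in a directory. nullglob is
# enabled inside a subshell, so the setting does not leak to the caller.
count_logs() {
  local dir=$1
  (
    shopt -s nullglob
    files=("$dir"/*.log)
    echo "${#files[@]}"
  )
}

dir=$(mktemp -d)
count_logs "$dir"                    # prints 0: no matches, empty array
touch "$dir/a.log" "$dir/b c.log"
count_logs "$dir"                    # prints 2
rm -rf "$dir"
```

Collecting the matches into an array first also gives a count to test before looping.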

ShellCheck: Your Bash Linter

ShellCheck is a static analysis tool that catches common bash mistakes. Install it and integrate it into your CI pipeline.

# Install
apt-get install shellcheck  # Debian/Ubuntu
brew install shellcheck     # macOS

# Run on a script
shellcheck deploy.sh

# Run on all scripts in a directory
find scripts/ -name '*.sh' -print0 | xargs -0 shellcheck

# Integrate with CI (GitHub Actions example)
# - name: ShellCheck
#   uses: ludeeus/action-shellcheck@master
#   with:
#     scandir: './scripts'

ShellCheck catches quoting issues, deprecated syntax, common logical errors, and suggests safer alternatives. It's opinionated and almost always right.
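
When a warning is a deliberate choice rather than a bug, ShellCheck accepts a directive comment that silences it for the next command only. The sketch below shows that for SC2086 (unquoted expansion), next to the array form that needs no suppression; the variable names are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical variable holding several flags on purpose.
extra_flags="-l -a"

# ShellCheck reports SC2086 here. The splitting is intentional in this
# sketch, so the warning is silenced for this one command.
# shellcheck disable=SC2086
ls $extra_flags / > /dev/null

# Usually the better fix is an array, which needs no suppression at all.
flags=(-l -a)
ls "${flags[@]}" / > /dev/null
echo "both invocations succeeded"
```

Keeping suppressions this narrow means every other unquoted expansion in the file is still reported.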

When to Stop Using Bash

Use Bash When	Use Python/Go When
Gluing CLI tools together	Complex data structures needed
Script is under 100 lines	Script exceeds 200 lines
No complex error handling needed	Error handling matters (API calls, retries)
Running on minimal systems (containers)	JSON/YAML parsing required
Simple file operations	Concurrent operations needed
Environment setup / dotfiles	Unit testing is important

The threshold is around 100-200 lines. Beyond that, bash's lack of proper data structures, error handling, and testing infrastructure becomes a liability. If you're parsing JSON with jq pipes inside bash, you've probably crossed the line.

DevOps Tooling and CI/CD Platform Costs

Bash scripts often run inside CI/CD pipelines. Here's what the platforms cost:

Platform	Free Tier	Paid Starting At	Notes
GitHub Actions	2,000 min/month	$0.008/min (Linux)	Largest marketplace of actions
GitLab CI	400 min/month	$10/user/month (Premium)	Built into GitLab, no separate product
CircleCI	6,000 min/month	$15/user/month	Fast, good caching
Buildkite	Free (self-hosted agents)	$15/user/month (hosted)	Self-hosted agents, no vendor lock-in
Jenkins	Free (OSS)	Self-hosted infra costs	Maximum flexibility, maximum maintenance

Frequently Asked Questions

What does set -euo pipefail do in bash?

set -e exits the script when a command fails (with the exceptions noted earlier, such as commands used as conditions). set -u treats references to unset variables as errors. set -o pipefail makes a pipeline return the exit code of the last (rightmost) command that failed, rather than always the exit code of the final command. Together, they form "strict mode" and catch the majority of silent bash failures that cause data loss or incorrect behavior.

Why should I always quote variables in bash?

Unquoted variables undergo word splitting (spaces in values become separate arguments) and glob expansion (values containing * or ? match filenames). Both behaviors cause subtle, dangerous bugs, and double-quoting ("$DIR") prevents both. Quoting does not protect against an empty value, though: the classic rm -rf $DIR/ with an empty $DIR becomes rm -rf / whether or not it is quoted. Guard against that case with set -u (for unset variables) and "${DIR:?}" (for unset or empty ones).
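
A minimal sketch of guarding a path against an empty value, assuming a hypothetical helper named `cleanup_target` that only prints the path a cleanup step would act on:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: build the path that a cleanup step would delete.
# ${dir:?} aborts with an error if dir is unset or empty, so the result
# can never collapse to "/".
cleanup_target() {
  local dir=${1-}
  echo "${dir:?refusing to operate on an empty path}/"
}

cleanup_target "/var/tmp/build"      # prints /var/tmp/build/

# An empty value fails instead of yielding "/". The subshell contains the
# failure so this demo can keep running.
if ! (cleanup_target "") 2>/dev/null; then
  echo "empty path rejected"
fi
```

In a real script the same expansion can go directly into the destructive command, so the check and the use cannot drift apart.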

What is the difference between [[ ]] and [ ] in bash?

[ ] is the POSIX test command. In bash it is a builtin (an external binary also exists), and it follows ordinary command parsing, so unquoted variables inside it are word-split and globbed. [[ ]] is a bash keyword with special parsing: it doesn't word-split variables, supports pattern matching (== *.txt) and regex (=~), and allows &&/|| inside the brackets. Use [[ ]] in bash scripts; use [ ] only for POSIX sh compatibility.

How do I make a bash script idempotent?

Check for the desired state before making changes. Use mkdir -p instead of mkdir. Check if users exist before useradd. Use grep -q before appending to config files. Use ln -sfn for symlinks. Test by running the script twice -- if the second run changes nothing and exits cleanly, it's idempotent.

What is ShellCheck and should I use it?

ShellCheck is a static analysis tool that finds bugs, style issues, and security problems in bash scripts. It catches unquoted variables, deprecated syntax, unreachable code, and common logical errors. Yes, you should use it -- add it to your editor (VS Code extension available) and CI pipeline. It catches real bugs that experienced scripters miss.

When should I use Python instead of bash?

Switch to Python when your script exceeds 100-200 lines, needs complex data structures (dictionaries, nested objects), requires robust error handling with retries, parses JSON or YAML extensively, needs unit tests, or performs concurrent operations. Bash excels at gluing CLI commands together for short, linear tasks.

How do I handle errors and cleanup in bash?

Use trap cleanup EXIT to register a function that runs when the script exits (success or failure). This is the bash equivalent of try/finally. Create temp files with mktemp and clean them up in the trap. For specific error handling, use command || die "message" where die is a function that logs and exits.

Conclusion

Reliable bash comes down to a few habits: start every script with set -euo pipefail, always double-quote variables, use [[ ]] instead of [ ], write idempotent operations, use trap for cleanup, and run ShellCheck before committing. These practices catch most of the bugs that make bash scripts dangerous. And when a script grows past 200 lines or needs real data structures, switch to Python. Bash is a great tool -- but only within its limits.