Bash Scripting Best Practices for DevOps Engineers
Write reliable bash scripts with set -euo pipefail, proper quoting, [[ ]] tests, idempotent patterns, cleanup traps, ShellCheck, and knowing when to switch to Python.

Bash Is a Minefield (But You Can't Avoid It)
Every DevOps engineer writes bash scripts. Deployment scripts, CI/CD glue, cron jobs, bootstrapping automation -- bash is everywhere. The problem is that bash is also a language where a missing quote can delete your entire filesystem, an unset variable silently becomes an empty string, and a failing command in the middle of a pipeline goes unnoticed. Writing reliable bash requires discipline and knowing where the traps are.
This guide covers the practices that separate fragile scripts from production-grade ones: strict mode, proper quoting, modern test syntax, idempotent patterns, ShellCheck, and knowing when bash is the wrong tool for the job.
What Is set -euo pipefail?
Definition:
set -euo pipefail is a bash "strict mode" that combines three safety options: -e exits on any command failure, -u treats unset variables as errors, and -o pipefail makes pipelines fail if any command in the pipe fails (not just the last one). Together they catch the most common classes of silent bash failures.
Every bash script should start with this:
#!/usr/bin/env bash
set -euo pipefail
Here's what each flag does:
| Flag | Without It | With It |
|---|---|---|
| -e | Script continues after a failed command | Script exits immediately on failure |
| -u | Unset variables expand to empty string | Unset variables cause an error and exit |
| -o pipefail | Pipeline exits with status of last command | Pipeline exits with status of the last (rightmost) command that failed, or zero if all succeed |
# Without -e: this silently continues after the failing command
cd /nonexistent/directory
rm -rf * # This runs in whatever directory you were in. Dangerous.
# Without -u: typos in variable names silently become empty
echo "Deploying to $DEPLYOMENT_TARGET" # Empty string, no error
# Without pipefail: pipeline hides the real failure
curl https://example.com/data | jq '.items' | head -5
# If curl fails, jq gets empty input, head succeeds, exit code 0
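To see the difference pipefail makes, the following minimal sketch runs the same failing pipeline with the option off and then on, and records the exit status each time. The false | true pipeline is only a stand-in for "an early stage fails, the last stage succeeds"; the variable names are illustrative. The script deliberately omits -e so that it can inspect the failures instead of exiting on them.

```bash
#!/usr/bin/env bash
# Compare a pipeline's exit status with and without pipefail.
# `false | true` stands in for "first stage fails, last stage succeeds".

set +o pipefail
false | true
without_pipefail=$?          # status of the last command only

set -o pipefail
false | true
with_pipefail=$?             # status of the rightmost failing command

echo "without pipefail: $without_pipefail"
echo "with pipefail:    $with_pipefail"

# PIPESTATUS holds the status of every stage of the most recent pipeline.
false | true | true
stages=("${PIPESTATUS[@]}")
echo "per-stage statuses: ${stages[*]}"
```

When a pipeline fails and you need to know which stage was responsible, copying PIPESTATUS into an array immediately after the pipeline is usually the simplest way to find out.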
Watch out:
set -e has quirks. Commands in if/while conditions, the left side of && or ||, and commands negated with ! don't trigger an exit. Also, calling a function that returns non-zero as a plain command exits the whole script, which trips people up. If a command is expected to fail, use command || true to explicitly suppress the exit.
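The quirks above are easier to reason about with a small experiment. The sketch below uses a hypothetical might_fail function (not part of any real tool) that always returns 3, and shows three ways of calling it under strict mode without ending the script.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper used only for this demonstration.
might_fail() {
  return 3
}

# 1. A failing command in an `if` condition does not trigger -e.
if might_fail; then
  result="succeeded"
else
  result="failed"
fi
echo "if-condition: $result"

# 2. Capture the status of an expected failure without exiting.
status=0
might_fail || status=$?
echo "captured status: $status"

# 3. Explicitly ignore a failure.
might_fail || true
echo "still running"
```

The second form is often the most useful in practice, because it keeps the exit code available for logging or branching instead of discarding it.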
Quoting: The Single Most Important Habit
Unquoted variables are the source of more bash bugs than anything else. Without quotes, variables undergo word splitting and glob expansion.
# WRONG: if filename has spaces, this breaks
cp $file /backup/
# RIGHT: always double-quote variables
cp "$file" /backup/
# WRONG: glob expansion can match files
if [ $status == * ]; then # Expands * to filenames!
# RIGHT: quotes prevent expansion
if [ "$status" == "*" ]; then
# WRONG: parsing ls output through an unquoted variable
files=$(ls /tmp)
for f in $files; do # Breaks on files with spaces
  echo "$f"
done
# RIGHT: use a glob instead
for f in /tmp/*; do
  echo "$f"
done
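One way to make word splitting visible is to count how many arguments a command actually receives. The sketch below assumes a hypothetical count_args helper and a made-up filename, and relies on the default IFS.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: reports how many arguments it received.
count_args() {
  echo "$#"
}

file="quarterly report.txt"

# Intentionally unquoted to show the split; never do this in real scripts.
unquoted=$(count_args $file)
quoted=$(count_args "$file")

echo "unquoted: $unquoted argument(s)"
echo "quoted:   $quoted argument(s)"
```

With the unquoted expansion the helper sees two arguments (quarterly and report.txt); with quotes it sees the single filename that was intended.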
When to use single vs double quotes in bash
Double quotes allow variable expansion and command substitution while preventing word splitting and globbing -- they're what you want 95% of the time. Single quotes prevent all expansion -- use them for literal strings where you don't want any variable or special character interpretation. Dollar-single-quotes ($'...') allow escape sequences like \n and \t.
# Double quotes: variable expands
echo "Hello $USER" # Hello abhi
# Single quotes: literal string
echo 'Hello $USER' # Hello $USER
# Dollar-single quotes: escape sequences work
echo $'first\nsecond' # Prints on two lines
Modern Test Syntax: [[ ]] vs [ ]
| Feature | [ ] (test) | [[ ]] (bash keyword) |
|---|---|---|
| Word splitting on variables | Yes (must quote) | No (safe unquoted) |
| Glob expansion | Yes (dangerous) | No |
| Pattern matching | No | Yes (== with globs) |
| Regex matching | No | Yes (=~) |
| Logical operators | -a, -o (deprecated) | &&, || |
| POSIX compliant | Yes | No (bash/zsh only) |
# Prefer [[ ]] in bash scripts
if [[ -z "$variable" ]]; then
  echo "variable is empty"
fi
# Pattern matching (not available with [ ])
if [[ "$filename" == *.tar.gz ]]; then
  echo "It's a tarball"
fi
# Regex matching (note the escaped dot before the top-level domain)
if [[ "$email" =~ ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$ ]]; then
  echo "Valid email format"
fi
# Safe without quotes (but still quote by habit)
if [[ -n $possibly_empty ]]; then
  echo "not empty"
fi
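Regex matching with =~ also fills the BASH_REMATCH array with capture groups, which can replace a chain of cut or sed calls. The following is a minimal sketch that assumes a hypothetical release tag in v-prefixed MAJOR.MINOR.PATCH form; the tag value and variable names are illustrative.

```bash
#!/usr/bin/env bash
set -euo pipefail

tag="v2.14.3"   # hypothetical release tag

# Store the pattern in a variable and leave it unquoted on the right-hand
# side of =~ so bash treats it as a regex rather than a literal string.
semver_re='^v([0-9]+)\.([0-9]+)\.([0-9]+)$'

if [[ "$tag" =~ $semver_re ]]; then
  major="${BASH_REMATCH[1]}"
  minor="${BASH_REMATCH[2]}"
  patch="${BASH_REMATCH[3]}"
  echo "major=$major minor=$minor patch=$patch"
else
  echo "not a semantic version tag: $tag" >&2
fi
```

Keeping the pattern in a variable avoids the quoting surprises that come from writing backslashes and parentheses inline inside [[ ]].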
Writing Idempotent Scripts
An idempotent script produces the same result whether you run it once or ten times. This is critical for deployment scripts, provisioning, and any automation that might get re-run.
#!/usr/bin/env bash
set -euo pipefail
# Idempotent directory creation
mkdir -p /opt/myapp/logs
# Idempotent user creation
if ! id -u myapp >/dev/null 2>&1; then
  useradd -r -s /bin/false myapp
fi
# Idempotent symlink
ln -sfn /opt/myapp/current /opt/myapp/active
# Idempotent package install (Debian)
dpkg -l nginx 2>/dev/null | grep -q '^ii' || apt-get install -y nginx
# Idempotent config line addition
grep -qF 'net.core.somaxconn = 65535' /etc/sysctl.conf \
  || echo 'net.core.somaxconn = 65535' >> /etc/sysctl.conf
# Idempotent systemd enable
systemctl is-enabled --quiet nginx || systemctl enable nginx
Pro tip: Test idempotency by running your script twice in a row. If it fails or changes something on the second run that it shouldn't, it's not idempotent. Common offenders: mkdir without -p, useradd without an existence check, and appending to config files without checking if the line exists.
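The "check before append" pattern is worth wrapping in a function once a script touches more than one config file. The sketch below assumes a hypothetical ensure_line helper and uses a scratch file from mktemp in place of a real config, so it can be run safely twice in a row.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: append a line to a file only if it is not already there.
# -x matches the whole line, -F treats the text as a fixed string.
ensure_line() {
  local line="$1" file="$2"
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

conf=$(mktemp)   # scratch file standing in for a real config

ensure_line 'net.core.somaxconn = 65535' "$conf"
ensure_line 'net.core.somaxconn = 65535' "$conf"   # second run is a no-op

line_count=$(wc -l < "$conf")
echo "lines after two runs: $((line_count))"
rm -f "$conf"
```

Matching the whole line with -x is a deliberate choice here: a plain substring match would treat a commented-out copy of the setting as already present.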
Functions and Error Handling
#!/usr/bin/env bash
set -euo pipefail
# Use functions for reusable logic
log() {
  echo "[$(date +'%Y-%m-%d %H:%M:%S')] $*" >&2
}
die() {
  log "ERROR: $*"
  exit 1
}
# Create temp directory safely. Avoid naming the variable TMPDIR: that
# environment variable tells mktemp and other tools where temp files live.
WORK_DIR=$(mktemp -d) || die "Failed to create temp directory"
# Cleanup trap -- runs on exit regardless of success/failure.
# Registered after WORK_DIR is set so cleanup never sees an unset variable.
cleanup() {
  log "Cleaning up temp files..."
  rm -rf "$WORK_DIR"
}
trap cleanup EXIT
# Validate required environment variables
: "${DATABASE_URL:?DATABASE_URL must be set}"
: "${DEPLOY_ENV:?DEPLOY_ENV must be set}"
# Main logic
main() {
  log "Starting deployment to $DEPLOY_ENV"
  # Download artifact
  curl -fsSL "https://artifacts.example.com/app.tar.gz" \
    -o "$WORK_DIR/app.tar.gz" \
    || die "Failed to download artifact"
  # Extract (create the release directory first so tar -C has a target)
  mkdir -p /opt/myapp/releases/new
  tar xzf "$WORK_DIR/app.tar.gz" -C /opt/myapp/releases/new \
    || die "Failed to extract artifact"
  # Switch symlink to the new release (ln -sfn replaces any existing link)
  ln -sfn /opt/myapp/releases/new /opt/myapp/current
  log "Deployment complete"
}
main "$@"
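It can be reassuring to confirm that an EXIT trap really fires when a script fails part-way through. The sketch below simulates a failing script inside a subshell and then checks from the parent that the scratch directory is gone. The names work_dir and marker are illustrative, and the failure is an explicit exit 1 because -e is ignored inside a subshell that sits on the left of ||.

```bash
#!/usr/bin/env bash
set -euo pipefail

# File used to pass the scratch directory's path back to the parent shell.
marker=$(mktemp)

# Run the "script" in a subshell so its failure does not end this demo.
(
  work_dir=$(mktemp -d)
  trap 'rm -rf "$work_dir"' EXIT      # registered only after work_dir exists
  printf '%s\n' "$work_dir" > "$marker"
  exit 1                              # simulated failure part-way through
) || subshell_status=$?

work_dir=$(cat "$marker")
rm -f "$marker"

if [[ -d "$work_dir" ]]; then
  cleaned="no"
else
  cleaned="yes"
fi
echo "subshell exit status: ${subshell_status:-0}, temp dir removed: $cleaned"
```

Registering the trap only after the directory has been created keeps the cleanup function from ever running against an unset or unexpected path.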
Common Pitfalls
Word Splitting
# BUG: word splitting breaks on spaces
files="file one.txt file two.txt"
for f in $files; do # Iterates 4 times (file, one.txt, file, two.txt), not 2
  echo "$f"
done
# FIX: use arrays
files=("file one.txt" "file two.txt")
for f in "${files[@]}"; do # Iterates 2 times
  echo "$f"
done
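Arrays are also a safe way to build up a command line piece by piece. The sketch below uses hypothetical flags and lets printf stand in for the real command, so the example shows exactly how each element is delivered.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical flags for illustration; printf stands in for the real command.
args=(--name "my app" --env staging)

dry_run=true
if [[ "$dry_run" == true ]]; then
  args+=(--dry-run)           # append conditionally without string concatenation
fi

echo "argument count: ${#args[@]}"
printf '<%s>\n' "${args[@]}"  # each element arrives as exactly one argument
```

Building the same command as a single string and expanding it unquoted would split "my app" into two arguments, which is the bug shown above.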
Subshell Variable Scoping
# BUG: pipe creates a subshell; variable change is lost
count=0
echo "a b c" | while read -r word; do
  count=$((count + 1))
done
echo "$count" # Prints 0, not 1: the loop ran in a subshell
# FIX: use a here-string so the loop runs in the current shell
count=0
while read -r word; do
  count=$((count + 1))
done <<< "a b c"
echo "$count" # Prints 1 (one line of input)
# FIX: for multiple lines, use process substitution
count=0
while read -r line; do
  count=$((count + 1))
done < <(printf 'a\nb\nc\n')
echo "$count" # Prints 3
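When the goal is simply to collect a command's output lines, reading them into an array avoids the loop altogether. This sketch assumes bash 4 or newer, where the mapfile builtin is available; the printf call stands in for any command that emits one item per line.

```bash
#!/usr/bin/env bash
set -euo pipefail

# mapfile (also spelled readarray) needs bash 4 or newer.
# printf stands in for any command that emits one item per line.
mapfile -t lines < <(printf 'alpha\nbeta gamma\ndelta\n')

echo "read ${#lines[@]} lines"
echo "second line: ${lines[1]}"
```

Because the array is filled in the current shell, it remains available after the command finishes, and lines containing spaces stay intact.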
Globbing Surprises
# BUG: if /tmp has no .log files, the glob is literal
for f in /tmp/*.log; do
  echo "Processing $f" # Prints "Processing /tmp/*.log"
done
# FIX: use nullglob
shopt -s nullglob
for f in /tmp/*.log; do
  echo "Processing $f" # Loop body doesn't execute if no matches
done
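Leaving nullglob on for a whole script can cause its own surprises, since a command such as ls *.nomatch then runs with no arguments at all. One option is to enable it only around the globs that need it and capture the results in arrays. The sketch below uses a scratch directory from mktemp as a stand-in for /tmp; the file names are made up.

```bash
#!/usr/bin/env bash
set -euo pipefail

scratch=$(mktemp -d)            # scratch directory standing in for /tmp
touch "$scratch/app.log" "$scratch/db.log"

shopt -s nullglob               # enable only around the globs that need it
logs=("$scratch"/*.log)
backups=("$scratch"/*.bak)      # no matches: expands to an empty array
shopt -u nullglob               # restore the default for the rest of the script

echo "log files: ${#logs[@]}, backup files: ${#backups[@]}"

if (( ${#backups[@]} == 0 )); then
  echo "no backups to process"
fi

rm -rf "$scratch"
```

Checking the array length also gives an explicit place to log or exit early when nothing matched, which a silently skipped loop does not.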
ShellCheck: Your Bash Linter
ShellCheck is a static analysis tool that catches common bash mistakes. Install it and integrate it into your CI pipeline.
# Install
apt-get install shellcheck # Debian/Ubuntu
brew install shellcheck # macOS
# Run on a script
shellcheck deploy.sh
# Run on all scripts in a directory
find scripts/ -name '*.sh' -print0 | xargs -0 shellcheck
# Integrate with CI (GitHub Actions example)
# - name: ShellCheck
# uses: ludeeus/action-shellcheck@master
# with:
# scandir: './scripts'
ShellCheck catches quoting issues, deprecated syntax, common logical errors, and suggests safer alternatives. It's opinionated and almost always right.
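For the rare case where a warning really is a false positive, ShellCheck supports inline directives that silence one check for the next command. The sketch below is an assumed scenario: a hypothetical extra_opts string that is meant to split into separate words, with SC2086 (the unquoted-expansion warning) disabled for that single line.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical option string that is meant to split into separate words.
extra_opts="-l -a"

# Hypothetical helper: reports how many arguments it received.
count_args() { echo "$#"; }

# Word splitting is intentional here, so silence SC2086 for this one command.
# shellcheck disable=SC2086
n=$(count_args $extra_opts)
echo "options passed: $n"
```

A directive with a comment explaining why is generally preferable to disabling the check globally, because every other unquoted expansion in the script is still reported. In most cases an array is the better fix and needs no directive at all.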
When to Stop Using Bash
| Use Bash When | Use Python/Go When |
|---|---|
| Gluing CLI tools together | Complex data structures needed |
| Script is under 100 lines | Script exceeds 200 lines |
| No complex error handling needed | Error handling matters (API calls, retries) |
| Running on minimal systems (containers) | JSON/YAML parsing required |
| Simple file operations | Concurrent operations needed |
| Environment setup / dotfiles | Unit testing is important |
The threshold is around 100-200 lines. Beyond that, bash's lack of proper data structures, error handling, and testing infrastructure becomes a liability. If you're parsing JSON with jq pipes inside bash, you've probably crossed the line.
DevOps Tooling and CI/CD Platform Costs
Bash scripts often run inside CI/CD pipelines. Here's what the platforms cost:
| Platform | Free Tier | Paid Starting At | Notes |
|---|---|---|---|
| GitHub Actions | 2,000 min/month | $0.008/min (Linux) | Largest marketplace of actions |
| GitLab CI | 400 min/month | $10/user/month (Premium) | Built into GitLab, no separate product |
| CircleCI | 6,000 min/month | $15/user/month | Fast, good caching |
| Buildkite | Free (self-hosted agents) | $15/user/month (hosted) | Self-hosted agents, no vendor lock-in |
| Jenkins | Free (OSS) | Self-hosted infra costs | Maximum flexibility, maximum maintenance |
Frequently Asked Questions
What does set -euo pipefail do in bash?
set -e exits the script when any command fails. set -u treats references to unset variables as errors. set -o pipefail makes a pipeline return the exit code of the last (rightmost) command that failed, rather than always returning the exit code of the final command. Together, they form "strict mode" and catch the majority of silent bash failures that cause data loss or incorrect behavior.
Why should I always quote variables in bash?
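Unquoted variables undergo word splitting (spaces in values become separate arguments) and glob expansion (values containing * or ? match filenames). Both behaviors cause subtle, dangerous bugs. Double-quoting ("$DIR") prevents both word splitting and globbing. Quoting does not protect against empty values, though: the classic rm -rf $DIR/ with an empty $DIR becomes rm -rf / with or without quotes, so pair quoting with set -u (for unset variables) or a ${DIR:?} guard (for unset or empty ones).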
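A guard for the empty-variable case can be sketched with the ${var:?message} expansion, which aborts the current shell when the variable is unset or empty. The safe_target function and the paths below are hypothetical; the failing call runs inside a subshell so that only the subshell is terminated.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: refuses to build a path from an empty variable.
# ${dir:?message} aborts the current (sub)shell if dir is unset or empty.
safe_target() {
  local dir="${1-}"
  printf '%s\n' "${dir:?refusing to operate on an empty path}/cache"
}

ok=$(safe_target "/var/tmp/myapp")
echo "target: $ok"

# Run the failing case in a subshell so only that subshell aborts.
if ( safe_target "" ) >/dev/null 2>&1; then
  guarded="no"
else
  guarded="yes"
fi
echo "empty path rejected: $guarded"
```

The same expansion works directly in a destructive command, for example as the argument to rm -rf, so the command is never run with a path that collapsed to /.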
What is the difference between [[ ]] and [ ] in bash?
[ ] is the POSIX test command -- a shell builtin in bash (also shipped as an external binary) that follows standard command parsing. [[ ]] is a bash keyword with special parsing: it doesn't word-split variables, supports pattern matching (== *.txt) and regex (=~), and allows &&/|| inside the brackets. Use [[ ]] in bash scripts; use [ ] only for POSIX sh compatibility.
How do I make a bash script idempotent?
Check for the desired state before making changes. Use mkdir -p instead of mkdir. Check if users exist before useradd. Use grep -q before appending to config files. Use ln -sfn for symlinks. Test by running the script twice -- if the second run changes nothing and exits cleanly, it's idempotent.
What is ShellCheck and should I use it?
ShellCheck is a static analysis tool that finds bugs, style issues, and security problems in bash scripts. It catches unquoted variables, deprecated syntax, unreachable code, and common logical errors. Yes, you should use it -- add it to your editor (VS Code extension available) and CI pipeline. It catches real bugs that experienced scripters miss.
When should I use Python instead of bash?
Switch to Python when your script exceeds 100-200 lines, needs complex data structures (dictionaries, nested objects), requires robust error handling with retries, parses JSON or YAML extensively, needs unit tests, or performs concurrent operations. Bash excels at gluing CLI commands together for short, linear tasks.
How do I handle errors and cleanup in bash?
Use trap cleanup EXIT to register a function that runs when the script exits (success or failure). This is the bash equivalent of try/finally. Create temp files with mktemp and clean them up in the trap. For specific error handling, use command || die "message" where die is a function that logs and exits.
Conclusion
Reliable bash comes down to a few habits: start every script with set -euo pipefail, always double-quote variables, use [[ ]] instead of [ ], write idempotent operations, use trap for cleanup, and run ShellCheck before committing. These practices catch most of the bugs that make bash scripts dangerous. And when a script grows past 200 lines or needs real data structures, switch to Python. Bash is a great tool -- but only within its limits.
Written by
Abhishek Patel
Infrastructure engineer with 10+ years building production systems on AWS, GCP, and bare metal. Writes practical guides on cloud architecture, containers, networking, and Linux for developers who want to understand how things actually work under the hood.