Benchmarks

All benchmarks were run on an AMD Ryzen 9 9900X 12-core processor (24 hardware threads, 121 GB RAM). Each result is the average of 100 runs after 10 warmup iterations. Times are in milliseconds (lower is better) unless noted otherwise.
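
The measurement loop described above (untimed warmup iterations followed by timed runs whose mean is reported) can be sketched roughly as follows. This is a minimal illustration in Python, not the project's benchmark harness; the function name `bench` and its parameters are assumptions made for this example.

```python
import shutil
import statistics
import subprocess
import time


def bench(argv, runs=100, warmup=10):
    """Return the mean wall-clock time in milliseconds for running argv.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Warmup iterations are executed but not timed.
    for _ in range(warmup):
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples)


if __name__ == "__main__":
    for shell in ("dash", "bash", "sh", "zsh", "fish", "lash"):
        if shutil.which(shell) is None:
            continue  # skip shells that are not installed
        mean_ms = bench([shell, "-c", "true"])
        print(f"{shell}: {mean_ms:.2f}ms")
```

Python's own process-spawning overhead is included in every sample, so absolute numbers from this sketch will be higher than the figures reported here; only the relative ordering is likely to be comparable.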


How fast each shell can start, run a trivial command such as shell -c 'true', and exit.

shell -c 'true' — round-trip
dash
0.26ms
bash
0.43ms
sh
0.47ms
lash
0.55ms
zsh
0.56ms
fish
6.5ms
shell -c 'echo x' — round-trip
dash
0.21ms
sh
0.42ms
bash
0.43ms
zsh
0.51ms
lash
0.57ms
fish
6.4ms
shell -c 'echo x | cat' — round-trip
dash
0.61ms
bash
0.74ms
lash
0.75ms
sh
0.77ms
zsh
0.91ms
fish
6.1ms

Raw data throughput through pipes. MB/s charts are higher-is-better.
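
Throughput and elapsed time are two views of the same measurement: 64 MB at 8878 MB/s corresponds to roughly 64 / 8878 s ≈ 7.2 ms. As a rough illustration of what "throughput through a pipe" means, the sketch below pushes bytes through an OS pipe from a writer thread and reports MB/s. It is not the benchmark's actual method; the function name `pipe_throughput` is hypothetical, and treating 1 MB as 2^20 bytes is an assumption.

```python
import os
import threading
import time

MIB = 1024 * 1024  # assumption: 1 MB is taken to mean 2**20 bytes


def pipe_throughput(total_bytes, chunk_size=64 * 1024):
    """Push total_bytes through an OS pipe; return (bytes_read, MB per second)."""
    read_fd, write_fd = os.pipe()
    chunk = b"x" * chunk_size

    def writer():
        remaining = total_bytes
        try:
            while remaining > 0:
                # os.write may write fewer bytes than requested.
                written = os.write(write_fd, chunk[:min(chunk_size, remaining)])
                remaining -= written
        finally:
            os.close(write_fd)  # closing the write end signals EOF to the reader

    thread = threading.Thread(target=writer)
    start = time.perf_counter()
    thread.start()
    received = 0
    while True:
        data = os.read(read_fd, chunk_size)
        if not data:
            break  # EOF: the writer has closed its end
        received += len(data)
    elapsed = time.perf_counter() - start
    thread.join()
    os.close(read_fd)
    return received, (received / MIB) / elapsed


if __name__ == "__main__":
    size, rate = pipe_throughput(64 * MIB)
    print(f"{size // MIB} MB through one pipe: {rate:.0f} MB/s")
```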

64 MB single pipe — throughput
lash
8878 MB/s
lash-turbo
8745 MB/s
dash
8325 MB/s
zsh
7566 MB/s
bash
7409 MB/s
sh
7073 MB/s
fish
4257 MB/s
1 GB single pipe — throughput
lash-turbo
8669 MB/s
bash
7375 MB/s
sh
7312 MB/s
zsh
7309 MB/s
lash
7240 MB/s
dash
7107 MB/s
fish
6955 MB/s
64 MB through 3 cat stages
lash-turbo
7.4ms
sh
8.3ms
bash
8.8ms
lash
8.9ms
zsh
8.9ms
dash
9.3ms
fish
16ms
16 MB streamed to sink
dash
2.3ms
lash-turbo
2.6ms
bash
2.7ms
sh
2.7ms
zsh
2.9ms
lash
2.9ms
fish
8.9ms
echo | cat — command latency
dash
0.46ms
lash-turbo
0.63ms
lash
0.68ms
bash
0.71ms
sh
0.71ms
zsh
1.0ms
fish
6.0ms
echo through 5 cat stages
dash
0.79ms
lash-turbo
0.91ms
lash
0.93ms
bash
1.1ms
sh
1.1ms
zsh
1.4ms
fish
7.0ms
echo through 10 cat stages
dash
0.98ms
lash-turbo
1.1ms
bash
1.2ms
lash
1.3ms
sh
1.4ms
zsh
1.9ms
fish
8.1ms
16 MB — 1 cat stage
lash-turbo
2.6ms
dash
2.7ms
sh
2.9ms
bash
3.0ms
zsh
3.0ms
lash
3.1ms
fish
9.5ms
16 MB — 4 cat stages
lash-turbo
2.3ms
dash
3.4ms
sh
3.6ms
bash
3.8ms
lash
3.8ms
zsh
4.0ms
fish
11ms
16 MB — 8 cat stages
lash-turbo
2.4ms
dash
3.6ms
bash
5.2ms
lash
5.3ms
zsh
5.9ms
sh
6.7ms
fish
11ms
16 MB — 16 cat stages
lash-turbo
2.6ms
bash
6.8ms
dash
7.0ms
lash
7.0ms
zsh
8.5ms
sh
8.9ms
fish
14ms
16 MB write to file
sh
2.5ms
zsh
2.6ms
dash
3.1ms
bash
3.1ms
lash
3.1ms
lash-turbo
3.6ms
fish
9.4ms
16 MB pipe to file
lash-turbo
3.9ms
dash
4.5ms
bash
4.5ms
sh
4.5ms
lash
4.6ms
zsh
4.7ms
fish
12ms
16 MB read from file through pipe
lash-turbo
6.9ms
sh
7.0ms
dash
7.2ms
lash
7.5ms
zsh
7.5ms
bash
7.6ms
fish
14ms
16 MB to /dev/null (overhead baseline)
dash
0.56ms
sh
0.82ms
bash
0.84ms
lash-turbo
0.89ms
zsh
0.90ms
lash
0.93ms
fish
6.7ms

Turbo mode rewrites common pipelines into native array operations, which avoids fork/exec overhead. The speedup in each chart title compares lash-turbo against forked lash (the lash row).
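
A speedup figure is the forked time divided by the turbo time; for the "sort 100K lines" chart, 27ms / 6.9ms ≈ 3.9x. The general idea of replacing a pipeline with in-process array operations might look like the sketch below. It is a simplified illustration only, not lash's rewriter: the function `run_native` and the small set of stages it recognizes are assumptions made for this example.

```python
import shlex


def run_native(pipeline, lines):
    """Evaluate a 'sort | head | tail' style pipeline on an in-memory list.

    Returns None when a stage is not recognized, meaning a caller would
    fall back to the normal fork/exec path. Hypothetical helper.
    """
    for stage in pipeline.split("|"):
        argv = shlex.split(stage)
        if not argv:
            return None
        cmd, args = argv[0], argv[1:]
        if cmd == "sort" and all(a in ("-n", "-r", "-rn", "-nr") for a in args):
            flags = "".join(args)
            # Simplification: float() rejects non-numeric lines, whereas a real
            # `sort -n` treats them as 0; plain sort here uses code point order.
            key = float if "n" in flags else None
            lines = sorted(lines, key=key, reverse="r" in flags)
        elif (cmd in ("head", "tail") and len(args) == 2
              and args[0] == "-n" and args[1].isdigit()):
            count = int(args[1])
            if cmd == "head":
                lines = lines[:count]
            else:
                lines = lines[max(0, len(lines) - count):]
        else:
            return None  # unknown stage: not handled natively
    return lines


if __name__ == "__main__":
    data = [str(n) for n in (10, 9, 100, 1)]
    print(run_native("sort -n | tail -n 2", data))
```

No child process is created for any recognized stage; each one is a list operation on data that is already in memory.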

seq 1M | sort | tail — turbo 6.9x vs forked
lash-turbo
39ms
lash
298ms
fish
300ms
bash
305ms
zsh
307ms
dash
308ms
sh
314ms
sort 100K lines — turbo 3.9x vs forked
lash-turbo
6.9ms
lash
27ms
sh
27ms
zsh
28ms
dash
28ms
bash
29ms
fish
35ms
sort | head from 100K — turbo 7.7x vs forked
lash-turbo
4.0ms
zsh
24ms
lash
24ms
sh
24ms
bash
24ms
dash
24ms
fish
30ms
sort | tail from 100K — turbo 5.2x vs forked
lash-turbo
4.4ms
lash
26ms
sh
26ms
bash
26ms
dash
26ms
zsh
26ms
fish
32ms
grep | sort | head from 100K — turbo 2.6x vs forked
lash-turbo
3.7ms
dash
10.0ms
lash
10ms
sh
10ms
bash
11ms
zsh
11ms
fish
18ms
sort 1M lines reverse numeric — turbo 7.8x vs forked
lash-turbo
40ms
bash
305ms
lash
307ms
fish
313ms
sh
315ms
dash
316ms
zsh
320ms
sort+head+sort+tail from 100K — turbo 6.2x vs forked
sh
39ms
zsh
39ms
bash
41ms
lash
41ms
dash
41ms
fish
47ms
lash-turbo
494ms
5-stage pipeline on 500K — turbo 4.2x vs forked
zsh
101ms
sh
103ms
dash
108ms
lash
109ms
bash
109ms
fish
112ms
lash-turbo
3154ms
generate+sort+uniq+sort from 100K — turbo 6.5x vs forked
lash-turbo
36ms
lash
193ms
bash
194ms
sh
197ms
zsh
203ms
dash
203ms
fish
215ms

Common data-processing patterns across shells.

sort 1K lines — reverse numeric
lash-turbo
0.65ms
dash
0.66ms
lash
0.96ms
bash
1.0ms
sh
1.0ms
zsh
1.2ms
fish
6.9ms
sort 10K lines — reverse numeric
lash-turbo
1.2ms
dash
2.8ms
lash
3.0ms
sh
3.0ms
bash
3.0ms
zsh
3.3ms
fish
10ms
grep filter 1K lines
lash-turbo
0.59ms
dash
0.62ms
bash
0.83ms
lash
0.86ms
sh
0.93ms
zsh
1.3ms
fish
6.9ms
awk filter 10K lines
dash
1.8ms
sh
2.0ms
lash
2.0ms
bash
2.0ms
lash-turbo
2.2ms
zsh
2.3ms
fish
7.7ms
awk map (x2) 10K lines
dash
2.2ms
sh
2.3ms
bash
2.4ms
lash
2.4ms
lash-turbo
2.5ms
zsh
2.6ms
fish
9.7ms
grep pattern in 10K lines
dash
0.64ms
lash-turbo
0.75ms
sh
0.89ms
bash
0.98ms
lash
1.0ms
zsh
1.1ms
fish
6.8ms
100K lines through grep filter
dash
2.0ms
lash
2.1ms
bash
2.2ms
sh
2.3ms
zsh
2.5ms
lash-turbo
3.5ms
fish
11ms
100K small lines through pipe
dash
1.1ms
lash-turbo
1.2ms
lash
1.4ms
bash
1.5ms
sh
1.8ms
zsh
2.0ms
fish
8.9ms
1M small lines through pipe
dash
4.3ms
lash-turbo
4.3ms
sh
4.6ms
zsh
4.8ms
bash
4.8ms
lash
4.9ms
fish
12ms

How fast each shell can fork and exec processes.
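
For readers unfamiliar with the underlying system calls, the sketch below shows the fork, exec, and wait sequence that a shell performs for an external command, timed from the parent. It assumes a POSIX system and Python 3.9 or later; the function name `fork_exec` is hypothetical and this is not how the benchmarks themselves are implemented.

```python
import os
import time


def fork_exec(argv):
    """Fork, exec argv in the child, wait for it; return (exit_code, ms)."""
    start = time.perf_counter()
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the target program.
        try:
            os.execvp(argv[0], argv)
        except OSError:
            pass
        os._exit(127)  # only reached if exec failed
    # Parent: block until the child exits.
    _, status = os.waitpid(pid, 0)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return os.waitstatus_to_exitcode(status), elapsed_ms


if __name__ == "__main__":
    code, ms = fork_exec(["true"])
    print(f"exit={code} in {ms:.2f}ms")
```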

single fork+exec (no pipe)
dash
0.23ms
bash
0.47ms
sh
0.51ms
zsh
0.64ms
lash-turbo
0.64ms
lash
0.72ms
fish
6.7ms
2-stage no-op pipe setup
dash
0.41ms
sh
0.67ms
bash
0.68ms
lash-turbo
0.71ms
lash
0.74ms
zsh
0.77ms
fish
6.9ms
5-stage no-op pipe setup
dash
0.54ms
zsh
0.82ms
sh
0.84ms
bash
0.85ms
lash-turbo
0.91ms
lash
0.98ms
fish
6.8ms
10-stage no-op pipe setup
dash
0.69ms
bash
1.0ms
sh
1.1ms
lash-turbo
1.1ms
lash
1.1ms
zsh
1.5ms
fish
6.4ms

Turbo mode applies these optimizations automatically:

  • Passthrough stripping: removes identity operations so they never execute
  • Numeric sort key pre-computation: parses each key once, O(N) conversions in total, instead of re-parsing inside the comparator on each of the O(N log N) comparisons
  • Streaming wc -l: counts newlines in the byte stream without collecting lines
  • C strtod for numeric conversion: calls C's strtod directly, avoiding the exception overhead of D's to!double
  • Fused operations: grep | head, grep | tail, and grep | wc run in a single pass over the data
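
Three of the optimizations listed above can be illustrated with short Python sketches. These are simplified stand-ins under stated assumptions, not lash's D implementation; the function names `numeric_sort`, `count_lines`, and `grep_head` are hypothetical.

```python
import io
import re


def numeric_sort(lines, reverse=False):
    """Sort by numeric value, converting each line exactly once."""
    keyed = [(float(line), line) for line in lines]  # N conversions up front
    keyed.sort(key=lambda pair: pair[0], reverse=reverse)  # compares floats only
    return [line for _, line in keyed]


def count_lines(stream, chunk_size=64 * 1024):
    """Streaming wc -l: count newline bytes without building a list of lines."""
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return total
        total += chunk.count(b"\n")


def grep_head(lines, pattern, limit):
    """Fused grep | head: one pass that stops after `limit` matches."""
    regex = re.compile(pattern)
    matches = []
    if limit <= 0:
        return matches
    for line in lines:
        if regex.search(line):
            matches.append(line)
            if len(matches) == limit:
                break  # remaining input is never examined
    return matches


if __name__ == "__main__":
    print(numeric_sort(["10", "9", "1.5"]))
    print(count_lines(io.BytesIO(b"a\nb\nc\n")))
    print(grep_head(["ab", "cd", "ax", "ay"], "a", 2))
```

In `grep_head`, the early `break` is what the fusion buys: a separate `grep` stage would scan all of its input before `head` could discard most of the matches.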

To run the benchmark suite:

dub run :benchmarks

Flag            Description
--runs N        Number of iterations per scenario
--warmup N      Warmup iterations before measurement
--scenario S    Run only the named scenario
--json          Output results in JSON format
--verbose       Print per-iteration timings

To reproduce these numbers:

dub run :benchmarks -- --runs 100 --warmup 10 --verbose