
Benchmarks

All benchmarks were run on an AMD Ryzen 9 9900X 12-Core Processor (24 threads, 121 GB RAM). Each result is the average of 100 runs after 10 warmup iterations. Times are in milliseconds (lower is better) unless noted otherwise.
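The measurement procedure described above (untimed warmup iterations, then an average over timed runs) can be sketched roughly as follows. This is not the project's actual harness. The names `bench` and `shell_round_trip` are hypothetical, and wall-clock timing via `time.perf_counter` is an assumption.

```python
import statistics
import subprocess
import time


def bench(fn, runs=100, warmup=10):
    """Return the mean wall-clock time of fn() in milliseconds."""
    for _ in range(warmup):
        fn()  # warmup iterations run but are not timed
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples)


def shell_round_trip(shell, command):
    """Build a callable that runs `shell -c command` and waits for it to exit."""
    def run():
        subprocess.run([shell, "-c", command], check=True,
                       stdout=subprocess.DEVNULL)
    return run
```

With these helpers, `bench(shell_round_trip("sh", "true"))` would time the `sh -c 'true'` round-trip, assuming `sh` is on the PATH.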


How fast each shell can execute a command and exit.

shell -c 'true' round-trip
dash
0.24ms
sh
0.42ms
bash
0.43ms
zsh
0.54ms
lash
0.57ms
fish
5.8ms
shell -c 'echo x' round-trip
dash
0.22ms
bash
0.01ms
sh
0.01ms
zsh
0.52ms
lash
0.03ms
fish
0.06ms
shell -c 'echo x | cat' round-trip
dash
0.29ms
lash
0.13ms
sh
0.36ms
bash
0.36ms
zsh
0.34ms
fish
0.29ms

Raw data throughput through pipes. MB/s charts are higher-is-better.
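As a rough illustration of how an MB/s figure is derived, the sketch below moves a fixed number of bytes through an OS pipe and divides by the elapsed time. It is an assumption-laden toy: the real benchmarks push data through shell pipelines rather than through one process, and the page does not say whether MB means 10^6 or 2^20 bytes (10^6 is assumed here). Both function names are hypothetical.

```python
import os
import time


def throughput_mb_s(nbytes, seconds):
    """Convert a byte count and a duration into MB/s (1 MB = 10**6 bytes, assumed)."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return (nbytes / 1_000_000) / seconds


def pump_through_pipe(total_bytes, chunk_size=4096):
    """Write total_bytes into a pipe and read them back; return (bytes, seconds)."""
    r, w = os.pipe()
    chunk = b"x" * chunk_size
    moved = 0
    start = time.perf_counter()
    try:
        while moved < total_bytes:
            n = min(chunk_size, total_bytes - moved)
            os.write(w, chunk[:n])       # small writes fit in the pipe buffer
            got = 0
            while got < n:               # drain exactly what was written
                got += len(os.read(r, n - got))
            moved += n
    finally:
        os.close(r)
        os.close(w)
    return moved, time.perf_counter() - start
```

Calling `throughput_mb_s(*pump_through_pipe(16_000_000))` gives a single-process figure that is useful only as a sanity check, not as a comparison against the numbers on this page.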

64MB through single pipe, minimal output
lash-turbo
145125 MB/s
bash
9315 MB/s
dash
8847 MB/s
sh
8740 MB/s
lash
8623 MB/s
zsh
8376 MB/s
fish
5264 MB/s
64MB through 3 cat stages
lash-turbo
142222 MB/s
sh
8277 MB/s
dash
8101 MB/s
bash
8072 MB/s
lash
7842 MB/s
zsh
7498 MB/s
fish
4566 MB/s
16MB output streamed to sink
lash-turbo
7027 MB/s
dash
6845 MB/s
lash
6270 MB/s
bash
6091 MB/s
zsh
5835 MB/s
sh
5689 MB/s
fish
1896 MB/s
echo|cat round-trip latency — turbo 1.1x vs forked
dash
0.24ms
lash-turbo
0.08ms
bash
0.30ms
sh
0.30ms
lash
0.11ms
zsh
0.33ms
fish
0.23ms
1GB through single pipe, minimal output
lash-turbo
2255507 MB/s
dash
8423 MB/s
lash
8064 MB/s
fish
7875 MB/s
sh
6602 MB/s
bash
6511 MB/s
zsh
6112 MB/s
seq 1M | sort | tail — turbo 7.6x vs forked
lash-turbo
37ms
zsh
288ms
sh
288ms
lash
289ms
dash
292ms
bash
302ms
fish
308ms
echo through 5 cat stages — turbo 1.0x vs forked
dash
0.52ms
lash
0.35ms
lash-turbo
0.37ms
sh
0.62ms
bash
0.64ms
zsh
0.80ms
fish
1.2ms
echo through 10 cat stages — turbo 1.0x vs forked
dash
0.58ms
lash-turbo
0.49ms
lash
0.50ms
bash
0.68ms
sh
0.71ms
zsh
1.2ms
fish
1.2ms
sort 100K lines — turbo 52.7x vs forked
lash-turbo
0.49ms
dash
25ms
sh
25ms
bash
25ms
lash
25ms
zsh
25ms
fish
26ms
sort | head from 100K lines — turbo 9.5x vs forked
lash-turbo
1.9ms
dash
23ms
sh
23ms
bash
23ms
lash
23ms
zsh
23ms
fish
23ms
sort | tail from 100K lines — turbo 6.2x vs forked
lash-turbo
3.5ms
dash
25ms
lash
25ms
bash
25ms
zsh
25ms
sh
26ms
fish
25ms
grep | sort | head from 100K — turbo 5.8x vs forked
lash-turbo
1.1ms
lash
9.4ms
dash
9.8ms
sh
9.8ms
bash
9.7ms
zsh
9.9ms
fish
9.7ms
16MB through 1 cat stage
lash-turbo
37736 MB/s
dash
6376 MB/s
bash
5742 MB/s
zsh
5456 MB/s
lash
5451 MB/s
sh
5304 MB/s
fish
1877 MB/s
16MB through 2 cat stages
lash-turbo
32922 MB/s
dash
5652 MB/s
zsh
4940 MB/s
bash
4853 MB/s
lash
4811 MB/s
sh
4654 MB/s
fish
1864 MB/s
16MB through 4 cat stages
lash-turbo
36364 MB/s
dash
5269 MB/s
lash
5189 MB/s
sh
4812 MB/s
bash
4698 MB/s
zsh
4623 MB/s
fish
1724 MB/s
16MB through 8 cat stages
lash-turbo
36199 MB/s
dash
4713 MB/s
sh
4269 MB/s
bash
4263 MB/s
lash
4200 MB/s
zsh
3788 MB/s
fish
1576 MB/s
16MB through 16 cat stages
lash-turbo
32258 MB/s
dash
2711 MB/s
bash
2649 MB/s
lash
2583 MB/s
sh
2531 MB/s
zsh
2164 MB/s
fish
1328 MB/s
16MB direct write to file
dash
7468 MB/s
zsh
6261 MB/s
sh
5991 MB/s
lash-turbo
5619 MB/s
bash
5560 MB/s
lash
5435 MB/s
fish
1774 MB/s
16MB through pipe then to file
lash-turbo
4648 MB/s
dash
4599 MB/s
zsh
4005 MB/s
bash
3939 MB/s
lash
3787 MB/s
sh
3784 MB/s
fish
1733 MB/s
16MB read from file through pipe
lash-turbo
4513 MB/s
lash
2808 MB/s
dash
2748 MB/s
bash
2654 MB/s
sh
2568 MB/s
zsh
2479 MB/s
fish
1141 MB/s
16MB to /dev/null (overhead baseline)
dash
28319 MB/s
bash
19196 MB/s
lash
19025 MB/s
sh
18486 MB/s
zsh
18401 MB/s
lash-turbo
18223 MB/s
fish
2356 MB/s

Common data-processing patterns across shells.

Sort 1000 lines (reverse numeric) — turbo 2.0x vs forked
lash-turbo
0.44ms
dash
0.45ms
lash
0.27ms
bash
0.50ms
sh
0.52ms
zsh
0.56ms
fish
0.67ms
Sort 10000 lines (reverse numeric) — turbo 5.5x vs forked
lash-turbo
0.52ms
dash
2.5ms
lash
2.2ms
bash
2.5ms
sh
2.5ms
zsh
2.5ms
fish
4.4ms
Filter odd-ending numbers from 1K via grep — turbo 1.0x vs forked
lash-turbo
0.20ms
lash
0.21ms
bash
0.41ms
sh
0.44ms
dash
0.55ms
zsh
0.52ms
fish
0.54ms
Filter even numbers from 10K via awk — turbo 1.0x vs forked
dash
1.6ms
lash
1.4ms
sh
1.5ms
bash
1.6ms
lash-turbo
1.4ms
zsh
1.7ms
fish
2.0ms
Transform (x2) 1K lines via awk — turbo 1.0x vs forked
dash
0.70ms
lash
0.54ms
lash-turbo
0.53ms
sh
0.74ms
bash
0.74ms
zsh
0.75ms
fish
1.7ms
Transform (x2) 10K lines via awk — turbo 1.0x vs forked
dash
1.8ms
lash
1.6ms
lash-turbo
1.6ms
sh
1.9ms
bash
2.0ms
zsh
2.0ms
fish
2.6ms
Sort 1K then take first 10 (sort | head) — turbo 1.1x vs forked
dash
0.47ms
lash-turbo
0.03ms
lash
0.27ms
bash
0.50ms
sh
0.53ms
zsh
0.53ms
fish
0.67ms
Filter even then double from 1K (awk combo) — turbo 1.1x vs forked
dash
0.74ms
lash-turbo
0.54ms
lash
0.56ms
bash
0.79ms
sh
0.87ms
zsh
0.85ms
fish
0.84ms
Filter+sort+take pipeline from 1K — turbo 1.1x vs forked
dash
0.80ms
lash-turbo
0.48ms
lash
0.59ms
bash
0.82ms
sh
0.84ms
zsh
0.91ms
fish
1.3ms
Substring grep '42' in 10K lines — turbo 1.0x vs forked
dash
0.41ms
lash
0.20ms
lash-turbo
0.26ms
bash
0.44ms
sh
0.45ms
zsh
0.47ms
fish
1.6ms
Sort 1M lines (reverse numeric) — turbo 14.9x vs forked
lash-turbo
19ms
bash
299ms
lash
299ms
dash
299ms
sh
300ms
zsh
304ms
fish
311ms
Filter lines starting with even digit from 1M — turbo 20.0x vs forked
lash-turbo
0.58ms
dash
11ms
bash
11ms
lash
11ms
sh
11ms
zsh
12ms
fish
12ms
Prepend prefix to 1M lines via sed — turbo 75.6x vs forked
lash-turbo
0.47ms
dash
35ms
bash
34ms
zsh
34ms
lash
35ms
sh
38ms
fish
35ms
Sort+head+sort+tail pipeline from 100K — turbo 5.8x vs forked
lash-turbo
6.3ms
dash
38ms
sh
39ms
zsh
39ms
lash
39ms
fish
38ms
bash
44ms
5-stage pipeline: grep+sort+head+sort+wc on 500K — turbo 4.3x vs forked
lash-turbo
23ms
lash
101ms
bash
102ms
sh
102ms
zsh
102ms
dash
103ms
fish
103ms
Generate+sort+uniq+sort pipeline from 100K — turbo 8.8x vs forked
lash-turbo
22ms
lash
197ms
bash
200ms
sh
201ms
dash
200ms
zsh
204ms
fish
203ms
100k small lines through pipe — turbo 2.6x vs forked
lash-turbo
0.42ms
dash
0.66ms
lash
0.50ms
bash
0.68ms
sh
0.69ms
zsh
0.73ms
fish
1.1ms
1M small lines through pipe — turbo 10.0x vs forked
lash-turbo
0.43ms
dash
3.9ms
lash
3.7ms
sh
3.9ms
bash
4.0ms
zsh
4.0ms
fish
4.4ms
100k lines through grep filter — turbo 4.1x vs forked
lash-turbo
0.48ms
dash
1.6ms
lash
1.3ms
sh
1.6ms
bash
1.6ms
zsh
1.7ms
fish
2.3ms
single fork+exec (no pipe) — turbo 1.0x vs forked
dash
0.30ms
bash
0.44ms
sh
0.44ms
zsh
0.59ms
lash-turbo
0.61ms
lash
0.01ms
fish
0.01ms
2-stage no-op pipe setup — turbo 1.1x vs forked
dash
0.09ms
lash-turbo
0.04ms
bash
0.13ms
sh
0.13ms
lash
0.04ms
zsh
0.09ms
fish
0.05ms
5-stage no-op pipe setup — turbo 0.9x vs forked
dash
0.16ms
bash
0.28ms
sh
0.27ms
lash
0.17ms
zsh
0.26ms
lash-turbo
0.19ms
fish
0.01ms
10-stage no-op pipe setup — turbo 1.0x vs forked
dash
0.27ms
sh
0.37ms
bash
0.39ms
lash
0.40ms
lash-turbo
0.36ms
zsh
0.50ms
fish
0.05ms

Turbo mode rewrites common pipelines into native array operations — no fork/exec overhead.
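To make the idea of rewriting a pipeline into in-process array operations concrete, here is a minimal sketch. It is not lash's implementation: the function names are hypothetical, only a few stages are recognized, splitting on `|` ignores quoting, numeric sort assumes every line parses as a number, and plain `sort` compares by code point rather than by locale.

```python
import shlex


def _compile_stage(argv):
    """Map one pipeline stage to an in-process list operation, or None."""
    cmd, args = argv[0], argv[1:]
    if cmd == "sort" and all(a in ("-n", "-r", "-rn", "-nr") for a in args):
        flags = "".join(args)
        key = float if "n" in flags else None
        return lambda lines: sorted(lines, key=key, reverse="r" in flags)
    if (cmd in ("head", "tail") and len(args) == 1
            and args[0].startswith("-") and args[0][1:].isdigit()):
        n = int(args[0][1:])
        if cmd == "head":
            return lambda lines: lines[:n]
        return lambda lines: lines[-n:] if n else []
    if cmd == "tac" and not args:
        return lambda lines: lines[::-1]
    return None  # anything else (for example awk) is not rewritten


def compile_pipeline(text):
    """Return a function over a list of lines, or None if any stage is unsupported."""
    stages = []
    for part in text.split("|"):
        argv = shlex.split(part)
        op = _compile_stage(argv) if argv else None
        if op is None:
            return None  # caller would fall back to ordinary fork/exec
        stages.append(op)

    def run(lines):
        for op in stages:
            lines = op(lines)
        return lines

    return run
```

A caller would try `compile_pipeline("sort -rn | head -10")` first and spawn real processes only when it returns `None`, which mirrors the "not optimizable" rows in the charts below.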

sort 1K lines — turbo 2.0x vs forked
lash-turbo
0.47ms
lash
0.31ms
sort -n 1K lines — turbo 2.3x vs forked
lash-turbo
0.40ms
lash
0.31ms
sort -rn 1K lines — turbo 1.9x vs forked
lash-turbo
0.49ms
lash
0.32ms
grep pattern from 1K lines — turbo 1.1x vs forked
lash-turbo
0.21ms
lash
0.24ms
grep -v pattern from 1K lines — turbo 1.0x vs forked
lash-turbo
0.26ms
lash
0.29ms
head -10 from 1K lines — turbo 2.1x vs forked
lash-turbo
0.39ms
lash
0.16ms
tail -10 from 1K lines — turbo 1.7x vs forked
lash-turbo
0.47ms
lash
0.16ms
uniq 1K sorted lines — turbo 1.1x vs forked
lash-turbo
0.35ms
lash
0.33ms
tac (reverse) 1K lines — turbo 1.8x vs forked
lash-turbo
0.47ms
lash
0.19ms
wc -l count 1K lines — turbo 1.8x vs forked
lash-turbo
0.46ms
lash
0.22ms
sort | head -10 from 1K — turbo 1.6x vs forked
lash-turbo
0.07ms
lash
0.37ms
grep | sort | tail from 1K — turbo 1.0x vs forked
lash-turbo
0.23ms
lash
0.32ms
sort | uniq from 1K — turbo 1.4x vs forked
lash-turbo
0.08ms
lash
0.27ms
sort | head -10 from 10K — turbo 2.6x vs forked
lash-turbo
0.30ms
lash
1.6ms
sort | tail -10 from 10K — turbo 2.8x vs forked
lash-turbo
0.30ms
lash
1.8ms
sort -n | head -10 from 10K — turbo 3.4x vs forked
lash-turbo
0.19ms
lash
1.9ms
sort -n | tail -10 from 10K — turbo 3.0x vs forked
lash-turbo
0.36ms
lash
2.1ms
sort -r | head -10 from 10K — turbo 2.8x vs forked
lash-turbo
0.31ms
lash
1.8ms
sort -r | tail -10 from 10K — turbo 3.1x vs forked
lash-turbo
0.29ms
lash
2.0ms
sort -rn | head -10 from 10K — turbo 3.1x vs forked
lash-turbo
0.36ms
lash
2.1ms
sort -rn | tail -10 from 10K — turbo 3.2x vs forked
lash-turbo
0.27ms
lash
2.4ms
sort | head -10 from 100K — turbo 5.7x vs forked
lash-turbo
2.9ms
lash
19ms
sort -n | head -10 from 100K — turbo 9.7x vs forked
lash-turbo
1.8ms
lash
23ms
sort -rn | head -10 from 100K — turbo 6.0x vs forked
lash-turbo
3.5ms
lash
24ms
sort | tail -10 from 100K — turbo 6.6x vs forked
lash-turbo
2.8ms
lash
22ms
grep | head -5 from 10K (early term) — turbo 1.9x vs forked
lash-turbo
0.50ms
lash
0.33ms
grep -v | head -5 from 10K (early term) — turbo 2.3x vs forked
lash-turbo
0.40ms
lash
0.25ms
grep | tail -5 from 10K (ring buffer) — turbo 1.9x vs forked
lash-turbo
0.41ms
lash
0.22ms
grep | wc -l from 10K (count) — turbo 1.7x vs forked
lash-turbo
0.50ms
lash
0.23ms
grep -v | wc -l from 10K (count) — turbo 1.7x vs forked
lash-turbo
0.52ms
lash
0.33ms
grep | head -5 from 100K (early term) — turbo 2.7x vs forked
lash-turbo
0.40ms
lash
0.48ms
grep | tail -5 from 100K (ring buffer) — turbo 2.8x vs forked
lash-turbo
0.48ms
lash
0.65ms
grep | wc -l from 100K (count) — turbo 2.9x vs forked
lash-turbo
0.44ms
lash
0.66ms
tac | head -10 from 10K (rewrite) — turbo 1.3x vs forked
lash-turbo
0.10ms
lash
0.23ms
tac | tail -10 from 10K (rewrite) — turbo 2.2x vs forked
lash-turbo
0.41ms
lash
0.28ms
sort 10K lines — turbo 5.9x vs forked
lash-turbo
0.41ms
lash
1.8ms
grep pattern from 10K lines — turbo 1.0x vs forked
lash
0.28ms
lash-turbo
0.28ms
sort | grep | head from 10K — turbo 2.3x vs forked
lash-turbo
0.70ms
lash
2.3ms
single true (no-op baseline) — turbo 1.0x vs forked
lash
0.62ms
lash-turbo
0.62ms
2-stage true pipe (not optimizable) — turbo 1.0x vs forked
lash-turbo
0.08ms
lash
0.10ms
10-stage true pipe (not optimizable) — turbo 1.0x vs forked
lash
0.41ms
lash-turbo
0.42ms
awk pipe (not optimizable) — turbo 1.0x vs forked
lash-turbo
0.55ms
lash
0.53ms
100k lines through grep filter — turbo 1.0x vs forked
lash
0.68ms
lash-turbo
0.55ms
sort+head | awk | sort+head from 100K — turbo 4.6x vs forked
lash-turbo
4.9ms
lash
24ms

Prompt rendering latency with starship.

starship prompt render
bash
5.3ms
lash
5.2ms
dash
5.9ms
zsh
6.0ms
sh
6.4ms
fish
6.4ms
starship prompt in git repo
dash
3.1ms
bash
3.2ms
sh
3.4ms
zsh
3.3ms
lash
3.8ms
fish
4.0ms

Internal protocol performance (lash-direct only).

10k small lines (protocol batching stress)
lash-direct
0.06ms
16 x 1MB writes (large chunk throughput)
lash-direct
5541 MB/s
small lines then bulk data burst
lash-direct
1.8ms
16MB bulk data delivered to client
lash-direct
7628 MB/s
1GB bulk data delivered to client
lash-direct
9792 MB/s
~1MB as individual lines to client
lash-direct
0.46ms

Turbo mode applies these optimizations automatically:

  • Passthrough stripping — removes identity operations so they never execute
  • Numeric sort key pre-computation — pre-computes keys in O(N) instead of parsing inside the comparator at O(N log N)
  • Streaming wc -l — counts newlines in the byte stream without collecting lines
  • C strtod for numeric conversion — calls C’s strtod directly, avoiding D’s to!double exception overhead
  • Fused operations — grep | head, grep | tail, and grep | wc run in a single pass over the data
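Several of the optimizations listed above can be sketched in a few lines each. The code below only illustrates the techniques and is not taken from lash. All function names are hypothetical, treating a bare `cat` as the identity operation is an assumption, and matching is by plain substring rather than by regular expression. The strtod point is specific to D and C and is not shown.

```python
from collections import deque
from itertools import islice


def strip_passthrough(stages):
    """Drop stages assumed to be identity operations (here: a bare `cat`)."""
    return [stage for stage in stages if stage != ["cat"]]


def sort_numeric_precomputed(lines):
    """Parse each line once, then sort on the stored keys."""
    keyed = [(float(line), line) for line in lines]  # N conversions in total
    keyed.sort(key=lambda pair: pair[0])             # comparator sees floats only
    return [line for _, line in keyed]


def count_lines_streaming(chunks):
    """wc -l equivalent: count newline bytes per chunk without building lines."""
    return sum(chunk.count(b"\n") for chunk in chunks)


def grep_head(lines, needle, n):
    """Fused grep | head -n: stop consuming input after n matches."""
    matches = (line for line in lines if needle in line)
    return list(islice(matches, n))


def grep_tail(lines, needle, n):
    """Fused grep | tail -n: keep only the last n matches in a ring buffer."""
    return list(deque((line for line in lines if needle in line), maxlen=n))


def grep_count(lines, needle):
    """Fused grep | wc -l: count matches without storing them."""
    return sum(1 for line in lines if needle in line)
```

The early exit in `grep_head` and the bounded buffer in `grep_tail` correspond to the "early term" and "ring buffer" labels in the charts above. Each fused function reads its input once and never materializes the intermediate grep output.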

To run the benchmark suite:

dub run :benchmarks

Flag            Description
--runs N        Number of iterations per scenario
--warmup N      Warmup iterations before measurement
--scenario S    Run only the named scenario
--json          Output results in JSON format
--verbose       Print per-iteration timings

To reproduce these numbers:

dub run :benchmarks -- --runs 100 --warmup 10 --verbose