Script: Analyze Server Logs

A web server writes one line per request to its access log. After a few days that’s tens of thousands of lines, and the questions you actually want answered (“which endpoints are hot? which IPs hit us hardest? how many errors today?”) are buried in there.

A log analyzer is just a script that reads the file once, groups lines by some key, counts the groups, and prints the top few. Two ingredients carry most of the weight:

  • awk to pull a field out of each line.
  • .wordcount() to count occurrences of each unique value and sort the result by frequency.

We’ll wrap both behind small named functions so the analyzer’s main reads as English.

Parse a Common Log Format access log and produce four summaries: the most-requested paths, the distribution of status codes, the top client IPs, and the count of 4xx/5xx error responses.
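
The awk field numbers used below assume whitespace-split Common Log Format. A fabricated log line, run through awk, shows where each value lands: the client IP in field 1, the request path in field 7, the status code in field 9.

```shell
# A made-up Common Log Format line; awk's default whitespace splitting
# puts the IP in $1, the request path in $7, and the status in $9.
line='10.0.0.1 - - [12/Mar/2024:10:15:32 +0000] "GET /api/v1/users HTTP/1.1" 200 1234'
echo "$line" | awk '{print $1, $7, $9}'
# → 10.0.0.1 /api/v1/users 200
```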

Terminal window
lash --test analyze-log.lash # the unittests only
lash analyze-log.lash --help # auto-generated help

The --test flag runs only the unittest { } blocks; the fn main body is skipped, so the tests don’t need a real log file.


Step 1: is_error

A lot of the script’s work comes back to one question: is this status code an error? That’s a small pure function worth pinning down before the rest of the wiring.

/// is_error: true for 4xx and 5xx codes, false otherwise
unittest {
    is_error("404").must.equal(true)
    is_error("500").must.equal(true)
    is_error("200").must.equal(false)
    is_error("301").must.equal(false)
}
fn is_error(status) {
    return status.startsWith("4") || status.startsWith("5")
}

Run lash --test analyze-log.lash; all four assertions pass.


Step 2: format_entry

Every top-N table prints rows of the same shape, <count> <value>, so factor the formatting into a function and test it:

/// format_entry renders a {word, count} object as "<count> <word>"
unittest {
    let row = { word: "/api/users", count: 12045 }
    format_entry(row).must.equal("12045 /api/users")
}
fn format_entry(entry) {
    return "${entry["count"]} ${entry["word"]}"
}

Tiny function, but pinning it down means the script’s three “top N” sections all produce identical-looking rows. If a future contributor swaps the order of count and word, the test catches it.


Step 3: main

The wiring is mostly chains over awk output and for loops over the top-N results. We don’t unittest main, since it touches the filesystem and shells out to awk and wc; the pure helpers above already have tests.

fn main(logfile: string, top: int = 10) {
    let check = `test -f $logfile`.capture
    if check.isFailure {
        exit "file not found: $logfile"
    }
    let total = `wc -l < $logfile`.first.trim()
    echo "Log Analysis: $logfile"
    echo "================================="
    echo "Total requests: $total"
    echo ""
    echo "Top $top requested paths:"
    let paths = `awk '{print $7}' $logfile`.wordcount().take(top)
    for entry in paths {
        echo " ${format_entry(entry)}"
    }
    echo ""
    echo "Status codes:"
    let statuses = `awk '{print $9}' $logfile`.wordcount()
    for entry in statuses {
        echo " ${entry["word"]}: ${entry["count"]}"
    }
    echo ""
    echo "Top $top IP addresses:"
    let ips = `awk '{print $1}' $logfile`.wordcount().take(top)
    for entry in ips {
        echo " ${entry["count"]} requests from ${entry["word"]}"
    }
    echo ""
    let errors = `awk '{print $9}' $logfile`.filter(x => is_error(x)).length
    echo "Error responses (4xx/5xx): $errors"
}

.wordcount() counts occurrences of each unique line and returns a list of { word, count } objects sorted by frequency, most frequent first. .take(top) keeps only the first top entries. The chain reads left to right: take awk’s output, group and count it, keep the most frequent.
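
For intuition, the shell analogue of that chain is the classic sort | uniq -c | sort -rn idiom, here run over three fabricated status codes:

```shell
# Group identical lines, count them, order by count descending:
# the same shape wordcount() returns, most frequent first.
printf '200\n404\n200\n' | sort | uniq -c | sort -rn
```

uniq -c prefixes each line with its (padded) count, so the top row here shows 200 with a count of 2.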

The error-count line uses the function we tested in Step 1:

let errors = `awk '{print $9}' $logfile`.filter(x => is_error(x)).length

.length on a list returns the element count. It’s a property, not a method, so no parentheses.
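
As a plain-shell cross-check, the same count can be produced with grep -c over the extracted status column (fabricated input here), since 4xx/5xx codes are exactly the lines starting with 4 or 5:

```shell
# grep -c '^[45]' counts lines beginning with 4 or 5,
# mirroring .filter(x => is_error(x)).length.
printf '200\n404\n500\n301\n' | grep -c '^[45]'
# → 2
```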


The full script:

#!/usr/bin/env lash
/// Analyze a web server access log and print a summary.
fn main(logfile: string, top: int = 10) {
    let check = `test -f $logfile`.capture
    if check.isFailure {
        exit "file not found: $logfile"
    }
    let total = `wc -l < $logfile`.first.trim()
    echo "Log Analysis: $logfile"
    echo "================================="
    echo "Total requests: $total"
    echo ""
    echo "Top $top requested paths:"
    let paths = `awk '{print $7}' $logfile`.wordcount().take(top)
    for entry in paths {
        echo " ${format_entry(entry)}"
    }
    echo ""
    echo "Status codes:"
    let statuses = `awk '{print $9}' $logfile`.wordcount()
    for entry in statuses {
        echo " ${entry["word"]}: ${entry["count"]}"
    }
    echo ""
    echo "Top $top IP addresses:"
    let ips = `awk '{print $1}' $logfile`.wordcount().take(top)
    for entry in ips {
        echo " ${entry["count"]} requests from ${entry["word"]}"
    }
    echo ""
    let errors = `awk '{print $9}' $logfile`.filter(x => is_error(x)).length
    echo "Error responses (4xx/5xx): $errors"
}
fn is_error(status) {
    return status.startsWith("4") || status.startsWith("5")
}
fn format_entry(entry) {
    return "${entry["count"]} ${entry["word"]}"
}
/// is_error: true for 4xx and 5xx codes, false otherwise
unittest {
    is_error("404").must.equal(true)
    is_error("500").must.equal(true)
    is_error("200").must.equal(false)
    is_error("301").must.equal(false)
}
/// format_entry renders a {word, count} object as "<count> <word>"
unittest {
    format_entry({ word: "/api/users", count: 12045 }).must.equal("12045 /api/users")
}

Run it:

Terminal window
lash --test analyze-log.lash # unittests
lash analyze-log.lash --help # help
lash analyze-log.lash /var/log/nginx/access.log # default top 10
lash analyze-log.lash /var/log/nginx/access.log 20 # top 20

Log Analysis: /var/log/nginx/access.log
=================================
Total requests: 48231

Top 10 requested paths:
 12045 /api/v1/users
 8923 /
 4521 /static/app.js
 3102 /api/v1/health
 ...

Status codes:
 200: 41023
 304: 3891
 404: 2105
 500: 212

Top 10 IP addresses:
 3421 requests from 10.0.0.1
 2918 requests from 10.0.0.5
 ...

Error responses (4xx/5xx): 2317

The doc-comment plus typed parameters become the --help:

analyze-log.lash — Analyze a web server access log and print a summary.
Arguments:
  logfile (string)
  top (int) default: 10

This script replaces what would otherwise be a stack of shell pipelines:

Terminal window
awk '{print $7}' access.log | sort | uniq -c | sort -rn | head -10

For each summary, lash’s chain:

`awk '{print $7}' $logfile`.wordcount().take(top)

expresses the same idea, but every step is a method, every method has a name, and the pure transforms (is_error, format_entry) are tested independently. Long chains can be split across lines for readability; a line starting with . continues the previous statement:

let paths = `awk '{print $7}' $logfile`
    .wordcount()
    .take(top)