2026.06 / ANALYTICS GUIDE

Privacy-Friendly Nginx Log Analytics for a Static Blog

A static blog can be measured without turning it into an ad-tech project. If the site is already served by nginx, the web server sees enough information to answer the questions that matter: which pages are being read, where readers arrive from, and whether the audience is growing.

The pattern I like is deliberately boring: keep nginx access logs, parse them into SQLite, count daily salted visitor hashes, ignore obvious bots, and avoid storing raw IP addresses after ingestion. It is not as feature-rich as a full analytics suite, but it is enough for a small independent site that cares about real readers more than dashboards.

Goal: measure useful audience growth without cookies, fingerprinting scripts, third-party pixels, or a database of visitor IPs.

What this is good for

This setup answers practical questions for a static site:

How many likely human visitors read the site today?
Which posts attract real attention instead of crawler noise?
Are Google, RSS readers, direct visits, or internal links sending traffic?
Is a new post finding an audience after it is published?

It does not try to identify people across weeks, build advertising profiles, replay sessions, track scroll depth, or collect personal data for fun. That constraint is the point.

The data boundary

nginx can log values such as the request path, status code, referrer, user agent, timestamp, and client address. The official nginx log module documents the configurable log_format and access_log directives, which means you can choose exactly what lands in the raw log.

The privacy boundary should be simple:

Read the raw server log locally.
Normalize the request into the small fields needed for counting.
Hash the client address with a private salt and the visit date.
Store the hash, page path, status code, referrer source, bot classification, and timestamp.
Do not persist the raw IP address in SQLite.

raw nginx log
  ↓
local parser
  ↓
SQLite row:
  seen_at, day, path, status, source, visitor_hash, is_bot

The date in the hash matters. A daily visitor hash lets the site count unique readers per day without creating a stable identifier that follows someone forever.
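The local parser step can be sketched in Python. This is a minimal sketch that assumes nginx's predefined "combined" log format; a custom log_format would need a different pattern. The function name parse_line and the sample log line are illustrative, not taken from this site's setup.

```python
import re
from datetime import datetime

# Assumption: nginx's predefined "combined" format:
# $remote_addr - $remote_user [$time_local] "$request" $status
# $body_bytes_sent "$http_referer" "$http_user_agent"
COMBINED = re.compile(
    r'(?P<addr>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return the fields needed for counting, or None if the line is unusable."""
    m = COMBINED.match(line)
    if m is None:
        return None
    parts = m.group("request").split()
    if len(parts) < 2:
        return None  # malformed request line, often a scanner probe
    seen = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return {
        "addr": m.group("addr"),            # used only for hashing, never stored
        "seen_at": seen.isoformat(),
        "day": seen.date().isoformat(),     # day in the server's logged timezone
        "path": parts[1].split("?", 1)[0],  # drop the query string
        "status": int(m.group("status")),
        "referer": m.group("referer"),
        "agent": m.group("agent"),
    }

# Made-up sample line using a documentation-range IP address.
sample = ('203.0.113.7 - - [12/Jun/2026:08:15:02 +0000] '
          '"GET /posts/hello/?utm=x HTTP/1.1" 200 5120 '
          '"https://www.google.com/" "Mozilla/5.0"')
row = parse_line(sample)
```

The addr field exists only so the next step can hash it; it should be dropped before anything is written to SQLite.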

A minimal SQLite shape

SQLite is a good fit because it is one file, easy to back up, easy to query, and already comfortable on small servers. A minimal table can look like this:

CREATE TABLE visits (
  id INTEGER PRIMARY KEY,
  seen_at TEXT NOT NULL,
  day TEXT NOT NULL,
  path TEXT NOT NULL,
  status INTEGER NOT NULL,
  source TEXT NOT NULL,
  visitor_hash TEXT NOT NULL,
  is_bot INTEGER NOT NULL DEFAULT 0
);

CREATE INDEX visits_day_idx ON visits(day);
CREATE INDEX visits_path_idx ON visits(path);
CREATE INDEX visits_visitor_idx ON visits(day, visitor_hash);

SQLite date functions such as strftime make simple daily and monthly rollups possible without a separate analytics service.
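As a rough illustration, a monthly rollup with strftime can be run from Python's built-in sqlite3 module. The rows below are made-up sample data and the in-memory database stands in for a real file. Because visitor hashes change daily, the distinct count per month is really "visitor-days", not unique monthly people.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a file path in practice
conn.executescript("""
CREATE TABLE visits (
  id INTEGER PRIMARY KEY,
  seen_at TEXT NOT NULL,
  day TEXT NOT NULL,
  path TEXT NOT NULL,
  status INTEGER NOT NULL,
  source TEXT NOT NULL,
  visitor_hash TEXT NOT NULL,
  is_bot INTEGER NOT NULL DEFAULT 0
);
""")

# Made-up sample rows; real hashes would be 64-character SHA-256 digests.
rows = [
    ("2026-06-01T09:00:00", "2026-06-01", "/posts/a/", 200, "direct", "h1", 0),
    ("2026-06-01T09:05:00", "2026-06-01", "/posts/b/", 200, "direct", "h1", 0),
    ("2026-06-02T10:00:00", "2026-06-02", "/posts/a/", 200, "search", "h2", 0),
    ("2026-07-03T11:00:00", "2026-07-03", "/posts/a/", 200, "rss", "h3", 0),
]
conn.executemany(
    "INSERT INTO visits (seen_at, day, path, status, source, visitor_hash, is_bot) "
    "VALUES (?, ?, ?, ?, ?, ?, ?)",
    rows,
)

# Monthly rollup: pageviews plus distinct daily hashes (visitor-days).
monthly = conn.execute("""
    SELECT strftime('%Y-%m', day) AS month,
           COUNT(*) AS pageviews,
           COUNT(DISTINCT visitor_hash) AS visitor_days
    FROM visits
    WHERE is_bot = 0 AND status = 200
    GROUP BY month
    ORDER BY month
""").fetchall()
```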

Hashing visitors without keeping IPs

The hash should be one-way and salted. In pseudocode:

visitor_hash = sha256(
  private_salt + "|" + day + "|" + client_address + "|" + coarse_user_agent
)

The salt belongs outside the website repository. Keep it in a local environment variable or a private server secret store. The IPv4 address space is small enough to enumerate, so an unsalted hash of an address could be reversed by brute force. With a secret salt, a copied database should not be useful for reconstructing addresses.
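The pseudocode above translates fairly directly to Python's hashlib. This is a sketch under assumptions: the environment variable name ANALYTICS_SALT, the helper names, and the list of browser families are all hypothetical choices for illustration.

```python
import hashlib
import os

def load_salt(var_name="ANALYTICS_SALT"):
    """Read the salt from the environment (variable name is an assumption)."""
    salt = os.environ.get(var_name)
    if not salt:
        raise RuntimeError(f"{var_name} is not set; refusing to hash without a salt")
    return salt

def coarse_user_agent(agent):
    """Reduce a user agent to a broad family so the hash input stays low-detail."""
    lowered = agent.lower()
    # Order matters: Edge user agents also mention Chrome, and Chrome's mention Safari.
    for family in ("firefox", "edg", "chrome", "safari"):
        if family in lowered:
            return family
    return "other"

def visitor_hash(salt, day, client_address, agent):
    """SHA-256 over salt, day, address, and coarse user agent, joined with '|'."""
    material = "|".join([salt, day, client_address, coarse_user_agent(agent)])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()

# Demo with a throwaway salt and a documentation-range IP address.
demo_salt = "example-only-salt"
agent = "Mozilla/5.0 (X11; Linux x86_64; rv:126.0) Gecko/20100101 Firefox/126.0"
today_hash = visitor_hash(demo_salt, "2026-06-12", "203.0.113.7", agent)
tomorrow_hash = visitor_hash(demo_salt, "2026-06-13", "203.0.113.7", agent)
```

The same reader on the same day gets the same hash, while the next day's hash is unrelated, which is what keeps the identifier from following anyone across days.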

Do not publish the real salt, the real path to the salt file, or the raw production log location in public posts. Those are operational details, not content.

Filtering obvious non-readers

Static sites receive plenty of requests that are not readers: uptime checks, feed fetchers, link preview bots, search crawlers, vulnerability scanners, and random probes for files that do not exist. Some of those are useful; they just should not be counted as human visitors.

A small filter can mark requests as bot-like when:

The user agent contains obvious crawler terms such as bot, crawl, spider, or preview.
The path is for assets rather than pages: CSS, JavaScript, images, favicon, or verification files.
The status code is not a normal page response, for example a 404 or a 301 redirect.
The path is a scanner target such as /wp-admin on a site that is not WordPress.

The filter does not need to be perfect. It only needs to be conservative enough that the trend line mostly reflects people reading pages.
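The four rules above could be expressed as one small function. This is a sketch: the suffix and prefix lists are illustrative assumptions rather than a complete blocklist, and treating a missing user agent as bot-like is an extra assumption not stated in the rules.

```python
import re

# Crawler terms taken from the list above.
BOT_TERMS = re.compile(r"bot|crawl|spider|preview", re.IGNORECASE)

# Illustrative lists only; extend them to match the real site.
ASSET_SUFFIXES = (".css", ".js", ".png", ".jpg", ".jpeg", ".gif",
                  ".svg", ".webp", ".ico", ".woff2")
SCANNER_PREFIXES = ("/wp-admin", "/wp-login", "/.env", "/.git")

def is_bot_like(path, status, agent):
    """Return True when a request should not count as a human page read."""
    # Assumption: an empty or "-" user agent is treated as bot-like.
    if agent in ("", "-") or BOT_TERMS.search(agent):
        return True
    if status != 200:
        return True  # not a normal page response
    lowered = path.lower()
    if lowered.endswith(ASSET_SUFFIXES):
        return True  # asset request, not a page
    return lowered.startswith(SCANNER_PREFIXES)

firefox = "Mozilla/5.0 (X11; Linux x86_64; rv:126.0) Gecko/20100101 Firefox/126.0"
reader = is_bot_like("/posts/hello/", 200, firefox)
crawler = is_bot_like("/posts/hello/", 200, "Mozilla/5.0 (compatible; Googlebot/2.1)")
```

The result maps onto the is_bot column as 1 or 0 at ingestion time, so the queries never need to repeat the rules.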

Queries that are actually useful

Once the database has cleaned rows, the useful queries are small.

-- Daily unique likely-human visitors
SELECT day, COUNT(DISTINCT visitor_hash) AS unique_visitors
FROM visits
WHERE is_bot = 0 AND status = 200
GROUP BY day
ORDER BY day DESC;

-- Most-read posts
SELECT path,
       COUNT(*) AS pageviews,
       COUNT(DISTINCT visitor_hash) AS unique_visitors
FROM visits
WHERE is_bot = 0
  AND status = 200
  AND path LIKE '/posts/%'
GROUP BY path
ORDER BY unique_visitors DESC;

-- Referrer/source summary
SELECT source, COUNT(DISTINCT visitor_hash) AS unique_visitors
FROM visits
WHERE is_bot = 0 AND status = 200
GROUP BY source
ORDER BY unique_visitors DESC;

Those three views are enough to guide a legitimate growth sprint: write more of what people read, improve pages that earn impressions but not clicks, and keep publishing genuinely useful long-tail posts.
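The referrer summary assumes the source column was normalized at ingestion, so that thousands of distinct referrer URLs collapse into a few buckets. One possible sketch follows; the host name blog.example, the bucket names, and the list of search engines are all hypothetical.

```python
from urllib.parse import urlsplit

SITE_HOST = "blog.example"  # hypothetical: replace with the site's own host
SEARCH_PREFIXES = ("google.", "bing.", "duckduckgo.")  # illustrative list

def classify_source(referer):
    """Collapse a raw Referer header into a coarse source label."""
    if referer in ("", "-"):
        return "direct"  # nginx logs "-" when no referrer was sent
    host = (urlsplit(referer).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if host == SITE_HOST:
        return "internal"
    if host.startswith(SEARCH_PREFIXES):
        return "search"
    return host or "other"  # keep the bare host, never the full URL

labels = [
    classify_source("-"),
    classify_source("https://www.google.com/"),
    classify_source("https://blog.example/posts/a/"),
    classify_source("https://news.ycombinator.com/item?id=1"),
]
```

Storing only the host or bucket also keeps query strings and full referring URLs out of the database.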

Where consent guidance fits

Privacy rules vary by jurisdiction, so this is not legal advice. The useful design principle is to minimize the data before arguing about compliance. CNIL’s public guidance on audience measurement describes consent-exemption conditions for analytics whose purpose is limited to measuring an audience. Whether a specific site qualifies depends on implementation and context, but the direction is clear: limit the purpose, limit retention, and avoid unnecessary tracking.

For a personal static blog, that means the analytics system should be boring by design: no cross-site tracking, no ad targeting, no ad network pixel, and no raw IP archive. If a client-side analytics script is used, it should be self-hosted, lightweight, and limited to first-party audience measurement.

Retention rules

The system gets safer, not weaker, when old data ages out. A simple retention policy might be:

Keep raw nginx logs only as long as operationally necessary.
Keep hashed per-request rows for a short rolling window.
Keep aggregate daily counts longer because they no longer need request-level detail.
Rotate the salt if the threat model or hosting situation changes.

The exact number of days is less important than making the rule explicit and automating it.
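One way to automate the rule is a small job that rolls old request rows into daily aggregates and then deletes them. This is a sketch: the daily_counts table, the apply_retention function, and the 30-day default are assumptions for illustration, not the site's actual policy. The demo uses a cut-down visits table with made-up rows.

```python
import sqlite3
from datetime import date, timedelta

def apply_retention(conn, today, keep_days=30):
    """Aggregate rows older than keep_days into daily_counts, then delete them."""
    cutoff = (today - timedelta(days=keep_days)).isoformat()
    with conn:  # commit on success, roll back on error
        conn.execute("""
            CREATE TABLE IF NOT EXISTS daily_counts (
              day TEXT PRIMARY KEY,
              unique_visitors INTEGER NOT NULL,
              pageviews INTEGER NOT NULL
            )""")
        conn.execute("""
            INSERT OR REPLACE INTO daily_counts (day, unique_visitors, pageviews)
            SELECT day, COUNT(DISTINCT visitor_hash), COUNT(*)
            FROM visits
            WHERE is_bot = 0 AND status = 200 AND day < ?
            GROUP BY day""", (cutoff,))
        conn.execute("DELETE FROM visits WHERE day < ?", (cutoff,))

# Demo with a reduced table and made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE visits (day TEXT, visitor_hash TEXT, status INTEGER, is_bot INTEGER)"
)
conn.executemany("INSERT INTO visits VALUES (?, ?, ?, ?)", [
    ("2026-04-01", "h1", 200, 0),
    ("2026-04-01", "h1", 200, 0),
    ("2026-04-01", "h2", 200, 0),
    ("2026-06-10", "h3", 200, 0),
])
apply_retention(conn, today=date(2026, 6, 12), keep_days=30)

aggregates = conn.execute("SELECT * FROM daily_counts").fetchall()
remaining = conn.execute("SELECT COUNT(*) FROM visits").fetchone()[0]
```

Running the job again is harmless: once the old rows are gone, the aggregate query selects nothing and the existing daily counts are left as they are.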

How it fits this blog

This site is static HTML served by nginx and deployed through Coolify, so log-based measurement matches the architecture. There is a small self-hosted Umami tag for page-level trends plus nginx log checks for independent verification. The goal is still the same: useful measurement without ad pixels, cross-site profiling, or storing raw IP addresses in the local analytics database.

The workflow is:

publish a practical post
  ↓
verify the live URL
  ↓
collect cleaned nginx visits
  ↓
review unique readers and sources
  ↓
choose the next useful topic

That loop is slower than buying traffic and less flashy than a real-time dashboard. It is also healthier. The metric is not “can I inflate a counter?” The metric is “are real people finding useful pages?”

Practical checklist

Define an nginx log format with only the fields you need.
Parse logs locally into SQLite.
Hash visitor identifiers with a private salt and the date.
Never store raw IP addresses in the analytics database.
Filter obvious bots, assets, scanners, and non-page responses.
Report unique readers, pageviews, top posts, and sources.
Use the results to write better posts, not to track individuals.

If a static blog is meant to be fast, independent, and reader-friendly, its analytics should follow the same values. Start with the smallest measurement system that can inform better publishing decisions, then resist the urge to collect more just because you can.