Cloudflare Enterprise — Technical Reference

CF Logpush — Format Transform & Automated Push

Serverless log pipeline · Cloudflare Workers + R2 + Queues · CDN Partner Log Integration

1. Overview
🎯 Purpose

Automatically transform Cloudflare edge access logs from JSON (Logpush format) into a custom CDN partner log format, and deliver them to a remote log ingestion endpoint in near real-time.

🏗️ Stack

100% Cloudflare-native: Logpush + R2 + Queues + Workers. Zero servers, zero infrastructure maintenance, auto-scaling by design.

📋 Log Format

Target: 145-field \u0001-delimited plaintext format. Body compressed with Gzip. Authentication via MD5-signed URLs.

End-to-End Latency

Approximately 70 seconds from request to log delivery. Typically well within partner SLA requirements.

Cloudflare Services Used

Service | Role | Location in Dashboard
Logpush | Exports edge HTTP request logs to R2 (~1 min batches, gzip-compressed ndjson) | Domain → Analytics & Logs → Logpush
R2 Object Storage | Stores raw log files and processed batch files (temporary) | Dashboard → R2 Object Storage
R2 Event Notifications | Triggers a Queue message on every new object created in R2 | R2 Bucket → Settings → Event Notifications
Queues | Decouples parse and send stages; guarantees at-least-once delivery with automatic retry | Dashboard → Workers & Pages → Queues
Workers | Two logical workers in one script: Parser and Sender | Deployed via wrangler deploy or GitHub Actions CI/CD
2. Architecture
[Step 1] Log Generation
  End user opens a webpage / app → request reaches a Cloudflare edge node (300+ cities worldwide).
  CF records one log entry: client IP, timestamp, URL, status code, response size, cache status, latency...
  The edge processes the request; the log entry is buffered in CF's internal system.
      │  Every ~1 minute, Logpush automatically batches and pushes
      ▼
[Step 2] Logpush Writes to R2
  Logpush service (fully automated):
  1. Packages all logs from the past ~1 minute (ndjson, one JSON object per line)
  2. Gzip-compresses the file
  3. Writes to the R2 bucket

  R2 Bucket: cdn-logs-raw
    logs/
    └── 20260328/                           ← auto-partitioned by date
        ├── 20260328T090000Z_xxx.log.gz     ← each file a few KB to a few MB
        ├── 20260328T090100Z_xxx.log.gz     ← filename contains timestamp
        └── ... (new file every ~1 min)
      │  R2 detects new object → immediately fires Event Notification (seconds)
      ▼
[Step 3] R2 Event Notification → parse-queue
  R2 Event Notification sends a message to parse-queue:
    { "bucket": "cdn-logs-raw", "object": { "key": "logs/20260328/xxx.log.gz" } }
  (Message contains only the file path — a few dozen bytes, never exceeds 128 KB)

  Queue: parse-queue
    msg 1: { key: "logs/.../a.log.gz" }   ← file path only, no log content
    msg 2: { key: "logs/.../b.log.gz" }   ← auto-batched, auto-retry
      │  Queue wakes up Parser Worker when messages arrive (~3-5 seconds)
      ▼
[Step 4] Parser Worker — Read, Decompress, Parse, Transform
  ① Obtain streaming reference to the .gz file from R2 (no full download)
  ② Stream-decompress via DecompressionStream — only a few KB in memory at a time
  ③ Split by newline, parse each line as JSON (one CF log entry per line)
  ④ Format transform (transformEdge function): CF JSON → \u0001-delimited 145-field plaintext
       Before: { "RayID":"9e34...", "ClientIP":"221.229...", "EdgeColoCode":"HGH" }
       After:  cs_vod_v3.0\u0001[28/Mar/2026:16:42:25 +0800]\u00019e34...\u0001200\u00011774680145.000\u00010.149\u0001...(145 fields total)
  ⑤ Every 1000 lines (or end of file) → write R2 temp file + send to send-queue
      │  Write R2 temp file, enqueue message
      ▼
[Step 5] R2 Temp File + send-queue
  R2 temp file (processed/ directory, already in target format):
    processed/logs_20260328_xxx-0-1774683718956.txt   ← up to 1000 log lines

  Queue: send-queue
    msg: { "key": "processed/...txt" }   ← file path only, a few dozen bytes; auto-retry up to 5x → send-dlq
      │  Queue wakes up Sender Worker when messages arrive (~3-5 seconds)
      ▼
[Step 6] Sender Worker — Compress, Sign, Send
  ① Read temp file from R2 (up to 1000 lines in target format)
  ② Gzip-compress body (required by partner spec)
  ③ Compute MD5 auth signature:
       ts      = current Unix time + 300s
       rand    = random integer
       sigstr  = "{uri}-{ts}-{rand}-{privateKey}"
       md5hash = MD5(sigstr)
       URL     = {endpoint}{uri}?auth_key={ts}-{rand}-{md5hash}
  ④ HTTP POST to partner log ingestion server (Content-Encoding: gzip)
       ├── Success (HTTP 200) → ack message + delete R2 temp file ✅
       └── Failure → retry → up to 5x → send-dlq on exhaustion ⚠️
      ▼
[Step 7] Partner Ingestion
  Verify auth → Gzip-decompress → split on \u0001 → write to billing/analytics DB ✓
Queue messages carry only R2 object keys (a few dozen bytes), never log content. This sidesteps the 128 KB Queue message size limit entirely.
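The key-only message pattern keeps the consumer trivial. The sketch below shows a parse-queue consumer resolving each message to its R2 object; the binding name RAW_BUCKET matches the wrangler.toml in Configuration Reference, but the handler name and structure are illustrative, not the actual src/index.js code:

```javascript
// Sketch of a parse-queue consumer: each message carries only an R2 object
// key (the Event Notification payload), never log content.
async function handleParseBatch(batch, env) {
  for (const msg of batch.messages) {
    const key = msg.body.object.key;      // e.g. "logs/20260328/xxx.log.gz"
    const obj = await env.RAW_BUCKET.get(key);
    if (obj === null) {                   // object gone (e.g. duplicate delivery)
      msg.ack();
      continue;
    }
    // ...stream-decompress and transform obj.body here (see Step 4)...
    msg.ack();
  }
}
```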
3
Key Design Decisions

Streaming Gzip Decompression

Raw log files are processed via DecompressionStream + ReadableStream. The Worker never buffers the full file in memory — it decompresses and transforms line-by-line, keeping memory usage constant regardless of file size. This is essential given the 128 MB Worker memory limit.
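The streaming stage can be sketched as follows. DecompressionStream and TextDecoderStream exist both in Workers and in Node 18+, so this helper is runnable locally; the function name is illustrative:

```javascript
// Gunzip a ReadableStream and yield complete text lines without ever
// buffering the whole file; only the current chunk plus a partial-line
// tail are held in memory.
async function* gunzipLines(gzStream) {
  const reader = gzStream
    .pipeThrough(new DecompressionStream('gzip'))
    .pipeThrough(new TextDecoderStream())
    .getReader();
  let tail = '';
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    const parts = (tail + value).split('\n');
    tail = parts.pop();                  // keep the trailing partial line
    for (const line of parts) if (line) yield line;
  }
  if (tail) yield tail;                  // flush the last line (no trailing \n)
}
```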

Two-Stage Queue Architecture

parse-queue handles R2 → transform. send-queue handles R2 → HTTP POST. The separation allows independent retry policies: parse failures retry 2×, send failures retry 5×. Both queues have dedicated dead-letter queues (DLQs) for observability and manual recovery.

Idempotent Delivery

Queues use at-least-once semantics. The Sender handles duplicate delivery gracefully: if the R2 temporary file has already been deleted (indicating a prior successful send), the message is silently acknowledged without re-sending.
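The guard can be sketched like this; binding and helper names (TEMP_BUCKET, postToPartner) are illustrative stand-ins for the real Sender code:

```javascript
// Duplicate-delivery guard: a missing temp file means a prior invocation
// already sent and deleted it, so the message is acked without re-sending.
async function handleSendMessage(msg, env, postToPartner) {
  const obj = await env.TEMP_BUCKET.get(msg.body.key);
  if (obj === null) {                      // already delivered earlier
    msg.ack();
    return 'duplicate-ignored';
  }
  await postToPartner(await obj.text());   // throws on failure → Queue retries
  await env.TEMP_BUCKET.delete(msg.body.key);
  msg.ack();
  return 'sent';
}
```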

Timestamp Handling

The parser auto-detects EdgeStartTimestamp in three formats: Unix seconds integer, Unix milliseconds integer, and RFC 3339 string. Configuring Logpush with timestamp_format = unix is recommended for consistency.
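A minimal sketch of the detection, normalizing all three formats to Unix seconds; the magnitude threshold is an illustrative heuristic, not the exact production logic:

```javascript
// Auto-detect EdgeStartTimestamp: RFC 3339 string, Unix milliseconds,
// or Unix seconds. Always returns Unix seconds as a number.
function parseEdgeTimestamp(v) {
  if (typeof v === 'string' && v.includes('T')) {
    return Date.parse(v) / 1000;          // RFC 3339 string
  }
  const n = Number(v);
  return n > 1e12 ? n / 1000 : n;         // milliseconds vs seconds by magnitude
}
```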

CDN Node Country Mapping

The target format requires the CDN node country (not client country). EdgeColoCode (IATA airport code, e.g. NRT = Tokyo) is mapped to ISO 3166-1 alpha-2 country code via a built-in lookup table covering 200+ airports globally. If the IATA code is not in the table, defaults to CN.
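An illustrative fragment of the lookup (the real built-in table covers 200+ airports; these five entries are examples):

```javascript
// EdgeColoCode (IATA airport code) → ISO 3166-1 alpha-2 country code.
// Unknown codes default to "CN", per the behavior described above.
const COLO_COUNTRY = {
  NRT: 'JP',  // Tokyo Narita
  HGH: 'CN',  // Hangzhou
  LAX: 'US',  // Los Angeles
  FRA: 'DE',  // Frankfurt
  SIN: 'SG',  // Singapore
};
const coloCountry = (code) => COLO_COUNTRY[code] ?? 'CN';
```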

MD5 Authentication

Workers' SubtleCrypto API does not support MD5. A pure-JS RFC 1321 implementation is included in src/index.js. The auth URL signature format is: MD5({uri}-{ts}-{rand}-{privateKey}).

Response Content-Length (Field #21)

The http_requests dataset does not expose a dedicated Content-Length field. Field #21 is populated from ResponseHeaders['content-length'] when Logpush Custom Fields are configured via API (see Step 6b in Deployment). If not configured or the header is absent, the field returns -.

Custom Fields configuration for content-length is optional. Without it, field #21 will consistently output -, which is the standard placeholder for unavailable fields in this format.
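The resolution logic amounts to a small fallback; the record shape mirrors a parsed Logpush line, and the function name is illustrative:

```javascript
// Field #21: use the captured content-length response header when present,
// otherwise emit "-", the format's placeholder for unavailable fields.
function sentHttpContentLength(record) {
  const cl = (record.ResponseHeaders || {})['content-length'];
  return cl != null && cl !== '' ? String(cl) : '-';
}
```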

Reliability Model

Failure Scenario | Behavior | Recovery
Remote endpoint temporarily unavailable | send-queue retries up to 5× with exponential backoff (~15 min window) | Automatic; R2 file preserved until delivery confirmed
Malformed log record in source file | Per-line error caught; line skipped; batch continues | Error count logged; file-level processing succeeds
Parser crash on corrupt .gz file | parse-queue retries 2×; moves to parse-dlq | Manual inspection of DLQ message + raw R2 file
Duplicate Queue delivery | R2 file not found → silent ack, no re-send | Automatic; no data duplication
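The per-line tolerance in the table can be sketched as follows; the function name is illustrative:

```javascript
// Parse a batch of ndjson lines: a malformed record is counted and
// skipped, and the rest of the batch still succeeds.
function parseLines(lines) {
  const records = [];
  let errors = 0;
  for (const line of lines) {
    try {
      records.push(JSON.parse(line));
    } catch {
      errors++;                 // skip the corrupt line, keep going
    }
  }
  return { records, errors };
}
```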
4. Performance Characteristics

Measured: 2026-03-30, 6-hour window (11:00–17:00 GMT+8) · HTTP traffic: 13.31M requests · Data transfer: 70.39 GB

Worker

Metric | Observed | Assessment
Worker invocations (6h) | 2k | Matches Logpush batch count
Errors (internal) | 0 | Clean
Exceeded CPU Time Limits | 3 | Occurred at peak; monitor if increasing
Exceeded Memory | 0 | Streaming design effective
Median CPU Time | 3.52 ms | Far below 30 s limit
P90 CPU Time | 355 ms | Normal for large file batches
P99 CPU Time | 3.17 s | Within 30 s limit
Wall Time P50 / P99 | 1.13 s / 18.97 s | Within limits including network wait
5xx subrequests (to partner) | 36 / 15k | 0.24% — partner-side intermittent; auto-retried

parse-queue

Metric | Observed | Assessment
Messages ingested (6h) | 355 | Matches Logpush file count
Messages acknowledged | 347 (97.7%) | Healthy
Messages retried | 0 | No parse failures
Realtime backlog | 0 | Real-time consumption
Backlog (avg/peak) | 5.47 / 13 | Light
Consumer lag time (avg/peak) | 15.13 s / ~250 s | Peak lag at traffic burst; acceptable
Consumer concurrency (avg/peak) | 1.37 / 14 | Auto-scaled effectively
DLQ messages | 0 | No data loss

send-queue

Metric | Observed | Assessment
Messages ingested (6h) | 14.9k | ~14.9M log lines delivered
Messages acknowledged | 15.45k | Includes prior backlog clearance
Messages retried | 0 | No send failures escalated to queue
Realtime backlog | 0 | Real-time consumption
Backlog (avg/peak) | 32.6 / 106 | Moderate at peak; within capacity
Consumer lag time (avg/peak) | 5.62 s / ~55 s | Acceptable
Consumer concurrency (avg/peak) | 22.07 / ~220 | Near the 250-consumer limit at peak
DLQ messages | 0 | No data loss

Capacity Headroom

Dimension | Current Load | 5× Traffic | 10× Traffic
Worker CPU | ~5% of limit | Safe | P99 near limit on large files
Worker Memory | 0 overruns | Safe (streaming design) | Safe
parse-queue lag | avg 15 s, peak 250 s | Peak lag ~5–10 min | Sustained backlog risk
send-queue concurrency | avg 22, peak 220 | Near 250 limit at peak | Exceeds 250 limit
Partner endpoint | 0.24% 5xx | Monitor 5xx rate | Likely bottleneck
Overall assessment: the system operates comfortably at current traffic levels. Stable up to ~5× current volume. Beyond 5×, send-queue concurrency and parse-queue lag become primary constraints. The partner endpoint capacity should be confirmed before scaling.

Key Metrics to Monitor

Priority | Metric | Alert Threshold
★★★ | DLQ message count (both queues) | > 0 — immediate action
★★★ | Realtime backlog (sustained) | > 100 messages for > 5 min
★★★ | Exceeded CPU Time Limits | > 10 per hour
★★ | Consumer lag time (parse-queue) | Sustained > 5 minutes
★★ | 5xx subrequest rate | > 0.5% of total
★★ | send-queue consumer concurrency | Sustained > 200
★ | P99 CPU Time | > 20 s
5. Deployment Guide
Prerequisites: Node.js v18+ installed locally. Wrangler CLI installed (npm install -g wrangler). A Cloudflare API Token scoped to the target account with the following permissions: Workers Scripts:Edit, Workers R2 Storage:Edit, and Workers Queue:Edit (use the Edit Cloudflare Workers template and manually add the R2 and Queue permissions).
Before deploying, obtain from your log ingestion partner: server endpoint URL, authentication private key, and the target URI path.
Step 1 · Create R2 Bucket

Dashboard → R2 Object Storage → Create bucket

  • Name: cdn-logs-raw
  • Location: Automatic (default)
Step 2 · Create Queues

Dashboard → Workers & Pages → Queues → Create queue

Create all four queues (exact names required):

Queue Name | Purpose
parse-queue | Triggers Parser Worker on new R2 files
send-queue | Triggers Sender Worker on processed batches
parse-dlq | Dead-letter queue for parse failures
send-dlq | Dead-letter queue for send failures
Step 3 · Configure R2 Event Notification

R2 → cdn-logs-raw → Settings → Event Notifications → Add notification

  • Event type: object-create
  • Queue: parse-queue
  • Prefix (optional): logs/
  • Suffix (optional): .gz
Step 4 · Set Worker Secrets

From the project directory, set encrypted secrets via Wrangler:

# Log ingestion endpoint (base URL, no trailing slash)
wrangler secret put CTYUN_ENDPOINT

# Authentication private key
wrangler secret put CTYUN_PRIVATE_KEY

# Target URI path (e.g. /logpost/yourpath)
wrangler secret put CTYUN_URI_EDGE

# Verify
wrangler secret list
If using GitHub Actions CI/CD, secrets still need to be set via wrangler secret put directly. They persist in Cloudflare and do not need to be re-uploaded on each deployment.
Step 5 · Deploy the Worker

wrangler deploy
Expected output confirms: Producer bindings for parse-queue and send-queue. Consumer bindings for parse-queue and send-queue. Current Version ID assigned.
If deploying via GitHub Actions, simply push to the main branch. The workflow handles wrangler deploy automatically.
Step 6 · Configure Logpush Job

Dashboard → [Target Zone] → Analytics & Logs → Logpush → Create a Logpush job

  1. Destination: R2
  2. Bucket: cdn-logs-raw  ·  Path prefix: logs/{DATE}/
  3. Dataset: HTTP requests
  4. Fields: Select all fields (the Worker handles missing fields gracefully)
  5. Timestamp format: Unix
  6. Sample rate: 1 (100%)
Set the timestamp format to Unix. The parser auto-detects all three formats, but Unix keeps the pipeline's output consistent.
Step 6b · Enable Content-Length Logging (Optional)

Field #21 (sent_http_content_length) depends on the Content-Length response header, which the http_requests dataset does not expose by default. To capture the true header value, configure Logpush Custom Fields via API. This is a one-time, zone-level configuration.

This configuration requires API access. It cannot be set through the Cloudflare Dashboard UI.

Step A — Check for existing Custom Fields ruleset

# List zone rulesets and find http_log_custom_fields phase
curl "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/rulesets" \
  --request GET \
  --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  | jq '.result[] | select(.phase == "http_log_custom_fields") | {id, phase}'

If a ruleset is returned, note its id as RULESET_ID and skip to Step C. If no result, proceed to Step B.

Step B — Create the Custom Fields ruleset (only if Step A returned nothing)

curl "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/rulesets" \
  --request POST \
  --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  --json '{
    "name": "Zone-level phase entry point",
    "kind": "zone",
    "description": "Custom log fields for Logpush",
    "phase": "http_log_custom_fields"
  }' | jq '.result.id'

Note the returned id as RULESET_ID.

Step C — Configure content-length capture

Use raw_response_fields to capture the original content-length value from the origin before any CF transformations.

curl "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/rulesets/$RULESET_ID" \
  --request PUT \
  --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  --json '{
    "rules": [
      {
        "action": "log_custom_field",
        "expression": "true",
        "description": "Capture content-length response header for Logpush",
        "action_parameters": {
          "raw_response_fields": [
            { "name": "content-length" }
          ]
        }
      }
    ]
  }'

Step D — Add ResponseHeaders to the Logpush job

# Get your Logpush job ID first
curl "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/logpush/jobs" \
  --request GET \
  --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  | jq '.result[] | {id, name, dataset}'

# Update job to include ResponseHeaders field
curl "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/logpush/jobs/$JOB_ID" \
  --request PUT \
  --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  --json '{
    "output_options": {
      "field_names": [
        "CacheCacheStatus","ClientCountry","ClientIP","ClientRequestBytes",
        "ClientRequestHost","ClientRequestMethod","ClientRequestProtocol",
        "ClientRequestReferer","ClientRequestScheme","ClientRequestURI",
        "ClientRequestUserAgent","ClientSSLProtocol","ClientSrcPort",
        "EdgeColoCode","EdgeResponseBodyBytes","EdgeResponseBytes",
        "EdgeResponseContentType","EdgeResponseStatus","EdgeServerIP",
        "EdgeStartTimestamp","EdgeTimeToFirstByteMs","OriginIP",
        "OriginRequestHeaderSendDurationMs","OriginResponseDurationMs",
        "OriginResponseHeaderReceiveDurationMs","OriginResponseStatus",
        "OriginTLSHandshakeDurationMs","ParentRayID","RayID",
        "ResponseHeaders"
      ],
      "timestamp_format": "unix"
    }
  }'
Once configured, ResponseHeaders in each log line will contain {"content-length": "12345"} and the Worker will automatically use it for field #21.
Note: CF may omit the Content-Length header on responses it streams (HTTP/2 frames the body; HTTP/1.1 may use chunked transfer encoding, common for gzip-compressed responses). In those cases field #21 outputs -.
Step 7 · Verify End-to-End

Allow ~2 minutes for Logpush to initialize and write the first batch, then verify:

  1. R2: cdn-logs-raw/logs/ contains .log.gz files
  2. Queue: parse-queue → Messages Processed counter is incrementing
  3. Worker logs: Real-time tail via wrangler tail ctyun-logpush
# Expected log output (LOG_LEVEL=info)
[INFO] Parsing: logs/20260328/xxx.log.gz
[INFO] Done: logs/20260328/xxx.log.gz | lines=73 batches=1 errors=0
[INFO] Sent 73 lines → HTTP 200 | processed/xxx.txt
HTTP 401/403 on send → verify CTYUN_PRIVATE_KEY value
HTTP 404 on send → verify CTYUN_URI_EDGE path
6. Configuration Reference

wrangler.toml

name                = "ctyun-logpush"
main                = "src/index.js"
compatibility_date  = "2026-03-27"
compatibility_flags = ["nodejs_compat"]
account_id          = "<your-account-id>"

[[r2_buckets]]
binding     = "RAW_BUCKET"
bucket_name = "cdn-logs-raw"

[[queues.producers]]
binding = "PARSE_QUEUE"
queue   = "parse-queue"

[[queues.producers]]
binding = "SEND_QUEUE"
queue   = "send-queue"

[[queues.consumers]]
queue              = "parse-queue"
max_batch_size     = 5
max_batch_timeout  = 10
max_retries        = 2
dead_letter_queue  = "parse-dlq"

[[queues.consumers]]
queue              = "send-queue"
max_batch_size     = 50
max_batch_timeout  = 5
max_retries        = 5
dead_letter_queue  = "send-dlq"

[vars]
BATCH_SIZE = "1000"  # Lines per HTTP POST batch
LOG_LEVEL  = "info"  # debug | info | warn | error

workers_dev  = false
preview_urls = false

Environment Variables

Name | Type | Example | Description
CTYUN_ENDPOINT | Secret | http://log.example.com:5580 | Log ingestion server base URL (no trailing slash, no URI path)
CTYUN_PRIVATE_KEY | Secret | YourKey@1234 | MD5 authentication private key (provided by log partner)
CTYUN_URI_EDGE | Secret | /logpost/yourpath | Target URI path for HTTP POST (provided by log partner)
BATCH_SIZE | Var | 1000 | Number of log lines per POST batch
LOG_LEVEL | Var | info | Worker log verbosity. Use debug for troubleshooting; revert to info in production
Secrets are set once via wrangler secret put <NAME> and persist in Cloudflare. They are never stored in wrangler.toml or committed to source control.

GitHub Actions CI/CD

The repository includes .github/workflows/deploy.yml. Any push to main triggers an automatic wrangler deploy (code and configuration only).

Required GitHub secret for CI/CD: CLOUDFLARE_API_TOKEN only. Worker secrets (CTYUN_ENDPOINT, CTYUN_PRIVATE_KEY, CTYUN_URI_EDGE) are managed separately via wrangler secret put.

7. Operations

Health Check — Dashboard Indicators

Indicator | Healthy State | Action if Abnormal
R2 logs/ | New .gz files appearing every ~1 min | Check the Logpush job is Enabled
parse-queue backlog | Near 0 at all times | Sustained backlog → check Worker logs for parse errors
send-queue backlog | Near 0 at all times | Sustained backlog → check remote endpoint availability
parse-dlq | Inactive (0 messages) | Messages present → inspect raw .gz file for corruption
send-dlq | Inactive (0 messages) | Messages present → check endpoint, manually reprocess from R2
R2 processed/ | No file accumulation | Files accumulating → Sender not consuming; check Worker logs

Metric Interpretation Guide

Signal | Meaning | Action
Backlog = 0 and lag < 30 s | System consuming in real time | Healthy — no action needed
Backlog growing continuously | Consumption falling behind production — genuine bottleneck | Check Worker errors; scale if needed
Exceeded CPU Time Limits > 0 | Individual Logpush file too large for a single Worker invocation | Monitor frequency; do NOT reduce max_batch_size
Messages Retried > 0 | Partner endpoint instability | Investigate 5xx subrequests; confirm endpoint capacity
DLQ > 0 | Data loss risk — highest priority | Inspect DLQ messages immediately; manually reprocess from R2
send-queue concurrency > 200 | Near the 250 concurrent-consumer limit | Increase max_batch_size for send-queue

Updating Secrets vs. Variables

Change Type | Method | Redeploy Required?
Rotate CTYUN_ENDPOINT / CTYUN_PRIVATE_KEY / CTYUN_URI_EDGE | wrangler secret put <NAME> | No — effective immediately
Change BATCH_SIZE / LOG_LEVEL | Edit wrangler.toml → wrangler deploy | Yes
Update Worker code (src/index.js) | git push → CI/CD auto-deploys | Yes (handled by CI/CD)