Writing

Notes on the Claude Code 2.1.0 outage

A changelog formatting change took down Claude Code. Lessons on parsing human-readable docs as machine data.

Notes on the Eurostar chatbot "vulnerability" report

Looking at what makes something a vulnerability versus a hardening opportunity in LLM applications.

What I learned shipping 1,000+ PRs with Claude Code

Notes from using Claude Code in parallel git worktrees: Plan Mode, ultrathink, verification loops, and Chrome automation.

How AI Regulation Changed in 2025

Why "AI compliance questions" appeared in security questionnaires and RFPs, and how policy becomes contract requirements.

Why Attack Success Rate (ASR) Isn't Comparable Across Jailbreak Papers

ASR isn't portable across papers because measurement choices dominate the headline number. Includes math and a checklist for reading papers.

GPT-5.2 Initial Trust and Safety Assessment

Day-zero red team of GPT-5.2 focusing on jailbreak resilience and harmful content.

Real-Time Fact Checking for LLM Outputs

Introduces search-rubric, an assertion type in which a search-enabled judge verifies time-sensitive claims during evals and CI.

When AI becomes the attacker: The rise of AI-orchestrated cyberattacks

Connects malware that queries LLMs at runtime with "vibe hacking" case studies, and argues that defense requires continuous testing.

Reinforcement Learning with Verifiable Rewards Makes Models Faster, Not Smarter

RLVR gains are often "search compression" rather than new reasoning ability.

Prompt Injection vs Jailbreaking: What's the Difference?

Jailbreaking targets model safety training; prompt injection targets application trust boundaries.

AI Safety vs AI Security in LLM Applications: What Teams Must Know

Safety protects people from harmful outputs; security protects systems from adversarial manipulation.

Evaluating political bias in LLMs

Open methodology and dataset (2,500 political statements) to measure political leaning in models.

Testing Humanity's Last Exam with Promptfoo

A guide to running the Humanity's Last Exam (HLE) benchmark with Promptfoo.