
docs: add plan spec for manual testing workflows #38

Open
jlevy wants to merge 5 commits into main from claude/tryscript-manual-testing-ZPMvS

Conversation


@jlevy jlevy commented Jan 31, 2026

Summary

This PR adds a comprehensive plan specification for supporting "manual" test scripts in tryscript: tests that facilitate human or agent review rather than strict pass/fail automation. This addresses use cases where outputs are inherently variable (LLM responses, web search results) or require subjective evaluation (visual UX, quality metrics).

Key insight: Manual testing isn't just about variable outputs; it's also about quality evaluation (evals), where both old and new results might be "correct" but still need comparison for quality, relevance, or regressions.

Changes

  • New plan spec: docs/project/specs/active/plan-2026-01-31-manual-testing-workflows.md

    • Documents 6 implementation phases for manual testing support
    • Includes comprehensive playbook with use cases, workflows, best practices, and anti-patterns
    • Proposes `--review` mode, `validation: binary|manual` frontmatter, and review annotations
    • Quality evaluation (evals) use case for search engines, rankings
    • Phase VI for comparison modes beyond diffs (side-by-side, evaluators, LLM-assisted)
  • tbd config updates: Updated tbd to v0.1.17, fixed gitignore for docs cache

Use Cases Covered

| Use case | Comparison strategy |
| --- | --- |
| Deterministic CLI | Standard diff (automated) |
| LLM/AI responses | Diff + manual review |
| Web scraping/search | Structure validation |
| Quality evals | Side-by-side, custom evaluators |
| Visual/UX output | Manual review with annotations |
| Interactive workflows | Scripted input testing |

Test Plan

  • Documentation builds/renders correctly
  • Plan spec follows existing spec format conventions
  • No code changes; this is a documentation-only review

Manual Review Checklist

  • Use cases are comprehensive: Covers deterministic, LLM, search, evals, visual, interactive
  • Quality eval workflow is clear: Side-by-side comparison, custom evaluators, future LLM evaluation
  • Comparison modes are well-defined: diff (default), side-by-side, evaluator, llm (future)
  • Best practices are actionable: Examples show concrete do/don't patterns
  • Anti-patterns are clear: Each anti-pattern shows both bad and good alternatives
  • CI integration examples are copy-pasteable: GitHub Actions workflow is complete
  • Phase breakdown is logical: Features build on each other appropriately

Proposed Features Summary

| Phase | Feature | Purpose |
| --- | --- | --- |
| I | Playbook | Document patterns and anti-patterns |
| II | `--review` mode | Run + update + show diff, exit 0 |
| III | `validation: manual` | Per-file designation |
| IV | Review annotations | `<!-- REVIEW: ... -->` guidance |
| V | CI patterns | GitHub Actions examples |
| VI | Comparison modes | Side-by-side, evaluators, LLM (future) |
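The proposed `--review` mode (Phase II) amounts to "re-run, update the expected output, show the diff, never fail the run." A minimal sketch of those semantics as a shell function using only standard tools; the `review_run` name and the golden-file layout are hypothetical illustrations, not part of the spec:

```shell
# Hypothetical sketch of --review semantics: re-run a command, diff its
# output against a golden file for human/agent review, refresh the golden
# file, and always report success.
review_run() {
    golden="$1"; shift
    new_output=$("$@" 2>&1)                      # capture current output
    if [ -f "$golden" ]; then
        # Show what changed; a non-empty diff is information, not a failure.
        printf '%s\n' "$new_output" | diff -u "$golden" - || true
    fi
    printf '%s\n' "$new_output" > "$golden"      # update the expected output
    return 0                                     # review mode never fails
}
```

The real feature would live inside tryscript itself; the point is only that review mode separates "did it change?" from "is it wrong?".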

Example: Quality Eval Workflow

---
validation: manual
compare: side-by-side
---

# Eval: Search relevance

<!-- EVAL CRITERIA: Top 5 should be relevant, no obviously wrong results -->

```console
$ search-cli query "wireless headphones"
[.. results ..]
```

**Reviewer sees**:

```text
┌─ Previous ─────────────────┬─ Current ──────────────────┐
│ 1. Sony WH-1000XM4         │ 1. Sony WH-1000XM5         │
│ 2. Bose QC45               │ 2. Bose QC45               │
│ 3. Apple AirPods Pro       │ 3. JBL Tune 760NC          │
└────────────────────────────┴────────────────────────────┘
```
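A custom evaluator (the metric-based comparison proposed for Phase VI) could be an ordinary script that reduces the previous and current result lists to a score a reviewer or CI threshold can act on. A hedged sketch, assuming one result per line in each file; the `top_n_overlap` name and file layout are illustrative, not from the spec:

```shell
# Hypothetical evaluator: report how many of the previous top-N ranked
# results survive in the current top-N (rank-insensitive overlap).
top_n_overlap() {
    prev="$1"; curr="$2"; n="${3:-5}"
    # comm -12 prints lines common to both sorted top-N lists.
    overlap=$(comm -12 <(head -n "$n" "$prev" | sort) \
                       <(head -n "$n" "$curr" | sort) | wc -l | tr -d ' ')
    echo "top-$n overlap: $overlap/$n"
}
```

An evaluator like this turns "eyeball two lists" into a number, so a CI gate can flag a review only when overlap drops below some threshold.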


Related Beads

None - this is a new planning specification.

---

https://claude.ai/code/session_013zTMZFAZESM7uy9oAAxCKN

Design document for supporting "manual" test scripts that facilitate
human/agent review rather than strict pass/fail automation. Addresses
use cases for LLM responses, web scraping, visual UX, and other
variable outputs.

Key features proposed:
- Documentation playbook with patterns and anti-patterns
- --review mode for update + diff display
- validation: binary|manual frontmatter option
- Review annotations in test files
- CI integration patterns

https://claude.ai/code/session_013zTMZFAZESM7uy9oAAxCKN

github-actions bot commented Jan 31, 2026

Coverage Report

| Status | Category | Percentage | Covered / Total |
| --- | --- | --- | --- |
| 🔵 | Lines | 93.29% | 2557 / 2741 |
| 🔵 | Statements | 93.29% | 2557 / 2741 |
| 🔵 | Functions | 35.76% | 54 / 151 |
| 🔵 | Branches | 36.87% | 243 / 659 |

File Coverage: No changed files found.
Generated in workflow #137 for commit 6a05fe1 by the Vitest Coverage Report Action

Expands plan spec to include:
- Quality evaluation (evals) use case for search engines, rankings
- Side-by-side comparison mode (beyond simple diffs)
- Custom evaluator scripts for metric-based comparison
- Future LLM-assisted evaluation concept
- Generalization of comparison beyond diffs to evaluation strategies

https://claude.ai/code/session_013zTMZFAZESM7uy9oAAxCKN
- Upgrade tbd to v0.1.17
- Remove .tbd/docs/ and .tbd/state.yml from git tracking
- Update .tbd/.gitignore to properly ignore docs cache and state
- Update tbd config with new docs_cache format
- Add Claude Code integration scripts

https://claude.ai/code/session_013zTMZFAZESM7uy9oAAxCKN
Key changes:
- Add "Jupyter Notebooks for CLI Testing" mental model
- Remove HTML comment syntax for review criteria (use plain prose)
- Simplify Phase VI to defer advanced comparison modes
- Emphasize building on existing tools (tryscript + git diff)
- Add design principles: non-interactive, agent-friendly, prose is docs

The core insight: `tryscript run --update` + `git diff` already works.
Minimize new features, maximize reuse.

https://claude.ai/code/session_013zTMZFAZESM7uy9oAAxCKN
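The workflow this commit describes needs no new machinery; as a transcript sketch (flags as named in the commit message, paths omitted):

```console
$ tryscript run --update    # re-run tests, writing current output back into the test files
$ git diff                  # review what changed; commit if the changes look right
```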
