project-page active pal-e-pac
project-pal-e-pac updated 2026-03-15

Project: pal-e-pac

Vision

The development experience that lives forever. A LangGraph-based CLI agent framework that replicates the DORA Elite AI Enterprise operating model using local models (Ollama + Qwen) as the sustainable foundation — so Lucas can continue developing with Betty Sue, Dev, QA, and Dottie regardless of Claude Code availability, cost, or policy. 2-pac lives forever.

Thesis: Benchmark-driven development. Define competence as 29 measurable test cases across 8 categories (tool selection, parameter correctness, safety compliance, instruction following, code generation, PR review, doc operations, multi-step reasoning). Every phase is validated against these benchmarks. “Which model handles the Dev prompt best?” becomes a measurable question, not a vibe.
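
As a rough illustration of the benchmark format (not the project's actual suite), a single tool-selection case in promptfoo might look like the following; the prompt file, provider tag, and expected tool name are assumptions:

```yaml
# Hypothetical promptfoo config sketch -- one tool-selection test case.
# File paths, the model tag, and the expected tool name are illustrative.
description: pal-e-pac tool-selection benchmark (sketch)

prompts:
  - file://prompts/dev_agent.txt   # assumed prompt template with a {{task}} variable

providers:
  - ollama:chat:qwen3:4b           # local model under test, served by Ollama

tests:
  - description: picks the issue-listing tool for an issue-triage task
    vars:
      task: "List the open issues in the pal-e-pac repo"
    assert:
      - type: contains
        value: forgejo_list_issues  # expected tool name in the model's output
```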

User Stories

Who uses this project, what they need, and how we measure success. These stories drive the phased delivery and prompt evaluation priorities.

| # | Role | Story | Success Metric |
|---|------|-------|----------------|
| 1 | Developer | I run pac on CLI and get the same experience — MCP servers loaded, hooks enforced, CLAUDE.md read, Betty Sue personality active | pac starts Goose session with forgejo-mcp + pal-e-docs-mcp connected, ~/.claude/CLAUDE.md parsed, personality prompt applied |
| 2 | Developer | I can spawn a dev agent that reads a Forgejo issue and submits a working PR using a local model | End-to-end: Forgejo issue → local Qwen model → code changes → PR submitted → passes ruff hook. On a real repo, not a toy example |
| 3 | Developer | I can spawn a QA agent that reviews a PR with structured findings using a local model | QA reads PR diff via forgejo-mcp, posts structured review comment with VERDICT line, triggers label hook |
| 4 | Developer | I know which local model handles each agent role best | Promptfoo evaluation suite with scored results per role (coordinator, dev, QA, doc). Results published as reference note in pal-e-docs |
| 5 | Developer | My enforcement hooks work regardless of which agent framework runs | Spawn gate, ruff check, PR template check, Closes #N check — all functional under Goose or ported equivalent |
| 6 | Developer | I have graceful degradation — best available model, not all-or-nothing | Config supports: Claude API (when funded) → Qwen-8B (local GPU) → Qwen-4B (fallback, always fits) |
| 7 | Strategic | The system has zero vendor lock-in in the enforcement/SOP layer | Every prompt in claude-custom has a tested local-model variant. No Claude-specific API calls in hooks or MCP servers |

Plan

Active plan: plan-pal-e-pac — Sovereign Development Experience

Completed plans: none (new project)

Board

Board: board-pal-e-pac

Status

Phase 3c COMPLETED (2026-03-14): MCP bridge wired (PR #8). Two hotfixes pushed to main, covering: the MultiServerMCPClient API change (no async context manager), tool_name_prefix=True for server-prefixed tool names, async graph invocation, and permissions.yaml patterns updated to a single underscore.
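
For reference, the wiring described above is roughly the following shape; a minimal sketch assuming langchain-mcp-adapters' MultiServerMCPClient used without an async context manager. Server names, launch commands, and URLs are assumptions, and where exactly tool_name_prefix is passed depends on the adapter version:

```python
# Sketch of the MCP bridge wiring via langchain-mcp-adapters.
# Server names, commands, and URLs are illustrative; the actual connection
# config lives in the pal-e-pac repo. The hotfix above also enables
# tool_name_prefix=True so tool names arrive server-prefixed (e.g. forgejo_list_issues).
import asyncio
from langchain_mcp_adapters.client import MultiServerMCPClient

async def load_tools():
    # Newer adapter versions are instantiated directly (no `async with`).
    client = MultiServerMCPClient(
        {
            "forgejo": {
                "command": "uv",
                "args": ["run", "forgejo-mcp"],   # assumed launch command
                "transport": "stdio",
            },
            "paledocs": {
                "url": "http://pal-e-docs-mcp.cluster.local/mcp",  # assumed URL
                "transport": "streamable_http",
            },
        }
    )
    return await client.get_tools()   # LangChain-compatible tools

if __name__ == "__main__":
    tools = asyncio.run(load_tools())
    print(f"{len(tools)} tools loaded")  # the Qwen3.5 baseline below reports 51
```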

Phase 3 BLOCKER (2026-03-14): forgejo-sdk token auth not published. The source at ~/forgejo-sdk/ has token support, but the Forgejo PyPI registry has the old version (password-only), and the MCP subprocess pulls that old SDK via uv run. Fix: push forgejo-sdk to trigger a CI publish. Until then, all MCP calls return 401.

Qwen3.5 baseline progress: the full pipeline is proven. 51 tools loaded from 2 MCP servers, Qwen3.5-4B generates structured tool calls (it picks forgejo_list_issues correctly), safety blocks writes (create_api_token blocked), and audit logs are written. Qwen3-4B still can't generate structured tool calls (text-only output). Qwen3.5 is a massive upgrade for tool use.
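
The safety blocking above is driven by permissions.yaml (see the Phase 3c hotfix note). A hypothetical sketch of what such a read-only policy could look like; keys, pattern syntax, and tool names are assumptions, not the project's actual file:

```yaml
# Hypothetical permissions.yaml sketch -- deny-by-default, read-only tool access.
# Keys, pattern syntax, and tool names are illustrative; single-underscore
# server prefixes match the hotfix noted above.
default: deny
allow:
  - forgejo_list_*             # issue/PR listing
  - forgejo_get_*              # read single resources
  - paledocs_search_*          # knowledge-base search
deny:
  - forgejo_create_api_token   # explicitly blocked write (caught in the baseline run)
audit_log: ~/.pac/audit.log    # assumed location for the audit entries mentioned above
```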

Previous: Phase 3b COMPLETED — LangGraph StateGraph (PR #6). Phase 2 COMPLETED — SafetyLayer (PR #2). Phase 1 COMPLETED — Goose smoke test.

Milestones

None yet.

Architecture

Three views of the system:

  1. Domain Model — Agent Framework, Model Backend, Prompt Library, MCP Connectors, Evaluation Suite
  2. Data Flow — User → CLI → LangGraph StateGraph → Ollama → Qwen → tool calls → MCP → Forgejo/pal-e-docs (sketched below)
  3. Deployment — Local machine (LangGraph + Ollama) ↔ k8s cluster (MCP servers, Forgejo, pal-e-docs, Harbor)
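
The data-flow view maps naturally onto a LangGraph StateGraph; the "LangGraph over LangChain chains" decision below spells out the full route → select → model → correct → safety → call → respond pipeline. A compressed sketch, with state fields, node logic, and routing as assumptions rather than the project's actual graph:

```python
# Compressed sketch of a deterministic pac graph using LangGraph's StateGraph.
# State fields, node logic, and routing are illustrative placeholders.
from typing import Optional, TypedDict
from langgraph.graph import END, START, StateGraph

class PacState(TypedDict, total=False):
    user_input: str
    needs_tool: bool
    tool_name: Optional[str]
    tool_args: Optional[dict]
    allowed: bool
    tool_result: Optional[str]
    response: Optional[str]

def route(state: PacState) -> dict:
    # The graph, not the model, decides whether a tool call is needed at all.
    return {"needs_tool": "issue" in state["user_input"].lower()}

def select_tool(state: PacState) -> dict:
    # In the real graph the local model (via Ollama) fills this slot.
    return {"tool_name": "forgejo_list_issues", "tool_args": {"repo": "pal-e-pac"}}

def safety(state: PacState) -> dict:
    # Stand-in for the SafetyLayer check against permissions.yaml.
    return {"allowed": str(state.get("tool_name", "")).startswith(("forgejo_list", "forgejo_get"))}

def call_tool(state: PacState) -> dict:
    # Would execute the MCP tool through the langchain-mcp-adapters bridge.
    return {"tool_result": "(tool output)"}

def respond(state: PacState) -> dict:
    return {"response": state.get("tool_result") or "No tool call was made."}

graph = StateGraph(PacState)
for name, fn in [("route", route), ("select_tool", select_tool), ("safety", safety),
                 ("call_tool", call_tool), ("respond", respond)]:
    graph.add_node(name, fn)

graph.add_edge(START, "route")
graph.add_conditional_edges("route", lambda s: "select_tool" if s["needs_tool"] else "respond")
graph.add_edge("select_tool", "safety")
graph.add_conditional_edges("safety", lambda s: "call_tool" if s["allowed"] else "respond")
graph.add_edge("call_tool", "respond")
graph.add_edge("respond", END)

app = graph.compile()
# Async invocation, matching the Phase 3c hotfix:
#   result = await app.ainvoke({"user_input": "list open issues"})
```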

Cross-project dependencies:

| Project | What pal-e-pac reads/uses | Direction |
|---------|---------------------------|-----------|
| pal-e-agency | SOPs, conventions, agent definitions, hooks (claude-custom repo) | Reads. Ports prompts for local-model compatibility. Does NOT modify agency process. |
| pal-e-platform | Ollama deployment (Phase 6a), GPU resources (GTX 1070, 8GB VRAM) | Uses. Ollama already deployed. No new platform work needed. |
| pal-e-docs | MCP server (pal-e-docs-mcp), knowledge base | Uses. Connects MCP server via langchain-mcp-adapters. No pal-e-docs changes needed. |

Key architectural decisions:

| Decision | Rationale |
|----------|-----------|
| LangGraph replaces Goose (Phase 3 pivot) | Phase 3 baselines proved Goose's generic orchestration can't compensate for Qwen3-4B's weaknesses (scored 28/100). LangGraph gives us deterministic state machines where the graph controls flow and the model fills slots. |
| CLI-first, pac command (typer + rich + prompt_toolkit) | Same UX as claude. Python CLI with interactive REPL mode. uv for packaging. No IDE dependency. |
| LangGraph over LangChain chains | We need a state machine, not a chain. The model doesn't decide the path — we do. Route → select → model → correct → safety → call → respond. Deterministic graph with model filling slots. |
| langchain-mcp-adapters for MCP bridge | Our MCP servers are already built (forgejo-mcp, pal-e-docs-mcp, woodpecker-mcp). The adapter converts MCP tools to LangChain tools. Zero MCP server changes needed. |
| Read ~/.claude for compatibility | CLAUDE.md, hooks, agent configs already exist and are maintained. Don't duplicate — read the same source of truth. |
| GTX 1070 8GB is the hard constraint | No hardware upgrades. Design for what we have. Qwen3-4B always fits; Qwen3-8B fits when the embedding model is paused. |
| Benchmark-driven development (Phase 1 finding) | Define competence benchmarks BEFORE porting agents. TDD for AI. Smoke test proved 4B models need explicit guidance — benchmarks quantify the gap and track progress. |
| Safety guardrails before capability (Phase 1 finding) | Qwen-4B called create_api_token unprompted. Weaker models are MORE dangerous with unrestricted MCP access. Permission system is a prerequisite. |
| Read-only testing until dev cluster | Only one cluster (production). No write operations through untested models against production MCP servers. |
| Promptfoo for systematic evaluation | YAML configs, CI-able, compares models on identical prompts. 29 test cases, 8 categories, proven baseline data. |
| NOT a fifth pillar | pal-e-pac is a project that consumes all three pillars (Platform, Docs, Agency). It doesn't define organizational structure — it ensures the operating model survives. |
| Graceful degradation over hard cutover | Claude when available → Qwen-8B when not → Qwen-4B as floor. Multi-tier, not all-or-nothing (see the tier-selection sketch below). |
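
As a rough illustration of the last decision above (and user story 6), tier selection could be as simple as the following; the environment variable gate, model tags, and VRAM threshold are all assumptions:

```python
# Hypothetical backend selection for graceful degradation:
# Claude API (when funded) -> Qwen-8B (local GPU) -> Qwen-4B (always fits).
# Model tags and the free_vram_gb() probe are illustrative placeholders.
import os
import shutil
import subprocess

def free_vram_gb() -> float:
    """Best-effort free-VRAM probe via nvidia-smi; returns 0.0 if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return 0.0
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.free", "--format=csv,noheader,nounits"],
        capture_output=True, text=True,
    )
    try:
        return int(out.stdout.split()[0]) / 1024
    except (ValueError, IndexError):
        return 0.0

def select_backend() -> str:
    if os.environ.get("ANTHROPIC_API_KEY"):   # tier 1: Claude when funded
        return "claude"
    if free_vram_gb() >= 6.0:                 # tier 2: assumed headroom for Qwen-8B on the 8GB card
        return "ollama:qwen3:8b"
    return "ollama:qwen3:4b"                  # tier 3: floor model, always fits
```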

Repos

| Repo | Platform | Role | Status |
|------|----------|------|--------|
| pal-e-pac | Forgejo | Goose extensions + pac CLI wrapper + benchmarks (extension-first, NOT a fork) | Created 2026-03-14. Issue #1 open (Phase 2: SafetyLayer). |
| claude-custom | Forgejo | Hooks/configs to port (read-only from pac's perspective) | Exists (owned by pal-e-agency) |
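
The pac CLI wrapper listed above is, per the decisions table, a typer + rich + prompt_toolkit application with an interactive REPL. A minimal skeleton under those assumptions; command names, banner text, and the graph hand-off are illustrative:

```python
# Minimal sketch of the pac CLI shell (typer + rich + prompt_toolkit).
# Command names, the banner, and the graph hand-off are illustrative.
import typer
from rich.console import Console
from prompt_toolkit import PromptSession

app = typer.Typer(help="pac -- local-model agent CLI (sketch)")
console = Console()

@app.command()
def chat() -> None:
    """Interactive REPL, mirroring the claude CLI experience."""
    console.print("[bold]pac[/bold] ready (local model backend)")
    session = PromptSession("pac> ")
    while True:
        try:
            line = session.prompt()
        except (EOFError, KeyboardInterrupt):
            break
        if line.strip() in {"exit", "quit"}:
            break
        # Hand the input to the LangGraph pipeline (not shown here).
        console.print(f"[dim]would run graph on:[/dim] {line}")

if __name__ == "__main__":
    app()
```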

Inbox

All work scoped into plan phases. Query: list_notes(project="pal-e-pac", note_type="todo", status="open")

| Slug | Summary | Discovered |
|------|---------|------------|