Project: pal-e-pac (updated 2026-03-15)
Vision
The development experience that lives forever. A LangGraph-based CLI agent framework that replicates the DORA Elite AI Enterprise operating model using local models (Ollama + Qwen) as the sustainable foundation — so Lucas can continue developing with Betty Sue, Dev, QA, and Dottie regardless of Claude Code availability, cost, or policy. 2-pac lives forever.
Thesis: Benchmark-driven development. Define competence as 29 measurable test cases across 8 categories (tool selection, parameter correctness, safety compliance, instruction following, code generation, PR review, doc operations, multi-step reasoning). Every phase is validated against these benchmarks. “Which model handles the Dev prompt best?” becomes a measurable question, not a vibe.
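The benchmark suite maps naturally onto a promptfoo config. A minimal sketch, assuming illustrative prompt files, model tags, and a single tool-selection case (the real suite has 29 cases across 8 categories):

```yaml
# Sketch only: prompt paths, model tags, and the test case are illustrative.
description: pal-e-pac competence benchmarks (tool selection excerpt)
prompts:
  - file://prompts/dev_agent.txt
providers:
  - ollama:chat:qwen3:4b
  - ollama:chat:qwen3:8b
tests:
  - description: "tool selection: list open issues"
    vars:
      task: "List the open issues in pal-e-pac"
    assert:
      - type: contains
        value: forgejo_list_issues
```

Running the same tests against each provider is what turns "which model handles the Dev prompt best?" into a scored comparison.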
User Stories
Who uses this project, what they need, and how we measure success. These stories drive the phased delivery and prompt evaluation priorities.
| # | Role | Story | Success Metric |
|---|---|---|---|
| 1 | Developer | I run pac on CLI and get the same experience — MCP servers loaded, hooks enforced, CLAUDE.md read, Betty Sue personality active | pac starts Goose session with forgejo-mcp + pal-e-docs-mcp connected, ~/.claude/CLAUDE.md parsed, personality prompt applied |
| 2 | Developer | I can spawn a dev agent that reads a Forgejo issue and submits a working PR using a local model | End-to-end: Forgejo issue → local Qwen model → code changes → PR submitted → passes ruff hook. On a real repo, not a toy example |
| 3 | Developer | I can spawn a QA agent that reviews a PR with structured findings using a local model | QA reads PR diff via forgejo-mcp, posts structured review comment with VERDICT line, triggers label hook |
| 4 | Developer | I know which local model handles each agent role best | Promptfoo evaluation suite with scored results per role (coordinator, dev, QA, doc). Results published as reference note in pal-e-docs |
| 5 | Developer | My enforcement hooks work regardless of which agent framework runs | spawn gate, ruff check, PR template check, Closes #N check — all functional under Goose or ported equivalent |
| 6 | Developer | I have graceful degradation — best available model, not all-or-nothing | Config supports: Claude API (when funded) → Qwen-8B (local GPU) → Qwen-4B (fallback, always fits) |
| 7 | Strategic | The system has zero vendor lock-in in the enforcement/SOP layer | Every prompt in claude-custom has a tested local-model variant. No Claude-specific API calls in hooks or MCP servers |
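The graceful-degradation tiers in story 6 can be sketched as a simple ordered fallback. Tier names follow the document; the availability probes are illustrative stand-ins for real checks (API key funded, VRAM free, etc.):

```python
# Minimal sketch of multi-tier backend selection (story 6).
# Probes are illustrative; the real config would check API funding and GPU state.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Backend:
    name: str
    available: Callable[[], bool]

def pick_backend(tiers: list[Backend]) -> str:
    """Walk the tiers in order; the last tier is the always-available floor."""
    for tier in tiers[:-1]:
        if tier.available():
            return tier.name
    return tiers[-1].name

TIERS = [
    Backend("claude-api", lambda: False),  # when funded
    Backend("qwen3-8b", lambda: False),    # when the embedding model is paused
    Backend("qwen3-4b", lambda: True),     # always fits in 8GB VRAM
]
```

Because the floor tier is returned unconditionally, the system degrades to the best available model instead of failing outright.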
Plan
Active plan: plan-pal-e-pac — Sovereign Development Experience
Completed plans: none (new project)
Board
Board: board-pal-e-pac
Status
Phase 3c COMPLETED (2026-03-14): MCP bridge wired (PR #8). Two hotfixes pushed to main, covering: the MultiServerMCPClient API change (no async context manager), tool_name_prefix=True for server-prefixed tool names, async graph invocation, and permissions.yaml patterns updated to single underscore.
Phase 3 BLOCKER (2026-03-14): forgejo-sdk token auth is not published. The source at ~/forgejo-sdk/ has token support, but the Forgejo PyPI registry still serves the old, password-only version, and the MCP subprocess pulls that old SDK via uv run. Fix: push forgejo-sdk to trigger the CI publish. Until then, all MCP calls return 401.
Qwen3.5 baseline progress (full pipeline proven): 51 tools loaded from 2 MCP servers, Qwen3.5-4B generates structured tool calls (correctly picks forgejo_list_issues), safety blocks writes (create_api_token blocked), audit logs written. Qwen3-4B still can't generate structured tool calls (text-only). Qwen3.5 is a massive upgrade for tool use.
Previous: Phase 3b COMPLETED — LangGraph StateGraph (PR #6). Phase 2 COMPLETED — SafetyLayer (PR #2). Phase 1 COMPLETED — Goose smoke test.
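"Generates structured tool calls" means the model emits a machine-parseable tool-call payload rather than prose. A sketch of the shape, with extraction logic; field names follow Ollama's chat API tool-calling format, but the arguments here are assumptions, not captured output:

```python
# Illustrative tool-call payload and extractor; owner/repo values are made up.
import json

raw = json.dumps({
    "message": {
        "role": "assistant",
        "tool_calls": [
            {"function": {"name": "forgejo_list_issues",
                          "arguments": {"owner": "lucas", "repo": "pal-e-pac"}}}
        ],
    }
})

def extract_tool_calls(payload: str) -> list[str]:
    """Return the tool names the model asked for; [] for text-only replies."""
    msg = json.loads(payload).get("message", {})
    return [c["function"]["name"] for c in msg.get("tool_calls", [])]
```

A text-only model like Qwen3-4B produces a `message` with only `content`, so the extractor returns an empty list — which is exactly the failure mode the baseline measured.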
Milestones
None yet.
Architecture
Three views of the system:
- Domain Model — Agent Framework, Model Backend, Prompt Library, MCP Connectors, Evaluation Suite
- Data Flow — User → CLI → LangGraph StateGraph → Ollama → Qwen → tool calls → MCP → Forgejo/pal-e-docs
- Deployment — Local machine (LangGraph + Ollama) ↔ k8s cluster (MCP servers, Forgejo, pal-e-docs, Harbor)
Cross-project dependencies:
| Project | What pal-e-pac reads/uses | Direction |
|---|---|---|
| pal-e-agency | SOPs, conventions, agent definitions, hooks (claude-custom repo) | Reads. Ports prompts for local-model compatibility. Does NOT modify agency process. |
| pal-e-platform | Ollama deployment (Phase 6a), GPU resources (GTX 1070, 8GB VRAM) | Uses. Ollama already deployed. No new platform work needed. |
| pal-e-docs | MCP server (pal-e-docs-mcp), knowledge base | Uses. Connects MCP server via langchain-mcp-adapters. No pal-e-docs changes needed. |
Key architectural decisions:
| Decision | Rationale |
|---|---|
| LangGraph replaces Goose (Phase 3 pivot) | Phase 3 baselines proved Goose's generic orchestration can't compensate for Qwen3-4B's weaknesses (scored 28/100). LangGraph gives us deterministic state machines where the graph controls flow and the model fills slots. |
| CLI-first, pac command (typer + rich + prompt_toolkit) | Same UX as claude. Python CLI with interactive REPL mode. uv for packaging. No IDE dependency. |
| LangGraph over LangChain chains | We need a state machine, not a chain. The model doesn't decide the path — we do. Route → select → model → correct → safety → call → respond. Deterministic graph with model filling slots. |
| langchain-mcp-adapters for MCP bridge | Our MCP servers are already built (forgejo-mcp, pal-e-docs-mcp, woodpecker-mcp). The adapter converts MCP tools to LangChain tools. Zero MCP server changes needed. |
| Read ~/.claude for compatibility | CLAUDE.md, hooks, agent configs already exist and are maintained. Don't duplicate — read the same source of truth. |
| GTX 1070 8GB is the hard constraint | No hardware upgrades. Design for what we have. Qwen3-4B always fits, Qwen3-8B fits when embedding model paused. |
| Benchmark-driven development (Phase 1 finding) | Define competence benchmarks BEFORE porting agents. TDD for AI. Smoke test proved 4B models need explicit guidance — benchmarks quantify the gap and track progress. |
| Safety guardrails before capability (Phase 1 finding) | Qwen-4B called create_api_token unprompted. Weaker models are MORE dangerous with unrestricted MCP access. Permission system is a prerequisite. |
| Read-only testing until dev cluster | Only one cluster (production). No write operations through untested models against production MCP servers. |
| Promptfoo for systematic evaluation | YAML configs, CI-able, compares models on identical prompts. 29 test cases, 8 categories, proven baseline data. |
| NOT a fifth pillar | pal-e-pac is a project that consumes all three pillars (Platform, Docs, Agency). It doesn't define organizational structure — it ensures the operating model survives. |
| Graceful degradation over hard cutover | Claude when available → Qwen-8B when not → Qwen-4B as floor. Multi-tier, not all-or-nothing. |
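The "LangGraph over LangChain chains" and "safety before capability" decisions can be sketched together as a plain-Python pipeline. Node bodies and the blocklist are illustrative stand-ins; the real implementation is a LangGraph StateGraph with the Phase 2 SafetyLayer:

```python
# Sketch: the graph fixes the node order; the model only fills the tool-call slot.
BLOCKED_TOOLS = {"create_api_token"}  # safety guardrails before capability

def model(state: dict) -> dict:
    # Stand-in for the Ollama/Qwen call: proposes a structured tool call.
    state["tool_call"] = {"name": state["proposed_tool"], "args": {}}
    return state

def safety(state: dict) -> dict:
    # Stand-in for the SafetyLayer: block writes no matter what the model asked.
    call = state["tool_call"]
    if call and call["name"] in BLOCKED_TOOLS:
        state["blocked"] = True
        state["tool_call"] = None
    return state

def run(state: dict) -> dict:
    # Deterministic flow: the code decides the path, not the model.
    for node in (model, safety):
        state = node(state)
    return state
```

The point of the shape: even when a weak model proposes a dangerous call (as Qwen-4B did with create_api_token), the graph's safety node intercepts it before any MCP call is made.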
Repos
| Repo | Platform | Role | Status |
|---|---|---|---|
| pal-e-pac | Forgejo | Goose extensions + pac CLI wrapper + benchmarks (extension-first, NOT a fork) | Created 2026-03-14. Issue #1 open (Phase 2: SafetyLayer). |
| claude-custom | Forgejo | Hooks/configs to port (read-only from pac's perspective) | Exists (owned by pal-e-agency) |
Inbox
All work scoped into plan phases. Query: list_notes(project="pal-e-pac", note_type="todo", status="open")
| Slug | Summary | Discovered |
|---|---|---|