
GhostQA

AI personas navigate your web app in real browsers, find bugs and UX issues. No scripts needed.

Content & Media, Package, Python, Open Source, External
Last updated: March 16, 2026
Visibility: Public
By: Registry

About This MCP Server


Traditional E2E tests are brittle. You write selectors, they break. You maintain scripts, they rot. SpecterQA takes a different approach: AI vision models look at your actual UI and navigate it the way a person would.

You define personas (who is using your app) and journeys (what they're trying to do). SpecterQA's engine takes a screenshot, sends it to a Claude vision model, gets back a decision ("click this button", "fill this field"), executes it via Playwright, takes another screenshot, and repeats until the goal is achieved or something goes wrong.

When something goes wrong, you get evidence: screenshots, UX observations, cost breakdowns, and findings categorized by severity.

The core loop is simple:

1. Screenshot -- Playwright captures the current page state as a PNG
2. Decide -- A Claude vision model receives the screenshot + persona context + goal, returns a structured JSON action (click, fill, navigate, scroll, keyboard, wait, done, or stuck)
3. Execute -- Playwright performs the action (click at coordinates, type text, navigate to URL, etc.)
4. Repeat -- Loop until the goal is achieved, the agent gets stuck, or the budget runs out
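
The loop above can be sketched in a few lines of Python. This is a minimal illustration, not SpecterQA's actual engine: `Action`, `run_journey`, and the three callables are hypothetical names, the model and browser are replaced by stand-in functions, and a simple step limit stands in for the cost budget.

```python
from dataclasses import dataclass
from typing import Callable

# Action types named in the loop description above.
ACTION_TYPES = {"click", "fill", "navigate", "scroll", "keyboard", "wait", "done", "stuck"}


@dataclass
class Action:
    """Hypothetical shape for the structured action the model returns."""
    type: str
    detail: str = ""


def run_journey(
    take_screenshot: Callable[[], bytes],
    decide: Callable[[bytes, str], Action],
    execute: Callable[[Action], None],
    goal: str,
    max_steps: int = 20,  # assumption: a step limit stands in for the cost budget
) -> str:
    """Screenshot -> decide -> execute, until done, stuck, or out of steps."""
    for _ in range(max_steps):
        png = take_screenshot()
        action = decide(png, goal)
        if action.type not in ACTION_TYPES:
            raise ValueError(f"unknown action type: {action.type}")
        if action.type in ("done", "stuck"):
            return action.type
        execute(action)
    return "budget_exhausted"


# Stand-ins for the browser and the vision model, so the sketch runs offline.
scripted = iter([Action("click", "Sign up"), Action("fill", "email"), Action("done")])
performed = []
outcome = run_journey(
    take_screenshot=lambda: b"fake-png",
    decide=lambda png, goal: next(scripted),
    execute=performed.append,
    goal="Create an account",
)
print(outcome, [a.type for a in performed])
```

In a real run, `take_screenshot` and `execute` would wrap a Playwright page and `decide` would call the vision model; keeping them as injected callables makes the loop itself easy to test.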

The persona's profile shapes how the AI behaves. A "tech-savvy developer" explores differently than a "frustrated first-time user." Persona patience, tech comfort, and frustrations all influence the system prompt.
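
As a rough illustration of how persona traits could feed a system prompt, here is a small sketch. The `Persona` fields and the prompt wording are assumptions made for this example; SpecterQA's real persona schema and prompt template may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Persona:
    """Hypothetical persona record; field names are illustrative only."""
    name: str
    role: str
    tech_comfort: str  # e.g. "low", "medium", "high"
    patience: str      # e.g. "low", "medium", "high"
    frustrations: list[str] = field(default_factory=list)


def build_system_prompt(persona: Persona, goal: str) -> str:
    """Fold the persona's traits and the journey goal into one prompt string."""
    lines = [
        f"You are {persona.name}, a {persona.role}.",
        f"Tech comfort: {persona.tech_comfort}. Patience: {persona.patience}.",
    ]
    if persona.frustrations:
        lines.append("Things that frustrate you: " + "; ".join(persona.frustrations) + ".")
    lines.append(f"Your goal: {goal}")
    lines.append("Look at the screenshot and reply with one JSON action.")
    return "\n".join(lines)


newcomer = Persona(
    name="Dana",
    role="frustrated first-time user",
    tech_comfort="low",
    patience="low",
    frustrations=["unclear error messages", "long forms"],
)
prompt = build_system_prompt(newcomer, "Sign up and reach the dashboard")
print(prompt)
```

Swapping in a "tech-savvy developer" persona changes only the data, not the code, which is how the same journey can be explored in different ways.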



Example Workflow

You'll need an Anthropic API key:

That's it. Three commands and an API key.

Why Use GhostQA?

  • Persona-based testing -- Define AI users with backgrounds, goals, frustrations, and tech comfort levels. They don't just follow scripts; they react to what they see.
  • Vision-powered -- No selectors, no DOM queries. The AI interprets screenshots like a human would. Catches visual/layout issues that selector-based tests miss entirely.
  • YAML-configured -- Products, personas, and journeys are all YAML files. PMs can read them. No code to maintain.
  • Budget enforcement -- Per-run, per-day, and per-month cost caps. The engine hard-stops if you hit the limit. No surprise bills.
  • JUnit XML output -- Add --junit-xml results.xml to a run and plug the resulting file into any CI system.
  • Tiered model routing -- Haiku for cheap navigation, Sonnet for complex reasoning, optional local Ollama for zero-cost simple actions.
  • Multi-platform -- Web apps (via Playwright), macOS native apps (via Accessibility API + pyobjc), iOS Simulator (via simctl). Same YAML format, different runners.
  • Evidence collection -- Every run produces screenshots, a findings report, cost breakdown, and a structured JSON result. Everything is saved to an evidence directory.
  • Stuck detection -- If the AI repeats the same action or the UI stops changing, the engine escalates to a stronger model, then aborts if nothing works. No infinite loops.
  • Template variables -- Use {{persona.credentials.email}} in your journey steps. Variables resolve from persona configs at runtime.
  • Precondition checks -- Verify services are up before running tests. Fail fast with clear errors instead of wasting API calls.
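
The stuck-detection item above (repeat detection, escalation to a stronger model, then abort) could look roughly like the following. This is a sketch under assumptions: the `StuckDetector` class, the window size of three, and the placeholder tier names are invented for illustration and are not taken from SpecterQA's source.

```python
from collections import deque

# Placeholder tier names; real routing would map these to actual models.
MODEL_TIERS = ["cheap", "strong"]


class StuckDetector:
    """Flags a run as stuck when recent actions or recent screens stop changing."""

    def __init__(self, window: int = 3):
        self.window = window
        self.recent = deque(maxlen=window)
        self.tier = 0  # index into MODEL_TIERS

    def observe(self, action: str, screen_hash: str) -> str:
        """Record one step and return 'continue', 'escalate', or 'abort'."""
        self.recent.append((action, screen_hash))
        full = len(self.recent) == self.window
        same_action = full and len({a for a, _ in self.recent}) == 1
        same_screen = full and len({s for _, s in self.recent}) == 1
        if not (same_action or same_screen):
            return "continue"
        self.recent.clear()  # give the next tier a fresh window
        if self.tier < len(MODEL_TIERS) - 1:
            self.tier += 1
            return "escalate"
        return "abort"


detector = StuckDetector(window=3)
# The same action repeated six times, even though the screen hash keeps changing.
results = [detector.observe("click Save", f"screen-{i}") for i in range(6)]
print(results)
```

Clearing the window after each escalation gives the stronger model a full set of attempts before the run is aborted, which bounds the total number of wasted steps.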

Limitations

  • Requires an Anthropic API key. No API key, no testing. There's no free tier built into SpecterQA itself.
  • Costs money. Every run makes API calls. A typical 3-step journey costs $0.30-0.60. Budget enforcement prevents surprises, but the meter is always running.
  • Vision models aren't perfect. The AI sometimes misreads small text, clicks the wrong element, or gets confused by complex layouts. It's good, not infallible. You'll occasionally see false positives and false negatives.
  • Not a replacement for unit tests. SpecterQA tests behavioral UX flows. It doesn't test your business logic, data integrity, or edge case handling. Use it alongside your existing test suite, not instead of it.
  • macOS native testing requires pyobjc. The specterqa[native] extra pulls in pyobjc packages (~200MB). Only needed for native macOS and iOS Simulator testing.
  • Alpha software. Version 0.4.0. APIs may change. File structure may change. Expect rough edges.
  • Single-persona per journey (for now). Multi-persona concurrent testing (e.g., simulating a chat between two users) is on the roadmap but not yet supported.
  • Deterministic reproduction is hard. Because the AI makes decisions at runtime, the exact sequence of actions varies between runs. Same journey, same persona, slightly different clicks. This is by design (it catches more issues) but makes exact reproduction tricky.
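
To make the cost point concrete, here is a minimal sketch of per-run, per-day, and per-month caps with a hard stop. The `Budget` class, the `BudgetExceeded` exception, and the dollar figures are hypothetical; SpecterQA's own budget configuration and accounting are not shown in this listing.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised before a charge that would push spending past a cap."""


@dataclass
class Budget:
    per_run: float
    per_day: float
    per_month: float
    spent_run: float = 0.0
    spent_day: float = 0.0
    spent_month: float = 0.0

    def charge(self, cost: float) -> None:
        """Record one API call's cost, refusing it if any cap would be exceeded."""
        checks = [
            ("run", self.spent_run + cost, self.per_run),
            ("day", self.spent_day + cost, self.per_day),
            ("month", self.spent_month + cost, self.per_month),
        ]
        for scope, total, cap in checks:
            if total > cap:
                raise BudgetExceeded(f"per-{scope} cap of ${cap:.2f} would be exceeded")
        self.spent_run += cost
        self.spent_day += cost
        self.spent_month += cost


# Illustrative caps only: a run capped at the upper end of the typical journey cost.
budget = Budget(per_run=0.60, per_day=5.00, per_month=50.00)
budget.charge(0.25)
budget.charge(0.25)
try:
    budget.charge(0.25)  # would bring the run to $0.75, over the $0.60 cap
except BudgetExceeded as exc:
    print(exc)
```

Checking every cap before recording the charge means a rejected call leaves all three counters untouched, so the caller can stop cleanly and still report accurate totals.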

Specifications

Status: live
Industry: Content & Media
Category: General
Server type: Package
Language: Python
License: Open Source
Verified: Yes

Requirements

  • SpecterQA is distributed via PyPI (pip install specterqa) and requires Python 3.10 or later.
  • After installing, download the Playwright browser binaries; Playwright's own CLI does this with playwright install.
  • For macOS native app testing and iOS Simulator support, install the optional native extra, specterqa[native].
  • For MCP server support (integrating SpecterQA as a tool in Claude Desktop, Cursor, or other MCP clients):
  • You will also need an Anthropic API key to run tests:
  • To verify the installation:

Hosting


Hosting Options

  • Package

API


Integrate this server into your application. Choose a connection method below.

1. Install

Install command (Python):

pip install specterqa


Quick Reference


Name: GhostQA
Function: AI personas navigate your web app in real browsers, find bugs and UX issues. No scripts needed.
Available Tools: For schema definitions, type stubs, and federated protocol details, see docs/for-agents.md.
Transport: Package
Language: Python
Install: pip install specterqa
Source: External (Registry)
License: Open Source