GitHub RightNow AI Forge MCP Server

Turn PyTorch into fast CUDA/Triton kernels on real datacenter GPUs with up to 14x speedup.

Developer Tools | Package | JavaScript/TypeScript | Open Source | External
Last updated: March 16, 2026
Visibility: Public
By: Registry

About This MCP Server


Forge MCP Server

Swarm agents that turn slow PyTorch into fast CUDA/Triton kernels, from any AI coding agent.

Forge transforms PyTorch models into production-grade CUDA/Triton kernels through automated multi-agent optimization. Using 32 parallel AI agents with inference-time scaling, it achieves up to 14x faster inference than torch.compile(mode='max-autotune-no-cudagraphs') while maintaining 100% numerical correctness.

This MCP server connects any MCP-compatible AI coding agent to Forge. Your agent submits PyTorch code, Forge optimizes it with swarm agents on real datacenter GPUs, and returns the fastest kernel as a drop-in replacement.

1. Authenticate: The agent calls forge_auth, which opens your browser. Sign in once, tokens are stored locally at ~/.forge/tokens.json and auto-refresh.
2. Optimize: The agent sends your PyTorch code via forge_optimize. The MCP server POSTs to the Forge API and streams SSE events in real time.
3. Benchmark: 32 parallel Coder+Judge agents generate kernels, compile them, test correctness against the PyTorch reference, and profile performance on real datacenter GPUs.
4. Return: The MCP server collects all results and returns the optimized code, speedup metrics, and iteration history. The output is a drop-in replacement for your original code.
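
Step 2 mentions that progress arrives as SSE (Server-Sent Events). As a rough illustration of what consuming such a stream involves, here is a minimal parser for the standard SSE wire format. The event names (`progress`, `result`) and the JSON payload fields are assumptions made up for this sketch; the page does not document Forge's actual event schema.

```python
import json


def parse_sse(stream_text: str) -> list[dict[str, str]]:
    """Split raw SSE text into events with 'event' and 'data' fields."""
    events = []
    event_type, data_lines = "message", []
    # SSE allows \r\n, \r, or \n as line terminators.
    lines = stream_text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    for line in lines:
        if line == "":
            # A blank line dispatches the event collected so far.
            if data_lines:
                events.append({"event": event_type, "data": "\n".join(data_lines)})
            event_type, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # one leading space after the colon is dropped
            if field == "event":
                event_type = value
            elif field == "data":
                data_lines.append(value)
    return events


# Hypothetical stream contents, for illustration only.
sample = (
    "event: progress\n"
    'data: {"iteration": 1, "speedup": 1.8}\n'
    "\n"
    ": keep-alive\n"
    "event: result\n"
    'data: {"speedup": 3.2}\n'
    "\n"
)

events = parse_sse(sample)
for e in events:
    print(e["event"], json.loads(e["data"]))
```

In practice the MCP server handles the streaming for you; the sketch only shows why results can be reported incrementally before the final kernel is returned.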

Each optimization costs 1 credit. Credits are only charged for successful runs (speedup >= 1.1x). Failed runs and cancelled jobs are not charged.
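
The billing rule above can be written down as a small function. This is a sketch under two assumptions not stated on the page: that speedup is the ratio of baseline latency to optimized latency, and that a run below the threshold counts as a failed (uncharged) run. The function and parameter names are hypothetical.

```python
CREDIT_THRESHOLD = 1.1  # minimum speedup for a run to be charged, per the rule above


def speedup(baseline_ms: float, optimized_ms: float) -> float:
    """Assumed definition: baseline latency divided by optimized latency."""
    if baseline_ms <= 0 or optimized_ms <= 0:
        raise ValueError("latencies must be positive")
    return baseline_ms / optimized_ms


def credits_charged(baseline_ms: float, optimized_ms: float, cancelled: bool = False) -> int:
    """Return 1 credit for a successful run, 0 for cancelled or below-threshold runs."""
    if cancelled:
        return 0
    return 1 if speedup(baseline_ms, optimized_ms) >= CREDIT_THRESHOLD else 0


print(credits_charged(10.0, 5.0))                  # 2.0x speedup
print(credits_charged(10.0, 9.5))                  # about 1.05x, below threshold
print(credits_charged(10.0, 5.0, cancelled=True))  # cancelled job
```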

Capabilities
  • Optimize existing kernels - Submit PyTorch code, get back an optimized Triton/CUDA kernel benchmarked against torch.compile(max-autotune)
  • Generate new kernels - Describe an operation (e.g. "fused LayerNorm + GELU + Dropout"), get a production-ready optimized kernel
  • 32 parallel swarm agents - Coder+Judge agent pairs compete to discover optimal kernels, exploring tensor core utilization, memory coalescing, shared memory tiling, and kernel fusion simultaneously
  • Real datacenter GPU benchmarking - Every kernel is compiled, tested for correctness, and profiled on actual datacenter hardware
  • 250k tokens/sec inference - Results in minutes, not hours
  • Smart detection - The agent automatically recognizes when your code would benefit from GPU optimization
  • One-click auth - Browser-based OAuth sign-in. No API keys to manage.
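
The page says every kernel is tested for correctness against the PyTorch reference but does not say how. One common way to compare a candidate output with a reference is an elementwise tolerance check using the same formula as `torch.allclose`, `|candidate - reference| <= atol + rtol * |reference|`. The sketch below does this over plain Python lists; the tolerance values and the function name are assumptions, not Forge's documented settings.

```python
def all_close(candidate, reference, rtol=1e-3, atol=1e-5) -> bool:
    """Elementwise tolerance check over two flat sequences of floats."""
    if len(candidate) != len(reference):
        return False
    # NaN in either input makes the comparison False, so NaN outputs are rejected.
    return all(
        abs(c - r) <= atol + rtol * abs(r)
        for c, r in zip(candidate, reference)
    )


reference_out = [1.0, 2.0, -3.5]
kernel_out_ok = [1.0, 2.0001, -3.5002]   # small rounding differences
kernel_out_bad = [1.0, 2.1, -3.5]        # a genuinely wrong value

print(all_close(kernel_out_ok, reference_out))
print(all_close(kernel_out_bad, reference_out))
```

A check like this is what lets a faster kernel be treated as a drop-in replacement: it must reproduce the reference outputs to within tolerance before its speed counts.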

Specifications

Status: live
Industry: Developer Tools
Category: General
Server type: Package
Language: JavaScript/TypeScript
License: Open Source
Verified: Yes

Hosting


Hosting Options

  • Package

API


Integrate this server into your application. Choose a connection method below.

Configuration (JSON):
{
  "mcpServers": {
    "forge": {
      "command": "npx",
      "args": ["-y", "@rightnow/forge-mcp-server"]
    }
  }
}
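
If your MCP client already has a configuration file with other servers, the entry above needs to be merged in rather than pasted over the file. A minimal sketch of that merge follows. Where the configuration file lives depends on the client and is not stated here, so the example works on an in-memory dictionary; the `other` server entry is hypothetical.

```python
import copy
import json

# The entry shown in the configuration above.
FORGE_ENTRY = {"command": "npx", "args": ["-y", "@rightnow/forge-mcp-server"]}


def add_forge_server(config: dict) -> dict:
    """Return a copy of config with the forge entry added under mcpServers."""
    merged = copy.deepcopy(config)
    servers = merged.setdefault("mcpServers", {})
    servers["forge"] = copy.deepcopy(FORGE_ENTRY)
    return merged


# Hypothetical existing client configuration.
existing = {"mcpServers": {"other": {"command": "node", "args": ["server.js"]}}}

updated = add_forge_server(existing)
print(json.dumps(updated, indent=2))
```

Returning a copy leaves the original configuration untouched, which makes it easy to diff the two before writing anything back to disk.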

Quick Reference


Name: GitHub RightNow AI Forge MCP Server
Function: Turn PyTorch into fast CUDA/Triton kernels on real datacenter GPUs with up to 14x speedup.
Transport: Package
Language: JavaScript/TypeScript
Source: External (Registry)
License: Open Source