MCP Robot Control - Setup & Usage Guide¶
Complete guide for setting up and using natural language robot control with FastMCP and multi-LLM support.
Table of Contents¶
- Overview
- Quick Start
- Installation
- Configuration
- Usage Modes
- Available LLM Providers
- Common Tasks
- Troubleshooting
Overview¶
The Robot MCP system enables natural language control of robotic arms (Niryo Ned2, WidowX) using:
- FastMCP Server - Exposes robot control tools via HTTP/SSE
- Universal Client - Supports OpenAI, Groq, Gemini, and Ollama
- Vision System - Real-time object detection
- Web Interface - Gradio GUI with voice input
System Architecture¶
┌─────────────┐         ┌──────────────┐         ┌─────────────┐
│   Multi-    │  HTTP   │              │ Python  │             │
│     LLM     │────────►│   FastMCP    │────────►│   Niryo/    │
│  (OpenAI/   │  SSE    │    Server    │  API    │   WidowX    │
│ Groq/Gemini)│         │              │         │             │
└─────────────┘         └──────────────┘         └─────────────┘
       ▲                        │
       │ Natural Language       │ Physical
       │ Commands               │ Actions
   ┌───┴───┐              ┌─────▼────┐
   │ User  │              │ Objects  │
   └───────┘              └──────────┘
Quick Start¶
Prerequisites¶
# System requirements
- Python 3.8+
- Redis server
- Niryo Ned2 or WidowX robot (or simulation)
- At least one LLM API key (OpenAI, Groq, or Gemini) OR Ollama installed
3-Step Setup¶
Step 1: Install Dependencies
git clone https://github.com/dgaida/robot_mcp.git
cd robot_mcp
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -e .
Step 2: Configure API Keys (add them to secrets.env in the project root)
# OpenAI (best reasoning)
OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxx
# Groq (fastest, free tier available)
GROQ_API_KEY=gsk-xxxxxxxxxxxxxxxx
# Google Gemini (long context)
GEMINI_API_KEY=AIzaSy-xxxxxxxxxxxxxxxx
# Ollama - No API key needed (runs locally)
# Just install: curl -fsSL https://ollama.ai/install.sh | sh
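# After installing Ollama, pull a small local model
# (model name taken from the client examples later in this guide):
ollama pull llama3.2:1b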
Step 3: Start System
# Terminal 1: Start Redis
docker run -p 6379:6379 redis:alpine
# Terminal 2: Start FastMCP Server
python server/fastmcp_robot_server.py --robot niryo
# Terminal 3: Run Universal Client (auto-detects available API)
python client/fastmcp_universal_client.py
You're ready! The client will automatically use the first available LLM provider (priority: OpenAI > Groq > Gemini > Ollama).
Installation¶
Standard Installation¶
# Clone repository
git clone https://github.com/dgaida/robot_mcp.git
cd robot_mcp
# Create virtual environment
python -m venv venv
source venv/bin/activate
# Install package
pip install -e .
Dependencies Included¶
- fastmcp - Modern MCP implementation
- openai - OpenAI API client
- groq - Groq API client
- google-generativeai - Gemini API client
- ollama - Local LLM support
- robot-environment - Robot control (from GitHub)
- text2speech - TTS integration (from GitHub)
- gradio - Web interface
Configuration¶
Environment Variables¶
Create secrets.env in project root:
# LLM API Keys (add at least one)
OPENAI_API_KEY=sk-proj-xxxxxxxx
GROQ_API_KEY=gsk-xxxxxxxx
GEMINI_API_KEY=AIzaSy-xxxxxxxx
# Optional: ElevenLabs for better TTS
ELEVENLABS_API_KEY=your_key_here
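The client reads these keys for you. If you also want them available in your own shell (for the curl tests under Troubleshooting, for instance), one generic way to export them is:
# Load every variable from secrets.env into the current shell
# (standard shell pattern, not a project-specific command)
set -a; source secrets.env; set +a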
Server Configuration¶
Start server with options:
# Real Niryo robot
python server/fastmcp_robot_server.py --robot niryo --no-simulation
# Simulated robot
python server/fastmcp_robot_server.py --robot niryo
# WidowX robot
python server/fastmcp_robot_server.py --robot widowx --no-simulation
# Custom host/port
python server/fastmcp_robot_server.py --host 0.0.0.0 --port 8080
# Disable camera (testing)
python server/fastmcp_robot_server.py --no-camera
# Verbose logging
python server/fastmcp_robot_server.py --verbose
Client Configuration¶
Universal client with auto-detection:
# Auto-detect API (prefers OpenAI > Groq > Gemini > Ollama)
python client/fastmcp_universal_client.py
# Force specific provider
python client/fastmcp_universal_client.py --api openai --model gpt-4o
python client/fastmcp_universal_client.py --api groq
python client/fastmcp_universal_client.py --api gemini --model gemini-2.0-flash
python client/fastmcp_universal_client.py --api ollama --model llama3.2:1b
# Single command mode
python client/fastmcp_universal_client.py --command "What objects do you see?"
# Adjust parameters
python client/fastmcp_universal_client.py --temperature 0.5 --max-tokens 2048
Usage Modes¶
1. Interactive Chat Mode (Default)¶
Best for: Exploration, learning, development
python client/fastmcp_universal_client.py
🤖 ROBOT CONTROL ASSISTANT (Universal LLM)
Using: OPENAI - gpt-4o-mini
You: What objects do you see?
🔧 Calling tool: get_detected_objects
✅ Result: Detected 3 objects...
🤖 Assistant: I can see 3 objects:
1. A pencil at coordinates [0.15, -0.05]
2. A red cube at [0.20, 0.10]
3. A blue square at [0.18, -0.10]
You: Move the pencil next to the red cube
🔧 Calling tool: pick_place_object
✅ Result: Successfully picked and placed
🤖 Assistant: Done! I've placed the pencil to the right of the red cube.
# Special commands:
You: tools # List available tools
You: clear # Clear conversation history
You: switch # Switch LLM provider
You: quit # Exit
2. Single Command Mode¶
Best for: Scripting, automation, testing
# Execute one command
python client/fastmcp_universal_client.py --command "Sort objects by size"
# Batch script
#!/bin/bash
commands=(
  "What objects do you see?"
  "Move the largest object to [0.2, 0.0]"
  "Arrange all objects in a line"
)

for cmd in "${commands[@]}"; do
  python client/fastmcp_universal_client.py --command "$cmd"
  sleep 2
done
3. Gradio Web Interface¶
Best for: User-friendly interaction, demonstrations
Features:
- 💬 Chat interface with the robot
- 📹 Live camera feed with object annotations
- 🎤 Voice input (Whisper-based)
- 📊 System status monitoring
- 🔄 Switch LLM providers on the fly
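The GUI presumably starts from a single script; the path below is a guess, so check the repository for the actual entry point:
# Hypothetical script name - verify against the repo
python gradio_gui.py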
4. Example Scripts¶
Best for: Learning, templates
# Run specific example
python examples/universal_examples.py workspace_scan
# Run all examples
python examples/universal_examples.py all
# Compare LLM providers
python examples/universal_examples.py compare_providers
5. Claude Desktop Integration¶
Best for: Using with Claude's interface
Add to Claude Desktop, restart, and use tools directly in Claude!
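Claude Desktop speaks MCP over stdio, while this server exposes HTTP/SSE, so one way to connect them is the third-party mcp-remote bridge. A minimal claude_desktop_config.json sketch, assuming the server's default endpoint from this guide (the "robot" name is arbitrary, and mcp-remote is not part of this project):
{
  "mcpServers": {
    "robot": {
      "command": "npx",
      "args": ["-y", "mcp-remote", "http://127.0.0.1:8000/sse"]
    }
  }
}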
Available LLM Providers¶
Provider Comparison¶
| Provider | Function Calling | Speed | Cost | Offline | Best For |
|---|---|---|---|---|---|
| OpenAI | ✅ Excellent | Fast | $$ | ❌ | Production, complex reasoning |
| Groq | ✅ Excellent | Very Fast | Free tier | ❌ | Development, prototyping |
| Gemini | ✅ Excellent | Fast | Free tier | ❌ | Long context, multimodal |
| Ollama | ⚠️ Limited | Variable | Free | ✅ | Local testing, privacy |
Recommended Models¶
For Complex Tasks:
# OpenAI - Best reasoning
--api openai --model gpt-4o
# Groq - Fastest inference
--api groq --model moonshotai/kimi-k2-instruct-0905
For Development:
# OpenAI - Fast and cheap
--api openai --model gpt-4o-mini
# Groq - Free and fast
--api groq --model llama-3.3-70b-versatile
For Local/Offline:
# Ollama - free, runs on your machine
--api ollama --model llama3.2:1b
Provider Auto-Detection¶
If you have multiple API keys configured, the client uses this priority:
1. OpenAI (if OPENAI_API_KEY is set)
2. Groq (if GROQ_API_KEY is set)
3. Gemini (if GEMINI_API_KEY is set)
4. Ollama (fallback, no key needed)
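The same priority order, written out as a shell sketch (illustrative only; the real selection happens inside the client):
# Mirrors the client's auto-detection order
if   [ -n "$OPENAI_API_KEY" ]; then api=openai
elif [ -n "$GROQ_API_KEY"   ]; then api=groq
elif [ -n "$GEMINI_API_KEY" ]; then api=gemini
else api=ollama
fi
echo "Auto-selected provider: $api"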
Override with the --api flag:
python client/fastmcp_universal_client.py --api groq
Switching Providers¶
During Interactive Session:
You: switch
🔄 Current provider: GROQ
Available: openai, groq, gemini, ollama
Switch to: openai
✅ Switched to OPENAI - gpt-4o-mini
Common Tasks¶
Basic Operations¶
1. Scan Workspace
2. Simple Pick and Place
3. Relative Placement
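These basic operations map directly to short prompts. Illustrative phrasings, in order (they echo the Natural Language Examples later in this guide; adapt labels and coordinates to your scene):
You: What objects do you see?
You: Pick up the pencil and place it at [0.2, 0.1]
You: Move the red cube next to the blue square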
Advanced Tasks¶
4. Sort by Size
5. Create Patterns
6. Group by Color
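The advanced tasks work the same way (again illustrative, not the only phrasing that works):
You: Sort all objects by size
You: Arrange the objects in a triangle
You: Group the objects by color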
Complex Workflows¶
7. Multi-Step Task
You: Execute: 1) Find all objects 2) Move smallest to [0.15, 0.1]
3) Move largest right of smallest 4) Report positions
8. Conditional Logic
You: If you see a red cube, move it next to the blue square;
otherwise tell me what you see
9. Workspace Cleanup
You: Organize the workspace: cubes on left, cylinders in middle,
everything else on right, aligned in rows
Troubleshooting¶
Server Won't Start¶
Problem: Port 8000 already in use
# Check what's using the port
lsof -i :8000 # Linux/Mac
netstat -ano | findstr :8000 # Windows
# Kill the process
kill -9 <PID> # Linux/Mac
taskkill /PID <PID> /F # Windows
Problem: Redis connection error
# Start Redis
docker run -p 6379:6379 redis:alpine
# Or install locally
# Linux: sudo apt install redis-server
# Mac: brew install redis
Client Can't Connect¶
Problem: "Connection refused"
Solutions:
1. Verify the server is running:
   curl http://127.0.0.1:8000/sse
2. Check firewall settings
3. Ensure the server started successfully (check logs in the log/ directory)
API Key Issues¶
Problem: "Invalid API key"
Solutions:
1. Verify the API key in secrets.env
2. Test the API key directly (see the curl examples below)
3. Regenerate the key:
   - OpenAI: https://platform.openai.com/api-keys
   - Groq: https://console.groq.com/keys
   - Gemini: https://aistudio.google.com/apikey
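You can test each key outside the client with the providers' public model-listing endpoints (plain curl; these endpoints belong to the providers, not to this project):
# OpenAI
curl -s https://api.openai.com/v1/models -H "Authorization: Bearer $OPENAI_API_KEY" | head -n 5
# Groq (OpenAI-compatible API)
curl -s https://api.groq.com/openai/v1/models -H "Authorization: Bearer $GROQ_API_KEY" | head -n 5
# Gemini
curl -s "https://generativelanguage.googleapis.com/v1beta/models?key=$GEMINI_API_KEY" | head -n 5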
LLM Not Calling Tools¶
Problem: LLM responds in text only, no robot actions
Solutions:
1. Verify tools are registered (type tools in the interactive session)
2. Use specific, actionable commands ("Pick up the pencil", not "move something somewhere")
3. Try a different model; some are better at tool calling (e.g. --api openai --model gpt-4o)
No Objects Detected¶
Problem: get_detected_objects() returns empty
Solutions:
1. Move to the observation pose first
2. Check the camera:
   - Is the camera connected?
   - Is Redis running?
3. Check the camera feed in the Gradio GUI
4. Verify lighting and object visibility
5. Check the object labels
Slow Performance¶
Problem: Long response times
Solutions:
1. Use a faster model (e.g. --api groq, or --model gpt-4o-mini)
2. Clear the conversation history (type clear)
3. Reduce the detection frequency (edit the server configuration)
Common Error Messages¶
Error: "Maximum iterations reached"
β Task too complex, break into smaller steps
Error: "Object not found"
β Verify object name matches detection exactly (case-sensitive)
Error: "Coordinates out of bounds"
β Valid range: X=[0.163, 0.337], Y=[-0.087, 0.087]
Error: "Rate limit exceeded"
β Wait 60 seconds or upgrade API plan
Best Practices¶
1. Always Detect Before Manipulating¶
✅ Good: First ask "What objects do you see?"
Then use coordinates from the detection
❌ Bad: Assuming coordinates without checking
2. Use Exact Label Matching¶
✅ Good: Use the label exactly as detected, e.g. "red cube" (matching is case-sensitive)
❌ Bad: Approximate names like "the reddish block"
3. Provide Clear Instructions¶
✅ Good: "Pick up the pencil at [0.15, -0.05] and place it at [0.2, 0.1]"
❌ Bad: "Move that thing over there"
4. Check for Success¶
✅ Good: After an action, ask "Did that work?" or "Show me the result"
❌ Bad: Assuming success without verification
5. Use Safe Placement¶
✅ Good: "Place the object in a safe location" (the LLM will find free space)
❌ Bad: Hard-coded coordinates that might collide
Quick Reference¶
Essential Commands¶
# Start system
docker run -p 6379:6379 redis:alpine
python server/fastmcp_robot_server.py --robot niryo
python client/fastmcp_universal_client.py
# Test connection
curl http://127.0.0.1:8000/sse
# View logs
tail -f log/mcp_server_*.log
# Stop server
# Press Ctrl+C in server terminal
Natural Language Examples¶
"What objects do you see?"
"Pick up the pencil"
"Move the red cube next to the blue square"
"Sort all objects by size"
"Arrange objects in a triangle"
"What's the largest object?"
"Place the smallest object in the center"
Interactive Commands¶
tools - List available tools
clear - Clear conversation history
switch - Switch LLM provider
quit - Exit interactive mode
🎯 Use Cases¶
1. Research & Development¶
- Rapid prototyping of robot behaviors
- Testing manipulation strategies
- Human-robot interaction studies
2. Education¶
- Teaching robotics concepts
- Demonstrating AI integration
- Student projects
3. Industrial Automation¶
- Pick-and-place tasks
- Quality control sorting
- Assembly line operations
4. Warehouse & Logistics¶
- Object sorting
- Inventory management
- Package handling
5. Assistive Robotics¶
- Object retrieval
- Workspace organization
- Personalized assistance
Getting Help¶
Resources:
- API Reference & Architecture - complete API documentation
- GitHub Issues - report bugs
- Example Scripts - see working examples
- MCP Documentation: https://modelcontextprotocol.io
Before Opening an Issue:
- [ ] Redis is running
- [ ] Server started successfully (check logs)
- [ ] At least one API key configured
- [ ] Client can connect (test with curl)
- [ ] Objects are visible to camera
- [ ] Tried examples first
Next Steps¶
- Complete the Quick Start setup
- Try interactive mode with basic commands
- Run the example scripts to see capabilities
- Explore different LLM providers
- Try the Gradio web interface
- Review the API Reference for advanced features
- Create your own automation scripts
Happy robot commanding! 🤖✨