Talks and presentations

Natural-Language Robot Manipulation via MCP: An Integrated Framework for Vision-Guided Pick-and-Place Automation

March 20, 2026

Conference proceedings talk, 9th Machine Learning for Cyber Physical Systems (ML4CPS) Conference, Berlin, Germany

This work presents a unified software framework for controlling robotic manipulators through unconstrained natural language, built around the Model Context Protocol (MCP). Users issue commands via text, voice, or a web GUI; a large language model interprets each command and decomposes it into structured tool calls, which a FastMCP server executes. The system integrates real-time open-vocabulary object detection, instance segmentation, and workspace coordinate transforms to ground language in the physical scene, enabling complex pick-and-place operations. A Redis-based pub/sub architecture decouples perception, reasoning, and control into independent processes, while a platform-agnostic hardware abstraction supports both the Niryo Ned2 and WidowX-250 robotic arms. Evaluation across eight diverse manipulation tasks, including multi-step operations, spatial reasoning, and multilingual instructions, achieved an 83% success rate on a Niryo Ned2; the primary bottleneck was open-vocabulary perception rather than LLM reasoning or motion control. All components are publicly available, providing a reproducible foundation for natural-language interfaces to cyber-physical systems.
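As a rough illustration of how structured tool calls and a workspace coordinate transform might fit together, the following is a minimal, standard-library-only sketch. It does not use the FastMCP or Redis APIs and is not the framework's actual code: the `Workspace` bounds, the `DETECTIONS` table, the tool names `pick_object` and `place_object`, and the call format are all assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Workspace:
    """Hypothetical rectangular workspace in metres (robot base frame)."""
    x_min: float
    x_max: float
    y_min: float
    y_max: float

    def rel_to_world(self, u: float, v: float) -> Tuple[float, float]:
        """Map relative image coordinates in [0, 1] to workspace metres
        by linear interpolation (assumes an axis-aligned camera view)."""
        if not (0.0 <= u <= 1.0 and 0.0 <= v <= 1.0):
            raise ValueError("relative coordinates must lie in [0, 1]")
        x = self.x_min + u * (self.x_max - self.x_min)
        y = self.y_min + v * (self.y_max - self.y_min)
        return (x, y)


# Registry of callable tools, keyed by name (a stand-in for an MCP server).
TOOLS: Dict[str, Callable[..., str]] = {}


def tool(fn: Callable[..., str]) -> Callable[..., str]:
    """Register a function so it can be invoked by name."""
    TOOLS[fn.__name__] = fn
    return fn


# Assumed values: workspace bounds and detector output are invented here.
WS = Workspace(x_min=0.10, x_max=0.40, y_min=-0.15, y_max=0.15)
DETECTIONS = {"red cube": (0.5, 0.5), "pen": (0.0, 1.0)}  # label -> (u, v)


@tool
def pick_object(label: str) -> str:
    """Ground a label in the scene and report the pick position."""
    u, v = DETECTIONS[label]
    x, y = WS.rel_to_world(u, v)
    return f"pick {label} at ({x:.3f}, {y:.3f})"


@tool
def place_object(x: float, y: float) -> str:
    """Report the place position given in workspace metres."""
    return f"place at ({x:.3f}, {y:.3f})"


def execute(calls: List[dict]) -> List[str]:
    """Run a list of structured calls of the form {"tool": ..., "args": {...}}."""
    results = []
    for call in calls:
        fn = TOOLS.get(call["tool"])
        if fn is None:
            raise KeyError(f"unknown tool: {call['tool']}")
        results.append(fn(**call["args"]))
    return results
```

Under these assumptions, an instruction such as "put the red cube to the right" could be decomposed by the language model into `execute([{"tool": "pick_object", "args": {"label": "red cube"}}, {"tool": "place_object", "args": {"x": 0.3, "y": 0.1}}])`. In the framework described above, this dispatch would be handled by the MCP server, and the detections would come from the perception process rather than a fixed table.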

Entwicklung intelligenter autonomer Assistenten (Development of Intelligent Autonomous Assistants)

June 28, 2025

Tutorial, Konrad-Adenauer-Stiftung: 22. Fachschaftstreffen MINT - Zukunft gestalten durch KI, Dresden, Germany

The tutorial provides an overview of the development of intelligent autonomous assistants, focusing on how multimodal AI technologies, such as large language models, computer vision, and robotics, enable systems to understand language, perceive their environment, plan actions, and manipulate objects. It discusses practical applications in Industry 5.0, logistics, laboratories, smart homes, and agriculture, and contrasts modular system designs with end-to-end vision-language-action models. Finally, it highlights current research challenges, including data scarcity, simulation-to-reality transfer, system speed, and robustness in complex real-world environments.

Synergizing Language Models and Biogas Plant Control: A GPT-4 Approach

June 04, 2024

Talk, 8th IWA World Conference on Anaerobic Digestion, Istanbul, Turkey

This study investigates the use of the large language model GPT-4 as a controller to optimize substrate feed in an agricultural anaerobic co-digestion plant. Given specific objectives, including a target methane production rate, GPT-4 draws on plant parameters, substrate characteristics, and real-time process data. The model formulates substrate feed recommendations and gives a transparent rationale for each decision. To evaluate its effectiveness, a simulation model of an agricultural anaerobic co-digestion plant based on the Anaerobic Digestion Model No. 1 is employed. Initial findings suggest that GPT-4 regulates substrate feed effectively, keeping methane production rates near the predefined targets. Crucially, the explanations provided by GPT-4 are comprehensible. The accompanying code will be made available for further investigation.

Daniel Gaida
