I am working on a framework that enables robots to follow natural-language instructions by combining large language models with real-time perception, spatial reasoning, and robot control through a modular tool-based system.
In this project, we are developing an intelligent virtual operations assistant based on a multimodal AI model that automatically optimizes substrate feed in biogas plants and enables flexible, efficient electricity feed-in. Model predictive control, AI-supported state estimation, and a language-based user interface simplify plant operation and make it more sustainable. The approach is validated in real biogas plants.
Autonomous exploration, open-vocabulary object recognition, and safe grasping have been combined on a compact mobile robot, enabling it to independently map unknown rooms, locate user-specified items, and pick them up. Field tests showed the frontier-guided strategy finds and grasps objects faster and more reliably than simpler straight-line searches, even in cluttered indoor spaces.
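To make the frontier idea concrete, here is a minimal sketch of frontier detection on a 2D occupancy grid. It assumes a common textbook definition (a frontier is a free cell next to an unknown cell) and picks the nearest frontier by Manhattan distance. The cell encoding and the function names `find_frontiers` and `nearest_frontier` are illustrative assumptions, not the project's actual implementation.

```python
# Hypothetical cell encoding for this sketch; the real map format is not specified.
FREE, OCCUPIED, UNKNOWN = 0, 1, -1


def find_frontiers(grid):
    """Return all free cells that touch at least one unknown cell (4-neighbourhood)."""
    rows, cols = len(grid), len(grid[0])
    frontiers = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != FREE:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == UNKNOWN:
                    frontiers.append((r, c))
                    break  # one unknown neighbour is enough
    return frontiers


def nearest_frontier(grid, robot):
    """Pick the frontier cell closest to the robot by Manhattan distance, or None."""
    frontiers = find_frontiers(grid)
    if not frontiers:
        return None  # nothing left to explore
    return min(frontiers, key=lambda cell: abs(cell[0] - robot[0]) + abs(cell[1] - robot[1]))


if __name__ == "__main__":
    grid = [
        [FREE, FREE, UNKNOWN],
        [FREE, OCCUPIED, UNKNOWN],
        [FREE, FREE, FREE],
    ]
    print(find_frontiers(grid))
    print(nearest_frontier(grid, robot=(2, 0)))
```

A real system would typically plan a collision-free path to the chosen frontier rather than rely on Manhattan distance, which ignores obstacles.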
A privacy-preserving AI document assistant was built to let students find and understand their scattered digital course materials. After systematically testing 56 local RAG-pipeline setups, the best combination of smart parsers, dense retrieval and reranking rivaled cloud services without sending data off campus.
A voice-controlled web tool has been created that lets users manage Discord events and channels through everyday speech, automatically converting phrases like “tomorrow 3 pm” into the exact timestamps Discord needs. The system links large language models to Discord’s servers via the open Model Context Protocol, hides the technical details behind a simple browser interface, and can be used from any device on the network.
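The phrase-to-timestamp step can be illustrated with a small sketch. It assumes the target format is an ISO 8601 string and handles only phrases of the form "today/tomorrow H[:MM] am/pm". The function name `phrase_to_iso` and the regular expression are hypothetical. The tool described above may well delegate this step to the language model or a dedicated date-parsing library.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical, deliberately narrow grammar: "<today|tomorrow> <hour>[:<minute>] <am|pm>"
_DAY_OFFSETS = {"today": 0, "tomorrow": 1}
_PATTERN = re.compile(
    r"^(today|tomorrow)\s+(\d{1,2})(?::(\d{2}))?\s*(am|pm)$", re.IGNORECASE
)


def phrase_to_iso(phrase, now):
    """Convert a phrase like 'tomorrow 3 pm' into an ISO 8601 timestamp.

    `now` must be a timezone-aware datetime; the result keeps its timezone.
    """
    match = _PATTERN.match(phrase.strip())
    if match is None:
        raise ValueError(f"unsupported phrase: {phrase!r}")
    day_word, hour_text, minute_text, meridiem = match.groups()
    hour = int(hour_text)
    minute = int(minute_text or 0)
    if not 1 <= hour <= 12 or minute > 59:
        raise ValueError(f"invalid time in phrase: {phrase!r}")
    hour %= 12  # 12 am -> 0, 12 pm -> 12 after the adjustment below
    if meridiem.lower() == "pm":
        hour += 12
    target_day = now + timedelta(days=_DAY_OFFSETS[day_word.lower()])
    target = target_day.replace(hour=hour, minute=minute, second=0, microsecond=0)
    return target.isoformat()


if __name__ == "__main__":
    reference = datetime(2025, 1, 14, 10, 30, tzinfo=timezone.utc)
    print(phrase_to_iso("tomorrow 3 pm", reference))
```

Passing the reference time explicitly keeps the conversion deterministic and easy to test; a production version would also need to respect the user's local timezone.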
A set of web apps has been built to digitalize restaurant processes. Guests can scan a QR code to order and pay at the table, reserve seats and preorder meals, while staff track all tables in real time.
An AlphaZero-style AI for the two-player board game Blokus Duo was built and pitted against hand-crafted Minimax and MCTS opponents. Although ten rounds of self-training did not yet surpass the classical agents, the neural network noticeably boosted MCTS move quality, indicating clear room for further improvement.
Four vector databases (FAISS, Chroma, Qdrant, Weaviate) and four embedding models were benchmarked on one million text snippets for speed, accuracy and memory use.
Local open-source language models running on an Nvidia Jetson board were timed and graded on how well they wrote Python code for robot-arm tasks. The 1.5-billion-parameter Qwen2.5-Coder delivered the fastest, most reliable code and was further tested in larger and quantized versions.
A chatbot was built to make university exam rules easy to query by voice or text. It uses Retrieval-Augmented Generation, instantly quoting the official TH Köln exam regulations and North Rhine-Westphalia higher-education law.
An AI chatbot using Retrieval-Augmented Generation was built to turn TH Köln’s sprawling website into easy, conversation-style answers. Tests show it cuts search time, especially for tricky topics like exam rules, while keeping answers factually reliable and hallucinations low.
A fast machine-learning “meta-model” was trained with active-learning techniques to imitate Europe’s slow groundwater-exposure model for pesticides. The resulting CatBoost committee predicts concentrations in real time with under 0.7% error, enabling instant, low-cost screening of crop-protection products.
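One common way to combine a model committee with active learning is query-by-committee: the next simulation is run where the committee members disagree most. The sketch below assumes that scheme and uses plain Python callables as stand-ins for trained regressors. The source does not state which selection rule was used, and `select_most_uncertain` is a hypothetical name.

```python
from statistics import pvariance


def select_most_uncertain(candidates, committee, batch_size=1):
    """Return indices of the candidates on which the committee disagrees most.

    `committee` is a list of callables mapping an input to a numeric prediction
    (stand-ins for trained models in this sketch). Disagreement is measured as
    the population variance of the members' predictions.
    """
    scored = []
    for index, x in enumerate(candidates):
        predictions = [model(x) for model in committee]
        scored.append((pvariance(predictions), index))
    # Highest variance first; ties are broken by the lower index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for _, index in scored[:batch_size]]


if __name__ == "__main__":
    # Toy committee: three hand-written functions instead of fitted regressors.
    committee = [lambda x: x, lambda x: 2 * x, lambda x: x * x]
    candidates = [0.0, 1.0, 3.0]
    print(select_most_uncertain(candidates, committee, batch_size=2))
```

In an actual loop, the selected inputs would be evaluated with the slow reference model, added to the training set, and the committee retrained before the next selection round.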
This study investigates the use of the large language model GPT-4 as a controller to optimize substrate feed in an agricultural anaerobic co-digestion plant. Given specific objectives, such as a target methane production rate, GPT-4 draws on plant parameters, substrate characteristics, and real-time process data. The model formulates substrate feed recommendations and gives transparent rationales for its decisions. To evaluate its effectiveness, a simulation model of an agricultural anaerobic co-digestion plant based on the Anaerobic Digestion Model No. 1 (ADM1) is employed. Initial findings suggest that GPT-4 regulates substrate feed effectively, keeping methane production rates near the predefined targets. Crucially, the explanations provided by GPT-4 are comprehensible. The accompanying code will be made available for further investigation.
The tutorial gives an overview of the development of intelligent autonomous assistants, focusing on how multimodal AI technologies (such as large language models, computer vision, and robotics) enable systems to understand language, perceive their environment, plan actions, and manipulate objects. It discusses practical applications in Industry 5.0, logistics, laboratories, smart homes, and agriculture, and contrasts modular system designs with end-to-end vision-language-action models. Finally, it highlights current research challenges, including data scarcity, simulation-to-reality transfer, system speed, and robustness in complex real-world environments.