I am working on a framework that enables robots to follow natural-language instructions by combining large language models with real-time perception, spatial reasoning, and robot control through a modular tool-based system.
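A modular tool-based design like this can be illustrated with a minimal dispatcher that maps a structured tool call emitted by the language model onto registered Python functions. This is a sketch under assumptions. The JSON shape (`tool` and `args` fields) and the tool names `detect_objects` and `move_to` are hypothetical placeholders, not the framework's actual interface.

```python
import json
from typing import Any, Callable, Dict

# Registry of callable tools; all tool names here are hypothetical placeholders.
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function so the language model can call it by name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def detect_objects(label: str) -> list:
    # Stand-in for a perception call; returns a fake detection.
    return [{"label": label, "x": 0.25, "y": -0.10}]


@tool
def move_to(x: float, y: float) -> str:
    # Stand-in for a motion command sent to the robot controller.
    return f"moving to ({x:.2f}, {y:.2f})"


def dispatch(llm_output: str) -> Any:
    """Parse a JSON tool call produced by the LLM and execute the matching tool."""
    call = json.loads(llm_output)
    name = call["tool"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**call.get("args", {}))


if __name__ == "__main__":
    print(dispatch('{"tool": "move_to", "args": {"x": 0.25, "y": -0.1}}'))
```

Each capability sits behind a named function, so perception or control tools can be added or swapped without changing the language-model side.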
In this project, we are developing an intelligent virtual operations assistant based on a multimodal AI model that automatically optimizes the substrate feed of biogas plants and enables flexible, efficient feed-in of electricity to the grid. Model predictive control, AI-supported state estimation, and a language-based user interface simplify plant operation and make it more sustainable, and the system is validated in real biogas plants.
Vehicle localization for autonomous driving is achieved by fusing GNSS, IMU, and LiDAR data with an Unscented Kalman Filter (UKF). The UKF’s performance is evaluated in the CARLA driving simulator and compared with an Error-State Extended Kalman Filter (ESEKF). Under sensor noise and signal loss, the UKF estimates position and orientation more accurately, indicating that it provides a more reliable solution for non‑linear vehicle state estimation.
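The core of a UKF is the unscented transform. An EKF linearizes a non-linear function. A UKF instead propagates a small set of deterministically chosen sigma points and recovers the mean and variance from weighted sums. The sketch below shows this for a scalar state only. The parameter values (α = 1, β = 2, κ = 3 − n) are common textbook defaults assumed here. A vehicle UKF would use the multivariate form with a matrix square root, such as a Cholesky factor.

```python
import math
from typing import Callable, Tuple


def unscented_transform(mean: float, var: float, f: Callable[[float], float],
                        alpha: float = 1.0, beta: float = 2.0,
                        kappa: float = 2.0) -> Tuple[float, float]:
    """Propagate a 1-D Gaussian (mean, var) through a non-linear function f."""
    n = 1  # state dimension (scalar here for brevity; kappa = 3 - n)
    lam = alpha ** 2 * (n + kappa) - n
    spread = math.sqrt((n + lam) * var)
    sigma = [mean, mean + spread, mean - spread]

    # Weights for the mean (wm) and for the covariance (wc).
    wm = [lam / (n + lam)] + [1.0 / (2 * (n + lam))] * 2 * n
    wc = [wm[0] + (1 - alpha ** 2 + beta)] + wm[1:]

    ys = [f(s) for s in sigma]  # push every sigma point through f
    y_mean = sum(w * y for w, y in zip(wm, ys))
    y_var = sum(w * (y - y_mean) ** 2 for w, y in zip(wc, ys))
    return y_mean, y_var
```

For a linear function the transform reproduces the exact mean and variance. For `f(x) = x * x` it returns the exact mean m² + σ², which a first-order linearization would miss.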
A YOLOv8 model was trained to detect fire extinguishers and integrated into the TurtleBot 4 platform, enabling real‑time object recognition. Two deployment strategies were evaluated: processing on the onboard Raspberry Pi, which proved too slow, and leveraging the OAK‑D‑PRO camera’s on‑chip inference, which delivered a substantial performance boost. Additionally, a laser‑scan based navigation algorithm was implemented, allowing the robot to autonomously explore unknown environments.
A coloring assistant based on a generative diffusion model is developed to automatically color manga line drawings. Its performance is evaluated against existing models using PSNR, MS‑SSIM and FID, showing improved fidelity in eye and hair colors and closer adherence to original sketches, while noting some residual artifacts that can be reduced by scaling the model. The results demonstrate that diffusion‑based approaches can deliver high‑quality coloring on consumer‑grade hardware.
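PSNR, the first of the metrics named above, compares a generated coloring with a reference image through the mean squared error. This is a minimal sketch, assuming both images are given as equally long flat lists of 8-bit channel values. MS-SSIM and FID are considerably more involved and are usually taken from an existing library.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], test: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    if len(reference) != len(test):
        raise ValueError("images must have the same number of values")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```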
A reinforcement-learning approach based on Deep Q-Learning is developed to optimise strategies for competing robots in a multi-agent Microservice Dungeon scenario. A series of experiments shows that the agent’s performance improves with tailored reward functions, hyper-parameter tuning, and adjusted observations, while the simplified environment and limited action space constrain its capabilities. The findings indicate that more advanced policy-based methods and graph neural networks are needed to overcome these limitations and further improve strategic decision-making.
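Deep Q-Learning trains the network toward Bellman targets computed with a separate target network. The following is a minimal sketch of that target computation for a batch of transitions. The discount factor and the example values are chosen arbitrarily for illustration, and the network and replay buffer are omitted.

```python
from typing import List, Sequence


def dqn_targets(rewards: Sequence[float],
                next_q_values: Sequence[Sequence[float]],
                dones: Sequence[bool], gamma: float = 0.99) -> List[float]:
    """Bellman targets y = r + gamma * max_a' Q_target(s', a') for a batch."""
    targets = []
    for r, q_next, done in zip(rewards, next_q_values, dones):
        # Terminal transitions do not bootstrap from the next state.
        bootstrap = 0.0 if done else gamma * max(q_next)
        targets.append(r + bootstrap)
    return targets
```

The online network is then updated to reduce the squared error between Q(s, a) and these targets.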
An AI‑powered browser plugin was developed to automatically generate context‑aware alternative text for images on webpages, thereby improving web accessibility for users with visual impairments. In evaluations, the AI was found capable of deciding when an alt‑text is needed and of producing more accurate descriptions by incorporating surrounding page text, although final verification by a human remains necessary.
A comparative study of three motion‑planning algorithms—PRM, CHOMP, and the convex‑set‑based GCS—was conducted for a six‑degree‑of‑freedom robotic manipulator in simulated environments. The algorithms were integrated into the ROS 2/MoveIt 2 framework and benchmarked in scenarios such as a table, a narrow passage, and a bookshelf, revealing that GCS can achieve rapid planning when correctly configured, while PRM is asymptotically optimal but computationally heavy and CHOMP provides fast local refinement. The results highlight trade‑offs between optimality, computation time, and robustness for robotic arm trajectory planning.
The suitability of Neural Radiance Fields and Gaussian Splatting for visual‑only localisation and navigation of indoor mobile robots is investigated. A Monte‑Carlo particle filter that compares camera images with synthetically rendered views is implemented on a TurtleBot and evaluated at a minimum update rate of 1 Hz, demonstrating that accurate global localisation can be achieved without prior pose information, though occasional localisation errors remain. The results suggest that a fully visual, low‑cost navigation system based on these radiance‑field models is feasible for real‑time robot operation.
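A Monte Carlo particle filter alternates a motion update with a measurement update. In the measurement update, each pose hypothesis is weighted by how well its rendered view matches the camera image. The sketch below covers only the measurement update with systematic resampling. The `similarity` callback is a hypothetical stand-in for the rendering-and-comparison step, and the motion update is omitted.

```python
import random
from typing import Callable, List, Sequence, Tuple

Pose = Tuple[float, float, float]  # (x, y, yaw)


def weight_and_resample(particles: Sequence[Pose],
                        similarity: Callable[[Pose], float],
                        rng: random.Random) -> List[Pose]:
    """Measurement update of Monte Carlo localization with systematic resampling."""
    # similarity(pose) is a hypothetical stand-in for comparing the live camera
    # image with a view rendered from the radiance-field model at that pose.
    weights = [max(similarity(p), 0.0) for p in particles]
    n, total = len(particles), sum(weights)
    if total == 0.0:
        weights = [1.0 / n] * n  # no usable evidence: keep a uniform belief
    else:
        weights = [w / total for w in weights]

    # Systematic resampling: one random offset, then n evenly spaced pointers.
    step = 1.0 / n
    pointer = rng.uniform(0.0, step)
    resampled: List[Pose] = []
    i, cumulative = 0, weights[0]
    for _ in range(n):
        while pointer >= cumulative and i < n - 1:
            i += 1
            cumulative += weights[i]
        resampled.append(particles[i])
        pointer += step
    return resampled
```

Systematic resampling draws only one random number per update, which keeps the variance of the resampling step low compared with drawing n independent samples.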
The thesis extends the adversarially robust network intrusion detection system Apollon by integrating additional machine-learning classifiers and by replacing its original multi-armed-bandit selector with alternative selection heuristics such as epsilon-greedy and Thompson sampling. The enhanced system is evaluated on the CIC-IDS-2017 dataset. There, the expanded model pool markedly improves detection accuracy and resilience against black-box adversarial attacks, and the new heuristics provide occasional robustness gains.
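Epsilon-greedy and Thompson sampling are both strategies for the multi-armed bandit problem. Here they choose which classifier in the pool handles the next input. The sketch below assumes a binary reward of 1.0 if the chosen classifier's prediction was correct and 0.0 otherwise. How Apollon actually defines its reward is not stated here, so that part is an assumption.

```python
import random
from typing import List


class EpsilonGreedy:
    """Pick the best-scoring classifier, exploring with probability epsilon."""

    def __init__(self, n_arms: int, epsilon: float, rng: random.Random):
        self.counts = [0] * n_arms
        self.values = [0.0] * n_arms  # running mean reward per classifier
        self.epsilon, self.rng = epsilon, rng

    def select(self) -> int:
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))  # explore
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, arm: int, reward: float) -> None:
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


class ThompsonSampling:
    """Beta-Bernoulli Thompson sampling over classifiers."""

    def __init__(self, n_arms: int, rng: random.Random):
        self.alpha = [1.0] * n_arms  # 1 + number of correct predictions
        self.beta = [1.0] * n_arms   # 1 + number of wrong predictions
        self.rng = rng

    def select(self) -> int:
        # Sample a plausible accuracy for each classifier and pick the best.
        samples: List[float] = [self.rng.betavariate(a, b)
                                for a, b in zip(self.alpha, self.beta)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm: int, reward: float) -> None:
        # reward is 1.0 for a correct classification, 0.0 otherwise
        self.alpha[arm] += reward
        self.beta[arm] += 1.0 - reward
```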
A Retrieval‑Augmented Generation (RAG) system was built to extend a large language model with external ROS‑specific knowledge, using Python, LangChain, a Chroma vector database, Ollama for local model execution, and a Streamlit web interface. The system’s answers were compared with those of the plain language model, and a scoring‑based evaluation showed that RAG can produce noticeably better responses for certain question types and when sufficient domain data are available.
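The retrieval step of a RAG system ranks stored text chunks by how similar their embeddings are to the query embedding. It then prepends the best matches to the prompt. The following is a minimal, library-free sketch in which toy two-dimensional vectors stand in for real embeddings. It does not use the LangChain or Chroma APIs, and the prompt wording is an assumption.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: Sequence[float], store: Dict[str, Sequence[float]],
             k: int = 2) -> List[Tuple[str, float]]:
    """Return the k chunks whose embeddings are most similar to the query."""
    scored = [(chunk, cosine(query_vec, vec)) for chunk, vec in store.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


def build_prompt(question: str, contexts: List[Tuple[str, float]]) -> str:
    """Prepend the retrieved ROS documentation chunks to the user question."""
    context = "\n".join(f"- {chunk}" for chunk, _ in contexts)
    return (f"Answer using only the context below.\n"
            f"Context:\n{context}\n\nQuestion: {question}")
```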
A chatbot was built for the publicly accessible website of the Technical University of Cologne, employing a Retrieval‑Augmented Generation approach that combines semantic vector search on crawled web content with a large language model (LLaMA 3) to generate natural‑language answers and cite sources. The system was prototyped with a Gradio web interface, hosted on Groq, and evaluated by comparing several LLMs on answer quality and clarity. The results demonstrate that a data‑driven assistant can meaningfully improve information access for university users.
The project investigates how machine‑learning and deep‑learning techniques can forecast nitrous‑oxide (N₂O) emissions from wastewater‑treatment plants, comparing mechanistic and data‑driven models on real‑world process data. Advanced feature‑engineering, temporal cross‑validation, and model‑interpretability methods (e.g., SHAP, permutation importance) are applied to evaluate the predictive performance and robustness of algorithms such as XGBoost, Random Forest, k‑NN, and neural networks. The results show that selected ML models can reliably predict N₂O emissions, offering a practical basis for emission‑monitoring soft sensors in treatment facilities.
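Temporal cross-validation differs from ordinary k-fold cross-validation in one key respect. Every test block lies strictly after its training data, so no future process measurements leak into training. The following is a minimal expanding-window sketch, similar in spirit to scikit-learn's `TimeSeriesSplit`. The split sizes in the test are arbitrary.

```python
from typing import Iterator, List, Tuple


def expanding_window_splits(n_samples: int, n_splits: int,
                            test_size: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists where test always follows train in time."""
    first_test = n_samples - n_splits * test_size
    if first_test <= 0:
        raise ValueError("not enough samples for the requested splits")
    for k in range(n_splits):
        start = first_test + k * test_size
        train = list(range(start))                     # all past observations
        test = list(range(start, start + test_size))   # the next time block
        yield train, test
```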
A modular Android application was created that combines on‑device image segmentation (YOLO‑seg) with a quantized large language model (Gemma) via a flexible JSON interface, enabling offline multimodal scene analysis for visually impaired users. The system was evaluated against a hybrid server‑assisted version, confirming that fully local processing is technically feasible on a standard smartphone, while the limited context window of the on‑device LLM was identified as the primary performance bottleneck, requiring intelligent prompt‑reduction strategies.
Autonomous exploration, open-vocabulary object recognition, and safe grasping have been combined on a compact mobile robot, enabling it to independently map unknown rooms, locate user-specified items, and pick them up. Field tests showed that the frontier-guided exploration strategy finds and grasps objects faster and more reliably than simpler straight-line searches, even in cluttered indoor spaces.
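Frontier-guided exploration drives the robot toward the boundary between mapped free space and unexplored space. The sketch below shows frontier detection on an occupancy grid, assuming the ROS `OccupancyGrid` value convention (-1 unknown, 0 free, 100 occupied). The summary does not specify how frontiers are clustered or which one the robot visits first, so that is left out.

```python
from typing import List, Tuple

UNKNOWN, FREE, OCCUPIED = -1, 0, 100  # ROS OccupancyGrid value convention


def frontier_cells(grid: List[List[int]]) -> List[Tuple[int, int]]:
    """Return free cells that border at least one unknown cell (4-connected)."""
    rows, cols = len(grid), len(grid[0])
    frontiers = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != FREE:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == UNKNOWN:
                    frontiers.append((r, c))
                    break  # one unknown neighbour is enough
    return frontiers
```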
A neural‑network model was trained with the low‑code platform Edge Impulse to recognise volatile organic compounds (VOCs) from data gathered by a cheap metal‑oxide gas sensor. The model was then deployed on an Arduino Nicla Sense ME microcontroller, enabling on‑device detection and classification of VOC sources such as cigarette smoke, 3D‑printer emissions and cooking fumes without any network connection. The results demonstrate that inexpensive embedded hardware can reliably identify common indoor VOCs, while also highlighting the limits of sensor sensitivity and the need for careful measurement conditions.
A privacy-preserving AI document assistant was built to help students find and understand their scattered digital course materials. After 56 local RAG-pipeline configurations were systematically tested, the best combination of document parsers, dense retrieval, and reranking rivaled cloud services without sending any data off campus.
A voice-controlled web tool has been created that lets users manage Discord events and channels through everyday speech, automatically converting phrases like “tomorrow 3 pm” into the exact timestamps Discord needs. The system links large language models to Discord’s servers via the open Model Context Protocol, hides the technical details behind a simple browser interface, and can be used from any device on the network.
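The conversion of a relative phrase into an exact timestamp can be sketched deterministically. The example below is a hypothetical helper, not the tool's implementation. It understands only the pattern `today|tomorrow <hour> am|pm` and takes the timezone from the reference time. It returns an ISO 8601 string, the format Discord's API uses for scheduled-event start times, together with Discord's `<t:UNIX:F>` timestamp markup for chat messages.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical mini-grammar: "today|tomorrow <hour> am|pm".
PATTERN = re.compile(r"(today|tomorrow)\s+(\d{1,2})\s*(am|pm)", re.IGNORECASE)


def resolve_phrase(phrase: str, now: datetime) -> datetime:
    """Turn e.g. 'tomorrow 3 pm' into an absolute, timezone-aware datetime."""
    match = PATTERN.fullmatch(phrase.strip())
    if not match:
        raise ValueError(f"unsupported phrase: {phrase!r}")
    day_word, hour_text, meridiem = match.groups()
    hour12 = int(hour_text)
    if not 1 <= hour12 <= 12:
        raise ValueError(f"hour out of range: {hour12}")
    # 12 am -> 0, 12 pm -> 12, 3 pm -> 15
    hour = hour12 % 12 + (12 if meridiem.lower() == "pm" else 0)
    offset_days = 1 if day_word.lower() == "tomorrow" else 0
    day = now.date() + timedelta(days=offset_days)
    return datetime(day.year, day.month, day.day, hour, tzinfo=now.tzinfo)


def discord_formats(when: datetime) -> dict:
    """ISO 8601 string for the API and <t:UNIX:F> markup for chat messages."""
    unix = int(when.timestamp())
    return {"iso8601": when.isoformat(), "markup": f"<t:{unix}:F>"}


if __name__ == "__main__":
    tz = timezone(timedelta(hours=2))  # assumed fixed UTC+2 offset
    start = resolve_phrase("tomorrow 3 pm", datetime(2024, 5, 10, 9, 30, tzinfo=tz))
    print(discord_formats(start))
```

A real assistant would need a much broader grammar, or would let the language model produce the structured date. The final conversion to a timezone-aware timestamp is the step that has to be exact.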
A set of web apps has been built to digitize restaurant workflows. Guests can reserve seats and preorder meals, or scan a QR code to order and pay at the table, while staff track all tables in real time.
An AlphaZero-style AI for the two-player board game Blokus Duo was built and pitted against hand-crafted Minimax and MCTS opponents. Although ten rounds of self-training did not yet surpass the classical agents, the neural network noticeably boosted MCTS move quality, indicating clear room for further improvement.
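In AlphaZero-style search, the network's policy output enters MCTS through the PUCT selection rule. This rule trades off the mean backed-up value Q(s, a) against an exploration bonus U(s, a) = c_puct · P(s, a) · √N(s) / (1 + N(s, a)). The following is a minimal sketch of that rule. The constant `c_puct = 1.5` is an assumed value, not the one used in the described experiments.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class Edge:
    prior: float             # P(s, a) from the policy head
    visits: int = 0          # N(s, a)
    value_sum: float = 0.0   # W(s, a), sum of backed-up values

    @property
    def q(self) -> float:
        """Mean action value Q(s, a); 0.0 for unvisited moves."""
        return self.value_sum / self.visits if self.visits else 0.0


def select_move(edges: Dict[str, Edge], c_puct: float = 1.5) -> str:
    """AlphaZero-style PUCT: argmax over moves of Q(s, a) + U(s, a)."""
    total = sum(e.visits for e in edges.values())  # N(s)

    def score(e: Edge) -> float:
        u = c_puct * e.prior * math.sqrt(total) / (1 + e.visits)
        return e.q + u

    return max(edges, key=lambda move: score(edges[move]))
```

A better prior from the policy network steers visits toward promising moves early, which is one way a network can improve MCTS move quality even before it beats the classical agents outright.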
Four vector databases (FAISS, Chroma, Qdrant, Weaviate) and four embedding models were benchmarked on one million text snippets for speed, accuracy and memory use.
Local open-source language models running on an Nvidia Jetson board were timed and graded on how well they wrote Python code for robot-arm tasks. The 1.5-billion-parameter Qwen2.5-Coder delivered the fastest and most reliable code and was further tested in larger and quantized versions.
A chatbot was built to make university exam rules easy to query by voice or text. It uses Retrieval-Augmented Generation, instantly quoting the official TH Köln exam regulations and North Rhine-Westphalia higher-education law.
A miniature 4×5 turn-based grid world has been created where an AI robot must reach goal squares while a chasing ghost tries to catch it. Random “slip” moves and walls add uncertainty, making the Gymnasium-based environment a compact test-bed for reinforcement-learning algorithms.
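The slip mechanic can be sketched as a transition function of the kind a Gymnasium environment's `step()` method might call. The action encoding, the perpendicular-slip rule, and the 4 × 5 row/column orientation are assumptions for illustration. The ghost's movement and the reward are omitted.

```python
import random
from typing import FrozenSet, Tuple

Cell = Tuple[int, int]
ROWS, COLS = 4, 5
MOVES = {0: (-1, 0), 1: (0, 1), 2: (1, 0), 3: (0, -1)}  # up, right, down, left


def move(cell: Cell, action: int, walls: FrozenSet[Cell],
         slip_prob: float, rng: random.Random) -> Cell:
    """Apply an action; with probability slip_prob the robot slips sideways."""
    if rng.random() < slip_prob:
        action = (action + rng.choice((1, 3))) % 4  # perpendicular direction
    dr, dc = MOVES[action]
    nr, nc = cell[0] + dr, cell[1] + dc
    if not (0 <= nr < ROWS and 0 <= nc < COLS) or (nr, nc) in walls:
        return cell  # bumping into a wall or the border leaves the robot in place
    return (nr, nc)
```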
An AI chatbot using Retrieval-Augmented Generation was built to turn TH Köln’s sprawling website into easy, conversation-style answers. Tests show that it cuts search time, especially for tricky topics such as exam rules, while keeping answers factually reliable and hallucinations rare.
Natural-language commands are converted into autonomous TurtleBot 4 actions through a language-model interpreter linked to YOLO-based vision, LiDAR, and depth sensing; behavior-tree control replaced an early state-machine to let the robot explore, avoid obstacles, and resume tasks smoothly. The integrated system was shown to simplify human–robot interaction without requiring programming skills.
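A behavior tree makes resuming tasks easier than a state machine does because the tree is re-ticked from the root on every control cycle. Once an obstacle-avoidance or exploration branch has finished, the next tick naturally falls back into the interrupted task. The sketch below shows the two core composite nodes. The action names in the test are hypothetical, and a ROS 2 system would more typically use an existing library such as BehaviorTree.CPP or py_trees.

```python
from enum import Enum
from typing import Callable, List


class Status(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    RUNNING = "running"


class Action:
    """Leaf node wrapping a callable that reports a Status."""

    def __init__(self, name: str, fn: Callable[[], Status]):
        self.name, self.fn = name, fn

    def tick(self) -> Status:
        return self.fn()


class Sequence:
    """Ticks children in order; stops at the first child that is not SUCCESS."""

    def __init__(self, children: List):
        self.children = children

    def tick(self) -> Status:
        for child in self.children:
            status = child.tick()
            if status != Status.SUCCESS:
                return status
        return Status.SUCCESS


class Fallback:
    """Ticks children in order; stops at the first child that is not FAILURE."""

    def __init__(self, children: List):
        self.children = children

    def tick(self) -> Status:
        for child in self.children:
            status = child.tick()
            if status != Status.FAILURE:
                return status
        return Status.FAILURE
```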
A new tool has been developed that automatically converts PowerPoint slides into LaTeX Beamer code by combining layout analysis with local large language models. Texts, images, tables and exact positions are extracted and reliably translated into compilable LaTeX, eliminating the need for manual reworking.
Seven recent 3D-Gaussian-splatting algorithms were compared on a cluttered garage scene to see how well they reconstruct tricky indoor spaces. No single method won outright: Mip-Splatting delivered the sharpest images, Brush the most natural look with least memory, and FastGS the fastest training, so the best choice depends on whether quality, memory, or speed matters most.
A fast machine-learning “meta-model” was trained with active-learning techniques to imitate Europe’s slow groundwater-exposure model for pesticides. The resulting CatBoost committee predicts concentrations in real time with under 0.7 % error, enabling instant, low-cost screening of crop-protection products.
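Committee-based active learning decides which new scenarios to run through the expensive reference model by measuring how much the committee members disagree. The following is a minimal query-by-committee sketch. It assumes prediction variance as the disagreement measure, since the summary does not state the exact criterion. Plain Python callables stand in for the CatBoost models.

```python
from statistics import pvariance
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]


def select_queries(committee: List[Model], pool: List[Sequence[float]],
                   batch_size: int) -> List[int]:
    """Query-by-committee: pick pool points where the models disagree most."""
    disagreement = [pvariance([m(x) for m in committee]) for x in pool]
    ranked = sorted(range(len(pool)), key=disagreement.__getitem__, reverse=True)
    return ranked[:batch_size]
```

The selected scenarios would then be simulated with the slow reference model, and the results would be added to the training set before the committee is retrained.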
A hybrid evolutionary algorithm is presented that optimizes the cutting of metal plates for crane frame structures, simultaneously minimizing material waste and the number of joint connections that affect static integrity. The method operates in two stages—first determining the minimal number of plates required and then mutating plate configurations under static constraints—and is designed to be applied across multiple orders, enabling additional material savings and adaptable, data‑driven planning.
A privacy-focused chatbot was built to answer university exam-regulation questions using hybrid search over 47 official rulebooks. Notably, a local 14-billion-parameter model matched the accuracy of a 70-billion-parameter cloud service, showing that smaller, on-campus AI can deliver reliable, regulation-grade answers without sending data elsewhere.
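Hybrid search typically runs a keyword ranking (for example BM25) and a vector-similarity ranking in parallel, then merges the two result lists. One common merging rule is reciprocal rank fusion, sketched below with the conventional constant k = 60. The summary does not state whether the chatbot uses this rule or a weighted score combination, so this is an illustrative assumption.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge ranked lists of document IDs: score(d) = sum of 1 / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.__getitem__, reverse=True)
```

Because only ranks are used, the keyword and vector scores do not need to be calibrated against each other.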
A large language model is employed as a high‑level interface that translates natural‑language commands into well‑defined robot actions, which are then executed collision‑free by the ROS 2‑based MoveIt framework in a simulated environment. The approach is demonstrated with a Niryo Ned2 arm performing object detection, pick‑and‑place, and stacking tasks, and its feasibility, limitations, and challenges are evaluated.
This study investigates the use of the large language model GPT-4 as a controller for optimizing the substrate feed of an agricultural anaerobic co-digestion plant. Given specific objectives, such as a target methane production rate, GPT-4 draws on information about plant parameters, substrate characteristics, and real-time process data. The model recommends substrate feed rates and provides transparent rationales for its decisions. To evaluate its effectiveness, a simulation model of an agricultural anaerobic co-digestion plant based on the Anaerobic Digestion Model No. 1 (ADM1) is employed. Initial findings suggest that GPT-4 effectively regulates the substrate feed, keeping methane production rates close to the predefined targets. Crucially, the explanations GPT-4 provides are comprehensible. The accompanying code will be made publicly available for further investigation.
The tutorial provides an overview of the development of intelligent autonomous assistants, focusing on how multimodal AI technologies—such as large language models, computer vision, and robotics—enable systems to understand language, perceive their environment, plan actions, and manipulate objects. It discusses practical applications in Industry 5.0, logistics, laboratories, smart homes, and agriculture, and contrasts modular system designs with end-to-end vision-language-action models. Finally, it highlights current research challenges, including data scarcity, simulation-to-reality transfer, system speed, and robustness in complex real-world environments.
This work presents a unified software framework for controlling robotic manipulators through unconstrained natural language, built around the Model Context Protocol (MCP). Users can issue commands via text, voice, or a web GUI, which are interpreted by a large language model that decomposes instructions into structured tool calls executed by a FastMCP server. The system integrates real-time open-vocabulary object detection, instance segmentation, and workspace coordinate transforms to ground language in the physical scene, enabling complex pick-and-place operations. A Redis-based pub/sub architecture decouples perception, reasoning, and control into independent processes, while a platform-agnostic hardware abstraction supports both the Niryo Ned2 and WidowX-250 robotic arms. Evaluation across eight diverse manipulation tasks (including multi-step operations, spatial reasoning, and multilingual instructions) achieved an 83% success rate on a Niryo Ned2, with the primary bottleneck being open-vocabulary perception rather than LLM reasoning or motion control. All components are publicly available, providing a reproducible foundation for natural-language interfaces to cyber-physical systems.
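Grounding a detection in the workspace requires mapping image pixels to table coordinates. For a planar workspace, one common approach is a 3 × 3 homography, for example estimated once from four or more known marker positions with OpenCV's `cv2.findHomography`. The sketch below only applies such a matrix and assumes it is already calibrated. The summary does not say how the framework computes its transforms.

```python
from typing import Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]


def pixel_to_workspace(h: Matrix3, u: float, v: float) -> Tuple[float, float]:
    """Map image pixel (u, v) to planar workspace (x, y) with a 3x3 homography."""
    x = h[0][0] * u + h[0][1] * v + h[0][2]
    y = h[1][0] * u + h[1][1] * v + h[1][2]
    w = h[2][0] * u + h[2][1] * v + h[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity")
    return x / w, y / w  # divide out the projective scale
```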