Posts by Collection

portfolio

Development of autonomous intelligent systems

I am working on a framework that enables robots to follow natural-language instructions by combining large language models with real-time perception, spatial reasoning, and robot control through a modular tool-based system.

Joint project: Intelligent Virtual Operations Assistant for Optimizing Agricultural Biogas Plants with a Large Multimodal Model

In this project, we are developing an intelligent virtual operations assistant based on a multimodal AI model that automatically optimizes substrate feed in biogas plants and enables flexible, efficient electricity feed-in. Model predictive control, AI-supported state estimation, and a voice-based user interface simplify plant operation and make it more sustainable; the approach is validated in real biogas plants.

publications

student_projects

Vehicle Localization with an Unscented Kalman Filter

T. Y. N., 2023

Vehicle localization for autonomous driving is achieved by fusing GNSS, IMU, and LiDAR data with an Unscented Kalman Filter (UKF). The UKF is evaluated in the CARLA driving simulator against an Error State Extended Kalman Filter (ESEKF) and estimates position and orientation more accurately under sensor noise and signal loss. The results indicate that the UKF is the more reliable choice for non-linear vehicle state estimation.

Development of AI-Based Object Detection for Navigating the Autonomous TurtleBot 4 Platform

A. P., 2023

A YOLOv8 model was trained to detect fire extinguishers and integrated into the TurtleBot 4 platform, enabling real‑time object recognition. Two deployment strategies were evaluated: processing on the onboard Raspberry Pi, which proved too slow, and leveraging the OAK‑D‑PRO camera’s on‑chip inference, which delivered a substantial performance boost. Additionally, a laser‑scan based navigation algorithm was implemented, allowing the robot to autonomously explore unknown environments.

Development of a Colorization Assistant Based on Generative AI

V. P., 2024

A coloring assistant based on a generative diffusion model is developed to automatically color manga line drawings. Its performance is evaluated against existing models using PSNR, MS‑SSIM and FID, showing improved fidelity in eye and hair colors and closer adherence to original sketches, while noting some residual artifacts that can be reduced by scaling the model. The results demonstrate that diffusion‑based approaches can deliver high‑quality coloring on consumer‑grade hardware.

Development of a Reinforcement-Learning-Based Approach to Strategy Optimization in a Multi-Agent Robot Scenario

A. M., 2024

A reinforcement-learning approach based on Deep Q-Learning is developed to optimise strategies for competing robots in a multi-agent Microservice Dungeon scenario. A series of experiments shows that the agent's performance improves with tailored reward functions, hyper-parameter tuning, and observation adjustments, while the simplified environment and limited action space constrain its capabilities. The findings suggest that more advanced policy-based methods and graph neural networks are needed to overcome these limitations and further improve strategic decision-making.

Development of an AI-Driven Browser Plugin for Generating Context-Dependent Alternative Texts to Improve Web Accessibility

P. F., 2024

An AI‑powered browser plugin was developed to automatically generate context‑aware alternative text for images on webpages, thereby improving web accessibility for users with visual impairments. In evaluations, the AI was found capable of deciding when an alt‑text is needed and of producing more accurate descriptions by incorporating surrounding page text, although final verification by a human remains necessary.

Comparison of Motion-Planning Algorithms for a Robotic Gripper Arm with 6 Degrees of Freedom in Simulation

C. V., 2024

A comparative study of three motion‑planning algorithms—PRM, CHOMP, and the convex‑set‑based GCS—was conducted for a six‑degree‑of‑freedom robotic manipulator in simulated environments. The algorithms were integrated into the ROS 2/MoveIt 2 framework and benchmarked in scenarios such as a table, a narrow passage, and a bookshelf, revealing that GCS can achieve rapid planning when correctly configured, while PRM is asymptotically optimal but computationally heavy and CHOMP provides fast local refinement. The results highlight trade‑offs between optimality, computation time, and robustness for robotic arm trajectory planning.

Visual Navigation and Localization in a Room Using Neural Radiance Fields and Gaussian Splatting

P. T., 2024

The suitability of Neural Radiance Fields and Gaussian Splatting for visual‑only localisation and navigation of indoor mobile robots is investigated. A Monte‑Carlo particle filter that compares camera images with synthetically rendered views is implemented on a TurtleBot and evaluated at a minimum update rate of 1 Hz, demonstrating that accurate global localisation can be achieved without prior pose information, though occasional localisation errors remain. The results suggest that a fully visual, low‑cost navigation system based on these radiance‑field models is feasible for real‑time robot operation.

Further Development and Evaluation of an Adversarially Robust Network Intrusion Detection System

L. T., 2025

The thesis advances the adversarially robust network intrusion detection system Apollon by integrating additional machine-learning classifiers and by replacing its original multi-armed-bandit selector with alternative heuristics such as epsilon-greedy and Thompson sampling. The enhanced system is evaluated on the CIC-IDS-2017 dataset, where the expanded model pool markedly improves detection accuracy and resilience against black-box adversarial attacks, and the new heuristics provide occasional robustness gains.
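
To illustrate the kind of selection heuristic mentioned above, the following is a minimal sketch of an epsilon-greedy selector that picks one classifier from a pool. It is not Apollon's implementation: the class name, the reward definition (for example, 1.0 for a correct prediction and 0.0 otherwise), and all parameters are assumptions made for illustration.

```python
import random


class EpsilonGreedySelector:
    """Hypothetical selector: explore with probability epsilon, else exploit."""

    def __init__(self, n_arms, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.counts = [0] * n_arms      # how often each classifier was chosen
        self.values = [0.0] * n_arms    # running mean reward per classifier
        self.rng = random.Random(seed)

    def select(self):
        """Return the index of the classifier to use for the next sample."""
        if self.rng.random() < self.epsilon:
            # Explore: pick any classifier uniformly at random.
            return self.rng.randrange(len(self.counts))
        # Exploit: pick the best mean reward (ties go to the lowest index).
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, arm, reward):
        """Fold an observed reward into the running mean of the chosen arm."""
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

With `epsilon=0.0` the selector always exploits; larger values make it harder for an attacker to predict which classifier handles a given sample, at the cost of sometimes using a weaker model.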

Implementation of a Specialized Question-Answering Large Language Model (LLM) for the Robot Operating System (ROS) Using Retrieval Augmented Generation (RAG)

S. S., 2025

A Retrieval‑Augmented Generation (RAG) system was built to extend a large language model with external ROS‑specific knowledge, using Python, LangChain, a Chroma vector database, Ollama for local model execution, and a Streamlit web interface. The system’s answers were compared with those of the plain language model, and a scoring‑based evaluation showed that RAG can produce noticeably better responses for certain question types and when sufficient domain data are available.

Development of a Chatbot for the Public Website of TH Köln

F. E., 2025

A chatbot was built for the publicly accessible website of TH Köln, employing a Retrieval-Augmented Generation approach that combines semantic vector search on crawled web content with a large language model (Llama 3) to generate natural-language answers and cite sources. The system was prototyped with a Gradio web interface, hosted on Groq, and evaluated by comparing several LLMs on answer quality and clarity. The results demonstrate that a data-driven assistant can meaningfully improve information access for university users.

Predicting Nitrous Oxide Emissions in Wastewater Treatment with Machine Learning/Deep Learning

A. U., 2025

The project investigates how machine‑learning and deep‑learning techniques can forecast nitrous‑oxide (N₂O) emissions from wastewater‑treatment plants, comparing mechanistic and data‑driven models on real‑world process data. Advanced feature‑engineering, temporal cross‑validation, and model‑interpretability methods (e.g., SHAP, permutation importance) are applied to evaluate the predictive performance and robustness of algorithms such as XGBoost, Random Forest, k‑NN, and neural networks. The results show that selected ML models can reliably predict N₂O emissions, offering a practical basis for emission‑monitoring soft sensors in treatment facilities.
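
As a rough illustration of temporal cross-validation, the sketch below generates expanding-window splits in which every training set lies strictly before its test set, so no future measurements leak into training. The function name and parameters are hypothetical, and the project's actual splitting scheme may differ.

```python
def expanding_window_splits(n_samples, n_folds, min_train):
    """Yield (train_indices, test_indices) pairs for time-ordered data.

    Hypothetical helper: the first fold trains on the first `min_train`
    samples, and each later fold extends the training window by one test block.
    """
    test_size = (n_samples - min_train) // n_folds
    if test_size < 1:
        raise ValueError("not enough samples for the requested number of folds")
    for k in range(n_folds):
        train_end = min_train + k * test_size
        test_end = train_end + test_size
        # Training data always precedes the test block in time.
        yield list(range(train_end)), list(range(train_end, test_end))
```

For 10 samples, 3 folds, and `min_train=4`, the test blocks are `[4, 5]`, `[6, 7]`, and `[8, 9]`, each preceded by all earlier samples as training data.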

Development of a Mobile Application for Interactive Visual Scene Analysis Using Segment Anything Models and Large Language Models

J. F. K. R., 2025

A modular Android application was created that combines on‑device image segmentation (YOLO‑seg) with a quantized large language model (Gemma) via a flexible JSON interface, enabling offline multimodal scene analysis for visually impaired users. The system was evaluated against a hybrid server‑assisted version, confirming that fully local processing is technically feasible on a standard smartphone, while the limited context window of the on‑device LLM was identified as the primary performance bottleneck, requiring intelligent prompt‑reduction strategies.

Autonomous Object Detection and Manipulation Using a Mobile Cobot

T. Y. N., 2026

Autonomous exploration, open-vocabulary object recognition, and safe grasping have been combined on a compact mobile robot, enabling it to independently map unknown rooms, locate user-specified items, and pick them up. Field tests showed the frontier-guided strategy finds and grasps objects faster and more reliably than simpler straight-line searches, even in cluttered indoor spaces.

Development and Testing of Neural Networks Using Edge Impulse on a Microcontroller for the Detection and Classification of Volatile Organic Compounds (VOCs)

J. D., 2026

A neural‑network model was trained with the low‑code platform Edge Impulse to recognise volatile organic compounds (VOCs) from data gathered by a cheap metal‑oxide gas sensor. The model was then deployed on an Arduino Nicla Sense ME microcontroller, enabling on‑device detection and classification of VOC sources such as cigarette smoke, 3D‑printer emissions and cooking fumes without any network connection. The results demonstrate that inexpensive embedded hardware can reliably identify common indoor VOCs, while also highlighting the limits of sensor sensitivity and the need for careful measurement conditions.

Development of an App for Voice-Based Interaction with Discord via a Discord MCP Server

A. M., 2026

A voice-controlled web tool has been created that lets users manage Discord events and channels through everyday speech, automatically converting phrases like “tomorrow 3 pm” into the exact timestamps Discord needs. The system links large language models to Discord’s servers via the open Model Context Protocol, hides the technical details behind a simple browser interface, and can be used from any device on the network.
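
The time conversion described above can be pictured with a small rule-based sketch. In the project this step is handled through the language model; the helper below is a hypothetical stand-in that supports only "today" or "tomorrow" followed by a 12-hour time, and it assumes the target format is an ISO 8601 string.

```python
import re
from datetime import datetime, timedelta, timezone


def parse_relative_time(phrase, now):
    """Turn phrases like 'tomorrow 3 pm' into a datetime relative to `now`.

    Hypothetical helper for illustration; anything outside the narrow
    'today|tomorrow H[:MM] am|pm' pattern raises ValueError.
    """
    match = re.fullmatch(
        r"\s*(today|tomorrow)\s+(\d{1,2})(?::(\d{2}))?\s*(am|pm)\s*",
        phrase.lower(),
    )
    if match is None:
        raise ValueError(f"unsupported phrase: {phrase!r}")
    day, meridiem = match.group(1), match.group(4)
    hour, minute = int(match.group(2)), int(match.group(3) or 0)
    if not (1 <= hour <= 12 and 0 <= minute <= 59):
        raise ValueError(f"invalid time in phrase: {phrase!r}")
    # Convert the 12-hour clock to 24-hour: 12 am -> 0, 12 pm -> 12.
    hour = hour % 12 + (12 if meridiem == "pm" else 0)
    base = now + timedelta(days=1 if day == "tomorrow" else 0)
    return base.replace(hour=hour, minute=minute, second=0, microsecond=0)
```

Calling `parse_relative_time("tomorrow 3 pm", now).isoformat()` with a timezone-aware `now` yields a timestamp string for 15:00 on the following day in the same time zone.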

Comparison of Local LLMs in Terms of Python Programming Ability

M. P. G., 2026

Local open-source language models running on an Nvidia Jetson board were timed and graded on how well they wrote Python code for robot-arm tasks. The 1.5-billion-parameter Qwen2.5-Coder delivered the fastest, most reliable code and was further tested in larger and quantized versions.

Development of a Chatbot for Examination Law

T. M. N., 2026

A chatbot was built to make university exam rules easy to query by voice or text. It uses Retrieval-Augmented Generation, instantly quoting the official TH Köln exam regulations and North Rhine-Westphalia higher-education law.

Controlling the TurtleBot 4 Platform via Text - TextToTurtlebot

N. S. & M. K. & M. F., 2026

Natural-language commands are converted into autonomous TurtleBot 4 actions through a language-model interpreter linked to YOLO-based vision, LiDAR, and depth sensing; behavior-tree control replaced an earlier state machine to let the robot explore, avoid obstacles, and resume tasks smoothly. The integrated system was shown to simplify human–robot interaction without requiring programming skills.

Comparison of 3D Gaussian Splatting Algorithms for Reconstructing a Complex Indoor Scene

N. S., 2026

Seven recent 3D-Gaussian-splatting algorithms were compared on a cluttered garage scene to see how well they reconstruct tricky indoor spaces. No single method won outright: Mip-Splatting delivered the sharpest images, Brush the most natural look with least memory, and FastGS the fastest training, so the best choice depends on whether quality, memory, or speed matters most.

Material Efficiency and Structural Statics Combined: A Hybrid Algorithm for Optimizing Multiple Box-Girder Configurations

M. T., 2026

A hybrid evolutionary algorithm is presented that optimizes the cutting of metal plates for crane frame structures, simultaneously minimizing material waste and the number of joint connections that affect static integrity. The method operates in two stages—first determining the minimal number of plates required and then mutating plate configurations under static constraints—and is designed to be applied across multiple orders, enabling additional material savings and adaptable, data‑driven planning.

Development of a RAG-Based Chatbot for Answering Examination-Law Questions

N. B., 2026

A privacy-focused chatbot was built to answer university exam-regulation questions using a hybrid search through 47 official rulebooks. Surprisingly, a local 14-billion-parameter model matched the accuracy of a 70-billion-parameter cloud service, indicating that smaller, on-campus AI can deliver reliable, regulation-grade answers without sending data elsewhere.

LLM-Supported High-Level Task Planning for Collision-Free Motion Planning of a Robot Arm with MoveIt in a Simulation Environment

L. A., 2026

A large language model is employed as a high‑level interface that translates natural‑language commands into well‑defined robot actions, which are then executed collision‑free by the ROS 2‑based MoveIt framework in a simulated environment. The approach is demonstrated with a Niryo Ned2 arm performing object detection, pick‑and‑place, and stacking tasks, and its feasibility, limitations, and challenges are evaluated.

talks

Synergizing Language Models and Biogas Plant Control: A GPT-4 Approach

This study investigates the use of the large language model GPT-4 as a controller to optimize substrate feed in an agricultural anaerobic co-digestion plant. Given specific objectives, such as a target methane production rate, GPT-4 draws on plant parameters, substrate characteristics, and real-time process data. The model formulates recommendations for substrate feed and gives transparent rationales for its decisions. To evaluate its effectiveness, a simulation model of an agricultural anaerobic co-digestion plant based on the Anaerobic Digestion Model no. 1 is employed. Initial findings suggest that GPT-4 effectively regulates substrate feed, maintaining methane production rates near predefined targets. Crucially, the explanations provided by GPT-4 are comprehensible. The accompanying code will be made accessible for further investigation and exploration.

Development of Intelligent Autonomous Assistants

The tutorial provides an overview of the development of intelligent autonomous assistants, focusing on how multimodal AI technologies—such as large language models, computer vision, and robotics—enable systems to understand language, perceive their environment, plan actions, and manipulate objects. It discusses practical applications in Industry 5.0, logistics, laboratories, smart homes, and agriculture, and contrasts modular system designs with end-to-end vision-language-action models. Finally, it highlights current research challenges, including data scarcity, simulation-to-reality transfer, system speed, and robustness in complex real-world environments.

Natural-Language Robot Manipulation via MCP: An Integrated Framework for Vision-Guided Pick-and-Place Automation

This work presents a unified software framework for controlling robotic manipulators through unconstrained natural language, built around the Model Context Protocol (MCP). Users can issue commands via text, voice, or a web GUI, which are interpreted by a large language model that decomposes instructions into structured tool calls executed by a FastMCP server. The system integrates real-time open-vocabulary object detection, instance segmentation, and workspace coordinate transforms to ground language in the physical scene, enabling complex pick-and-place operations. A Redis-based pub/sub architecture decouples perception, reasoning, and control into independent processes, while a platform-agnostic hardware abstraction supports both the Niryo Ned2 and WidowX-250 robotic arms. Evaluation across eight diverse manipulation tasks — including multi-step operations, spatial reasoning, and multilingual instructions — achieved an 83% success rate on a Niryo Ned2, with the primary bottleneck being open-vocabulary perception rather than LLM reasoning or motion control. All components are publicly available, providing a reproducible foundation for natural-language interfaces to cyber-physical systems.

teaching

Artificial Intelligence

Undergraduate course, TH Köln, Institut für Informatik, 2021

Artificial Intelligence in the bachelor's program in Computer Science.

Algorithmics

Undergraduate course, TH Köln, Institut für Informatik, 2022

Algorithmics in the bachelor's program in Computer Science.

Cyber-Physical Systems

Undergraduate course, TH Köln, Institut für Informatik, 2023

Elective (WPF) Cyber-Physical Systems in the bachelor's program in Computer Science.

Human-Centered Artificial Intelligence

Undergraduate course, TH Köln, Institut für Informatik und Wirtschaftsinformatik, 2025

Human-Centered Artificial Intelligence in the bachelor's program in Media Informatics.