Development of a Mobile Application for Interactive Visual Scene Analysis Using Segment Anything Models and Large Language Models

J. F. K. R., 2025

A modular Android application was developed that combines on-device image segmentation (YOLO-seg) with a quantized large language model (Gemma) through a flexible JSON interface, enabling offline multimodal scene analysis for visually impaired users. The system was evaluated against a hybrid, server-assisted variant. The evaluation confirmed that fully local processing is technically feasible on a standard smartphone. It also identified the limited context window of the on-device LLM as the primary performance bottleneck, which makes intelligent prompt-reduction strategies necessary.
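To illustrate what a prompt-reduction step at the JSON interface could look like, the following is a minimal sketch and not the implementation from this work. It is written in Python for brevity rather than in the app's Android code. The field names (`label`, `confidence`, `area`), the ranking by confidence times area, the function names, and the four-characters-per-token estimate are all assumptions made for illustration.

```python
import json


def estimate_tokens(text: str) -> int:
    """Rough token estimate. Assumption: about 4 characters per token."""
    return (len(text) + 3) // 4


def reduce_scene(objects: list[dict], max_tokens: int) -> str:
    """Serialize detected objects to compact JSON that fits a token budget.

    Hypothetical strategy: rank objects by confidence * relative area and
    drop the lowest-ranked ones until the estimated token count fits.
    """
    ranked = sorted(
        objects,
        key=lambda o: o["confidence"] * o["area"],
        reverse=True,
    )
    kept = list(ranked)
    while kept:
        payload = json.dumps({"objects": kept}, separators=(",", ":"))
        if estimate_tokens(payload) <= max_tokens:
            return payload
        kept.pop()  # remove the currently least important object
    # Nothing fits: return an empty scene rather than an over-long prompt.
    return json.dumps({"objects": []}, separators=(",", ":"))


if __name__ == "__main__":
    # Hypothetical segmentation output; "area" is the fraction of the image.
    scene = [
        {"label": "chair", "confidence": 0.9, "area": 0.3},
        {"label": "cup", "confidence": 0.6, "area": 0.02},
        {"label": "table", "confidence": 0.8, "area": 0.4},
    ]
    print(reduce_scene(scene, max_tokens=30))
```

With a tight budget, small or low-confidence objects are dropped first, so the LLM still receives the dominant elements of the scene. A real system would likely use the model's own tokenizer instead of a character-based estimate.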


Daniel Gaida