Architecture
This document describes the high-level architecture of the Semantic Backup Explorer.
Overview
The project is a RAG (Retrieval-Augmented Generation) based system designed to help users search through their file backups and synchronize local changes.
Components
1. Library (semantic_backup_explorer/)
- core/: Contains the main business logic (BackupOperations). It orchestrates folder finding and comparison.
- rag/: Implements the RAG pipeline using SentenceTransformers for embeddings and ChromaDB for vector storage. It uses llm_client to interface with Groq.
- indexer/: Handles the recursive scanning of backup directories and produces a Markdown index file.
- chunking/: Partitions the Markdown index into folder-based chunks suitable for the vector database.
- compare/: Logic for comparing local directory contents with the backup index, considering both existence and modification times.
- sync/: Handles the actual copying of files from source to destination.
- utils/: Shared utilities for configuration, logging, path normalization, and compatibility.
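The comparison rule described for compare/ (a file needs syncing if it is absent from the backup or newer locally) can be illustrated with a minimal sketch. This is not the project's actual code: the names `Diff`, `scan_local`, `compare`, and the `tolerance` parameter are hypothetical, and the backup index is assumed to be already parsed into a mapping from relative path to modification time.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Diff:
    missing: list[str]   # present locally, absent from the backup index
    modified: list[str]  # present in both, but newer locally


def scan_local(root: Path) -> dict[str, float]:
    """Map each file's path relative to root (POSIX style) to its mtime."""
    return {
        p.relative_to(root).as_posix(): p.stat().st_mtime
        for p in root.rglob("*")
        if p.is_file()
    }


def compare(local: dict[str, float], backup: dict[str, float],
            tolerance: float = 1.0) -> Diff:
    """Report files that are missing from the backup or newer locally.

    `tolerance` (seconds) is an assumed safeguard against filesystems
    that store timestamps with coarse resolution.
    """
    missing = sorted(path for path in local if path not in backup)
    modified = sorted(
        path for path, mtime in local.items()
        if path in backup and mtime - backup[path] > tolerance
    )
    return Diff(missing, modified)


# Hypothetical data: relative path -> modification time in seconds.
local = {"a.txt": 100.0, "docs/b.txt": 250.0, "c.txt": 300.0}
backup = {"a.txt": 100.0, "docs/b.txt": 200.0}
result = compare(local, backup)
print(result.missing)   # ['c.txt']
print(result.modified)  # ['docs/b.txt']
```

In practice the `local` mapping would come from `scan_local(Path(...))`, and the union of `missing` and `modified` is what a sync step would copy.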
2. CLI & UI (semantic_backup_explorer/cli/)
- ui/gradio_app.py: A web interface built with Gradio for an interactive experience.
- commands/: (Future) Place for modular CLI commands.
3. Scripts (scripts/)
Standalone Python scripts for common tasks:
- auto_sync.py: Automated synchronization based on a config file.
- build_index.py: Scans a backup drive and builds the vector database.
Data Flow
- Scanning: indexer scans the backup drive -> backup_index.md.
- Indexing: chunking reads backup_index.md -> rag.Embedder creates vectors -> rag.Retriever stores them in ChromaDB.
- Search: User query -> rag.Embedder -> rag.Retriever (context) -> llm_client (Groq) -> Answer.
- Compare & Sync: Local folder -> core.BackupOperations finds backup counterpart (keyword or RAG) -> compare identifies differences -> sync copies files.
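The Indexing and Search steps above can be sketched end to end in plain Python. This is a toy illustration under stated assumptions, not the project's implementation: the index layout (one `## <folder>` heading followed by file bullets) is assumed, a bag-of-words counter stands in for SentenceTransformers embeddings, an in-memory dict stands in for ChromaDB, and the final LLM call is omitted. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_by_folder(index_md: str) -> dict[str, str]:
    """Split a Markdown index into one chunk per '## <folder>' heading (assumed layout)."""
    chunks: dict[str, list[str]] = {}
    current = None
    for line in index_md.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            chunks[current] = []
        elif current is not None and line.strip():
            chunks[current].append(line.strip())
    return {folder: folder + "\n" + "\n".join(lines) for folder, lines in chunks.items()}


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, store: dict[str, Counter], k: int = 1) -> list[str]:
    """Return the k folder names whose chunks are most similar to the query."""
    q = embed(query)
    return sorted(store, key=lambda folder: cosine(q, store[folder]), reverse=True)[:k]


# Hypothetical index content, as the scanning step might produce it.
index_md = """\
## photos/2021
- beach.jpg
- sunset.jpg

## taxes/2021
- return.pdf
- receipts.xlsx
"""

chunks = chunk_by_folder(index_md)                                # Indexing: chunk
store = {folder: embed(text) for folder, text in chunks.items()}  # Indexing: embed and store
print(retrieve("where is my tax return pdf", store))              # Search: ['taxes/2021']
```

In the real pipeline the retrieved chunks would be passed as context to the LLM, which produces the final answer. The sketch stops at retrieval because that step determines which folders the model gets to see.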