Veea: Private AI Assistant — System Design Overview

Introduction

This document outlines the system design for Veea, a cross-platform mobile application that acts as an AI assistant for meeting notes. The app records audio conversations and then processes them with AI to produce transcripts, speaker-diarized timelines, and summaries.

Key Feature: The user can choose to perform this processing either on-device for privacy or in a private cloud for speed and accuracy.


High-Level Architecture

Veea follows a hybrid architecture designed for privacy, modularity, and extensibility. It combines:

  • Flutter front-end for cross-platform UI and state management
  • Rust core for high-performance local AI processing
  • Private cloud backend for accelerated inference

The layered flow is shown in the Mermaid diagram below:

graph TD
    %% ===== Styling (color & layout) =====
    classDef flutter fill:#E3F2FD,stroke:#1565C0,stroke-width:1px,color:#0D47A1;
    classDef domain fill:#E8F5E9,stroke:#2E7D32,stroke-width:1px,color:#1B5E20;
    classDef data fill:#FFF3E0,stroke:#EF6C00,stroke-width:1px,color:#E65100;
    classDef processing fill:#F3E5F5,stroke:#6A1B9A,stroke-width:1px,color:#4A148C;
    classDef cloud fill:#FCE4EC,stroke:#AD1457,stroke-width:1px,color:#880E4F;

    %% ===== Flutter Layer (Top) =====
    subgraph Flutter_App["Flutter App"]
        UI["Presentation Layer (Flutter UI)"]
        StateManagement["State Management (Cubit / Bloc)"]
    end
    class Flutter_App flutter

    %% ===== Domain Layer =====
    subgraph Domain_Layer["Domain Layer"]
        UseCases["Use Cases"]
        Repositories["Repository Interfaces"]
    end
    class Domain_Layer domain

    %% ===== Data Layer =====
    subgraph Data_Layer["Data Layer"]
        DataSources["Data Sources"]
        LocalDB["Local Database"]
    end
    class Data_Layer data

    %% ===== Processing Options =====
    subgraph Processing_Options["Processing Options"]
        FFI["FFI Bridge"]
        PrivateCloud["Private Cloud API"]
    end
    class Processing_Options processing

    %% ===== Local AI Core =====
    subgraph Rust_Core["Rust Core (Local AI)"]
        STT["Speech-to-Text Engine"]
        DIAR["Speaker Diarization"]
        SUM["Summarization LLM"]
    end
    class Rust_Core processing

    %% ===== Private Cloud =====
    subgraph Private_Cloud["Private Cloud (Optional)"]
        API["Processing API Layer"]
        GPU["GPU Workers / Model Serving"]
    end
    class Private_Cloud cloud

    %% ===== Flow Connections =====
    UI --> StateManagement
    StateManagement --> UseCases
    UseCases --> Repositories
    Repositories --> DataSources
    DataSources --> LocalDB
    DataSources --> FFI
    DataSources --> PrivateCloud
    FFI --> Rust_Core
    PrivateCloud --> API
    API --> GPU

    %% Results back to app
    Rust_Core --> FFI
    FFI --> DataSources
    PrivateCloud --> DataSources
    DataSources --> Repositories
    Repositories --> UseCases
    UseCases --> StateManagement
    StateManagement --> UI
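
The FFI Bridge in the diagram is the seam between the Flutter data layer and the Rust core. A minimal sketch of what such a bridge could look like on the Rust side is below; the function names (`veea_transcribe`, `veea_free_string`) and the JSON result shape are illustrative assumptions, not Veea's actual API.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical entry point the Flutter side would bind via dart:ffi.
/// Takes a path to the recorded audio file and returns a JSON string;
/// the caller must release it with `veea_free_string`.
#[no_mangle]
pub extern "C" fn veea_transcribe(audio_path: *const c_char) -> *mut c_char {
    let path = unsafe {
        assert!(!audio_path.is_null());
        CStr::from_ptr(audio_path).to_string_lossy().into_owned()
    };
    // Placeholder: a real implementation would invoke the STT engine here.
    let result_json = format!(r#"{{"path":"{}","transcript":""}}"#, path);
    CString::new(result_json).unwrap().into_raw()
}

/// Frees strings allocated by `veea_transcribe`, returning ownership
/// of the buffer to Rust so it is dropped correctly.
#[no_mangle]
pub extern "C" fn veea_free_string(s: *mut c_char) {
    if !s.is_null() {
        unsafe { drop(CString::from_raw(s)) };
    }
}
```

Returning owned C strings with a paired free function keeps allocation and deallocation on the same side of the FFI boundary, which avoids mixing allocators between Dart and Rust.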

Core Principles

  • Privacy by Design: No raw audio leaves the device in Local Mode.
  • Hybrid Intelligence: Switch seamlessly between local and private-cloud inference.
  • Modular Architecture: Clean separation of presentation, domain, and data layers.
  • Lightweight & Efficient: Rust-powered ML modules keep latency and power use low.
  • Offline-First: All primary functions work without internet connectivity.
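
The Hybrid Intelligence principle can be sketched as a simple mode switch that routes the same processing request either to the local Rust core or to the private cloud. The enum, struct, and endpoint strings below are illustrative assumptions, not Veea's real routing logic.

```rust
/// Hypothetical processing-mode switch: one request type, two backends.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ProcessingMode {
    Local,        // on-device: raw audio never leaves the phone
    PrivateCloud, // off-device: faster, larger models
}

struct ProcessingRequest<'a> {
    audio_path: &'a str,
    mode: ProcessingMode,
}

/// Picks the backend for a request; the endpoint strings are placeholders.
fn route(req: &ProcessingRequest) -> &'static str {
    match req.mode {
        ProcessingMode::Local => "ffi://rust_core/transcribe",
        ProcessingMode::PrivateCloud => "https://cloud.example/api/transcribe",
    }
}
```

Keeping the mode on the request (rather than as global state) lets individual notes be processed differently, e.g. a sensitive meeting locally and a long one in the cloud.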

Modules Overview

  • Authentication: Handles user sign-up, login, logout, and session persistence. Supports guest (local-only) and private-cloud accounts.
  • Onboarding: Guides first-time users through permission requests, usage goals, and privacy-mode selection (Local vs Cloud).
  • Settings: Manages profile info, model selection, language, and storage preferences.
  • Note Taking: Records audio, creates a new Note entity, and stores local metadata in the database.
  • Notes Library: Displays and manages saved notes, transcripts, and summaries. Supports search, filter, and export.
  • AI Pipeline: Performs STT, diarization, and summarization using the selected AI engine (local Rust core or private cloud).
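
The Note entity that the Note Taking module creates might look like the sketch below. The field names and status values are assumptions for illustration, not the app's actual schema.

```rust
/// Hypothetical lifecycle states for a note as it moves through the app.
#[derive(Debug, Clone, PartialEq)]
enum NoteStatus {
    Recording,
    Processing,
    Ready,
}

/// Hypothetical Note entity stored in the local database; AI artifacts
/// start empty and are filled in as the pipeline completes.
struct Note {
    id: u64,
    title: String,
    audio_path: Option<String>,
    transcript: Option<String>,
    summary: Option<String>,
    status: NoteStatus,
}

impl Note {
    /// A freshly created note begins in the Recording state with no artifacts.
    fn new(id: u64, title: &str) -> Self {
        Note {
            id,
            title: title.to_string(),
            audio_path: None,
            transcript: None,
            summary: None,
            status: NoteStatus::Recording,
        }
    }
}
```

Modeling the artifacts as `Option` values makes partially processed notes representable, which matters when background processing is interrupted or still running.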

Core Data Flow

  1. Record Start → Creates a new Note (status = recording) and starts microphone capture.
  2. Record Stop → Finalizes audio and triggers background processing.
  3. STT Engine → Converts audio chunks into transcripts.
  4. Diarization → Detects and labels speakers across the timeline.
  5. Summarization → Generates concise summaries and action points.
  6. Persistence → Saves all artifacts (audio, transcript, summary) into the local DB.
  7. Sync (Optional) → Uploads encrypted artifacts to the private cloud for further processing or backup.
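
The steps above can be sketched as a small state machine, with the AI stages stubbed out. The stage names, artifact filenames, and `Pipeline` type are illustrative assumptions, not the real processing code.

```rust
/// Hypothetical stages matching steps 2-6 of the core data flow.
#[derive(Debug, PartialEq)]
enum Stage {
    Recording,
    Transcribed,
    Diarized,
    Summarized,
    Persisted,
}

struct Pipeline {
    stage: Stage,
    artifacts: Vec<String>,
}

impl Pipeline {
    /// Step 1-2: recording has finished, leaving the raw audio artifact.
    fn new() -> Self {
        Pipeline {
            stage: Stage::Recording,
            artifacts: vec!["audio.wav".into()],
        }
    }

    /// Runs the background processing triggered at Record Stop.
    fn run(&mut self) {
        // Step 3. STT: audio chunks -> transcript
        self.artifacts.push("transcript.json".into());
        self.stage = Stage::Transcribed;
        // Step 4. Diarization: label speakers across the timeline
        self.artifacts.push("speakers.json".into());
        self.stage = Stage::Diarized;
        // Step 5. Summarization: concise summary and action points
        self.artifacts.push("summary.md".into());
        self.stage = Stage::Summarized;
        // Step 6. Persistence: write all artifacts to the local DB
        self.stage = Stage::Persisted;
    }
}
```

Tracking an explicit stage (rather than inferring it from which artifacts exist) gives the UI a single field to render progress from and makes retry-after-failure straightforward.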