Veea: Private AI Assistant — System Design Overview

Introduction

This document outlines the system design for Veea, a cross-platform mobile application that acts as an AI assistant for meeting notes. The app records audio conversations and then processes them with AI to produce transcripts, speaker-diarized timelines, and summaries.

Key Feature: The user can choose to perform this processing either on-device for privacy or in a private cloud for speed and accuracy.


High-Level Architecture

Veea follows a hybrid architecture designed for privacy, modularity, and extensibility. It combines:

  • Flutter front-end for cross-platform UI and state management
  • Rust core for high-performance local AI processing
  • Private cloud backend for accelerated inference

The layered flow is shown in the Mermaid diagram below:

graph TD
    %% ===== Styling (color & layout) =====
    classDef flutter fill:#E3F2FD,stroke:#1565C0,stroke-width:1px,color:#0D47A1;
    classDef domain fill:#E8F5E9,stroke:#2E7D32,stroke-width:1px,color:#1B5E20;
    classDef data fill:#FFF3E0,stroke:#EF6C00,stroke-width:1px,color:#E65100;
    classDef processing fill:#F3E5F5,stroke:#6A1B9A,stroke-width:1px,color:#4A148C;
    classDef cloud fill:#FCE4EC,stroke:#AD1457,stroke-width:1px,color:#880E4F;

    %% ===== Flutter Layer (Top) =====
    subgraph Flutter_App["Flutter App"]
        UI["Presentation Layer (Flutter UI)"]
        StateManagement["State Management (Cubit / Bloc)"]
    end
    class Flutter_App flutter

    %% ===== Domain Layer =====
    subgraph Domain_Layer["Domain Layer"]
        UseCases["Use Cases"]
        Repositories["Repository Interfaces"]
    end
    class Domain_Layer domain

    %% ===== Data Layer =====
    subgraph Data_Layer["Data Layer"]
        DataSources["Data Sources"]
        LocalDB["Local Database"]
    end
    class Data_Layer data

    %% ===== Processing Options =====
    subgraph Processing_Options["Processing Options"]
        FFI["FFI Bridge"]
        PrivateCloud["Private Cloud API"]
    end
    class Processing_Options processing

    %% ===== Local AI Core =====
    subgraph Rust_Core["Rust Core (Local AI)"]
        STT["Speech-to-Text Engine"]
        DIAR["Speaker Diarization"]
        SUM["Summarization LLM"]
    end
    class Rust_Core processing

    %% ===== Private Cloud =====
    subgraph Private_Cloud["Private Cloud (Optional)"]
        API["Processing API Layer"]
        GPU["GPU Workers / Model Serving"]
    end
    class Private_Cloud cloud

    %% ===== Flow Connections =====
    UI --> StateManagement
    StateManagement --> UseCases
    UseCases --> Repositories
    Repositories --> DataSources
    DataSources --> LocalDB
    DataSources --> FFI
    DataSources --> PrivateCloud
    FFI --> Rust_Core
    PrivateCloud --> API
    API --> GPU

    %% Results back to app
    Rust_Core --> FFI
    FFI --> DataSources
    PrivateCloud --> DataSources
    DataSources --> Repositories
    Repositories --> UseCases
    UseCases --> StateManagement
    StateManagement --> UI
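
The FFI Bridge in the diagram is the seam between the Flutter data layer and the Rust core. A minimal sketch of what such a bridge could look like on the Rust side is below; the function names (`veea_transcribe`, `veea_free_string`) and the JSON result shape are illustrative assumptions, not Veea's actual API.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical entry point the Flutter side would bind via dart:ffi.
/// Takes a path to the recorded audio file and returns a JSON string;
/// the caller must release it with `veea_free_string`.
#[no_mangle]
pub extern "C" fn veea_transcribe(audio_path: *const c_char) -> *mut c_char {
    let path = unsafe {
        assert!(!audio_path.is_null());
        CStr::from_ptr(audio_path).to_string_lossy().into_owned()
    };
    // Placeholder: a real implementation would invoke the STT engine here.
    let result_json = format!(r#"{{"path":"{}","transcript":""}}"#, path);
    CString::new(result_json).unwrap().into_raw()
}

/// Frees strings allocated by `veea_transcribe`, returning ownership
/// of the buffer to Rust so it is dropped correctly.
#[no_mangle]
pub extern "C" fn veea_free_string(s: *mut c_char) {
    if !s.is_null() {
        unsafe { drop(CString::from_raw(s)) };
    }
}
```

Returning owned C strings with a paired free function keeps allocation and deallocation on the same side of the FFI boundary, which avoids mixing allocators between Dart and Rust.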

Core Principles

  • Privacy by Design: No raw audio leaves the device in Local Mode.
  • Hybrid Intelligence: Switch seamlessly between local and private-cloud inference.
  • Modular Architecture: Clean separation of presentation, domain, and data layers.
  • Lightweight & Efficient: Rust-powered ML modules keep latency and power use low.
  • Offline-First: All primary functions work without internet connectivity.
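
The Hybrid Intelligence principle can be sketched as a simple mode switch that routes the same processing request either to the local Rust core or to the private cloud. The enum, struct, and endpoint strings below are illustrative assumptions, not Veea's real routing logic.

```rust
/// Hypothetical processing-mode switch: one request type, two backends.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ProcessingMode {
    Local,        // on-device: raw audio never leaves the phone
    PrivateCloud, // off-device: faster, larger models
}

struct ProcessingRequest<'a> {
    audio_path: &'a str,
    mode: ProcessingMode,
}

/// Picks the backend for a request; the endpoint strings are placeholders.
fn route(req: &ProcessingRequest) -> &'static str {
    match req.mode {
        ProcessingMode::Local => "ffi://rust_core/transcribe",
        ProcessingMode::PrivateCloud => "https://cloud.example/api/transcribe",
    }
}
```

Keeping the mode on the request (rather than as global state) lets individual notes be processed differently, e.g. a sensitive meeting locally and a long one in the cloud.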

Modules Overview

  • Authentication: Handles user sign-up, login, logout, and session persistence. Supports guest (local-only) and private-cloud accounts.
  • Onboarding: Guides first-time users through permission requests, usage goals, and privacy-mode selection (Local vs Cloud).
  • Settings: Manages profile info, model selection, language, and storage preferences.
  • Note Taking: Records audio, creates a new Note entity, and stores local metadata in the database.
  • Notes Library: Displays and manages saved notes, transcripts, and summaries. Supports search, filter, and export.
  • AI Pipeline: Performs STT, diarization, and summarization using the selected AI engine (local Rust core or private cloud).
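
The Note entity that the Note Taking module creates might look like the sketch below. The field names and status values are assumptions for illustration, not the app's actual schema.

```rust
/// Hypothetical lifecycle states for a note as it moves through the app.
#[derive(Debug, Clone, PartialEq)]
enum NoteStatus {
    Recording,
    Processing,
    Ready,
}

/// Hypothetical Note entity stored in the local database; AI artifacts
/// start empty and are filled in as the pipeline completes.
struct Note {
    id: u64,
    title: String,
    audio_path: Option<String>,
    transcript: Option<String>,
    summary: Option<String>,
    status: NoteStatus,
}

impl Note {
    /// A freshly created note begins in the Recording state with no artifacts.
    fn new(id: u64, title: &str) -> Self {
        Note {
            id,
            title: title.to_string(),
            audio_path: None,
            transcript: None,
            summary: None,
            status: NoteStatus::Recording,
        }
    }
}
```

Modeling the artifacts as `Option` values makes partially processed notes representable, which matters when background processing is interrupted or still running.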

Core Data Flow

  1. Record Start → Creates a new Note (status = recording) and starts microphone capture.
  2. Record Stop → Finalizes audio and triggers background processing.
  3. STT Engine → Converts audio chunks into transcripts.
  4. Diarization → Detects and labels speakers across the timeline.
  5. Summarization → Generates concise summaries and action points.
  6. Persistence → Saves all artifacts (audio, transcript, summary) into the local DB.
  7. Sync (Optional) → Uploads encrypted artifacts to the private cloud for further processing or backup.
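
The steps above can be sketched as a small state machine, with the AI stages stubbed out. The stage names, artifact filenames, and `Pipeline` type are illustrative assumptions, not the real processing code.

```rust
/// Hypothetical stages matching steps 2-6 of the core data flow.
#[derive(Debug, PartialEq)]
enum Stage {
    Recording,
    Transcribed,
    Diarized,
    Summarized,
    Persisted,
}

struct Pipeline {
    stage: Stage,
    artifacts: Vec<String>,
}

impl Pipeline {
    /// Step 1-2: recording has finished, leaving the raw audio artifact.
    fn new() -> Self {
        Pipeline {
            stage: Stage::Recording,
            artifacts: vec!["audio.wav".into()],
        }
    }

    /// Runs the background processing triggered at Record Stop.
    fn run(&mut self) {
        // Step 3. STT: audio chunks -> transcript
        self.artifacts.push("transcript.json".into());
        self.stage = Stage::Transcribed;
        // Step 4. Diarization: label speakers across the timeline
        self.artifacts.push("speakers.json".into());
        self.stage = Stage::Diarized;
        // Step 5. Summarization: concise summary and action points
        self.artifacts.push("summary.md".into());
        self.stage = Stage::Summarized;
        // Step 6. Persistence: write all artifacts to the local DB
        self.stage = Stage::Persisted;
    }
}
```

Tracking an explicit stage (rather than inferring it from which artifacts exist) gives the UI a single field to render progress from and makes retry-after-failure straightforward.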