Architecture Deep Dive

Building Openclad: A 24/7 Personal AI Agent That Actually Does Things

How I built an autonomous AI assistant that runs on my Mac, talks through Telegram, controls my smart home, tracks my job search, and learns from every interaction — using Claude, MCP servers, and about 2,000 lines of Python.

There's a gap between what AI chatbots can do and what they actually do for you day-to-day. ChatGPT can write a poem, but it can't turn off your living room lights when you're already in bed. Claude can analyze a document, but it won't proactively send you a news briefing at 9 AM.

I wanted something different: an AI agent that runs continuously, has access to my real tools and services, makes decisions autonomously for low-risk tasks, and asks permission before doing anything destructive. Something I could message from my phone and get things done — not just get answers.

So I built Jarvis. This post walks through the architecture, key design decisions, and how you can build your own. The full source is at github.com/foodlbs/openclad.

01 What Jarvis Can Do

Before diving into architecture, here's what a typical day looks like:

- 9:00 AM: Jarvis sends a news briefing to Telegram — top headlines, tech news, market movements. No prompt needed; it's a scheduled skill.
- 10:30 AM: "What's the status of my job applications?" → Jarvis queries the SQLite-backed tracker: 3 applied, 1 interview scheduled, 2 rejected.
- 2:00 PM: I send a PDF: "Summarize this and save the key points." → Jarvis reads it, generates a summary, stores it in vector memory, saves a markdown file to Documents.
- 11:00 PM: "Turn off all the lights." → Approval request with inline Telegram button. I tap Approve. Lights off via Home Assistant.

The key insight: Jarvis doesn't just respond to questions. It executes tasks, remembers context, and operates on a schedule — all while a risk classification system keeps me in control of anything with real-world side effects.

02 Architecture Overview

The system breaks into four layers: the Telegram interface (how I talk to it), the agent core (how it thinks and acts), MCP servers (what it can do), and persistent state (what it remembers).

System Architecture — Component Map
- User interface: Telegram app (iOS / macOS)
- Telegram layer: TelegramBot (aiogram 3.x) · auth middleware (allowlist) · command handlers (/task, /status, /skill) · approval manager (inline keyboards)
- Agent core: PersonalAgent (Claude SDK subprocess) · risk classifier (two-tier system, policy.yaml) · ContextLoader (system prompt assembly) · scheduler (cron from .md) · RetryQueue (exponential backoff)
- MCP servers (FastMCP): filesystem (read/write) · browser (Playwright) · sandbox (code exec) · job tracker (SQLite) · memory (ChromaDB) · smart home (Home Assistant) · Calendar/Gmail (Google OAuth)
- Persistent state: Redis · personality.md + skills/ + schedules.md · ChromaDB files · memory/*.md

03 The Tech Stack

| Layer | Technology | Rationale |
|---|---|---|
| Language | Python 3.12 + uv workspace | Fast dependency resolution, monorepo support |
| Agent | Claude Agent SDK (subprocess) | Process isolation, crash recovery, tool use built-in |
| Chat interface | aiogram 3.x | Async Telegram, inline keyboards for approvals |
| State & events | Redis | Task state, conversation buffer, retry queue |
| Vector memory | ChromaDB (local) | No cloud dependency, free, cosine similarity search |
| Embeddings | OpenAI text-embedding-3-small | Cost-effective, high quality for semantic search |
| MCP framework | FastMCP | Simple Python MCP server scaffolding |
| Config | pydantic-settings + YAML | Type-safe config with env var overrides |
| Logging | structlog | Structured JSON logs for production |
| Daemon | macOS LaunchAgent | Auto-start on boot, background execution |

04 Deep Dive: How Each Piece Works

1. The Conversation Flow

When I send a message on Telegram, here's exactly what happens:

Sequence Diagram — "Turn off the lights"
1. User → TelegramBot: "Turn off the lights" (auth middleware checks the allowlist, model is selected)
2. main.py dispatches the task; PersonalAgent starts run_agent_task() while the bot shows a "typing…" indicator
3. The PreToolUse hook hands the pending tool call to the RiskClassifier
4. Classification requires approval → the bot sends an "Approve / Deny" inline keyboard
5. User taps "Approve" → the MCP tool runs call_service(light, off)
6. On success, "Done — lights are off." goes back to the user and the interaction is saved

Smart model selection — Short messages auto-route to Haiku instead of Sonnet. Saves cost and latency for trivial queries while preserving Sonnet's reasoning capacity for demanding work.
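As a rough sketch of that routing heuristic (the function name, threshold, and keyword list are illustrative assumptions, not Openclad's exact implementation):

```python
# Hypothetical length/keyword-based model router. Short, simple messages go
# to the cheap model; long or analysis-heavy ones get the stronger model.
HAIKU = "claude-haiku"
SONNET = "claude-sonnet"

def pick_model(message: str, *, threshold: int = 80) -> str:
    """Route trivial queries to Haiku, demanding work to Sonnet."""
    heavy = ("summarize", "analyze", "write", "research")
    if len(message) > threshold or any(k in message.lower() for k in heavy):
        return SONNET
    return HAIKU
```

The exact cutoff matters less than having one at all: most phone-typed messages are short commands, so the cheap path wins by default.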

Conversation buffer — Last 10 turns in Redis with a 1-hour TTL. Follow-ups like "What about the bedroom lights?" work because Jarvis remembers the topic.
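The buffer itself can be as small as a capped Redis list per chat. A minimal sketch, assuming a redis-py-style client passed in as `r` (key name and shapes are assumptions):

```python
import json

def push_turn(r, chat_id: int, role: str, text: str, max_turns: int = 10) -> None:
    """Append a turn to the chat's buffer, keeping only the last max_turns."""
    key = f"conv:{chat_id}"
    r.rpush(key, json.dumps({"role": role, "text": text}))
    r.ltrim(key, -max_turns, -1)  # drop everything older than the last N turns
    r.expire(key, 3600)           # 1-hour TTL: stale context evaporates

def load_turns(r, chat_id: int) -> list[dict]:
    """Return the buffered turns, oldest first."""
    return [json.loads(x) for x in r.lrange(f"conv:{chat_id}", 0, -1)]
```

The TTL refreshes on every message, so an active conversation keeps its context while an abandoned one cleans itself up.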

2. The Risk System

An autonomous agent with access to your filesystem, email, and smart home needs guardrails. Jarvis uses a two-tier classification:

Risk Classification — Two-Tier Model
✓ Autonomous — No Approval Needed: file reads, web search, memory queries, code sandbox, browser navigation, job tracker reads.

🔒 Require Approval — Inline Button: file writes / deletes, email sends, calendar edits, smart home control, phone calls, purchases.

Classification examines input parameters, not just tool names. Reading ~/Documents is autonomous; writing to /etc/ always requires approval. Configured via risk_policy.yaml:

# risk_policy.yaml
risk_overrides:
  mcp__filesystem__write_file: autonomous
  mcp__smart_home__call_service: require_approval

context_escalation:
  dangerous_paths: [/system, /etc, /usr/bin]
  sensitive_entities: [lock, alarm, security]
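In code, the two tiers can be sketched as a parameter check that runs before the per-tool default. This is an illustrative reconstruction of the idea, not Openclad's actual classifier; the function name and parameter keys are assumptions:

```python
# Tier 2 (context escalation) is checked first so that dangerous parameters
# override even a tool whose default is "autonomous".
RISK_OVERRIDES = {
    "mcp__filesystem__write_file": "autonomous",
    "mcp__smart_home__call_service": "require_approval",
}
DANGEROUS_PATHS = ("/system", "/etc", "/usr/bin")
SENSITIVE_ENTITIES = ("lock", "alarm", "security")

def classify(tool: str, params: dict) -> str:
    path = str(params.get("path", ""))
    if any(path.startswith(p) for p in DANGEROUS_PATHS):
        return "require_approval"          # escalate on dangerous paths
    entity = str(params.get("entity_id", ""))
    if any(e in entity for e in SENSITIVE_ENTITIES):
        return "require_approval"          # escalate on sensitive devices
    # Tier 1: per-tool default; unknown tools fail closed.
    return RISK_OVERRIDES.get(tool, "require_approval")
```

Failing closed on unknown tools is the important design choice: a new MCP server gets approval gating for free until you explicitly mark its tools autonomous.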

3. The Memory System

Memory Architecture — Dual-Layer Design
Short-term (system prompt):
- preferences.md: learned preferences
- projects.md: active project status
- chat_history.md: recent sessions, auto-trimmed at 30
- journal.md: agent reflections
- ContextLoader assembles all of these into every prompt sent to PersonalAgent

Long-term (vector search):
- ChromaDB store: local, no cloud dependency
- OpenAI embeddings: text-embedding-3-small
- memory_store_tool: 2-3 sentence task summaries
- memory_search_tool: semantic recall before complex tasks
- auto-trim: compresses history beyond 30 sessions

The combination gives Jarvis both working memory (always loaded into every prompt) and recall (searchable when needed).
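The long-term half boils down to "store short summaries, recall by similarity". Here is a toy stand-in for the ChromaDB + embeddings pair: a bag-of-words cosine search that keeps the example self-contained (the real system uses text-embedding-3-small vectors, not word counts):

```python
import math
from collections import Counter

class MemoryStore:
    """Toy vector memory: store summaries, recall nearest by cosine similarity."""

    def __init__(self):
        self._docs: list[str] = []

    @staticmethod
    def _vec(text: str) -> Counter:
        return Counter(text.lower().split())  # stand-in for a real embedding

    @staticmethod
    def _cosine(a: Counter, b: Counter) -> float:
        dot = sum(a[t] * b[t] for t in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def store(self, summary: str) -> None:
        self._docs.append(summary)

    def search(self, query: str, k: int = 1) -> list[str]:
        qv = self._vec(query)
        ranked = sorted(self._docs,
                        key=lambda d: self._cosine(qv, self._vec(d)),
                        reverse=True)
        return ranked[:k]
```

Swap `_vec` for an embedding call and `_docs` for a ChromaDB collection and the shape of the interface stays the same.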

4. The Skill Framework

Skills are Markdown files defining triggers, steps, and required tools. When Jarvis spots a repeatable pattern (3+ similar tool call sequences), it suggests creating a new skill — the system grows organically.

## Daily News Briefing
Trigger: "news update", "daily news", "morning briefing"
Schedule: Every day at 9:00 AM EST

### Steps
1. Search for current top headlines
2. Search for tech industry news
3. Format into clean briefing with sections
4. Output ONLY the briefing — no meta-commentary
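A skill file like the one above only needs a tiny parser to become routable. This is a hypothetical sketch of such a loader, matching the field names in the example (the parsing rules are my assumption, not Openclad's actual code):

```python
import re

def parse_skill(md: str) -> dict:
    """Extract name, trigger phrases, and schedule from a skill Markdown file."""
    name = re.search(r"^##\s+(.+)$", md, re.M)
    triggers = re.search(r"^Trigger:\s*(.+)$", md, re.M)
    schedule = re.search(r"^Schedule:\s*(.+)$", md, re.M)
    return {
        "name": name.group(1).strip() if name else None,
        "triggers": [t.strip().strip('"') for t in triggers.group(1).split(",")]
                    if triggers else [],
        "schedule": schedule.group(1).strip() if schedule else None,
    }
```

Everything after the header stays as free-form Markdown steps, which the agent reads directly; only the routing metadata needs structure.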

5. The Scheduler

The scheduler parses a human-readable schedules.md into cron-like entries. Each job runs as an async task, with turns capped at 15 to keep output concise and free of meta-commentary.
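Translating a phrase like "Every day at 9:00 AM EST" into a cron expression can be done with a small pattern match. A sketch handling just that one pattern (the real grammar in schedules.md presumably supports more):

```python
import re

def to_cron(line: str) -> str:
    """Turn 'Every day at H:MM AM/PM' into a 5-field cron expression."""
    m = re.match(r"Every day at (\d{1,2}):(\d{2})\s*(AM|PM)", line, re.I)
    if not m:
        raise ValueError(f"unrecognized schedule: {line!r}")
    hour, minute, ampm = int(m.group(1)), int(m.group(2)), m.group(3).upper()
    if ampm == "PM" and hour != 12:
        hour += 12                     # 1 PM -> 13, etc.
    if ampm == "AM" and hour == 12:
        hour = 0                       # midnight edge case
    return f"{minute} {hour} * * *"
```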

6. Resilience & Fallback

Retry queue — Failed tasks enter a Redis sorted set with exponential backoff (30s → 60s → 120s). After 3 retries, abandoned with failure notification. Model downgrades to Haiku on retry.

API fallback — Rate-limit or downtime detected via error keywords → falls back to Claude Code CLI with a cached OAuth token.

Circuit breaker — Closed → open → half-open. After 5 consecutive failures, rejects calls for 60 seconds before allowing a probe.

05 Project Structure

openclad/
├── main.py                   # Entry point & orchestrator
├── pyproject.toml            # uv workspace root
├── compose.yaml              # Docker Compose (Redis)
├── packages/
│   ├── core/                 # Agent, config, state, risk, retry, scheduler
│   ├── interfaces/           # Telegram bot, handlers, approval flow
│   └── mcp_servers/          # Job tracker, smart home, memory
├── data/
│   ├── agent_context/        # personality.md, skills/, schedules.md
│   ├── memory/chroma/        # Vector store persistence
│   └── secrets/              # OAuth credentials (gitignored)
├── configs/
│   ├── agent.yaml            # Runtime config
│   └── risk_policy.yaml      # Risk classification overrides
└── tests/                    # 20+ unit tests

The monorepo keeps things modular — core has no Telegram dependency, interfaces has no MCP dependency, and mcp_servers are standalone FastMCP processes. Want Discord instead of Telegram? Replace interfaces without touching core.

06 Key Design Decisions

Why local ChromaDB over Pinecone?
No cloud dependency. The vector store lives at data/memory/chroma/ — just files on disk. Zero cost, zero round-trip latency, fully git-backupable.
Why the Claude Agent SDK subprocess model?
Each agent invocation is isolated. If it crashes, nothing leaks. SDK upgrade? Restart the process. Simplest possible isolation boundary.
Why Telegram over a custom UI?
Already on my phone, laptop, and watch. Inline keyboards, file attachments, voice messages, rich formatting — all built-in. A custom UI would have taken weeks for a worse experience.
Why Markdown for schedules and skills?
Editable with any text editor, version-controlled with git, and readable by the agent itself. When Jarvis creates a new skill, it writes a .md file.
Why Redis for all stateful data?
One dependency, in-memory speed, TTL for auto-cleanup, pub/sub for future real-time features. For a single-user agent, one Redis instance is plenty.

07 Running It Yourself

# Clone
git clone https://github.com/foodlbs/openclad.git
cd openclad

# Configure
cp .env.example .env

# Install
uv sync --all-packages

# Start Redis
docker compose up -d redis

# Run
uv run python main.py

You'll need an Anthropic API key, a Telegram bot token (from @BotFather), and your Telegram chat ID. Optionally: an OpenAI key (embeddings), Google OAuth (Calendar/Gmail), and a Home Assistant URL + token.

08 What I'd Do Differently

1. Dedicated voice pipeline. Telegram voice transcription works but is clunky. A proper streaming voice pipeline would transform the experience.
2. Parameterized skills. Skills as templates: Research {topic} with depth {shallow|deep} instead of flat instruction sets.
3. Multi-agent orchestration. Some tasks need parallel sub-agents (researcher + writer). The single-agent model hits turn limits on complex workflows.
4. Observability dashboard. Task history, tool usage, cost tracking, memory growth. The event stream is there but underutilized.

09 Wrapping Up

Jarvis has been running 24/7 on my Mac for about two weeks — delivering morning news, tracking job applications, helping with research, and controlling my apartment, all through Telegram.

The total codebase is ~2,000 lines of Python across three packages, plus Markdown files for personality, skills, and schedules. The agent framework does the heavy lifting; the surrounding infrastructure — risk classification, retry logic, conversation persistence, skill routing — is what transforms a chatbot into an actual assistant.

Star the repo and check out the source at github.com/foodlbs/openclad. If you build something cool with it, I'd love to hear about it.

Built with:
Claude Agent SDK aiogram 3.x Redis ChromaDB FastMCP Python 3.12 macOS LaunchAgent