Google Open-Sources “Always On” Agent Memory System, Bypassing Vector Databases

Google has released an open-source “Always On Memory Agent” that fundamentally shifts how AI agents retain and recall information. Unlike conventional systems reliant on vector databases, this agent uses a large language model (LLM) to manage persistent memory directly, storing data in SQLite and consolidating it in the background. The project, built with Google’s Agent Development Kit (ADK) and Gemini 3.1 Flash-Lite, marks a notable step towards continuous, long-running AI autonomy.

The Shift Away From Vector Databases

For years, AI agent memory has largely depended on vector databases for efficient retrieval. This new approach bypasses that complexity entirely, relying instead on the LLM’s ability to organize and update memory directly. This simplifies infrastructure, potentially reducing costs and operational overhead, particularly for smaller or medium-sized agents. The design trades vector search latency for model latency, shifting the performance bottleneck.
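To make the contrast concrete, here is a minimal sketch of what vector-free recall can look like: memories live in an ordinary SQLite table and retrieval is plain SQL (keyword match plus recency) rather than embedding similarity. The table name, columns, and sample rows are illustrative assumptions, not the project's actual schema.

```python
import sqlite3

# Illustrative schema: memories as plain rows, no embedding column at all.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE memories (id INTEGER PRIMARY KEY, content TEXT, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO memories (content, created_at) VALUES (?, ?)",
    [
        ("User prefers concise answers", "2025-01-01T09:00:00"),
        ("Project deadline moved to March", "2025-01-02T10:00:00"),
        ("User's project uses Postgres", "2025-01-03T11:00:00"),
    ],
)

def recall(keyword: str, limit: int = 5) -> list[str]:
    """Fetch the most recent memories mentioning a keyword -- pure SQL, no vectors."""
    cur = conn.execute(
        "SELECT content FROM memories WHERE content LIKE ? "
        "ORDER BY created_at DESC LIMIT ?",
        (f"%{keyword}%", limit),
    )
    return [row[0] for row in cur.fetchall()]

print(recall("project"))  # both project-related memories, newest first
```

The trade-off the article describes is visible here: retrieval itself is trivial, and the hard work of keeping the table coherent is pushed onto the LLM's periodic consolidation pass instead of an indexing pipeline.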

Why This Matters: The Rise of Persistent AI

The move reflects a growing demand for AI systems that operate continuously, retaining context across extended interactions. This is crucial for applications like long-term research assistance, internal copilots, and automated workflows. However, persistent memory also introduces new governance challenges. Unlike session-bound agents, systems with continuous memory require clear policies on data retention, auditing, and access control.

How It Works: Simplified Architecture

The agent operates as a long-running service, ingesting various data types (text, image, audio, video, PDF) and storing structured memories in SQLite. Scheduled consolidation, by default every 30 minutes, ensures the LLM regularly updates its knowledge base. A local HTTP API and Streamlit dashboard provide access and monitoring capabilities. The key claim is that no vector database or embedding pipelines are necessary; the LLM handles memory organization itself.
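The consolidation loop described above can be sketched as follows. This is an assumption-laden simplification: `consolidate_with_llm` stands in for a real Gemini call and here just deduplicates so the example runs; the real agent's schema, prompts, and scheduling differ.

```python
import sqlite3
import threading

conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE memories (id INTEGER PRIMARY KEY, content TEXT)")

def consolidate_with_llm(contents: list[str]) -> list[str]:
    # Stand-in for an LLM call that merges and updates overlapping memories.
    # This stub only removes exact duplicates, preserving first-seen order.
    seen, merged = set(), []
    for c in contents:
        if c not in seen:
            seen.add(c)
            merged.append(c)
    return merged

def consolidate() -> None:
    """One consolidation pass: read all memories, rewrite them, store the result."""
    contents = [row[0] for row in conn.execute("SELECT content FROM memories")]
    merged = consolidate_with_llm(contents)
    conn.execute("DELETE FROM memories")
    conn.executemany(
        "INSERT INTO memories (content) VALUES (?)", [(m,) for m in merged]
    )
    conn.commit()

def schedule_consolidation(interval_s: float = 30 * 60) -> threading.Timer:
    """Run a pass now, then re-arm a timer (30-minute default, per the article)."""
    consolidate()
    timer = threading.Timer(interval_s, schedule_consolidation, args=(interval_s,))
    timer.daemon = True  # don't block process exit
    timer.start()
    return timer
```

The design point is that consolidation is a background rewrite of the whole store, not an incremental index update, which is why it can run on a coarse 30-minute cadence.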

Flash-Lite’s Role: Economics and Performance

Google’s Gemini 3.1 Flash-Lite model powers the system, providing a balance of speed and cost-effectiveness. Priced at $0.25 per 1 million input tokens and $1.50 per 1 million output tokens, Flash-Lite is 2.5 times faster than Gemini 2.5 Flash and delivers a 45% increase in output speed. The model’s performance (Elo score of 1432 on Arena.ai) makes it viable for high-frequency, always-on operations.
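A back-of-envelope cost check helps ground the "always on" claim. The pricing below is the Flash-Lite pricing quoted above; the per-cycle token counts are illustrative assumptions, not measured figures from the project.

```python
# Quoted Flash-Lite pricing: $0.25 per 1M input tokens, $1.50 per 1M output.
INPUT_PRICE_PER_M = 0.25
OUTPUT_PRICE_PER_M = 1.50

def daily_cost(input_tokens_per_cycle: int,
               output_tokens_per_cycle: int,
               cycles_per_day: int = 48) -> float:
    """Daily consolidation cost at the default cadence (every 30 min = 48 cycles)."""
    cost_in = cycles_per_day * input_tokens_per_cycle / 1_000_000 * INPUT_PRICE_PER_M
    cost_out = cycles_per_day * output_tokens_per_cycle / 1_000_000 * OUTPUT_PRICE_PER_M
    return cost_in + cost_out

# Hypothetical workload: 20k input + 2k output tokens per consolidation pass.
print(f"${daily_cost(20_000, 2_000):.2f}/day")
```

Even at generous token counts per pass, the arithmetic stays in the cents-per-day range, which is what makes a cheap model like Flash-Lite the enabling ingredient for always-on operation.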

Governance and Scalability Concerns

The release has already sparked debate, with experts pointing out the compliance risks of uncontrolled memory consolidation. Without deterministic boundaries, an agent could “dream” and cross-pollinate memories in unpredictable ways, creating audit and liability nightmares. Scaling the system also raises questions about memory drift, looping behaviors, and retrieval efficiency as the knowledge base grows.

The Bigger Picture: Agent Runtime Strategy

Google’s ADK frames this not as a standalone demo, but as part of a broader agent runtime strategy. The framework is model-agnostic and supports various deployment patterns, including Cloud Run and Vertex AI Agent Engine. This suggests a vision of agents as deployable software systems, with memory as an integral runtime layer.

In conclusion, Google’s open-source memory agent signals a shift towards more persistent and autonomous AI systems. While the technology offers compelling efficiency gains, its long-term success will hinge on addressing governance concerns and ensuring scalability in real-world enterprise deployments.
