Google senior AI product manager Shubham Saboo has open-sourced an “Always On Memory Agent” on the official Google Cloud Platform GitHub page under a permissive MIT License, permitting commercial use. The project addresses a key problem in agent design by providing a practical reference implementation of an agent that can ingest information continuously, consolidate it in the background, and retrieve it later without relying on a conventional vector database.

The Always On Memory Agent was built with Google’s Agent Development Kit (ADK) and Gemini 3.1 Flash-Lite, a low-cost model introduced by Google as its fastest and most cost-efficient Gemini 3 series model. The agent runs continuously, ingesting files or API input, storing structured memories in SQLite, and performing scheduled memory consolidation every 30 minutes by default. A local HTTP API and Streamlit dashboard are included, and the system supports text, image, audio, video, and PDF ingestion. The design choice to use a large language model (LLM) to organize and update memory directly, rather than relying on a vector database, is likely to draw attention from developers managing cost and operational complexity.
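The flow described above (continuous ingestion into SQLite plus a scheduled background consolidation pass) can be sketched roughly as follows. This is an illustrative sketch, not the repository’s actual code: the schema, function names, and the injected `summarize` callable are all assumptions, standing in for what the real project does with ADK and Gemini.

```python
import sqlite3
import time

def init_store(path=":memory:"):
    # Hypothetical schema: raw content plus a flag marking consolidation state.
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS memories (
        id INTEGER PRIMARY KEY,
        content TEXT NOT NULL,
        created_at REAL NOT NULL,
        consolidated INTEGER DEFAULT 0)""")
    return db

def ingest(db, content):
    # Continuous ingestion path: file or API input lands here as a raw memory.
    db.execute("INSERT INTO memories (content, created_at) VALUES (?, ?)",
               (content, time.time()))
    db.commit()

def consolidate(db, summarize):
    # Background pass (the project runs this every 30 minutes by default):
    # gather unconsolidated memories, summarize them, store the result.
    rows = db.execute(
        "SELECT id, content FROM memories WHERE consolidated = 0").fetchall()
    if not rows:
        return None
    summary = summarize([content for _, content in rows])
    db.execute("UPDATE memories SET consolidated = 1 WHERE consolidated = 0")
    db.execute(
        "INSERT INTO memories (content, created_at, consolidated) VALUES (?, ?, 1)",
        (summary, time.time()))
    db.commit()
    return summary
```

In the real agent, the `summarize` callable would be an LLM call (Gemini via ADK); injecting it as a parameter here keeps the storage flow testable offline and highlights the design choice the article notes: the model itself, not a vector index, organizes and updates memory.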

The choice of Gemini 3.1 Flash-Lite gives the always-on design clear economic logic: Google prices the model at $0.25 per 1 million input tokens and $1.50 per 1 million output tokens. According to Google, the model is 2.5 times faster than Gemini 2.5 Flash in time to first token and delivers a 45% increase in output speed while maintaining similar or better quality. Pairing Flash-Lite with a background-memory agent matters because it offers predictable latency and inference costs low enough that “always on” is not prohibitively expensive.
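At those published rates, back-of-the-envelope arithmetic shows why continuous operation is plausible. The token volumes below are invented purely for illustration; only the per-million prices come from the article.

```python
# Published Gemini 3.1 Flash-Lite pricing (USD per 1M tokens, per the article).
INPUT_PER_M = 0.25
OUTPUT_PER_M = 1.50

def daily_cost(input_tokens, output_tokens):
    """USD cost for one day of agent traffic at the published rates."""
    return (input_tokens / 1e6) * INPUT_PER_M + (output_tokens / 1e6) * OUTPUT_PER_M

# Hypothetical always-on load: 48 consolidation passes per day (one every
# 30 minutes), each reading ~20k tokens of memory and writing ~2k tokens.
passes = 48
cost = daily_cost(passes * 20_000, passes * 2_000)
print(f"${cost:.2f}/day")  # prints $0.38/day
```

Even with generous context windows per pass, the background consolidation loop stays in the tens-of-cents-per-day range, which is the economic point the pairing is meant to make.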

The release of the Always On Memory Agent has sparked a debate about governance and operational burden, with several responses highlighting concerns about compliance, drift, and loops. Enterprise architects are likely to raise questions about who can write memory, what gets merged, how retention works, when memories are deleted, and how teams audit what the agent learned over time. The tradeoff for developers is less about ideology than fit, with a lighter stack being attractive for low-cost, bounded-memory agents, while larger-scale deployments may still demand stricter retrieval controls, more explicit indexing strategies, and stronger lifecycle tooling.
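The governance questions above (who can write memory, how retention works, when memories are deleted, how teams audit what was learned) map naturally onto a small policy layer. The sketch below is a hypothetical illustration of that idea, not part of the released project; every name in it is an assumption.

```python
import time
from dataclasses import dataclass, field

@dataclass
class MemoryPolicy:
    # Hypothetical governance knobs an enterprise deployment might demand.
    allowed_writers: set          # "who can write memory" as an allowlist
    retention_seconds: float      # "when memories are deleted" as a TTL
    audit_log: list = field(default_factory=list)  # "what the agent learned"

    def check_write(self, writer, content):
        # Every write attempt, allowed or not, leaves an auditable record.
        if writer not in self.allowed_writers:
            self.audit_log.append(("denied", writer, content))
            return False
        self.audit_log.append(("written", writer, content))
        return True

    def expired(self, created_at, now=None):
        # Retention becomes an explicit, testable comparison.
        now = time.time() if now is None else now
        return (now - created_at) > self.retention_seconds
```

A layer like this is what the lighter LLM-managed stack omits by default, and what larger-scale deployments would bolt on alongside stricter retrieval controls and lifecycle tooling.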

The Always On Memory Agent is interesting on its own, but the larger message is that Saboo is trying to make agents feel like deployable software systems rather than isolated prompts. In that framing, memory becomes part of the runtime layer, not just an add-on feature. The release lands at the right time, as enterprise AI teams are moving beyond single-turn assistants and into systems expected to remember preferences, preserve project context, and operate across longer horizons. However, the strongest takeaway from the reaction around the launch is that continuous memory will be judged on governance as much as capability, with the real enterprise question being whether an agent can remember in ways that stay bounded, inspectable, and safe enough to trust in production.
