We record everything now. Security cameras run 24/7, TV stations archive decades of broadcasts, and production crews film thousands of hours of B-roll. Almost none of it is ever seen again. It sits on servers, eating up electricity and storage costs, labeled “dark data.” It is a massive library with no card catalog, and until recently, the only way to find anything in it was to hit play and wait.
Key Takeaways
- InfiniMind raised $5.8 million in seed funding led by UTEC.
- Ex-Googlers Aza Kai and Hiraku Yanagita co-founded the Tokyo-based startup.
- The company is relocating its headquarters from Japan to the United States.
That is the problem InfiniMind wants to solve. Founded by former Google executives Aza Kai and Hiraku Yanagita, the startup builds software that watches video so humans do not have to. The company just raised $5.8 million in seed funding and is relocating its headquarters from Tokyo to the United States. The goal is to turn petabytes of raw, unwatchable footage into a database you can search like a spreadsheet.
The big deal
Most companies treat old video as a liability. It costs money to store, and finding a specific clip—like a product mention in a 1990s broadcast or a safety violation on a factory floor—requires paying someone to watch hours of static. Because of this, valuable information is collected but never used.
If this works, that footage becomes active data. A retailer could instantly find every time a specific brand logo appeared on screen across five years of security tapes. A broadcaster could query decades of archives for specific spoken phrases or events without manual tagging. It turns a “write-only” medium into a readable one.
How it works
The system uses vision-language models to process visual data, audio, and speech simultaneously, creating a searchable map of the content.
Think of a library with millions of books but no card catalog. To find a specific quote, you would have to open every book and read every page until you found it. InfiniMind acts as a librarian who has already read and memorized every book. You ask for the quote, and the system hands you the specific book opened to the exact page.
Instead of just labeling a frame “dog” or “car,” the software tracks narratives and causality over long durations. It allows a user to ask complex questions about what happened in a 200-hour video file, and the system retrieves the exact timestamp where the event occurred.
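The index-and-retrieve pattern described above can be sketched in a few lines. This is a minimal illustration, not InfiniMind's actual method: a toy bag-of-words embedding stands in for a real vision-language model, and the segment captions, timestamps, and function names are all hypothetical.

```python
import numpy as np

# Hypothetical index of video segments: (timestamp in seconds, caption).
# A real system would generate these descriptions with a vision-language
# model over frames, audio, and speech; here they are hand-written stand-ins.
segments = [
    (0,    "empty factory floor with machines idle"),
    (3600, "worker enters without a hard hat near the press"),
    (7200, "forklift moves pallets past the loading dock"),
]

# Toy embedding: a normalized bag-of-words vector over the caption vocabulary.
# This only illustrates the pattern, not a production embedding model.
vocab = sorted({w for _, cap in segments for w in cap.lower().split()})

def embed(text: str) -> np.ndarray:
    words = text.lower().split()
    v = np.array([words.count(w) for w in vocab], dtype=float)
    n = np.linalg.norm(v)
    return v / n if n else v

index = [(ts, cap, embed(cap)) for ts, cap in segments]

def search(query: str) -> tuple[int, str]:
    """Return the timestamp and caption of the best-matching segment."""
    q = embed(query)
    ts, cap, _ = max(index, key=lambda e: float(q @ e[2]))
    return ts, cap

print(search("missing hard hat"))  # points at the 3600-second segment
```

The key design point is that the expensive step (watching and describing the video) happens once, at indexing time; each query afterward is a cheap vector comparison, which is what makes a 200-hour archive answerable in seconds.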
The catch
The primary challenge is cost. Processing video at scale requires massive computing power, and while the founders claim cost efficiency as a major differentiator, they have not released specific pricing. Most existing solutions force users to choose between high accuracy and low cost.
The market is also crowded. Competitors like TwelveLabs are already building similar tools for general use. InfiniMind is betting that a focus on “enterprise” needs—like security and safety monitoring—will help it stand out. It also remains unclear how the system handles privacy concerns when analyzing surveillance footage.
What now?
The company has already deployed a product called TV Pulse in Japan to help media companies track brand exposure. The team is now using its new funding to expand engineering infrastructure and hire more staff.
If you manage large video archives, keep an eye out for their flagship platform, DeepFrame. It enters beta testing this March, with a full release scheduled for April 2026.