HeadlinesBriefing.com

Open-source tool adds vision capabilities to local LLMs via Google Lens

Hacker News: Front Page

Developer Vincent Kaufmann has built a workaround that gives text-only AI models such as GPT-OSS-120B computer vision capabilities without requiring API access. The open-source MCP server uses Google Lens and OpenCV to identify objects in images, enabling the 120-billion-parameter model to recognize hardware such as NVIDIA DGX Spark systems from photos.
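
As a rough illustration of how an MCP server can hand image findings to a text-only model, the sketch below handles a JSON-RPC `tools/call` request and replies with plain text content. It is not taken from Kaufmann's repository: the tool name `identify_image`, the stub backend, and the `handle_request` helper are all hypothetical, and the real lookup through Google Lens and OpenCV is deliberately left out.

```python
import json


def identify_image_stub(image_path: str) -> str:
    """Hypothetical stand-in for the visual lookup step.

    The real tool reportedly uses Google Lens and OpenCV; this stub only
    returns a placeholder string so the message flow can be shown.
    """
    return f"Placeholder description for {image_path} (no lookup performed)."


# Hypothetical registry mapping tool names to Python callables.
TOOLS = {"identify_image": identify_image_stub}


def handle_request(raw: str) -> str:
    """Answer one JSON-RPC 2.0 request in the shape MCP uses for tool calls."""
    request = json.loads(raw)
    req_id = request.get("id")

    # Only the tool-call method is handled in this sketch.
    if request.get("method") != "tools/call":
        error = {"code": -32601, "message": "Method not found"}
        return json.dumps({"jsonrpc": "2.0", "id": req_id, "error": error})

    params = request.get("params", {})
    tool = TOOLS.get(params.get("name"))
    if tool is None:
        error = {"code": -32602, "message": "Unknown tool"}
        return json.dumps({"jsonrpc": "2.0", "id": req_id, "error": error})

    # The result is plain text, which is all a text-only model can consume.
    text = tool(**params.get("arguments", {}))
    result = {"content": [{"type": "text", "text": text}]}
    return json.dumps({"jsonrpc": "2.0", "id": req_id, "result": result})


if __name__ == "__main__":
    call = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {"name": "identify_image", "arguments": {"image_path": "desk.jpg"}},
    }
    print(handle_request(json.dumps(call)))
```

The point of the design is that the model never sees pixels: the server does the visual work and returns a text description, so any model that can call tools can use it.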

The tool combines 17 Google services, including Search, Maps, and Translate, into a single locally run pipeline. Users install it from PyPI with two commands, and no API keys are required. Instead of calling commercial cloud APIs, it relies on browser automation and computer vision techniques.

In tests, the system correctly identified specific tech gear in photos of cluttered desks. The GitHub repository shows how developers can extend local language models with multimodal capabilities without expensive API integrations, which could matter for AI applications built on locally hosted models.