HeadlinesBriefing favicon HeadlinesBriefing.com

Rust‑Based CLI Lets AI Agents Automate Any Desktop App

Hacker News •
×

GitHub’s lahfir/agent-desktop project delivers a native desktop‑automation CLI aimed at AI agents. Written in Rust, it taps OS accessibility trees to observe and manipulate any GUI without screenshots or pixel matching. A single binary runs on macOS, Linux and Windows, exposing a structured JSON view of elements and deterministic references like @e1. This approach lets agents reason about UI state directly.

Key features include a 53 commands surface, from observation and keyboard shortcuts to mouse drags and window management. The CLI ships a C‑ABI cdylib so languages such as Python, Swift, Go, Ruby or Node can load it in‑process, avoiding a fork per call. Progressive skeleton traversal offers 78–96% token reduction on dense apps by first returning a shallow overview then drilling into targeted regions.

Typical workflows snapshot the UI, let an LLM decide on an action, then issue a click, type or shortcut before resnapshotting. Prebuilt binaries install via npm (“npm i -g agent-desktop”) or from source, and macOS requires the Accessibility permission to run. By delivering machine‑readable responses with error codes, the tool enables reliable end‑to‑end automation for apps like Finder, Safari, Slack or Xcode.