sim-use turns mobile app automation into a real observe-act-verify loop for AI agents

June 30, 2026 · updates

sim-use gives AI agents a compact cross-platform CLI for reading and acting on iOS Simulator and Android screens, making mobile UI verification feel much closer to a real agent loop than a pile of brittle test scripts.

[Image: GitHub README capture for lycorp-jp/sim-use]

A lot of agent tooling still assumes the web is the main interface worth automating. That misses a practical gap for product teams building mobile apps: the moment an agent needs to verify a real screen, tap through a flow, or confirm that a UI change actually rendered on device, the workflow often falls back to brittle test scripts, image diff hacks, or a human staring at a simulator.

sim-use is interesting because it treats that gap as a first-class product surface instead of a side quest. The repo gives AI agents a single CLI for both iOS Simulator and Android emulator or device screens. It can describe the visible UI in a compact, token-efficient outline, act on elements without relying on raw coordinates, and then verify the result in the same loop.

That sounds simple, but it is exactly the kind of infrastructure that makes mobile agent workflows feel real instead of theatrical.

What the repo is actually building

At its core, sim-use is an observe-act-verify runtime for mobile interfaces.

The README shows the mental model clearly:

sim-use ui
sim-use tap @9
sim-use ui

Instead of dumping a giant accessibility tree into an LLM, the tool turns the current screen into a compact outline that preserves structure while staying cheap to read. Elements get aliases like @9, which means the next action can target something the model just saw without re-solving the whole screen from scratch.
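To make that loop concrete from the calling side, here is a minimal Python sketch. It relies only on the two commands shown above (sim-use ui and sim-use tap @N). Everything else is an assumption for illustration: the helper names are hypothetical, and the sketch assumes the outline puts one element per line with its alias on that line, which the README may or may not guarantee.

```python
import re
import subprocess
from typing import Callable, List, Optional

# A runner takes CLI arguments and returns stdout, so the loop can be tested
# without a simulator attached.
Runner = Callable[[List[str]], str]

ALIAS_RE = re.compile(r"@\d+")


def run_sim_use(args: List[str]) -> str:
    """Invoke the real CLI. Requires sim-use to be installed and on PATH."""
    result = subprocess.run(
        ["sim-use", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def find_alias(outline: str, text: str) -> Optional[str]:
    """Return the first alias on a line that mentions `text`.

    Assumption: one element per outline line, alias written as @<number>.
    """
    for line in outline.splitlines():
        if text in line:
            match = ALIAS_RE.search(line)
            if match:
                return match.group(0)
    return None


def tap_and_verify(
    target_text: str, expected_text: str, run: Runner = run_sim_use
) -> bool:
    """Observe the screen, tap the element, then observe again to verify."""
    before = run(["ui"])                 # observe
    alias = find_alias(before, target_text)
    if alias is None:
        return False                     # nothing to act on
    run(["tap", alias])                  # act
    after = run(["ui"])                  # verify
    return expected_text in after
```

Called as tap_and_verify("Sign in", "Welcome back"), this would read the screen, tap whichever alias sits on the "Sign in" line, and report whether the follow-up read contains the expected text. Passing the runner in as a parameter is a design choice for this sketch, so the loop logic can be exercised against canned outlines.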

That design choice matters more than it first appears. A lot of automation tooling is technically powerful, but awkward for agents because every action requires either verbose JSON parsing or fragile coordinate-based control. sim-use is intentionally shaped around what an LLM loop can consume and produce quickly.

Under the hood, the project bridges Apple Accessibility APIs, the iOS Simulator HID pipeline, and Android AccessibilityService behind one command surface. The repo also describes a per-device background daemon and claims that repeated calls land in a few hundred milliseconds after warm-up. That latency largely determines whether an agent loop feels usable.

Why this feels different from traditional mobile test tooling

The most interesting thing about sim-use is that it is not pretending to be another generic QA framework.

It is built for agents first.

That shows up all over the repo. The ui command is optimized for token efficiency. Tap targets can use cached aliases, stable IDs, labels, or coordinates as a last resort. Errors can be emitted in structured JSON with actionable hints. There is even a bundled client-init flow so an AI coding tool can install the skill and learn the command surface directly.

That product framing is smart. Traditional mobile automation stacks were designed around human-authored test suites, not tight model loops. Once the user becomes an LLM, the interface quality of the tooling matters just as much as the low-level device control. sim-use understands that.

The real win is closing the mobile feedback loop

For builders, the strongest idea here is not just screen control. It is loop closure.

A coding agent can already write UI code. The missing step is often verifying what actually happened inside a simulator or emulator without asking a human to take over. sim-use moves that missing step closer to a usable default.

That is a bigger deal for mobile than it is for the web. Browser automation already has a mature ecosystem of DOM-aware tools. Mobile screens are harder. Accessibility trees vary, system dialogs interrupt flows, keyboards behave differently, and the gap between source code and visible result is more expensive to traverse. A tool that unifies observation and action across iOS and Android lowers that friction in a way product teams can actually feel.

The repo is also wisely opinionated about selectors. Alias-based taps are fast, #<id> selectors are more stable, labels help for scripted flows, and coordinates remain the fallback. That layering acknowledges reality instead of overselling one perfect method.
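One way an agent harness might encode that layering is sketched below in Python. This is an illustration of the fallback order described above, not sim-use's own logic. The Target type, the choose_selector function, and the outline_is_fresh flag are hypothetical. The sketch deliberately returns a (kind, value) pair instead of a full command line, because the exact CLI syntax for label and coordinate taps is not covered here.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Target:
    """Everything the caller knows about one element (all fields optional)."""
    alias: Optional[str] = None             # e.g. "@9", tied to the last outline read
    stable_id: Optional[str] = None         # e.g. "login_button"
    label: Optional[str] = None             # visible text, useful in scripted flows
    point: Optional[Tuple[int, int]] = None  # raw coordinates, last resort


def choose_selector(target: Target, outline_is_fresh: bool) -> Tuple[str, str]:
    """Pick a selector in the order alias -> #id -> label -> coordinates."""
    # Aliases are the cheapest option, but only trustworthy for the screen
    # that was just read.
    if target.alias and outline_is_fresh:
        return ("alias", target.alias)
    if target.stable_id:
        return ("id", f"#{target.stable_id}")
    if target.label:
        return ("label", target.label)
    if target.point:
        x, y = target.point
        return ("point", f"{x},{y}")
    raise ValueError("no usable selector for this target")
```

Treating the alias as valid only while the outline is fresh is an assumption of this sketch: once the screen has changed, the stable ID is the safer choice.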

Where the repo feels especially product-minded

There are a few details in the README that make the project feel more mature than a typical fresh GitHub spike.

First, the cross-platform surface is consistent. The same top-level verbs work across iOS and Android, which means teams can design one agent workflow instead of two separate automation stacks.

Second, the tool is explicit about ugly real-world edge cases. The README documents hardware keyboard assumptions for paste behavior, iOS paste permission prompts, Android bridge installation, and when to switch interaction strategies. That is the sort of operational honesty that makes an automation tool trustworthy.

Third, the repo frames performance as part of the product. A fast loop is not a vanity metric here. If every observe or tap round trip is slow, agents become annoying and expensive. If the loop is fast enough, the tool becomes something you can actually keep inside an iterative development workflow.

I also like that sim-use is open source under Apache 2.0. For teams experimenting with agent-assisted QA, internal tooling, or device-farm style workflows, that matters. It lowers the friction to adopt, fork, and adapt the project to real pipelines.

What builders should watch next

sim-use is early, so the interesting questions are less about whether the core idea is valid and more about how far the abstraction can stretch.

The next frontier is probably reliability under messy app states: deep navigation stacks, mixed native and web content, permission prompts, loading transitions, and flaky accessibility metadata. Another important question is how well the tool composes with larger agent systems that need memory, planning, retries, and artifact capture around the raw device actions.
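Loading transitions are one place where the surrounding agent code can absorb some of that messiness on its own. The Python sketch below polls a screen reader until the expected text appears. It is a generic retry helper, not part of sim-use: wait_for_text and its parameters are hypothetical, and read_ui stands in for whatever function returns the output of sim-use ui.

```python
import time
from typing import Callable


def wait_for_text(
    read_ui: Callable[[], str],
    expected: str,
    attempts: int = 5,
    delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Re-read the screen until `expected` shows up or attempts run out."""
    for attempt in range(attempts):
        if expected in read_ui():
            return True
        # Pause between reads, but not after the final attempt.
        if attempt < attempts - 1:
            sleep(delay_s)
    return False
```

A bounded number of attempts keeps a stuck spinner from stalling the agent indefinitely. The caller can then decide whether to retry the action, capture an artifact, or hand off to a human.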

But those are good problems to have. They mean the repo is already operating at the right layer.

Why this repo stands out

The best open-source tooling often does not invent a completely new category. It sharpens an awkward, high-friction workflow until it becomes something teams can actually build around.

That is what sim-use is doing for mobile agent automation. It turns device interaction into a compact command surface that models can reason about, act on, and verify without pretending the browser is the whole world.

For anyone building AI-assisted mobile workflows, that is a repo worth watching.

Repo

GitHub: https://github.com/lycorp-jp/sim-use