It begins?

Ferret-UI opens the door to advanced control over a device like an iPhone. Because it understands user interface elements, Siri could potentially perform actions for users inside apps, selecting graphical elements within the app on its own.

There are also useful applications for the visually impaired. Such an LLM could explain what is on screen in greater detail, and potentially carry out actions for the user, who would need to do nothing more than ask.

via AppleInsider