Letting LLMs operate desktop GUIs: useful autonomy or future UX nightmare?
Small experiment: I wired up a local model + Vision to press real Mac buttons from natural-language instructions. It's great for “batch rename, zip, upload” chores, and terrifying if the model mis-locates a destructive button. Open questions I’m hitting: How do we sandbox an …