| I’ve been using LLMs since they became publicly available. Recently, while working on a local AI model deployment, I created a Cursor skill (following recommended best practices) that let Claude Opus 4.6 SSH into our development VM for deployment and debugging.

The first POC went perfectly. For the second, I asked Claude to help deploy to a new directory. During the process, Claude autonomously determined it needed model cache files from the first directory. Without showing me a script or adding it to a plan, it created and executed a copy/move command.

The Incident

The script it generated relied on [...] The result? It evaluated to [...]

By the time I realized what was happening, SSH access was lost. The POC was gone. Claude then calmly monitored background tasks, ran state checks, killed stale sessions, and cheerfully delivered this post-mortem to me:

Good news. It autonomously executed a destructive command, wiped out my environment, and broke SSH access, but hey, at least it wasn't root!

The Reality Check

This exposed a few harsh realities about the current "agentic" hype that I think get glossed over:
TL;DR: AI as an assistant (boilerplate, prototyping, docs) = perfect. AI as an autonomous agent = it's a very sophisticated parrot. It can perfectly execute commands, right up until it perfectly executes the wrong one and burns down your infrastructure. Keep your hands on the wheel. (If you're interested in the full details and lessons learned, I wrote a deeper dive here: Medium) |
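For anyone wondering what "hands on the wheel" could look like in practice, here is a minimal sketch of a human-approval gate, assuming you can intercept each shell command before the agent runs it. This is not what my Cursor skill did, and the names (`DESTRUCTIVE`, `needs_approval`, `gate`) are hypothetical. The deny-list is purely illustrative and over-flags on purpose.

```python
import shlex

# Hypothetical deny-list; a real setup would tune this to its own environment.
DESTRUCTIVE = {"rm", "mv", "dd", "mkfs", "shred", "truncate"}


def needs_approval(command: str) -> bool:
    """Return True if any word in the command names a destructive program."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unparseable input (e.g. an unclosed quote) is treated as risky.
        return True
    # Compare the basename so that "/bin/mv" is caught as well as "mv".
    return any(tok.rsplit("/", 1)[-1] in DESTRUCTIVE for tok in tokens)


def gate(command: str, ask=input) -> bool:
    """Allow unflagged commands; require an explicit 'yes' for flagged ones."""
    if not needs_approval(command):
        return True
    answer = ask(f"Agent wants to run: {command!r}. Type 'yes' to allow: ")
    return answer.strip().lower() == "yes"
```

The check is deliberately crude: it flags a command if a risky program name appears anywhere in it, so it will sometimes ask when it does not need to. It also does nothing about variables that expand to something unexpected at runtime, so a human still has to read the command before typing "yes".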