Statistically we are cooked
For an LLM to identify harmful content, that harmful content must be included in the model's weights. If you train a model on data that omits this information, it may naively regurgitate harmful content provided by a human user witho…