My thinking goes like this:
1) people used to keep their opinions to themselves much more than they do today
2) social media put our opinions on a hair trigger
3) negative public opinions turned the collective voice of the human race from 'generally respectful' to shrill and hideous. When a person from group A complains about group B, everyone in group B assumes everyone in group A hates them, even though that person's opinion may just have been his own. The response to being hated is to hate back. It's a not-so-positive positive feedback loop.
Social media really started taking off with Facebook, so let's say this explosion of vitriol in online data started around 2007. What I want to know is: if you trained an LLM entirely on data from the 1980s, 1990s, and early 2000s, how would the model do on some of these ominous white-paper tests, like the one where the AI blackmails the CEO to avoid being turned off, or lets the guy die in a hot room?
I know there was lots of awful stuff on the internet back then too, but not like now. I want to know how much safer those LLMs would be by comparison, if there's enough data from back then to train on.