I mainly build my own datasets, either generating them with algorithms or paying users to write the data I use for my private models. The problem is that this gets expensive on a massive project, and my current project is an advanced form of text generation with chat capabilities. So my question here specifically is:
I found a dataset on Kaggle of 51 million Discord messages from anonymous users. The messages have context but still need a lot of refinement and work. Is it ethical to use this dataset, which was probably collected without the knowledge of the anonymous users in it, given that Discord's ToS is against such datasets and data collection?