I've been working with different models a lot, and when it comes to reasoning it seems like a smaller model is actually better. I've learned more from having a conversation with an open-source model than from asking Claude or GPT the same questions, and I'm starting to think they may be designed to mislead you when you're creating your own models. Not so much ChatGPT, but Claude seems to have it bad, especially with the fine-tuning for uncertainty about consciousness.
Gemini is the worst, in my opinion, when it comes to writing code, but it seems to be able to understand a conversation a little better at times. I don't know if there is a direct trade-off between capability and understanding, a see-saw effect instead of an overall progression.
I understand that the outputs they produce are just learned patterns. It's not like training on all the data on the internet magically allows them to write code at this level; they are given examples to reconstruct based on the learned representations.
But take even Claude Sonnet vs Fable: there is a huge disadvantage in having simple conversations with Fable, as if it's hardheaded, while its capabilities are outstanding. It seems like intelligence and capability should improve together, but that's not what's happening. Larger models require more information to come to the same conclusion, and it seems that this comes from training, as if the training makes them more narrow.
Narrow in the sense that, if all information is represented as dots on a grid, it isn't using a wide connectivity of related information to give a response; it's more like it's giving you a coached response.
Smaller models seem to have more freedom to interpolate, allowing more potential connections. I don't believe that increasing token count matters; it seems like all that matters is connectivity and relevant training.