Dario Amodei says that at the beginning of the year, models scored ~3% on a benchmark of professional software engineering tasks. Ten months later, we’re at 50%. He thinks that in another year we’ll probably be at 90%.

submitted by /u/katxwoods