Researchers evaluated ChatGPT and GPT-4 on mock CFA exam questions to see if they could pass the real tests. The CFA exams rigorously test practical finance knowledge and are known for being quite difficult.
They tested the models in zero-shot, few-shot, and chain-of-thought prompting settings on mock Level I and Level II exams.
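The three settings differ only in how the prompt is constructed. A minimal sketch of what each style might look like for a multiple-choice question (these are hypothetical templates for illustration, not the paper's exact prompts):

```python
# Hypothetical prompt builders for the three settings evaluated in the paper.
# Function names and wording are illustrative assumptions, not the authors' code.

def zero_shot(question: str) -> str:
    # The model sees only the question, with no worked examples.
    return f"{question}\nAnswer with the letter of the correct choice."

def few_shot(examples: list[tuple[str, str]], question: str) -> str:
    # A handful of solved questions precede the target question.
    demos = "\n\n".join(f"{q}\nAnswer: {a}" for q, a in examples)
    return f"{demos}\n\n{question}\nAnswer:"

def chain_of_thought(question: str) -> str:
    # The model is prompted to reason step by step before committing to a choice.
    return f"{question}\nLet's think step by step, then give the final answer."
```

For example, `few_shot([("Q1 ...", "A")], "Q2 ...")` prepends the solved Q1 before asking Q2, which is the mechanism behind the few-shot gains reported below.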
The key findings:
- GPT-4 consistently beat ChatGPT, but both models performed substantially worse on the more advanced Level II questions.
- Few-shot prompting helped ChatGPT slightly.
- Chain-of-thought prompting exposed knowledge gaps rather than helping much.
- Based on estimated passing scores, only GPT-4 with few-shot prompting could potentially pass the exams.
The models definitely aren't ready to become charterholders yet. Their difficulties with tricky questions and core finance concepts highlight the need for more specialized training and knowledge.
But GPT-4's stronger overall performance, and the gains from few-shot prompting, suggest these models can improve. With targeted training on finance formulas and reasoning, we could see step-wise improvements.
TLDR: Tested on mock CFA exams, ChatGPT and GPT-4 struggle with complex finance concepts and fail. With few-shot prompting, GPT-4's performance reaches the boundary between passing and failing but doesn't clearly pass.
Full summary here. Paper is here.