Transformers are great at NLP and computer vision tasks, but I was surprised to learn they still lag behind simple linear models at time series forecasting.
The issue is how most Transformer architectures treat each timestamp as a token, fusing all the variables recorded at that moment into a single embedding. This creates two big problems:
- Variables recorded at slightly different times get blurred together, losing important timing info
- Each token only sees a single moment, so it can't capture long-range temporal dependencies
So Transformers struggle to extract useful patterns and correlations from the data.
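To make the tokenization difference concrete, here's a minimal shape-level sketch in plain PyTorch. The names (`d_model`, the `nn.Linear` embeddings) are my own illustration of the idea, not code from the paper:

```python
import torch
import torch.nn as nn

B, T, N, d_model = 32, 96, 7, 64        # batch, lookback length, variables, embedding size
x = torch.randn(B, T, N)                # multivariate series: T timestamps, N variables

# Standard Transformer tokenization: one token per timestamp,
# fusing all N variables measured at that moment into one embedding.
temporal_embed = nn.Linear(N, d_model)
tokens_standard = temporal_embed(x)     # (B, T, d_model) -> attention then mixes timestamps

# Inverted tokenization: one token per variable,
# embedding that variable's whole history of length T.
variate_embed = nn.Linear(T, d_model)
tokens_inverted = variate_embed(x.transpose(1, 2))   # (B, N, d_model) -> attention mixes variables
```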
Some researchers from Tsinghua University took a fresh look at this and realized the Transformer components themselves are solid; the architecture just needs to be flipped for time series data.
Their "Inverted Transformer" (or iTransformer):
- Turns each variable's entire series into a single token, instead of tokenizing each timestamp
- Uses self-attention over variables to capture relationships
- Processes temporal dependencies within each variable using feedforward layers (rough sketch below)
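Here's a rough sketch of one inverted encoder block plus a forecasting head, building on the variate tokens above. It follows the description in this post, not the authors' official code, and the hyperparameters (`n_heads`, `pred_len`, layer counts) are illustrative assumptions:

```python
import torch
import torch.nn as nn

class InvertedEncoderBlock(nn.Module):
    """One inverted block: self-attention across variate tokens,
    then a feed-forward network applied to each variate token."""
    def __init__(self, d_model=64, n_heads=4, d_ff=256):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, tokens):                    # tokens: (B, N, d_model), one token per variable
        # Attention over the N variables -> multivariate correlations.
        attn_out, _ = self.attn(tokens, tokens, tokens)
        tokens = self.norm1(tokens + attn_out)
        # Feed-forward applied independently to each variate token ->
        # temporal patterns stay encoded within that variable's embedding.
        return self.norm2(tokens + self.ffn(tokens))

class TinyITransformer(nn.Module):
    def __init__(self, lookback=96, pred_len=24, d_model=64, n_layers=2):
        super().__init__()
        self.embed = nn.Linear(lookback, d_model)      # variable's full history -> token
        self.blocks = nn.ModuleList([InvertedEncoderBlock(d_model) for _ in range(n_layers)])
        self.head = nn.Linear(d_model, pred_len)       # token -> that variable's forecast

    def forward(self, x):                              # x: (B, T, N)
        tokens = self.embed(x.transpose(1, 2))         # (B, N, d_model)
        for blk in self.blocks:
            tokens = blk(tokens)
        return self.head(tokens).transpose(1, 2)       # (B, pred_len, N)
```

For example, `TinyITransformer()(torch.randn(32, 96, 7))` returns a `(32, 24, 7)` forecast: 24 future steps for each of the 7 variables.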
This simple tweak gives all the benefits we want:
- State-of-the-art forecasting accuracy, beating both linear models and standard Transformers
- Better generalization to unseen variables
- Increased interpretability
- Ability to leverage longer historical context
TLDR: Inverting the Transformer to align with the structure of multivariate time series lets it outperform both linear models and standard Transformers at forecasting.
Full summary. Paper is here.