Transformers are great at NLP and computer vision tasks, but I was surprised to learn they still lag behind simple linear models at time series forecasting.
The issue is how most Transformer architectures treat each timestamp as a token, fusing all the variables recorded at that moment into a single embedding. This creates two big problems:
- Variables recorded at slightly different times get blurred together, losing important timing info
- Each token only sees a single moment, so it can't capture long-range temporal dependencies
So Transformers struggle to extract useful patterns and correlations from the data.
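To make the tokenization difference concrete, here's a minimal shape-level sketch in plain PyTorch. The names (`d_model`, the `nn.Linear` embeddings) are my own illustration of the idea, not code from the paper:

```python
import torch
import torch.nn as nn

B, T, N, d_model = 32, 96, 7, 64        # batch, lookback length, variables, embedding size
x = torch.randn(B, T, N)                # multivariate series: T timestamps, N variables

# Standard Transformer tokenization: one token per timestamp,
# fusing all N variables measured at that moment into one embedding.
temporal_embed = nn.Linear(N, d_model)
tokens_standard = temporal_embed(x)     # (B, T, d_model) -> attention then mixes timestamps

# Inverted tokenization: one token per variable,
# embedding that variable's whole history of length T.
variate_embed = nn.Linear(T, d_model)
tokens_inverted = variate_embed(x.transpose(1, 2))   # (B, N, d_model) -> attention mixes variables
```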
Some researchers from Tsinghua University took a fresh look at this and realized the Transformer components themselves are solid; the architecture just needs to be flipped for time series data.
Their "Inverted Transformer" (or iTransformer):
- Turns each variable's entire series into a single token, instead of tokenizing each timestamp
- Uses self-attention over variables to capture relationships
- Processes temporal dependencies within each variable using feedforward layers (rough sketch below)
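Here's a rough sketch of one inverted encoder block plus a forecasting head, building on the variate tokens above. It follows the description in this post, not the authors' official code, and the hyperparameters (`n_heads`, `pred_len`, layer counts) are illustrative assumptions:

```python
import torch
import torch.nn as nn

class InvertedEncoderBlock(nn.Module):
    """One inverted block: self-attention across variate tokens,
    then a feed-forward network applied to each variate token."""
    def __init__(self, d_model=64, n_heads=4, d_ff=256):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, tokens):                    # tokens: (B, N, d_model), one token per variable
        # Attention over the N variables -> multivariate correlations.
        attn_out, _ = self.attn(tokens, tokens, tokens)
        tokens = self.norm1(tokens + attn_out)
        # Feed-forward applied independently to each variate token ->
        # temporal patterns stay encoded within that variable's embedding.
        return self.norm2(tokens + self.ffn(tokens))

class TinyITransformer(nn.Module):
    def __init__(self, lookback=96, pred_len=24, d_model=64, n_layers=2):
        super().__init__()
        self.embed = nn.Linear(lookback, d_model)      # variable's full history -> token
        self.blocks = nn.ModuleList([InvertedEncoderBlock(d_model) for _ in range(n_layers)])
        self.head = nn.Linear(d_model, pred_len)       # token -> that variable's forecast

    def forward(self, x):                              # x: (B, T, N)
        tokens = self.embed(x.transpose(1, 2))         # (B, N, d_model)
        for blk in self.blocks:
            tokens = blk(tokens)
        return self.head(tokens).transpose(1, 2)       # (B, pred_len, N)
```

For example, `TinyITransformer()(torch.randn(32, 96, 7))` returns a `(32, 24, 7)` forecast: 24 future steps for each of the 7 variables.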
This simple tweak gives all the benefits we want:
- State-of-the-art forecasting accuracy, beating both linear models and standard Transformers
- Better generalization to unseen variables
- Increased interpretability
- Ability to leverage longer historical context
TLDR: Inverting the Transformer to align with the structure of multivariate time series lets it outperform both linear models and standard Transformers at forecasting.
Full summary. Paper is here.