Inverting Transformers Significantly Improves Time Series Forecasting

Transformers are great at NLP and computer vision tasks, but I was surprised to learn they still lag behind simple linear models at time series forecasting.

The issue is that most Transformer forecasters treat each timestamp as a token, fusing all the variables recorded at that moment into a single embedding. This causes two big problems:

  • Variables recorded at slightly different times get blurred together, losing important timing info
  • Each token can only see a single moment, no long-term dependencies

So Transformers struggle to extract useful patterns and correlations from the data.
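
To make the problem concrete, here's a minimal PyTorch sketch (not the paper's code) of the usual per-timestamp tokenization; the shapes and layer sizes are made up for illustration:

```python
# Hypothetical shapes: 32 series, 96 timestamps, 7 variables, 64-dim model.
import torch
import torch.nn as nn

batch, time_steps, n_vars, d_model = 32, 96, 7, 64

x = torch.randn(batch, time_steps, n_vars)   # one reading per variable per timestamp

# Each timestamp becomes one token: all variables at that moment are fused
# into a single embedding, so per-variable identity and timing get blurred.
embed = nn.Linear(n_vars, d_model)
tokens = embed(x)                            # [batch, time_steps, d_model]

# Self-attention then mixes these per-timestamp tokens across time.
attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
out, _ = attn(tokens, tokens, tokens)        # [batch, time_steps, d_model]
```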

Some researchers from Tsinghua University took a fresh look at this and realized the Transformer components themselves are solid; the architecture just needs to be flipped for time series data.

Their "Inverted Transformer" (or iTransformer):

  • Turns each variable's full history into a token, instead of tokenizing each timestamp
  • Uses self-attention over variables to capture relationships
  • Processes time dependencies per variable with feedforward layers
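
Here's a minimal sketch of that inverted layout, again not the authors' implementation; the lookback of 96 steps, horizon of 24, and layer sizes are assumptions for illustration:

```python
import torch
import torch.nn as nn

batch, lookback, horizon, n_vars, d_model = 32, 96, 24, 7, 64

x = torch.randn(batch, lookback, n_vars)

# Invert: each variable's full lookback window becomes one token.
variate_tokens = x.transpose(1, 2)            # [batch, n_vars, lookback]
embed = nn.Linear(lookback, d_model)
tokens = embed(variate_tokens)                # [batch, n_vars, d_model]

# Self-attention now runs over variables, capturing cross-variable correlations.
attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
mixed, _ = attn(tokens, tokens, tokens)       # [batch, n_vars, d_model]

# Feed-forward layers process each variate token's (temporal) representation,
# then a projection maps every token to its forecast horizon.
ffn = nn.Sequential(nn.Linear(d_model, d_model), nn.GELU(), nn.Linear(d_model, d_model))
head = nn.Linear(d_model, horizon)
forecast = head(ffn(mixed)).transpose(1, 2)   # [batch, horizon, n_vars]
```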

This simple tweak gives all the benefits we want:

  • State-of-the-art forecasting accuracy, beating both linear models and standard Transformers
  • Better generalization to unseen variables
  • Increased interpretability
  • Ability to leverage longer historical context

TLDR: Inverting the Transformer to align with the structure of multivariate time series lets it outperform both linear models and standard Transformer forecasters.

Full summary. Paper is here.

submitted by /u/Successful-Western27