Image Transformer [1] is a sequence-modeling formulation of image generation that generalizes the Transformer. It restricts the self-attention mechanism to attend to local neighborhoods while maintaining a large receptive field. Below are some notes from reading and implementing it.
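To make "attend to local neighborhoods" concrete, here is a minimal sketch of block-local causal self-attention over an image flattened in raster order. It is loosely modeled on the paper's 1D local attention: positions are grouped into non-overlapping query blocks, each block may look at itself plus a fixed number of preceding positions, and a causal mask hides anything after the current pixel. It is a simplification, not the paper's implementation. The names `allowed_keys`, `local_self_attention` and `earliest_visible` and the parameters `query_block` and `memory_len` are hypothetical. The sketch also uses a single head with no learned projections (q = k = v = x).

```python
import math
from typing import List


def allowed_keys(i: int, query_block: int, memory_len: int) -> range:
    """Positions that query i may attend to (hypothetical helper).

    Queries are grouped into non-overlapping blocks of length `query_block`.
    A block's memory is the block itself plus the `memory_len` positions
    immediately before it; the causal mask then drops positions after i.
    """
    block_start = (i // query_block) * query_block
    lo = max(0, block_start - memory_len)
    return range(lo, i + 1)  # causal: only positions <= i


def local_self_attention(
    x: List[List[float]], query_block: int, memory_len: int
) -> List[List[float]]:
    """Single-head scaled dot-product attention with q = k = v = x,
    restricted to the local, causal neighborhood of each position."""
    d = len(x[0])
    out = []
    for i, q in enumerate(x):
        keys = allowed_keys(i, query_block, memory_len)
        scores = [sum(a * b for a, b in zip(q, x[j])) / math.sqrt(d) for j in keys]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append(
            [sum(w * x[j][c] for w, j in zip(weights, keys)) / total for c in range(d)]
        )
    return out


def earliest_visible(i: int, query_block: int, memory_len: int, layers: int) -> int:
    """Earliest position that can influence position i after stacking `layers`
    local-attention layers: each layer reaches one neighborhood further back."""
    pos = i
    for _ in range(layers):
        pos = allowed_keys(pos, query_block, memory_len).start
    return pos


# A 4x4 single-channel "image" flattened in raster order (16 positions),
# each pixel embedded as a 2-d vector [value, 1.0].
x = [[float(i), 1.0] for i in range(16)]
y = local_self_attention(x, query_block=4, memory_len=4)
print(list(allowed_keys(13, 4, 4)))                            # [8, 9, 10, 11, 12, 13]
print([earliest_visible(13, 4, 4, n) for n in (1, 2, 3)])      # [8, 4, 0]
```

Each position attends to at most `query_block + memory_len` others, so the cost per layer is linear in the number of pixels rather than quadratic. Stacking layers lets information propagate further back: in this toy setup, pixel 13 sees only positions 8 to 13 after one layer, but all earlier pixels after three.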