Image Transformer[1] generalizes the Transformer to a sequence modeling formulation of image generation.
It restricts the self-attention mechanism to attend to local neighborhoods while maintaining a large receptive field. Below are some notes on reading and implementing it.
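
To make "attending to local neighborhoods" concrete, here is a minimal NumPy sketch of causal self-attention over a flattened pixel sequence, where each position only sees itself and the few positions before it. It is a simplification and not the paper's exact scheme: the function name `local_causal_attention` and the sliding `window` parameter are assumptions made for illustration, and it uses a single head with no learned projections.

```python
import numpy as np

def local_causal_attention(q, k, v, window):
    """Single-head attention where position i attends to positions
    max(0, i - window + 1) .. i (hypothetical sliding-window variant).

    q, k, v: arrays of shape (n, d). Returns (output, weights).
    """
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)            # (n, n) scaled dot products

    idx = np.arange(n)
    offset = idx[:, None] - idx[None, :]     # offset[i, j] = i - j
    # Allowed: j is not in the future and lies within the local window.
    allowed = (offset >= 0) & (offset < window)
    scores = np.where(allowed, scores, -np.inf)

    # Numerically stable softmax; every row keeps its diagonal entry,
    # so the row maximum is always finite.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Usage: 6 "pixels" with 4-dimensional features, window of 3.
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))
out, w = local_causal_attention(x, x, x, window=3)
```

This version still builds the full n-by-n score matrix and only masks it, so it shows the attention pattern but not the memory savings. An efficient implementation would gather just the local neighborhood for each query before computing scores.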