Studying Multi-head Attention in the Style Tokens Paper

I am trying to understand the multi-head attention introduced in the paper, “Attention Is All You Need“. The purpose to understand the multi-head attention is understand the style token layer, which contains multi-head attention and was introduced in the paper, “Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis“. Multi-head attention is […]