Attention Is All* You Need
* (except for fully connected layers, softmax, positional encoders, layer norm, skip connections…) pic.twitter.com/BnSQD9oEPA

— Jeremy Howard (@jeremyphoward) February 6, 2019
(via http://twitter.com/jeremyphoward/status/1093273889449205760)
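The footnoted components are exactly the non-attention pieces of a standard Transformer encoder block. A minimal sketch in PyTorch (an assumption; the tweet names no framework) showing where each one sits, with attention as just one step among the fully connected layers, softmax, positional encodings, layer norm, and skip connections:

```python
# A minimal Transformer encoder block, illustrating the tweet's point:
# attention is one component among many. Names and sizes are illustrative.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class TransformerBlock(nn.Module):
    def __init__(self, d_model=64, n_heads=4, d_ff=256):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)   # attention projections
        self.out = nn.Linear(d_model, d_model)
        self.ff = nn.Sequential(                      # <- fully connected layers
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)            # <- layer norm
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)

        def split(h):  # (b, t, d) -> (b, heads, t, d_head)
            return h.view(b, t, self.n_heads, self.d_head).transpose(1, 2)

        q, k, v = split(q), split(k), split(v)
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        attn = F.softmax(scores, dim=-1)              # <- softmax
        ctx = (attn @ v).transpose(1, 2).reshape(b, t, d)
        x = self.norm1(x + self.out(ctx))             # <- skip connection + norm
        x = self.norm2(x + self.ff(x))                # <- skip connection + norm
        return x


def positional_encoding(t, d_model):
    """Sinusoidal positional encodings, per 'Attention Is All You Need'."""
    pos = torch.arange(t).unsqueeze(1)
    div = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
    pe = torch.zeros(t, d_model)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe


x = torch.randn(2, 10, 64) + positional_encoding(10, 64)  # <- positional encoder
print(TransformerBlock()(x).shape)  # torch.Size([2, 10, 64])
```

Strip out everything but the attention itself and the block above no longer works, which is the joke: the asterisk is carrying a lot of architecture.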