Positional Embeddings Clearly Explained — Integrating with the original Embeddings

Lorentz Yeung
9 min read · Jun 6, 2024

1. What is Positional Embedding?

According to Attention Is All You Need, the positional embedding formula uses sine and cosine functions to encode the position of each token in a sequence. The encoding for a position (k) and dimension (i) is defined as follows:

$$
PE_{(k,\,2i)} = \sin\!\left(\frac{k}{10000^{2i/d}}\right), \qquad
PE_{(k,\,2i+1)} = \cos\!\left(\frac{k}{10000^{2i/d}}\right)
$$

where k is the token’s position, i indexes the embedding dimensions, and d is the embedding dimension.
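
To see the formula in action, here is a minimal sketch in NumPy (the function name, the sequence length of 4, and d = 8 are my own illustrative choices): each row of the returned matrix is the positional encoding for one position k, with sines in the even dimensions and cosines in the odd ones.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d: int) -> np.ndarray:
    """Return a (seq_len, d) matrix of sinusoidal positional encodings (d must be even)."""
    k = np.arange(seq_len)[:, np.newaxis]    # positions 0 .. seq_len-1, shape (seq_len, 1)
    i = np.arange(d // 2)[np.newaxis, :]     # dimension-pair index, shape (1, d // 2)
    angle = k / np.power(10000, 2 * i / d)   # the k / 10000^(2i/d) term from the formula

    pe = np.zeros((seq_len, d))
    pe[:, 0::2] = np.sin(angle)              # even dimensions get the sine
    pe[:, 1::2] = np.cos(angle)              # odd dimensions get the cosine
    return pe

pe = sinusoidal_positional_encoding(seq_len=4, d=8)
print(pe.shape)   # (4, 8)
print(pe[1])      # the vector that encodes position k = 1
```

In the Transformer, this matrix is simply added element-wise to the token embeddings of the same shape, which is the integration with the original embeddings that the title refers to.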

But why sine and cosine? And how does it actually work? You have come to the right place. This article aims to demystify positional embeddings and elucidate how they work in tandem with original embeddings to empower NLP models.

Before delving into the intricacies of positional embeddings, let’s do a brief review of what an embedding is, and along the way we will see what k, i, and d actually are. I believe most of you are like me: we know what they are, but we still want a way to visualize them in our minds.

What is an embedding? What does it actually look like?

Imagine you have a huge box of Crayola coloring pencils. You can mix the colors of the pencils on your drawing paper, and there are countless ways to do it. In…
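
To make that picture concrete in code, here is a minimal sketch in NumPy (the vocabulary size, the embedding dimension d = 8, and the token ids are made-up values; a real model learns the table rather than drawing it at random): an embedding is simply a lookup table that maps every token id to a row of d numbers.

```python
import numpy as np

rng = np.random.default_rng(0)

# A (vocab_size, d) lookup table: one d-dimensional vector per token in the vocabulary.
# In a real model these numbers are learned; here they are random for illustration.
vocab_size, d = 10_000, 8
embedding_table = rng.normal(size=(vocab_size, d))

# Made-up token ids for a three-token sentence.
token_ids = np.array([42, 7, 1337])
token_embeddings = embedding_table[token_ids]   # look up one row per token

print(token_embeddings.shape)   # (3, 8): three tokens, each a vector of d numbers
print(token_embeddings[0])      # the first token's "colour mix": just d real numbers
```

Each of those d slots is one of the dimensions indexed by i in the formula above.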

