Positional Embeddings Clearly Explained — Integrating with the original Embeddings

Lorentz Yeung
9 min read · Jun 6, 2024

1. What is Positional Embedding?

According to Attention Is All You Need, the positional embedding formula uses sine and cosine functions to encode the position of each token in a sequence. The encoding for a position (k) and dimension (i) is defined as follows:

$$
PE_{(k,\,2i)} = \sin\!\left(\frac{k}{10000^{2i/d}}\right), \qquad
PE_{(k,\,2i+1)} = \cos\!\left(\frac{k}{10000^{2i/d}}\right)
$$

where k is the token’s position, i indexes the embedding dimensions, and d is the embedding dimension.
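
To see the formula in action, here is a minimal sketch in NumPy (the function name, the sequence length of 4, and d = 8 are my own illustrative choices): each row of the returned matrix is the positional encoding for one position k, with sines in the even dimensions and cosines in the odd ones.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d: int) -> np.ndarray:
    """Return a (seq_len, d) matrix of sinusoidal positional encodings (d must be even)."""
    k = np.arange(seq_len)[:, np.newaxis]    # positions 0 .. seq_len-1, shape (seq_len, 1)
    i = np.arange(d // 2)[np.newaxis, :]     # dimension-pair index, shape (1, d // 2)
    angle = k / np.power(10000, 2 * i / d)   # the k / 10000^(2i/d) term from the formula

    pe = np.zeros((seq_len, d))
    pe[:, 0::2] = np.sin(angle)              # even dimensions get the sine
    pe[:, 1::2] = np.cos(angle)              # odd dimensions get the cosine
    return pe

pe = sinusoidal_positional_encoding(seq_len=4, d=8)
print(pe.shape)   # (4, 8)
print(pe[1])      # the vector that encodes position k = 1
```

In the Transformer, this matrix is simply added element-wise to the token embeddings of the same shape, which is the integration with the original embeddings that the title refers to.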

But why sine and cosine? And how does it actually work? You have come to the right place. This article aims to demystify positional embeddings and elucidate how they work in tandem with original embeddings to empower NLP models.

Before delving into the intricacies of positional embeddings, let’s do a brief review of what an embedding is, and along the way we will see what k, i, and d actually are. I believe most of you are like me: we know what they are, but we still want a way to visualize them in our minds.

What is an embedding? What does it actually look like?

Imagine you have a huge box of Crayola coloring pencils. You can mix the colors of the pencils on your drawing paper, and there are countless ways to do it. In…
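
To make that picture concrete in code, here is a minimal sketch in NumPy (the vocabulary size, the embedding dimension d = 8, and the token ids are made-up values; a real model learns the table rather than drawing it at random): an embedding is simply a lookup table that maps every token id to a row of d numbers.

```python
import numpy as np

rng = np.random.default_rng(0)

# A (vocab_size, d) lookup table: one d-dimensional vector per token in the vocabulary.
# In a real model these numbers are learned; here they are random for illustration.
vocab_size, d = 10_000, 8
embedding_table = rng.normal(size=(vocab_size, d))

# Made-up token ids for a three-token sentence.
token_ids = np.array([42, 7, 1337])
token_embeddings = embedding_table[token_ids]   # look up one row per token

print(token_embeddings.shape)   # (3, 8): three tokens, each a vector of d numbers
print(token_embeddings[0])      # the first token's "colour mix": just d real numbers
```

Each of those d slots is one of the dimensions indexed by i in the formula above.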

