Word Embeddings
18 Oct 2024

Introduction
Word embeddings are one of the most significant advancements in natural language processing (NLP). They allow us to transform words or sentences into vectors, where each word is represented by a point in a high-dimensional space. The core idea is that words with similar meanings are close to each other in this space, making it possible to use mathematical operations on these vectors to uncover relationships between words.
In this post, we’ll explore how to create word embeddings using a pre-trained model, and we’ll perform various vector operations to see how these embeddings capture semantic relationships. We’ll cover examples like analogy generation, word similarity, and how these embeddings can be leveraged for search tasks.
What Are Word Embeddings?
Word embeddings are dense vector representations of words, where each word is mapped to a point in a continuous vector space. Unlike older techniques (such as one-hot encoding) that give each word a unique identifier, embeddings represent words in a way that captures semantic relationships, such as similarity and analogy.
For example, embeddings can represent the relationship that “king” is to “queen” as “man” is to “woman.”
This is made possible because words that are semantically similar (e.g., “king” and “queen”) have vector representations that are close together in space, while words that are opposites (e.g., “good” and “bad”) may have vectors pointing in opposite directions.
Gensim
Let’s begin by loading a pre-trained word embedding model. We’ll use the glove-wiki-gigaword-50 model, which provides 50-dimensional vectors for many common words.
This might take a moment to download. It’s not too big.
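Here is a minimal sketch of loading the model with gensim’s downloader API (this assumes the gensim package is installed; the variable name model is our own choice):

```python
# Download (on first use) and load the 50-dimensional GloVe vectors.
import gensim.downloader as api

model = api.load("glove-wiki-gigaword-50")
```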
Now that we have the model, let’s try converting some words into vectors.
Converting Words to Vectors
We can take individual words and get their vector representations. Let’s look at the vectors for “king,” “queen,” “man,” and “woman.”
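A small sketch of what this looks like with the model loaded above (printing only the first few components to keep the output readable):

```python
# Each lookup returns a NumPy array of shape (50,).
for word in ["king", "queen", "man", "woman"]:
    vector = model[word]
    print(word, vector.shape, vector[:5])
```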
You’ll see that each word is represented as a 50-dimensional vector. These vectors capture the meanings of the words in such a way that we can manipulate them mathematically.
Performing Vector Arithmetic
One of the most famous examples of vector arithmetic in word embeddings is the analogy:

\[\text{king} - \text{man} + \text{woman} \approx \text{queen}\]
We can perform this operation by subtracting the vector for “man” from “king” and then adding the vector for “woman.” Let’s try this and see what word is closest to the resulting vector.
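One way to do this is with gensim’s most_similar method, which handles the arithmetic and the nearest-neighbor lookup in a single call (and excludes the input words from the results):

```python
# king - man + woman: the nearest remaining word should be "queen".
result = model.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)
```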
You should find that the word closest to the resulting vector is “queen,” demonstrating that the model captures the gender relationship between “king” and “queen.”
Measuring Word Similarity with Cosine Similarity
Another key operation you can perform on word embeddings is measuring the similarity between two words. The most common way to do this is by calculating the cosine similarity between the two vectors. The cosine similarity between two vectors is defined as:
\[\text{cosine similarity} = \frac{A \cdot B}{\|A\| \|B\|}\]

This returns a value between -1 and 1:
- 1 means the vectors are identical (the words are very similar),
- 0 means the vectors are orthogonal (unrelated words),
- -1 means the vectors are pointing in opposite directions (possibly antonyms).
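For reference, the formula translates directly into a few lines of NumPy (the helper name cosine_similarity is our own; gensim also provides this computation, as shown below):

```python
import numpy as np

# Cosine similarity computed directly from the formula above.
def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity(model["king"], model["queen"]))
```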
Let’s measure the similarity between related words like “apple” and “fruit,” and compare it to unrelated words like “apple” and “car.”
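Using gensim’s built-in similarity method, which computes exactly this cosine similarity between two words:

```python
print(model.similarity("apple", "fruit"))  # related words: relatively high
print(model.similarity("apple", "car"))    # unrelated words: noticeably lower
```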
You will see that the cosine similarity between “apple” and “fruit” is much higher than that between “apple” and “car,” illustrating the semantic relationship between “apple” and “fruit.”
Search Using Word Embeddings
Another powerful use of word embeddings is in search tasks. If you want to find words that are most similar to a given word, you can use the model’s similar_by_word function to retrieve the top N most similar words. Here’s how you can search for words most similar to “apple”:
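A sketch of this call (the choice of topn=10 is just for illustration):

```python
# The ten words closest to "apple" in the embedding space.
for word, score in model.similar_by_word("apple", topn=10):
    print(f"{word}: {score:.3f}")
```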
You can see that the model treats “apple” largely in its proper-noun sense, as the company Apple: each of the nearest words has strong relevance to the company.
Averaging Word Vectors
Another interesting operation is averaging word vectors. This allows us to combine the meaning of two words into a single vector. For instance, we could average the vectors for “apple” and “orange” to get a vector that represents something like “fruit.”
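A sketch of this, using gensim’s similar_by_vector to look up the neighbors of the averaged vector (again, topn=10 is an arbitrary choice):

```python
# Average the two word vectors, then find the words closest to the result.
average = (model["apple"] + model["orange"]) / 2
for word, score in model.similar_by_vector(average, topn=10):
    print(f"{word}: {score:.3f}")
```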
Many of the nearest neighbors of the averaged vector are related to both “apple” and “orange,” so the average gives us something like the intersection of the two meanings.
Conclusion
Word embeddings are a powerful way to represent the meaning of words as vectors in a high-dimensional space. By using simple mathematical operations, such as vector arithmetic and cosine similarity, we can uncover a variety of semantic relationships between words. These operations allow embeddings to be used in tasks such as analogy generation, search, and clustering.
In this post, we explored how to use pre-trained word embeddings, perform vector operations, and leverage them for real-world tasks. These foundational concepts are what power much of the magic behind modern NLP techniques, from search engines to chatbots and more.