Cosine similarity

Cosine similarity measures how similar two vectors are by comparing their directions rather than their magnitudes. It is widely used in machine learning for tasks such as text similarity and clustering. It is defined as the cosine of the angle between the two vectors, computed as their dot product divided by the product of their magnitudes.

Here is the formula for cosine similarity between two vectors A and B:

$$\text{Cosine Similarity}(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|}$$

Where:

  • A⋅B is the dot product of the vectors A and B.
  • ∥A∥ and ∥B∥ are the magnitudes (or norms) of the vectors A and B.
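To make the formula concrete, here is a minimal sketch that applies it directly with NumPy. The helper name cosine_sim is hypothetical (it is not part of any library), and raising an error for a zero vector is an assumption about how one might handle the case where the formula is undefined.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine of the angle between two 1-D vectors (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Denominator of the formula: product of the two magnitudes.
    norm_product = np.linalg.norm(a) * np.linalg.norm(b)
    if norm_product == 0.0:
        # Assumption: treat a zero vector as an error, since the angle is undefined.
        raise ValueError("cosine similarity is undefined for a zero vector")
    # Numerator of the formula: the dot product.
    return float(np.dot(a, b) / norm_product)

print(cosine_sim([1, 2, 3], [4, 5, 6]))
```

Because the result depends only on direction, scaling either input by a positive constant leaves the value unchanged.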

Implementation Using Python:

Import Libraries:

import numpy as np

from sklearn.metrics.pairwise import cosine_similarity

  • numpy is used for numerical operations and creating arrays.
  • cosine_similarity from sklearn.metrics.pairwise computes the cosine similarity between the rows of its 2D inputs, where each row is one vector.

Define Embeddings:

embedding_1 = np.array([1, 2, 3])

embedding_2 = np.array([4, 5, 6])

embedding_3 = np.array([7, 8, 9])

  • These are example embeddings represented as 3-dimensional vectors. In practice, these vectors would come from some embedding technique like Word2Vec, GloVe, or BERT.

Compute Cosine Similarity:

similarity_1_2 = cosine_similarity([embedding_1], [embedding_2])

similarity_1_3 = cosine_similarity([embedding_1], [embedding_3])

similarity_2_3 = cosine_similarity([embedding_2], [embedding_3])

  • The cosine_similarity function expects 2D inputs, so each embedding is wrapped in a list to form a single-row matrix.
  • The function returns a 2D array (matrix). Because each call compares a single pair of vectors, the result is a 1×1 matrix, and the value is extracted with [0][0] when printing.
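The 1×1 result is a special case of a full pairwise matrix. scikit-learn's cosine_similarity can also be given several rows at once; the sketch below shows what such a matrix looks like using NumPy only, so the arithmetic stays visible. The helper name pairwise_cosine is hypothetical, and the code assumes no row is all zeros.

```python
import numpy as np

def pairwise_cosine(X):
    """Cosine similarity between every pair of rows of X (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    # Length of each row, kept as a column so it broadcasts across the row.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    unit = X / norms  # assumes no all-zero rows
    # Dot products of unit vectors are exactly the cosines of the angles.
    return unit @ unit.T

embeddings = np.array([[1, 2, 3],
                       [4, 5, 6],
                       [7, 8, 9]])
sim = pairwise_cosine(embeddings)
print(np.round(sim, 4))
```

Entry sim[i][j] is the similarity between row i and row j, so the matrix is symmetric and its diagonal is 1 up to floating-point rounding.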

Print Results:

print(f"Cosine Similarity between embedding_1 and embedding_2: {similarity_1_2[0][0]}")

print(f"Cosine Similarity between embedding_1 and embedding_3: {similarity_1_3[0][0]}")

print(f"Cosine Similarity between embedding_2 and embedding_3: {similarity_2_3[0][0]}")

Results Interpretation

The cosine_similarity function computes the similarity between the given vectors. The output values range from -1 to 1:

  • 1 means the vectors point in exactly the same direction. Their magnitudes may still differ, so they need not be identical.
  • 0 means the vectors are orthogonal (i.e., they are at 90 degrees to each other, no similarity).
  • -1 means the vectors point in exactly opposite directions.
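The three reference values can be reproduced with small hand-picked vectors. This is an illustrative sketch: the vectors and the helper name cosine_sim are made up for the example, and the results for the parallel and opposite cases are only equal to 1 and -1 up to floating-point rounding.

```python
import numpy as np

def cosine_sim(a, b):
    """Dot product divided by the product of magnitudes (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

v = np.array([1.0, 2.0])

same_direction = cosine_sim(v, 3 * v)    # parallel, but three times longer
orthogonal = cosine_sim([1, 0], [0, 1])  # perpendicular axes
opposite = cosine_sim(v, -v)             # same line, reversed direction

print(same_direction, orthogonal, opposite)
```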

Given the embeddings:

  • embedding_1 = [1, 2, 3]
  • embedding_2 = [4, 5, 6]
  • embedding_3 = [7, 8, 9]

The similarity calculations are:

  • Cosine similarity between embedding_1 and embedding_2:
    $$\text{similarity\_1\_2} = \frac{1 \cdot 4 + 2 \cdot 5 + 3 \cdot 6}{\sqrt{1^2 + 2^2 + 3^2} \cdot \sqrt{4^2 + 5^2 + 6^2}} = \frac{4 + 10 + 18}{\sqrt{14} \cdot \sqrt{77}} = \frac{32}{\sqrt{1078}} \approx 0.975$$
  • Cosine similarity between embedding_1 and embedding_3:
    $$\text{similarity\_1\_3} = \frac{1 \cdot 7 + 2 \cdot 8 + 3 \cdot 9}{\sqrt{1^2 + 2^2 + 3^2} \cdot \sqrt{7^2 + 8^2 + 9^2}} = \frac{7 + 16 + 27}{\sqrt{14} \cdot \sqrt{194}} = \frac{50}{\sqrt{2716}} \approx 0.959$$
  • Cosine similarity between embedding_2 and embedding_3:
    $$\text{similarity\_2\_3} = \frac{4 \cdot 7 + 5 \cdot 8 + 6 \cdot 9}{\sqrt{4^2 + 5^2 + 6^2} \cdot \sqrt{7^2 + 8^2 + 9^2}} = \frac{28 + 40 + 54}{\sqrt{77} \cdot \sqrt{194}} = \frac{122}{\sqrt{14938}} \approx 0.998$$

The embeddings are not exactly collinear (none is a scalar multiple of another), but they point in nearly the same direction, so all three cosine similarities are close to 1. The most similar pair is embedding_2 and embedding_3 (about 0.998), and the least similar pair is embedding_1 and embedding_3 (about 0.959).
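One way to sanity-check a claim about collinearity is to test it directly and to convert each similarity into an angle. The sketch below does both with NumPy; the helper names angle_degrees and is_collinear are hypothetical, and the rank test relies on NumPy's default numerical tolerance.

```python
import numpy as np

embedding_1 = np.array([1, 2, 3])
embedding_2 = np.array([4, 5, 6])
embedding_3 = np.array([7, 8, 9])

def angle_degrees(a, b):
    """Angle between a and b, in degrees (hypothetical helper)."""
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    # Clip guards against values just outside [-1, 1] caused by rounding.
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def is_collinear(a, b):
    """True when two nonzero vectors lie on one line through the origin."""
    # Stacking collinear vectors gives a matrix of rank 1.
    return bool(np.linalg.matrix_rank(np.vstack([a, b])) == 1)

print(is_collinear(embedding_1, embedding_2))      # different directions
print(is_collinear(embedding_1, 2 * embedding_1))  # exact scalar multiple
print(angle_degrees(embedding_1, embedding_2))
print(angle_degrees(embedding_1, embedding_3))
print(angle_degrees(embedding_2, embedding_3))
```

A small but nonzero angle corresponds to a similarity slightly below 1, while an exact scalar multiple gives an angle of zero up to rounding.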
