Embeddings


Word Vectors & Embeddings

2013 - Google Word2Vec: "Efficient Estimation of Word Representations in Vector Space"

Google’s word vectors had another intriguing property: you could “reason” about words using vector arithmetic. This is where the famous king − man + woman ≈ queen example comes from.
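
A minimal sketch of that arithmetic, using tiny hand-made 4-dimensional vectors (the values below are invented purely for illustration; real word2vec vectors are learned and typically have a few hundred dimensions):

```python
import numpy as np

# Toy, hand-made 4-dimensional vectors purely for illustration.
vectors = {
    "king":  np.array([0.9, 0.8, 0.1, 0.7]),
    "queen": np.array([0.9, 0.1, 0.8, 0.7]),
    "man":   np.array([0.2, 0.9, 0.1, 0.1]),
    "woman": np.array([0.2, 0.1, 0.9, 0.1]),
}

def cosine(a, b):
    # Cosine similarity: how closely two vectors point in the same direction.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# "Reasoning" with arithmetic: king - man + woman should land near queen.
result = vectors["king"] - vectors["man"] + vectors["woman"]

# Find the closest remaining word to the result.
best = max(
    (w for w in vectors if w not in {"king", "man", "woman"}),
    key=lambda w: cosine(result, vectors[w]),
)
print(best)  # queen
```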

…but these associations cut both ways: because these vectors are built from the way humans use words, they end up reflecting many of the biases that are present in human language.

Words can often have multiple meanings, so meaning depends on context: “John just left” vs. “John is left-handed,” or “bank” (financial institution vs. river bank).

Word vectors are a way for LLMs to capture word meaning.

Embeddings

  • Token embedding is the process of taking the matched tokens from the model’s vocabulary and converting each one into a dense numerical vector via an “embedding layer.” Think of it as the first of the many layers that follow (see the lookup-table sketch after this list).
    • These vectors are a clever way of encoding semantic meaning into tokens. Think of it a little like attaching metadata to something. For a specific cat, some of the metadata could be: orange, fluffy, striped, male, mean, and large, which can be used to paint a pretty good picture that you are talking about a large male fluffy orange tabby cat that is mean.
  • These token vectors exist in what’s known as high dimensional vector space.
    • In this space, directionality and proximity (to other token vectors) can imply relationships. Take the Washington Monument (WM) and the Eiffel Tower (ET). In vector space, WM would be positioned closer to Washington DC than to Paris, whereas ET would be closer to Paris than to Washington DC. Both would be positioned close to “tourist attraction” and “structure,” whereas both would be quite far from “food.” It’s OK to think of this example three-dimensionally, but keep in mind these models usually operate in spaces with hundreds to many thousands of dimensions.
      • Note: this high dimensional relation
    • For example: start from “king,” subtract “man,” and add “woman,” and the nearest word vector is “queen” (see the arithmetic sketch above).
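
A rough sketch of what an embedding layer boils down to: a lookup table where each token ID indexes one row of a (vocab_size × embedding_dim) matrix. The vocabulary, sizes, and random initialization below are all made up for illustration; in a real model the table is learned during training.

```python
import numpy as np

rng = np.random.default_rng(0)

# A made-up 6-word vocabulary; real models have tens of thousands of tokens.
vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4, "dog": 5}

vocab_size, embedding_dim = len(vocab), 8   # real dims: hundreds to thousands
embedding_table = rng.normal(size=(vocab_size, embedding_dim))  # learned in training

token_ids = [vocab[w] for w in ["the", "cat", "sat"]]   # tokens -> IDs
token_vectors = embedding_table[token_ids]              # IDs -> dense vectors

print(token_vectors.shape)  # (3, 8): one 8-dimensional vector per token
```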

To interact with an LLM, you need to input some text. The LLM accepts that text and tokenizes it, a process of breaking the text into smaller units. These token units are sometimes words, subwords, or even single characters, but for this explanatory exercise it is easier to pretend each token represents a single discrete word.
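
A toy sketch of tokenization, assuming a simple whitespace, word-level scheme with a made-up vocabulary and an assumed "<unk>" fallback token; real LLM tokenizers use subword schemes such as byte-pair encoding.

```python
# Toy word-level tokenizer; real tokenizers split text into subword pieces.
vocab = {"<unk>": 0, "the": 1, "cat": 2, "sat": 3, "on": 4, "mat": 5}

def tokenize(text):
    words = text.lower().split()                           # break text into units
    return [vocab.get(w, vocab["<unk>"]) for w in words]   # map units to token IDs

print(tokenize("The cat sat on the mat"))  # [1, 2, 3, 4, 1, 5]
```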

dictionary embedding.

Embeddings are a little like metadata for words.

Let’s take the word “cat.” Some metadata associated with the word could be: small, mammal, furry, soft, pet, purrs, claws, predator, nasty disposition … in vector space. Many of these words could also describe a dog, so it’s not too hard to imagine that a dog would sit close to a cat in vector space, whereas a whale would be described by a different set of words and sit farther away.
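
A small sketch of that intuition, using hand-picked “metadata” features as the dimensions (real embedding dimensions are learned and don’t carry human-readable labels like these):

```python
import numpy as np

# Hand-picked feature values purely for illustration:
# [small, mammal, furry, pet, purrs, lives_in_water]
animals = {
    "cat":   np.array([1.0, 1.0, 1.0, 1.0, 1.0, 0.0]),
    "dog":   np.array([0.6, 1.0, 1.0, 1.0, 0.0, 0.0]),
    "whale": np.array([0.0, 1.0, 0.0, 0.0, 0.0, 1.0]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(round(cosine(animals["cat"], animals["dog"]), 2))    # high: cat and dog are close
print(round(cosine(animals["cat"], animals["whale"]), 2))  # lower: whale is farther away
```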

Intuitively, see it this way: take 3-dimensional space and add time to it: [x, y, z, t] => [2, 4, 2, 9].

Initially it feels difficult to think beyond three dimensions, but if you use 4-dimensional spacetime as a reference you can quickly see how higher dimensions work. Spacetime has three spatial dimensions (the x, y, z coordinates) and a temporal (time) dimension ([x, y, z, t]). If you hold the spatial dimensions constant while changing the time dimension, it’s easy to see that t1 is a different point from t0. Now extend that out to potentially hundreds of dimensions and it’s fairly easy to see how multidimensional vector space works.
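
A tiny numerical illustration of the same idea: hold the spatial coordinates fixed, change only t, and you get a distinct point; the same mechanics extend unchanged to hundreds of dimensions.

```python
import numpy as np

# Hold the three spatial coordinates constant and change only time:
p0 = np.array([2.0, 4.0, 2.0, 9.0])    # [x, y, z, t] at t = 9
p1 = np.array([2.0, 4.0, 2.0, 10.0])   # same place, t = 10
print(np.linalg.norm(p1 - p0))          # 1.0: two distinct points in 4-D space

# The same idea extends to hundreds of dimensions:
rng = np.random.default_rng(0)
a, b = rng.normal(size=300), rng.normal(size=300)
print(a.shape, np.linalg.norm(a - b))   # two distinct points in 300-D space
```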

Embedding

  • inputs
    • tokens
    • token position
  • output
    • the input embedding (a vector per token combining token identity and position; see the sketch below)
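
A minimal sketch of that step, assuming the common GPT-style scheme where a learned positional embedding is simply added to the token embedding; the sizes and token IDs below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size, max_positions, dim = 100, 16, 8   # made-up sizes for illustration
token_embeddings = rng.normal(size=(vocab_size, dim))        # one vector per vocab entry
position_embeddings = rng.normal(size=(max_positions, dim))  # one vector per position

token_ids = np.array([7, 42, 3])        # a tokenized input (hypothetical IDs)
positions = np.arange(len(token_ids))   # positions 0, 1, 2

# Input embedding = token embedding + positional embedding (additive scheme)
input_embed = token_embeddings[token_ids] + position_embeddings[positions]
print(input_embed.shape)  # (3, 8): one vector per input token
```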