This is the first lecture of Stanford’s CS224N, “Natural Language Processing with Deep Learning,” taught by Christopher Manning. CS224N is one of the most influential courses in the field, and this opening session introduces the foundational idea of representing words as vectors so that a computer can work with meaning numerically.
Manning motivates the problem by asking how a machine could possibly know that two words are related, then develops the answer through word2vec, an algorithm that learns word vectors from the company a word keeps across large text corpora. He walks through the objective function the method optimizes and shows what the resulting vectors capture, including the famous regularities where relationships between words appear as consistent directions in the vector space, as in the analogy king - man + woman ≈ queen.
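To make the two ideas above concrete, here is a minimal sketch, not the lecture's own code. It assumes a skip-gram-style softmax over dot products for the probability of one word given another, and it uses hand-made two-dimensional toy vectors (not learned from any corpus) whose axes loosely stand for "royalty" and "gender". The names `softmax_prob`, `analogy`, and `TOY` are hypothetical and chosen only for illustration; real word2vec keeps separate center and outside vectors, whereas this sketch reuses one table for both.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def softmax_prob(center_vec, outside_vecs, target):
    """P(target | center) as a softmax over dot products with every word.

    Assumption: a skip-gram-style model; `outside_vecs` maps word -> vector.
    """
    scores = {w: math.exp(dot(u, center_vec)) for w, u in outside_vecs.items()}
    return scores[target] / sum(scores.values())


def analogy(vectors, a, b, c):
    """Return the word closest to vec(b) - vec(a) + vec(c), excluding a, b, c."""
    query = [vb - va + vc
             for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(vectors[w], query))


# Hand-made toy vectors for illustration only (NOT trained embeddings):
# axis 0 ~ "royalty", axis 1 ~ "gender".
TOY = {
    "king": [0.9, 0.8],
    "queen": [0.9, -0.8],
    "man": [0.1, 0.8],
    "woman": [0.1, -0.8],
    "apple": [-0.7, 0.0],
}

# "man is to king as woman is to ?" follows the shared gender direction.
result = analogy(TOY, "man", "king", "woman")
print(result)

# Words with a larger dot product against the center word get more probability.
print(softmax_prob(TOY["king"], TOY, "queen") > softmax_prob(TOY["king"], TOY, "apple"))
```

In this toy setup the offset from "man" to "king" equals the offset from "woman" to "queen", which is the sense in which a relationship appears as a consistent direction. Training on a large corpus would have to produce such structure on its own; here it is built in by hand.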
Word vectors are the conceptual ancestor of the embeddings inside today’s large language models, and this lecture lays the groundwork for the later sessions that build up to transformers. For a technically inclined reader who wants a rigorous, course-quality introduction to how language is turned into something a neural network can learn from, this is a definitive starting point, and it is freely available along with the rest of the course.