For someone like me who’s always been interested in practical machine learning, it was a delight to discover Word2vec, a neural network that takes text as input and outputs numerical vectors. It maps each word in a sentence to a series of numbers that can be used to predict the probability of related words, and that also support mathematical comparisons between words, so similar words end up with similar vectors. This means the method doesn’t need to know the exact definition of a word, and with enough data it can capture relationships between words better than the average human being.
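To make the “numerical vectors” idea concrete, here’s a toy sketch of how similarity between word vectors is usually measured. The vectors below are invented for illustration (real Word2vec embeddings are learned from data and typically have 100–300 dimensions), but the cosine-similarity math is the standard approach:

```python
import math

# Toy 3-dimensional "embeddings" -- these values are made up for
# illustration; real Word2vec vectors are learned during training.
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.85, 0.75, 0.2],
    "apple": [0.1, 0.2, 0.9],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Related words sit close together in the vector space; unrelated ones don't.
print(cosine_similarity(vectors["king"], vectors["queen"]))  # close to 1
print(cosine_similarity(vectors["king"], vectors["apple"]))  # much lower
```

This is also what makes the famous word-analogy arithmetic possible: subtracting and adding learned vectors and then looking for the nearest neighbor by cosine similarity.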
Of course, that’s not all Word2vec can do. The name actually covers two distinct model architectures: continuous bag-of-words (CBOW), which predicts a word from its surrounding context, and skip-gram, which predicts the surrounding context from a word. Their applicability goes well beyond predicting words from their syntactic context.
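The difference between the two architectures comes down to which direction the training pairs point. A minimal sketch, using a made-up three-word corpus and a context window of 1:

```python
# Build (input, target) training pairs from a tiny toy corpus for both
# Word2vec architectures. The sentence and window size are hypothetical.
sentence = ["the", "cat", "sat"]
window = 1

skipgram_pairs = []  # skip-gram: center word predicts each context word
cbow_pairs = []      # CBOW: the context words together predict the center word

for i, center in enumerate(sentence):
    lo = max(0, i - window)
    hi = min(len(sentence), i + window + 1)
    context = [sentence[j] for j in range(lo, hi) if j != i]
    for c in context:
        skipgram_pairs.append((center, c))
    cbow_pairs.append((context, center))

print(skipgram_pairs)
# [('the', 'cat'), ('cat', 'the'), ('cat', 'sat'), ('sat', 'cat')]
print(cbow_pairs)
# [(['cat'], 'the'), (['the', 'sat'], 'cat'), (['cat'], 'sat')]
```

Both models then train a shallow network on these pairs; the learned weights become the word vectors.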
If you’re interested in learning more about Word2vec and its intricacies, check out:
- This document published by an Israeli computer science PhD (with a link to the PDF).
- A publication by the team at Google on sentence and phrase composition.