Word2vec: for clustering, right?

So yes, absolutely. Of course, it depends on your needs.

Word2vec is known for creating word embeddings from text sentences. Words are represented as vectors in an n-dimensional space. More specifically, vectors in this space can be added and subtracted, as in the famous:

king – man + woman = queen
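To make the vector arithmetic concrete, here is a minimal sketch using gensim's Word2Vec. The toy corpus and parameters are illustrative assumptions, not from the original post; a real corpus needs far more text before the analogy actually emerges.

```python
from gensim.models import Word2Vec

# Each training "sentence" is a list of tokens.
# Hypothetical toy corpus; far too small for the analogy to really work.
sentences = [
    ["king", "rules", "the", "kingdom"],
    ["queen", "rules", "the", "kingdom"],
    ["man", "walks"],
    ["woman", "walks"],
]

model = Word2Vec(sentences, vector_size=100, window=5, min_count=1)

# Vector arithmetic: king - man + woman ≈ queen (given enough data).
result = model.wv.most_similar(positive=["king", "woman"], negative=["man"])
print(result[:1])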

Edit: Pinterest applied the word2vec model to the pins in a session: pin2vec. Using this technique, similar pins can be identified. More information can be found here: https://engineering.pinterest.com/blog/applying-deep-learning-related-pins (It is categorized under deep learning, which seems debatable.)

It turns out that web visits can also be seen as sentences, with each URL as a word. Applying word2vec to web visits this way, and then using t-SNE for plotting, shows that similar URLs are indeed clustered near each other (sorry, no picture disclosure). URL2vec? It is like Uber, but for …
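A sketch of that pipeline could look like the following; the session data, URLs, and parameters are hypothetical, assuming web visits have already been grouped into lists of URLs per session:

```python
from gensim.models import Word2Vec
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# Hypothetical sessions: one list of visited URLs per visitor session.
sessions = [
    ["/home", "/products", "/products/shoes", "/checkout"],
    ["/home", "/blog", "/blog/word2vec"],
    ["/products", "/products/shoes", "/products/boots"],
]

# Treat each session as a "sentence" and each URL as a "word".
model = Word2Vec(sessions, vector_size=50, window=5, min_count=1)

# Project the URL vectors to 2D with t-SNE for plotting.
urls = model.wv.index_to_key
vectors = model.wv[urls]
coords = TSNE(n_components=2, perplexity=2, random_state=0).fit_transform(vectors)

plt.scatter(coords[:, 0], coords[:, 1])
for url, (x, y) in zip(urls, coords):
    plt.annotate(url, (x, y))
plt.show()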

Less intuitive, though, is the subtraction and addition of URL vectors. Subtracting one URL from another gives … well, I will have to find out later. For now, the word2vec vectors can act as a condensed input for a neural network instead of a large bag of URLs.
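One way to get such a condensed input, sketched below under the assumption that the `model` and `sessions` from the previous snippet are available, is to average the URL vectors of a session into a single dense feature vector:

```python
import numpy as np

def session_vector(session, model):
    """Average the word2vec vectors of the URLs in one session."""
    vecs = [model.wv[url] for url in session if url in model.wv]
    if not vecs:
        return np.zeros(model.vector_size)
    return np.mean(vecs, axis=0)

# Each session becomes one dense 50-dim vector instead of a sparse
# bag-of-URLs over thousands of distinct URLs.
X = np.array([session_vector(s, model) for s in sessions])
print(X.shape)  # (n_sessions, 50)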
