In 2014, the doc2vec approach took the ideas behind word2vec a step further: if individual words can be represented as vectors, why can't arbitrary passages of text be represented as vectors too?
The doc2vec approach takes passages of text and learns vectors for them based on the words they contain. Vectors can be learnt at the level of paragraphs, clauses or even documents.
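To make the idea of learning a passage vector from its words more concrete, here is a minimal sketch in plain Python. It is loosely modelled on the distributed bag-of-words variant of doc2vec with negative sampling, and it is not the reference implementation: the function name `train_pv_dbow`, the toy passages, the hyperparameters, and the uniform choice of negative words are all assumptions made for illustration.

```python
import math
import random


def train_pv_dbow(docs, dim=8, epochs=50, lr=0.05, negatives=3, seed=0):
    """Learn one vector per passage by training it to predict the passage's words."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    # Passage vectors start small and random; word (output) vectors start at zero.
    doc_vecs = [[rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for _ in docs]
    word_vecs = {w: [0.0] * dim for w in vocab}

    def sigmoid(x):
        x = max(-20.0, min(20.0, x))  # clamp to keep exp() well behaved
        return 1.0 / (1.0 + math.exp(-x))

    for _ in range(epochs):
        for d, doc in enumerate(docs):
            for word in doc:
                # One true word (label 1) plus a few random words (label 0).
                # Assumption: negatives are drawn uniformly from the vocabulary.
                samples = [(word, 1.0)]
                samples += [(rng.choice(vocab), 0.0) for _ in range(negatives)]
                grad_doc = [0.0] * dim
                for w, label in samples:
                    if label == 0.0 and w == word:
                        continue  # skip negatives that collide with the true word
                    out = word_vecs[w]
                    score = sum(a * b for a, b in zip(doc_vecs[d], out))
                    g = lr * (label - sigmoid(score))
                    for i in range(dim):
                        grad_doc[i] += g * out[i]     # accumulate passage update
                        out[i] += g * doc_vecs[d][i]  # update word vector
                for i in range(dim):
                    doc_vecs[d][i] += grad_doc[i]
    return doc_vecs


# Toy, already tokenised passages (hypothetical examples).
passages = [
    ["the", "tenant", "shall", "pay", "rent"],
    ["the", "landlord", "shall", "repair", "the", "roof"],
    ["this", "agreement", "is", "governed", "by", "english", "law"],
]
vectors = train_pv_dbow(passages)
```

Each passage ends up with its own fixed-length vector, whether that passage is a clause, a paragraph or a whole document. In practice an established library would normally be used rather than a hand-written loop like this one.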
For Legal NLP, the doc2vec approach can be used to embed clauses and legal documents into a vector space so they can then be clustered or searched.
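Once clauses have been embedded, searching comes down to comparing vectors. The sketch below ranks clauses by cosine similarity to a query vector. The clause names and the three-dimensional vectors are made up for illustration; real embeddings would come from a trained model and have many more dimensions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def search(query_vec, clause_vecs, top_n=2):
    """Return the names of the top_n clauses most similar to the query."""
    ranked = sorted(
        clause_vecs,
        key=lambda name: cosine(query_vec, clause_vecs[name]),
        reverse=True,
    )
    return ranked[:top_n]


# Hypothetical clause embeddings (not produced by a real model).
clause_vecs = {
    "indemnity": [0.9, 0.1, 0.0],
    "limitation_of_liability": [0.8, 0.3, 0.1],
    "governing_law": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]  # hypothetical embedding of a query clause

print(search(query, clause_vecs))
# ['indemnity', 'limitation_of_liability']
```

The same similarity measure can feed a clustering algorithm, so that clauses with similar wording end up grouped together.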