Transformers are a more recent and widely used architecture in NLP; the BERT and GPT-3 models are both built on it. Like RNNs, Transformers operate on sequential data, but rather than reading one token at a time they can process the entire text input at once. A technique called self-attention lets the model weigh how significant each part of the input is relative to every other part. Processing the input in parallel makes learning more efficient, and self-attention makes patterns and relationships between words easier to surface.
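To make the idea of self-attention more concrete, here is a minimal sketch in plain Python. It is an illustration under simplifying assumptions, not how BERT or GPT-3 are actually implemented: the token embeddings are made-up two-dimensional vectors, and the queries, keys and values are taken to be the embeddings themselves, whereas a real Transformer first applies learned projections and uses many attention heads. The function names softmax and self_attention are chosen here for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention over a list of token vectors.

    Assumption: queries, keys and values are the input vectors themselves.
    A real Transformer applies learned linear projections first.
    """
    dim = len(vectors[0])
    weights = []
    outputs = []
    for query in vectors:
        # Score this token against every token in the input, all at once.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        row = softmax(scores)
        weights.append(row)
        # The output is a weighted average of all token vectors.
        outputs.append(
            [sum(w * v[i] for w, v in zip(row, vectors)) for i in range(dim)]
        )
    return outputs, weights


# Hypothetical embeddings: "court" and "tribunal" point in similar directions.
tokens = ["court", "tribunal", "banana"]
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]

outputs, weights = self_attention(embeddings)
for token, row in zip(tokens, weights):
    print(token, [round(w, 3) for w in row])
```

Each printed row shows how much one token attends to every token in the input. With these made-up vectors, "court" places more weight on "tribunal" than on "banana", which is the sense in which self-attention weighs the significance of each part of the input.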
In Legal NLP, Transformers are used for a number of tasks, including Named Entity Recognition, Classification, Summarisation and Question Answering.
Resources