Infra-data analysis: blog #3
Modeling natural language

Historical perspective on the problem

After many advances in natural language processing, with GPT-2 already demonstrating impressive generative capabilities in 2019, the field of computational linguistics has been bursting with new insights and advancements in this area.
There have been many post-GPT-2 advances, such as larger and more powerful models (see figure, left):

GPT-3 was released in 2020 with 175 billion parameters, compared to GPT-2's 1.5 billion. This increase in parameters allowed GPT-3 to generate even more fluent and accurate language, perform a wider array of tasks without specific fine-tuning, and exhibit impressive zero-shot and few-shot learning capabilities.
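
To make the "few-shot" idea concrete, here is a minimal sketch of how a task can be specified to such a model purely through examples in the prompt, with no fine-tuning; the reviews and labels are invented for illustration:

```python
# Minimal illustration of few-shot prompting: the task is conveyed to the model
# entirely through worked examples in the prompt, with no fine-tuning.
# The example reviews and labels below are invented for illustration.

examples = [
    ("The film was a waste of two hours.", "negative"),
    ("An absolute delight from start to finish.", "positive"),
]
query = "The plot dragged, but the acting saved it."

prompt = "Classify the sentiment of each review.\n\n"
for text, label in examples:
    prompt += f"Review: {text}\nSentiment: {label}\n\n"
prompt += f"Review: {query}\nSentiment:"

print(prompt)  # this string would be sent to the model as-is
```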

Multimodal models work beyond text: models like DALL-E and CLIP combine text and image processing, enabling new forms of content generation and understanding. These models demonstrated the ability to create images from textual descriptions and to interpret images in the context of textual information.
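
As a rough intuition for how the two modalities can be combined, here is a toy sketch of the shared-embedding idea behind CLIP-style models: text and images are projected into one vector space and matched by similarity. The "encoders" below are random projections standing in for real networks, not CLIP's actual API:

```python
import numpy as np

# Toy sketch of the idea behind CLIP-style models: text and images are mapped
# into one shared embedding space, and matching is done by cosine similarity.
# The "encoders" here are random projections standing in for real networks.

rng = np.random.default_rng(0)
d = 64  # shared embedding dimension

def embed(features, projection):
    v = features @ projection
    return v / np.linalg.norm(v)

text_proj = rng.normal(size=(300, d))   # placeholder text-encoder weights
image_proj = rng.normal(size=(512, d))  # placeholder image-encoder weights

captions = rng.normal(size=(3, 300))    # stand-ins for encoded captions
image = rng.normal(size=(512,))         # stand-in for encoded image features

image_vec = embed(image, image_proj)
scores = [float(embed(c, text_proj) @ image_vec) for c in captions]
best = int(np.argmax(scores))
print(f"caption {best} matches the image best (score {scores[best]:.3f})")
```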

Transformer architecture enhancements have also been important. The introduction of Transformer-XL allowed for better handling of long-term dependencies in text, while variants like Reformer and BigBird introduced more efficient ways to scale up transformers.

Sparse Transformer models reduce computational complexity by restricting attention to the most relevant parts of the input, which makes training large models more feasible.
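
As a minimal sketch of that idea (not the exact Sparse Transformer formulation), here each position attends only to a local window of neighbours, so the number of scored query-key pairs grows roughly linearly with sequence length rather than quadratically:

```python
import numpy as np

# Minimal sketch of one common sparse-attention pattern: each position only
# attends to a local window of neighbours instead of the full sequence.
# This is a simplified illustration, not the exact Sparse Transformer layout.

def local_attention(q, k, v, window=4):
    n, d = q.shape
    scores = (q @ k.T) / np.sqrt(d)

    # Mask out everything outside each token's local window.
    idx = np.arange(n)
    mask = np.abs(idx[:, None] - idx[None, :]) > window
    scores[mask] = -np.inf

    # Softmax over the surviving (local) positions only.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
n, d = 16, 8
out = local_attention(rng.normal(size=(n, d)),
                      rng.normal(size=(n, d)),
                      rng.normal(size=(n, d)))
print(out.shape)  # (16, 8)
```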

Yet the main question remains about the real capabilities of LLMs on natural language processing tasks (see Melanie Mitchell's work on this topic).

Examples of n-arity structures in language

All human language systems exhibit tree-like structures, well known from the generative linguistics tradition (Chomsky). These linguistic structures are important for parsing language data into sense-making units of information. The semantic (or, more generally, sense-making) level is connected to the syntactic level, so tokenisation at any level (from the syntactic down to the lexical level, where tokenisation usually happens) is connected to further levels of splitting into tokens or language patterns.
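
As a small illustration, here is one way such an n-ary tree structure could be represented in code and flattened back into lexical tokens; the sentence and syntactic labels are purely illustrative, not the output of any particular parser:

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Small sketch of an n-ary constituency tree: each node carries a syntactic
# label and any number of children; leaves carry the surface tokens.

@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    token: Optional[str] = None

def leaves(node: Node) -> List[str]:
    """Flatten the tree back into its token sequence (the lexical level)."""
    if node.token is not None:
        return [node.token]
    return [tok for child in node.children for tok in leaves(child)]

# "the model parses the sentence" as a toy tree: S -> NP VP, VP -> V NP
sentence = Node("S", [
    Node("NP", [Node("DET", token="the"), Node("N", token="model")]),
    Node("VP", [
        Node("V", token="parses"),
        Node("NP", [Node("DET", token="the"), Node("N", token="sentence")]),
    ]),
])

print(leaves(sentence))  # ['the', 'model', 'parses', 'the', 'sentence']
```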