
We grow up exploring the world in terms of words: we name objects (e.g., head), activities (e.g., think), and properties (e.g., smart). We attend school and learn in terms of words. We communicate with one another and again use words. It seems obvious that language consists of words.
Yet systems such as ChatGPT process language in terms of tokens.
Tokens are not words; they are frequent combinations of neighboring characters, identified as recurring sequences in written text and not directly associated with meaning. Tokens are often called subword units, but they are not morphemes, even though many morphemes (e.g., -er in leader) happen to be tokens. From the AI's perspective, language is an uninterrupted character sequence in which even spaces are characters that combine with their neighbors. AI tokens are thus not recognized as units in linguistics or the language sciences.
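To make this concrete: the standard way such frequent character combinations are learned is byte-pair encoding (BPE), which repeatedly merges the most frequent pair of adjacent symbols in a corpus. The sketch below is a minimal, simplified illustration on a hypothetical toy corpus (real tokenizers such as ChatGPT's operate on bytes with large learned vocabularies); the word frequencies are invented for the example.

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across the corpus, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]

def merge_pair(words, pair):
    """Rewrite each word so every occurrence of `pair` becomes one merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

# Hypothetical toy corpus: word -> frequency, each word split into characters.
corpus = {tuple("leader"): 5, tuple("reader"): 2, tuple("read"): 3, tuple("eat"): 1}
merges = []
for _ in range(2):
    pair = most_frequent_pair(corpus)
    merges.append(pair)
    corpus = merge_pair(corpus, pair)

print(merges)        # learned merges: ('e', 'a'), then ('ea', 'd')
print(list(corpus))  # "leader" is now segmented as l + ead + e + r
```

Note what falls out: after two merges, "leader" is segmented as l + ead + e + r. The unit "ead" is simply a frequent character sequence in this corpus; it cuts across the morpheme boundary lead + -er, which is exactly why tokens are not morphemes.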
This raises fundamental questions: If ChatGPT does not operate with words and meanings, how does text meaning arise? How is language fluency possible at all? Do tokens have a parallel in the natural sciences?
If you are interested in how tokens are determined and used, and what token-based language processing implies for linguistic research and work with AI in general, join us online for the first workshop in the series Linguistics Meets ChatGPT on March 23, 2026.
Details and registration: https://gaussaiglobal.com/LingTransformer
Stela Manova
PI, Gauss:AI Global
DOI: 10.13140/RG.2.2.13858.29127
© 2026 Gauss:AI Global
Sterngasse 3/2/6, A-1010 Vienna, Austria

