In the book "Music, Language and the Brain" by Aniruddh D. Patel, speech processing is separated from music processing in the brain. Speech processing sits deeper inside the neocortex, while music processing is translated from speech there.
Vice versa, music-to-speech processing happens in a different area, above the previous one.
Hence, I have separated the two.
Visualization of LLMs: https://bbycroft.net/llm
Source: DeepMind, Formal Algorithms for Transformers, 19.7.2022
"Gibbs gelang eine weitere Verallgemeinerung der meist auf Gase beschränkten Ergebnisse Maxwells und Boltzmanns auf beliebige Systeme, indem er den Begriff des Ensembles einführte."
Wikipedia
> Meta's Llama model was open-sourced.
Building on it, Stanford produced its Alpaca model, which can be run on a local machine (if desired).
Meanwhile, ChatGPT (OpenAI & Microsoft) and Claude (Anthropic & Google) come from the key leading companies.
> Apple is still pending..
After learning all these mathematical foundations, we can apply them to speech processing.
First of all, we try to sample as many words or as much text as possible, as you have done. (I am talking here from an analog point of view.)
Second, you can order and filter these words/texts into groups. Afterward, you can convert the words into numbers, as in the sketch below.
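Here is a minimal sketch of that step, assuming a tiny hand-made word list; the normalization and filtering rules are illustrative assumptions, not a fixed recipe:

```python
# Minimal sketch: group/filter raw words and map them to numbers.
# The example word list and the "keep words seen at least twice" rule are assumptions.
from collections import Counter

raw_words = ["Hello", "hello", "world", "World", "the", "the", "speech", "Speech"]

# Order & filter: lowercase everything and keep only words seen at least twice.
counts = Counter(w.lower() for w in raw_words)
vocabulary = sorted(w for w, c in counts.items() if c >= 2)

# Convert words into numbers: each word gets a unique integer index.
word_to_id = {word: idx for idx, word in enumerate(vocabulary)}
encoded = [word_to_id[w.lower()] for w in raw_words if w.lower() in word_to_id]

print(word_to_id)   # e.g. {'hello': 0, 'speech': 1, 'the': 2, 'world': 3}
print(encoded)      # the original words as a sequence of integers
```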
The main reason is that everyone speaks differently and thus has a different frequency spectrum for the spoken word.
The spoken word can therefore be regarded as, and encoded in, the frequency spectrum of each individual speaker.
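A minimal sketch of such an encoding, assuming a synthetic 25 ms frame in place of a real recording; the sample rate and the two peak frequencies are illustrative assumptions:

```python
# Minimal sketch: the spectrum of one short frame of a (synthetic) spoken word.
# The sample rate and the two "formant" frequencies are assumptions for illustration.
import numpy as np

sample_rate = 16_000                       # Hz, common for speech
t = np.arange(0, 0.025, 1 / sample_rate)   # one 25 ms analysis frame

# Pretend this frame came from a speaker whose vowel has energy near 300 Hz and 2300 Hz.
frame = np.sin(2 * np.pi * 300 * t) + 0.5 * np.sin(2 * np.pi * 2300 * t)

# Window and transform: the magnitude spectrum is the per-frame "encoding" of the sound.
windowed = frame * np.hanning(len(frame))
spectrum = np.abs(np.fft.rfft(windowed))
freqs = np.fft.rfftfreq(len(windowed), 1 / sample_rate)

# The strongest bins sit near the speaker-dependent frequencies.
top = freqs[np.argsort(spectrum)[-3:]]
print(sorted(top.round()))
```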
Later, you can 'overlay' the data from several speakers like images; by overlaying many data frames you can recognize a word, pauses, or whole sentences. The advantage of the math in speech processing is that both the output f(x) and the input x are already given. This means that by correlating many samples you can try to classify simple words.
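A hedged sketch of the overlay-and-correlate idea, assuming synthetic spectra in place of real speaker data; the two-word 'vocabulary' and the peak positions are made up for illustration:

```python
# Sketch: overlay (average) spectra from several speakers into a word template,
# then classify a new sample by correlating it against the templates.
# The synthetic "spectra" below are assumptions standing in for real data frames.
import numpy as np

rng = np.random.default_rng(0)

def fake_spectrum(center_bin, n_bins=64, noise=0.2):
    """Stand-in for a magnitude spectrum with one dominant peak."""
    spec = np.exp(-0.5 * ((np.arange(n_bins) - center_bin) / 3.0) ** 2)
    return spec + noise * rng.random(n_bins)

# Several speakers say "yes" (peak near bin 12) and "no" (peak near bin 40).
templates = {
    "yes": np.mean([fake_spectrum(12) for _ in range(5)], axis=0),  # overlay = average
    "no":  np.mean([fake_spectrum(40) for _ in range(5)], axis=0),
}

# A new, unknown sample (here: another "yes"-like spectrum).
sample = fake_spectrum(13)

def correlation(a, b):
    """Normalized correlation between two spectra."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    return float(np.mean(a * b))

scores = {word: correlation(sample, tpl) for word, tpl in templates.items()}
print(scores, "->", max(scores, key=scores.get))   # should pick "yes"
```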
Third, if you want to be more precise, for example to distinguish different accents within the same language, you have to apply optimization techniques (gradient descent), which you can study in Andrej Karpathy's course, and afterward stochastics.
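A minimal gradient-descent sketch in the spirit of such a course, assuming a toy least-squares problem; the data, learning rate, and step count are illustrative choices:

```python
# Minimal sketch of gradient descent: fit a line y = w*x + b to toy data.
# The data, learning rate, and step count are assumptions for illustration.
import numpy as np

xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
ys = 2.0 * xs + 1.0            # "ground truth" the model should recover

w, b = 0.0, 0.0                # start from an uninformed guess
learning_rate = 0.05

for step in range(500):
    pred = w * xs + b
    error = pred - ys
    loss = np.mean(error ** 2)             # mean squared error
    grad_w = 2 * np.mean(error * xs)       # d(loss)/dw
    grad_b = 2 * np.mean(error)            # d(loss)/db
    w -= learning_rate * grad_w            # step against the gradient
    b -= learning_rate * grad_b

print(round(w, 3), round(b, 3), round(loss, 6))   # ~2.0, ~1.0, ~0
```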
Today and before: Markov chains describe exactly the system in which the language of the model is encoded (model: word, model: sentence, model: pragmatics, etc.). They apply stochastics to 1. the states of all possible words and 2. the time at which each word is spoken. Both are part of the random sequence (each described separately, in two random sequences) in which your sentence is spoken. (Note: a probability is not a probability distribution.)
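A minimal sketch of these two pieces, the word states and the step-by-step (time) evolution, as a word-level Markov chain; the vocabulary and transition probabilities are assumptions for illustration:

```python
# Minimal sketch: a word-level Markov chain. The states are possible words, and
# each time step produces the next spoken word. Vocabulary and probabilities are
# assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(1)

states = ["I", "like", "music", "speech"]

# transition[i, j] = P(next word = states[j] | current word = states[i]); rows sum to 1.
transition = np.array([
    [0.0, 1.0, 0.0, 0.0],   # "I"      -> "like"
    [0.0, 0.0, 0.5, 0.5],   # "like"   -> "music" or "speech"
    [0.5, 0.5, 0.0, 0.0],   # "music"  -> "I" or "like"
    [0.5, 0.5, 0.0, 0.0],   # "speech" -> "I" or "like"
])

# One random sequence over time: which word is spoken at each step.
word = 0                                  # start at "I"
sentence = [states[word]]
for _ in range(5):
    word = rng.choice(len(states), p=transition[word])
    sentence.append(states[word])

print(" ".join(sentence))
```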
Lastly, by comparing the system's response with your spoken word, you obtain a comparison of whether the word is detected or not, plus further details such as melody, intent, etc.
In the literature, 1. the states are encoded in the transition matrix, while 2. the observations over time (where the words are spoken) are encoded in the emission matrix. For fine-tuning, you can find the optimal paths (Viterbi algorithm), let the model 'predict' from its Markov model (feedback or forward-forward), train it on a large dataset, and lastly use the user's corrections as a reference.
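A hedged sketch of Viterbi decoding on a toy hidden Markov model; the transition matrix, emission matrix, state labels, and observation symbols are all assumptions, not values from any real speech system:

```python
# Sketch of Viterbi decoding on a toy HMM: find the most likely word-state path
# for an observed acoustic sequence. All matrices and labels are assumptions.
import numpy as np

states = ["yes", "no"]
observations = ["low", "high", "high"]          # e.g. coarse pitch symbols
obs_index = {"low": 0, "high": 1}

start = np.array([0.6, 0.4])                     # P(first state)
transition = np.array([[0.7, 0.3],               # P(next state | current state)
                       [0.4, 0.6]])
emission = np.array([[0.8, 0.2],                 # P(observation | state)
                     [0.3, 0.7]])

# Dynamic programming tables: best log-probability and backpointer per time/state.
n_states, n_steps = len(states), len(observations)
log_prob = np.full((n_steps, n_states), -np.inf)
backptr = np.zeros((n_steps, n_states), dtype=int)

log_prob[0] = np.log(start) + np.log(emission[:, obs_index[observations[0]]])
for t in range(1, n_steps):
    for s in range(n_states):
        candidates = log_prob[t - 1] + np.log(transition[:, s])
        backptr[t, s] = int(np.argmax(candidates))
        log_prob[t, s] = candidates[backptr[t, s]] + np.log(emission[s, obs_index[observations[t]]])

# Trace the optimal path backwards.
path = [int(np.argmax(log_prob[-1]))]
for t in range(n_steps - 1, 0, -1):
    path.append(backptr[t, path[-1]])
best = [states[s] for s in reversed(path)]

print(best, "log-probability:", round(float(np.max(log_prob[-1])), 3))
```

The comparison mentioned above corresponds to the final log-probability: a higher score for one word model than another means that word is the better match for the spoken input.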
> Understanding language is the key to understanding how the brain works.
Learned from a passionate Prof. and aggregated into this page.
Understanding noise, and thus random processes, is the key component to understanding life mathematically, as humans.
In the past, von Neumann and Ulam used Markov Chain Monte Carlo (MCMC) methods to find optima in random processes (compare the max function over such processes in Hinton's new Forward-Forward algorithm, Dec 2022).
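As a rough illustration of that idea (not the historical method itself), here is a Metropolis-style random walk that searches for the maximum of a toy function; the target function, proposal width, and temperature are assumed for the sketch:

```python
# Sketch: a simple Metropolis-style random walk that searches for the maximum
# of a function, in the spirit of MCMC. The target function, proposal width,
# and temperature are assumptions for illustration.
import math
import random

random.seed(0)

def target(x):
    """Toy objective with its maximum at x = 2."""
    return math.exp(-(x - 2.0) ** 2)

x = 0.0                      # start far from the optimum
best_x, best_value = x, target(x)
temperature = 0.5

for _ in range(5000):
    proposal = x + random.gauss(0.0, 0.3)            # random local move
    # Accept uphill moves always, downhill moves with a Boltzmann-like probability.
    accept = math.exp(min(0.0, (target(proposal) - target(x)) / temperature))
    if random.random() < accept:
        x = proposal
    if target(x) > best_value:
        best_x, best_value = x, target(x)

print(round(best_x, 2), round(best_value, 3))        # close to x = 2, value ≈ 1
```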
Today, we have the necessary computing power (the "calculator") to reconstruct single molecules or sentences.
The Indo-European family tree summarizes the descent of multiple language fragments as the sum of the research done so far.
Notable: language models have to be trained with a lot of care and research.
Otherwise you burn your resources, dollars & feelings.