Large Language Models, Artificial Intelligence and the Future of Law
Session 3: How do we build a Large Language Model?
How does AI work?
At its heart, all the specific and weak AI we have talked about consists of computer algorithms: sets of instructions that a computer follows to solve problems or perform tasks.
LLMs in particular use a specific kind of algorithm built around something called a transformer. A transformer is a type of neural network, and a neural network is a kind of machine learning algorithm. Below we see what these terms mean.
Let us assume we want to program a smart air conditioner, which adjusts the temperature according to our needs. It has just two sensors: 1) a clock and 2) a thermometer. How do we program it to do what we want?
We can have a Rule:
If the outside temperature is too cold (below -5°C), then we want to set the temperature to 24°C. Otherwise, we set the temperature to 22°C.
Also, when we sleep, we would like the temperature to be set to 18°C.
Or, we can set the temperature manually for 30 days, give the computer a record of the AC temperature, the outside temperature and the time of day, and try to make it "learn" our preferences.
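As an illustration, here is a minimal sketch of the rule-based approach in Python. The function name, the exact sleeping hours and the thresholds are illustrative assumptions, not a real thermostat's logic.

```python
# A minimal sketch of the hand-written rules above.
# The hours treated as "sleep time" and the thresholds are assumptions.

def set_ac_temperature(outside_temp_c: float, hour_of_day: int) -> int:
    """Return the target AC temperature (in degrees C) from hand-written rules."""
    if hour_of_day >= 23 or hour_of_day < 6:   # assume we sleep from 11 pm to 6 am
        return 18
    if outside_temp_c < -5:                    # too cold outside
        return 24
    return 22                                  # default setting

print(set_ac_temperature(outside_temp_c=-10, hour_of_day=14))  # -> 24
print(set_ac_temperature(outside_temp_c=5, hour_of_day=2))     # -> 18
```

Every preference here has to be spelled out by hand; the learning approach described above would instead infer these rules from the 30 days of recorded settings.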
Judgment Disposition Classifier
Rule:
If the judgment contains words like "disposed", "allowed" or "granted", then the petitioner wins; otherwise the respondent wins.
Hidden Rules:
What about negation ("not allowed"), remands, or withdrawals?
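Below is a minimal sketch of this keyword rule in Python, with made-up example sentences. The last example shows how a hidden rule like negation trips up the simple keyword test.

```python
# A naive keyword rule for judgment disposition, with made-up example texts.

KEYWORDS = ("disposed", "allowed", "granted")

def classify_disposition(judgment_text: str) -> str:
    """Petitioner wins if any keyword appears in the text; else respondent wins."""
    text = judgment_text.lower()
    if any(word in text for word in KEYWORDS):
        return "petitioner wins"
    return "respondent wins"

print(classify_disposition("The petition is allowed with costs."))   # petitioner wins
print(classify_disposition("The petition is dismissed."))            # respondent wins
print(classify_disposition("The appeal is not allowed."))            # wrongly: petitioner wins
```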
Same principle with ChatGPT (Simulation Code)
image credits: https://hands-on.cloud/quick-introduction-to-machine-learning/
Neural networks are composed of interconnected layers of artificial neurons. These neurons are analogous to biological neurons.
Each neuron receives inputs, multiplies them by weights, and adds a bias term. The result is then passed through a mathematical function (activation function) to determine its output.
The network learns by feeding it examples with known answers. The difference between what the network predicted and the correct answer is used to adjust the weights using a process called backpropagation.
A trained neural network takes new inputs and uses its learned weights to calculate an output, providing predictions or classifications for new data it hasn't seen before.
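To make this concrete, here is a toy single artificial neuron trained by gradient descent (the weight-adjustment step at the core of backpropagation) on a tiny made-up dataset. The learning rate, the number of passes and the dataset itself are all illustrative choices.

```python
import math, random

# One artificial neuron: inputs * weights + bias, passed through an activation
# function (here the sigmoid), with the weights adjusted from its mistakes.

random.seed(0)
w1, w2, b = random.random(), random.random(), random.random()

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Tiny made-up dataset with known answers: output 1 only when both inputs are 1.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

learning_rate = 0.5
for epoch in range(5000):
    for (x1, x2), target in data:
        output = sigmoid(w1 * x1 + w2 * x2 + b)      # forward pass
        error = output - target                      # how wrong was the prediction?
        grad = error * output * (1 - output)         # gradient of the squared error
        w1 -= learning_rate * grad * x1              # adjust the weights a little
        w2 -= learning_rate * grad * x2
        b  -= learning_rate * grad

for (x1, x2), target in data:
    print((x1, x2), round(sigmoid(w1 * x1 + w2 * x2 + b), 2), "target:", target)
```

A real neural network stacks many such neurons in layers, but the forward pass and the weight updates follow the same pattern.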
10 ingredients: Nutella, Lettuce, Cheese, Sugar, Peanut Butter, Tomato, Cucumber, Mayonnaise, Mud, Chillies.
How do you decide which combination makes the best sandwich?
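One brute-force answer is to try every possible combination and score each one. The sketch below does that; the taste_score function is a completely made-up placeholder for the human feedback a learning system would actually rely on.

```python
from itertools import combinations

# Brute-force search over every possible sandwich.
# taste_score is a made-up placeholder for real human feedback.

ingredients = ["Nutella", "Lettuce", "Cheese", "Sugar", "Peanut Butter",
               "Tomato", "Cucumber", "Mayonnaise", "Mud", "Chillies"]

def taste_score(combo):
    # Hypothetical scoring: reward a couple of pairings, punish mud.
    score = 0.0
    if "Mud" in combo:
        score -= 10
    if {"Cheese", "Tomato"} <= set(combo):
        score += 3
    if {"Nutella", "Peanut Butter"} <= set(combo):
        score += 2
    return score + 0.1 * len(combo)

all_sandwiches = [combo for r in range(1, len(ingredients) + 1)
                  for combo in combinations(ingredients, r)]
print(len(all_sandwiches), "possible sandwiches")   # 1023 non-empty combinations
print(max(all_sandwiches, key=taste_score))         # the best one under this made-up score
```

Even with only 10 ingredients there are over a thousand combinations to try, which is why learning preferences from examples quickly becomes more practical than exhaustive tasting.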
Step 1: Have a dataset of all (or as many as possible) of the sentences in the language. This initial corpus is called the training corpus.
Step 2: Break the dataset into two parts: a training dataset and a test dataset.
Step 3: Use any machine learning algorithm (Regression, Random Forest, Support Vector Machines (SVM)) to learn the probability of the next word in any given phrase. For example, the probability that "dog" is the next word after the phrase "the quick brown fox jumps over".
Step 4: The machine learning algorithm will discover a complicated formula with many parameters that you can use to calculate the probability that "dog" will be the next word. These learned parameters are called weights.
Step 5: Test whether the predictions of the next word are correct using your test dataset. If not, adjust the parameters (or the hyperparameters) until you get more accurate results. (Simulation Code)
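The sketch below walks through Steps 1 to 5 on a tiny toy corpus, using a simple bigram count model as a stand-in for the heavier algorithms named in Step 3. The sentences and the train/test split are made up for illustration.

```python
from collections import Counter, defaultdict

# A minimal sketch of Steps 1-5: estimate the probability of the next word
# from how often it followed the previous word in the training data.

corpus = [
    "the quick brown fox jumps over the lazy dog",
    "the quick brown fox jumps over the fence",
    "the lazy dog sleeps all day",
    "the quick fox runs over the hill",
]
train, test = corpus[:3], corpus[3:]            # Step 2: train/test split

counts = defaultdict(Counter)
for sentence in train:                          # Steps 3-4: "learn" the parameters
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1

def next_word_probability(prev, candidate):
    total = sum(counts[prev].values())
    return counts[prev][candidate] / total if total else 0.0

print(next_word_probability("over", "the"))     # probability that "the" follows "over"
print(counts["the"].most_common(1))             # most likely word after "the"

# Step 5: check the predictions on the held-out test sentence.
for prev, nxt in zip(test[0].split(), test[0].split()[1:]):
    predicted = counts[prev].most_common(1)
    print(prev, "->", predicted[0][0] if predicted else "?", "(actual:", nxt + ")")
```

Real language models condition on much longer phrases than a single previous word, which is where the transformer machinery below comes in.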
Transformers = Tokenization + Word Embeddings + Self-Attention + Positional Encodings
Tokenization splits the input text into manageable pieces, such as words or subwords.
Word Embeddings convert tokens into vectors of fixed size, capturing semantic meanings.
Self-Attention allows the model to weigh the importance of different words within the same sentence, considering the context.
Positional Encodings add information about the relative or absolute position of the tokens in the sequence, since the attention mechanism on its own has no notion of word order.
The invention of transformers greatly increased the ability to calculate weights in parallel, significantly improving the ability to scale up Large Language Models.
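As a minimal sketch of the self-attention step, the code below computes scaled dot-product attention over four toy token embeddings. The embeddings and the query/key/value projection matrices are random placeholders rather than learned values; note that the attention weights for all tokens are produced in a single matrix operation, which is what makes the computation so easy to parallelise.

```python
import numpy as np

# Scaled dot-product self-attention on toy numbers.
# Real transformers learn the projection matrices; here they are random stand-ins.

np.random.seed(0)
seq_len, d_model = 4, 8                       # 4 tokens, 8-dimensional embeddings

x = np.random.randn(seq_len, d_model)         # token embeddings (plus positional encodings in practice)
W_q = np.random.randn(d_model, d_model)       # query projection (random placeholder)
W_k = np.random.randn(d_model, d_model)       # key projection
W_v = np.random.randn(d_model, d_model)       # value projection

Q, K, V = x @ W_q, x @ W_k, x @ W_v

scores = Q @ K.T / np.sqrt(d_model)           # how much each token attends to every other token
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax over each row
output = weights @ V                          # context-aware representation of each token

print(weights.round(2))                       # each row sums to 1: attention over the 4 tokens
print(output.shape)                           # (4, 8): one new vector per token
```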