The mode is the value that appears most frequently in a dataset.
Example:
cat, cat, dog, dog, dog, elephant
The mode is dog, as it occurs 3 times, which is more than any other value.
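A quick way to compute the mode in Python (a minimal sketch using collections.Counter):

```python
from collections import Counter

values = ["cat", "cat", "dog", "dog", "dog", "elephant"]
mode, count = Counter(values).most_common(1)[0]  # most frequent value and its count
print(mode, count)  # dog 3
```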
We now try to figure out which feature (Age or Smoker) is the best predictor of cancer, so it can serve as the root of a decision tree.
Age       Smoker   Target
Teenager  Yes      No Cancer
Elderly   No       Cancer
Adult     Yes      Cancer
Elderly   Yes      Cancer
Adult     No       No Cancer
Elderly   No       Cancer
Adult     Yes      Cancer
Elderly   No       No Cancer
Entropy quantifies the impurity or randomness in the dataset:
Entropy = −sum(p_i · log2(p_i)), where p_i is the proportion of class i in the dataset.
From the data:
Cancer: 5 occurrences
No Cancer: 3 occurrences
The probabilities:
p(Cancer)=5/8, p(No Cancer)=3/8
Entropy(Target) = −(5/8 · log2(5/8) + 3/8 · log2(3/8)) ≈ 0.95
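As a sanity check, a minimal Python sketch of the same calculation (the target list below just transcribes the table above):

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((labels.count(v) / n) * math.log2(labels.count(v) / n)
                for v in set(labels))

target = ["No Cancer", "Cancer", "Cancer", "Cancer",
          "No Cancer", "Cancer", "Cancer", "No Cancer"]
print(round(entropy(target), 2))  # 0.95
```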
To calculate the entropy for features, we split the dataset based on each feature and calculate the weighted sum of the entropies of the subsets.
Feature: Age
Split into groups: Teenager, Adult, and Elderly.
Teenager: 1 (No Cancer)
Adult: 3 (2 Cancer, 1 No Cancer)
Elderly: 4 (3 Cancer, 1 No Cancer)
For each group, calculate its entropy and weight it by the proportion of the dataset it represents.
Entropy_Age = (Entropy_Teenager · n_Teenager + Entropy_Adult · n_Adult + Entropy_Elderly · n_Elderly) / total number of people
Entropy_Teenager = −(1 · log2(1)) = 0 (a pure group has zero entropy; the 0 · log2(0) term is taken as 0 by convention)
Entropy_Adult = −(2/3) · log2(2/3) − (1/3) · log2(1/3) ≈ 0.92
Entropy_Elderly = −(3/4) · log2(3/4) − (1/4) · log2(1/4) ≈ 0.81
Now combine them:
Entropy_Age = (1 Teenager · 0 + 3 Adults · 0.92 + 4 Elderly · 0.81) / 8 people ≈ 0.75
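Continuing the sketch above (it reuses entropy() and target from the previous snippet), a weighted-entropy helper reproduces this number; the age list transcribes the table:

```python
def weighted_entropy(feature, labels):
    """Weighted average entropy of the label subsets induced by a feature."""
    n = len(labels)
    total = 0.0
    for value in set(feature):
        subset = [lab for f, lab in zip(feature, labels) if f == value]
        total += len(subset) / n * entropy(subset)
    return total

age = ["Teenager", "Elderly", "Adult", "Elderly",
       "Adult", "Elderly", "Adult", "Elderly"]
print(round(weighted_entropy(age, target), 2))  # 0.75
```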
Feature: Smoker
Split into groups: Yes and No.
Yes: 4 (3 Cancer, 1 No Cancer)
No: 4 (2 Cancer, 2 No Cancer)
For each group:
Yes: −(3/4) · log2(3/4) − (1/4) · log2(1/4) ≈ 0.81
No: −(2/4) · log2(2/4) − (2/4) · log2(2/4) = 1
Now combine them:
Entropy_Smoker = (4 Smokers · 0.81 + 4 Non-smokers · 1) / 8 people ≈ 0.91
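The same helper reproduces the Smoker entropy:

```python
smoker = ["Yes", "No", "Yes", "Yes", "No", "No", "Yes", "No"]
print(round(weighted_entropy(smoker, target), 2))  # 0.91
```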
To calculate the information gain for each feature:
Information Gain=Entropy(Target)−Entropy(Feature)
For Age:
Information Gain(Age)=0.95−0.75=0.2
For Smoker:
Information Gain(Smoker) = 0.95 − 0.91 = 0.04
So, the root node of the decision tree would be Age.
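Putting it together, a short continuation that computes both gains and picks the root, reusing the functions and lists from the sketches above:

```python
gains = {
    "Age": entropy(target) - weighted_entropy(age, target),
    "Smoker": entropy(target) - weighted_entropy(smoker, target),
}
print(gains)                      # {'Age': 0.204..., 'Smoker': 0.048...}
print(max(gains, key=gains.get))  # Age -> chosen as the root node
```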
Logistic regression predicts the probability of an outcome (e.g., belonging to class 1) based on a linear model and a sigmoid transformation.
Limitations of Logistic Regression:
Logistic regression assumes linearity between features and the log-odds, which may not hold true in all cases.
The method can struggle with highly imbalanced datasets or when features are highly correlated.
Linear Model: The linear relationship is defined as:
y=ax+b
Here, a is the coefficient (slope), b is the intercept, and x is the input data.
Sigmoid Transformation: The raw output y is transformed into a probability using the sigmoid function:
p = 1/(1 + e^(−y))
This maps y into the range (0, 1).
Thresholding: A threshold (typically 0.5) is applied to classify the output:
p≥0.5 Classify as 1 (positive class).
p<0.5 Classify as 0 (negative class).
x     Linear model (y = 5x − 10)   Sigmoid (probability)   Threshold (classification)
2     y = 0                        0.5                     1
3     y = 5                        0.993                   1
1     y = −5                       0.007                   0
2.5   y = 2.5                      0.924                   1
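A minimal sketch of this pipeline in Python, assuming the coefficients a = 5 and b = −10 implied by the table:

```python
import math

def predict(x, a=5.0, b=-10.0, threshold=0.5):
    y = a * x + b                      # linear model
    p = 1 / (1 + math.exp(-y))        # sigmoid transformation
    return y, p, int(p >= threshold)  # thresholded classification

for x in [2, 3, 1, 2.5]:
    y, p, label = predict(x)
    print(x, y, round(p, 3), label)
# 2 0.0 0.5 1
# 3 5.0 0.993 1
# 1 -5.0 0.007 0
# 2.5 2.5 0.924 1
```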
CI = sample mean ± Z · std / sqrt(n), where n is the sample size and std is the sample standard deviation
Z is the value from the standard normal distribution corresponding to the confidence level (e.g., Z = 1.96 for 95%).
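A small sketch of this formula, using made-up sample values purely for illustration:

```python
import math

def confidence_interval(samples, z=1.96):
    """CI for the mean: mean +/- z * std / sqrt(n), using the sample std (n - 1)."""
    n = len(samples)
    mean = sum(samples) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in samples) / (n - 1))
    margin = z * std / math.sqrt(n)
    return mean - margin, mean + margin

print(confidence_interval([4.8, 5.1, 5.0, 4.9, 5.2]))  # ~(4.861, 5.139)
```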