ML hyperparameters (Gemini)
Categorized by type of model or general application:
General/Optimization Hyperparameters:
Learning Rate: Controls the step size at each iteration while moving towards a minimum of a loss function.
Number of Epochs: The number of complete passes through the entire training dataset during training.
Batch Size: The number of samples processed before the model's internal parameters are updated.
Optimizer: The algorithm used to minimize the loss function (e.g., SGD, Adam, RMSprop).
Regularization Strength (λ or α): Controls the penalty for complexity in the model, preventing overfitting (e.g., L1, L2 regularization).
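To see where these general hyperparameters plug in, here is a minimal sketch of a batch-SGD training loop for linear regression (plain NumPy, no framework); all values (learning rate, epochs, batch size, λ) are illustrative, not recommendations.

```python
import numpy as np

# Synthetic regression data with a known weight vector.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=200)

learning_rate = 0.1   # step size of each parameter update
num_epochs = 50       # full passes over the training set
batch_size = 32       # samples per parameter update
l2_lambda = 0.01      # L2 regularization strength

w = np.zeros(3)
for epoch in range(num_epochs):
    idx = rng.permutation(len(X))          # reshuffle each epoch
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        Xb, yb = X[batch], y[batch]
        # Gradient of (MSE + L2 penalty) with respect to w
        grad = 2 * Xb.T @ (Xb @ w - yb) / len(batch) + 2 * l2_lambda * w
        w -= learning_rate * grad          # SGD is the optimizer here
```

After training, `w` lands close to `true_w`, slightly shrunk toward zero by the L2 penalty; swapping the plain gradient step for Adam or RMSprop would change only the update line.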
Neural Network Specific Hyperparameters:
Number of Hidden Layers: The depth of the neural network.
Number of Neurons (in each hidden layer): The width of the neural network at each hidden layer.
Activation Function: The non-linear function applied to the output of each neuron (e.g., ReLU, Sigmoid, Tanh).
Dropout Rate: The fraction of neurons to randomly set to zero during training to prevent overfitting.
Weight Initialization: The method used to set the initial values of the network's weights.
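A minimal NumPy forward pass shows where each of these network hyperparameters appears; the layer count, width, dropout rate, and use of He initialization are illustrative choices, not recommendations.

```python
import numpy as np

rng = np.random.default_rng(0)

n_hidden_layers = 2   # depth of the network
n_neurons = 16        # width of each hidden layer
dropout_rate = 0.2    # fraction of activations zeroed during training

def relu(z):
    # Activation function: non-linearity applied to each layer's output
    return np.maximum(0.0, z)

def init_weights(fan_in, fan_out):
    # He initialization, a common weight-initialization scheme for ReLU
    return rng.normal(scale=np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

x = rng.normal(size=(8, 4))   # batch of 8 inputs with 4 features
h = x
for _ in range(n_hidden_layers):
    W = init_weights(h.shape[1], n_neurons)
    h = relu(h @ W)
    # Inverted dropout: zero a random fraction of units, rescale the rest
    mask = rng.random(h.shape) >= dropout_rate
    h = h * mask / (1.0 - dropout_rate)
```

At inference time the dropout mask is simply omitted; the rescaling during training keeps the expected activation magnitude the same in both modes.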
Tree-Based Model Specific Hyperparameters (e.g., Decision Trees, Random Forests, Gradient Boosting):
Max Depth: The maximum depth of individual trees in the ensemble.
Min Samples Split: The minimum number of samples required to split an internal node.
Min Samples Leaf: The minimum number of samples required to be at a leaf node.
Number of Estimators (or Trees): The number of trees in an ensemble (e.g., in Random Forest or Gradient Boosting).
Criterion (for splitting): The function to measure the quality of a split (e.g., Gini impurity, entropy for classification; mean squared error for regression).
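These tree hyperparameters map directly onto scikit-learn's RandomForestClassifier constructor; the sketch below uses illustrative values on a synthetic dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic binary-classification data.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

clf = RandomForestClassifier(
    n_estimators=100,     # number of trees in the ensemble
    max_depth=5,          # cap on each tree's depth
    min_samples_split=4,  # samples needed to split an internal node
    min_samples_leaf=2,   # samples required at each leaf
    criterion="gini",     # split-quality measure for classification
    random_state=0,
)
clf.fit(X, y)
train_acc = clf.score(X, y)
```

The same max_depth / min_samples parameters appear on DecisionTreeClassifier and the gradient-boosting estimators; for regression trees the criterion would be a squared-error measure instead of Gini or entropy.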
Support Vector Machine (SVM) Specific Hyperparameters:
Regularization Parameter (C): Controls the trade-off between fitting the training data closely and keeping a wide, simple decision margin; smaller values of C impose stronger regularization and help prevent overfitting.
Gamma (for RBF/Polynomial Kernels): Defines how far the influence of a single training example reaches. A low gamma means a far-reaching influence (smoother decision boundary); a high gamma means a short-range influence (more complex, tighter-fitting boundary).
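Both SVM hyperparameters are passed straight to scikit-learn's SVC; the C and gamma values below are illustrative starting points, normally chosen by cross-validated search.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic binary-classification data.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

clf = SVC(
    C=1.0,         # regularization: smaller C = stronger regularization
    kernel="rbf",
    gamma=0.5,     # reach of each training example's influence
)
clf.fit(X, y)
acc = clf.score(X, y)
```

With a linear kernel, gamma is ignored and only C matters.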
K-Nearest Neighbors (KNN) Specific Hyperparameters:
N Neighbors (k): The number of nearest neighbors to consider for classification or regression.
P (Power Parameter for Minkowski Metric): Determines the type of distance metric used (e.g., p=1 for Manhattan distance, p=2 for Euclidean distance).
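Both KNN hyperparameters correspond to named arguments of scikit-learn's KNeighborsClassifier; the k and p values below are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary-classification data.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

clf = KNeighborsClassifier(
    n_neighbors=5,  # k: number of neighbors consulted per prediction
    p=2,            # Minkowski power: p=2 gives Euclidean distance
)
clf.fit(X, y)
acc = clf.score(X, y)
```

Setting p=1 switches the same estimator to Manhattan distance; odd values of k avoid ties in binary classification.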