Hyperparameters are model configuration variables whose values are set before training begins. They shape the behavior and performance of a machine learning model: unlike model parameters, they are not learned from the data, yet they can significantly impact how well the model performs. Here are some examples of hyperparameters:
Learning Rate: The learning rate is a hyperparameter that determines the step size of the gradient descent algorithm during model training. It controls how much the model's parameters are updated in each iteration. A high learning rate can cause the model to overshoot the optimal solution, while a low learning rate can cause the model to converge too slowly. The learning rate is usually set to a small value, often between 0.0001 and 0.01.
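As a minimal sketch (not tied to any particular library), the update rule below shows how the learning rate scales each gradient-descent step; the quadratic objective and the starting point are illustrative assumptions:

```python
import numpy as np

def gradient_descent(grad_fn, x0, learning_rate=0.1, n_steps=50):
    """Plain gradient descent: each step moves by learning_rate * gradient."""
    x = x0
    for _ in range(n_steps):
        x = x - learning_rate * grad_fn(x)  # step size is set by the learning rate
    return x

# Minimize f(x) = x^2, whose gradient is 2x. A rate that is too large
# overshoots and can diverge; one that is too small converges very slowly.
print(gradient_descent(lambda x: 2 * x, x0=np.array(5.0), learning_rate=0.1))
```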
Regularization Strength: Regularization strength is a hyperparameter that controls how strongly a model is penalized for complexity in order to prevent overfitting. Regularization adds a penalty term to the loss function that discourages the model from fitting noise in the training data; its strength is set by a coefficient such as the L1 or L2 regularization weight. Too high a regularization strength can cause the model to underfit, while too low a strength can allow it to overfit.
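To make the penalty term concrete, here is a small NumPy sketch of an L2-regularized squared-error loss; the function name and the default coefficient of 0.01 are assumptions for illustration:

```python
import numpy as np

def ridge_loss(w, X, y, reg_strength=0.01):
    """Mean squared error plus an L2 penalty on the weights."""
    mse = np.mean((X @ w - y) ** 2)           # data-fit term
    penalty = reg_strength * np.sum(w ** 2)   # grows with weight magnitude
    return mse + penalty                      # reg_strength trades fit vs. simplicity
```

Raising `reg_strength` pushes the optimizer toward smaller weights (simpler models), which is exactly the underfitting/overfitting trade-off described above.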
Number of Hidden Layers: The number of hidden layers is a hyperparameter that determines the depth of a neural network model. It controls the complexity of the model and its ability to capture nonlinear relationships in the data. A deep network with many hidden layers can learn more complex features, but it is harder to train than a shallow network with fewer hidden layers.
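A minimal NumPy sketch, assuming a plain fully connected network with ReLU activations and omitting biases and training, shows how depth is just a constructor argument:

```python
import numpy as np

def init_mlp(n_inputs, n_hidden_layers, hidden_size, n_outputs,
             rng=np.random.default_rng(0)):
    """Build weight matrices for a network whose depth is a hyperparameter."""
    sizes = [n_inputs] + [hidden_size] * n_hidden_layers + [n_outputs]
    return [rng.normal(0, 0.1, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]

def forward(weights, x):
    for W in weights[:-1]:
        x = np.maximum(0.0, x @ W)  # hidden layers with ReLU nonlinearity
    return x @ weights[-1]          # linear output layer

shallow = init_mlp(4, n_hidden_layers=1, hidden_size=16, n_outputs=2)
deep = init_mlp(4, n_hidden_layers=8, hidden_size=16, n_outputs=2)
print(forward(deep, np.ones(4)))
```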
Batch Size: Batch size is a hyperparameter that determines the number of training examples used in each iteration of the training process. It controls the amount of memory required to train the model and the convergence speed of the algorithm. A larger batch size can lead to faster convergence but requires more memory and may hurt generalization performance.
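The sketch below shows where the batch size enters a typical training loop; the helper function and the synthetic data are assumptions for illustration:

```python
import numpy as np

def minibatches(X, y, batch_size, rng=np.random.default_rng(0)):
    """Yield shuffled mini-batches; batch_size sets memory use per step."""
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        yield X[batch], y[batch]

X, y = np.random.rand(1000, 8), np.random.rand(1000)
for xb, yb in minibatches(X, y, batch_size=32):
    pass  # one gradient update per 32-example batch would go here
```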
Activation Function: The activation function is a hyperparameter that determines the nonlinearity of a neural network model. It is applied to the output of each neuron in the network and shapes the range of the output values. Common activation functions include ReLU, sigmoid, and tanh. The choice of activation function can impact the model's ability to learn complex features and the convergence speed of the algorithm.
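For concreteness, here is a small NumPy sketch of the three activations named above and the output ranges they produce:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)         # outputs in [0, inf)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # squashes outputs into (0, 1)

x = np.array([-2.0, 0.0, 2.0])
print(relu(x), sigmoid(x), np.tanh(x))  # np.tanh squashes into (-1, 1)
```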