Gradient Descent | Iterative optimization method to minimize a loss function | $\theta_{j+1} = \theta_j - \alpha \nabla J(\theta_j)$ | Training neural networks, logistic regression, and general parameter estimation (sketch below) |
Normal (Gaussian) Distribution | Continuous probability distribution defined by mean and variance | $f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$ | Modeling noise, Gaussian priors, feature normalization, probabilistic models (sketch below) |
Z-score | Standardizes a value relative to a distribution (mean, std) | $z = \frac{x - \mu}{\sigma}$ | Feature scaling, outlier detection, standardization before training |
Sigmoid (Logistic) Function | S-shaped activation mapping real-valued input to (0,1) | $\sigma(x) = \frac{1}{1 + e^{-x}}$ | Binary classification outputs, logistic regression, binary neuron activation (sketch below) |
Pearson Correlation Coefficient | Measure of linear correlation between two variables | $\mathrm{Corr}(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\mathrm{Std}(X)\,\mathrm{Std}(Y)}$ | Feature selection, exploratory data analysis, detecting multicollinearity |
Cosine Similarity | Angle-based similarity between two vectors | $\mathrm{sim}(A, B) = \frac{A \cdot B}{\lVert A \rVert\,\lVert B \rVert}$ | Text similarity, nearest neighbors in embedding spaces, recommendation systems (sketch below) |
Naive Bayes (posterior with conditional independence) | Probabilistic classifier assuming features are conditionally independent given the class | $P(y \mid x_1, \ldots, x_n) = \frac{P(y) \prod_{i=1}^{n} P(x_i \mid y)}{P(x_1, \ldots, x_n)}$ | Text classification, spam detection, quick baseline probabilistic models |
Maximum Likelihood Estimation (MLE) | Parameter estimation by maximizing the data likelihood | $\hat{\theta}_{\mathrm{MLE}} = \arg\max_{\theta} \prod_{i=1}^{n} P(x_i \mid \theta)$ | Estimating model parameters for many statistical models (e.g., Gaussian, logistic) |
Ordinary Least Squares (OLS) Solution | Closed-form linear regression coefficients minimizing squared error | $\hat{\beta} = (X^\top X)^{-1} X^\top y$ | Linear regression fitting, baseline regression analysis, quick parameter estimates (sketch below) |
F1 Score | Harmonic mean of precision and recall for classification | $F_1 = \frac{2 \cdot P \cdot R}{P + R}$ | Evaluating imbalanced classification tasks (e.g., information retrieval) (sketch below) |
ReLU (Rectified Linear Unit) | Piecewise linear activation that is zero for negative inputs | $\mathrm{ReLU}(x) = \max(0, x)$ | Activation in deep neural networks; helps with sparse activations and gradient flow |
Softmax (class probability) | Converts logits to a probability distribution over classes | $P(y = j \mid x) = \frac{\exp(x^\top w_j)}{\sum_{k=1}^{K} \exp(x^\top w_k)}$ | Multi-class classification outputs, final layer in classifiers, cross-entropy loss (sketch below) |
Coefficient of Determination (R^2) | Fraction of variance explained by a regression model | $R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$ | Assessing goodness-of-fit for regression models |
Mean Squared Error (MSE) | Average squared difference between predictions and targets | $\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ | Regression loss for training and model comparison |
MSE with L2 Regularization (Ridge-style) | MSE augmented with an L2 penalty to shrink coefficients | $\mathrm{MSE}_{\mathrm{reg}} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$ | Preventing overfitting, ridge regression, regularized linear models |
Eigenvalue / Eigenvector Equation | Characterizes a linear transformation by the directions it only scales | $A v = \lambda v$ | PCA, spectral clustering, analyzing linear operators and covariance matrices |
(Shannon) Entropy | Measure of uncertainty or information content in a distribution | $H(X) = -\sum_i p_i \log_2 p_i$ | Feature selection, decision tree splitting, information-theoretic model comparisons (sketch below) |
K-Means Objective | Within-cluster sum of squared distances, minimized over assignments and centroids | $\arg\min_{S} \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^2$ | Unsupervised clustering to find centroids; pre-processing and segmentation (sketch below) |
Kullback-Leibler (KL) Divergence | Asymmetric measure of difference between two probability distributions | $D_{\mathrm{KL}}(P \parallel Q) = \sum_x P(x) \log \frac{P(x)}{Q(x)}$ | Variational inference, training generative models, measuring distribution shifts |
Log-Loss (Binary Cross-Entropy) | Negative log-likelihood for binary classification predictions | $\ell_{\log} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]$ | Loss for binary classifiers, logistic regression, and neural nets with sigmoid outputs |
Support Vector Machine (hinge loss, primal) | Margin-based objective with hinge loss and regularization | $\min_{w, b}\; \frac{1}{2} \lVert w \rVert^2 + C \sum_{i=1}^{n} \max\left(0,\, 1 - y_i (w^\top x_i - b)\right)$ | Classification with large-margin objectives; SVM training and kernel methods |
Linear Regression (model) | Linear model expressing the target as a weighted sum of inputs | $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n + \varepsilon$ | Predictive modeling for continuous targets, baseline models, interpretability |
Singular Value Decomposition (SVD) | Factorizes a matrix into singular vectors and singular values | $A = U \Sigma V^\top$ | Dimensionality reduction, low-rank approximations, recommender systems (matrix factorization) (sketch below) |
Lagrange Multiplier (constrained optimization) | Method to optimize with equality constraints using multipliers | Equality constraint $g(x) = 0$; Lagrangian $\mathcal{L}(x, \lambda) = f(x) - \lambda\, g(x)$ | Constrained optimization in model training, dual formulations, constrained EM or SVM derivations |
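The sketches referenced in the table follow. They are minimal NumPy illustrations of the formulas, not production implementations; every function name and example value in them is chosen here for illustration and is not part of the table. First, the gradient descent update, applied to a hypothetical quadratic loss whose gradient is known in closed form:

```python
import numpy as np

# Gradient descent: theta_{j+1} = theta_j - alpha * grad J(theta_j).
def grad_descent(grad_J, theta0, alpha=0.1, n_steps=100):
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_steps):
        theta = theta - alpha * grad_J(theta)   # the update rule from the table
    return theta

# Hypothetical example: J(theta) = ||theta - c||^2 has gradient 2 * (theta - c),
# so the iterates should converge to c.
c = np.array([3.0, -1.0])
theta_hat = grad_descent(lambda th: 2.0 * (th - c), theta0=np.zeros(2))
print(theta_hat)   # approaches [3.0, -1.0]
```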
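A sketch of the Gaussian density and z-score standardization, assuming a one-dimensional sample and the population (biased) standard deviation:

```python
import numpy as np

# Gaussian density f(x | mu, sigma^2) from the table.
def gaussian_pdf(x, mu, sigma):
    return np.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2)) / np.sqrt(2.0 * np.pi * sigma ** 2)

# z = (x - mu) / sigma, computed per element against the sample mean and std.
def z_scores(x):
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

x = np.array([1.0, 2.0, 2.5, 10.0])
print(gaussian_pdf(0.0, mu=0.0, sigma=1.0))  # ~0.3989, the standard normal peak
print(z_scores(x))                           # the 10.0 stands out as a potential outlier
```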
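A sketch pairing the sigmoid with the binary cross-entropy it is usually trained against; the clipping constant eps is an assumption added here to avoid log(0):

```python
import numpy as np

# Sigmoid: sigma(x) = 1 / (1 + exp(-x)).
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))

# Binary cross-entropy (log-loss) averaged over N examples.
def log_loss(y_true, y_prob, eps=1e-12):
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))

logits = np.array([2.0, -1.0, 0.5])
probs = sigmoid(logits)
print(probs)                       # values in (0, 1)
print(log_loss([1, 0, 1], probs))  # lower is better
```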
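A sketch of cosine similarity and the Pearson correlation; note that Pearson correlation is cosine similarity applied to mean-centered vectors:

```python
import numpy as np

# Cosine similarity: A . B / (||A|| ||B||).
def cosine_similarity(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Pearson correlation: Cov(X, Y) / (Std(X) Std(Y)), via centered dot products.
def pearson_corr(x, y):
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return (xc @ yc) / (np.sqrt(xc @ xc) * np.sqrt(yc @ yc))

a, b = [1.0, 2.0, 3.0], [2.0, 4.0, 6.1]
print(cosine_similarity(a, b))  # close to 1: nearly parallel vectors
print(pearson_corr(a, b))       # close to 1: strong linear relationship
```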
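A sketch of the OLS closed form and a ridge-regularized variant. It solves the normal equations with np.linalg.solve rather than forming the inverse explicitly, and the scale of lam depends on whether the squared-error term is averaged, so it is not numerically identical to the $\lambda$ in the MSE-with-L2 row:

```python
import numpy as np

# OLS: beta = (X^T X)^{-1} X^T y, solved via the normal equations.
def ols_fit(X, y):
    return np.linalg.solve(X.T @ X, X.T @ y)

# Ridge: adds lam * I to X^T X, shrinking coefficients toward zero.
def ridge_fit(X, y, lam=1.0):
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_beta = np.array([1.0, -2.0, 0.5])
y = X @ true_beta + 0.1 * rng.normal(size=100)
print(ols_fit(X, y))            # close to [1.0, -2.0, 0.5]
print(ridge_fit(X, y, lam=5.0)) # shrunk toward zero relative to OLS
```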
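A sketch of precision, recall, and F1 from hard 0/1 predictions; the zero-denominator fallbacks to 0.0 are a convention assumed here:

```python
import numpy as np

# F1 = 2PR / (P + R), built from true positives, false positives, false negatives.
def f1_score(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

print(f1_score([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))  # 2 TP, 1 FP, 1 FN -> P = R = F1 = 2/3
```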
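A softmax sketch over precomputed logits; subtracting the row maximum is a standard numerical-stability trick and does not change the result:

```python
import numpy as np

# Softmax: exp(z_j) / sum_k exp(z_k), applied to logits z.
def softmax(logits):
    z = np.asarray(logits, dtype=float)
    z = z - z.max(axis=-1, keepdims=True)  # stability shift; result is unchanged
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

print(softmax([2.0, 1.0, 0.1]))        # a probability distribution over 3 classes
print(softmax([2.0, 1.0, 0.1]).sum())  # 1.0
```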
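A sketch of Shannon entropy (base 2, as in the table) and KL divergence (natural log here; the table leaves the base unspecified). Zero-probability terms are dropped under the usual 0 log 0 = 0 convention, and the KL sketch assumes Q(x) > 0 wherever P(x) > 0:

```python
import numpy as np

# H(X) = -sum_i p_i log2 p_i
def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # treat 0 * log 0 as 0
    return -np.sum(p * np.log2(p))

# D_KL(P || Q) = sum_x P(x) log(P(x) / Q(x))
def kl_divergence(p, q):
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

print(entropy([0.5, 0.5]))                    # 1.0 bit: a fair coin
print(kl_divergence([0.5, 0.5], [0.9, 0.1]))  # > 0, and asymmetric in its arguments
```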
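A sketch that evaluates the K-means objective for given assignments and centroids (it does not run the clustering itself); the toy points are made up for illustration:

```python
import numpy as np

# Within-cluster sum of squares: sum over clusters i of sum_{x in S_i} ||x - mu_i||^2.
def kmeans_objective(X, labels, centroids):
    X = np.asarray(X, dtype=float)
    return sum(np.sum((X[labels == i] - mu) ** 2) for i, mu in enumerate(centroids))

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 4.9]])
labels = np.array([0, 0, 1, 1])
centroids = np.array([X[labels == i].mean(axis=0) for i in range(2)])
print(kmeans_objective(X, labels, centroids))  # small: points sit close to their centroids
```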
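Finally, an SVD sketch with a rank-k truncation, which is the usual route to the low-rank approximations and PCA-style uses listed in the table:

```python
import numpy as np

# A = U Sigma V^T, then keep only the k largest singular values.
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 4))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # best rank-2 approximation in Frobenius norm
print(np.allclose(A, U @ np.diag(s) @ Vt))   # True: the factorization reconstructs A
print(np.linalg.norm(A - A_k))               # equals sqrt(s[2]^2 + s[3]^2)
```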