Linear Algebra
- Key Concepts
- Language of Data
- Determinants
- Systems of Linear Equations
- Eigenvalues and Eigenvectors
- Vector Spaces and Subspaces
Concept | Definition | Key Operations | Applications |
---|---|---|---|
Vectors and Scalars | Vectors are ordered arrays of numbers representing points in space; scalars are single numbers | Addition, scalar multiplication, dot product, cross product | Data points, feature vectors, gradients in optimization |
Matrices | Rectangular arrays of numbers representing linear transformations | Addition, multiplication, transpose, inverse, determinant | Dataset representation, transformation matrices, covariance matrices |
Systems of Linear Equations | Sets of equations with multiple variables and linear relationships | Gaussian elimination, LU decomposition, matrix inversion | Solving regression problems, network flow analysis, optimization |
Vector Spaces and Subspaces | Collections of vectors closed under addition and scalar multiplication | Basis, dimension, span, linear independence, null space | Understanding data structure, dimensionality reduction, feature spaces |
Eigenvalues and Eigenvectors | Eigenvectors are directions unchanged by linear transformations; eigenvalues are scaling factors | Eigen-decomposition, spectral theorem, power iteration | Principal Component Analysis (PCA), stability analysis, quantum mechanics |
Determinants | Scalar value indicating matrix properties like invertibility and volume scaling | Formula computation, properties, Cramer's rule | Testing matrix invertibility, computing areas and volumes |
Norms | Measures of vector/matrix magnitude or distance | L1 norm (Manhattan), L2 norm (Euclidean), Frobenius norm | Regularization in ML, distance metrics, error measurement |
Singular Value Decomposition (SVD) | Matrix factorization $\mathbf{A} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^T$, where $\mathbf{U}$ and $\mathbf{V}$ are orthogonal and $\boldsymbol{\Sigma}$ is diagonal | Full SVD, reduced SVD, applications in data analysis | Dimensionality reduction, recommender systems, image compression
Orthogonality | Vectors/matrices with dot product zero, representing perpendicular directions | Orthogonal vectors, orthogonal matrices, orthonormal bases | Coordinate system transformation, Gram-Schmidt process, QR decomposition |
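To connect these concepts to computation, here is a minimal NumPy sketch (the matrix `X` and its values are arbitrary illustrative choices, not taken from the tables) showing vector and matrix norms, the reduced SVD, and an orthogonality check:

```python
import numpy as np

# A small data matrix: 4 observations (rows) x 3 features (columns); values are arbitrary
X = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [2.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])

v = X[0]                                    # one observation as a feature vector
print(np.linalg.norm(v, 1))                 # L1 (Manhattan) norm
print(np.linalg.norm(v, 2))                 # L2 (Euclidean) norm
print(np.linalg.norm(X, 'fro'))             # Frobenius norm of the whole matrix

# Reduced SVD: X = U @ diag(s) @ Vt, with orthonormal columns in U and orthonormal rows in Vt
U, s, Vt = np.linalg.svd(X, full_matrices=False)
print(np.allclose(X, U @ np.diag(s) @ Vt))  # the factorization reconstructs X
print(np.allclose(Vt @ Vt.T, np.eye(3)))    # orthogonality: rows of Vt are orthonormal
```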
Scalars, Vectors, and Matrices
Aspect | Scalars | Vectors | Matrices |
---|---|---|---|
Definition | A single numerical value | An ordered list of numbers (1D array) | A rectangular array of numbers (2D array) |
Notation | Lowercase italic letters: $a, b, c$ | Lowercase bold letters: $\mathbf{v}$, or $\vec{v}$ with an arrow | Uppercase bold letters: $\mathbf{A}, \mathbf{B}$
Dimension | 0D (just one value) | 1D with $n$ components (an $n$-dimensional vector) | 2D with $m \times n$ elements ($m$ rows and $n$ columns)
Characteristics | Magnitude only, no direction | Magnitude and direction (length + orientation) | Collection of numbers arranged in rows and columns |
Examples in Data Analysis | A single measurement or model constant, e.g. a learning rate or a regularization parameter | A feature vector $(x_1, x_2, \dots, x_n)$ describing one observation | A dataset (rows = observations, columns = features) or a covariance matrix
Basic Operations | Multiplication with vectors/matrices (scaling) | Addition, scalar multiplication, dot product, cross product | Addition, multiplication, transpose, inverse, determinant
Key Uses in Data Analysis | Represent individual values, model parameters, hyperparameters | Represent data points (rows) or features (columns), weights in models, distance/similarity measures | Represent datasets, transformations, statistical measures (covariance, correlations), ML computations |
Geometric Meaning | A point on a number line | A directed arrow (length + direction) in space | A transformation of space, mapping vectors to new vectors |
Relevance | Simple descriptive stats or model constants | Data representation, projections, learning algorithms | Dataset storage, transformations, machine learning models, PCA, regression, deep learning |
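A small NumPy sketch (values chosen only for illustration) makes the scalar/vector/matrix distinction and the basic operations above concrete:

```python
import numpy as np

a = 2.5                                   # scalar: a single value (0D)
v = np.array([1.0, 3.0, -2.0])            # vector: ordered list of numbers (1D)
A = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0]])            # matrix: 2 x 3 rectangular array (2D)

print(a * v)                # scalar * vector: scales each component
print(v @ v)                # dot product of a vector with itself (squared L2 norm)
print(A @ v)                # matrix * vector: linear transformation of v, result has length 2
print(A.shape, v.shape)     # (2, 3) and (3,): the dimensions described above
```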
Matrices
Matrix Type | Definition / Characteristics | Notation | Key Properties | Relevance in Data Analysis |
---|---|---|---|---|
Identity | Square matrix with 1s on the main diagonal, 0s elsewhere. Acts like scalar 1 in multiplication | $\mathbf{I}$ or $\mathbf{I}_n$ | $\mathbf{A}\mathbf{I} = \mathbf{I}\mathbf{A} = \mathbf{A}$. Leaves vectors/matrices unchanged | "Do nothing" transformation. Defines inverses. Used in regularization (e.g., Ridge Regression)
Zero | All entries are 0. Can be any dimension. Acts like scalar 0 in addition | $\mathbf{0}$ | $\mathbf{A} + \mathbf{0} = \mathbf{A}$. Multiplying with a zero matrix yields a zero matrix (if dimensions match) | Represents baseline/no effect. Used for error analysis (perfect fit = zero error). Useful for padding matrices
Diagonal | Square matrix with nonzero values only on the main diagonal | $\mathbf{D} = \mathrm{diag}(d_1, \dots, d_n)$ | Multiplication simplifies to scaling rows/columns. Easily invertible if diagonal entries are nonzero. Eigenvalues are the diagonal entries | Used for scaling features. PCA eigenvalues appear in diagonal form. Indicates independence/uncorrelated features. Weighted regression methods
Symmetric | Square matrix equal to its transpose | $\mathbf{A} = \mathbf{A}^T$ | All eigenvalues are real. Always diagonalizable. Eigenvectors for distinct eigenvalues are orthogonal | Covariance and correlation matrices. Similarity and kernel matrices in machine learning (e.g., SVMs, clustering)
Inverse | For square matrix $\mathbf{A}$, the inverse $\mathbf{A}^{-1}$ satisfies $\mathbf{A}\mathbf{A}^{-1} = \mathbf{A}^{-1}\mathbf{A} = \mathbf{I}$ | $\mathbf{A}^{-1}$ | Exists only if $\det(\mathbf{A}) \neq 0$. Provides the unique solution to linear equations | Critical for regression, solving systems ($\mathbf{x} = \mathbf{A}^{-1}\mathbf{b}$), Kalman filters, and precision matrices
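The special matrix types above can be constructed and their key properties verified directly; the following NumPy sketch uses arbitrary example matrices:

```python
import numpy as np

I = np.eye(3)                              # identity matrix
Z = np.zeros((3, 3))                       # zero matrix
D = np.diag([2.0, 0.5, 1.0])               # diagonal matrix: scales each coordinate
S = np.array([[2.0, 1.0],
              [1.0, 3.0]])                  # symmetric: S equals its transpose

A = np.array([[4.0, 7.0],
              [2.0, 6.0]])
A_inv = np.linalg.inv(A)                   # inverse exists because det(A) = 10 != 0

print(np.allclose(I @ D, D))               # identity leaves matrices unchanged
print(np.allclose(A + Z[:2, :2], A))       # zero matrix acts like scalar 0 in addition
print(np.allclose(S, S.T))                 # symmetry check
print(np.linalg.eigvalsh(S))               # symmetric matrices have real eigenvalues
print(np.allclose(A @ A_inv, np.eye(2)))   # A times its inverse gives the identity
```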
Determinants
Aspect | Key Points | Examples
---|---|---|
Definition | Determinant is a scalar value computed from a square matrix | Denoted as $\det(\mathbf{A})$ or $|\mathbf{A}|$. Only defined for square matrices
Conceptual Meaning | Measures how a matrix transformation scales area/volume and whether the matrix is invertible | If $\det(\mathbf{A}) = 0$, the matrix is singular and its columns are dependent
Calculation (2×2) | Formula: for $\mathbf{A} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$, $\det(\mathbf{A}) = ad - bc$ | For $\begin{pmatrix} 2 & 1 \\ 3 & 4 \end{pmatrix}$, determinant $= 2 \cdot 4 - 1 \cdot 3 = 5$
Calculation (3×3) | Methods: Sarrus' Rule (only for 3×3) or cofactor expansion | Example: expand along a row or column, summing each entry times its signed cofactor
Key Properties | $\det(\mathbf{AB}) = \det(\mathbf{A})\det(\mathbf{B})$; $\det(\mathbf{A}^T) = \det(\mathbf{A})$; $\det(\mathbf{A}^{-1}) = 1/\det(\mathbf{A})$; swapping two rows flips the sign; the determinant of a triangular matrix is the product of its diagonal entries | Useful for simplifying computation and understanding structural properties
Invertibility & Solving Systems | $\det(\mathbf{A}) \neq 0$ → inverse exists | In regression, if $\det(\mathbf{X}^T\mathbf{X}) \approx 0$: indicates multicollinearity. Small determinants → numerical instability
Linear Independence & Rank | Zero determinant → linear dependence; matrix rank < dimension | Helps detect redundant features in datasets |
Geometric Meaning | Absolute determinant = scaling factor of area/volume. Sign indicates orientation flip/reflection | If $\det(\mathbf{A}) = 0$: space collapses to a lower dimension (loss of information)
PCA Relevance | Covariance matrix determinant = product of eigenvalues. Zero determinant means some features perfectly correlated | Links determinants to dimensionality reduction and variance in PCA |
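A short NumPy sketch (example matrices are illustrative) ties together the computational and geometric views of the determinant from this table:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [3.0, 4.0]])
print(np.linalg.det(A))            # 2*4 - 1*3 = 5: nonzero, so A is invertible

B = np.array([[1.0, 2.0],
              [2.0, 4.0]])          # second row is twice the first (linearly dependent)
print(np.linalg.det(B))            # ~0: B is singular, its rows/columns are dependent

# Geometric view: |det(A)| is the factor by which A scales area;
# A maps the unit square to a parallelogram of area |det(A)| = 5.
print(abs(np.linalg.det(A)))
```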
Systems of Linear Equations
Aspect | Description | Example | Use Cases
---|---|---|---|
Definition | Set of linear equations with common variables; solutions satisfy all equations simultaneously | $x + 2y = 5$, $3x - y = 1$ | Models constraints, parameter estimation, optimization
Possible Solutions | Exactly one solution, no solution, or infinitely many solutions | Two lines intersecting vs. parallel vs. coincident | Identifies whether models are solvable or if redundancy exists
Matrix Form | Compact representation using coefficient matrix $\mathbf{A}$, variable vector $\mathbf{x}$, and constant vector $\mathbf{b}$ | $\mathbf{A}\mathbf{x} = \mathbf{b}$ | Enables computation with software; foundation for regression, optimization, and network analysis
Gaussian Elimination | Algorithmic row operations to reduce system to row echelon form | Stepwise elimination of variables | Basis for computational solvers; reveals rank, independence, consistency |
Matrix Inversion | Direct solution if $\mathbf{A}$ is square and invertible: $\mathbf{x} = \mathbf{A}^{-1}\mathbf{b}$ | Least squares regression formula $\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$ | Theoretical insight, regression coefficients, but unstable for large/ill-conditioned systems
Applications | Used for regression, optimization, networks, constraint solving | Linear programming, PCA foundation, traffic/circuit analysis | Critical across data science, machine learning, and operations research |
Numerical Considerations | Stability issues can arise for nearly singular systems | A small change in $\mathbf{b}$ produces a large change in $\mathbf{x}$ | Helps diagnose multicollinearity and instability in models
Software Tools | Computational libraries perform solving using efficient methods | Python (`numpy.linalg.solve`), R (`solve()`) | Automates the arithmetic, but conceptual understanding is required for interpretation
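Since the table already points to `numpy.linalg.solve`, here is a minimal sketch (coefficients are illustrative) comparing the direct solver, the inverse-based formula, and least squares for an over-determined system:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

x = np.linalg.solve(A, b)              # preferred: elimination-based, numerically stable
print(x)                               # solution of A x = b, here [2., 3.]

x_inv = np.linalg.inv(A) @ b           # same answer via the inverse; less stable for ill-conditioned A
print(np.allclose(x, x_inv))

# Over-determined system (more equations than unknowns): least squares
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])              # design matrix with an intercept column
y = np.array([1.0, 2.0, 2.5])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # minimizes ||X beta - y||_2
print(beta)                            # fitted intercept and slope
```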
Eigenvalues and Eigenvectors
Aspect | Eigenvalues ($\lambda$) | Eigenvectors ($\mathbf{v}$)
---|---|---|
Definition | Scalar factors that indicate how much a corresponding eigenvector is stretched or shrunk by a transformation | Non-zero vectors that maintain their direction under a linear transformation, only scaled by their eigenvalue |
Eigen-equation | Appears as $\mathbf{A}\mathbf{v} = \lambda\mathbf{v}$, solved from $\det(\mathbf{A} - \lambda\mathbf{I}) = 0$ | Obtained by solving $(\mathbf{A} - \lambda\mathbf{I})\mathbf{v} = \mathbf{0}$ for each eigenvalue
Conceptual Meaning | Represents the magnitude of the scaling effect of the transformation in a given direction | Represents the directions (axes) along which the transformation acts by pure stretching or shrinking without rotation |
Numerical Example | For $\mathbf{A} = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$, eigenvalues are $\lambda_1 = 3$, $\lambda_2 = 1$ | For the same matrix: the eigenvector for $\lambda_1 = 3$ is $(1, 1)^T$; for $\lambda_2 = 1$ it is $(1, -1)^T$
Role in PCA | Indicate how much variance each principal component explains (larger eigenvalues = higher variance captured) | Define the principal components themselves, i.e., the new axes along which data varies most |
Data Analysis Impact | Rank importance of directions by variance magnitude, guiding dimensionality reduction | Provide new coordinate system for data that simplifies interpretation and visualization |
Other Applications | Indicate stability in dynamic systems; spectral analysis (graph connectivity, community detection) | Show invariant directions in system dynamics; essential in PCA and SVD for feature extraction & data representation |
Uniqueness | Numerical values are unique (though multiplicity may occur) | Not unique - any scalar multiple of an eigenvector is also an eigenvector (commonly normalized to unit length) |
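The sketch below verifies the eigen-equation on the same 2×2 matrix used in the numerical example above, then applies the idea to a tiny PCA; the synthetic data and its mixing matrix are arbitrary illustrative choices:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigvals, eigvecs = np.linalg.eig(A)          # columns of eigvecs are the eigenvectors
print(eigvals)                               # 3 and 1 (order is not guaranteed)
v = eigvecs[:, 0]
print(np.allclose(A @ v, eigvals[0] * v))    # verifies A v = lambda v

# PCA in miniature: eigen-decomposition of a covariance matrix
rng = np.random.default_rng(0)
data = rng.normal(size=(100, 2)) @ np.array([[2.0, 0.0],
                                             [1.2, 0.5]])   # correlated 2D data (illustrative)
cov = np.cov(data, rowvar=False)             # symmetric covariance matrix
lams, vecs = np.linalg.eigh(cov)             # eigh exploits symmetry, returns ascending eigenvalues
order = np.argsort(lams)[::-1]               # largest variance first
print(lams[order] / lams.sum())              # fraction of variance per principal component
components = vecs[:, order]                  # principal components = eigenvectors of the covariance
```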
Vector Spaces vs Subspaces
Concept | Vector Space | Subspace |
---|---|---|
Definition | A set of vectors closed under addition and scalar multiplication, following specific axioms | A subset of a vector space that itself satisfies all the vector space axioms |
Required Properties | Closure under addition and scalar multiplication, existence of zero vector, additive inverse, associativity, commutativity, distributivity | Contains the zero vector, closed under addition, closed under scalar multiplication |
Examples | $\mathbb{R}^2$, $\mathbb{R}^3$, $\mathbb{R}^n$ | Line through the origin in $\mathbb{R}^2$, plane through the origin in $\mathbb{R}^3$, trivial subspace $\{\mathbf{0}\}$
Geometric Meaning | The full "space" where vectors (data points) live, can be high-dimensional | A smaller "region" inside a larger vector space, such as a line or plane within that space |
Relevance to Data Analysis | Represents entire data feature space, geometric context for similarity, projections, and transformations | Supports dimensionality reduction (PCA), feature combinations, and efficient data representation |
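As a concrete picture of a subspace, the sketch below (direction and data point chosen arbitrarily) projects a point of $\mathbb{R}^3$ onto a line through the origin, the simplest instance of the dimensionality reduction mentioned above:

```python
import numpy as np

# A 1-D subspace of R^3: all scalar multiples of direction d (a line through the origin)
d = np.array([1.0, 2.0, 2.0])
d = d / np.linalg.norm(d)                  # unit-length basis vector for the subspace

x = np.array([3.0, 1.0, 4.0])              # a data point in the full space R^3
x_proj = (x @ d) * d                       # orthogonal projection of x onto the line
print(x_proj)                              # closest point to x that lies in the subspace
print(np.linalg.norm(x - x_proj))          # information lost by reducing to one dimension

# Closure: sums and scalar multiples of points on the line stay on the line
p, q = 2.0 * d, -3.0 * d
print(np.allclose(np.cross(p + q, d), 0))  # p + q is still a multiple of d
```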
Concepts
Concept | Meaning | Use Cases |
---|---|---|
Span | All linear combinations of a set of vectors | Defines the full feature space reachable from given features |
Linear Independence | No vector is redundant; none can be expressed as a combination of others | Identifies redundancy (multicollinearity) and supports dimensionality reduction |
Basis | Minimal set of linearly independent vectors that span the whole space | Provides an optimal coordinate system (e.g., PCA basis) |
Dimension | Number of independent directions (size of a basis) | Indicates data complexity and relates to curse of dimensionality |
Null Space | Vectors mapped to zero under a transformation | Reveals redundancy or loss of information; linked to invertibility and multicollinearity |
Column Space | All linear combinations of matrix columns (reachable outputs) | Defines prediction/output space in regression or linear models |
Row Space | All linear combinations of matrix rows | Provides insight into feature relationships; dimension equals matrix rank |
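Span, rank, column space, and null space can all be inspected numerically; the sketch below uses a matrix deliberately constructed so that its third column is the sum of the first two:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0]])
# Column 3 = column 1 + column 2, so the columns are linearly dependent.

print(np.linalg.matrix_rank(A))           # 2: dimension of the column space (= dimension of the row space)

# Null space basis from the SVD: right singular vectors for (near-)zero singular values
U, s, Vt = np.linalg.svd(A)
null_basis = Vt[s < 1e-10 * s.max()]      # each row is a null space basis vector
print(null_basis)                         # proportional to [1, 1, -1]
print(np.allclose(A @ null_basis.T, 0))   # these directions are mapped to zero

# A basis for the column space: the first two (independent) columns of A
basis = A[:, :2]
print(np.linalg.matrix_rank(basis))       # 2: they span the same column space as A
```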