Probability
- Basic Concepts
- Conditional Probability
- Random Variables
- Probability Distributions
- Variance of Random Variables
- Central Limit Theorem
- Law of Large Numbers
Concept | Definition | Characteristics/Rules | Examples | Use Cases |
---|---|---|---|---|
Experiment | A process producing uncertain but well-defined outcomes | Outcomes known in advance, actual outcome uncertain, repeatable | Flipping a coin, rolling a die, drawing a card, clicking an ad | Defines the starting point of probability analysis |
Outcome | A single possible result of an experiment | Represents one element from the sample space | Heads, Tails, rolling a "3", Ace of Spades | Used to build sample spaces |
Sample Space (S) | The set of all possible outcomes of an experiment | Finite (discrete) or infinite/continuous | $S = \{Heads, Tails\}$ for a coin | Framework for defining events |
Event (E) | A subset of the sample space | Can contain one or multiple outcomes | "Even die roll" = {2,4,6}; "at least one Head" = {HH,HT,TH} | Basis for probability calculations |
Probability of an Event | Numerical measure of likelihood (0 to 1) | $0 \leq P(E) \leq 1$; $P(E) = 0$ impossible, $P(E) = 1$ certain | Probability of rolling a 3 on a die = 1/6 | Quantifies uncertainty |
Classical Probability | A priori assumption of equal likelihood | $P(E) = \frac{\text{favorable outcomes}}{\text{total outcomes}}$ | Rolling a 3 on a fair die = 1/6 | Used in games of chance, theoretical problems |
Empirical Probability | Based on observed data/frequency | $P(E) = \frac{\text{observed occurrences of } E}{\text{total trials}}$ | Ad clicks: 150/1000 = 0.15 | Foundation of data-driven analysis |
Subjective Probability | Based on judgment or intuition | Not derived from calculation | Analyst predicts 70% chance of product success | Used in business decisions with scarce data |
Complementary Events | Opposite of an event ($E'$) | Rule: $P(E') = 1 - P(E)$ | Click rate 0.15 → No Click 0.85 | Helps compute probabilities indirectly |
Mutually Exclusive Events | Events that cannot occur together | Rule: $P(A \cap B) = 0$, so $P(A \cup B) = P(A) + P(B)$ | Even vs odd die roll outcomes | Useful in disjoint scenarios |
Non-Mutually Exclusive Events | Events that can overlap | Rule: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$ | Rolling even OR >4 → 2/3 | Key in overlapping categories or risks |
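A minimal sketch of these counting-based rules in Python; the die and event sets below are the table's own examples, and the `prob` helper is an illustrative name, not a library function.

```python
from fractions import Fraction

def prob(event, sample_space):
    """Classical probability: favorable outcomes / total outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

die = {1, 2, 3, 4, 5, 6}
even = {2, 4, 6}
greater_than_4 = {5, 6}

# Complement rule: P(E') = 1 - P(E)
p_even = prob(even, die)
print(p_even, 1 - p_even)                      # 1/2 1/2

# Addition rule for overlapping events:
# P(A or B) = P(A) + P(B) - P(A and B)
p_union = prob(even, die) + prob(greater_than_4, die) - prob(even & greater_than_4, die)
print(p_union)                                 # 2/3, matching the table's example
print(prob(even | greater_than_4, die))        # same result by direct counting
```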
Concept | Formula | Example | Use Cases |
---|---|---|---|
Conditional Probability | Probability of event A given event B: $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$ | From a 52-card deck: given the card is a Face Card, the probability it's a King = 4/12 = 1/3 | Used in click-through rates, churn prediction, and medical diagnosis |
Independent Events | One event does not affect the other: $P(A \cap B) = P(A) \cdot P(B)$ | Two coin flips: probability of heads on both the first and second flip = 1/2 × 1/2 = 1/4 | Simplifies modeling; many tests assume independence |
Dependent Events | One event affects the probability of the other: $P(A \cap B) = P(A) \cdot P(B \mid A)$ | Drawing two Kings in a row without replacement: 4/52 × 3/51 = 1/221 | Critical for sequential data, customer behavior, anomaly chains |
Bayes' Theorem | Updates probability of A given evidence B: $P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}$ | Medical test: 1% prevalence, 95% true positive, 10% false positive → P(disease given a positive result) ≈ 8.8% | Core for A/B testing, spam filtering, fraud detection, Naive Bayes models |
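The medical-test figure in the Bayes' Theorem row can be checked with a short script; a sketch assuming only the 1% prevalence, 95% true-positive, and 10% false-positive rates given above:

```python
# Bayes' theorem: P(disease | positive) = P(positive | disease) * P(disease) / P(positive)
prevalence = 0.01        # P(disease)
true_positive = 0.95     # P(positive | disease), sensitivity
false_positive = 0.10    # P(positive | no disease)

# Total probability of a positive result (law of total probability)
p_positive = true_positive * prevalence + false_positive * (1 - prevalence)

p_disease_given_positive = true_positive * prevalence / p_positive
print(round(p_disease_given_positive, 3))  # ~0.088: most positives are false positives
```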
Concept | Definition | Properties | Examples | Use Cases |
---|---|---|---|---|
Random Variable | A function mapping outcomes of a random experiment to real numbers | Represents the numerical result of an unpredictable event | Number of heads in $n$ coin flips, height of a person, number of customers in a store | Forms foundation of modeling uncertainty and statistical inference |
Discrete Random Variable | Takes on finite or countably infinite distinct values (often integers) | Possible outcomes are countable | Coin flips, number of defective items, die roll, customer arrivals | Modeled with discrete distributions (e.g., Binomial, Poisson) |
Continuous Random Variable | Takes on values from any interval within the real line | Outcomes come from measurement, uncountably infinite | Height, weight, time to complete a task, daily sales revenue | Modeled with continuous distributions (e.g., Normal, Exponential) |
Probability Distribution | Describes how probabilities are assigned over possible values | Defines likelihood structure of a random variable | PMF for discrete, PDF for continuous, CDF for both | Core tool to compute likelihood, support inference, and fit models |
PMF (Probability Mass Function) | Assigns a probability to each possible value of a discrete random variable | $p(x) \geq 0$ and $\sum_x p(x) = 1$ | Number of heads in two coin flips: P(0) = 1/4, P(1) = 1/2, P(2) = 1/4 | Essential for modeling counts and categorical outcomes |
PDF (Probability Density Function) | Describes density for continuous random variables; probability is area under the curve | $f(x) \geq 0$, $\int_{-\infty}^{\infty} f(x)\,dx = 1$, and $P(X = x) = 0$ for any single point | Heights modeled by a Normal distribution | Basis for calculating probabilities of ranges in continuous data |
CDF (Cumulative Distribution Function) | Gives the probability that $X \leq x$: $F(x) = P(X \leq x)$; works for both discrete and continuous variables | Non-decreasing; ranges from 0 to 1 | Two coin flips: F(0) = 1/4, F(1) = 3/4, F(2) = 1 | Used for quantiles, percentiles, range probabilities, and model fitting |
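A small sketch of the PMF and CDF for the two-coin-flip example (X = number of heads), using plain Python dictionaries; the `cdf` helper is an illustrative name.

```python
# PMF for X = number of heads in two fair coin flips
pmf = {0: 0.25, 1: 0.50, 2: 0.25}
assert abs(sum(pmf.values()) - 1.0) < 1e-12   # probabilities sum to 1

def cdf(x, pmf):
    """CDF: F(x) = P(X <= x) for a discrete random variable."""
    return sum(p for value, p in pmf.items() if value <= x)

print(cdf(0, pmf), cdf(1, pmf), cdf(2, pmf))  # 0.25 0.75 1.0
```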
Type | Distribution | Parameters | PMF/PDF | Mean | Variance | Use Cases |
---|---|---|---|---|---|---|
Discrete | Bernoulli | $p$ | $P(X=1)=p$, $P(X=0)=1-p$ | $p$ | $p(1-p)$ | Binary outcomes, click/no-click, churn models |
Discrete | Binomial | $n, p$ | $P(X=k)=\binom{n}{k} p^k (1-p)^{n-k}$ | $np$ | $np(1-p)$ | A/B testing, quality control, survey responses |
Discrete | Poisson | $\lambda$ | $P(X=k)=\frac{\lambda^k e^{-\lambda}}{k!}$ | $\lambda$ | $\lambda$ | Rare events, call arrivals, web traffic, defects count |
Discrete | Geometric | $p$ | $P(X=k)=(1-p)^{k-1} p$ | $\frac{1}{p}$ | $\frac{1-p}{p^2}$ | Reliability, marketing conversion attempts, first defect detection |
Continuous | Uniform | $a, b$ | $f(x)=\frac{1}{b-a}$ for $a \leq x \leq b$ | $\frac{a+b}{2}$ | $\frac{(b-a)^2}{12}$ | Random number generation, baseline models |
Continuous | Normal | $\mu, \sigma^2$ | $f(x)=\frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ | $\mu$ | $\sigma^2$ | Natural phenomena, CLT, parametric statistical tests, ML |
Continuous | Standard Normal | $\mu=0, \sigma=1$ | $f(z)=\frac{1}{\sqrt{2\pi}} e^{-z^2/2}$ | 0 | 1 | Z-scores, standardization, hypothesis testing |
Continuous | Exponential | $\lambda$ | $f(x)=\lambda e^{-\lambda x}$ for $x \geq 0$ | $\frac{1}{\lambda}$ | $\frac{1}{\lambda^2}$ | Reliability engineering, waiting times, customer lifetime |
Continuous | Chi-squared | $k$ (df) | Sum of squares of $k$ independent standard normals | $k$ | $2k$ | Goodness-of-fit, independence tests, variance CI |
Continuous | t-distribution | $\nu$ (df) | Symmetric, bell-shaped, heavier tails than Normal | 0 (for $\nu > 1$) | $\frac{\nu}{\nu-2}$ for $\nu > 2$ | t-tests, CI for mean, small samples |
Continuous | F-distribution | $d_1, d_2$ (df) | Ratio of scaled chi-squared variates | $\frac{d_2}{d_2-2}$ for $d_2 > 2$ | Varies | ANOVA, regression model significance, variance comparison |
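The means and variances in this table can be verified numerically; a sketch using `scipy.stats` (one common library choice; the parameter values below are arbitrary illustrations):

```python
from scipy import stats

# Binomial(n=10, p=0.3): mean np = 3.0, variance np(1-p) = 2.1
binom = stats.binom(n=10, p=0.3)
print(binom.mean(), binom.var(), binom.pmf(3))   # 3.0 2.1 P(X = 3)

# Poisson(lambda=4): mean and variance both equal lambda
pois = stats.poisson(mu=4)
print(pois.mean(), pois.var())                   # 4.0 4.0

# Exponential(lambda=0.5): scipy parameterizes by scale = 1/lambda
expo = stats.expon(scale=1 / 0.5)
print(expo.mean(), expo.var())                   # 2.0 4.0

# Standard normal: P(Z <= 1.96) ~ 0.975, used for 95% intervals
print(stats.norm.cdf(1.96))
```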
Aspect | Expected Value ($E[X]$) | Variance ($\mathrm{Var}(X)$) |
---|---|---|
Definition | The theoretical average (mean) of a random variable; a measure of central tendency | The average squared deviation from the mean; a measure of spread or variability |
Notation | $E[X]$ or $\mu$ | $\mathrm{Var}(X)$ or $\sigma^2$ |
Discrete Formula | $E[X] = \sum_x x\, p(x)$ | $\mathrm{Var}(X) = \sum_x (x-\mu)^2 p(x)$ or $E[X^2] - (E[X])^2$ |
Continuous Formula | $E[X] = \int_{-\infty}^{\infty} x f(x)\, dx$ | $\mathrm{Var}(X) = \int_{-\infty}^{\infty} (x-\mu)^2 f(x)\, dx$ or $E[X^2] - (E[X])^2$ |
Units | Same as the random variable $X$ | Squared units of $X$. Standard deviation ($\sigma = \sqrt{\mathrm{Var}(X)}$) restores original units |
Properties | $E[aX+b] = aE[X] + b$; $E[X+Y] = E[X] + E[Y]$ | $\mathrm{Var}(aX+b) = a^2 \mathrm{Var}(X)$; $\mathrm{Var}(X+Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$ if $X, Y$ independent |
Interpretation | Long-run average or "center" of the distribution | Degree of dispersion around the mean; how "spread out" values are |
Examples | Fair die: $E[X] = \frac{1+2+3+4+5+6}{6} = 3.5$ | Fair die: $\mathrm{Var}(X) = E[X^2] - (E[X])^2 = \frac{91}{6} - 3.5^2 \approx 2.92$ |
Use Cases | Decision making, expected returns, risk assessment, fairness in probability games, model evaluation | Risk measurement, quality control, hypothesis testing, error analysis (MSE), variability comparison between processes |
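The fair-die example can be reproduced directly from the discrete formulas; a minimal sketch:

```python
# Discrete expected value and variance for a fair six-sided die
values = [1, 2, 3, 4, 5, 6]
p = 1 / 6                                    # equal probability for each face

mean = sum(x * p for x in values)            # E[X] = sum of x * p(x) = 3.5
ex2 = sum(x**2 * p for x in values)          # E[X^2]
variance = ex2 - mean**2                     # Var(X) = E[X^2] - (E[X])^2 ~ 2.92
std_dev = variance ** 0.5                    # back to the original units

print(mean, round(variance, 3), round(std_dev, 3))  # 3.5 2.917 1.708
```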
Aspect | Summary |
---|---|
Definition | States that the sampling distribution of the sample mean (or sum) of a large number of independent, identically distributed random variables approaches a normal distribution, regardless of the original population distribution |
Conditions | Samples consist of independent, identically distributed (i.i.d.) random variables; the population has finite mean and variance; the sample size is sufficiently large (commonly $n \geq 30$) |
Key Properties | The sampling distribution of $\bar{X}$ is approximately Normal with mean $\mu$ and standard error $\sigma / \sqrt{n}$; the approximation improves as $n$ increases |
Importance | Justifies Normal-based inference (z- and t-tests, confidence intervals) even when the underlying population is not Normal |
Applications | Confidence intervals, hypothesis testing, A/B testing, quality control, survey and poll estimation |
Examples | Means of samples drawn from a skewed population (e.g., exponential waiting times) form an increasingly bell-shaped histogram as the sample size grows |
Limitations | Requires finite variance; convergence can be slow for heavily skewed or heavy-tailed populations; applies to means and sums, not to individual observations |
Conceptual Visualization | Histograms of sample means for increasing $n$ narrow and approach a bell curve centered at the population mean $\mu$ |
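A quick simulation illustrating the CLT under these conditions: means of samples drawn from a skewed Exponential population behave more and more like a Normal distribution as $n$ grows. The sample sizes, number of replications, and seed below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
population_mean, population_std = 1.0, 1.0     # Exponential(lambda=1) is right-skewed

for n in (2, 10, 30, 200):
    # 10,000 sample means, each computed from n i.i.d. exponential draws
    sample_means = rng.exponential(scale=1.0, size=(10_000, n)).mean(axis=1)
    print(
        f"n={n:>3}  mean of sample means={sample_means.mean():.3f}  "
        f"std={sample_means.std():.3f}  predicted std error={population_std / np.sqrt(n):.3f}"
    )
# The observed std of the sample means tracks sigma / sqrt(n), and histograms
# of sample_means look increasingly bell-shaped as n increases.
```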
Concept | Definition | Key Properties | Applications |
---|---|---|---|
Law of Large Numbers (LLN) | As sample size increases, the sample mean converges to the population mean | Requires i.i.d. observations with a finite mean; describes convergence of the average, not the shape of its distribution (that is the CLT) | Justifies estimating population parameters from large samples |
Weak Law of Large Numbers | Sample average converges to expected value in probability: for any $\varepsilon > 0$, $P(\lvert \bar{X}_n - \mu \rvert > \varepsilon) \to 0$ as $n \to \infty$ | Convergence in probability (the weaker mode of convergence) | Underlies consistency of estimators |
Strong Law of Large Numbers | Sample average converges to expected value almost surely: $P(\lim_{n \to \infty} \bar{X}_n = \mu) = 1$ | Almost-sure convergence; implies the weak law | Theoretical guarantee for long-run averages, Monte Carlo methods |
Applications in Data Analysis | Explains why we can trust sample statistics with large samples | Larger samples yield more stable, less variable estimates | Monte Carlo simulation, insurance and casino economics, A/B test sample sizing, survey polling |
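A short sketch of the LLN in action: the running mean of simulated fair-die rolls converges toward the population mean of 3.5. The number of rolls and the seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
rolls = rng.integers(1, 7, size=100_000)          # simulated fair-die rolls (1-6)

# Running sample mean after each additional roll
running_mean = np.cumsum(rolls) / np.arange(1, len(rolls) + 1)

for n in (10, 100, 1_000, 100_000):
    print(f"after {n:>6} rolls: sample mean = {running_mean[n - 1]:.4f}")
# The sample mean settles toward the population mean E[X] = 3.5 as n grows.
```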