Database
- Overview
- Selection Process
- Transactions
- Locking
DB Type | Features | Use Cases |
---|---|---|
Columnar | Stores each column contiguously; high compression; fast scans over a few columns | Analytics, data warehousing (e.g., Redshift, ClickHouse) |
Document | Flexible, JSON-like documents; schema-free; rich queries over nested fields | Content management, product catalogs (e.g., MongoDB, CouchDB) |
Graph | Models data as nodes and edges; efficient traversal of relationships | Social networks, recommendations, fraud detection (e.g., Neo4j) |
Key-Value | Simple key-based lookups; very low latency; easy horizontal scaling | Caching, session storage (e.g., Redis, DynamoDB) |
NewSQL | Relational model and ACID guarantees combined with NoSQL-style horizontal scalability | Globally distributed OLTP (e.g., CockroachDB, Google Spanner) |
NoSQL | Umbrella term for non-relational stores; flexible schemas; built to scale out | High-volume workloads with rapidly evolving data models |
Object-Oriented | Stores objects directly, preserving classes and inheritance | CAD, multimedia, applications with complex object models |
Relational / SQL | Tables with rows and columns; ACID transactions; joins and strong consistency | OLTP, financial systems (e.g., PostgreSQL, MySQL) |
Spatial | Geometry types and spatial indexes (e.g., R-Trees) for location queries | GIS, mapping, location-based services (e.g., PostGIS) |
Time-Series | Optimized for time-stamped data; retention policies and downsampling | Monitoring, metrics, IoT telemetry (e.g., InfluxDB, TimescaleDB) |
Vector DB | Stores embeddings; approximate nearest-neighbor similarity search | Semantic search, recommendations, RAG pipelines (e.g., Milvus, Pinecone) |
- Selection
- Choice
Type | Definition | Priority | Transaction Failure | Data Consistency | Use Cases |
---|---|---|---|---|---|
ACID | Atomicity, Consistency, Isolation, Durability | Consistency | Entire transaction rolls back | Guaranteed immediately | Financial transactions (banking, stock trading) |
BASE | Basically Available, Soft state, Eventually consistent | Availability | May proceed with eventual consistency | Eventual, but not always immediate | Social media platforms |
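The "entire transaction rolls back" behavior can be observed with any ACID-compliant store; a minimal sketch using Python's built-in sqlite3 (the table and amounts are invented for illustration):

```python
import sqlite3

# In-memory database; any ACID-compliant store behaves the same way.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, "
             "balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

try:
    with conn:  # opens a transaction; commits on success, rolls back on error
        conn.execute("UPDATE accounts SET balance = balance - 200 "
                     "WHERE name = 'alice'")  # violates the CHECK constraint
        conn.execute("UPDATE accounts SET balance = balance + 200 "
                     "WHERE name = 'bob'")
except sqlite3.IntegrityError:
    pass  # the whole transfer is rolled back: atomicity

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # {'alice': 100, 'bob': 50}  (unchanged)
```

Both updates succeed or neither does; the half-finished transfer never becomes visible.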
- Locking Hierarchy
- Types
Aspect | Intent Shared (IS) | Exclusive (X) | Intent Exclusive (IX) | Shared with Intent Exclusive (SIX) |
---|---|---|---|---|
Definition | Signals intent to acquire shared (read) locks on individual rows lower in the hierarchy; allows concurrent readers | Prevents all other transactions from accessing the data (reading or writing) | Signals intent to acquire exclusive (write) locks on individual rows lower in the hierarchy | A shared lock on the whole resource combined with intent to acquire exclusive locks on some of its rows |
Usage | A transaction that will read a few rows of a table | A transaction modifying a resource (e.g., UPDATE, DELETE) | A transaction that will modify a few rows of a table | A transaction scanning a whole table while updating a subset of its rows |
Impact on Concurrency | High; compatible with everything except exclusive locks | Low; blocks all other access to the resource | Moderate; compatible with other intent locks, but blocks shared and exclusive table locks | Moderate; other readers may still take IS locks, but all other writers are blocked |
Scalability | Scales well with read-heavy workloads | May become a bottleneck under high write concurrency | Scales well when writers touch disjoint rows | Balances table-wide reads with row-level writes |
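The interaction between these modes is usually summarized as a compatibility matrix; a small pure-Python sketch (the matrix follows the standard multi-granularity locking scheme and also includes the plain Shared (S) mode for completeness):

```python
# Standard multi-granularity lock compatibility matrix:
# True = the two modes can be held on the same resource concurrently.
COMPATIBLE = {
    "IS":  {"IS": True,  "S": True,  "IX": True,  "SIX": True,  "X": False},
    "S":   {"IS": True,  "S": True,  "IX": False, "SIX": False, "X": False},
    "IX":  {"IS": True,  "S": False, "IX": True,  "SIX": False, "X": False},
    "SIX": {"IS": True,  "S": False, "IX": False, "SIX": False, "X": False},
    "X":   {"IS": False, "S": False, "IX": False, "SIX": False, "X": False},
}

def can_grant(requested, held):
    """A lock is granted only if it is compatible with every lock already held."""
    return all(COMPATIBLE[requested][h] for h in held)

print(can_grant("IS", ["IX"]))   # True:  intent modes coexist
print(can_grant("X",  ["IS"]))   # False: exclusive blocks everyone
print(can_grant("IX", ["SIX"]))  # False: SIX already covers reads plus row writes
```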
Feature | Pessimistic Locking | Optimistic Locking |
---|---|---|
Locking Mechanism | Acquires locks on database records before any read/write operation | No explicit locks; relies on versioning or timestamps |
Transaction Isolation | Guarantees serializability (transactions appear to execute one after another) | Relies on conflict detection during commit |
Concurrency | Lower concurrency due to exclusive access | Higher concurrency, as multiple transactions can read data concurrently |
Data Integrity | High; ensures only one transaction modifies data at a time | Conflicts are detected at write time and must be retried; "lost updates" are possible only if the version check is skipped |
Implementation | Database-managed; different lock types (shared, exclusive) available | Application-level; relies on versioning mechanisms (e.g., version numbers, timestamps) in the database |
Error Handling | Rollback transactions that encounter locked records | Retry transactions that encounter conflicts during commit |
Use Cases | High-contention workloads where conflicts are frequent (banking, inventory management) | Low-contention, read-heavy workloads (web applications, reporting) |
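Optimistic locking's version-number scheme can be sketched with sqlite3; the table, column names, and stock values are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, "
             "stock INTEGER, version INTEGER)")
conn.execute("INSERT INTO items VALUES (1, 10, 0)")
conn.commit()

def update_stock(conn, item_id, new_stock, expected_version):
    """Write only if nobody changed the row since we read it (optimistic check)."""
    cur = conn.execute(
        "UPDATE items SET stock = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_stock, item_id, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows touched => someone else won; retry

# Two writers both read the row at version 0...
ok_first  = update_stock(conn, 1, 9, expected_version=0)  # succeeds, version -> 1
ok_second = update_stock(conn, 1, 8, expected_version=0)  # conflict; must re-read
print(ok_first, ok_second)  # True False
```

The second writer detects the conflict instead of silently overwriting the first update.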
Data Storage Solutions
- Overview
- Processing Types
- Data Processing
- Data Repositories
- File Storage
Key Considerations
- Data Volume and Growth: How much data do you currently store, and what's the anticipated growth rate?
- Performance Requirements: How critical is fast access and retrieval of data for your operations?
- Data Security and Compliance: What security measures are necessary to safeguard sensitive data? Are there industry regulations to adhere to?
- Accessibility and Sharing Needs: Do you require remote access to data or collaboration features?
- Budgetary Constraints: What is your allocated budget for data storage solutions?
Feature | Online Analytical Processing (OLAP) | Online Transaction Processing (OLTP) |
---|---|---|
Purpose | Analytical processing for decision-making | Transaction processing for day-to-day operations |
Data Usage | Aggregates historical data for reporting and analysis | Processes real-time transactions and updates |
Data Schema | Star or snowflake schemas, denormalized | Normalized schemas, reducing redundancy |
Query Complexity | Complex queries with aggregations and joins | Simple queries with frequent read/write operations |
Data Granularity | Summarized, aggregated data | Detailed, individual transactions |
Data Size | Huge data volumes, typically in terabytes or more | Smaller data volumes, typically in gigabytes or less |
Performance | Designed for high throughput, slower write speeds | Optimized for fast write speeds and low latency reads |
Examples | Amazon Redshift, Google BigQuery, Snowflake | MySQL, PostgreSQL |
Aspect | ETL | ELT |
---|---|---|
Process Flow | Extract data first, then transform and load into the target system | Extract data first, load into the target system, then transform within the target system |
Data Transformation | Transformation occurs before loading into the target system | Transformation occurs after loading into the target system |
Performance | Typically slower, due to the transformation overhead before loading | Loading is fast since raw data is ingested as-is; transformation cost shifts to the target system |
Scalability | Limited scalability due to the need for substantial transformation before loading data | Highly scalable as it can leverage the processing power of the target system for transformations |
Storage Requirements | Higher storage requirements as both raw and transformed data need to be stored | Lower storage requirements as only raw data needs to be stored initially, and transformation occurs within the target system |
Data Integrity | Higher data integrity as data is cleaned and transformed before loading into the target system | May require additional checks and controls within the target system to ensure data integrity post-transformation |
Complexity | Typically more complex due to the need for designing and managing transformation logic | Generally less complex as it leverages the capabilities of the target system for transformations |
Flexibility | May be less flexible as transformation logic is predefined and applied uniformly to all data | More flexible as transformations can be tailored to specific use cases within the target system |
Real-time Processing | Less suitable for real-time processing due to batch-oriented nature | More suitable for real-time processing as data can be loaded into the target system immediately and transformed on-the-fly |
Examples | Apache Spark, Apache Flink, Google Cloud Dataproc | GCS → Spark jobs, Dataflow → BigQuery |
Use Cases | For well-defined data models and reporting needs | For big data, data lakes, and agile analytics environments |
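The ordering difference can be sketched in a few lines of Python; the records and the transform are made up, and sqlite3 stands in for the target system:

```python
import sqlite3

raw = [("alice", " 42 "), ("bob", "17")]  # extracted records, untidy strings

# ETL: transform in the pipeline, then load the clean result.
etl_target = sqlite3.connect(":memory:")
etl_target.execute("CREATE TABLE users (name TEXT, age INTEGER)")
transformed = [(name.title(), int(age.strip())) for name, age in raw]  # transform first
etl_target.executemany("INSERT INTO users VALUES (?, ?)", transformed)  # then load

# ELT: load the raw data as-is, then transform inside the target with SQL.
elt_target = sqlite3.connect(":memory:")
elt_target.execute("CREATE TABLE staging (name TEXT, age TEXT)")
elt_target.executemany("INSERT INTO staging VALUES (?, ?)", raw)        # load first
elt_target.execute(
    "CREATE TABLE users AS "
    "SELECT upper(substr(name,1,1)) || substr(name,2) AS name, "
    "       CAST(trim(age) AS INTEGER) AS age "
    "FROM staging"                                                      # transform in-database
)

print(list(etl_target.execute("SELECT * FROM users")))  # [('Alice', 42), ('Bob', 17)]
print(list(elt_target.execute("SELECT * FROM users")))  # [('Alice', 42), ('Bob', 17)]
```

Both paths end at the same clean table; what differs is where the transformation runs and whether the raw data lands in the target first.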
Feature | Data Mart | Data Warehouse | Data Lake | Data Lakehouse |
---|---|---|---|---|
Definition | A subset of a Data Warehouse containing specific data focused on a particular business function or department | A central repository for structured, organized, and processed data, optimized for querying and analysis | A vast repository of raw, unstructured, or semi-structured data stored in its native format | An architecture combining the features of a Data Lake and a Data Warehouse, providing unified analytics on both raw and processed data |
Data Type | Structured data tailored for specific business needs | Structured data, typically from operational systems | Raw, unstructured, semi-structured data | Raw, semi-structured, and structured data |
Data Storage | Similar to Data Warehouse, stored in relational databases or columnar stores | Stored in a structured format like relational databases (e.g., SQL Server, PostgreSQL) or columnar stores (e.g., Redshift, BigQuery) | Typically stored in distributed file systems like HDFS, AWS S3, or Azure Data Lake Storage | Usually stored in a combination of Data Lake storage and structured data formats like Parquet, Delta Lake, or Apache Iceberg |
Data Processing | Similar to Data Warehouse, batch processing is prevalent, with ETL tools used for data transformation | Primarily batch processing. Data transformation and ETL processes are well-defined and structured. Tools like Informatica, Talend, or Apache Airflow are commonly used | Supports batch and real-time processing. Processing is done on raw data, often using technologies like Hadoop, Spark, or Apache Flink | Combines batch and real-time processing capabilities. Data is transformed and unified using tools like Apache Spark, Databricks, or Delta Lake |
Use Cases | Tailored for specific business functions or departments requiring localized analytics and reporting. Commonly used in finance, sales, or marketing | Best suited for structured reporting, business intelligence, and historical analysis. Ideal for organizations with well-defined data requirements | Suitable for exploratory analytics, machine learning, and big data processing where flexibility and scalability are critical | Ideal for organizations looking to combine the benefits of Data Lakes and Data Warehouses for unified analytics on both raw and processed data |
Examples | Sales Data Mart, Finance Data Mart, HR Data Mart | Amazon Redshift, Google BigQuery, Snowflake | Hadoop Distributed File System (HDFS), Amazon S3, Azure Data Lake Storage | Databricks Delta Lake, AWS Glue, Google BigQuery Omni |
Feature | Parquet | Avro |
---|---|---|
Data Storage | Columnar | Row-based |
Schema | Self-describing, stored in file metadata | Stored with data, language-independent |
Compression | Highly compressed, supports multiple compression codecs (Snappy, Gzip, LZO, etc.) | Compressed, supports Deflate and Snappy codecs |
Performance | Optimized for analytical (OLAP) workloads, fast data retrieval and processing | Optimized for write-intensive, big data operations, efficient for accessing all fields |
Language Support | Language-agnostic, supported by many big data frameworks (Spark, Hive, Impala) | Language-independent, can be used across different programming languages |
Advantages | Column pruning and predicate pushdown; similar values stored together compress well; queries read only the columns they need | Strong schema evolution support; compact, fast row serialization; well suited to streaming |
Disadvantages | Slower writes; poor fit for row-level updates or frequent small appends | Inefficient for analytics touching only a few columns, since whole records must be read |
Use Cases | Data lakes and OLAP queries (Spark, Hive, BigQuery) | Event streaming (Kafka), data interchange, write-heavy ingestion pipelines |
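The row-based vs columnar distinction behind Avro and Parquet can be illustrated in plain Python; the records are made up:

```python
# The same three records, laid out two ways.
records = [
    {"name": "alice", "age": 30, "city": "Oslo"},
    {"name": "bob",   "age": 25, "city": "Lima"},
    {"name": "carol", "age": 41, "city": "Pune"},
]

# Row-based (Avro-like): each record is stored contiguously.
# Reading one full record is cheap; reading one field touches every record.
row_store = [tuple(r.values()) for r in records]

# Columnar (Parquet-like): each column is stored contiguously.
# Reading one column is cheap and compresses well (similar values together);
# reconstructing a full record touches every column.
column_store = {
    "name": [r["name"] for r in records],
    "age":  [r["age"] for r in records],
    "city": [r["city"] for r in records],
}

print(row_store[1])         # one whole record: ('bob', 25, 'Lima')
print(column_store["age"])  # one whole column: [30, 25, 41]
```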
Database Federation
- Overview
- Features
- Pros & Cons
Data Federation is a technique that integrates data from disparate sources and presents the user with a unified, coherent view. It is often used in enterprise applications where data is distributed across multiple databases or systems.
- Transparent: Users see one database, not separate sources
- Heterogeneous: Handles different data formats and systems
- Extensible: Easily add new data sources as needed
- Autonomous: Leaves existing databases unchanged
- Integrates data: Combines data from various sources
Pros
- Flexible data sharing
- Autonomy among the database components
- Access heterogeneous data in a unified way
- No tight coupling of applications with legacy databases
Cons
- Adds more hardware and additional complexity
- Joining data from two databases is complex
- Federation leaves data in the source systems rather than consolidating it; for companies that need regular, centrally managed updates this is a drawback, and the lack of a central copy can complicate disaster recovery
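A miniature form of the unified-view idea can be shown with SQLite's ATTACH DATABASE, which lets one query span separate databases through a single connection. Real federation engines (e.g., Trino, PostgreSQL foreign data wrappers) do this across heterogeneous remote systems; this sketch, with invented tables, only mimics the unified-view aspect locally:

```python
import sqlite3

# Two independent databases, each left unchanged (autonomy),
# queried through one connection as a single logical view.
conn = sqlite3.connect(":memory:")                 # "sales" database
conn.execute("ATTACH DATABASE ':memory:' AS hr")   # separate "hr" database

conn.execute("CREATE TABLE orders (emp_id INTEGER, amount INTEGER)")
conn.execute("INSERT INTO orders VALUES (1, 100), (2, 250)")
conn.execute("CREATE TABLE hr.employees (id INTEGER, name TEXT)")
conn.execute("INSERT INTO hr.employees VALUES (1, 'alice'), (2, 'bob')")

# A federated-style query joining across both sources:
rows = list(conn.execute(
    "SELECT e.name, o.amount FROM hr.employees e "
    "JOIN orders o ON o.emp_id = e.id ORDER BY e.id"
))
print(rows)  # [('alice', 100), ('bob', 250)]
```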
Relational Database
- Overview
- Detailed
- Data Structures
- Syntax
- Normalization
- Joins
- Relations
- Common Issues
Structured Query Language (SQL) is a programming language used for managing and manipulating relational databases.
Query Flow
Key Concepts
- Attribute: A property or characteristic of an entity
- Column: A vertical data element in a table
- Constraint: A rule that inserted rows must satisfy, such as a column being non-null or unique (defined at the column or table level)
- Database Management System (DBMS): The software that manages and controls access to a database
- Default: Supplies a predefined value for a column when an inserted row does not set one
- Primary Key: Non-null unique identifier for a row
- Foreign Key: Field or combination of fields that establishes a link between two tables. It enforces referential integrity by ensuring that values in one table's key match with values in another table's key. This relationship allows for data consistency and facilitates queries across related tables
- Record: A row in a table
- Schema: The structure or blueprint of the database, defining the tables, columns, and relationships
- Table: A collection of related data
- View: Virtual table that is made up of elements of multiple physical or "real" tables
- Subsets of SQL
- WHERE vs HAVING
- TRUNCATE vs DELETE
- Table Keys
- Aggregation vs Window Function
- Set Theory
Language | Purpose | Examples | Features |
---|---|---|---|
DCL (Data Control Language) | Manages access and permissions to data | GRANT , REVOKE | Controls who can access, manipulate, or delete data |
DDL (Data Definition Language) | Defines and manages database structure | CREATE , ALTER , DROP | Used to define tables, indexes, constraints, etc |
DML (Data Manipulation Language) | Manipulates data within the database | INSERT , UPDATE , DELETE | Allows adding, modifying, and removing data |
DQL (Data Query Language) | Retrieves data from the database | SELECT | Primarily used for querying data from tables |
Aspect | WHERE Clause | HAVING Clause |
---|---|---|
Purpose | Filters individual rows based on specific conditions | Filters groups of rows after aggregation (using GROUP BY ) |
Data Considered | All rows in the table | Groups created by the GROUP BY clause |
Condition Type | Can use any comparison operators, logical operators | Typically uses aggregate functions (SUM , COUNT , AVG , MIN , MAX ), expressions involving them, or grouping columns |
Execution Timing | Applied before rows are grouped (more efficient) | Applied after rows are grouped (less efficient) |
Requirement | Can be used with or without GROUP BY | Requires a GROUP BY clause |
Compatibility | Can be used with SELECT , UPDATE , and DELETE statements | Can only be used with SELECT statements |
Filtering Logic | Filters rows that meet the condition, excluding others | Filters groups that meet the condition, excluding others |
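The before/after-grouping distinction is easiest to see in one query; a sketch via sqlite3, with an invented sales table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 100), ("east", 300), ("west", 50),
                  ("west", 60), ("north", 500)])

# WHERE filters rows BEFORE grouping; HAVING filters groups AFTER aggregation.
rows = list(conn.execute(
    "SELECT region, SUM(amount) AS total "
    "FROM sales "
    "WHERE amount > 50 "          # drops the (west, 50) row first
    "GROUP BY region "
    "HAVING SUM(amount) > 100 "   # then drops groups with small totals
    "ORDER BY region"
))
print(rows)  # [('east', 400), ('north', 500)]
```

The west group survives the WHERE step (one row remains) but its total of 60 fails the HAVING condition.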
Feature | TRUNCATE | DELETE |
---|---|---|
Category | DDL (Data Definition Language) | DML (Data Manipulation Language) |
Function | Removes all rows from a table | Removes specific rows based on conditions (WHERE clause) |
Filtering | Not possible | Possible using WHERE clause |
Transaction | Implicitly commits in most DBMSs (some, e.g., SQL Server, allow it inside a transaction) | Can be used within a transaction |
Constraints | Disables foreign key constraints temporarily | Triggers foreign key constraints and other constraints |
Transaction Log | Records only deallocated data pages | Records each deleted row |
Rollback | Generally not possible (DBMS-dependent) | Possible (if transaction is active) |
Speed | Faster | Slower |
Locking | Exclusive lock on the table | Locks individual rows being deleted |
Identity Columns | Resets auto-incrementing values | Preserves existing values |
Triggers | Does not fire triggers | May fire triggers (depending on definition) |
Permissions Required | ALTER on the table | DELETE on the table |
Use Cases | Quickly emptying staging or temporary tables; resetting test data | Removing specific rows; deletions that must be filtered, audited, or rolled back |
Feature | Primary Key | Unique Key | Foreign Key |
---|---|---|---|
Purpose | Uniquely identifies each row | Ensures uniqueness for a set of columns | Links data between two tables |
Uniqueness | Mandatory (one and only one) | Enforced (no duplicates) | Enforced (references unique values) |
Null Values | Not allowed | Allowed (DBMS-dependent: one or many NULLs) | Allowed (unless the column is declared NOT NULL) |
Number per Table | One | Multiple | Multiple (each referencing a parent table) |
Indexes | Typically creates a clustered index | May or may not create an index | Typically does not create an index |
Auto Increment | Supported | Not supported | Not supported |
Foreign Key Reference | Can be referenced by foreign keys | Can be referenced by foreign keys | References a primary or unique key |
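The three key types can be exercised with sqlite3; the schema is invented, and note that SQLite enforces foreign keys only when the pragma is enabled:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: FKs are off by default
conn.execute("CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE employees ("
    "  id INTEGER PRIMARY KEY,"                     # primary key: unique, non-null
    "  email TEXT UNIQUE,"                          # unique key: no duplicates, NULL allowed
    "  dept_id INTEGER REFERENCES departments(id)"  # foreign key: must match a parent row
    ")"
)
conn.execute("INSERT INTO departments VALUES (1, 'engineering')")
conn.execute("INSERT INTO employees VALUES (1, 'a@x.io', 1)")  # OK: department 1 exists
conn.execute("INSERT INTO employees VALUES (2, NULL, NULL)")   # OK: unique key and FK allow NULL

try:
    conn.execute("INSERT INTO employees VALUES (3, 'c@x.io', 99)")  # no department 99
except sqlite3.IntegrityError as e:
    print(e)  # referential integrity rejects the orphan row
```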
Feature | Aggregation | Window Function |
---|---|---|
Functionality | Applies to entire groups of data | Applies to each row within a group |
Purpose | Summarizes data by reducing it to a single value per group | Calculates running totals, subtotals, ranking, percentiles, on a defined partition of data |
Common Functions | SUM , COUNT , AVG , MIN , MAX | ROW_NUMBER , DENSE_RANK , PERCENT_RANK , LAG , LEAD , and aggregates used with OVER (e.g., a cumulative SUM ) |
Output Format | Reduced result set with one row per group | Maintains the original result set structure with the same number of rows |
Example | SELECT department, COUNT(*) FROM employees GROUP BY department | SELECT name, salary, AVG(salary) OVER (PARTITION BY department ORDER BY hire_date ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) FROM employees |
Use Cases | Suitable for summarizing data across groups, such as calculating total sales per region or average order value per customer segment | Ideal for analytical tasks requiring comparisons between rows, like calculating moving averages, cumulative sums, or identifying top N values within groups |
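The "one row per group" vs "every row kept" contrast, sketched via sqlite3 with an invented employees table (window functions require SQLite >= 3.25, bundled with Python 3.8+):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                 [("ann", "eng", 100), ("ben", "eng", 80), ("cat", "ops", 60)])

# Aggregation: the result collapses to one row per group.
agg = list(conn.execute(
    "SELECT department, COUNT(*) FROM employees "
    "GROUP BY department ORDER BY department"))
print(agg)  # [('eng', 2), ('ops', 1)]

# Window function: every row survives, annotated with its partition's value.
win = list(conn.execute(
    "SELECT name, salary, AVG(salary) OVER (PARTITION BY department) "
    "FROM employees ORDER BY name"))
print(win)  # [('ann', 100, 90.0), ('ben', 80, 90.0), ('cat', 60, 60.0)]
```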
Feature | UNION | UNION ALL | INTERSECT | EXCEPT |
---|---|---|---|---|
Purpose | Combines the results of two or more sets and removes duplicates | Combines the results of two or more sets without removing duplicates | Returns elements common to both sets | Returns elements present in the first set but not in the second |
Syntax | SELECT id FROM A UNION SELECT id FROM B | SELECT id FROM A UNION ALL SELECT id FROM B | SELECT id FROM A INTERSECT SELECT id FROM B | SELECT id FROM A EXCEPT SELECT id FROM B |
Example | A = {1, 2, 3}, B = {2, 4, 5} → {1, 2, 3, 4, 5} (w/o duplicates) | A = {1, 2, 3}, B = {2, 4, 5} → {1, 2, 2, 3, 4, 5} (w/ duplicates) | A = {1, 2, 3}, B = {2, 4, 5} → {2} (elements common to both sets) | A = {1, 2, 3}, B = {2, 4, 5} → {1, 3} (elements in A but not B) |
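The A = {1, 2, 3}, B = {2, 4, 5} example from the table, run through sqlite3 (which supports all four operators):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);  INSERT INTO a VALUES (1), (2), (3);
    CREATE TABLE b (id INTEGER);  INSERT INTO b VALUES (2), (4), (5);
""")

q = lambda sql: [r[0] for r in conn.execute(sql)]
print(q("SELECT id FROM a UNION SELECT id FROM b ORDER BY id"))      # [1, 2, 3, 4, 5]
print(q("SELECT id FROM a UNION ALL SELECT id FROM b"))              # duplicates kept
print(q("SELECT id FROM a INTERSECT SELECT id FROM b ORDER BY id"))  # [2]
print(q("SELECT id FROM a EXCEPT SELECT id FROM b ORDER BY id"))     # [1, 3]
```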
Type | Definition | Features | Use Cases |
---|---|---|---|
Skip List | Probabilistic data structure for ordered sets or maps, offering efficient search and insertion with average-case complexity comparable to balanced trees | Used in Redis | In-memory |
Hash Index | Uses a hash function to quickly map data keys to their locations, ideal for fast lookups | Most common in-memory index solution | In-memory |
SSTable | File format for storing data in sorted order on disk, enabling efficient retrieval operations | Immutable data structure; seldom used alone | Disk-based |
LSM Tree | Combines in-memory and disk-based storage for ordered data, optimizing write performance with later merging for efficient reads | High write throughput; disk compaction may impact performance | Memory + disk |
B-Tree | Self-balancing tree data structure for sorted data, allowing efficient search, insertion, and deletion operations | Most popular database index implementation | Disk-based |
Inverted Index | Maps words to the documents they appear in, facilitating fast full-text searches | Used in search engines such as Lucene | Document search |
Suffix Tree | Stores suffixes of words, enabling efficient searches for patterns and substrings within a text | Used in string search, such as suffix matching | String search |
R-Tree | Stores spatial data like points, rectangles, or polygons, allowing efficient searches for objects within a specific area | Nearest-neighbor search | Multi-dimensional / spatial search |
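The effect of a (B-tree) index can be seen in a query plan; a sketch with sqlite3, whose EXPLAIN QUERY PLAN output shows whether the planner scans the whole table or searches an index (the table and index names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")

# Without an index the planner must scan the whole table.
plan = conn.execute("EXPLAIN QUERY PLAN "
                    "SELECT * FROM users WHERE email = 'a@x.io'").fetchall()
print(plan)  # detail column says SCAN (full table scan)

# SQLite indexes, like those of most relational databases, are B-trees.
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan = conn.execute("EXPLAIN QUERY PLAN "
                    "SELECT * FROM users WHERE email = 'a@x.io'").fetchall()
print(plan)  # detail column now says SEARCH ... USING INDEX idx_users_email
```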
Type | Definition | Example |
---|---|---|
SELECT | Retrieves data from a database | SELECT column1, column2 FROM table_name |
INSERT | Adds new records to a table | INSERT INTO table_name (column1, column2) VALUES (value1, value2) |
UPDATE | Modifies existing records in a table | UPDATE table_name SET column1 = value1, column2 = value2 WHERE condition |
DELETE | Deletes records from a table | DELETE FROM table_name WHERE condition |
JOIN | Combines rows from two or more tables based on a related column | SELECT * FROM table1 INNER JOIN table2 ON table1.column = table2.column |
GROUP BY | Groups rows with identical values into summary rows | SELECT column1, COUNT(*) FROM table_name GROUP BY column1 |
HAVING | Filters records grouped by GROUP BY clause | SELECT column1, COUNT(*) FROM table_name GROUP BY column1 HAVING COUNT(*) > 10 |
ORDER BY | Sorts the result set in ascending or descending order | SELECT * FROM table_name ORDER BY column1 DESC |
WHERE | Filters records based on specified conditions | SELECT * FROM table_name WHERE condition |
DISTINCT | Returns only distinct (different) values | SELECT DISTINCT column1 FROM table_name |
UNION | Combines the result sets of two or more SELECT statements | SELECT column1 FROM table1 UNION SELECT column1 FROM table2 |
TRANSACTION | Groups a set of SQL statements into a single unit of work | BEGIN TRANSACTION; ... COMMIT; or ROLLBACK; |
INDEX | Creates an index on a table | CREATE INDEX index_name ON table_name (column1) |
VIEW | Virtual table derived from one or more tables | CREATE VIEW view_name AS SELECT column1, column2 FROM table_name WHERE condition |
TRIGGER | Executes a set of actions when a certain event occurs on a table | CREATE TRIGGER trigger_name BEFORE INSERT ON table_name FOR EACH ROW BEGIN ... END |
SUBQUERY | Nested query inside another query | SELECT column1 FROM table_name WHERE column1 IN (SELECT column1 FROM table2 WHERE condition) |
CASE | Provides conditional logic within a query | SELECT column1, CASE WHEN condition THEN result1 ELSE result2 END AS result FROM table_name |
Process of organizing data in a relational database to minimize redundancy and improve data integrity.
- Reduce data duplication: This saves storage space and minimizes maintenance headaches
- Enhance data consistency: When a piece of data changes, it only needs to be updated in one place, ensuring consistency across the database
- Improve data retrieval efficiency: Normalized databases allow for faster and more efficient querying of data
Normalization is achieved through a series of steps, each referred to as a Normal Form (NF). Each subsequent form builds upon the previous one, progressively reducing redundancy.
Normal Form | Key Points |
---|---|
0NF | Non-normalized data; may contain repeating groups and redundant values |
1NF | Each column holds atomic (indivisible) values; no repeating groups; each row is unique |
2NF | In 1NF, and every non-key attribute depends on the whole primary key (no partial dependencies) |
3NF | In 2NF, and non-key attributes depend only on the key (no transitive dependencies) |
Boyce-Codd Normal Form (BCNF) | Stricter 3NF: every determinant is a candidate key |
4NF | In BCNF, and free of multi-valued dependencies |
5NF | In 4NF, and free of join dependencies (cannot be decomposed further without losing information) |
6NF | Relations are irreducible; no non-trivial join dependencies remain |
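The 1NF-to-3NF progression can be shown by decomposing a small denormalized table; the schema and data are invented, with sqlite3 as the database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Denormalized table: customer details repeat on every order (redundancy,
# and a transitive dependency: city depends on the customer, not the order).
conn.executescript("""
    CREATE TABLE orders_flat (order_id INTEGER, customer TEXT,
                              city TEXT, amount INTEGER);
    INSERT INTO orders_flat VALUES
        (1, 'alice', 'Oslo', 100),
        (2, 'alice', 'Oslo', 250),
        (3, 'bob',   'Lima', 75);
""")

# 3NF decomposition: each customer fact lives in exactly one row.
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customers(id),
                         amount INTEGER);
    INSERT INTO customers VALUES (1, 'alice', 'Oslo'), (2, 'bob', 'Lima');
    INSERT INTO orders VALUES (1, 1, 100), (2, 1, 250), (3, 2, 75);
""")

# If alice moves, the flat table needs two updates; the normalized one needs one.
conn.execute("UPDATE customers SET city = 'Bergen' WHERE name = 'alice'")
rows = list(conn.execute(
    "SELECT o.order_id, c.city FROM orders o "
    "JOIN customers c ON c.id = o.customer_id ORDER BY o.order_id"))
print(rows)  # [(1, 'Bergen'), (2, 'Bergen'), (3, 'Lima')]
```

One update is reflected everywhere: the consistency benefit normalization is after.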
Type | Definition | Syntax |
---|---|---|
Left Join / Left Outer Join | Everything on the left + everything on the right that matches | SELECT * FROM table1 LEFT JOIN table2 ON table1.key = table2.key |
Anti Left Join | Everything on the left that is NOT on the right | SELECT * FROM table1 LEFT JOIN table2 ON table1.key = table2.key WHERE table2.key IS NULL |
Right Join / Right Outer Join | Everything on the right + everything on the left that matches | SELECT * FROM table1 RIGHT JOIN table2 ON table1.key = table2.key |
Anti Right Join | Everything on the right that is NOT on the left | SELECT * FROM table1 RIGHT JOIN table2 ON table1.key = table2.key WHERE table1.key IS NULL |
Full Outer Join | Everything on the left + everything on the right | SELECT * FROM table1 FULL OUTER JOIN table2 ON table1.key = table2.key |
Anti Outer Join | Everything on the left and right that is unique to each side | SELECT * FROM table1 FULL OUTER JOIN table2 ON table1.key = table2.key WHERE table1.key IS NULL OR table2.key IS NULL |
Join / Inner Join | Only the rows that are common to both | SELECT * FROM table1 INNER JOIN table2 ON table1.key = table2.key |
Cross Join | All combinations of rows from both tables (Cartesian product) | SELECT * FROM table1 CROSS JOIN table2 |
Self Join | The table is joined with itself | SELECT * FROM table1 AS t1 INNER JOIN table1 AS t2 ON t1.key = t2.key |
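A few of these joins, run through sqlite3 on two invented single-column tables (the sketch sticks to left-side variants, since RIGHT and FULL OUTER JOIN require SQLite >= 3.39):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (key INTEGER);  INSERT INTO t1 VALUES (1), (2), (3);
    CREATE TABLE t2 (key INTEGER);  INSERT INTO t2 VALUES (2), (3), (4);
""")
q = lambda sql: list(conn.execute(sql))

# Inner join: only keys present on both sides.
inner = q("SELECT t1.key FROM t1 INNER JOIN t2 ON t1.key = t2.key ORDER BY t1.key")
# Left join: every t1 row, NULL-padded where t2 has no match.
left = q("SELECT t1.key, t2.key FROM t1 LEFT JOIN t2 ON t1.key = t2.key ORDER BY t1.key")
# Anti left join: t1 rows with no counterpart in t2.
anti = q("SELECT t1.key FROM t1 LEFT JOIN t2 ON t1.key = t2.key WHERE t2.key IS NULL")

print(inner)  # [(2,), (3,)]
print(left)   # [(1, None), (2, 2), (3, 3)]
print(anti)   # [(1,)]
```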
Feature | One-to-One | One-to-Many | Many-to-Many |
---|---|---|---|
Definition | Each record in one table is associated with exactly one record in another | Each record in one table can be associated with one or more records in another | Records in both tables can be associated with multiple records in the other |
Example | Person → Passport | Department → Employee | Student → Course |
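A many-to-many relationship is conventionally modeled with a junction table holding pairs of foreign keys; a sketch of the Student → Course example with sqlite3 (table and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE courses  (id INTEGER PRIMARY KEY, title TEXT);
    -- Junction table: one row per (student, course) pairing.
    CREATE TABLE enrollments (
        student_id INTEGER REFERENCES students(id),
        course_id  INTEGER REFERENCES courses(id),
        PRIMARY KEY (student_id, course_id)
    );
    INSERT INTO students VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO courses  VALUES (10, 'SQL'), (20, 'Networking');
    INSERT INTO enrollments VALUES (1, 10), (1, 20), (2, 10);
""")

rows = list(conn.execute("""
    SELECT s.name, c.title FROM enrollments e
    JOIN students s ON s.id = e.student_id
    JOIN courses  c ON c.id = e.course_id
    ORDER BY s.name, c.title
"""))
print(rows)  # [('alice', 'Networking'), ('alice', 'SQL'), ('bob', 'SQL')]
```

Each student can take many courses and each course can have many students, yet every fact is stored once.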
Issue | Definition | Impact | Solutions |
---|---|---|---|
Cartesian Products | 2 or more tables are joined without a proper join condition or WHERE clause, producing every combination of rows | Huge intermediate result sets; slow queries; high memory use | Always specify join conditions; review execution plans |
Connections Overhead | Establishing multiple connections to the same database | Added latency per request; exhausted connection limits | Use connection pooling; reuse connections |
Data Inconsistency | Same data exists in different forms across the database | Conflicting query results; unreliable reports | Normalize; enforce constraints; keep a single source of truth |
Deadlocks | 2 or more transactions are waiting for each other to release locks | Aborted transactions; stalled throughput | Acquire locks in a consistent order; keep transactions short; use timeouts and retries |
Full Table Scans | Scanning the entire table to satisfy a query instead of using indexes | Slow queries; high I/O and CPU load | Add appropriate indexes; rewrite queries so predicates can use them |
Missing Indexes | Indexes that are required to satisfy a query are missing | Slow lookups and joins that degrade as data grows | Inspect execution plans; create indexes on filtered and joined columns |
N+1 | Initial query fetches N rows, and then N additional queries are executed to fetch related data for each row | N extra round trips; latency grows linearly with result size | Eager loading; JOINs; batching related lookups |
Over/Under Fetching | Overfetching occurs when unnecessary data is fetched from the database, while underfetching occurs when necessary data is not fetched | Wasted bandwidth and memory, or extra follow-up queries | Select only needed columns; design queries and endpoints around actual access patterns |
Poorly Designed Schema | Lacks normalization, proper data types, and relationships, leading to inefficient data access | Redundancy, update anomalies, and slow or convoluted queries | Normalize; choose correct data types; define keys and relationships |
Subquery Inefficiency | Subqueries are used inefficiently (e.g., correlated subqueries executed once per row) | Repeated execution inflates query time | Rewrite as JOINs or CTEs; use EXISTS where appropriate |
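The N+1 pattern and its JOIN-based fix can be sketched with sqlite3; the authors/books schema is invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'ann'), (2, 'ben'), (3, 'cat');
    INSERT INTO books VALUES (1, 1, 'A'), (2, 2, 'B'), (3, 3, 'C');
""")

# N+1 pattern: 1 query for the authors, then 1 more query per author.
queries = 0
authors = list(conn.execute("SELECT id, name FROM authors")); queries += 1
for author_id, _name in authors:
    conn.execute("SELECT title FROM books WHERE author_id = ?",
                 (author_id,)).fetchall()
    queries += 1
print(queries)  # 4 queries for 3 authors (N+1)

# Fix: one JOIN fetches everything in a single round trip.
rows = list(conn.execute(
    "SELECT a.name, b.title FROM authors a JOIN books b ON b.author_id = a.id"))
print(len(rows))  # 3 rows from 1 query
```

Against a remote database each query is a network round trip, so the JOIN (or an ORM's eager loading) turns O(N) round trips into O(1).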