How Databases Work: SQL, NoSQL, and Data Storage

Databases are organized systems for storing, retrieving, and managing data. This article covers relational and NoSQL databases, SQL, indexing, and ACID properties.

The InfoNexus Editorial Team, May 7, 2026

What Is a Database?

A database is an organized collection of structured data stored and accessed electronically. Databases are managed by software systems called Database Management Systems (DBMS), which provide mechanisms for storing, retrieving, updating, and deleting data while enforcing consistency rules and controlling concurrent access by multiple users. Databases underpin virtually every modern software application — from web applications and mobile apps to enterprise resource planning systems and scientific data repositories. The global database management system market exceeded $80 billion in revenue in 2024, reflecting the central role databases play in the information economy.

Relational Databases and SQL

The relational database model, introduced by Edgar F. Codd at IBM in 1970, organizes data into tables (also called relations) consisting of rows (records) and columns (attributes). Relationships between tables are expressed through foreign keys — columns in one table that reference the primary key of another. This model enables complex queries that join data across multiple tables. SQL (Structured Query Language) is the standardized language used to interact with relational databases. Major relational DBMS products include MySQL, PostgreSQL, Microsoft SQL Server, and Oracle Database.
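To make the foreign-key idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module and an in-memory SQLite database. The `customers` and `orders` tables and their columns are hypothetical, chosen only for illustration. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`; many other DBMSs enforce them by default.

```python
import sqlite3

# In-memory database; SQLite checks foreign keys only when this PRAGMA is enabled.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: orders.customer_id references the primary key of customers.
conn.executescript("""
CREATE TABLE customers (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    total       REAL NOT NULL
);
""")

conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders (id, customer_id, total) VALUES (10, 1, 25.0)")

# An order pointing at a customer that does not exist violates the foreign key.
try:
    conn.execute("INSERT INTO orders (id, customer_id, total) VALUES (11, 99, 5.0)")
    fk_rejected = False
except sqlite3.IntegrityError as exc:
    fk_rejected = True
    print("Rejected:", exc)  # e.g. "FOREIGN KEY constraint failed"
```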

Core SQL Operations

  • SELECT: Retrieves rows from one or more tables based on specified conditions.
  • INSERT: Adds new rows to a table.
  • UPDATE: Modifies existing rows that match specified conditions.
  • DELETE: Removes rows matching specified conditions.
  • JOIN: A clause within a query that combines rows from two or more tables based on a related column (INNER JOIN, LEFT JOIN, etc.).
  • CREATE / ALTER / DROP: DDL (Data Definition Language) statements that define, modify, or remove database schema objects.
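The operations listed above can be tried end to end with Python's standard `sqlite3` module. This sketch assumes hypothetical `authors` and `books` tables with made-up data; it also shows the practical difference between INNER JOIN and LEFT JOIN, where an author with no books appears only in the LEFT JOIN result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# CREATE (DDL): define two hypothetical tables.
conn.executescript("""
CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE books (
    id        INTEGER PRIMARY KEY,
    author_id INTEGER REFERENCES authors(id),
    title     TEXT NOT NULL,
    year      INTEGER
);
""")

# INSERT: add rows using ? placeholders for the values.
conn.executemany("INSERT INTO authors (id, name) VALUES (?, ?)",
                 [(1, "Ana"), (2, "Ben"), (3, "Chen")])
conn.executemany("INSERT INTO books (author_id, title, year) VALUES (?, ?, ?)",
                 [(1, "Intro to SQL", 2019), (2, "Indexing in Practice", 2021)])

# UPDATE: modify rows that match a condition.
conn.execute("UPDATE books SET year = 2022 WHERE title = ?", ("Indexing in Practice",))

# INNER JOIN: only authors with at least one matching book.
inner = conn.execute("""
    SELECT a.name, b.title FROM authors a
    INNER JOIN books b ON b.author_id = a.id
    ORDER BY a.id
""").fetchall()

# LEFT JOIN: every author; title is NULL (None in Python) when nothing matches.
left = conn.execute("""
    SELECT a.name, b.title FROM authors a
    LEFT JOIN books b ON b.author_id = a.id
    ORDER BY a.id
""").fetchall()

# DELETE: remove rows that match a condition.
conn.execute("DELETE FROM books WHERE year < 2020")
remaining = conn.execute("SELECT title, year FROM books").fetchall()

print(inner)      # [('Ana', 'Intro to SQL'), ('Ben', 'Indexing in Practice')]
print(left)       # same two rows plus ('Chen', None)
print(remaining)  # [('Indexing in Practice', 2022)]
```

Passing values through `?` placeholders rather than string formatting keeps data separate from the SQL text, the usual way to avoid SQL injection with `sqlite3`.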

ACID Properties

Relational databases guarantee reliable transaction processing through the ACID properties, which ensure that database operations maintain data integrity even in the event of errors or system failures:

| Property | Definition | Example guarantee |
| --- | --- | --- |
| Atomicity | A transaction is all-or-nothing; partial completion is not permitted | A bank transfer either fully completes or fully rolls back |
| Consistency | A transaction brings the database from one valid state to another | Foreign key constraints are never violated |
| Isolation | Concurrent transactions do not interfere with one another; at the strictest (serializable) level, the result matches some sequential execution | Two users updating the same row do not corrupt each other's data |
| Durability | Once committed, a transaction's effects persist even after a system crash | A committed transaction survives a power failure |
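Atomicity is easiest to see in the bank-transfer example from the table. The sketch below, again using SQLite through Python's `sqlite3`, assumes a hypothetical `accounts` table with a `CHECK (balance >= 0)` constraint. The credit is deliberately applied before the debit, so when the debit violates the constraint, the rollback must also undo the credit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE accounts (
    id      TEXT PRIMARY KEY,
    balance INTEGER NOT NULL CHECK (balance >= 0)
)""")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction: both updates or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # raised when the CHECK constraint fails
        return False


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts"))


ok = transfer(conn, "alice", "bob", 30)       # succeeds: alice 70, bob 80
failed = transfer(conn, "alice", "bob", 500)  # debit would go negative: whole transfer undone
print(ok, failed, balances(conn))             # True False {'alice': 70, 'bob': 80}
```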

How Databases Store Data

Relational databases store data on disk in pages — fixed-size blocks typically 8 KB or 16 KB in size. The database engine maintains a buffer pool (in-memory cache of frequently accessed pages) to reduce disk I/O. Data is organized within pages using structures such as heap files (unordered rows), B-tree indexes, or clustered indexes (where rows are physically sorted by a key). When a transaction modifies data, changes are first written to a write-ahead log (WAL) before being applied to data pages; this ensures durability and enables crash recovery by replaying the log after a failure.
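The write-ahead rule can be illustrated with a deliberately simplified model. In this hypothetical `ToyWALStore`, a Python list stands in for the log file and a dict stands in for the data pages; real engines add much more (log sequence numbers, checkpoints, fsync calls, and undo of uncommitted changes that reached disk). What it does show is the ordering: log first, pages second, and on recovery only transactions with a COMMIT record are replayed.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWALStore:
    """Toy model of write-ahead logging; not the log format of any real engine."""

    log: list = field(default_factory=list)    # stands in for the durable log file
    pages: dict = field(default_factory=dict)  # stands in for data pages on disk

    def commit(self, txid, writes, crash_before_apply=False):
        # Step 1: append every change, then a COMMIT record, to the log.
        # (A real engine would flush the log to disk here before acknowledging.)
        for key, value in writes.items():
            self.log.append(("SET", txid, key, value))
        self.log.append(("COMMIT", txid))
        if crash_before_apply:
            return  # simulated crash: the data pages were never updated
        # Step 2: only after the log is safe are the data pages modified.
        for key, value in writes.items():
            self.pages[key] = value

    def recover(self):
        """Redo pass: re-apply logged writes of committed transactions only."""
        committed = {rec[1] for rec in self.log if rec[0] == "COMMIT"}
        for rec in self.log:
            if rec[0] == "SET" and rec[1] in committed:
                _, _, key, value = rec
                self.pages[key] = value


store = ToyWALStore()
store.commit(1, {"a": 1})                           # logged and applied
store.commit(2, {"b": 2}, crash_before_apply=True)  # logged and committed, pages stale
store.log.append(("SET", 3, "c", 3))                # tx 3 crashed before its COMMIT record
print(store.pages)  # {'a': 1}
store.recover()
print(store.pages)  # {'a': 1, 'b': 2}
```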

Database Indexing

An index is an auxiliary data structure that allows the database engine to locate rows matching a query condition without scanning every row in a table. The most common index structure is the B-tree, a self-balancing search tree that supports efficient equality and range queries in O(log n) time. A hash index offers O(1) average-case equality lookups but does not support range queries. Full-text indexes enable fast keyword searches within text columns. While indexes dramatically accelerate read queries, they impose overhead on write operations (INSERT, UPDATE, DELETE) because every index must be updated alongside the data. Query optimizers automatically evaluate available indexes when generating execution plans.
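You can watch a query optimizer switch from a full table scan to an index lookup using SQLite's `EXPLAIN QUERY PLAN` from Python (SQLite indexes are B-trees). The `users` table and `idx_users_email` index are hypothetical. The exact plan wording varies between SQLite versions (older releases print `SCAN TABLE users`, for example), so treat the commented output as indicative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, signup_year INTEGER)")
conn.executemany(
    "INSERT INTO users (email, signup_year) VALUES (?, ?)",
    [(f"user{i}@example.com", 2000 + i % 25) for i in range(10_000)],
)

QUERY = "SELECT id FROM users WHERE email = ?"


def plan(sql, params):
    # Each EXPLAIN QUERY PLAN row is (id, parent, notused, detail); keep the detail text.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


before = plan(QUERY, ("user42@example.com",))
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = plan(QUERY, ("user42@example.com",))

print(before)  # e.g. ['SCAN users'] (every row is examined)
print(after)   # e.g. ['SEARCH users USING COVERING INDEX idx_users_email (email=?)']
```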

NoSQL Databases

NoSQL (Not only SQL) databases were developed to address scalability and flexibility requirements that relational databases handle less efficiently. NoSQL systems generally trade some ACID guarantees for horizontal scalability and schema flexibility. They are categorized by data model:

| NoSQL type | Data model | Example products | Typical use cases |
| --- | --- | --- | --- |
| Document store | JSON/BSON documents | MongoDB, Couchbase | Content management, catalogs |
| Key-value store | Simple key → value pairs | Redis, DynamoDB | Caching, session storage |
| Wide-column store | Rows with variable columns per row | Apache Cassandra, HBase | Time-series, IoT data |
| Graph database | Nodes and edges (relationships) | Neo4j, Amazon Neptune | Social networks, fraud detection |
| Time-series database | Timestamped data points | InfluxDB, TimescaleDB (a PostgreSQL extension) | Metrics, monitoring |
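Schema flexibility, a main draw of document stores, can be sketched without any database at all. The collection below is a plain Python list of dicts, and `find` is a hypothetical helper that only loosely mirrors the equality filters document databases offer; it is not the API of MongoDB, Couchbase, or any other product. Documents in the same collection carry different fields, and a query on a field simply skips documents that lack it.

```python
_MISSING = object()  # sentinel: a field absent from a document never equals a query value

# Hypothetical "catalog" collection: documents need not share the same fields.
catalog = [
    {"id": 1, "type": "book", "title": "SQL Basics", "pages": 320},
    {"id": 2, "type": "laptop", "title": "Model X", "ram_gb": 16, "ports": ["usb-c", "hdmi"]},
    {"id": 3, "type": "book", "title": "Graph Data", "pages": 210, "tags": ["graphs"]},
]


def find(collection, **criteria):
    """Return documents whose fields equal every criterion."""
    return [doc for doc in collection
            if all(doc.get(key, _MISSING) == value for key, value in criteria.items())]


print([d["id"] for d in find(catalog, type="book")])  # [1, 3]
print([d["id"] for d in find(catalog, ram_gb=16)])    # [2]
```

In a relational table, adding `ram_gb` would require an ALTER TABLE and would leave the column NULL for every book; here each document simply carries the fields it needs.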

The CAP Theorem

The CAP theorem, conjectured by Eric Brewer in 2000 and formally proved by Seth Gilbert and Nancy Lynch in 2002, states that a distributed data store can provide at most two of the following three guarantees simultaneously: Consistency (every read sees the most recent write, so all nodes see the same data at the same time), Availability (every request to a non-failing node receives a response), and Partition tolerance (the system continues operating even when network partitions separate nodes). Since network partitions are unavoidable in distributed systems, designers must choose between prioritizing consistency (CP systems, e.g., HBase, ZooKeeper) or availability (AP systems, e.g., Cassandra, CouchDB) during a partition event. The later PACELC theorem, proposed by Daniel Abadi, extends this analysis: if there is a Partition, a system trades Availability against Consistency; Else, during normal operation, it trades Latency against Consistency.
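The CP-versus-AP choice can be made concrete with a two-replica toy model. `ToyCluster` and its `mode` flag are hypothetical and do not reproduce any product's behavior; the sketch only shows what each choice gives up once a partition occurs. In CP mode, a request that cannot be coordinated with the other replica is refused; in AP mode, both replicas keep answering but can return different values.

```python
class ToyCluster:
    """Two synchronously replicated copies; hypothetical, for illustration only."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = [{}, {}]   # each dict is one replica's key-value data
        self.partitioned = False   # True: the two replicas cannot reach each other

    def write(self, node, key, value):
        if not self.partitioned:
            for replica in self.replicas:  # healthy network: update both copies
                replica[key] = value
            return "ok"
        if self.mode == "CP":
            # The write cannot be confirmed on the other replica, so refuse it
            # (sacrifice availability to keep the replicas identical).
            raise ConnectionError("write rejected: replicas partitioned")
        # AP: accept the write locally (sacrifice consistency; replicas diverge).
        self.replicas[node][key] = value
        return "ok (local only)"

    def read(self, node, key):
        if self.partitioned and self.mode == "CP":
            # This replica cannot prove it holds the latest value, so refuse.
            raise ConnectionError("read rejected: replicas partitioned")
        return self.replicas[node].get(key)


ap = ToyCluster("AP")
ap.write(0, "x", 1)
ap.partitioned = True
ap.write(0, "x", 2)
stale = (ap.read(0, "x"), ap.read(1, "x"))  # (2, 1): both answer, but they disagree

cp = ToyCluster("CP")
cp.write(0, "x", 1)
cp.partitioned = True
try:
    cp.write(0, "x", 2)
    cp_refused = False
except ConnectionError:
    cp_refused = True  # unavailable, but the replicas still agree on x = 1
```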

NewSQL and Distributed SQL

NewSQL databases attempt to combine the horizontal scalability of NoSQL with the full ACID guarantees of relational databases. Examples include Google Spanner (whose TrueTime API uses GPS receivers and atomic clocks to bound clock uncertainty, allowing it to assign globally consistent commit timestamps), CockroachDB, and YugabyteDB. These systems use consensus algorithms (Paxos in Spanner, Raft in CockroachDB and YugabyteDB) to achieve distributed agreement on transaction commits. NewSQL systems suit organizations that need both transactional integrity and the ability to scale across multiple data centers or cloud regions.
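Both Raft and Paxos rest on majority quorums: a value counts as committed once a strict majority of replicas has accepted it, and because any two majorities share at least one replica, a later quorum always includes a node that saw the committed value. The helpers below are a hypothetical sketch of that arithmetic only; they do not implement leader election, terms, or log replication from either protocol.

```python
from itertools import combinations


def majority(n_replicas: int) -> int:
    """Smallest number of replicas that forms a strict majority."""
    return n_replicas // 2 + 1


def can_commit(acks: int, n_replicas: int) -> bool:
    """Simplified commit rule: committed once a majority has acknowledged."""
    return acks >= majority(n_replicas)


def tolerated_failures(n_replicas: int) -> int:
    """How many replicas can fail while a majority is still reachable."""
    return (n_replicas - 1) // 2


def majorities_always_overlap(n_replicas: int) -> bool:
    """Brute-force check that every pair of majority quorums shares a replica."""
    quorums = [set(c) for c in combinations(range(n_replicas), majority(n_replicas))]
    return all(a & b for a in quorums for b in quorums)


print(majority(5), can_commit(3, 5), can_commit(2, 5))  # 3 True False
print(tolerated_failures(5))                            # 2
print(majorities_always_overlap(5))                     # True
```

This is why clusters are usually sized with an odd number of replicas: going from 5 to 6 raises the majority from 3 to 4 without tolerating any additional failures.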
