This briefing document summarises the key information from the article "10 Must-Know Database Types for System Design Interviews". The article provides an overview of various database types, their use cases, design considerations, and examples, specifically tailored for system design interviews.
Choosing the appropriate database is highlighted as a critical decision in system design, impacting performance, scalability, and complexity management. The article aims to equip readers with a strong understanding of different database types and when to use each.
Here are the 10 database types discussed, with their main themes, important ideas, and key facts:
What it is: Stores data in structured tables with rows and columns, using foreign keys to define relationships and SQL for querying.
When to use it: Ideal for structured and relational data, when strong consistency (ACID compliance) is required (e.g., digital payments), and for complex queries and reporting using SQL.
Key design considerations: Indexing for read performance (avoid over-indexing), Normalization vs Denormalization (balancing redundancy and read speed), avoiding excessive joins and cross-shard joins, and Sharding for horizontal scaling (mindful of complexity). Scaling involves Vertical scaling (limited) and Horizontal scaling (read replicas, partitioning, caching).
Example databases: PostgreSQL, MySQL, Oracle DB.
Quote: "A Relational Database stores data in structured tables with rows and columns. It’s like an Excel sheet, but much more powerful."
What it is: Stores data directly in RAM for blazingly fast operations.
When to use it: For ultra-low latency requirements (e.g., real-time leaderboards), when data is temporary or can be regenerated (e.g., caching trending search results), and to reduce load on the main database by acting as a caching layer.
Key design considerations: Volatility (data loss on crash/restart unless persistence is enabled via RDB or AOF), Eviction Policies when memory is full (LRU, LFU, TTL), Keep It Lean (store only hot data), and Replication for read performance and failover (typically asynchronous).
Example databases: Redis, Memcached, Apache Ignite, Hazelcast.
Quote: "An In-Memory Database stores data directly in RAM instead of disk. This makes it blazingly fast for read and write operations."
What it is: Simplest database type, storing data as unique key-value pairs like a distributed HashMap. No tables, schemas, or relationships.
When to use it: For fast lookups by unique key (constant-time O(1) reads/writes), when complex queries or relationships are not needed, and for high-volume, low-latency workloads demanding millions of reads/writes per second.
Key design considerations: Lookup-only access (no filtering, sorting, joining, or secondary indexes), No enforced schema (values can be diverse, putting burden on the application), and Easy horizontal scaling through key-based partitioning (using consistent hashing or range-based partitioning with high-cardinality keys).
Example databases: Redis, Amazon DynamoDB, Riak KV, Aerospike.
Quote: "A Key-Value Database is the simplest type of database. It stores data as a collection of key-value pairs, where each key is unique and maps directly to a value."
What it is: Stores data as documents, typically in JSON or BSON, which are self-contained and schema-less.
When to use it: When data structures vary across records, for storing nested or hierarchical data (avoiding joins), and for schema flexibility and fast iteration (evolving schema without downtime).
Key design considerations: Indexing for performance (can add overhead for deeply nested fields), Document Size Limits (may require splitting large documents), Denormalization (embedding related data to avoid joins, can increase write complexity), and Sharding for horizontal scaling (requires careful shard key design).
Example databases: MongoDB, Couchbase, Firebase Firestore.
Quote: "A Document Database stores data as documents, typically in JSON or BSON format. Each document is a self-contained unit with fields and values making it flexible and schema-less."
What it is: Designed to store and navigate relationships, representing data as nodes (entities) and edges (relationships).
When to use it: When relationships are central to your data (e.g., social networks), when you need traversals or recommendations (discovering indirect relationships and patterns), and for efficient complex relationship queries.
Key design considerations: Traversal Efficiency (handling multiple levels of relationships better than relational joins), Indexing (essential for locating starting nodes), Schema Flexibility (optional but consistent labelling is helpful), Scalability (choose distributed graph databases for large datasets), and Query Language (domain-specific languages like Cypher and Gremlin).
Example databases: Neo4j, Amazon Neptune, ArangoDB.
Quote: "A Graph Database is designed to store and navigate relationships. It represents data as nodes (entities) and edges (relationships between entities)."
What it is: Stores data in tables, rows, and columns, but each row can have a different set of columns. Optimized for large-scale, write-heavy workloads and high-speed retrieval. Data is stored by column families for fast access to specific columns.
When to use it: For high write throughput at scale (append-only workloads), when data grows continuously at massive scale, and for fast lookups and flexible row-level schemas.
Key design considerations: Schema design (critical for performance, focusing on Row keys, Partition keys, and Clustering columns), Denormalization (duplicating related data to speed up reads as joins are not efficient), Indexing (primary indexes via partition and clustering keys, secondary indexes can be expensive), Sharding and Replication (automatic handling, but bad partition keys can cause hotspots), and Tunable consistency (balancing availability, performance, and correctness).
Example databases: Apache Cassandra, ScyllaDB, Google Bigtable, HBase.
Quote: "A Wide-Column Database stores data in tables, rows, and columns, but unlike traditional relational databases, each row can have a different set of columns."
What it is: Purpose-built to store, retrieve, and analyze time-stamped data points, perfect for tracking changes over time.
When to use it: When data is generated in chronological order (steady, timestamped flow), when you need to perform rollups, aggregations, or downsampling (summarising historical data), and for high write volume with time-bound queries.
Key design considerations: Write-optimized (append-only writes, compression, sorted storage), Time-based retention (auto-expiring data), Time as the primary index (efficient time-scoped queries, some support tag-based filtering), Downsampling (rolling up raw data into aggregated summaries), and Sharding by time ranges or series IDs for horizontal scalability.
Example databases: InfluxDB, TimescaleDB, Prometheus, Amazon Timestream.
Quote: "A Time-Series Database (TSDB) is purpose-built to store, retrieve, and analyze data points that are time-stamped."
What it is: Designed to efficiently store, index, and search through large volumes of textual data, supporting full-text search, ranking, tokenization, stemming, fuzzy matching, and relevance scoring using inverted indexes.
When to use it: When you need fast, flexible search over text, when you want ranked or fuzzy search (handling typos and variations), and when you need to combine search with structured filtering.
Key design considerations: Inverted Indexing (mapping terms to documents for fast lookups), Tokenization & Stemming (processing text for better matching), Relevance Scoring (ranking results using algorithms like TF-IDF or BM25), and Scalability (horizontal scaling, distributed indexing, sharding, and replication).
Example databases: Elasticsearch, Apache Solr, MeiliSearch, Typesense.
Quote: "A Text-Search Database is designed to efficiently store, index, and search through large volumes of textual data."
What it is: Designed to store and query geospatial data (locations, shapes, distances, coordinates), supporting complex spatial operations.
When to use it: When your system uses location-based features (e.g., ride-hailing apps), and when you need to store and query shapes or regions (geometric operations).
Key design considerations: Spatial Indexing (specialized structures like R-Trees, QuadTrees, Geohashing), Accuracy vs Performance (using approximations for performance, may not be suitable for mission-critical use cases), Scalability (using indexes, caching, limiting scope), and Integrations (many databases offer built-in or extended spatial support).
Example databases: PostGIS (for PostgreSQL), MongoDB, Google BigQuery GIS, Elasticsearch.
Quote: "A Spatial Database is designed to store and query geospatial data —information about locations, shapes, distances, and coordinates on Earth."
What it is: A storage system optimized for handling large, unstructured binary files (images, videos, PDFs, backups). Stores content as binary blobs with metadata for retrieval.
When to use it: When you need to store large media files at scale, when data doesn’t fit a traditional database (massive binary payloads), and when you want scalable, low-cost storage for large volumes of cold or rarely accessed data.
Key design considerations: Metadata Management (storing metadata in a separate database linked by the blob's key), Access Control (typically requires explicit handling via signed URLs, token-based access, or IAM policies), Chunking large files (for reliability, performance, and resuming transfers), Scalability and durability (designed for infinite scale and high durability), and CDN Integration (for reducing latency and improving performance for end users).
Example databases / services: Amazon S3, Google Cloud Storage, Azure Blob Storage, MinIO.
Quote: "A Blob Store is a storage system optimized for handling large, unstructured binary files like images, videos, PDFs, backups, or logs."