NoSQL, fast & predictable performance, seamless scaling, encryption at rest for sensitive information, on-demand backup, automatic deletion of expired items to save cost (TTL), automatic spreading of data for HA and durability, Global Tables to sync across regions
Table - schemaless, holds a collection of
Items - mandatory primary key (see below) & optional (0 or more) secondary indexes for querying; each item has
Attributes - scalar, or nested (up to 32 levels deep)
PK: must be unique; two options: partition key only | partition key + sort key; the partition key (hash attribute) determines where the item is stored; with partition + sort key (range attribute), items with the same partition key are stored together but must have different sort keys, and are sorted by sort key
Secondary indexes - one table can have 0 or more; two kinds: global or local; global can have an entirely different partition+sort key than the table's, while local has the same partition key & a different sort key; up to (5 global + 5 local) secondary indexes per table; a secondary index is like a view of the table, letting you query it from a different angle, and you can specify which attributes from the base table are copied (projected) into the index; without an index there is no alternative query pattern; indexes affect write performance (each index must be updated) and also take space; an index is not used automatically (unlike SQL) - only a Query/Scan directed at the index uses it; global indexes support only eventual consistency, local indexes can also be read with strong consistency
DB Streams to capture data-modification events on tables - in near real time, ordered, every event a record, 24-hour lifetime, serves as an event source for Lambda, and for replication, materialized views, data analysis, etc.
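A minimal sketch of a Lambda handler wired to a DynamoDB stream (handler and attribute access are illustrative, not from the source):

# Each invocation receives a batch of stream records.
def handler(event, context):
    for record in event["Records"]:            # one record per data-modification event
        name = record["eventName"]             # INSERT | MODIFY | REMOVE
        keys = record["dynamodb"]["Keys"]      # primary key of the modified item
        if name == "INSERT":
            print("new item:", record["dynamodb"].get("NewImage"))
        elif name == "REMOVE":
            print("deleted item:", keys)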
Limits: see developer guide
API: control plane - table-level operations: create/describe/list/update/delete tables; data plane - CRUD; streams - list, describe, get iterator, get records
When creating a table / secondary index, the primary key attributes (partition key, sort key) must be given a type - string, number, or binary; no other attributes need a type declared.
Scalar types:
Document types (can nest)
Set types - elements must be of the same type, and the set must not be empty
Write - if HTTP 200 is received, the data is written and durable.
Read - eventually consistent read (usually consistent within 1 s after a write, across all storage); strongly consistent read - guaranteed most recent data, but may not be available in case of network failure
Pricing: storage (25 GB free) + read capacity units (RCU) + write capacity units (WCU)
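Back-of-envelope sizing sketch, assuming the standard unit definitions (1 RCU = one strongly consistent read/s of an item up to 4 KB, halved for eventually consistent reads; 1 WCU = one write/s of an item up to 1 KB); the item size and rates below are made up:

import math

item_kb, reads_per_s, writes_per_s = 6, 100, 20
rcu_strong = math.ceil(item_kb / 4) * reads_per_s   # 2 * 100 = 200 RCU
rcu_eventual = rcu_strong / 2                       # 100 RCU for eventually consistent reads
wcu = math.ceil(item_kb / 1) * writes_per_s         # 6 * 20 = 120 WCU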
Scaling: auto scaling (min, max, target utilization percentage) OR manually provisioned throughput
Partition - handled entirely by AWS, you never manage partitions. The partition key determines the partition - best practice: choose a partition key that spreads activity evenly
Idempotent - an idempotent operation can be performed many times with the same effect; when the result is uncertain (e.g. network error), an idempotent operation is safe to retry, otherwise a check is needed before retrying
Access - via an HTTPS web service, stateless, each request carries a signature, authorized by IAM; NOTE: In December 2017, AWS began the process of migrating all DynamoDB endpoints to use secure certificates issued by Amazon Trust Services (ATS).
Access: authentication - root user, IAM user, IAM role (for federated users, AWS service users, applications on EC2); conditions can achieve item-level / attribute-level control; worth further study - this can allow clients to interact with DynamoDB directly, without an intermediate Lambda (example)
CreateTable - define name, key schema, attribute definitions (types for key attributes), throughput settings, as a JSON document
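A CreateTable sketch with the Python SDK (boto3); table and attribute names are illustrative:

import boto3

dynamodb = boto3.client("dynamodb")

dynamodb.create_table(
    TableName="Music",
    KeySchema=[
        {"AttributeName": "Artist", "KeyType": "HASH"},      # partition key
        {"AttributeName": "SongTitle", "KeyType": "RANGE"},  # sort key
    ],
    AttributeDefinitions=[  # types declared only for key attributes
        {"AttributeName": "Artist", "AttributeType": "S"},
        {"AttributeName": "SongTitle", "AttributeType": "S"},
    ],
    ProvisionedThroughput={"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
)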
DescribeTable - call with table name
PutItem - with table name and item (native support for JSON); if 200 is returned it's done (no commit); may request ReturnValues ALL_OLD
GetItem - most efficient way to retrieve an item; must provide the full pk; eventually consistent read by default, pass the ConsistentRead parameter to request a strongly consistent read; can use a ProjectionExpression to return a subset of attributes; batch get/write can reduce network round trips - they are wrappers around individual requests
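GetItem sketch (boto3; table, keys and attributes are illustrative):

import boto3

table = boto3.resource("dynamodb").Table("Music")

resp = table.get_item(
    Key={"Artist": "No One You Know", "SongTitle": "Call Me Today"},  # full PK required
    ConsistentRead=True,                        # request a strongly consistent read
    ProjectionExpression="Artist, AlbumTitle",  # return a subset of attributes
)
item = resp.get("Item")  # key absent if no matching item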
Query (on a table with a composite PK, i.e. partition key + sort key: provide the exact partition key & an optional comparison condition on the sort key)
Gets all items with a particular partition key, optionally narrowed by the sort key; also quick; no SQL JOIN available, one table only; must provide the partition key; use parameter binding (:par1); use KeyConditionExpression (see below, very limited) to supply the partition key with an equality condition and optionally the sort key with a comparison condition; optional FilterExpression for non-key conditions; when querying a secondary index, strong consistency can be requested, but global indexes support only eventual consistency; the result set is always sorted by sort key; eventual consistency by default, strong consistency on request; see the sketch after the expression notes below.
Limit - maximum number of returned results, applied before filtering, so may return fewer than actually available; Pagination - a result page is <= 1 MB, otherwise the response carries LastEvaluatedKey pointing at the remaining results; issue another query with exactly the same conditions & ExclusiveStartKey to continue retrieval; --page-size (AWS CLI, not in the low-level API) can limit the number of items in a page; may return an empty set with a LastEvaluatedKey if all items read were filtered out; SDKs may provide advanced pagination through an API abstraction.
Also returns ScannedCount (items matching the key condition, before filtering) & Count (returned items matching both key condition & filter), for the current page only; there is no SQL-style count(*).
KeyConditionExpression:
Can use ExclusiveStartKey & ScanIndexForward to control scan starting point & direction for pagination.
FilterExpression:
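Putting Query, KeyConditionExpression, FilterExpression & pagination together - a boto3 sketch, names illustrative:

import boto3
from boto3.dynamodb.conditions import Key, Attr

table = boto3.resource("dynamodb").Table("Music")

kwargs = dict(
    KeyConditionExpression=Key("Artist").eq("No One You Know")
                           & Key("SongTitle").begins_with("Call"),  # eq on PK, comparison on SK
    FilterExpression=Attr("Year").gte(2000),  # non-key condition, applied after the read
    ScanIndexForward=False,                   # descending sort-key order
)
items = []
while True:
    resp = table.query(**kwargs)
    items.extend(resp["Items"])
    if "LastEvaluatedKey" not in resp:        # no more pages
        break
    kwargs["ExclusiveStartKey"] = resp["LastEvaluatedKey"]  # same conditions, next page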
Scan - gets all items in a table; resource-hungry for large tables, so use it sparingly, on small tables, or only when you have no alternative; FilterExpression to filter (if the table is large you are still charged for the full scan even if few results are returned), ProjectionExpression to limit returned attributes; when scanning a secondary index, strong consistency can be requested, but global indexes support only eventual consistency; further see guide
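Scan sketch using the SDK's pagination abstraction (a boto3 paginator); #y is a name placeholder because YEAR is a DynamoDB reserved word; names illustrative:

import boto3

client = boto3.client("dynamodb")
paginator = client.get_paginator("scan")  # hides the LastEvaluatedKey loop
for page in paginator.paginate(
    TableName="Music",
    FilterExpression="#y >= :y",
    ExpressionAttributeNames={"#y": "Year"},
    ExpressionAttributeValues={":y": {"N": "2000"}},  # low-level API: data type descriptors
):
    for item in page["Items"]:
        print(item)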
UpdateTable with GlobalSecondaryIndexUpdates / Create...; see guide; provide index name, key schema, (attribute definitions: add to the table any attribute used as an index key), projection, throughput settings - quite like creating a table; the index is not ready until the Backfilling attribute (in DescribeTable) turns false
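Sketch of adding a global secondary index to an existing table (boto3; names illustrative):

import boto3

client = boto3.client("dynamodb")

client.update_table(
    TableName="Music",
    AttributeDefinitions=[  # declare the new key attribute on the table
        {"AttributeName": "Genre", "AttributeType": "S"},
    ],
    GlobalSecondaryIndexUpdates=[{
        "Create": {
            "IndexName": "GenreIndex",
            "KeySchema": [{"AttributeName": "Genre", "KeyType": "HASH"}],
            "Projection": {"ProjectionType": "KEYS_ONLY"},  # which attributes to copy
            "ProvisionedThroughput": {"ReadCapacityUnits": 1, "WriteCapacityUnits": 1},
        }
    }],
)
# Poll DescribeTable until Backfilling turns false / IndexStatus shows the index is ready.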
(actually "upsert", update or insert, no batch operation) - provide table name, keys, UpdateExpression (like SQL set with parameter placeholder) to update subset of attributes, optional ConditionExpression to update on condition (strong consistency, may use as optimistic lock, conditional update is idempotent if on the to-be-updated attribute, consume write capacity only, ConditionalCheckFailedException if condition does not met); ExpressionAttributes (bind parameter to values); ReturnValues specify what to return from the operation ALL_OLD entire item before update ALL_NEW entire item after update UPDATED_OLD UPDATED_NEW only return old/new attributes having been updated;
Atomic counters - a number attribute can be increased/decreased unconditionally during UpdateItem, similar to a sequence in SQL; retrying a failed operation risks applying the update twice - fine for relaxed usage such as a visitor counter but not for financial transactions (use a conditional update in such circumstances)
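Atomic counter sketch (boto3; table/attribute names illustrative):

import boto3

table = boto3.resource("dynamodb").Table("PageViews")

# ADD increments atomically and unconditionally (creates the attribute if absent);
# fine for a visitor counter, not for money.
table.update_item(
    Key={"PageId": "home"},
    UpdateExpression="ADD ViewCount :inc",
    ExpressionAttributeValues={":inc": 1},
)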
DeleteItem (BatchWriteItem can batch deletes) - provide table name and pk; optional ConditionExpression to delete on a condition; may ReturnValues ALL_OLD
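Batch sketch via boto3's batch_writer, which wraps BatchWriteItem and automatically resends unprocessed items; note that batch operations accept no ConditionExpression (names illustrative):

import boto3

table = boto3.resource("dynamodb").Table("Music")

with table.batch_writer() as batch:  # buffers and flushes in chunks of 25
    batch.delete_item(Key={"Artist": "Acme", "SongTitle": "Hello"})
    batch.put_item(Item={"Artist": "Acme", "SongTitle": "World", "Released": 2001})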
ConditionExpression:
condition-expression ::=
operand comparator operand
| operand BETWEEN operand AND operand
| operand IN ( operand (',' operand (, ...) ))
| function
| condition AND condition
| condition OR condition
| NOT condition
| ( condition )
comparator ::=
=
| <>
| <
| <=
| >
| >=
function ::=
attribute_exists (path)
| attribute_not_exists (path)
| attribute_type (path, type)
| begins_with (path, substr)
| contains (path, operand)
| size (path)
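For example, attribute_not_exists on a key attribute makes PutItem insert-only (boto3 sketch, names illustrative):

import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("Music")

try:
    table.put_item(
        Item={"Artist": "Acme", "SongTitle": "Hello"},
        ConditionExpression="attribute_not_exists(Artist)",  # fail if the key already exists
    )
except ClientError as e:
    if e.response["Error"]["Code"] != "ConditionalCheckFailedException":
        raise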
Auto-deletes expired items. Enabled on the table; the expiry time is set per item.
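TTL sketch (boto3); the table and the ExpiresAt attribute are illustrative:

import time
import boto3

client = boto3.client("dynamodb")

# Enable TTL on the table, naming the attribute that carries the expiry time.
client.update_time_to_live(
    TableName="Sessions",
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "ExpiresAt"},
)

# Per item: store the expiry as epoch seconds; DynamoDB deletes the item after that time.
boto3.resource("dynamodb").Table("Sessions").put_item(
    Item={"SessionId": "abc123", "ExpiresAt": int(time.time()) + 3600}
)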
Local - as a .jar, also via Maven, or with the AWS Toolkit for Eclipse; now comes with a free and very helpful web-based user interface known as the DynamoDB JavaScript Shell
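Pointing the SDK at DynamoDB Local instead of AWS is just an endpoint override (boto3 sketch):

import boto3

dynamodb = boto3.resource(
    "dynamodb",
    endpoint_url="http://localhost:8000",  # DynamoDB Local's default port
    region_name="us-east-1",               # any region/credentials work locally
    aws_access_key_id="dummy",
    aws_secret_access_key="dummy",
)
print(list(dynamodb.tables.all()))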
AWS - see guide
Access - use console, CLI, API
Getting started - see examples in various languages; at first impression the JavaScript integration feels the most natural
SDKs provide a low-level interface, a document interface & a high-level (object-persistence) interface, depending on the language
SDKs (1) format the request (2) sign the request (3) send the request / receive the response (4) extract the result (5) apply basic retry logic on error
Low-level interface: available in every language; methods resemble DynamoDB operations; construct request, send, etc.; uses Data Type Descriptors to specify data types
Document interfaces: perform data-plane operations; data types are implied (no Data Type Descriptors); convert JSON <-> native DynamoDB data types; available in Java, JavaScript & Node.js ... but not all languages; provide a document wrapper
Object-persistence interface: does not perform operations on the DB directly but works on objects that represent items in the DB; available in Java & .NET; in Java uses annotations similar to JPA (@DynamoDBTable), with DynamoDBMapper as a wrapper of the low-level client
Low-level API: the on-wire HTTPS protocol
Error handling: DynamoDB returns an HTTP code (e.g. 400), an exception name (e.g. ResourceNotFoundException) & an error message; SDKs take care of propagating errors in each language so the programmer can try/catch; some errors are OK to retry (server errors, 5xx) while others are not (4xx); SDKs do their own retries; quote the Request ID from the response if you need support; retries can be configured with ClientConfiguration (Java); batch operations wrap individual operations - some can fail while others succeed, and the response returns the individual requests that failed; the most likely failure is throttling, which can be retried, but an "exponential backoff strategy" is strongly recommended - be nice to the server :)
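Exponential-backoff sketch layered on top of the SDK's built-in retries (boto3; the retryable-code set and delays are illustrative):

import random
import time
import boto3
from botocore.exceptions import ClientError

client = boto3.client("dynamodb")
RETRYABLE = {"ProvisionedThroughputExceededException", "ThrottlingException", "InternalServerError"}

def call_with_backoff(fn, max_attempts=5, **kwargs):
    for attempt in range(max_attempts):
        try:
            return fn(**kwargs)
        except ClientError as e:
            if e.response["Error"]["Code"] not in RETRYABLE or attempt == max_attempts - 1:
                raise
            time.sleep(random.uniform(0, 0.1 * 2 ** attempt))  # jittered exponential delay

item = call_with_backoff(
    client.get_item,
    TableName="Music",
    Key={"Artist": {"S": "Acme"}, "SongTitle": {"S": "Hello"}},
)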
Durability by default: no SLA, no official figure, but it looks quite durable (data spreading & duplication); there is speculation that it's less durable than S3, so backups are suggested; quote: "Amazon talks about customers backing up DynamoDB to S3 using MapReduce. They also say that some customers back up DynamoDB using Redshift, which has DynamoDB compatibility built in"; non-AWS backup is also recommended
On-demand backup: one-click (or API call) backup & restore, no impact on performance & availability, completes in seconds regardless of size, unlimited number of backups, all copies retained until explicitly deleted; charged by storage used
Restore - takes time proportional to size; restores to a new destination table
Continuous backup & point-in-time recovery - one-click enable; can restore to any second within the last 35 days; no performance penalty, but a restore can take hours to complete; priced by size
Global table - multi-region, multi-master database
Encryption at rest - encrypts data at rest with AWS Key Management Service (KMS); enabled at the table level; DynamoDB must be able to access the key to read the table
In-memory acceleration with DAX - see guide
Tools:
VPC Endpoints for DynamoDB - you can launch Amazon EC2 instances into a virtual private cloud, which is logically isolated from other networks, including the public Internet; with an Amazon VPC you control its IP address range, subnets, routing tables, network gateways, and security settings; a VPC endpoint lets instances in the VPC reach DynamoDB without traversing the public Internet.
With Cognito - use IAM role to generate temporary credentials for authenticated / unauthenticated users
With Redshift - copy data from DynamoDB to Redshift for SQL based analysis
With Apache Hive (data warehouse) on EMR - read and write data in DynamoDB tables, enabling queries over live data in HiveQL (SQL-like); copy data between DynamoDB and S3, and between DynamoDB and the Hadoop Distributed File System (HDFS), in both directions; perform join operations
(quote from https://cloudacademy.com/blog/amazon-dynamodb-ten-things/ ) In a typical scenario, Elastic MapReduce (EMR) performs its complex analysis on datasets stored on DynamoDB. Users will often also use AWS Redshift for data warehousing, where BI tasks are carried out on data loaded from DynamoDB tables to Redshift.