DynamoDB: getting started
DynamoDB
A NoSQL database like DynamoDB is ideal for applications with KNOWN ACCESS PATTERNS. This differs from a relational database, which organizes data into tables, rows, and columns.
Access DynamoDB through its API (RESTful) or through an SDK with object-mapping support.
The SDK route is better integrated with the programming language; for Python that is Boto3 (the AWS SDK for Python), which includes a DynamoDB interface.
Data access is controlled through IAM (Identity and Access Management) instead of a traditional database username/password, so you will need an IAM user (or role) with permission to access DynamoDB.
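As a sketch of SDK access with IAM credentials (the region name here is an assumption, not from the notes):

```python
def get_dynamodb():
    # boto3 resolves IAM credentials automatically from environment variables,
    # ~/.aws/credentials, or an attached IAM role; there is no database
    # username/password to supply.
    import boto3  # imported lazily so the sketch can be read without AWS set up
    return boto3.resource("dynamodb", region_name="ap-southeast-2")  # region is an assumption
```

The IAM user or role behind those credentials needs a policy granting the relevant dynamodb: actions on the table.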
DynamoDB Streams can detect record-level changes and pipe them into a Lambda function.
Or integrate with S3 for automatic table exports.
Or use Kinesis for a streaming pipeline.
Core concepts: table, item, attribute, index
A table is a collection of items, as below:

AccountID | CreateDate | Country | Details
1         | 2020       | AU      | a JSON object
2         | 2021       | US      | a JSON object
2         | 2022       | US      | a JSON object
3         | 2020       | AU      | a JSON object
Every row here is an item.
An item is a collection of attributes, i.e. key/value pairs.
So here Details is an attribute that contains a JSON object.
Here AccountID is used as the partition key.
CreateDate is used as the sort key.
AccountID + CreateDate together form the primary key, which must be unique.
If AccountID were already unique on its own, the primary key could be just AccountID instead of a composite of AccountID and CreateDate.
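A minimal Boto3 sketch of reading one item by its composite primary key (the table name "Accounts" and the key values follow the example above; treat them as assumptions):

```python
# Both parts of a composite primary key are required to address one item.
KEY = {"AccountID": 1, "CreateDate": 2020}  # partition key + sort key

def get_account_snapshot(key=KEY):
    import boto3  # lazy import: lets the module load without AWS configured
    table = boto3.resource("dynamodb").Table("Accounts")  # hypothetical table name
    # "Item" is absent when nothing matches, so .get() avoids a KeyError
    return table.get_item(Key=key).get("Item")
```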
Global secondary index
For quick and efficient access to data, you can create indexes as well,
e.g. an index on Country, so only a certain country's rows/items are read.
Without a proper index, a query has to scan through all items, potentially costing more time and money.
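A hedged sketch of that difference: Query against an index reads only matching items, while Scan reads everything and filters afterwards (the table and index names are assumptions):

```python
def accounts_in_country(country):
    import boto3  # lazy import so the sketch loads without AWS configured
    from boto3.dynamodb.conditions import Key
    table = boto3.resource("dynamodb").Table("Accounts")  # hypothetical table
    # Query touches only items whose index key matches: fast and cheap.
    return table.query(
        IndexName="Country-index",  # hypothetical global secondary index
        KeyConditionExpression=Key("Country").eq(country),
    )["Items"]

def accounts_in_country_slow(country):
    import boto3
    from boto3.dynamodb.conditions import Attr
    table = boto3.resource("dynamodb").Table("Accounts")
    # Scan reads every item and filters afterwards: you pay for the full read.
    return table.scan(FilterExpression=Attr("Country").eq(country))["Items"]
```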
In the AWS Management Console, search for DynamoDB.
To start using DynamoDB, the first step is to create a table.
Give it a name, a PARTITION KEY, and an optional SORT KEY.
Note that the partition key and sort key cannot be changed after the table is created.
Choose billing: On-demand, or Provisioned by allocating read/write capacity in advance.
Choose secondary indexes (local or global) to help with query performance.
A local index only changes the sort key, if one was created above. It doesn't impact partitioning at all, but offers different ways to sort the items within each partition, e.g. by create date or created-by.
A global index applies to the whole dataset, ignoring the existing partition key and sort key. It therefore needs extra storage to index the data and costs extra money. E.g. you may want to index on sale price regardless of country, region, etc.
A local index can only be created during table creation, presumably because it is part of the initial storage structure setup.
A global index can be created at any time, because it replicates the data and reorganizes it separately.
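The console steps above can also be done through the CreateTable API. Here is a sketch with on-demand billing and one global secondary index (all table, attribute, and index names are assumptions):

```python
TABLE_SPEC = {
    "TableName": "Accounts",  # hypothetical name
    "KeySchema": [
        {"AttributeName": "AccountID", "KeyType": "HASH"},    # partition key
        {"AttributeName": "CreateDate", "KeyType": "RANGE"},  # sort key
    ],
    "AttributeDefinitions": [
        {"AttributeName": "AccountID", "AttributeType": "N"},
        {"AttributeName": "CreateDate", "AttributeType": "N"},
        {"AttributeName": "Country", "AttributeType": "S"},   # used by the GSI
    ],
    "BillingMode": "PAY_PER_REQUEST",  # on-demand billing
    "GlobalSecondaryIndexes": [{
        "IndexName": "Country-index",
        "KeySchema": [{"AttributeName": "Country", "KeyType": "HASH"}],
        "Projection": {"ProjectionType": "ALL"},  # replicate all attributes into the index
    }],
}

def create_accounts_table():
    import boto3  # lazy import so the spec above can be inspected without AWS
    return boto3.client("dynamodb").create_table(**TABLE_SPEC)
```

Note the GSI is just another KeySchema over replicated data, which is why it can be added later and why it costs extra storage.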
Time to Live (TTL)
After the table is created, clicking on it brings you to a management/summary panel where you can set the TTL,
that is, how long to keep each item; after that, the item is deleted/evicted.
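TTL can also be enabled through the API. This sketch assumes the expiry attribute is named expires_at and holds a Unix epoch timestamp:

```python
TTL_SPEC = {"Enabled": True, "AttributeName": "expires_at"}  # attribute name is an assumption

def enable_ttl(table_name="Accounts"):  # hypothetical table name
    import boto3  # lazy import so the spec above can be read without AWS
    # DynamoDB deletes an item some time after the epoch timestamp in its
    # expires_at attribute has passed (deletion is not instantaneous).
    return boto3.client("dynamodb").update_time_to_live(
        TableName=table_name, TimeToLiveSpecification=TTL_SPEC)
```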
Replicas and backups (point-in-time recovery, on-demand backup).
You can also export to S3 or to a Kinesis data stream.
DynamoDB Streams: this captures item-level changes and pushes them to a stream, which can be consumed through the DynamoDB Streams API, e.g. from Lambda.
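A minimal sketch of a Lambda handler consuming such a stream. The event shape is the standard DynamoDB Streams record format; the key name AccountID is an assumption carried over from the example table:

```python
def lambda_handler(event, context):
    """Collect item-level changes from a DynamoDB Streams event."""
    changes = []
    for record in event.get("Records", []):
        action = record["eventName"]         # INSERT, MODIFY, or REMOVE
        keys = record["dynamodb"]["Keys"]    # keys arrive in DynamoDB JSON
        account_id = keys["AccountID"]["N"]  # hypothetical partition key
        changes.append((action, account_id))
    return changes
```

A MODIFY record also carries NewImage/OldImage attribute maps, depending on the stream view type configured on the table.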
Items
The console has an interface for viewing and adding items.
Items (rows) are stored in DynamoDB JSON format, which tags each value with its type, e.g. S for string, N for number, BOOL for boolean (B is binary), so every attribute's type is explicit. You can use standard JSON as well.
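To illustrate the typed format, here is a small converter from plain Python/JSON values to DynamoDB JSON (a simplified sketch, not exhaustive; it skips types like binary and sets):

```python
def to_dynamodb_json(item):
    """Convert a plain dict to DynamoDB's typed JSON (simplified sketch)."""
    def encode(v):
        if isinstance(v, bool):      # check bool before int: bool is an int subclass
            return {"BOOL": v}
        if isinstance(v, str):
            return {"S": v}
        if isinstance(v, (int, float)):
            return {"N": str(v)}     # DynamoDB carries numbers as strings
        if isinstance(v, dict):
            return {"M": {k: encode(x) for k, x in v.items()}}
        if isinstance(v, list):
            return {"L": [encode(x) for x in v]}
        raise TypeError(f"unsupported type: {type(v).__name__}")
    return {k: encode(v) for k, v in item.items()}
```

For example, {"Country": "AU", "Active": True} becomes {"Country": {"S": "AU"}, "Active": {"BOOL": True}}.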
PartiQL editor
In the console there is a PartiQL editor where you can run SQL-like queries.
These still translate to the underlying API calls.
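The same kind of query can be run from code via the ExecuteStatement API; a sketch below (table name is an assumption, and the low-level client returns items in DynamoDB JSON):

```python
STATEMENT = 'SELECT * FROM "Accounts" WHERE "Country" = ?'  # hypothetical table name

def accounts_by_country(country):
    import boto3  # lazy import: the statement above can be read without AWS
    # PartiQL is translated into the same underlying DynamoDB operations;
    # without an index on Country this still becomes a full table scan.
    return boto3.client("dynamodb").execute_statement(
        Statement=STATEMENT, Parameters=[{"S": country}])["Items"]
```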
Benefits of DynamoDB
Performance and scalability, with auto scaling
Access control rules via IAM, so access control is easy to apply
Item (row)-level event stream data
Item-level changes can be streamed through the API to other services like Lambda, Elasticsearch, or S3 for further use.
This is a bit like Change Data Capture in SQL Server
Time To Live, deleting expired data by timestamp
Supports items with inconsistent schemas
Automatic backups in the cloud
Challenges of DynamoDB
DynamoDB is for providing fast data transactions to applications. It's fast at the transaction level, but slow at analyzing large amounts of data.
Online Analytical Processing (OLAP)
Data warehousing requires a lot of aggregation, table joins, etc., which are difficult or impossible in DynamoDB. There may be tools/modules that help translate those requirements into proper DynamoDB API calls, but key-value databases generally don't support data warehousing/OLAP well.
Querying and SQL
Although PartiQL is available now, it's probably still limited. Probably not every fancy SQL feature from relational databases is available in PartiQL.
Indexing is expensive
Global secondary indexes require extra data storage/structures and cost extra money.