DynamoDB: getting started
DynamoDB
A NoSQL database like DynamoDB is ideal for applications with KNOWN ACCESS PATTERNS. This differs from a relational database, which organizes data into tables, rows, and columns.
Access DynamoDB through its API (RESTful) or through an SDK with object-mapping support.
The SDK route is better integrated with the programming language; for Python that is Boto3 (the AWS SDK for Python), which includes a DynamoDB interface.
Data access is controlled through IAM (Identity and Access Management) instead of a traditional database username/password, so you will need an IAM user (or role) with permission to access DynamoDB.
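As a sketch of SDK access with IAM credentials (the region name here is an assumption, not from the notes):

```python
def get_dynamodb():
    # boto3 resolves IAM credentials automatically from environment variables,
    # ~/.aws/credentials, or an attached IAM role; there is no database
    # username/password to supply.
    import boto3  # imported lazily so the sketch can be read without AWS set up
    return boto3.resource("dynamodb", region_name="ap-southeast-2")  # region is an assumption
```

The IAM user or role behind those credentials needs a policy granting the relevant dynamodb: actions on the table.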
DynamoDB Streams can detect record-level changes and pipe them into a Lambda function.
Or integrate with S3 for automatic table exports.
Or use Kinesis for a streaming pipeline.
Core concepts: table, item, attribute, index
A table is a collection of items, as below:

AccountID | CreateDate | Country | Details
1         | 2020       | AU      | a JSON object
2         | 2021       | US      | a JSON object
2         | 2022       | US      | a JSON object
3         | 2020       | AU      | a JSON object
Every row here is an item.
An item is a collection of attributes, i.e. key/value pairs.
So here Details is an attribute that contains a JSON object.
Here AccountID is used as the partition key.
CreateDate is used as the sort key.
AccountID + CreateDate together form the primary key, which must be unique.
If AccountID were already unique on its own, the primary key could be just AccountID instead of a composite of AccountID and CreateDate.
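A minimal Boto3 sketch of reading one item by its composite primary key (the table name "Accounts" and the key values follow the example above; treat them as assumptions):

```python
# Both parts of a composite primary key are required to address one item.
KEY = {"AccountID": 1, "CreateDate": 2020}  # partition key + sort key

def get_account_snapshot(key=KEY):
    import boto3  # lazy import: lets the module load without AWS configured
    table = boto3.resource("dynamodb").Table("Accounts")  # hypothetical table name
    # "Item" is absent when nothing matches, so .get() avoids a KeyError
    return table.get_item(Key=key).get("Item")
```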
Global secondary index
For quick and efficient access to data, you can create indexes as well,
e.g. an index on Country, so only a certain country's rows/items are read.
Without a proper index, a query has to scan through all items, potentially costing more time and money.
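A hedged sketch of that difference: Query against an index reads only matching items, while Scan reads everything and filters afterwards (the table and index names are assumptions):

```python
def accounts_in_country(country):
    import boto3  # lazy import so the sketch loads without AWS configured
    from boto3.dynamodb.conditions import Key
    table = boto3.resource("dynamodb").Table("Accounts")  # hypothetical table
    # Query touches only items whose index key matches: fast and cheap.
    return table.query(
        IndexName="Country-index",  # hypothetical global secondary index
        KeyConditionExpression=Key("Country").eq(country),
    )["Items"]

def accounts_in_country_slow(country):
    import boto3
    from boto3.dynamodb.conditions import Attr
    table = boto3.resource("dynamodb").Table("Accounts")
    # Scan reads every item and filters afterwards: you pay for the full read.
    return table.scan(FilterExpression=Attr("Country").eq(country))["Items"]
```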
In the AWS Management Console, search for DynamoDB.
To start using DynamoDB, the first step is to create a table.
Give it a name, a PARTITION KEY, and an optional SORT KEY.
Note that the partition key and sort key cannot be changed after the table is created.
Choose billing: On-demand, or Provisioned by allocating read/write capacity in advance.
Choose secondary indexes (local or global) to help with query performance.
A local index only changes the sort key, if one was created above. It doesn't impact partitioning at all, but offers different ways to sort the items within each partition, e.g. by create date or created-by.
A global index applies to the whole dataset, ignoring the existing partition key and sort key. It therefore needs extra storage to index the data and costs extra money. E.g. you may want to index on sale price regardless of country, region, etc.
A local index can only be created during table creation, presumably because it is part of the initial storage structure setup.
A global index can be created at any time, because it replicates the data and reorganizes it separately.
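The console steps above can also be done through the CreateTable API. Here is a sketch with on-demand billing and one global secondary index (all table, attribute, and index names are assumptions):

```python
TABLE_SPEC = {
    "TableName": "Accounts",  # hypothetical name
    "KeySchema": [
        {"AttributeName": "AccountID", "KeyType": "HASH"},    # partition key
        {"AttributeName": "CreateDate", "KeyType": "RANGE"},  # sort key
    ],
    "AttributeDefinitions": [
        {"AttributeName": "AccountID", "AttributeType": "N"},
        {"AttributeName": "CreateDate", "AttributeType": "N"},
        {"AttributeName": "Country", "AttributeType": "S"},   # used by the GSI
    ],
    "BillingMode": "PAY_PER_REQUEST",  # on-demand billing
    "GlobalSecondaryIndexes": [{
        "IndexName": "Country-index",
        "KeySchema": [{"AttributeName": "Country", "KeyType": "HASH"}],
        "Projection": {"ProjectionType": "ALL"},  # replicate all attributes into the index
    }],
}

def create_accounts_table():
    import boto3  # lazy import so the spec above can be inspected without AWS
    return boto3.client("dynamodb").create_table(**TABLE_SPEC)
```

Note the GSI is just another KeySchema over replicated data, which is why it can be added later and why it costs extra storage.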
Time to Live (TTL)
After the table is created, clicking on it brings you to a management/summary panel where you can set the TTL,
that is, how long to keep each item; after that, the item is deleted/evicted.
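TTL can also be enabled through the API. This sketch assumes the expiry attribute is named expires_at and holds a Unix epoch timestamp:

```python
TTL_SPEC = {"Enabled": True, "AttributeName": "expires_at"}  # attribute name is an assumption

def enable_ttl(table_name="Accounts"):  # hypothetical table name
    import boto3  # lazy import so the spec above can be read without AWS
    # DynamoDB deletes an item some time after the epoch timestamp in its
    # expires_at attribute has passed (deletion is not instantaneous).
    return boto3.client("dynamodb").update_time_to_live(
        TableName=table_name, TimeToLiveSpecification=TTL_SPEC)
```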
Replicas and backups (point-in-time recovery, on-demand backup).
You can also export to S3 or to a Kinesis data stream.
DynamoDB Streams: this captures item-level changes and pushes them to a stream, which can be consumed through the DynamoDB Streams API, e.g. from Lambda.
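A minimal sketch of a Lambda handler consuming such a stream. The event shape is the standard DynamoDB Streams record format; the key name AccountID is an assumption carried over from the example table:

```python
def lambda_handler(event, context):
    """Collect item-level changes from a DynamoDB Streams event."""
    changes = []
    for record in event.get("Records", []):
        action = record["eventName"]         # INSERT, MODIFY, or REMOVE
        keys = record["dynamodb"]["Keys"]    # keys arrive in DynamoDB JSON
        account_id = keys["AccountID"]["N"]  # hypothetical partition key
        changes.append((action, account_id))
    return changes
```

A MODIFY record also carries NewImage/OldImage attribute maps, depending on the stream view type configured on the table.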
Items
The console has an interface for viewing and adding items.
Items (rows) are stored in DynamoDB JSON format, which tags each value with its type, e.g. S for string, N for number, BOOL for boolean (B is binary), so every attribute's type is explicit. You can use standard JSON as well.
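To illustrate the typed format, here is a small converter from plain Python/JSON values to DynamoDB JSON (a simplified sketch, not exhaustive; it skips types like binary and sets):

```python
def to_dynamodb_json(item):
    """Convert a plain dict to DynamoDB's typed JSON (simplified sketch)."""
    def encode(v):
        if isinstance(v, bool):      # check bool before int: bool is an int subclass
            return {"BOOL": v}
        if isinstance(v, str):
            return {"S": v}
        if isinstance(v, (int, float)):
            return {"N": str(v)}     # DynamoDB carries numbers as strings
        if isinstance(v, dict):
            return {"M": {k: encode(x) for k, x in v.items()}}
        if isinstance(v, list):
            return {"L": [encode(x) for x in v]}
        raise TypeError(f"unsupported type: {type(v).__name__}")
    return {k: encode(v) for k, v in item.items()}
```

For example, {"Country": "AU", "Active": True} becomes {"Country": {"S": "AU"}, "Active": {"BOOL": True}}.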
PartiQL editor
In the console there is a PartiQL editor where you can run SQL-like queries.
These still translate to the underlying API calls.
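The same kind of query can be run from code via the ExecuteStatement API; a sketch below (table name is an assumption, and the low-level client returns items in DynamoDB JSON):

```python
STATEMENT = 'SELECT * FROM "Accounts" WHERE "Country" = ?'  # hypothetical table name

def accounts_by_country(country):
    import boto3  # lazy import: the statement above can be read without AWS
    # PartiQL is translated into the same underlying DynamoDB operations;
    # without an index on Country this still becomes a full table scan.
    return boto3.client("dynamodb").execute_statement(
        Statement=STATEMENT, Parameters=[{"S": country}])["Items"]
```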
Benefits of DynamoDB
Performance and scalability, with auto scaling
Access control rules via IAM, so access control is easy to apply
Item (row)-level event stream data
Item-level changes can be streamed through the API to other services like Lambda, Elasticsearch, or S3 for further use.
This is a bit like Change Data Capture in SQL Server
Time To Live, deleting expired data by timestamp
Supports items with inconsistent schemas
Automatic backups in the cloud
Challenges of DynamoDB
DynamoDB is for providing fast data transactions to applications. It's fast at the transaction level, but slow at analyzing large amounts of data.
Online Analytical Processing (OLAP)
Data warehousing requires a lot of aggregation, table joins, etc., which are difficult or impossible in DynamoDB. There may be tools/modules that help translate those requirements into proper DynamoDB API calls, but key-value databases generally don't support data warehousing/OLAP well.
Querying and SQL
Although PartiQL is available now, it's probably still limited. Probably not every fancy SQL feature from relational databases is available in PartiQL.
Indexing is expensive
Global secondary indexes require extra data storage/structures and cost extra money.