0

I need to design a DynamoDB schema to store items whose attributes are:

  • tid: a UUID string which is a unique identifier of the item
  • timestamp: an ISO-8601-formatted string representing a date and time related to the item
  • Other stuff...

and for the following query (really want to avoid having to do any scans) patterns:

  • Query by tid
  • Query by exact timestamp, and by relational ordering expressions (e.g., <=, BETWEEN, etc.) on timestamp. I.e., query all items from a certain date-time range without knowing their tids in advance.

Is this possible to do efficiently in DynamoDB, or is there perhaps another AWS solution that would serve me better?

0

Given a DynamoDB table as follows:

  • partition key: tid, type string
  • sort key: timestamp, type string

You can query on:

  • tid = 5
  • tid = 5, timestamp between 2018-12-21T09:00:00Z and 2018-12-21T15:00:00Z

Try it out using the awscli, for example to query all items with tid=5:

aws dynamodb query \
    --table-name mytable \
    --key-condition-expression "tid = :tid" \
    --expression-attribute-values '{":tid":{"S":"5"}}'

To query all items for tid=5 and timestamp between 09:00 and 15:00 on 2015-12-21:

aws dynamodb query \
    --table-name mytable \
    --key-condition-expression "tid = :tid AND #ts BETWEEN :ts1 AND :ts2" \
    --expression-attribute-values  '{":tid":{"S":"5"}, ":ts1":{"S":"2015-12-21T09:00:00Z"}, ":ts2":{"S":"2015-12-21T15:00:00Z"}}' \
    --expression-attribute-names '{"#ts":"timestamp"}'

Note: because timestamp is a reserved keyword in DynamoDB, you have to escape it using the expression attribute names.

You could also create the timestamp attribute as a number and then store epoch times, if you prefer.

To query all items with timestamp between 09:00 and 15:00 on 2015-12-21, regardless of tid, cannot be done with the same partition/sort key schema. You would need to add a Global Secondary Index something like this:

  • GSI partition key: yyyymmdd, type string
  • GSI sort key: timestamp, type string

Now you can query for items with a given timestamp range, as long as they're on the same day (they have the same YYYYMMDD, which might be a reasonable restriction). Or you could go to YYYYMM as the partition key allowing a wider timestamp range. At this point you really need to understand the use cases for queries to decide if YYYYMMDD (restricting queries to a single day) is right. See How to query DynamoDB by date with no obvious hash key for more on this idea.

  • The second query pattern I was referring to was querying by timestamp only, i.e., not having to specifying a tid. And since tid is a globally unique identifier, a composite key with tid as the partition key and some other key as the sort key is redundant, as it's equivalent to a simple primary key consisting of just the partition key tid. What I'm looking for is a structure supporting lookups by tid, as well as lookups of all items belonging to a certain date-time range (without knowing their tids in advance). – Kevin Hwang Apr 11 at 23:24
  • Apologies, misread that part of the question. The typical solution here afaik uses a GSI with a partition key that can be inferred/calculated from the timestamp and then a sort key which is the timestamp. The partition key could be something like the YYYYMMDD of the timestamp. Then you can query for items with a given timestamp range, as long as they're on the same day (have the same YYYYMMDD, which might be a reasonable restriction). Or you could go to YYYYMM as the partition key allowing a wider timestamp range, but for high i/o tables that's not ideal for sharding. Does that help? – jarmod Apr 11 at 23:59

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service, privacy policy and cookie policy

Not the answer you're looking for? Browse other questions tagged or ask your own question.