A Look at DynamoDB Keys
Today, coming primarily from a MySQL background, I realized that choosing the right DynamoDB data structure requires some thinking. Here is the situation that led me there:
- In an Express.js application, I am trying to read all the records from a DynamoDB table.
- I am following the Node.js documentation for queries.
- I keep running into a database error about keys and cannot access any data.
For the sake of this example, let us imagine that we are creating a DynamoDB table which will hold Tweets from different users.
We should be able to:
- Loop through the Tweets of a specific user in chronological order.
- Get a specific Tweet from a user.
Image from the AWS blog.
What is the Primary Key?
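The Primary Key uniquely identifies a single item in a table. DynamoDB allows two forms: a simple Primary Key made of a Partition Key alone, or a composite Primary Key made of a Partition Key and a Sort Key together. Our Tweets table uses the composite form.
So, for a table like ours, make sure you understand this: Primary Key = Partition Key + Sort Key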
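As a rough sketch of what that looks like in practice, here is how a composite key could be declared for the example table. The object follows the field names of DynamoDB's CreateTable request (`KeySchema`, `AttributeDefinitions`, `BillingMode`), but the table name `Tweets`, the attribute names `username` and `timestamp`, and the local `CreateTableParams` type are my own assumptions for illustration.

```typescript
// Local type covering only the CreateTable fields used in this sketch.
interface CreateTableParams {
  TableName: string;
  AttributeDefinitions: { AttributeName: string; AttributeType: "S" | "N" | "B" }[];
  KeySchema: { AttributeName: string; KeyType: "HASH" | "RANGE" }[];
  BillingMode: "PAY_PER_REQUEST" | "PROVISIONED";
}

// "Tweets", "username", and "timestamp" are illustrative names (assumptions).
const tweetsTable: CreateTableParams = {
  TableName: "Tweets",
  AttributeDefinitions: [
    { AttributeName: "username", AttributeType: "S" },  // string
    { AttributeName: "timestamp", AttributeType: "N" }, // number, e.g. epoch milliseconds
  ],
  KeySchema: [
    { AttributeName: "username", KeyType: "HASH" },   // Partition Key
    { AttributeName: "timestamp", KeyType: "RANGE" }, // Sort Key
  ],
  BillingMode: "PAY_PER_REQUEST",
};

console.log(JSON.stringify(tweetsTable.KeySchema));
```

In the API, `HASH` is the name for the Partition Key and `RANGE` is the name for the Sort Key. Only key attributes need to appear in `AttributeDefinitions`; other fields, such as the Tweet text, are not declared up front.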
What is a Partition Key?
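Under the hood, DynamoDB hashes the Partition Key to decide which physical partition stores each item. A Partition Key with many distinct values (high cardinality) spreads the data, and the read and write load, evenly across partitions. In our case, the Partition Key can be the Twitter user's username. The Partition Key is required for every Query operation: no Query is valid unless you provide a Partition Key value. Reading every record regardless of key is a Scan, which is a separate and more expensive operation.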
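To make the Partition Key requirement concrete, here is a minimal sketch of the parameters for fetching one user's Tweets. It only builds a plain object using DynamoDB's Query field names (`KeyConditionExpression`, `ExpressionAttributeNames`, `ExpressionAttributeValues`, `ScanIndexForward`); it does not call AWS. The helper `tweetsByUser`, the `TweetQueryParams` type, the table name `Tweets`, and the attribute name `username` are assumptions made for this example.

```typescript
// Local type covering only the Query fields used in this sketch.
interface TweetQueryParams {
  TableName: string;
  KeyConditionExpression: string;
  ExpressionAttributeNames: Record<string, string>;
  ExpressionAttributeValues: Record<string, string | number>;
  ScanIndexForward: boolean;
}

// Hypothetical helper: builds the parameters for "all Tweets of one user".
function tweetsByUser(username: string): TweetQueryParams {
  return {
    TableName: "Tweets", // assumed table name
    // The Partition Key must be matched with an equality condition.
    KeyConditionExpression: "#u = :u",
    ExpressionAttributeNames: { "#u": "username" },
    ExpressionAttributeValues: { ":u": username },
    // true returns items in ascending Sort Key order.
    ScanIndexForward: true,
  };
}

console.log(tweetsByUser("alice"));
```

Leaving the key condition out is what produces errors about keys: a Query always needs to know which partition to read from.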
What is a Sort Key?
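The Sort Key is an identifier which narrows things down to a specific record within a partition. For instance, we need a way to distinguish between different Tweets of a specific user. We can achieve this by letting our Sort Key be a timestamp of when the Tweet was added. Items that share a Partition Key are stored in Sort Key order, which is what lets us read a user's Tweets chronologically. The Sort Key is optional when it comes to doing queries: leave it out and you get every item for that Partition Key.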
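Here is a hedged sketch of the two ways the Sort Key can come into play: a Query narrowed to a time window, and the full key needed to fetch one exact Tweet. As before, these are plain objects in the shape of DynamoDB's Query and GetItem inputs, not real calls. The helpers `tweetsBetween` and `tweetKey`, the `TweetRangeQuery` type, and the table and attribute names are assumptions. Because `timestamp` is a DynamoDB reserved word, the sketch refers to it through the placeholder `#ts`.

```typescript
// Local type covering only the Query fields used in this sketch.
interface TweetRangeQuery {
  TableName: string;
  KeyConditionExpression: string;
  ExpressionAttributeNames: Record<string, string>;
  ExpressionAttributeValues: Record<string, string | number>;
}

// Hypothetical helper: one user's Tweets inside a time window (inclusive).
function tweetsBetween(username: string, start: number, end: number): TweetRangeQuery {
  return {
    TableName: "Tweets", // assumed table name
    // Equality on the Partition Key, plus an optional Sort Key condition.
    KeyConditionExpression: "#u = :u AND #ts BETWEEN :start AND :end",
    ExpressionAttributeNames: { "#u": "username", "#ts": "timestamp" },
    ExpressionAttributeValues: { ":u": username, ":start": start, ":end": end },
  };
}

// Hypothetical helper: the full Primary Key that identifies a single Tweet,
// in the shape a GetItem request expects for its Key field.
function tweetKey(username: string, timestamp: number): { username: string; timestamp: number } {
  return { username, timestamp };
}

console.log(tweetsBetween("alice", 1700000000000, 1700086400000));
console.log(tweetKey("alice", 1700040000000));
```

The difference matters: a Query can get by with the Partition Key alone, but fetching exactly one item from a table with a composite key needs both parts.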
What is the takeaway?
Notice that we can produce a unique Primary Key by keeping the same Partition Key (username) but varying the Sort Key (timestamp). A user can Tweet at two different times of the day: both Tweets share the username as Partition Key, while each gets its own time of posting (timestamp) as Sort Key. Because the Primary Key is the combination of the two, and the timestamp changes from Tweet to Tweet, every Primary Key stays unique.
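The uniqueness rule can be illustrated with a toy in-memory model. This is not how DynamoDB stores data; it is just a `Map` keyed by the username and timestamp together, with made-up names (`Tweet`, `putTweet`, `compositeKey`) and made-up sample data. It mimics one real behavior: writing an item whose Primary Key already exists replaces the old item by default.

```typescript
interface Tweet {
  username: string;  // Partition Key (assumed attribute name)
  timestamp: number; // Sort Key (assumed attribute name), epoch milliseconds
  text: string;
}

// Toy stand-in for a table, keyed by the composite Primary Key.
const table = new Map<string, Tweet>();

// Combine both key parts into one unambiguous lookup string.
function compositeKey(username: string, timestamp: number): string {
  return JSON.stringify([username, timestamp]);
}

// Writing with an existing key overwrites the earlier item.
function putTweet(tweet: Tweet): void {
  table.set(compositeKey(tweet.username, tweet.timestamp), tweet);
}

// Same user, two different times: two distinct items.
putTweet({ username: "alice", timestamp: 1700000000000, text: "Good morning" });
putTweet({ username: "alice", timestamp: 1700040000000, text: "Good evening" });
// Same user AND same time: replaces the second item instead of adding a third.
putTweet({ username: "alice", timestamp: 1700040000000, text: "Good evening (edited)" });

console.log(table.size); // 2
```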
The key takeaway for me was: think your data through. Ask yourself, how will the data be accessed? What access patterns will I have? How do I need to access specific records?
Choosing the Right DynamoDB Partition Key
Understand Access Patterns for Time Series Data