Amazon DynamoDB, the AWS NoSQL database, is gaining a lot of popularity these days for its capabilities. However, before it is used in production, proper analysis needs to be done. Especially if you have spent most of your time working with relational databases, it’s important to be completely sure before moving to a NoSQL database.
If you are a beginner with AWS, I recommend reading the cloud articles below:
- AWS Free Tier Usage Facts
- Amazon S3 WordPress Integration
- Amazon CloudFront Integration with WordPress
Amazon DynamoDB – A Cloud Database?
- Amazon DynamoDB is a fully managed NoSQL database service that promises single-digit millisecond performance for any amount of data.
- I needed to do a benchmark analysis for our NoSQL use case. I thought we could bring up a small DynamoDB instance to benchmark its performance against a conventional MySQL database for our use case. However, I was absolutely wrong.
- After signing in to the AWS console, we realized there is no concept of a physical or virtual instance in DynamoDB. It’s a database service that spreads the data and traffic for your tables over a sufficient number of servers to handle your throughput and storage requirements.
Amazon DynamoDB Key Features
- Amazon DynamoDB can be run locally in a development environment. This is great for developers, who can develop, debug, and write unit tests without spending a penny on the remote service.
- Amazon DynamoDB supports storing, querying, and updating documents. A row is equivalent to a document.
- It’s schema-less: Amazon DynamoDB has a flexible database schema. The data items in a table need not have the same attributes or even the same number of attributes. Multiple data types (strings, numbers, binary data, and sets) add richness to the data model.
- Amazon DynamoDB gives you the flexibility to query on any attribute (column) using global and local secondary indexes. Secondary indexes contain hash or hash-and-range keys that can differ from the keys of the table’s primary index.
- Amazon DynamoDB integrates with AWS Lambda to provide Triggers. Using Triggers, you will be able to automatically execute a custom Lambda function when item level changes in a DynamoDB table are detected.
- Amazon Redshift integrates with Amazon DynamoDB, adding advanced business intelligence capabilities and a powerful SQL-based interface. When you copy data from a DynamoDB table into Amazon Redshift, you can perform complex data analysis queries on that data, including joins with other tables in your Amazon Redshift cluster.
- Amazon DynamoDB integrates with Elasticsearch through the Amazon DynamoDB Logstash plugin. With this integration, you can easily search DynamoDB content such as messages, locations, tags, and keywords. It suits use cases like product search for an e-commerce website.
- Amazon DynamoDB supports cross-region replication that automatically replicates DynamoDB tables across multiple AWS regions.
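The Lambda trigger integration listed above can be sketched as a small Python handler. This is a minimal sketch, not code from the benchmark: the `handler` name, the summary it returns, and the sample event are illustrative assumptions, while the `Records`, `eventName`, and `dynamodb.Keys` fields follow the DynamoDB Streams event shape that Lambda passes to a function.

```python
from collections import Counter


def handler(event, context=None):
    """Summarize the item-level changes in one DynamoDB Streams batch."""
    counts = Counter()
    changed_keys = []
    for record in event.get("Records", []):
        # eventName is INSERT, MODIFY, or REMOVE for item-level changes.
        counts[record["eventName"]] += 1
        # Keys holds the primary key of the changed item in DynamoDB's
        # typed JSON form, e.g. {"user_id": {"N": "1"}}.
        changed_keys.append(record["dynamodb"]["Keys"])
    return {"counts": dict(counts), "keys": changed_keys}


# Hypothetical sample event with two changes to one table.
sample_event = {
    "Records": [
        {"eventName": "INSERT",
         "dynamodb": {"Keys": {"user_id": {"N": "1"}, "list_id": {"N": "7"}}}},
        {"eventName": "MODIFY",
         "dynamodb": {"Keys": {"user_id": {"N": "2"}, "list_id": {"N": "7"}}}},
    ]
}
```

A real trigger would replace the summary with whatever side effect is needed, such as sending a notification or updating another table.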
Limitations in Amazon DynamoDB
- You can’t query items without a condition on the primary key or on the key of one of the secondary indexes. A Scan can be used in that case, but Scan is slow and not recommended as per the Amazon DynamoDB docs.
- Secondary indexes by default do not allow selecting attributes that are not part of the index. To enable this, you need to either project these attributes into the index (which duplicates them on disk with the index) or run a second query after getting the primary key from the first query. However, the extra storage is not a big concern, since 25 GB of storage is free every month.
- You can define up to 5 local secondary indexes and 5 global secondary indexes per table. This could be a limitation for a complex, business-intensive table where various types of queries need to run.
- As of now, Local Secondary Indexes (LSIs) cannot be added after the table has been created. Changing them means creating a new table, which can be a management headache, so choose your indexes wisely. This limitation does not apply to Global Secondary Indexes (GSIs).
- As of now, existing LSIs cannot be deleted or modified after the table has been created. The Amazon DynamoDB import/export features will be useful if you have to do it. This limitation does not apply to GSIs either.
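The difference between a Query and a Scan shows up directly in the request parameters. The sketch below builds both as plain dictionaries in the shape the DynamoDB API expects; the function names, the table and index names, and the attribute types (number for `list_id`, string for `status`) are assumptions made for illustration.

```python
def build_query_params(table, index, list_id, status):
    """Request parameters for a Query against a secondary index."""
    return {
        "TableName": table,
        "IndexName": index,
        # A Query must put a condition on the index's hash key;
        # the range key condition is optional.
        "KeyConditionExpression": "list_id = :l AND #s = :s",
        # "status" is a DynamoDB reserved word, so it needs a placeholder.
        "ExpressionAttributeNames": {"#s": "status"},
        "ExpressionAttributeValues": {
            ":l": {"N": str(list_id)},
            ":s": {"S": status},
        },
    }


def build_scan_params(table, list_id):
    """Request parameters for a Scan: reads every item, then filters."""
    return {
        "TableName": table,
        # The filter is applied after items are read, so the whole
        # table is still consumed from the provisioned read throughput.
        "FilterExpression": "list_id = :l",
        "ExpressionAttributeValues": {":l": {"N": str(list_id)}},
    }
```

With boto3 installed and credentials configured, these dictionaries could be passed as `client.query(**params)` and `client.scan(**params)` on a DynamoDB client.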
Data Creation for Benchmarking
I needed to benchmark Amazon DynamoDB queries with the use case below:
- Create a lists table with static attributes.
- Primary index attributes are user_id and list_id. Projection => “All Attributes”.
- Populate the table with 1+ million items, using a proper hash and range primary index.
- Add a secondary index on list_id and status.
- Items can also have dynamic attributes such as V1, V2, and V3. For example, V1 and V2 are present for list_id 1, while V1 and V3 are present for list_id 2.
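The table described above could be defined roughly as follows. This is a sketch under several assumptions that the setup does not state: `user_id` is taken as the hash key and `list_id` as the range key, the secondary index is a global one with `list_id` as hash and `status` as range, the numeric and string attribute types are guesses, and the index name is made up. The dictionary follows the parameter shape of the DynamoDB CreateTable call.

```python
def lists_table_definition(read_units=1, write_units=1):
    """Build CreateTable parameters for the benchmark's lists table."""
    throughput = {
        "ReadCapacityUnits": read_units,
        "WriteCapacityUnits": write_units,
    }
    return {
        "TableName": "lists",
        # Only key attributes are declared. Dynamic attributes such as
        # V1, V2, V3 need no declaration because the table is schema-less.
        "AttributeDefinitions": [
            {"AttributeName": "user_id", "AttributeType": "N"},
            {"AttributeName": "list_id", "AttributeType": "N"},
            {"AttributeName": "status", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "user_id", "KeyType": "HASH"},
            {"AttributeName": "list_id", "KeyType": "RANGE"},
        ],
        "GlobalSecondaryIndexes": [
            {
                "IndexName": "list_id-status-index",  # hypothetical name
                "KeySchema": [
                    {"AttributeName": "list_id", "KeyType": "HASH"},
                    {"AttributeName": "status", "KeyType": "RANGE"},
                ],
                # "All Attributes" projection: every attribute is copied
                # into the index, so no second lookup is needed.
                "Projection": {"ProjectionType": "ALL"},
                "ProvisionedThroughput": dict(throughput),
            }
        ],
        "ProvisionedThroughput": dict(throughput),
    }
```

With boto3, the result could be passed as `client.create_table(**lists_table_definition(1, 1))`.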
Use Cases to benchmark
- Batch write in batch of 25 records.
- One by one write operation.
- Query on the list_id and status attributes and fetch all attributes. A list can have up to 0.1 million records.
- Query on the list_id and status attributes and fetch only the user_id attribute, for a list of up to 0.1 million records.
- Scan query operation on list_id.
- I created 0.67 million items/rows in the table.
- The table was created in the EU (Ireland) region.
- To create the data, we used the Rails Faker and Fabricate gems to generate random values.
- Read throughput: number of item reads per second, where each unit covers an item of up to 4 KB.
- Write throughput: number of item writes per second, where each unit covers an item of up to 1 KB.
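The batch write use case relies on the fact that a single BatchWriteItem call accepts at most 25 put or delete requests. A minimal sketch of the chunking step follows; the helper names and the sample items are hypothetical, and the benchmark's own data script was written in Ruby, not Python.

```python
def chunked(items, size=25):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batch_write_requests(table, items, size=25):
    """Build one BatchWriteItem request body per chunk of items."""
    return [
        {"RequestItems": {table: [{"PutRequest": {"Item": item}}
                                  for item in chunk]}}
        for chunk in chunked(items, size)
    ]


# 60 hypothetical items in DynamoDB's typed JSON form.
items = [{"user_id": {"N": str(i)}, "list_id": {"N": "1"}} for i in range(60)]
requests = batch_write_requests("lists", items)
```

Each request body could be sent with boto3 as `client.batch_write_item(**request)`. A real loader would also need to resend anything returned under `UnprocessedItems`, which DynamoDB uses when part of a batch is throttled.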
You can increase or decrease the read/write throughput at any time. If your application’s read or write requests exceed the provisioned throughput for a table, those requests might be throttled. It’s important to keep monitoring throughput from the dashboard and adjust it until it matches the production requirement.
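As a rough guide for choosing those numbers, the throughput rules can be turned into a small calculation. The sketch below assumes strongly consistent reads and rounds item sizes up to the next 4 KB (reads) or 1 KB (writes) boundary; the function names are made up for this example.

```python
import math


def read_capacity_units(reads_per_second, item_size_kb):
    """Read units needed: one unit covers one read/second of up to 4 KB."""
    # Larger items consume one extra unit per additional 4 KB block.
    return reads_per_second * math.ceil(item_size_kb / 4)


def write_capacity_units(writes_per_second, item_size_kb):
    """Write units needed: one unit covers one write/second of up to 1 KB."""
    # Larger items consume one extra unit per additional 1 KB block.
    return writes_per_second * math.ceil(item_size_kb / 1)
```

For example, 100 writes per second of items no larger than 1 KB need 100 write units, while the same rate with 2.5 KB items needs 300.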
Initially I set the read throughput to 1 and the write throughput to 1, and the script that creates the data ran very slowly. After I increased the write throughput to 100, the script finished creating the benchmark data in under 1 hour. This was great. We then ran various queries as per the benchmark use cases.
|Use Case||Read Throughput||Write Throughput||Benchmark Results|
|Insert 1 by 1||-||100||6 ms|
|Batch Insert (25)||-||100||1.7 ms|
|Query primary index (fetch all attributes)||100||-||1 ms|
|Scan on non-primary index||100||-||580 ms|
|Query secondary index (fetch all attributes)||100||-||44 ms|
Benchmark results were quite positive, and we were convinced to use it for our use case. After settling on Amazon DynamoDB, we researched its pricing and availability for production usage.
The cost came to $0.25/day ($7.50/month), which is reasonable.
- The service runs across Amazon’s proven, high-availability data centers.
- The service replicates data across three facilities in an AWS Region to provide fault tolerance in the event of a server failure or availability zone outage.
- Amazon DynamoDB does the database management and administration, and you simply store and request your data.
- Automatic replication and failover provide built-in fault tolerance, high availability, and data durability.
Database Backup using Snapshot & Streams
Below are the ways you can take a daily backup of a DynamoDB table.
- Use the AWS Console to manually trigger the export process, which internally spawns and uses AWS Data Pipeline and AWS EMR. You will be charged for this.
- Set up custom instances of Data Pipeline and EMR and write cron jobs that take snapshots using them.
- Use Scheduled Tasks in Lambda to invoke a Node.js snippet that dumps the entire table content to a CSV file in S3.
- Streams give us the ability to capture changes to items stored in a DynamoDB table.
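The CSV dump option has one wrinkle: items in the same table need not share attributes, so the column set has to be collected from the data. The sketch below shows only that serialization step, in Python instead of the Node.js mentioned above. Reading the table and uploading the file to S3 are left out, and the function name and sample rows are hypothetical.

```python
import csv
import io


def items_to_csv(items):
    """Serialize plain-dict items to CSV, one column per attribute seen."""
    # Items need not share attributes, so take the union of all names.
    columns = sorted({name for item in items for name in item})
    buffer = io.StringIO()
    # restval fills the cells of attributes an item does not have.
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(items)
    return buffer.getvalue()


# Hypothetical items, already converted from DynamoDB's typed form.
rows = [
    {"user_id": 1, "list_id": 1, "V1": "a", "V2": "b"},
    {"user_id": 2, "list_id": 2, "V1": "c", "V3": "d"},
]
csv_text = items_to_csv(rows)
```

Collecting the columns first means the whole item set is held in memory, which is fine for a sketch but would need a two-pass or streaming approach for a large table.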
Amazon DynamoDB Redshift Integration for Data Backup
copy favoritemovies from 'dynamodb://ProductCatalog'
- Amazon DynamoDB attributes that do not match a column in the Amazon Redshift table are discarded. This means that every time we add a new attribute, we have to come back and alter the Redshift table schema.
- Only Amazon DynamoDB attributes with scalar STRING and NUMBER data types are supported. The Amazon DynamoDB BINARY and SET data types are not supported.
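Before running a copy, it can help to check which attributes of an item would be affected by the two restrictions above. The following is a hypothetical pre-flight check, not part of Redshift or any AWS SDK. It assumes items in DynamoDB's typed JSON form and, as a further assumption, compares attribute and column names case-insensitively.

```python
SUPPORTED_TYPES = {"S", "N"}  # scalar STRING and NUMBER


def classify_attributes(item, redshift_columns):
    """Sort an item's attributes into loaded, discarded, and unsupported."""
    columns = {name.lower() for name in redshift_columns}
    report = {"loaded": [], "discarded": [], "unsupported": []}
    for name, typed_value in sorted(item.items()):
        # A typed value has exactly one key, e.g. {"N": "1"} -> "N".
        (type_code,) = typed_value
        if name.lower() not in columns:
            report["discarded"].append(name)    # no matching column
        elif type_code not in SUPPORTED_TYPES:
            report["unsupported"].append(name)  # e.g. BINARY or SET
        else:
            report["loaded"].append(name)
    return report


# Hypothetical item and Redshift column list.
item = {
    "user_id": {"N": "1"},
    "status": {"S": "active"},
    "tags": {"SS": ["a", "b"]},
    "V9": {"S": "x"},
}
report = classify_attributes(item, ["user_id", "status", "tags"])
```

Here `V9` has no matching column and `tags` is a string set, so only `status` and `user_id` would load cleanly.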
So far I haven’t received any relevant answer on this. I will update this article if I get one.
Security of DynamoDB
- Always sign API calls with an access key ID and secret key.
- Use IAM roles with a proper read/write policy.
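A read-only policy for a single table might look like the sketch below, built as a Python dictionary and serialized to JSON. The function name, the account ID, and the table name are placeholders, and the action list is one possible choice for read access, not a recommendation from AWS.

```python
import json


def read_only_table_policy(region, account_id, table):
    """Build an IAM policy document granting read-only access to one table."""
    table_arn = f"arn:aws:dynamodb:{region}:{account_id}:table/{table}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "dynamodb:GetItem",
                    "dynamodb:BatchGetItem",
                    "dynamodb:Query",
                ],
                # The index ARN is needed to query secondary indexes.
                "Resource": [table_arn, f"{table_arn}/index/*"],
            }
        ],
    }


# Placeholder account ID and table name; eu-west-1 is the EU (Ireland) region.
policy = read_only_table_policy("eu-west-1", "123456789012", "lists")
policy_json = json.dumps(policy, indent=2)
```

A write policy would follow the same shape with actions such as `dynamodb:PutItem` and `dynamodb:BatchWriteItem`, attached to a separate role so that readers cannot modify data.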