Tuesday, September 27, 2016

AWS CDA Study List - DynamoDB

DynamoDB 

Basics

  • NoSQL database 
  • Key-value store
  • Managed by Amazon; data is automatically replicated across three (3) Availability Zones within the selected region
  • Supports eventually and strongly consistent read models
    • An eventually consistent read may be served by a replica that has not yet received the latest write
    • A strongly consistent read reflects all writes that succeeded before the read
    • This means that an eventually consistent read can return data that is not the latest
  • Has additional features
    • Streams, which works like a transaction log (an ordered record of item-level changes)
    • Triggers, which, as the name says, work like traditional RDBMS triggers but use AWS Lambda for the actions
  • Not ACID in RDBMS terms (e.g. no Oracle SCN), even though it offers strongly consistent reads

Data model

  • Schema-less
  • A table is a collection of items, and an item consists of one or more attributes
  • A table must have a primary key
  • Supports two types of primary keys: (1) hash key only and (2) hash key + sort key (see the table-creation sketch after this list)
  • Data is spread across partitions using the hash key attribute, so a primary key with many distinct, evenly accessed values spreads the data evenly over partitions
  • Data is stored in partitions
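
As a minimal sketch of the composite key type, here is what creating a table with a hash + sort primary key could look like with the AWS SDK for JavaScript; the table name, attribute names, and region are illustrative assumptions:

// Sketch: creating a table with a composite primary key --
// "Artist" as the hash (partition) key, "SongTitle" as the sort key.
var AWS = require("aws-sdk");
var dynamodb = new AWS.DynamoDB({ region: "us-east-1" }); // region is an assumption

var params = {
    TableName: "Music",
    KeySchema: [
        { AttributeName: "Artist", KeyType: "HASH" },    // hash (partition) key
        { AttributeName: "SongTitle", KeyType: "RANGE" } // sort key
    ],
    AttributeDefinitions: [
        { AttributeName: "Artist", AttributeType: "S" },
        { AttributeName: "SongTitle", AttributeType: "S" }
    ],
    ProvisionedThroughput: { ReadCapacityUnits: 5, WriteCapacityUnits: 5 }
};

dynamodb.createTable(params, function (err, data) {
    if (err) console.error(err);
    else console.log("Created table:", data.TableDescription.TableName);
});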

Limits

  • Item size must be between 1 byte and 400 KB (including both the item's key and its attributes)
  • The default maximum number of tables per region is 256

The following topics are exam questions collected from the Internet and should be evaluated as such. The answers are mine and have been cross-checked against answers found online, but they might still be wrong.


Basic operations

Query


  • A basic query based on the primary key, which returns the data matching that key
  • By default, results are sorted by the sort key in ascending order. This can be changed by setting the ScanIndexForward parameter to false
  • By default, reads are eventually consistent, but this can be changed to strongly consistent
  • More efficient than Scan
  • Requires the primary key, meaning you must provide the partition key attribute name and a distinct value to search for


In an RDBMS the same would be done with:
select * from music
where artist = 'No One You Know';

In DynamoDB this is done in the following way:
var params = {
    TableName: "Music",
    KeyConditionExpression: "Artist = :artist", // <--- PRIMARY KEY
    ExpressionAttributeValues: { ":artist": "No One You Know" }
};
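
A minimal sketch of actually executing this query, assuming the DocumentClient from the AWS SDK for JavaScript and the params object above (the region is an assumption):

// Runs the query defined above; data.Items will hold every song by the
// given artist. A sketch only -- error handling is minimal.
var AWS = require("aws-sdk");
var docClient = new AWS.DynamoDB.DocumentClient({ region: "us-east-1" });

docClient.query(params, function (err, data) {
    if (err) console.error(err);
    else data.Items.forEach(function (item) { console.log(item.SongTitle); });
});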


Scan


  • A basic operation that returns all data in the table
  • Used to dump the whole table
  • The impact can be reduced by setting a smaller page size (see the pagination sketch after the example below)
  • Performance is slower than Query, because DynamoDB reads the whole table


In an RDBMS the same would be done with:
select * from music ; 

In DynamoDB this is done in the following way:
var params = { TableName: "Music"};
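
A minimal sketch of running that scan with the DocumentClient from the AWS SDK for JavaScript; since a single Scan call returns at most 1 MB, the sketch keeps paging while LastEvaluatedKey is present:

var AWS = require("aws-sdk");
var docClient = new AWS.DynamoDB.DocumentClient({ region: "us-east-1" });

function scanAll(params) {
    docClient.scan(params, function (err, data) {
        if (err) return console.error(err);
        console.log("Got " + data.Items.length + " items");
        // A single Scan call returns at most 1 MB; keep going while
        // DynamoDB reports there is more data.
        if (data.LastEvaluatedKey) {
            params.ExclusiveStartKey = data.LastEvaluatedKey;
            scanAll(params);
        }
    });
}

scanAll({ TableName: "Music" });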


ProjectionExpression


  • Can be used to limit which attributes a Query or a Scan returns (like the column list of a SQL SELECT); note that it selects attributes rather than filtering items


In an RDBMS the same would be done with:
select SongTitle from music
where artist = 'No One You Know';

In a DynamoDB Query this is done in the following way:
var params = {
    TableName: "Music",
    ProjectionExpression: "SongTitle",
    KeyConditionExpression: "Artist = :artist",
    ExpressionAttributeValues: { ":artist": "No One You Know" }
};

In an RDBMS the same would be done with:
select SongTitle from music ;

In a DynamoDB Scan this is done in the following way:
var params = {
    TableName: "Music",
    ProjectionExpression: "SongTitle"
};


DynamoDB provisioned throughput

Which approach below provides the least impact to provisioned throughput on the “Product” table?

A. Create an “Images” DynamoDB table to store the Image with a foreign key constraint to the “Product” table
B. Add an image data type to the “Product” table to store the images in binary format
C. Serialize the image and store it in multiple DynamoDB tables
D. Store the images in Amazon S3 and add an S3 URL pointer to the “Product” table item for each image

Why?
Every eventually consistent read of up to 4 KB consumes 0.5 read capacity units, so with images (which are likely much larger than 4 KB) it is wiser to store the image in S3 and keep only a pointer to it in the table.
https://www.reddit.com/r/aws/comments/4h8zga/dynamodb_vs_s3_when_using_images/
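
As a hedged sketch of option D (bucket, file, and attribute names are made up): the large binary goes to S3 and the table item only stores a small URL attribute:

var AWS = require("aws-sdk");
var fs = require("fs");
var s3 = new AWS.S3();
var docClient = new AWS.DynamoDB.DocumentClient();

var imageBuffer = fs.readFileSync("product-42.jpg"); // hypothetical local file

// 1. Upload the large binary to S3...
s3.putObject({ Bucket: "product-images", Key: "42.jpg", Body: imageBuffer }, function (err) {
    if (err) return console.error(err);
    // 2. ...and store only a small pointer in the "Product" item.
    docClient.put({
        TableName: "Product",
        Item: {
            ProductId: 42,
            ImageUrl: "https://product-images.s3.amazonaws.com/42.jpg"
        }
    }, function (err) {
        if (err) console.error(err);
    });
});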


DynamoDB limitations 

Which DynamoDB limits can be raised by contacting AWS support? Choose 2 answers
A. The number of hash keys per account
B. The maximum storage used per account
C. The number of tables per account
D. The number of local secondary indexes per account
E. The number of provisioned throughput units per account
Why?
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Limits.html

Tables Per Account

For any AWS account, there is an initial limit of 256 tables per region.
You can request an increase on this limit. For more information, go to http://aws.amazon.com/support.

Provisioned Throughput Minimums and Maximums

For any table or global secondary index, the minimum settings for provisioned throughput are 1 read capacity unit and 1 write capacity unit.
An AWS account places some initial maximum limits on the throughput you can provision:

US East (N. Virginia) Region:

  • Per table – 40,000 read capacity units and 40,000 write capacity units
  • Per account – 80,000 read capacity units and 80,000 write capacity units

All Other Regions:

  • Per table – 10,000 read capacity units and 10,000 write capacity units
  • Per account – 20,000 read capacity units and 20,000 write capacity units

The provisioned throughput limit includes the sum of the capacity of the table together with the capacity of all of its global secondary indexes.
You can request an increase on any of these limits. For more information, see http://aws.amazon.com/support.


DynamoDB and IAM
Which of the following items are required to allow an application deployed on an EC2 instance to write data to a DynamoDB table? Assume that no security keys are allowed to be stored on the EC2 instance. Choose 2 answers

A. Create an IAM User that allows write access to the DynamoDB table.
B. Launch an EC2 Instance with the IAM User included in the launch configuration.
C. Create an IAM Role that allows write access to the DynamoDB table.
D. Launch an EC2 Instance with the IAM Role included in the launch configuration.
E. Add an IAM Role to a running EC2 instance.
F. Add an IAM User to a running EC2 Instance.

Why?

An IAM role can't be added to or changed on an EC2 instance after it has been provisioned, and a role is more flexible (and safer, since no keys are stored on the instance) than a single IAM user.


DynamoDB provisioned throughput efficiency 

Which of the following is an example of a good DynamoDB hash key schema for provisioned throughput efficiency?
A. User ID, where the application has many different users.
B. Status Code, where most status codes are the same.
C. Device ID, where one is by far more popular than all the others.
D. Game Type, where there are three possible game types.
Why?
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GuidelinesForTables.html#GuidelinesForTables.UniformWorkload
If a single table has only a very small number of partition key values, consider distributing your write operations across more distinct partition key values. In other words, structure the primary key elements to avoid one "hot" (heavily requested) partition key value that slows overall performance. 
For example, consider a table with a composite primary key. The partition key represents the item's creation date, rounded to the nearest day. The sort key is an item identifier. On a given day, say 2014-07-09, all of the new items will be written to that same partition key value. 


DynamoDB optimistic concurrency control

Which statements about DynamoDB are true? Choose 2 answers

A. DynamoDB uses optimistic concurrency control
B. DynamoDB restricts item access during writes
C. DynamoDB uses a pessimistic locking model
D. DynamoDB restricts item access during reads
E. DynamoDB uses conditional writes for consistency

Why?

http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Expressions.Modifying.html#Expressions.Modifying.ConditionalWrites and https://aws.amazon.com/dynamodb/faqs/

Q: Does Amazon DynamoDB support conditional operations?

Yes, you can specify a condition that must be satisfied for a put, update, or delete operation to be completed on an item . To perform a conditional operation, you can define a ConditionExpression that is constructed from the following:
Boolean functions: ATTRIBUTE_EXISTS, CONTAINS, and BEGINS_WITH
Comparison operators: =, <>, <, >, <=, >=, BETWEEN, and IN
Logical operators: NOT, AND, and OR.
You can construct a free-form conditional expression that combines multiple conditional clauses, including nested clauses. Conditional operations allow users to implement optimistic concurrency control systems on DynamoDB. For more information on conditional operations, please see our documentation.
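
A minimal sketch of optimistic concurrency control built on a conditional write, assuming a hypothetical Version attribute on the Music items:

// The update succeeds only if the item's version is still the one we
// read; otherwise DynamoDB returns ConditionalCheckFailedException and
// we can re-read and retry.
var AWS = require("aws-sdk");
var docClient = new AWS.DynamoDB.DocumentClient();

var params = {
    TableName: "Music",
    Key: { Artist: "No One You Know", SongTitle: "Call Me Today" },
    UpdateExpression: "SET Plays = :plays, Version = :newVersion",
    ConditionExpression: "Version = :expectedVersion",
    ExpressionAttributeValues: {
        ":plays": 11,
        ":newVersion": 2,
        ":expectedVersion": 1 // the version we saw when we read the item
    }
};

docClient.update(params, function (err) {
    if (err && err.code === "ConditionalCheckFailedException") {
        console.log("Item was modified by someone else -- re-read and retry");
    } else if (err) {
        console.error(err);
    }
});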

DynamoDB error messages

You are writing to a DynamoDB table and receive the following exception: ”ProvisionedThroughputExceededException” though according to your Cloudwatch metrics for the table, you are not exceeding your provisioned throughput. What could be an explanation for this?

A. You haven’t provisioned enough DynamoDB storage instances
B. You’re exceeding your capacity on a particular Range Key
C. You’re exceeding your capacity on a particular Hash Key
D. You’re exceeding your capacity on a particular Sort Key
E. You haven’t configured DynamoDB Auto Scaling triggers

Why?

http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Programming.Errors.html

ProvisionedThroughputExceededException
Message: You exceeded your maximum allowed provisioned throughput for a table or for one or more global secondary indexes. To view performance metrics for provisioned throughput vs. consumed throughput, open the Amazon CloudWatch console.

Example: Your request rate is too high. The AWS SDKs for DynamoDB automatically retry requests that receive this exception. Your request is eventually successful, unless your retry queue is too large to finish. Reduce the frequency of requests, using exponential backoff. OK to retry? Yes
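
The SDKs already retry with exponential backoff for you, but as a conceptual sketch the retry loop could look like this (function name, delays, and attempt limit are made up):

// Conceptual sketch of exponential backoff -- illustration only,
// since the AWS SDKs do this automatically.
function putWithBackoff(docClient, params, attempt) {
    docClient.put(params, function (err) {
        if (err && err.code === "ProvisionedThroughputExceededException" && attempt < 5) {
            var delay = Math.pow(2, attempt) * 100; // 100 ms, 200 ms, 400 ms, ...
            setTimeout(function () {
                putWithBackoff(docClient, params, attempt + 1);
            }, delay);
        } else if (err) {
            console.error(err);
        }
    });
}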

DynamoDB write capacity unit calculations

A meteorological system monitors 600 temperature gauges, obtaining temperature samples every minute and saving each sample to a DynamoDB table. Each sample involves writing 1K of data and the writes are evenly distributed over time. How much write throughput is required for the target table?

A. 3600 write capacity units
B. 1 write capacity unit
C. 10 write capacity units
D. 60 write capacity units
E. 600 write capacity units

Why?

Consider that one DynamoDB write capacity unit can handle one write per second of an item up to 1 KB, which means that 100 writes per second of 1 KB items would be calculated as follows:

1 write capacity unit per item × 100 writes per second = 100 write capacity units

100 writes per second of 1.5 KB items would be calculated as follows:

1.5 KB / 1 KB = 1.5 --> 2 write capacity units per item x 100 writes per second  = 200 write capacity units

http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ProvisionedThroughput.html

So in this question, 600 gauges each save a 1 KB sample once every minute, i.e. 600 writes spread evenly over 60 seconds. This is calculated as follows:

1 write capacity unit per item x (600/60) writes per second = 10 write capacity units

DynamoDB strongly consistent vs. eventually consistent read throughput consumption

How is provisioned throughput affected by the chosen consistency model when reading data from a DynamoDB table?

A. Strongly consistent reads use the same amount of throughput as eventually consistent reads
B. Strongly consistent reads use variable throughput depending on read activity
C. Strongly consistent reads use more throughput than eventually consistent reads.
D. Strongly consistent reads use less throughput than eventually consistent reads

Why?

http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ProvisionedThroughput.html

"For example, suppose that you want to read 80 items per second from a table. Suppose that the items are 3 KB in size and you want strongly consistent reads. For this scenario, each read would require one provisioned read capacity unit. To determine this, you divide the item size of the operation by 4 KB, and then round up to the nearest whole number, as shown following:

3 KB / 4 KB = 0.75 --> 1
For this scenario, you need to set the table's provisioned read throughput to 80 read capacity units:

1 read capacity unit per item × 80 reads per second = 80 read capacity units
If you wanted eventually consistent reads instead of strongly consistent reads, you would only need to provision 40 read capacity units."


DynamoDB and web identity federation

Games-R-Us is launching a new game app for mobile devices. Users will log into the game using their existing Facebook account and the game will record player data and scoring information directly to a DynamoDB table. What is the most secure approach for signing requests to the DynamoDB API?

A. Create an IAM user with access credentials that are distributed with the mobile app to sign the requests
B. Distribute the AWS root account access credentials with the mobile app to sign the requests
C. Request temporary security credentials using web identity federation to sign the requests
D. Establish cross account access between the mobile app and the DynamoDB table to sign the requests

Why?
http://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_providers_oidc.html

Imagine that you are creating a mobile app that accesses AWS resources, such as a game that runs on a mobile device and stores player and score information using Amazon S3 and DynamoDB.

When you write such an app, you'll make requests to AWS services that must be signed with an AWS access key. However, we strongly recommend that you do not embed or distribute long-term AWS credentials with apps that a user downloads to a device, even in an encrypted store. Instead, build your app so that it requests temporary AWS security credentials dynamically when needed using web identity federation. 
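
A minimal sketch of option C with the AWS SDK for JavaScript; the role ARN is a placeholder, and the token would come from the Facebook login flow:

var AWS = require("aws-sdk");

var facebookAccessToken = "..."; // obtained from the Facebook SDK login flow

AWS.config.credentials = new AWS.WebIdentityCredentials({
    RoleArn: "arn:aws:iam::123456789012:role/GameScoreRole", // hypothetical role
    ProviderId: "graph.facebook.com",
    WebIdentityToken: facebookAccessToken
});

// Requests made from here on are signed with temporary credentials
// obtained via STS AssumeRoleWithWebIdentity -- no long-term keys
// ship with the app.
var docClient = new AWS.DynamoDB.DocumentClient({ region: "us-east-1" });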


DynamoDB table deletion provisioned throughput consumption

You are inserting 1000 new items every second in a DynamoDB table. Once an hour these items are analyzed and then are no longer needed. You need to minimize provisioned throughput, storage, and API calls. Given these requirements, what is the most efficient way to manage these Items after the analysis?

A. Retain the items in a single table
B. Delete items individually over a 24 hour period
C. Delete the table and create a new table per hour
D. Create a new table per hour

Why?

http://stackoverflow.com/questions/9386456/dynamodb-does-delete-count-against-read-or-write-capacity

http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/WorkingWithTables.html#ProvisionedThroughput

"When you issue a DeleteItem request, DynamoDB uses the size of the deleted item to calculate provisioned throughput consumption."
A. would consume storage, as the items would be retained.
B. would mean a very large number of API calls and consumed write throughput.
C. is the efficient option: DeleteTable is a single API call that consumes no per-item throughput.
D. wouldn't solve the issue of deleting the old, no-longer-needed items.

DynamoDB API calls

What item operation allows the retrieval of multiple items from a DynamoDB table in a single API call?

A. GetItem
B. BatchGetItem
C. GetMultipleItems
D. GetItemRange

Why?
https://aws.amazon.com/dynamodb/faqs/

GetItem – The GetItem operation returns a set of Attributes for an item that matches the primary key. The GetItem operation provides an eventually consistent read by default. If eventually consistent reads are not acceptable for your application, use ConsistentRead.

BatchGetItem – The BatchGetItem operation returns the attributes for multiple items from multiple tables using their primary keys. A single response has a size limit of 16 MB and returns a maximum of 100 items. Supports both strong and eventual consistency.
"For BatchGetItem, each item in the batch is read separately, so DynamoDB first rounds up the size of each item to the next 4 KB and then calculates the total size. The result is not necessarily the same as the total size of all the items. For example, if BatchGetItem reads a 1.5 KB item and a 6.5 KB item, DynamoDB will calculate the size as 12 KB (4 KB + 8 KB), not 8 KB (1.5 KB + 6.5 KB)."

Query –  Gets one or more items using the table primary key, or from a secondary index using the index key. You can narrow the scope of the query on a table by using comparison operators or expressions. You can also filter the query results using filters on non-key attributes. Supports both strong and eventual consistency. A single response has a size limit of 1 MB.
"For Query, all items returned are treated as a single read operation. As a result, DynamoDB computes the total size of all items and then rounds up to the next 4 KB boundary. For example, suppose your query returns 10 items whose combined size is 40.8 KB. DynamoDB rounds the item size for the operation to 44 KB. If a query returns 1500 items of 64 bytes each, the cumulative size is 96 KB."

Scan – Gets all items and attributes by performing a full scan across the table or a secondary index. You can limit the return set by specifying filters against one or more attributes.
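
A minimal sketch of a BatchGetItem call via the DocumentClient; the keys and table name are illustrative:

// Fetches several items in one API call (up to 100 items / 16 MB per
// response).
var AWS = require("aws-sdk");
var docClient = new AWS.DynamoDB.DocumentClient();

var params = {
    RequestItems: {
        "Music": {
            Keys: [
                { Artist: "No One You Know", SongTitle: "Call Me Today" },
                { Artist: "No One You Know", SongTitle: "My Dog Spot" }
            ]
        }
    }
};

docClient.batchGet(params, function (err, data) {
    if (err) console.error(err);
    else console.log(data.Responses.Music);
});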

DynamoDB streams = Transaction logs

DynamoDB read limiting

When using a large Scan operation in DynamoDB, what technique can be used to minimize the impact of a scan on a table’s provisioned throughput?

A. Set a smaller page size for the scan
B. Prewarm the table by updating all items
C. Use parallel scans
D. Define a range index on the table

Why?

http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/QueryAndScanGuidelines.html#QueryAndScanGuidelines.BurstsOfActivity

Reduce Page Size
Because a Scan operation reads an entire page (by default, 1 MB), you can reduce the impact of the scan operation by setting a smaller page size. The Scan operation provides a Limit parameter that you can use to set the page size for your request. Each Scan or Query request that has a smaller page size uses fewer read operations and creates a "pause" between each request. For example, if each item is 4 KB and you set the page size to 40 items, then a Query request would consume only 40 strongly consistent read operations or 20 eventually consistent read operations. A larger number of smaller Scan or Query operations would allow your other critical requests to succeed without throttling.
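
As a small sketch of the quoted technique: the Limit parameter caps the page size, and combined with the LastEvaluatedKey loop shown in the Scan section above it paces the scan so other traffic isn't throttled:

var params = {
    TableName: "Music",
    Limit: 40 // page size in items; each request consumes fewer read units
};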




DynamoDB Hash and Range 

An application stores payroll information nightly in DynamoDB for a large number of employees across hundreds of offices. Item attributes consist of individual name, office identifier, and cumulative daily hours. Managers run reports for ranges of names working in their office. One query is: “Return all Items in this office for names starting with A through E”. Which table configuration will result in the lowest impact on provisioned throughput for this query?

A. Configure the table to have a range index on the name attribute, and a hash index on the office identifier
B. Configure a hash index on the name attribute and no range index
C. Configure the table to have a hash index on the name attribute, and a range index on the office identifier
D. Configure a hash index on the office Identifier attribute and no range index

Why?

http://stackoverflow.com/questions/27329461/what-is-hash-and-range-primary-key

This means that every row's primary key is the combination of the hash and range key. You can make direct gets on single rows if you have both the hash and range key, or you can make a query against the sorted range index, for example: "Get me all rows from the table with hash key X that have range keys greater than Y", or other queries to that effect. These have better performance and lower capacity usage compared to Scans, and to Queries against fields that are not indexed.

And in this question, it's the names we want to range over (not the offices, as in option C).
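
A minimal sketch of the query that option A enables, assuming a hypothetical Payroll table with the office identifier as hash key and the name as range key:

var AWS = require("aws-sdk");
var docClient = new AWS.DynamoDB.DocumentClient();

var params = {
    TableName: "Payroll",
    KeyConditionExpression: "OfficeId = :office AND #n BETWEEN :from AND :to",
    ExpressionAttributeNames: { "#n": "Name" }, // "Name" is a DynamoDB reserved word
    ExpressionAttributeValues: {
        ":office": "office-17",
        ":from": "A",
        ":to": "F" // BETWEEN is inclusive; this covers names starting with A through E
    }
};

docClient.query(params, function (err, data) {
    if (err) console.error(err);
    else console.log(data.Items);
});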


DynamoDB and HTTP error codes

In DynamoDB, what type of HTTP response codes indicate that a problem was found with the client request sent to the service?

A. 5xx HTTP response code
B. 200 HTTP response code
C. 306 HTTP response code
D. 4xx HTTP response code

Why?

https://en.wikipedia.org/wiki/List_of_HTTP_status_codes
  • 1xx Informational
  • 2xx Success
  • 3xx Redirection
  • 4xx Client error
  • 5xx Server error
