Thursday, September 29, 2016

AWS CDA Study List - SQS

SQS

Basics

  • Messaging system for decoupling
  • Does not guarantee FIFO (first in, first out) ordering
  • Messages are delivered at least once (i.e. one or more times), so duplicates are possible
  • Supports short and long polling
    • In short polling the SQS service returns a subset of your messages, or nothing
    • In long polling the SQS service waits until at least one message is available in the queue
    • In simple terms, short polling checks whether messages exist and returns immediately, with or without data. Long polling waits until there is some data (or the wait time expires).
  • VisibilityTimeout -parameter controls how long a received message stays hidden from other consumers of the queue
  • ChangeMessageVisibility API -action can be used to prolong the VisibilityTimeout period of a message that is already being processed
  • ReceiveMessageWaitTimeSeconds -parameter controls whether long polling is on or not (a value greater than 0 enables it; see the sketch after this list)
  • DelaySeconds -parameter can be used to hide a new message from all clients of the queue for a period after it is sent. This happens before the VisibilityTimeout -parameter kicks in
  • GetQueueAttributes API -action with "ApproximateNumberOfMessages" returns the number of messages waiting in the queue
  • GetQueueAttributes API -action with "ApproximateNumberOfMessagesNotVisible" returns the number of messages in flight
  • Dead Letter Queue is used for messages that can't be processed and need further investigation
  • MessageRetentionPeriod -parameter determines how long SQS holds the message in the queue
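
A minimal sketch of how these polling and visibility parameters appear in a ReceiveMessage call, using the AWS SDK for JavaScript (the region and queue URL are placeholder assumptions):

var AWS = require('aws-sdk');
var sqs = new AWS.SQS({region: 'us-east-1'});

var params = {
  QueueUrl: 'https://sqs.us-east-1.amazonaws.com/123456789012/my-queue', // placeholder queue URL
  WaitTimeSeconds: 20,       // long polling: wait up to 20 seconds instead of returning immediately
  VisibilityTimeout: 60,     // hide received messages from other consumers for 60 seconds
  MaxNumberOfMessages: 10
};

sqs.receiveMessage(params, function(err, data) {
  if (err) console.log(err);
  else console.log(data.Messages);  // process the messages here, then delete them
});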

Limits

  • Minimum message size is 1 byte
  • Maximum message size is 256KB
  • At max 120,000 messages can be inflight 
  • Message can only contain XML, JSON and unformatted text
  • Message can contain up to 10 metadata attributes
  • Queue name can be up to 80 characters long
  • Queue name is case-sensitive

Defaults

  • VisibilityTimeout minimum is 0 seconds
  • VisibilityTimeout default is 30 seconds
  • VisibilityTimeout maximum time is 12 hours
  • ReceiveMessageWaitTimeSeconds is 0
  • MessageRetention minimum retention period is 1 minute
  • MessageRetention default retention period is 4 days
  • MessageRetention maximum retention period is 14 days
  • Long Polling maximum timeout value is 20 seconds

The following topics are exam questions collected from the Internet and should be evaluated as such. The answers are mine and have been checked against answers collected from the Internet, but they might still be wrong.


SQS message lifecycle

When a Simple Queue Service message triggers a task that takes 5 minutes to complete, which process below will result in successful processing of the message and remove it from the queue while minimizing the chances of duplicate processing?

A. Retrieve the message with an increased Visibility timeout, delete the message from the queue, process the message
B. Retrieve the message with increased DelaySeconds, process the message, delete the message
C. Retrieve the message with an increased Visibility timeout, process the message, delete the message from the queue
D. Retrieve the message with increased DelaySeconds, delete the message from the queue, process the message


Why?
An increased visibility timeout reduces the possibility of duplicate processing. Deleting the message should always happen only after the message has been processed successfully.
http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/AboutVT.html
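
A rough sketch of the order of operations in option C (the queue URL and the processTask function are hypothetical): receive with a visibility timeout longer than the processing time, process, then delete using the receipt handle.

var AWS = require('aws-sdk');
var sqs = new AWS.SQS({region: 'us-east-1'});
var queueUrl = 'https://sqs.us-east-1.amazonaws.com/123456789012/task-queue';  // placeholder

function processTask(body, done) { /* ... roughly 5 minutes of work ... */ done(); }  // hypothetical task

sqs.receiveMessage({ QueueUrl: queueUrl, VisibilityTimeout: 360 }, function(err, data) {
  if (err || !data.Messages) return;
  var message = data.Messages[0];
  processTask(message.Body, function() {
    sqs.deleteMessage({                          // delete only after successful processing
      QueueUrl: queueUrl,
      ReceiptHandle: message.ReceiptHandle
    }, function(err) { if (err) console.log(err); });
  });
});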





SQS message FIFO or not?


Which of the following statements about SQS is true?

A. Messages will be delivered one or more times and messages will be delivered in First in, First out order
B. Messages will be delivered exactly once and message delivery order is indeterminate
C. Messages will be delivered exactly once and messages will be delivered in First in, First out order
D. Messages will be delivered one or more times and message delivery order is indeterminate

Why?

https://aws.amazon.com/sqs/faqs/
Q: Does Amazon SQS provide first-in-first-out (FIFO) access to messages?

Amazon SQS provides a loose-FIFO capability that attempts to preserve the order of messages. However, we have designed Amazon SQS to be massively scalable using a distributed architecture. Thus, we can't guarantee that you will always receive messages in the exact order you sent them (FIFO).

If your system requires the order of messages to be preserved, place sequencing information in each message so that messages can be ordered when they are received.

Q: Does Amazon SQS provide at-least-once delivery of messages?

Yes. Amazon SQS guarantees that each message is delivered at least once. Amazon SQS stores copies of your messages on multiple servers for redundancy and high availability. On rare occasions, one of the servers that stores a copy of a message might be unavailable when you receive or delete the message.

If this occurs, the copy of the message will not be deleted on that unavailable server, and you might get a copy of that message again when you receive messages (at-least-once delivery).

You must design your applications to be idempotent (that is, they must not be affected adversely when processing the same message more than once).


SQS Visibility timeout default

If a message is retrieved from a queue in Amazon SQS, how long is the message inaccessible to other users by default?

A. 30 seconds
B. 0 seconds
C. 1 hour
D. 1 day
E. Forever

Why? 

http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/AboutVT.html

Each queue starts with a default setting of 30 seconds for the visibility timeout. You can change that setting for the entire queue. Typically, you'll set the visibility timeout to the average time it takes to process and delete a message from the queue. When receiving messages, you can also set a special visibility timeout for the returned messages without changing the overall queue timeout.

Use "ChangeMessageVisibility" action to change it on the fly.


SQS Long polling

Company B provides an online image recognition service and utilizes SQS to decouple system components for scalability. The SQS consumers poll the imaging queue as often as possible to keep end-to-end throughput as high as possible. However, Company B is realizing that polling in tight loops is burning CPU cycles and increasing costs with empty responses. How can Company B reduce the number of empty responses?

A. Set the imaging queue Visibility Timeout attribute to 20 seconds
B. Set the DelaySeconds parameter of a message to 20 seconds
C. Set the Imaging queue ReceiveMessageWaitTimeSeconds attribute to 20 seconds
D. Set the imaging queue MessageRetentionPeriod attribute to 20 seconds

Why? 

http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-long-polling.html

Enable long polling by changing ReceiveMessageWaitTimeSeconds to a value greater than 0.

One benefit of long polling with Amazon SQS is the reduction of the number of empty responses, when there are no messages available to return, in reply to a ReceiveMessage request sent to an Amazon SQS queue. Long polling allows the Amazon SQS service to wait until a message is available in the queue before sending a response. So unless the connection times out, the response to the ReceiveMessage request will contain at least one of the available messages (if any) and up to the maximum number requested in the ReceiveMessage call.

Reducing the number of empty responses and false empty responses also helps reduce your cost of using Amazon SQS.

There are three different API action calls you can use to enable long polling in Amazon SQS, ReceiveMessage, CreateQueue, and SetQueueAttributes. For ReceiveMessage, you configure the WaitTimeSeconds parameter, and for CreateQueue and SetQueueAttributes, you configure the ReceiveMessageWaitTimeSeconds attribute.
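
For example, enabling long polling on an existing queue could look roughly like this with the AWS SDK for JavaScript (the queue URL is a placeholder):

var AWS = require('aws-sdk');
var sqs = new AWS.SQS({region: 'us-east-1'});

// Turn on long polling for the whole queue by setting the ReceiveMessageWaitTimeSeconds attribute
sqs.setQueueAttributes({
  QueueUrl: 'https://sqs.us-east-1.amazonaws.com/123456789012/imaging-queue',  // placeholder
  Attributes: { ReceiveMessageWaitTimeSeconds: '20' }
}, function(err) { if (err) console.log(err); });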


Your application provides data transformation services. Files containing data to be transformed are first uploaded to Amazon S3 and then transformed by a fleet of spot EC2 instances. Files submitted by your premium customers must be transformed with the highest priority. How should you implement such a system?

A. Use a DynamoDB table with an attribute defining the priority level. Transformation instances will scan the table for tasks, sorting the results by priority level.
B. Use Route 53 latency based-routing to send high priority tasks to the closest transformation instances.
C. Use two SQS queues, one for high priority messages, the other for default priority. Transformation instances first poll the high priority queue; if there is no message, they poll the default priority queue.
D. Use a single SQS queue. Each message contains the priority level. Transformation instances poll high-priority messages first.

Why? SQS is a perfect fit for this, and a similar two-queue SQS scenario has appeared as a question on the CDA exam.
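
A rough sketch of the two-queue approach in option C (the queue URLs and the handle function are placeholders): poll the high priority queue first and fall back to the default queue only when it is empty.

var AWS = require('aws-sdk');
var sqs = new AWS.SQS({region: 'us-east-1'});
var highPriorityUrl = 'https://sqs.us-east-1.amazonaws.com/123456789012/transform-high';   // placeholder
var defaultUrl = 'https://sqs.us-east-1.amazonaws.com/123456789012/transform-default';     // placeholder

function handle(messages) { /* transform the files, then delete the messages */ }

function pollOnce() {
  sqs.receiveMessage({ QueueUrl: highPriorityUrl }, function(err, data) {
    if (!err && data.Messages && data.Messages.length > 0) return handle(data.Messages);
    // High priority queue was empty, so check the default priority queue
    sqs.receiveMessage({ QueueUrl: defaultUrl }, function(err, data) {
      if (!err && data.Messages) handle(data.Messages);
    });
  });
}
pollOnce();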


A company has a workflow that sends video files from their on-premise system to AWS for transcoding. They use EC2 worker instances that pull transcoding jobs from SQS. Why is SQS an appropriate service for this scenario?

A. SQS guarantees the order of the messages.
B. SQS synchronously provides transcoding output.
C. SQS checks the health of the worker instances.
D. SQS helps to facilitate horizontal scaling of encoding tasks.

Why? You can rule out A, B and C. A and B are not features of SQS, and because SQS is a pull-based system it doesn't check the health of worker instances.



Tuesday, September 27, 2016

AWS CDA Study List - DynamoDB

DynamoDB 

Basics

  • NoSQL database 
  • Key-value store
  • Managed by Amazon; data is automatically replicated across three (3) availability zones within the selected region
  • Supports eventually consistent and strongly consistent read models
    • An eventually consistent read might not reflect the result of a recently completed write
    • A strongly consistent read returns the most up-to-date data, reflecting all writes that succeeded before the read
    • This means that an eventually consistent read can return data that is not the latest
  • Has additional features
    • Streams, which is like having transaction logs
    • Triggers, which, as the name says, work like traditional RDBMS triggers but use AWS Lambda for the actions
  • Not ACID in RDBMS terms (e.g. no Oracle SCN) even though it has a strongly consistent read model

Data model

  • Schema-less
  • A table is a collection of items, and an item consists of one or more attributes
  • A table must have a primary key
  • Supports two types of primary keys: (1) hash key and (2) hash key + sort key (see the sketch after this list)
  • Data is spread across partitions using the hash key/attribute (primary key), so a primary key with many distinct values spreads the data evenly across partitions
  • Data is stored in partitions
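
As a sketch of the two primary key types, a hypothetical table with a composite key (hash + sort) could be created like this with the AWS SDK for JavaScript (table name, attributes and throughput values are assumptions):

var AWS = require('aws-sdk');
var dynamodb = new AWS.DynamoDB({region: 'us-east-1'});

dynamodb.createTable({
  TableName: 'Music',
  AttributeDefinitions: [
    { AttributeName: 'Artist', AttributeType: 'S' },
    { AttributeName: 'SongTitle', AttributeType: 'S' }
  ],
  KeySchema: [
    { AttributeName: 'Artist', KeyType: 'HASH' },     // partition (hash) key
    { AttributeName: 'SongTitle', KeyType: 'RANGE' }  // sort (range) key
  ],
  ProvisionedThroughput: { ReadCapacityUnits: 5, WriteCapacityUnits: 5 }
}, function(err, data) {
  if (err) console.log(err);
  else console.log(data.TableDescription.TableStatus);
});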

Limits

  • Item size in table must be 1 byte to 400KB (Item key and attributes)
  • Default limit is 256 tables per region, which can be raised by contacting AWS support

The following topics are exam questions collected from the Internet and should be evaluated as such. The answers are mine and have been checked against answers collected from the Internet, but they might still be wrong.


Basic operations

Query


  • Basic query based on the primary key, which returns the data matching the primary key. 
  • By default, results are sorted by the sort key in ascending order. This can be changed by setting the ScanIndexForward parameter to false
  • By default, reads are eventually consistent, but this can be changed to strongly consistent
  • More efficient than Scan
  • Requires the primary key, meaning that you must provide the partition key attribute name and a distinct value to search for


In an RDBMS, the same would be done with:
select * from music 
where artist = 'No One You Know';

In DynamoDB, this is done as follows:
var params = {
  TableName: "Music",
  KeyConditionExpression: "Artist = :artist",                   // <--- PRIMARY KEY
  ExpressionAttributeValues: { ":artist": "No One You Know" }
};


Scan


  • Basic operation that returns all data in the table.
  • Used to dump the whole table
  • The amount of data read per request can be reduced by setting a smaller page size
  • Performance is slower than Query, because DynamoDB reads the whole table 


In an RDBMS, the same would be done with:
select * from music ; 

In DynamoDB, this is done as follows:
var params = { TableName: "Music" };


ProjectionExpression


  • Can be used on a Query or on a Scan to limit which attributes are returned (similar to listing columns in SQL instead of select *)


In an RDBMS, the same would be done with:
select SongTitle from music 
where artist = 'No One You Know';

In a DynamoDB Query, this is done as follows:
var params = {
  TableName: "Music",
  ProjectionExpression: "SongTitle",
  KeyConditionExpression: "Artist = :artist",
  ExpressionAttributeValues: { ":artist": "No One You Know" }
};

In an RDBMS, the same would be done with:
select SongTitle from music ;

In a DynamoDB Scan, this is done as follows:
var params = {
  TableName: "Music",
  ProjectionExpression: "SongTitle"
};


DynamoDB provisioned throughput

Which approach below provides the least impact to provisioned throughput on the “Product” table?

A. Create an “Images” DynamoDB table to store the Image with a foreign key constraint to the “Product” table
B. Add an image data type to the “Product” table to store the images in binary format
C. Serialize the image and store it in multiple DynamoDB tables
D. Store the images in Amazon S3 and add an S3 URL pointer to the “Product” table item for each image

Why?
With eventually consistent reads, every 4 KB read consumes 0.5 read capacity units, so for images (which are likely larger than 4 KB) it is wiser to store them in S3 and keep only a pointer (URL) in the table item.
https://www.reddit.com/r/aws/comments/4h8zga/dynamodb_vs_s3_when_using_images/


DynamoDB limitations 

Which DynamoDB limits can be raised by contacting AWS support? Choose 2 answers
A. The number of hash keys per account
B. The maximum storage used per account
C. The number of tables per account
D. The number of local secondary indexes per account
E. The number of provisioned throughput units per account
Why?
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Limits.html

Tables Per Account
For any AWS account, there is an initial limit of 256 tables per region.
You can request an increase on this limit. For more information, go to http://aws.amazon.com/support.

Provisioned Throughput Minimums and Maximums
For any table or global secondary index, the minimum settings for provisioned throughput are 1 read capacity unit and 1 write capacity unit.
An AWS account places some initial maximum limits on the throughput you can provision:

US East (N. Virginia) Region:

  • Per table – 40,000 read capacity units and 40,000 write capacity units
  • Per account – 80,000 read capacity units and 80,000 write capacity units

All Other Regions:

  • Per table – 10,000 read capacity units and 10,000 write capacity units
  • Per account – 20,000 read capacity units and 20,000 write capacity units

The provisioned throughput limit includes the sum of the capacity of the table together with the capacity of all of its global secondary indexes.
You can request an increase on any of these limits. For more information, see http://aws.amazon.com/support.


DynamoDB and IAM
Which of the following items are required to allow an application deployed on an EC2 instance to write data to a DynamoDB table? Assume that no security keys are allowed to be stored on the EC2 instance. Choose 2 answers

A. Create an IAM User that allows write access to the DynamoDB table.
B. Launch an EC2 Instance with the IAM User included in the launch configuration.
C. Create an IAM Role that allows write access to the DynamoDB table.
D. Launch an EC2 Instance with the IAM Role included in the launch configuration.
E. Add an IAM Role to a running EC2 instance.
F. Add an IAM User to a running EC2 Instance.

Why?

An IAM role can't be added or changed after the EC2 instance has been provisioned, and a role is more flexible and secure than distributing a single user's credentials.


DynamoDB provisioned throughput efficiency 

Which of the following is an example of a good DynamoDB hash key schema for provisioned throughput efficiency?
A. User ID, where the application has many different users.
B. Status Code where most status codes are the same
C. Device ID, where one is by far more popular than all the others.
D. Game Type, where there are three possible game types
Why?
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GuidelinesForTables.html#GuidelinesForTables.UniformWorkload
If a single table has only a very small number of partition key values, consider distributing your write operations across more distinct partition key values. In other words, structure the primary key elements to avoid one "hot" (heavily requested) partition key value that slows overall performance. 
For example, consider a table with a composite primary key. The partition key represents the item's creation date, rounded to the nearest day. The sort key is an item identifier. On a given day, say 2014-07-09, all of the new items will be written to that same partition key value. 


DynamoDB optimistic concurrency control

Which statements about DynamoDB are true? Choose 2 answers

A. DynamoDB uses optimistic concurrency control
B. DynamoDB restricts item access during writes
C. DynamoDB uses a pessimistic locking model
D. DynamoDB restricts item access during reads
E. DynamoDB uses conditional writes for consistency

Why?

http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Expressions.Modifying.html#Expressions.Modifying.ConditionalWrites and https://aws.amazon.com/dynamodb/faqs/

Q: Does Amazon DynamoDB support conditional operations?

Yes, you can specify a condition that must be satisfied for a put, update, or delete operation to be completed on an item. To perform a conditional operation, you can define a ConditionExpression that is constructed from the following:
Boolean functions: ATTRIBUTE_EXIST, CONTAINS, and BEGINS_WITH
Comparison operators: =, <>, <, >, <=, >=, BETWEEN, and IN
Logical operators: NOT, AND, and OR.
You can construct a free-form conditional expression that combines multiple conditional clauses, including nested clauses. Conditional operations allow users to implement optimistic concurrency control systems on DynamoDB. For more information on conditional operations, please see our documentation.
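
A sketch of optimistic concurrency control with a conditional write (the table, key and version attribute are hypothetical): the update only succeeds if no one else has changed the item since we read it.

var AWS = require('aws-sdk');
var docClient = new AWS.DynamoDB.DocumentClient({region: 'us-east-1'});

docClient.update({
  TableName: 'Product',                                   // hypothetical table
  Key: { Id: 123 },
  UpdateExpression: 'SET Price = :newPrice, Version = :newVersion',
  ConditionExpression: 'Version = :expectedVersion',      // fails if someone else updated first
  ExpressionAttributeValues: {
    ':newPrice': 20,
    ':newVersion': 2,
    ':expectedVersion': 1
  }
}, function(err) {
  if (err && err.code === 'ConditionalCheckFailedException') {
    console.log('Item was modified by someone else - re-read and retry');
  } else if (err) {
    console.log(err);
  }
});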

DynamoDB error messages

You are writing to a DynamoDB table and receive the following exception: “ProvisionedThroughputExceededException”, though according to your CloudWatch metrics for the table you are not exceeding your provisioned throughput. What could be an explanation for this?

A. You haven’t provisioned enough DynamoDB storage instances
B. You’re exceeding your capacity on a particular Range Key
C. You’re exceeding your capacity on a particular Hash Key
D. You’re exceeding your capacity on a particular Sort Key
E. You haven’t configured DynamoDB Auto Scaling triggers

Why?

http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Programming.Errors.html

ProvisionedThroughputExceededException
Message: You exceeded your maximum allowed provisioned throughput for a table or for one or more global secondary indexes. To view performance metrics for provisioned throughput vs. consumed throughput, open the Amazon CloudWatch console.

Example: Your request rate is too high. The AWS SDKs for DynamoDB automatically retry requests that receive this exception. Your request is eventually successful, unless your retry queue is too large to finish. Reduce the frequency of requests, using exponential backoff. OK to retry? Yes
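
The AWS SDKs already retry throttled requests with exponential backoff for you, but as an illustration of the idea (the table name and item are placeholders), a manual retry loop could look roughly like this:

var AWS = require('aws-sdk');
var docClient = new AWS.DynamoDB.DocumentClient({region: 'us-east-1'});

function putWithBackoff(params, attempt) {
  docClient.put(params, function(err) {
    if (err && err.code === 'ProvisionedThroughputExceededException' && attempt < 5) {
      var delay = Math.pow(2, attempt) * 100;     // 100 ms, 200 ms, 400 ms, ...
      setTimeout(function() { putWithBackoff(params, attempt + 1); }, delay);
    } else if (err) {
      console.log(err);
    }
  });
}

putWithBackoff({ TableName: 'Samples', Item: { GaugeId: 'g-1', Temp: 21.5 } }, 0);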

DynamoDB write capacity unit calculations

A meteorological system monitors 600 temperature gauges, obtaining temperature samples every minute and saving each sample to a DynamoDB table. Each sample involves writing 1K of data and the writes are evenly distributed over time. How much write throughput is required for the target table?

A. 3600 write capacity units
B. 1 write capacity unit
C. 10 write capacity units
D. 60 write capacity units
E. 600 write capacity units

Why?

Consider that one DynamoDB write capacity unit can handle one write of up to 1 KB per second, which means that 100 x 1 KB items per second would be calculated as follows:

1 write capacity unit per item × 100 writes per second = 100 write capacity units

100 x 1.5 KB items per second would be calculated as follows:

1.5 KB / 1 KB = 1.5 --> 2 write capacity units per item x 100 writes per second  = 200 write capacity units

http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ProvisionedThroughput.html

So in this example, 600 gauges each save a 1 KB sample once every minute (i.e. 600 writes spread over 60 seconds). This would be calculated as follows:

1 write capacity unit per item x (600/60) writes per second = 10 write capacity units

DynamoDB strongly consistent vs. eventually consistent read throughput consumption

How is provisioned throughput affected by the chosen consistency model when reading data from a DynamoDB table?

A. Strongly consistent reads use the same amount of throughput as eventually consistent reads
B. Strongly consistent reads use variable throughput depending on read activity
C. Strongly consistent reads use more throughput than eventually consistent reads.
D. Strongly consistent reads use less throughput than eventually consistent reads

Why?

http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ProvisionedThroughput.html

"For example, suppose that you want to read 80 items per second from a table. Suppose that the items are 3 KB in size and you want strongly consistent reads. For this scenario, each read would require one provisioned read capacity unit. To determine this, you divide the item size of the operation by 4 KB, and then round up to the nearest whole number, as shown following:

3 KB / 4 KB = 0.75 --> 1
For this scenario, you need to set the table's provisioned read throughput to 80 read capacity units:

1 read capacity unit per item × 80 reads per second = 80 read capacity units
If you wanted eventually consistent reads instead of strongly consistent reads, you would only need to provision 40 read capacity units."


DynamoDB and web identity federation

Games-R-Us is launching a new game app for mobile devices. Users will log into the game using their existing Facebook account and the game will record player data and scoring information directly to a DynamoDB table. What is the most secure approach for signing requests to the DynamoDB API?

A. Create an IAM user with access credentials that are distributed with the mobile app to sign the requests
B. Distribute the AWS root account access credentials with the mobile app to sign the requests
C. Request temporary security credentials using web identity federation to sign the requests
D. Establish cross account access between the mobile app and the DynamoDB table to sign the requests

Why?
http://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_providers_oidc.html

Imagine that you are creating a mobile app that accesses AWS resources, such as a game that runs on a mobile device and stores player and score information using Amazon S3 and DynamoDB.

When you write such an app, you'll make requests to AWS services that must be signed with an AWS access key. However, we strongly recommend that you do not embed or distribute long-term AWS credentials with apps that a user downloads to a device, even in an encrypted store. Instead, build your app so that it requests temporary AWS security credentials dynamically when needed using web identity federation. 


DynamoDB table deletion provisioned throughput consumption

You are inserting 1000 new items every second in a DynamoDB table. Once an hour these items are analyzed and then are no longer needed. You need to minimize provisioned throughput, storage, and API calls. Given these requirements, what is the most efficient way to manage these Items after the analysis?

A. Retain the items in a single table
B. Delete items individually over a 24 hour period
C. Delete the table and create a new table per hour
D. Create a new table per hour

Why?

http://stackoverflow.com/questions/9386456/dynamodb-does-delete-count-against-read-or-write-capacity

http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/WorkingWithTables.html#ProvisionedThroughput

"When you issue a DeleteItem request, DynamoDB uses the size of the deleted item to calculate provisioned throughput consumption."
A. would consume storage as the items would be retained.
B. would mean several API calls
D. wouldn't solve the issue of deleting the old, no longer needed items

DynamoDB API calls

What item operation allows the retrieval of multiple items from a DynamoDB table in a single API call?

A. GetItem
B. BatchGetItem
C. GetMultipleItems
D. GetItemRange

Why?
https://aws.amazon.com/dynamodb/faqs/

GetItem – The GetItem operation returns a set of Attributes for an item that matches the primary key. The GetItem operation provides an eventually consistent read by default. If eventually consistent reads are not acceptable for your application, use ConsistentRead.

BatchGetItem – The BatchGetItem operation returns the attributes for multiple items from multiple tables using their primary keys. A single response has a size limit of 16 MB and returns a maximum of 100 items. Supports both strong and eventual consistency.
"For BatchGetItem, each item in the batch is read separately, so DynamoDB first rounds up the size of each item to the next 4 KB and then calculates the total size. The result is not necessarily the same as the total size of all the items. For example, if BatchGetItem reads a 1.5 KB item and a 6.5 KB item, DynamoDB will calculate the size as 12 KB (4 KB + 8 KB), not 8 KB (1.5 KB + 6.5 KB)."

Query –  Gets one or more items using the table primary key, or from a secondary index using the index key. You can narrow the scope of the query on a table by using comparison operators or expressions. You can also filter the query results using filters on non-key attributes. Supports both strong and eventual consistency. A single response has a size limit of 1 MB.
"For Query, all items returned are treated as a single read operation. As a result, DynamoDB computes the total size of all items and then rounds up to the next 4 KB boundary. For example, suppose your query returns 10 items whose combined size is 40.8 KB. DynamoDB rounds the item size for the operation to 44 KB. If a query returns 1500 items of 64 bytes each, the cumulative size is 96 KB."

Scan – Gets all items and attributes by performing a full scan across the table or a secondary index. You can limit the return set by specifying filters against one or more attributes.
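
A sketch of BatchGetItem through the DocumentClient (the table name and keys reuse the hypothetical Music table from the earlier examples):

var AWS = require('aws-sdk');
var docClient = new AWS.DynamoDB.DocumentClient({region: 'us-east-1'});

// Fetch several items by primary key in a single API call
docClient.batchGet({
  RequestItems: {
    'Music': {
      Keys: [
        { Artist: 'No One You Know', SongTitle: 'Call Me Today' },
        { Artist: 'No One You Know', SongTitle: 'Howdy' }
      ]
    }
  }
}, function(err, data) {
  if (err) console.log(err);
  else console.log(data.Responses.Music);
});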

DynamoDB streams = Transaction logs

DynamoDB read limiting

When using a large Scan operation in DynamoDB, what technique can be used to minimize the impact of a scan on a table’s provisioned throughput?

A. Set a smaller page size for the scan
B. Prewarm the table by updating all items
C. Use parallel scans
D. Define a range index on the table

Why?

http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/QueryAndScanGuidelines.html#QueryAndScanGuidelines.BurstsOfActivity

Reduce Page Size
Because a Scan operation reads an entire page (by default, 1 MB), you can reduce the impact of the scan operation by setting a smaller page size. The Scan operation provides a Limit parameter that you can use to set the page size for your request. Each Scan or Query request that has a smaller page size uses fewer read operations and creates a "pause" between each request. For example, if each item is 4 KB and you set the page size to 40 items, then a Query request would consume only 40 strongly consistent read operations or 20 eventually consistent read operations. A larger number of smaller Scan or Query operations would allow your other critical requests to succeed without throttling.
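
A sketch of a paged Scan using the Limit parameter and LastEvaluatedKey (the table name is the hypothetical Music table from the earlier examples):

var AWS = require('aws-sdk');
var docClient = new AWS.DynamoDB.DocumentClient({region: 'us-east-1'});

function scanPage(startKey) {
  var params = { TableName: 'Music', Limit: 40 };   // smaller page size = fewer read units per request
  if (startKey) params.ExclusiveStartKey = startKey;
  docClient.scan(params, function(err, data) {
    if (err) return console.log(err);
    console.log(data.Items.length + ' items in this page');
    if (data.LastEvaluatedKey) scanPage(data.LastEvaluatedKey);   // continue from where we left off
  });
}

scanPage();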




DynamoDB Hash and Range 

An application stores payroll information nightly in DynamoDB for a large number of employees across hundreds of offices. Item attributes consist of individual name, office identifier, and cumulative daily hours. Managers run reports for ranges of names working in their office. One query is: “Return all Items in this office for names starting with A through E”. Which table configuration will result in the lowest impact on provisioned throughput for this query?

A. Configure the table to have a range index on the name attribute, and a hash index on the office identifier
B. Configure a hash index on the name attribute and no range index
C. Configure the table to have a hash index on the name attribute, and a range index on the office identifier
D. Configure a hash index on the office Identifier attribute and no range index

Why?

http://stackoverflow.com/questions/27329461/what-is-hash-and-range-primary-key

This means that every row's primary key is the combination of the hash and range key. You can make direct gets on single rows if you have both the hash and range key, or you can make a query against the sorted range index. For example: get me all rows from the table with hash key X that have range keys greater than Y, or other queries to that effect. They have better performance and lower capacity usage compared to Scans and Queries against fields that are not indexed. 

And in this question, it's the names that we want to range over (not the office identifier, as in C).


DynamoDB and HTTP error codes

In DynamoDB, what type of HTTP response codes indicate that a problem was found with
the client request sent to the service?

A. 5xx HTTP response code
B. 200 HTTP response code
C. 306 HTTP response code
D. 4xx HTTP response code

Why?

https://en.wikipedia.org/wiki/List_of_HTTP_status_codes
  • 1xx Informational
  • 2xx Success
  • 3xx Redirection
  • 4xx Client error
  • 5xx Server error

AWS CDA Study List - S3

S3 

Basics

  • Key-value store
  • Object-level storage (not block-based storage like EBS)
  • Resources are buckets, which are by default private
  • Unlimited storage

Buckets

  • By default are private
  • Names are globally unique. You can't have two buckets with the same name, even in different regions
  • Buckets are, however, created within a single region. They don't exist globally; only their names are global.
  • Can't be nested. You can't have a bucket within a bucket (folder-like structure is emulated with key name prefixes)
  • Ownership can't be transferred. The owner is always the AWS account that created the bucket.
  • Buckets have no limit on the number of objects. You can have an unlimited number of objects within a single bucket.
  • Default limit for buckets is 100 per account, which can be raised by contacting AWS Support
  • Offers versioning and multiple storage classes for data (Standard, Standard-IA and Glacier)

Objects

  • Maximum object size for a single PUT upload is 5GB
  • Maximum object size is 5TB
  • Objects larger than 5GB must be uploaded by using the Multipart Upload API
  • Pre-signed URLs can be used to share objects in private buckets
  • Successful upload of an object returns an HTTP 200 response

Security

  • Bucket access can be restricted by using S3 ACLs or S3 bucket policies
  • CloudFront can be used for geographic restrictions


The following topics are exam questions collected from the Internet and should be evaluated as such. The answers are mine and have been checked against answers collected from the Internet, but they might still be wrong.

S3 performance

If an application is storing hourly log files from thousands of instances from a high traffic web site, which naming scheme would give optimal performance on S3?

A. Sequential
B. HH-DD-MM-YYYY-log_instanceID
C. YYYY-MM-DD-HH-log_instanceID
D. instanceID_log-HH-DD-MM-YYYY
E. instanceID_log-YYYY-MM-DD-HH


You are designing a web application that stores static assets in an Amazon Simple Storage Service (S3) bucket. You expect this bucket to immediately receive over 150 PUT requests per second. What should you do to ensure optimal performance?

A. Use multi-part upload.
B. Add a random prefix to the key names.
C. Amazon S3 will automatically manage performance at this scale.
D. Use a predictable naming scheme, such as sequential numbers or date time sequences, in the key names


Why?

http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html

Using a sequential prefix, such as timestamp or an alphabetical sequence, increases the likelihood that Amazon S3 will target a specific partition for a large number of your keys, overwhelming the I/O capacity of the partition. If you introduce some randomness in your key name prefixes, the key names, and therefore the I/O load, will be distributed across more than one partition.
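
A sketch of what introducing a random key name prefix could look like (the bucket and object names are placeholders):

var AWS = require('aws-sdk');
var crypto = require('crypto');
var s3 = new AWS.S3();

// A short random hex prefix spreads keys (and I/O) across multiple S3 partitions
var prefix = crypto.randomBytes(2).toString('hex');        // e.g. "a1b2"
var key = prefix + '/2016-09-29-10/i-0123456-access.log';  // placeholder object name

s3.putObject({ Bucket: 'my-log-bucket', Key: key, Body: 'log contents' }, function(err) {
  if (err) console.log(err);
});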



S3 bucket limits

Your application is trying to upload a 6 GB file to Simple Storage Service and receives a “Your proposed upload exceeds the maximum allowed object size.” error message. What is a possible solution for this?

A. None, Simple Storage Service objects are limited to 5 GB
B. Use the multi-part upload API for this object
C. Use the large object upload API for this object
D. Contact support to increase your object size limit
E. Upload to a different region


You have an application running on an Amazon Elastic Compute Cloud instance, that uploads 5 GB video objects to Amazon Simple Storage Service (S3). Video uploads are taking longer than expected, resulting in poor application performance. Which method will help improve performance of your application?

A. Enable enhanced networking
B. Use Amazon S3 multipart upload
C. Leveraging Amazon CloudFront, use the HTTP POST method to reduce latency.
D. Use Amazon Elastic Block Store Provisioned IOPs and use an Amazon EBS-optimized instance


A media company produces new video files on-premises every day with a total size of around 100GB after compression. All files have a size of 1-2 GB and need to be uploaded to Amazon S3 every night in a fixed time window between 3am and 5am. Current upload takes almost 3 hours, although less than half of the available bandwidth is used. What step(s) would ensure that the file uploads are able to complete in the allotted time window?

A. Increase your network bandwidth to provide faster throughput to S3
B. Upload the files in parallel to S3 using multipart upload
C. Pack all files into a single archive, upload it to S3, then extract the files in AWS
D. Use AWS Import/Export to transfer the video files


Why?

A single PUT to S3 is limited to 5 GB, but by using the Multipart Upload API objects of up to 5 TB can be uploaded to S3. Multipart upload also uploads parts in parallel, which speeds up large transfers.

https://aws.amazon.com/blogs/aws/amazon-s3-multipart-upload/
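
With the AWS SDK for JavaScript, s3.upload() manages the multipart upload and uploads parts in parallel; a minimal sketch (bucket, key and file path are placeholders):

var AWS = require('aws-sdk');
var fs = require('fs');
var s3 = new AWS.S3();

s3.upload({
  Bucket: 'my-video-bucket',
  Key: 'videos/large-video.mp4',
  Body: fs.createReadStream('/tmp/large-video.mp4')
}, { partSize: 10 * 1024 * 1024, queueSize: 4 },   // 10 MB parts, 4 parts uploaded in parallel
function(err, data) {
  if (err) console.log(err);
  else console.log('Uploaded to ' + data.Location);
});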



Using S3 as static website

Company C is currently hosting their corporate site in an Amazon S3 bucket with Static Website Hosting enabled. Currently, when visitors go to http://www.companyc.com the index.html page is returned. Company C now would like a new page welcome.html to be returned when a visitor enters http://www.companyc.com in the browser. Which of the following steps will allow Company C to meet this requirement? Choose 2 answers

A. Upload an html page named welcome.html to their S3 bucket
B. Create a welcome subfolder in their S3 bucket
C. Set the Index Document property to welcome.html
D. Move the index.html page to a welcome subfolder
E. Set the Error Document property to welcome.html


An Amazon S3 bucket, “myawsbucket”, is configured with website hosting in the Tokyo region.
What is the region-specific website endpoint?

A. www.myawsbucket.ap-northeast-1.amazonaws.com
B. myawsbucket.s3-website-ap-northeast-1.amazonaws.com
C. myawsbucket.amazonaws.com
D. myawsbucket.tokyo.amazonaws.com

Why?

S3 static website endpoints follow the pattern bucket-name.s3-website-region.amazonaws.com, so the endpoint always contains the text "website" and the region.


S3 encryption cipher

What type of block cipher does Amazon S3 offer for server side encryption?

A. RC5
B. Blowfish
C. Triple DES
D. Advanced Encryption Standard

Why?
http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingServerSideEncryption.html
Amazon S3 server-side encryption uses one of the strongest block ciphers available, 256-bit Advanced Encryption Standard (AES-256), to encrypt your data.


S3 bucket policies and pre-signed URLs

Company A has an S3 bucket containing premier content that they intend to make available to only paid subscribers of their website. The S3 bucket currently has default permissions of all objects being private to prevent inadvertent exposure of the premier content to non-paying website visitors. How can Company A provide only paid subscribers the ability to download a premier content file in the S3 bucket?

A. Apply a bucket policy that grants anonymous users to download the content from the S3 bucket
B. Generate a pre-signed object URL for the premier content file when a paid subscriber requests a download
C. Add a bucket policy that requires Multi-Factor Authentication for requests to access the S3 bucket objects
D. Enable server side encryption on the S3 bucket for data protection against the non-paying website visitors


You run an ad-supported photo sharing website using S3 to serve photos to visitors of your
site. At some point you find out that other sites have been linking to the photos on your site, causing loss to your business. What is an effective method to mitigate this?

A. Store photos on an EBS volume of the web server
B. Remove public read access and use signed URLs with expiry dates.
C. Use CloudFront distributions for static content.
D. Block the IPs of the offending websites in Security Groups.



Why? http://docs.aws.amazon.com/AmazonS3/latest/dev/ShareObjectPreSignedURL.html

"Anyone who receives the pre-signed URL can then access the object. For example, if you have a video in your bucket and both the bucket and the object are private, you can share the video with others by generating a pre-signed URL."


S3 bucket limits

What is the maximum number of S3 Buckets available per AWS account?

A. There is no limit
B. 100 per account
C. 100 per IAM user
D. 100 per region
E. 500 per account

Why?

http://docs.aws.amazon.com/general/latest/gr/aws_service_limits.html

Amazon Simple Storage Service (Amazon S3) Limits

Resource Default Limit
Buckets 100 per account
For information about additional documented limits, see Amazon S3 limits in the Amazon Simple Storage Service Developer Guide.

S3 server-side encryption header

When uploading an object, what request header can be explicitly specified in a request to Amazon S3 to encrypt object data when saved on the server side?

A. x-amz-storage-class
B. Content-MD5
C. x-amz-security-token
D. x-amz-server-side-encryption

Why?

http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingServerSideEncryption.html

To request server-side encryption when uploading an object, you explicitly specify the x-amz-server-side-encryption request header. (The x-amz-security-token header is used to pass a session token when signing requests with temporary security credentials; it does not encrypt object data.)
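
A sketch of requesting server-side encryption at upload time (bucket, key and body are placeholders); the SDK adds the x-amz-server-side-encryption header to the PUT request for you:

var AWS = require('aws-sdk');
var s3 = new AWS.S3();

s3.putObject({
  Bucket: 'my-bucket',              // placeholder
  Key: 'document.txt',
  Body: 'sensitive data',
  ServerSideEncryption: 'AES256'    // encrypt the object with SSE (AES-256) when it is stored
}, function(err) {
  if (err) console.log(err);
});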



S3 data restriction

Which features can be used to restrict access to data in S3? Choose 2 answers

A. Set an S3 Bucket policy.
B. Enable IAM Identity Federation.
C. Set an S3 ACL on the bucket or the object.
D. Create a CloudFront distribution for the bucket
E. Use S3 Virtual Hosting

Why?

https://blogs.aws.amazon.com/security/post/TxPOJBY6FE360K/IAM-policies-and-Bucket-Policies-and-ACLs-Oh-My-Controlling-Access-to-S3-Resourc

This week we’ll discuss another frequently asked about topic: the distinction between IAM policies, S3 bucket policies, S3 ACLs, and when to use each. They’re all part of the AWS access control toolbox, but differ in how they’re used. 


A customer is leveraging Amazon Simple Storage Service in eu-west-1 to store static content for a web-based property. The customer is storing objects using the Standard Storage class. Where are the customers objects replicated?

A. A single facility in eu-west-1 and a single facility in eu-central-1
B. A single facility in eu-west-1 and a single facility in us-east-1
C. Multiple facilities in eu-west-1
D. A single facility in eu-west-1


Why? https://aws.amazon.com/s3/faqs/ "You specify a region when you create your Amazon S3 bucket. Within that region, your objects are redundantly stored on multiple devices across multiple facilities"


A customer wants to leverage Amazon Simple Storage Service (S3) and Amazon Glacier as part of their backup and archive infrastructure. The customer plans to use third-party software to support this integration. Which approach will limit the access of the third party software to only the Amazon S3 bucket named “companybackup”?


A. A custom bucket policy limited to the Amazon S3 API in the Amazon Glacier archive “company-backup”
B. A custom bucket policy limited to the Amazon S3 API in “company-backup”
C. A custom IAM user policy limited to the Amazon S3 API for the Amazon Glacier archive “company-backup”.
D. A custom IAM user policy limited to the Amazon S3 API in “company-backup”.

Why? Although this could be done with B, you would still be missing the IAM user that the third-party software needs to authenticate with, which makes D the more precise answer.


What are characteristics of Amazon S3? Choose 2 answers

A. S3 allows you to store objects of virtually unlimited size.
B. S3 offers Provisioned IOPS.
C. S3 allows you to store unlimited amounts of data.
D. S3 should be used to host a relational database.
E. Objects are directly accessible via a URL.

Why? S3 objects are limited to 5 TB in size, S3 doesn't provide Provisioned IOPS, and you should not host a relational database on S3.


You are working with a customer who has 10 TB of archival data that they want to migrate to Amazon Glacier. The customer has a 1-Mbps connection to the Internet. Which service or feature provides the fastest method of getting the data into Amazon Glacier?

A. Amazon Glacier multipart upload
B. AWS Storage Gateway
C. VM Import/Export
D. AWS Import/Export

Why? http://docs.aws.amazon.com/amazonglacier/latest/dev/uploading-archive-mpu.html


You need to configure an Amazon S3 bucket to serve static assets for your public-facing web application. Which methods ensure that all objects uploaded to the bucket are set to public read? Choose 2 answers

A. Set permissions on the object to public read during upload.
B. Configure the bucket ACL to set all objects to public read.
C. Configure the bucket policy to set all objects to public read.
D. Use AWS Identity and Access Management roles to set the bucket to public read.
E. Amazon S3 objects default to public read, so no action is needed.

Why? https://aws.amazon.com/articles/5050 "You can use ACLs to grant permissions to individual AWS accounts; however, it is strongly recommended that you do not grant public access to your bucket using an ACL."


A company is storing data on Amazon Simple Storage Service (S3). The company’s security policy mandates that data is encrypted at rest. Which of the following methods can achieve this? Choose 3 answers

A. Use Amazon S3 server-side encryption with AWS Key Management Service managed keys.
B. Use Amazon S3 server-side encryption with customer-provided keys.
C. Use Amazon S3 server-side encryption with EC2 key pair.
D. Use Amazon S3 bucket policies to restrict access to the data at rest.
E. Encrypt the data on the client-side before ingesting to Amazon S3 using their own master key.
F. Use SSL to encrypt the data while in transit to Amazon S3.

Why? Other ones are not about securing data at rest.


Which of the following are valid statements about Amazon S3? Choose 2 answers

A. S3 provides read-after-write consistency for any type of PUT or DELETE.
B. Consistency is not guaranteed for any type of PUT or DELETE.
C. A successful response to a PUT request only occurs when a complete object is saved.
D. Partially saved objects are immediately readable with a GET after an overwrite PUT.
E. S3 provides eventual consistency for overwrite PUTS and DELETES.


Which features can be used to restrict access to data in S3? Choose 2 answers

A. Set an S3 ACL on the bucket or the object.
B. Create a CloudFront distribution for the bucket.
C. Set an S3 bucket policy.
D. Enable IAM Identity Federation
E. Use S3 Virtual Hosting


Why? Ruling out the wrong options B, D and E leaves the correct answers.


You run an ad-supported photo sharing website using S3 to serve photos to visitors of your site. At some point you find out that other sites have been linking to the photos on your site, causing loss to your business. What is an effective method to mitigate this?

A. Remove public read access and use signed URLs with expiry dates.
B. Use CloudFront distributions for static content.
C. Block the IPs of the offending websites in Security Groups.
D. Store photos on an EBS volume of the web server.


Which set of Amazon S3 features helps to prevent and recover from accidental data loss?

A. Object lifecycle and service access logging
B. Object versioning and Multi-factor authentication
C. Access controls and server-side encryption
D. Website hosting and Amazon S3 policies

Why? Ruling out the wrong options A, C and D.


You have an application running on an Amazon Elastic Compute Cloud instance, that uploads 5 GB video objects to Amazon Simple Storage Service (S3). Video uploads are taking longer than expected, resulting in poor application performance. Which method will help improve performance of your application?

A. Enable enhanced networking
B. Use Amazon S3 multipart upload
C. Leveraging Amazon CloudFront, use the HTTP POST method to reduce latency.
D. Use Amazon Elastic Block Store Provisioned IOPs and use an Amazon EBS-optimized instance

Why? Of the given options, S3 multipart upload is the only one that speeds up uploads to S3.


A customer wants to track access to their Amazon Simple Storage Service (S3) buckets and also use this information for their internal security and access audits. Which of the following will meet the Customer requirement?

A. Enable AWS CloudTrail to audit all Amazon S3 bucket access.
B. Enable server access logging for all required Amazon S3 buckets.
C. Enable the Requester Pays option to track access via AWS Billing
D. Enable Amazon S3 event notifications for Put and Post.

Why? http://docs.aws.amazon.com/AmazonS3/latest/dev/ServerLogs.html Although "server access logs" is a somewhat confusing name, they provide exactly this functionality.


A company is deploying a two-tier, highly available web application to AWS. Which service provides durable storage for static content while utilizing lower overall CPU resources for the web tier?

A. Amazon EBS volume
B. Amazon S3
C. Amazon EC2 instance store
D. Amazon RDS instance

Why? Of the given options, S3 provides durable storage for static content without consuming CPU resources in the web tier.


You are designing a web application that stores static assets in an Amazon Simple Storage Service (S3) bucket. You expect this bucket to immediately receive over 150 PUT requests per second. What should you do to ensure optimal performance?

A. Use multi-part upload.
B. Add a random prefix to the key names.
C. Amazon S3 will automatically manage performance at this scale.
D. Use a predictable naming scheme, such as sequential numbers or date time sequences, in the key names

Why? Using a random prefix in the key names spreads the keys, and therefore the I/O load, across multiple S3 partitions, which improves performance at high request rates.


What is the Reduced Redundancy option in Amazon S3?

A. Less redundancy for a lower cost.
B. It doesn’t exist in Amazon S3, but in Amazon EBS.
C. It allows you to destroy any copy of your files outside a specific jurisdiction.
D. It doesn’t exist at all

Why? https://aws.amazon.com/s3/reduced-redundancy/


Can Amazon S3 uploads resume on failure or do they need to restart?

A. Restart from beginning
B. You can resume them, if you flag the “resume on failure” option before uploading.
C. Resume on failure
D. Depends on the file size

Why? A, unless multipart upload is used, in which case C would be correct.


What is the durability of S3 RRS?

A. 99.99%
B. 99.95%
C. 99.995%
D. 99.999999999%

Why? https://aws.amazon.com/s3/reduced-redundancy/


What is Amazon Glacier?

A. You mean Amazon “Iceberg”: it’s a low-cost storage service.
B. A security tool that allows to “freeze” an EBS volume and perform computer forensics on it.
C. A low-cost storage service that provides secure and durable storage for data archiving and backup.
D. It’s a security tool that allows to “freeze” an EC2 instance and perform computer forensics on it.

Why? https://aws.amazon.com/glacier/ "Amazon Glacier is a secure, durable, and extremely low-cost cloud storage service for data archiving and long-term backup."

AWS CDA Study List - Random questions

The following topics are exam questions collected from the Internet and should be evaluated as such. The answers are mine and have been checked against answers collected from the Internet, but they might still be wrong.

Scenario about Load balancing (ELB)

A startup's photo-sharing site is deployed in a VPC. An ELB distributes web traffic across two subnets. ELB session stickiness is configured to use the AWS-generated session cookie, with a session TTL of 5 minutes. The webserver Auto Scaling Group is configured as: min-size=4, max-size=4. The startup is preparing for a public launch by running load-testing software installed on a single EC2 instance running in us-west-2a. After 60 minutes of load-testing, the webserver logs show:



Which recommendations can help ensure load-testing HTTP requests are evenly distributed across the four webservers? Choose 2 answers.

A. Re-configure the load-testing software to re-resolve DNS for each web request.
B. Use a 3rd-party load-testing service which offers globally-distributed test clients.
C. Configure ELB and Auto Scaling to distribute across us-west-2a and us-west-2c.
D. Configure ELB session stickiness to use the app-specific session cookie.
E. Launch and run the load-tester EC2 instance from us-east-1 instead.

Why?

https://aws.amazon.com/articles/1636185810492479

“If you do not ensure that DNS is re-resolved or use multiple test clients to simulate increased load, the test may continue to hit a single IP address when Elastic Load Balancing has actually allocated many more IP addresses. Because your end users will not all be resolving to that single IP address, your test will not be a realistic sampling of real-world behaviour.”

Load Testing with Session Affinity

If your configuration leverages session affinity, then it is important for the load generator to use multiple clients, so that Elastic Load Balancing can behave as it would in the real world. If you do not make these adjustments, then Elastic Load Balancing will always send requests to the same back-end servers, potentially overwhelming the back-end servers well before Elastic Load Balancing has to scale to meet the load. To test in this scenario, you will need to use a load testing solution that uses multiple clients to generate the load.


Securing data at rest on EBS volumes

How can you secure data at rest on an EBS volume?
A. Write the data randomly instead of sequentially.
B. Use an encrypted file system on top of the EBS volume.
C. Encrypt the volume using the S3 server-side encryption service.
D. Create an IAM policy that restricts read and write access to the volume.
E. Attach the volume to an instance using EC2’s SSL interface.
Why? 
"Another option would be to use file system-level encryption, which works by stacking an encrypted file system on top of an existing file system"

EBS vs Instance-store

What is one key difference between an Amazon EBS-backed and an instance-store backed instance?

A. Virtual Private Cloud requires EBS backed instances
B. Amazon EBS-backed instances can be stopped and restarted
C. Auto scaling requires using Amazon EBS-backed instances.
D. Instance-store backed instances can be stopped and restarted.

Why? 

http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ComponentsAMIs.html

Storage for the Root Device, Stopped state
Instance store: Cannot be in stopped state; instances are running or terminated
vs.
EBS-backed: Can be placed in stopped state where the instance is not running, but the root volume is persisted in Amazon EBS


Service identification

Which of the following services are key/value stores? Choose 3 answers

A. Amazon ElastiCache
B. Simple Notification Service
C. DynamoDB 
D. Simple Workflow Service
E. Simple Storage Service

Why? SNS is for sending notifications and SWF is meant for coordinating tasks, so the key/value stores are ElastiCache, DynamoDB and S3.

NAT instances

After launching an instance that you intend to serve as a NAT (Network Address Translation) device in a public subnet you modify your route tables to have the NAT device be the target of internet bound traffic of your private subnet. When you try and make an outbound connection to the Internet from an instance in the private subnet, you are not successful. Which of the following steps could resolve the issue?

A. Attaching a second Elastic Network Interface (ENI) to the NAT instance, and placing it in the private subnet
B. Attaching an Elastic IP address to the instance in the private subnet
C. Attaching a second Elastic Network Interface (ENI) to the instance in the private subnet, and placing it in the public subnet
D. Disabling the Source/Destination Check attribute on the NAT instance

Why?

http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_NAT_Instance.html

Each EC2 instance performs source/destination checks by default. This means that the instance must be the source or destination of any traffic it sends or receives. However, a NAT instance must be able to send and receive traffic when the source or destination is not itself. Therefore, you must disable source/destination checks on the NAT instance.

AWS free tools

Which of the following services are included at no additional cost with the use of the AWS
platform? Choose 2 answers

A. CloudFormation
B. Simple Workflow Service
C. Elastic Load Balancing
D. Elastic Compute Cloud
E. Simple Storage Service
F. Auto Scaling

Why? Auto Scaling and CloudFormation are not resources. They are tools to create resources.

Instance Metadata

How can software determine the public and private IP addresses of the Amazon EC2 instance that it is running on?

A. Query the appropriate Amazon CloudWatch metric.
B. Use ipconfig or ifconfig command.
C. Query the local instance userdata.
D. Query the local instance metadata.

Why? 
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html

To view all categories of instance metadata from within a running instance, use the following URL:
http://169.254.169.254/latest/meta-data/
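
A sketch of querying the metadata service from code running on the instance (using Node's built-in http module; the two paths shown are the private and public IPv4 addresses):

var http = require('http');

['local-ipv4', 'public-ipv4'].forEach(function(path) {
  http.get('http://169.254.169.254/latest/meta-data/' + path, function(res) {
    var body = '';
    res.on('data', function(chunk) { body += chunk; });
    res.on('end', function() { console.log(path + ': ' + body); });
  });
});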


Amazon VPC problem solving

You have an environment that consists of a public subnet using Amazon VPC and 3 instances that are running in this subnet. These three instances can successfully communicate with other hosts on the Internet. You launch a fourth instance in the same subnet, using the same AMI and security group configuration you used for the others, but find that this instance cannot be accessed from the Internet. What should you do to enable internet access?

A. Deploy a NAT instance into the public subnet.
B. Modify the routing table for the public subnet
C. Configure a publically routable IP Address In the host OS of the fourth instance.
D. Assign an Elastic IP address to the fourth instance.

Why? 

Ruling out
A. NAT would not solve anything.
B. Routing already works for other instances
C. This is not how you add a public IP address to an instance

SWF

Which of the following statements about SWF are true? Choose 3 answers

A. SWF uses deciders and workers to complete tasks
B. SWF requires at least 1 EC2 instance per domain
C. SWF triggers SNS notifications on task assignment
D. SWF requires an S3 bucket for workflow storage
E. SWF tasks are assigned once and never duplicated
F. SWF workflow executions can last up to a year

Why?

A. http://docs.aws.amazon.com/amazonswf/latest/developerguide/swf-dg-intro-to-swf.html

An activity worker is a program that receives activity tasks, performs them, and provides results back. 

The coordination logic in a workflow is contained in a software program called a decider. The decider schedules activity tasks, provides input data to the activity workers, processes events that arrive while the workflow is in progress, and ultimately ends (or closes) the workflow when the objective has been completed.


E. http://docs.aws.amazon.com/amazonswf/latest/developerguide/swf-dev-task-lists.html
A task is always scheduled on only one task list; tasks are not shared across lists. 

F. http://docs.aws.amazon.com/amazonswf/latest/developerguide/swf-dg-limits.html
Maximum workflow execution time – 1 year


Responsibilities

In AWS, which security aspects are the customer’s responsibility? Choose 4 answers

A. Decommissioning storage devices
B. Patch management on the EC2 instance’s operating system
C. Controlling physical access to compute resources
D. Security Group and ACL (Access Control List) settings
E. Life-cycle management of IAM credentials
F. Encryption of EBS (Elastic Block Storage) volumes

Why?

Decommissioning storage devices and controlling physical access to compute resources are not the customer's responsibility, as those concern PHYSICAL assets managed by AWS

ElastiCache as Session state store

You have written an application that uses the Elastic Load Balancing service to spread traffic to several web servers. Your users complain that they are sometimes forced to login again in the middle of using your application, after they have already logged in. This is not behavior you have designed. What is a possible solution to prevent this happening?

A. Use instance memory to save session state.
B. Use instance storage to save session state.
C. Use EBS to save session state
D. Use ElastiCache to save session state.
E. Use Glacier to save session state.

Why?

https://blogs.aws.amazon.com/net/post/TxMREMF0459SXT/ElastiCache-as-an-ASP-NET-Session-Store

ElastiCache is a web service that makes it easy to deploy, operate, and scale an in-memory cache in the cloud. ElastiCache supports both Memcached and Redis cache clusters. While either technology can store ASP.NET session state, Microsoft offers a provider for Redis, and I will focus on Redis here.

EC2 API calls

Which EC2 API call would you use to retrieve a list of Amazon Machine Images (AMIs)?

A. DescribeInstances
B. You cannot retrieve a list of AMIs as there are over 10,000 AMIs
C. GetAMIs
D. DescribeImages
E. DescribeAMIs

Why?

http://docs.aws.amazon.com/cli/latest/reference/ec2/describe-images.html

Describes one or more of the images (AMIs, AKIs, and ARIs) available to you. Images available to you include public images, private images that you own, and private images owned by other AWS accounts but for which you have explicit launch permissions.

Public AMI

EC2 instances are launched from Amazon Machine Images (AMIs). A given public AMI can:

A. be used to launch EC2 Instances in any AWS region.
B. only be used to launch EC2 instances in the same country as the AMI is stored.
C. only be used to launch EC2 instances in the same AWS region as the AMI is stored.
D. only be used to launch EC2 instances in the same AWS availability zone as the AMI is stored

Why?

http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/sharingamis-intro.html

AMIs are a regional resource. Therefore, sharing an AMI makes it available in that region. To make an AMI available in a different region, copy the AMI to the region and then share it. For more information, see Copying an AMI.

LDAP federation

A corporate web application is deployed within an Amazon VPC, and is connected to the corporate data center via IPSec VPN. The application must authenticate against the on-premise LDAP server. Once authenticated, logged-in users can only access an S3 keyspace specific to the user. Which two approaches can satisfy the objectives? Choose 2 answers

A. The application authenticates against LDAP, and retrieves the name of an IAM role associated with the user. The application then calls the IAM Security Token Service to assume that IAM Role. The application can use the temporary credentials to access the appropriate S3 bucket.
B. Develop an identity broker which authenticates against IAM Security Token Service to assume an IAM Role to get temporary AWS security credentials. The application calls the identity broker to get AWS temporary security credentials with access to the appropriate S3 bucket.
C. The application authenticates against IAM Security Token Service using the LDAP credentials. The application uses those temporary AWS security credentials to access the appropriate S3 bucket.
D. The application authenticates against LDAP. The application then calls the IAM Security Service to login to IAM using the LDAP credentials. The application can use the IAM temporary credentials to access the appropriate S3 bucket.
E. Develop an identity broker which authenticates against LDAP, and then calls IAM Security Token Service to get IAM federated user credentials. The application calls the identity broker to get IAM federated user credentials with access to the appropriate S3 bucket.

Why?

Because the application must authenticate against LDAP first:
B – there is no LDAP authentication here, so this is incorrect
C – you cannot authenticate against STS directly using LDAP credentials
D – same problem; you cannot log in to IAM using LDAP credentials