Amazon S3 is a simple key-value object store designed for the Internet
S3 provides unlimited storage space and works on a pay-as-you-use model. Service rates get cheaper as the usage volume increases
S3 is object-level storage (not block-level storage) and cannot be used to host an OS or dynamic websites.
S3 resources, e.g. buckets and objects, are private by default
To use IPv6, you need to use a dual-stack endpoint.
To require server-side encryption of all objects in a particular Amazon S3 bucket, you can use a bucket policy.
To require that a particular AWS KMS key be used to encrypt the objects in a bucket, you can use the s3:x-amz-server-side-encryption-aws-kms-key-id condition key.
For example, the following bucket policy denies permission to upload an object unless the request includes the x-amz-server-side-encryption header to request server-side encryption:
{
  "Version": "2012-10-17",
  "Id": "PutObjectPolicy",
  "Statement": [
    {
      "Sid": "DenyIncorrectEncryptionHeader",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::awsexamplebucket1/*",
      "Condition": {
        "StringNotEquals": {
          "s3:x-amz-server-side-encryption": "AES256"
        }
      }
    },
    {
      "Sid": "DenyUnencryptedObjectUploads",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::awsexamplebucket1/*",
      "Condition": {
        "Null": {
          "s3:x-amz-server-side-encryption": "true"
        }
      }
    }
  ]
}
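With such a policy in place, every upload must carry the encryption header. A minimal boto3 sketch that satisfies the policy above (the bucket name comes from the policy; the key and body are illustrative):

import boto3

s3 = boto3.client("s3")

# The ServerSideEncryption parameter adds the x-amz-server-side-encryption: AES256
# header required by the bucket policy, so the PUT is allowed
s3.put_object(
    Bucket="awsexamplebucket1",
    Key="reports/2016/summary.txt",   # illustrative key
    Body=b"hello world",
    ServerSideEncryption="AES256",
)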
A bucket is a container for objects stored in S3 and helps organize the S3 namespace.
A bucket is owned by the AWS account that creates it and helps identify the account responsible for storage and data transfer charges.
Bucket ownership is not transferable
S3 bucket names are globally unique, regardless of the AWS region in which you create the bucket
Even though S3 is a global service, buckets are created within a region specified during the creation of the bucket.
Every object is contained in a bucket
There is no limit to the number of objects that can be stored in a bucket and no difference in performance whether you use many buckets to store your objects or a single bucket to store all your objects.
The S3 data model is a flat structure, i.e. there are no hierarchies or folders within the buckets. However, a logical hierarchy can be inferred using the key name prefix, e.g. Folder1/Object1.
Restrictions
100 buckets (soft limit) can be created in each AWS account
Bucket names should be globally unique and DNS compliant
Bucket ownership is not transferable
Buckets cannot be nested and cannot have bucket within another bucket
You can delete an empty or a non-empty bucket
S3 allows listing of a maximum of 1000 objects in a single request and provides pagination support
Rules for naming your bucket:
Use 3–63 characters.
Use only lowercase letters, numbers, and hyphens (-).
Do not use a period (.) while using virtual hosted-style buckets with SSL. Buckets that have a period in the bucket name can cause certificate exceptions when accessed with HTTPS-based URLs.
Do not use the underscore (_) character.
Objects are the fundamental entities stored in an S3 bucket
An object is uniquely identified within a bucket by a key name and a version ID
Objects consist of object data, metadata, and subresources
Key is the object name
Value is the data portion and is opaque to S3
Metadata is the data about the data and is a set of name-value pairs that describe the object, e.g. content-type, size, last modified. Custom metadata can also be specified at the time the object is stored.
Version ID is the version ID for the object and, in combination with the key, helps to uniquely identify an object within a bucket
Subresources help provide additional information for an object
Access Control Information helps control access to the objects
S3 supports two kinds of metadata for an object
System metadata
Metadata such as the Last-Modified date is controlled by the system. Only S3 can modify the value.
System metadata that the user can control, e.g. the storage class configured for the object.
User-defined metadata
User-defined metadata can be assigned while uploading the object or after the object has been uploaded.
User-defined metadata is stored with the object and is returned when the object is downloaded
S3 does not process user-defined metadata.
User-defined metadata must begin with the prefix “x-amz-meta-“, otherwise S3 will not set the key-value pair as you define it
Object metadata cannot be modified in place after the object is uploaded; it can only be changed by performing a copy operation and setting the metadata
Objects belonging to a bucket reside in a specific AWS region and never leave that region, unless explicitly copied, e.g. using Cross-Region Replication
An object can be retrieved as a whole or partially
With Versioning enabled, current as well as previous versions of an object can be retrieved.
Objects in S3 buckets have no hierarchy. You can use prefixes (such as Dev) in key names to group similar items.
Listing
S3 allows listing of all the keys within a bucket
A single listing request would return a max of 1000 object keys with pagination support using an indicator in the response to indicate if the response was truncated.
Keys within a bucket can be listed using Prefix and Delimiter.
Prefix limits results to only those keys (kind of filtering) that begin with the specified prefix, and delimiter causes list to roll up all keys that share a common prefix into a single summary list result.
For example, to list all the states in USA, set Delimiter='/' and Prefix='North America/USA/'.
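A minimal boto3 sketch of listing with a prefix and delimiter, paginating via the continuation token (bucket and prefix names are illustrative):

import boto3

s3 = boto3.client("s3")
kwargs = {"Bucket": "my-example-bucket", "Prefix": "North America/USA/", "Delimiter": "/"}

while True:
    resp = s3.list_objects_v2(**kwargs)
    for obj in resp.get("Contents", []):          # keys directly under the prefix
        print(obj["Key"], obj["Size"])
    for cp in resp.get("CommonPrefixes", []):     # "sub-folders" rolled up by the delimiter
        print(cp["Prefix"])
    if not resp.get("IsTruncated"):               # at most 1000 keys per response
        break
    kwargs["ContinuationToken"] = resp["NextContinuationToken"]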
Retrieval
An object can be retrieved as a whole
An object can be retrieved in parts or partially (a specific range of bytes) by using the Range HTTP header.
The Range HTTP header is helpful
if only a partial object is needed, e.g. multiple files were uploaded as a single archive
for fault-tolerant downloads where the network connectivity is poor
Through the “Range” header in the HTTP GET request, a specified portion of the object can be downloaded instead of the whole object.
Objects can also be downloaded by sharing pre-signed URLs
Metadata of the object is returned in the response headers
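As a sketch, a ranged GET with boto3 (bucket and key names are illustrative) could look like:

import boto3

s3 = boto3.client("s3")

# Download only the first 1 MB of the object using the Range header
resp = s3.get_object(
    Bucket="my-example-bucket",
    Key="archives/bundle.zip",
    Range="bytes=0-1048575",
)
chunk = resp["Body"].read()
print(len(chunk), resp["ContentRange"], resp["Metadata"])  # object metadata comes back in the response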
Object Uploads
Single Operation – Objects up to 5GB in size can be uploaded in a single PUT operation
Multipart upload – can be used for objects of size > 5GB, supports a max size of 5TB, and is recommended for objects above 100MB in size
Pre-signed URLs can also be shared for uploading objects
A successful upload can be verified by checking that the request received a success response. Additionally, the returned ETag can be compared to the MD5 value calculated for the uploaded object.
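A sketch of that verification for a single-part, non-KMS upload, where the ETag is the hex MD5 of the object body (bucket and key are illustrative):

import boto3
import hashlib

s3 = boto3.client("s3")
body = b"some payload"

resp = s3.put_object(Bucket="my-example-bucket", Key="data/payload.bin", Body=body)

# For single-part uploads without SSE-KMS, the ETag is the MD5 hex digest wrapped in quotes
assert resp["ETag"].strip('"') == hashlib.md5(body).hexdigest()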
Copying Objects
Copying of objects up to 5GB can be performed using a single operation, and the multipart upload API can be used for copies up to 5TB
When an object is copied
user-controlled system metadata, e.g. storage class, and user-defined metadata are also copied.
system-controlled metadata, e.g. the creation date, is reset
Copying objects can be needed to
Create multiple copies of an object
Copy objects across locations
Rename objects
Change object metadata, e.g. storage class, server-side encryption, etc.
Updating any metadata for an object requires all the metadata fields to be specified again
Deleting Objects
S3 allows deletion of a single object or multiple objects (max 1000) in a single call
For Non Versioned buckets, the object key needs to be provided and the object is permanently deleted
For Versioned buckets,
if only an object key is provided, S3 inserts a delete marker and the previous current object becomes a non current version
if an object key with a version ID is provided, the object is permanently deleted
if the version ID is of the delete marker, the delete marker is removed and the previous non current version becomes the current version object
MFA Delete can be enabled to add extra security
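A boto3 sketch of a batch delete of up to 1000 keys in a single call (bucket and keys are illustrative):

import boto3

s3 = boto3.client("s3")

resp = s3.delete_objects(
    Bucket="my-example-bucket",
    Delete={
        "Objects": [
            {"Key": "logs/2016/01/01.log"},
            {"Key": "logs/2016/01/02.log"},
            # on a versioned bucket, pass "VersionId" as well to permanently delete a version
        ],
        "Quiet": True,  # only report errors in the response
    },
)
print(resp.get("Errors", []))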
Restoring Objects from Glacier
An archived object must be restored before it can be accessed
Restoration of an Object can take about 3 to 5 hours for standard retrievals. Glacier now offers expedited retrievals within minutes
Restoration request also needs to specify the number of days for which the object copy needs to be maintained.
During this period, the storage cost for both the archive and the copy is charged
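A boto3 sketch of such a restore request (bucket, key, number of days, and tier are illustrative):

import boto3

s3 = boto3.client("s3")

# Restore an archived object for 7 days using the Standard retrieval tier
s3.restore_object(
    Bucket="my-example-bucket",
    Key="archive/2015/backup.tar",
    RestoreRequest={
        "Days": 7,
        "GlacierJobParameters": {"Tier": "Standard"},  # or "Expedited" / "Bulk"
    },
)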
All buckets and objects are by default private
Pre-signed URLs allow a user to download or upload a specific object without requiring AWS security credentials or permissions.
A pre-signed URL allows anyone access to the object identified in the URL, provided the creator of the URL has permissions to access that object.
Creation of a pre-signed URL requires the creator to provide security credentials, specify a bucket name, an object key, an HTTP method (GET for downloading objects & PUT for uploading objects), and an expiration date and time.
Pre-signed URLs are valid only until the expiration date & time.
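A minimal boto3 sketch of generating pre-signed download and upload URLs (bucket, keys, and expiry values are illustrative):

import boto3

s3 = boto3.client("s3")

# URL anyone can use to GET the object for the next hour
download_url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-example-bucket", "Key": "reports/summary.pdf"},
    ExpiresIn=3600,
)

# URL anyone can use to PUT an object under this key for the next 15 minutes
upload_url = s3.generate_presigned_url(
    "put_object",
    Params={"Bucket": "my-example-bucket", "Key": "uploads/new-file.txt"},
    ExpiresIn=900,
)
print(download_url, upload_url, sep="\n")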
Multipart upload allows the user to upload a single object as a set of parts. Each part is a contiguous portion of the object’s data.
Multipart upload supports 1 to 10,000 parts; each part can be from 5MB to 5GB, with the last part allowed to be less than 5MB.
Multipart upload allows a max object size of 5TB
Object parts can be uploaded independently and in any order. If transmission of any part fails, it can be retransmitted without affecting other parts.
After all parts of the object are uploaded and the complete request is initiated, S3 assembles these parts and creates the object.
Advantages:
Improved throughput – parallel upload of parts to improve throughput
Quick recovery from any network issues – Smaller part size minimizes the impact of restarting a failed upload due to a network error.
Pause and resume object uploads – Object parts can be uploaded over time. Once a multipart upload is initiated there is no expiry; you must explicitly complete or abort the multipart upload.
Begin an upload before the final object size is known – an object can be uploaded as it is being created
Three Step process
Multipart Upload Initiation
Initiation of a Multipart upload request to S3 returns a unique ID for each multipart upload.
This ID needs to be provided for each part upload, the completion or abort request, and the list parts call.
All the Object metadata required needs to be provided during the Initiation call
Parts Upload
Parts upload of objects can be performed using the unique upload ID
A part number (between 1 – 10000) needs to be specified with each request which identifies each part and its position in the object
If a part with the same part number is uploaded, the previous part would be overwritten
After the part upload is successful, S3 returns an ETag header in the response which must be recorded along with the part number to be provided during the multipart completion request.
Multipart Upload Completion or Abort
On Multipart Upload Completion request, S3 creates an object by concatenating the parts in ascending order based on the part number and associates the metadata with the object.
Multipart Upload Completion request should include the unique upload ID with all the parts and the ETag information.
S3 response includes an ETag that uniquely identifies the combined object data
On a Multipart Upload Abort request, the upload is aborted and all uploaded parts are removed. Any new part upload would fail. However, any in-progress part upload might still complete, and hence the abort request might need to be sent again after the in-progress part uploads have completed.
S3 must receive a multipart upload completion or abort request, else it will not delete the parts and storage will be charged.
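A boto3 sketch of the three steps (initiate, upload parts, complete); the bucket, key, file name, and part size are illustrative:

import boto3

s3 = boto3.client("s3")
bucket, key = "my-example-bucket", "videos/large-file.bin"
part_size = 8 * 1024 * 1024  # 8 MB parts (minimum 5 MB except for the last part)

# 1. Initiation – returns the upload ID used by all later calls
upload_id = s3.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]

# 2. Parts upload – record the ETag and part number of each part
parts = []
with open("large-file.bin", "rb") as f:
    part_number = 1
    while True:
        data = f.read(part_size)
        if not data:
            break
        resp = s3.upload_part(Bucket=bucket, Key=key, UploadId=upload_id,
                              PartNumber=part_number, Body=data)
        parts.append({"ETag": resp["ETag"], "PartNumber": part_number})
        part_number += 1

# 3. Completion – S3 assembles the parts in ascending part-number order
s3.complete_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id,
                             MultipartUpload={"Parts": parts})
# (on failure, call s3.abort_multipart_upload(...) so the parts don't keep accruing charges)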
S3 allows the buckets and objects to be referred in Path-style or Virtual hosted-style URLs
Path-style
Bucket name is not part of the domain (unless you use a region specific endpoint)
The endpoint used must match the region in which the bucket resides for e.g, if you have a bucket called mybucket that resides in the EU (Ireland) region with object named puppy.jpg, the correct path-style syntax URI is http://s3-eu-west-1.amazonaws.com/mybucket/puppy.jpg.
If the endpoint does not match the region in which the bucket resides, a “PermanentRedirect” error is received with an HTTP response code 301 and a message indicating the correct URI for the resource.
Virtual hosted-style
S3 supports virtual hosted-style and path-style access in all regions.
In a virtual-hosted-style URL, the bucket name is part of the domain name in the URL for e.g. http://bucketname.s3.amazonaws.com/objectname
S3 virtual hosting can be used to address a bucket in a REST API call by using the HTTP Host header.
Benefits
attractiveness of customized URLs,
provides an ability to publish to the “root directory” of the bucket’s virtual server. This ability can be important because many existing applications search for files in this standard location.
S3 updates DNS to reroute the request to the correct location when a bucket is created in any region, which might take time.
S3 routes any virtual hosted-style requests to the US East (N.Virginia) region, by default, if the US East (N. Virginia) endpoint s3.amazonaws.com is used, instead of the region-specific endpoint (for example, s3-eu-west-1.amazonaws.com) and S3 redirects it with HTTP 307 redirect to the correct region.
When using virtual hosted-style buckets with SSL, the SSL wildcard certificate only matches buckets that do not contain periods. To work around this, use HTTP or write your own certificate verification logic.
If you make a request to the http://bucket.s3.amazonaws.com endpoint, the DNS has sufficient information to route your request directly to the region where your bucket resides.
Amazon S3 costs vary by Region
Charges in S3 are incurred for
Storage – cost is per GB/month
Requests – per request cost varies depending on the request type GET, PUT
Data Transfer
data transfer in is free
data transfer out is charged per GB (except to the same region or to Amazon CloudFront)
S3 achieves high availability by replicating data across multiple servers within Amazon’s data centers.
S3 provides read-after-write consistency for PUTS of new objects
For a PUT request, S3 synchronously stores data across multiple facilities before returning SUCCESS
A process writes a new object to S3 and will be immediately able to read the Object
A process writes a new object to S3 and immediately lists keys within its bucket. Until the change is fully propagated, the object might not appear in the list.
S3 provides eventual consistency for overwrite PUTS and DELETES in all regions.
For updates and deletes to Objects, the changes are eventually reflected and not available immediately
if a process replaces an existing object and immediately attempts to read it. Until the change is fully propagated, S3 might return the prior data
if a process deletes an existing object and immediately attempts to read it. Until the deletion is fully propagated, S3 might return the deleted data.
if a process deletes an existing object and immediately lists keys within its bucket. Until the deletion is fully propagated, S3 might list the deleted object.
Updates to a single key are atomic, e.g. if you PUT to an existing key, a subsequent read might return the old data or the updated data, but it will never return corrupted or partial data.
S3 does not currently support object locking. for e.g. If two PUT requests are simultaneously made to the same key, the request with the latest time stamp wins. If this is an issue, you will need to build an object-locking mechanism into your application.
Updates are key-based; there is no way to make atomic updates across keys. for e.g, you cannot make the update of one key dependent on the update of another key unless you design this functionality into your application.
Amazon S3 Subresources provide support to store and manage the bucket configuration information.
S3 subresources only exist in the context of a specific bucket or object.
S3 Subresources are subordinates to entities; that is, they do not exist on their own, they are always associated with some other entity, such as an object or a bucket.
Object Lifecycle
Go to the section about Object Lifecycle.
Static Website hosting
S3 can be used for Static Website hosting with Client side scripts.
S3 does not support server-side scripting
S3, in conjunction with Route 53, supports hosting a website at the root domain which can point to the S3 website endpoint.
S3 website endpoints do not support https.
For S3 website hosting the content should be made publicly readable which can be provided using a bucket policy or an ACL on an object.
Users can configure the index and error documents, as well as conditional routing based on the object key name.
Bucket policy applies only to objects owned by the bucket owner. If your bucket contains objects not owned by the bucket owner, then public READ permission on those objects should be granted using the object ACL.
Requester Pays buckets or DevPay buckets do not allow access through the website endpoint. Any request to such a bucket will receive a 403 Access Denied response
Previously, the only domain prefix allowed when creating Route 53 aliases for S3 static websites was “www”. You can now use other subdomains, for example: http://mydomain.com/error.html, http://www.mydomain.com, http://downloads.mydomain.com/index.html
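A boto3 sketch of enabling website hosting on a bucket (bucket and document names are illustrative):

import boto3

s3 = boto3.client("s3")

s3.put_bucket_website(
    Bucket="my-example-bucket",
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},   # served for directory-style requests
        "ErrorDocument": {"Key": "error.html"},      # served for 4xx errors
    },
)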
Versioning
Go to the section about Versioning.
Policy & Access Control List (ACL)
Go to the section about ACL.
CORS (Cross Origin Resource Sharing)
All browsers implement the Same-Origin policy, for security reasons, where a web page from one domain can only request resources from the same domain.
CORS allows client web applications loaded in one domain to request restricted resources from another domain.
CORS support in S3 allows cross-origin access to S3 resources.
CORS configuration rules identify the origins allowed to access the bucket, the operations (HTTP methods) that would be supported for each origin, and other operation-specific information.
To enable CORS, create a CORS configuration XML file.
Example:
<CORSConfiguration>
  <CORSRule>
    <AllowedOrigin>*</AllowedOrigin>
    <AllowedMethod>GET</AllowedMethod>
  </CORSRule>
</CORSConfiguration>
Suppose you want to use JavaScript on the webpages that are stored in this bucket to make authenticated GET and PUT requests against the same bucket by using the Amazon S3 API endpoint for the bucket, website.s3.us-east-1.amazonaws.com.
A browser would normally block JavaScript from allowing those requests, but with CORS you can configure your bucket to explicitly enable cross-origin requests from website.s3-website.us-east-1.amazonaws.com.
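The same kind of rule can be applied with boto3; a sketch using the bucket and origin from the scenario above (the header and cache values are illustrative):

import boto3

s3 = boto3.client("s3")

s3.put_bucket_cors(
    Bucket="website",
    CORSConfiguration={
        "CORSRules": [
            {
                "AllowedOrigins": ["http://website.s3-website.us-east-1.amazonaws.com"],
                "AllowedMethods": ["GET", "PUT"],
                "AllowedHeaders": ["*"],
                "MaxAgeSeconds": 3000,
            }
        ]
    },
)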
Logging
Logging, disabled by default, enables tracking of access requests to an S3 bucket
Each access log record provides details about a single access request, such as the requester, bucket name, request time, request action, response status, and error code, if any.
Access log information can be useful in security and access audits and also help learn about the customer base and understand the S3 bill
S3 periodically collects access log records, consolidates the records in log files, and then uploads log files to a target bucket as log objects.
If logging is enabled on multiple source buckets with the same target bucket, the target bucket will have access logs for all those source buckets, but each log object will report access log records for a specific source bucket.
Tagging
To store and manage tags on a bucket
Cost allocation tags can be added to the bucket to categorize and track AWS costs.
AWS can generate a cost allocation report with usage and costs aggregated by the tags applied to the buckets.
Usage
Group objects – Tag resources with unique business, compliance, or project identifiers.
Cost allocation – Use tags in billing that are specific to your cost centers. Then, generate AWS tag-based reports to find the actual costs for each cost center.
Automation – Tag resources for specific automation procedures, such as backup or replication.
Access control – Create access control lists and restrict access.
Operation support and monitoring – Use tags to identify key systems.
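A boto3 sketch of adding cost-allocation tags to a bucket (bucket name and tag keys/values are illustrative):

import boto3

s3 = boto3.client("s3")

s3.put_bucket_tagging(
    Bucket="my-example-bucket",
    Tagging={
        "TagSet": [
            {"Key": "Project", "Value": "phoenix"},
            {"Key": "CostCenter", "Value": "1234"},
        ]
    },
)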
Location
When you create a bucket, the AWS region where the S3 bucket will be created needs to be specified, and the region of an existing bucket can be retrieved using an API
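A sketch of retrieving a bucket's region with boto3 (bucket name is illustrative):

import boto3

s3 = boto3.client("s3")

resp = s3.get_bucket_location(Bucket="my-example-bucket")
# LocationConstraint is None for buckets in us-east-1, otherwise the region name
print(resp["LocationConstraint"] or "us-east-1")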
Notification
S3 notification feature enables notifications to be triggered when certain events happen in your bucket
Notifications are enabled at Bucket level
Notifications can be configured to be filtered by the prefix and suffix of the key name of objects. However, filtering rules cannot be defined with overlapping prefixes, overlapping suffixes, or prefix and suffix overlapping.
S3 can publish the following events
New Objects created event
Can be enabled for PUT, POST or COPY operations
You will not receive event notifications from failed operations
Object Removal event
Can publish delete events for object deletion, versioned object deletion or insertion of a delete marker
You will not receive event notifications from automatic deletes from lifecycle policies or from failed operations.
Reduced Redundancy Storage (RRS) object lost event
Can be used to reproduce/recreate the Object
S3 can publish events to the following destinations: Amazon SNS topics, Amazon SQS queues, and AWS Lambda functions
For S3 to be able to publish events to the destination, the S3 principal should be granted the necessary permissions.
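A boto3 sketch of a notification configuration that invokes a Lambda function for new .jpg objects under images/ (the bucket, function ARN, and filter values are illustrative, and the destination must already allow S3 to publish to it):

import boto3

s3 = boto3.client("s3")

s3.put_bucket_notification_configuration(
    Bucket="my-example-bucket",
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": "arn:aws:lambda:us-east-1:111122223333:function:process-image",
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {
                        "FilterRules": [
                            {"Name": "prefix", "Value": "images/"},
                            {"Name": "suffix", "Value": ".jpg"},
                        ]
                    }
                },
            }
        ]
    },
)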
Cross Region Replication
Cross-region replication is a bucket-level feature that enables automatic, asynchronous copying of objects across buckets in different AWS regions.
S3 can replicate all or a subset of objects with specific key name prefixes.
S3 encrypts all data in transit across AWS regions using SSL
Object replicas in the destination bucket are exact replicas of the objects in the source bucket with the same key names and the same metadata.
Useful for the following scenarios:
Compliance requirement to have data backed up across regions
Minimize latency by allowing users across geographies to access objects
Operational reasons, e.g. compute clusters in two different regions that analyze the same set of objects
Requirements
Source and destination buckets must be versioning-enabled
Source and destination buckets must be in different AWS regions
Objects can be replicated from a source bucket to only one destination bucket
S3 must have permission to replicate objects from that source bucket to the destination bucket on your behalf.
If the source bucket owner also owns the object, the bucket owner has full permissions to replicate the object. If not, the source bucket owner must have permission for the S3 actions s3:GetObjectVersion and s3:GetObjectVersionACL to read the object and object ACL.
When setting up cross-region replication in a cross-account scenario (where the source and destination buckets are owned by different AWS accounts), the source bucket owner must have permission to replicate objects into the destination bucket.
Replicated & Not Replicated
Any new objects created after you add a replication configuration are replicated.
S3 does not retroactively replicate objects that existed before you added replication configuration.
Only objects created with SSE-S3 (Amazon S3-managed keys) server-side encryption are replicated by default.
Objects created with server-side encryption using either customer-provided keys (SSE-C) or AWS KMS–managed keys (SSE-KMS) are not replicated by default; they require additional handling.
S3 replicates only objects in the source bucket for which the bucket owner has permission to read objects and read ACLs.
Any object ACL updates are replicated, although there can be some delay before Amazon S3 can bring the two in sync. This applies only to objects created after you add a replication configuration to the bucket.
Updates to bucket-level S3 subresources are not replicated, allowing different bucket configurations on the source and destination buckets.
Only customer actions are replicated & actions performed by lifecycle configuration are not replicated.
Objects in the source bucket that are replicas, created by another cross-region replication, are not replicated.
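A boto3 sketch of a replication configuration (the IAM role ARN, bucket names, and prefix are illustrative; both buckets must have versioning enabled):

import boto3

s3 = boto3.client("s3")

s3.put_bucket_replication(
    Bucket="source-example-bucket",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::111122223333:role/s3-crr-role",
        "Rules": [
            {
                "ID": "replicate-docs",
                "Prefix": "documents/",     # replicate only this key name prefix
                "Status": "Enabled",
                "Destination": {"Bucket": "arn:aws:s3:::destination-example-bucket"},
            }
        ],
    },
)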
Requester Pays
By default, buckets are owned by the AWS account that created them (the bucket owner), and that AWS account pays for the storage costs, downloads and data transfer charges associated with the bucket.
Using the Requester Pays subresource:
Bucket owner specifies that the requester requesting the download will be charged for the download
However, the bucket owner still pays the storage costs
Enabling Requester Pays on a bucket
disables anonymous access to that bucket
does not support BitTorrent
does not support SOAP requests
cannot be enabled for end user logging bucket
Torrent
The bucket owner bears the cost of storage as well as the request and transfer charges, which can increase linearly for a popular object
S3 also supports the BitTorrent protocol
BitTorrent is an open source Internet distribution protocol
BitTorrent addresses this problem by recruiting the very clients that are downloading the object as distributors themselves
S3 bandwidth rates are inexpensive, but BitTorrent allows developers to further save on bandwidth costs for a popular piece of data by letting users download from Amazon and other users simultaneously
Benefit for publisher is that for large, popular files the amount of data actually supplied by S3 can be substantially lower than what it would have been serving the same clients via client/server download.
Any object in S3 that is publicly available and can be read anonymously can be downloaded via BitTorrent.
Torrent file can be retrieved for any publicly available object by simply adding a “?torrent” query string parameter at the end of the REST GET request for the object.
Generating the .torrent for an object takes time proportional to the size of that object, so it’s recommended to make a first torrent request yourself to generate the file so that subsequent requests are faster.
Torrent is enabled only for objects that are less than 5 GB in size.
The Torrent subresource can only be retrieved; it cannot be created, updated or deleted.
Object ACL
Go to the section about ACL.
Amazon S3 storage classes are designed to sustain the concurrent loss of data in one or two facilities.
S3 storage classes allow lifecycle management for automatic migration of objects for cost savings.
S3 storage classes support SSL encryption of data in transit and data encryption at rest.
S3 also regularly verifies the integrity of your data using checksums and provides auto healing capability.
Standard
Storage class is ideal for performance-sensitive use cases and frequently accessed data and is designed to sustain the loss of data in two facilities.
STANDARD is the default storage class, if none specified during upload.
Low latency and high throughput performance.
Designed for durability of 99.999999999% (11 9’s) of objects
Designed for 99.99% availability over a given year
Backed with the Amazon S3 Service Level Agreement for availability.
Intelligent Tiering
INTELLIGENT_TIERING storage class is designed to optimize storage costs by automatically moving data to the most cost-effective storage access tier, without performance impact or operational overhead.
Delivers automatic cost savings by moving data on a granular object level between two access tiers, a frequent access tier and a lower-cost infrequent access tier, when access patterns change.
Ideal to optimize storage costs automatically for long-lived data when access patterns are unknown or unpredictable.
For a small monthly monitoring and automation fee per object, S3 monitors access patterns of the objects in the INTELLIGENT_TIERING storage class and moves objects that have not been accessed for 30 consecutive days to the infrequent access tier.
There are no retrieval fees when using the INTELLIGENT_TIERING storage class. If an object in the infrequent access tier is accessed, it is automatically moved back to the frequent access tier.
No additional tiering fees apply when objects are moved between access tiers within the INTELLIGENT_TIERING storage class.
Suitable for larger objects greater than 128 KB (smaller objects are charged for 128 KB only) kept for at least 30 days (charged for minimum 30 days)
Standard IA
S3 STANDARD_IA (Infrequent Access) storage class is optimized for long-lived and less frequently accessed data, e.g. for backups and older data where access is limited, but the use case still demands high performance
Ideal for use as the primary or only copy of data that can’t be recreated.
STANDARD_IA data is stored redundantly across multiple geographically separated AZs and is resilient to the loss of an Availability Zone.
STANDARD_IA storage class offers greater availability and resiliency than the ONEZONE_IA class.
STANDARD_IA objects are available for real-time access.
STANDARD_IA storage class is suitable for larger objects greater than 128 KB (smaller objects are charged for 128 KB only) kept for at least 30 days (charged for minimum 30 days)
Same low latency and high throughput performance of Standard.
Designed for durability of 99.999999999% of objects
Designed for 99.9% availability over a given year
S3 charges a retrieval fee for these objects, so they are most suitable for infrequently accessed data.
Backed with the Amazon S3 Service Level Agreement for availability.
OneZone IA
ONEZONE_IA storage classes are designed for long-lived and infrequently accessed data, but available for millisecond access (similar to the STANDARD and STANDARD_IA storage class).
Ideal when the data can be recreated if the AZ fails, and for object replicas when setting cross-region replication (CRR).
Objects are available for real-time access.
Suitable for larger objects greater than 128 KB (smaller objects are charged for 128 KB only) kept for at least 30 days (charged for minimum 30 days)
Stores the object data in only one AZ, which makes it less expensive than STANDARD_IA.
ONEZONE_IA data is not resilient to the physical loss of the AZ.
It's as durable as STANDARD_IA, but it is less available and less resilient.
Designed for durability of 99.999999999% of objects
Designed for 99.5% availability over a given year
S3 charges a retrieval fee for these objects, so they are most suitable for infrequently accessed data.
Glacier
GLACIER storage class is suitable for low cost data archiving where data access is infrequent and retrieval time of minutes to hours is acceptable.
Glacier redundantly stores data in multiple facilities and on multiple devices within each facility, before returning SUCCESS on uploading archives.
Glacier performs regular, systematic data integrity checks and is built to be automatically self-healing.
Glacier is a great storage choice when low storage cost is paramount, with data rarely retrieved.
Has a minimum storage duration period of 90 days and can be accessed in as little as 1-5 minutes using expedited retrieval.
Glacier now offers a range of data retrieval options where the retrieval time varies from 1-5 minutes to hours.
GLACIER storage class uses the very low-cost Glacier storage service, but the objects in this storage class are still managed through S3.
Glacier can store virtually any kind of data in any format.
All data is encrypted using AES-256.
With Glacier Select, you can perform filtering directly against a Glacier object (CSV, optionally GZIP-compressed) using standard SQL statements.
Glacier allows interaction through AWS Management Console, Command Line Interface CLI and SDKs or REST based APIs.
The management console can only be used to create and delete vaults. The rest of the operations, to upload and download data or create jobs for retrieval, need the CLI, SDKs or REST-based APIs.
GLACIER cannot be specified as the storage class at the object creation time but has to be transitioned from STANDARD, RRS, or STANDARD_IA to GLACIER storage class using lifecycle management.
For accessing GLACIER objects,
object must be restored, which can take anywhere between minutes to hours
objects are only available for the time period (number of days) specified during the restoration request
object’s storage class remains GLACIER
charges are levied for both the archive (GLACIER rate) and the copy restored temporarily (RRS rate)
Vault Lock feature enforces compliance via a lockable policy.
GLACIER offers the same durability and resiliency as the STANDARD storage class
Designed for durability of 99.999999999% of objects
Designed for 99.99% availability over a given year
Use cases include:
Digital media archives
Data that must be retained for regulatory compliance
Financial and healthcare records
Raw genomic sequence data
Long-term database backups
Vault:
A vault is a container for storing archives
Each vault resource has a unique address, which comprises the region in which the vault was created, the account ID, and the unique vault name within the region and account, e.g. https://glacier.us-west-2.amazonaws.com/111122223333/vaults/examplevault
A vault allows storage of an unlimited number of archives
Glacier supports various vault operations which are region specific
An AWS account can create up to 1,000 vaults per region.
Archive
An archive can be any data such as a photo, video, or document and is a base unit of storage in Glacier.
Each archive has a unique ID and an optional description, which can only be specified during the upload of an archive.
Glacier assigns the archive an ID, which is unique in the AWS region in which it is stored.
An archive can be uploaded in a single request, while for large archives Glacier provides a multipart upload API that enables uploading an archive in parts.
Job
A Job is required to retrieve an Archive and vault inventory list
Data retrieval requests are asynchronous operations, are queued and most jobs take about four hours to complete.
A job is first initiated and then the output of the job is downloaded after the job completes.
Vault inventory jobs need the vault name.
Data retrieval jobs need both the vault name and the archive ID, with an optional description.
A vault can have multiple jobs in progress at any point in time; each job can be identified by the Job ID, assigned when it is created, for tracking.
Glacier maintains job information such as job type, description, creation date, completion date, and job status and can be queried.
After the job completes, the job output can be downloaded in full or partially by specifying a byte range.
Notification Configuration
As the jobs are asynchronous, Glacier supports a notification mechanism to an SNS topic when a job completes
SNS topic for notification can either be specified with each individual job request or with the vault.
Glacier stores the notification configuration as a JSON document.
Standard retrievals
Standard retrievals allow access to any of the archives within several hours.
Standard retrievals typically complete within 3-5 hours.
Bulk retrievals
Bulk retrievals are Glacier’s lowest-cost retrieval option, enabling retrieval of large amounts, even petabytes, of data inexpensively in a day.
Bulk retrievals typically complete within 5 – 12 hours.
Expedited Retrievals
Expedited retrievals allows quick access to the data when occasional urgent requests for a subset of archives are required.
For all but the largest archives (250MB+), data accessed using Expedited retrievals are typically made available within 1 – 5 minutes.
There are two types of Expedited retrievals: On-Demand (available the vast majority of the time) and Provisioned (guaranteed to be available when needed).
Vault Operations
Glacier provides operations to create and delete vaults.
A vault can be deleted only if there are no archives in the vault as of the last computed inventory and there have been no writes to the vault since the last inventory (as the inventory is prepared periodically).
Vault Inventory Operations
Vault inventory helps retrieve list of archives in a vault with information such as archive ID, creation date, and size for each archive.
Inventory for each vault is prepared periodically, every 24 hours.
Vault inventory is updated approximately once a day, starting on the day the first archive is uploaded to the vault.
When a vault inventory job is requested, Glacier returns the last inventory it generated, which is a point-in-time snapshot and not real-time data.
Vault Metadata Operations
Vault Metadata or Description can also be obtained for a specific vault or for all vaults in a region, which provides information such as creation date, number of archives in the vault, total size in bytes used by all the archives in the vault, and the date the vault inventory was generated.
Notification Operations
Glacier also provides operations to set, retrieve, and delete a notification configuration on the vault. Notifications can be used to identify vault events.
Archive Operations
Glacier provides operations to upload, download and delete archives.
Uploading an Archive
An archive can be uploaded in a single operation (1 byte up to 4 GB in size) or in parts, referred to as multipart upload (up to 40 TB)
Multipart upload helps improve the upload experience for larger archives: upload archive parts independently, in parallel and in any order; recover faster by needing to re-upload only the part that failed and not the entire archive; upload archives without knowing the final size; and upload archives from 1 byte to about 40,000 GB (10,000 parts * 4 GB) in size.
To upload existing data to Glacier, consider using the AWS Import/Export service, which accelerates moving large amounts of data into and out of AWS using portable storage devices for transport. AWS transfers the data directly onto and off of storage devices using Amazon’s high-speed internal network, bypassing the Internet.
Glacier returns a response that includes an archive ID which is unique in the region in which the archive is stored.
Glacier does not support any additional metadata information apart from an optional description. Any additional metadata information required should be maintained on the client side
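A boto3 sketch of a single-operation archive upload (the vault name matches the earlier example URL; the file and description are illustrative); the returned archive ID must be saved client-side, since Glacier keeps no other metadata:

import boto3

glacier = boto3.client("glacier")

with open("backup-2016-01.tar", "rb") as f:
    resp = glacier.upload_archive(
        vaultName="examplevault",
        archiveDescription="January 2016 backup",
        body=f,
    )

archive_id = resp["archiveId"]   # unique within the region; store it for later retrieval
print(archive_id)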
Downloading an Archive
Downloading an archive is an asynchronous operation and is a two-step process.
1: Initiate an archive retrieval job:
When a Job is initiated, a job ID is returned as a part of the response
Job is executed asynchronously and the output can be downloaded after the job completes.
Job can be initiated to download the entire archive or a portion of the archive.
2: After the job completes, download the bytes
The archive can be downloaded as all the bytes, or a specific byte range can be specified to download only a portion of the output.
Downloading the archive in chunks helps in the event of the download failure, as only that part needs to be downloaded
Job completion status can be checked by:
Check status explicitly (Not Recommended): periodically poll the describe job operation request to obtain job information.
Completion notification: An SNS topic can be specified, when the job is initiated or with the vault, to be used to notify job completion
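A boto3 sketch of the two steps (the vault name follows the earlier example; the archive ID, SNS topic, and tier are illustrative):

import boto3

glacier = boto3.client("glacier")

# Step 1: initiate an archive-retrieval job (optionally notifying an SNS topic on completion)
job = glacier.initiate_job(
    vaultName="examplevault",
    jobParameters={
        "Type": "archive-retrieval",
        "ArchiveId": "EXAMPLE-ARCHIVE-ID",
        "Tier": "Standard",               # or "Expedited" / "Bulk"
        "SNSTopic": "arn:aws:sns:us-west-2:111122223333:glacier-jobs",
    },
)
job_id = job["jobId"]

# Step 2: after the job completes (typically hours later), download the output
output = glacier.get_job_output(vaultName="examplevault", jobId=job_id)
with open("restored-archive.tar", "wb") as f:
    f.write(output["body"].read())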
About Range Retrievals
Amazon Glacier allows retrieving an archive either in whole (default) or as a range, i.e. a portion.
Range retrievals need a range to be provided that is megabyte aligned.
Glacier returns a checksum in the response, which can be used to verify whether there were any errors in the download by comparing it with the checksum computed on the client side.
Specifying a range of bytes can be helpful when:
Control bandwidth costs: Glacier allows retrieval of up to 5 percent of the average monthly storage (pro-rated daily) for free each month
Ranges can be used to stay within the monthly free allowance of 5 percent by spreading out the data requested.
If the amount of data retrieved doesn’t fit within the free allowance, scheduling range retrievals enables reduction of the peak retrieval rate, which determines the retrieval fees.
Manage your data downloads:
Glacier allows retrieved data to be downloaded for 24 hours after the retrieval request completes.
Only portions of the archive can be retrieved so that the schedule of downloads can be managed within the given download window.
Retrieve a targeted part of a large archive
Retrieving an archive in range can be useful if an archive is uploaded as an aggregate of multiple individual files, and only few files need to be retrieved.
Deleting an Archive
Archives can be deleted from a vault only one at a time
This operation is idempotent. Deleting an already-deleted archive does not result in an error.
AWS applies a pro-rated charge for items that are deleted prior to 90 days, as Glacier is meant for long-term storage.
Updating an Archive
An existing archive cannot be updated and must be deleted and re-uploaded, which would be assigned a new archive id.
AWS Vault Lock
Amazon Glacier Vault Lock allows you to easily deploy and enforce compliance controls on individual Glacier vaults via a lockable policy. You can specify controls such as “Write Once Read Many” (WORM) in a Vault Lock policy and lock the policy from future edits. Once locked, the policy becomes immutable and Amazon Glacier will enforce the prescribed controls to help achieve your compliance objectives.
Reduced Redundancy Storage – RRS
NOTE – AWS recommends not to use this storage class. The STANDARD storage class is more cost effective.
Designed for noncritical, reproducible data stored at lower levels of redundancy than the STANDARD storage class, which reduces storage costs.
Designed for durability of 99.99% of objects.
Designed for 99.99% availability over a given year.
Lower level of redundancy results in less durability and availability.
RRS stores objects on multiple devices across multiple facilities, providing 400 times the durability of a typical disk drive.
RRS does not replicate objects as many times as S3 standard storage and is designed to sustain the loss of data in a single facility.
If an RRS object is lost, S3 returns a 405 error on requests made to that object.
S3 can send an event notification, configured on the bucket, to alert a user or start a workflow when it detects that an RRS object is lost which can be used to replace the lost object.
DEEP_ARCHIVE
DEEP_ARCHIVE storage class is suitable for low cost data archiving where data access is infrequent and retrieval time of hours is acceptable.
It has a minimum storage duration period of 180 days and a default retrieval time of 12 hours.
DEEP_ARCHIVE is the lowest cost storage option in AWS. Storage costs for DEEP_ARCHIVE are less expensive than using the GLACIER storage class.
DEEP_ARCHIVE retrieval costs can be reduced by using bulk retrieval, which returns data within 48 hours.
S3 Object Versioning can be used to protect from unintended overwrites and deletions.
Versioning helps to keep multiple variants of an object in the same bucket and can be used to preserve, retrieve, and restore every version of every object stored in your Amazon S3 bucket.
As Versioning maintains multiple copies of the same object as a whole, you accrue charges for the multiple versions, e.g. a 1GB file with 5 versions with minor differences would consume 5GB of S3 storage space and you would be charged for the same.
Versioning is not enabled by default and has to be explicitly enabled for each bucket.
Versioning once enabled, cannot be disabled and can only be suspended.
Versioning enabled on a bucket applies to all the objects within the bucket.
Permissions are set at the version level. Each version has its own object owner; an AWS account that creates the object version is the owner. So, you can set different permissions for different versions of the same object.
Irrespective of Versioning, each object in the bucket has a version.
For Non Versioned bucket, the version ID for each object is null
For Versioned buckets, a unique version ID is assigned to each object
With Versioning, the version ID forms a key element to define uniqueness of an object within a bucket, along with the bucket name and object key.
Object Retrieval
For Non Versioned bucket
An object retrieval always returns the only object available
For Versioned bucket
An object retrieval returns the Current object.
Non Current object can be retrieved by specifying the version ID.
Object Addition
For Non Versioned bucket
If an object with the same key is uploaded again it overwrites the object
For Versioned bucket
If an object with the same key is uploaded, the newly uploaded object becomes the Current version and the previous object becomes the Non Current version.
A Non Current versioned object can be retrieved and restored, hence protecting against accidental overwrites
When an object in a bucket is deleted
For Non Versioned bucket
An object is permanently deleted and cannot be recovered.
For Versioned bucket,
All versions remain in the bucket and Amazon inserts a delete marker which becomes the Current version.
A Non Current versioned object can be retrieved and restored, hence protecting against accidental deletions
If an object with a specific version ID is deleted, a permanent deletion happens and the object cannot be recovered.
Delete marker
The Delete Marker object does not have any data or ACL associated with it, just the key and the version ID
An object retrieval on a bucket with delete marker as the Current version would return a 404
Only a DELETE operation is allowed on the Delete Marker object
If the Delete marker object is deleted by specifying its version ID, the previous non current version object becomes the current version object
If a DELETE request is fired on the bucket with a Delete Marker as the current version, the Delete Marker object is not deleted but another Delete Marker is added.
Restoring Previous Versions
Copy a previous version of the object into the same bucket. Copied object becomes the current version of that object and all object versions are preserved – Recommended as you still keep all the versions.
Permanently delete the current version of the object. When you delete the current object version, you, in effect, turn the previous version into the current version of that object.
Versioning Suspended Bucket
Existing objects in your bucket do not change; only future request behavior changes.
For each new object addition, an object with version ID null is added.
For each object addition with the same key name, the object with the version ID null is overwritten.
An object retrieval request will always return the current version of the object.
A DELETE request on the bucket would permanently delete the version ID null object and insert a Delete Marker.
A DELETE request does not delete anything if the bucket does not have an object with version ID null.
A DELETE request can still be fired with a specific version ID for any previous object with version IDs stored.
MFA Delete
MFA Delete can be enabled on a bucket to ensure that data in your bucket cannot be accidentally deleted
While the bucket owner, the AWS account that created the bucket (root account), and all authorized IAM users can enable versioning, only the bucket owner (root account) can enable MFA Delete.
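A boto3 sketch of enabling versioning, and of enabling MFA Delete with the root account's MFA device (bucket name and the MFA serial/token values are illustrative placeholders):

import boto3

s3 = boto3.client("s3")

# Enable versioning on the bucket
s3.put_bucket_versioning(
    Bucket="my-example-bucket",
    VersioningConfiguration={"Status": "Enabled"},
)

# Enable MFA Delete (requires root credentials and an "<mfa-device-arn> <code>" string)
s3.put_bucket_versioning(
    Bucket="my-example-bucket",
    MFA="arn:aws:iam::111122223333:mfa/root-account-mfa-device 123456",
    VersioningConfiguration={"Status": "Enabled", "MFADelete": "Enabled"},
)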
S3 Object lifecycle can be managed by using a lifecycle configuration, which defines how S3 manages objects during their lifetime, e.g. moving less frequently accessed objects, backup or archival of data for several years, or permanent deletion of objects; all transitions can be controlled automatically
1000 lifecycle rules can be configured per bucket.
S3 Object Lifecycle Management rules applied to a bucket are applicable to all the existing objects in the bucket as well as the ones that will be added later.
S3 Object lifecycle management allows 2 types of behavior
Transition in which the storage class for the objects change
Expiration where the objects are permanently deleted
Lifecycle Management can be configured with Versioning.
Object’s lifecycle management applies to both Non Versioning and Versioning enabled buckets.
For Non Versioned buckets: Transitioning period is considered from the object’s creation date.
For Versioned buckets,
Transitioning period for the current object is calculated from the object creation date.
Transitioning period for a noncurrent object is calculated from the date when the object became a noncurrent versioned object.
S3 uses the number of days since its successor was created as the number of days an object is noncurrent.
S3 calculates the time by adding the number of days specified in the rule to the object creation time and rounding the resulting time up to the next day midnight UTC. For e.g., if an object was created at 15/1/2016 10:30 AM UTC and you specify 3 days in a transition rule, this results in 18/1/2016 10:30 AM UTC, rounded off to the next day midnight, 19/1/2016 00:00 UTC.
Lifecycle configuration on MFA-enabled buckets is not supported.
STANDARD or REDUCED_REDUNDANCY -> (128 KB & 30 days) -> STANDARD_IA
Only objects with size more than 128 KB can be transitioned, as cost benefits for transitioning to STANDARD_IA can be realized only for larger objects.
Objects must be stored for at least 30 days in the current storage class before being transitioned to the STANDARD_IA, as younger objects are accessed more frequently or deleted sooner than is suitable for STANDARD_IA.
STANDARD_IA -> X -> STANDARD or REDUCED_REDUNDANCY
Cannot transition
STANDARD or REDUCED_REDUNDANCY or STANDARD_IA -> GLACIER
Any Storage class can be transitioned to GLACIER
STANDARD or REDUCED_REDUNDANCY -> (1 day) -> GLACIER
Can be done in a day
STANDARD_IA -> (30 days) -> GLACIER
Transitioning from STANDARD_IA to GLACIER can be done only after the object has spent at least 30 days in STANDARD_IA, i.e. at least 60 days from the object creation date or noncurrent version date.
GLACIER-> X -> STANDARD or REDUCED_REDUNDANCY or STANDARD_IA
Cannot transition
GLACIER -> (90 days) -> Permanent Deletion
Deleting data that is archived to Glacier is free, if the objects you delete are archived for three months or longer.
Amazon S3 charges a prorated early deletion fee, if the object is deleted or overwritten within three months of archiving it.
STANDARD or STANDARD_IA or GLACIER -> X-> REDUCED_REDUNDANCY
Cannot transition
Archival of objects to Amazon Glacier by using object lifecycle management is performed asynchronously and there may be a delay between the transition date in the lifecycle configuration rule and the date of the physical transition. However, AWS charges Amazon Glacier prices based on the transition date specified in the rule.
For a versioning-enabled bucket
Transition and Expiration actions apply to current versions.
NoncurrentVersionTransition and NoncurrentVersionExpiration actions apply to noncurrent versions and work similarly to the non-versioned objects, except the time period is from the time the objects became noncurrent.
Expiration Rules
For Non Versioned bucket
Object is permanently deleted
For Versioned bucket
Expiration is applicable to the Current object only and does not impact any of the non current objects.
S3 will insert a Delete Marker object with a unique version ID and the previous current object becomes a noncurrent version.
S3 will not take any action if the Current object is a Delete Marker
If the bucket has a single object which is the Delete Marker (referred to as expired object delete marker), S3 removes the Delete Marker.
For Versioned Suspended bucket
S3 will insert a Delete Marker object with version ID null and overwrite any existing object with version ID null.
When an object reaches the end of its lifetime, Amazon S3 queues it for removal and removes it asynchronously. There may be a delay between the expiration date and the date at which S3 removes an object. You are not charged for storage time associated with an object that has expired.
There are additional cost considerations if you put lifecycle policy to expire objects that have been in STANDARD_IA for less than 30 days, or GLACIER for less than 90 days.
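A boto3 sketch of a lifecycle configuration combining transitions and expiration (the bucket, prefix, and day counts are illustrative, chosen to respect the minimums above):

import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-example-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-then-expire-logs",
                "Filter": {"Prefix": "logs/"},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},  # after 30 days in STANDARD
                    {"Days": 90, "StorageClass": "GLACIER"},       # after 60 more days in STANDARD_IA
                ],
                "Expiration": {"Days": 365},                       # permanently delete after a year
            }
        ]
    },
)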
By default, all S3 buckets, objects and related subresources are private
User is the AWS account or the IAM user who accesses the resource.
Bucket owner is the AWS account that created a bucket
Object owner is the AWS account that uploads the object, even to a bucket not owned by that account.
Only the Resource owner, the AWS account that creates the resource, can access the resource.
Resource owner can be
AWS account that creates the bucket or object owns those resources
If an IAM user creates the bucket or object, the AWS account of the IAM user owns the resource.
If the bucket owner grants cross-account permissions to other AWS account users to upload objects to the buckets, the objects are owned by the AWS account of the user who uploaded the object and not the bucket owner except for the following conditions
Bucket owner can deny access to the object, as it’s still the bucket owner who pays for the object.
Bucket owner can delete or apply archival rules to the object and perform restoration.
S3 permissions are classified into Resource based policies and User policies
MFA Access
Add MFA-related conditions to your bucket policy that require users from other AWS accounts to authenticate using an MFA device.
Example:
{
  "Version": "2012-10-17",
  "Id": "Policy201612130001aa",
  "Statement": [
    {
      "Sid": "Stmt201612130001ab",
      "Effect": "Deny",
      "Principal": {
        "AWS": "arn:aws:iam::111122223333:root"
      },
      "Action": [
        "s3:PutObject",
        "s3:PutObjectAcl",
        "s3:DeleteObject"
      ],
      "Resource": "arn:aws:s3:::example.accounta.bucket/*",
      "Condition": {
        "BoolIfExists": {
          "aws:MultiFactorAuthPresent": "false"
        }
      }
    },
    ...
  ]
}
User based policies use IAM with S3 to control the type of access a user or group of users has to specific parts of an S3 bucket the AWS account owns.
User-based policy is always attached to a User, Group or Role; anonymous permissions cannot be granted.
If an AWS account that owns a bucket wants to grant permission to users in its account, it can use either a bucket policy or a user policy.
Bucket policies and access control lists (ACLs) are resource-based because they are attached to the Amazon S3 resources.
Bucket Policies
Bucket policy can be used to grant cross-account access to other AWS accounts or IAM users in other accounts for the bucket and objects in it.
Bucket policies provide centralized access control to buckets and objects based on a variety of conditions, including S3 operations, requesters, resources, and aspects of the request (e.g. IP address)
It can define access to specific S3 buckets or objects, grant access across AWS accounts, and allow or block access based on conditions.
If an AWS account that owns a bucket wants to grant permission to users in its account, it can use either a bucket policy or a user policy.
Permissions attached to a bucket apply to all of the objects in that bucket created and owned by the bucket owner.
Policies can either add or deny permissions across all (or a subset) of objects within a bucket.
Only the bucket owner is allowed to associate a policy with a bucket.
Access Control Lists (ACLs) (Legacy)
Each bucket and object has an ACL associated with it.
An ACL is a list of grants identifying grantee and permission granted
ACLs are used to grant basic read/write permissions on resources to other AWS accounts.
ACLs support a limited permission set and
cannot grant conditional permissions, nor can you explicitly deny permissions
cannot be used to grant permissions for bucket subresources
Permission can be granted to an AWS account by the email address or the canonical user ID (which is just an obfuscated account ID). If an email address is provided, S3 will still find the canonical user ID for the user and add it to the ACL.
It is recommended to use the canonical user ID, as the email address is not supported in all regions.
Bucket ACL
The only recommended use case for the bucket ACL is to grant write permission to the S3 Log Delivery group to write access log objects to the bucket.
The only way you can grant the necessary permissions to the Log Delivery group is via a bucket ACL.
Object ACL
Object ACLs control only Object-level Permissions
Object ACL is the only way to manage permissions for objects in a bucket that are not owned by the bucket owner, i.e. if the bucket owner allows cross-account object uploads and the object owner is different from the bucket owner, the only way for the object owner to grant permissions on the object is through the Object ACL.
If the Bucket and Object is owned by the same AWS account, Bucket policy can be used to manage the permissions.
If the object and the user are owned by the same AWS account, a user policy can be used to manage the permissions.
S3 evaluates the policies in 3 contexts
User context is basically the context in which S3 evaluates the User policy that the parent AWS account (context authority) attaches to the user
Bucket context is the context in which S3 evaluates the access policies owned by the bucket owner (context authority) to check if the bucket owner has not explicitly denied access to the resource
Object context is the context where S3 evaluates policies owned by the Object owner (context authority)
Analogy
Consider 3 Parents (AWS Accounts) A, B and C with Children (IAM Users) AA, BA and CA respectively
Parent A owns a Toy box (Bucket) with Toy AAA and also allows toys (Objects) to be dropped and picked up
Parent A can grant permission (User Policy OR Bucket policy OR both) to his Child AA to access the Toy box and the toys
Parent A can grant permissions (Bucket policy) to Parent B (a different AWS account) to drop toys into the toy box.
Parent B can grant permissions (User policy) to his Child BA to drop Toy BAA
Parent B can grant permissions (Object ACL) to Parent A to access Toy BAA.
Parent A can grant permissions (Bucket Policy) to Parent C to pick up the Toy AAA, and Parent C in turn can grant permission (User Policy) to his Child CA to access the toy.
Parent A can grant permission (through an IAM Role) to Parent C to pick up the Toy BAA, and Parent C in turn can grant permission (User Policy) to his Child CA to access the toy.
Bucket Operation Authorization
If the requester is an IAM user, the user must have permission (User Policy) from the parent AWS account to which it belongs
Amazon S3 evaluates a subset of policies owned by the parent account. This subset of policies includes the user policy that the parent account attaches to the user.
If the parent also owns the resource in the request (in this case, the bucket), Amazon S3 also evaluates the corresponding resource policies (bucket policy and bucket ACL) at the same time.
Requester must also have permissions (Bucket Policy or ACL) from the bucket owner to perform a specific bucket operation.
Amazon S3 evaluates a subset of policies owned by the AWS account that owns the bucket. The bucket owner can grant permission by using a bucket policy or bucket ACL.
Note that, if the AWS account that owns the bucket is also the parent account of an IAM user, then it can configure bucket permissions in a user policy or bucket policy or both.
Object Operation Authorization
If the requester is an IAM user, the user must have permission (User Policy) from the parent AWS account to which it belongs.
Amazon S3 evaluates a subset of policies owned by the parent account. This subset of policies includes the user policy that the parent attaches to the user.
If the parent also owns the resource in the request (bucket, object), Amazon S3 evaluates the corresponding resource policies (bucket policy, bucket ACL, and object ACL) at the same time.
If the parent AWS account owns the resource (bucket or object), it can grant resource permissions to its IAM user by using either the user policy or the resource policy.
S3 evaluates policies owned by the AWS account that owns the bucket.
If the AWS account that owns the object in the request is not the same as the bucket owner, in the bucket context Amazon S3 checks the policies to see whether the bucket owner has explicitly denied access to the object.
If there is an explicit deny set on the object, Amazon S3 does not authorize the request.
Requester must have permissions from the object owner (Object ACL) to perform a specific object operation.
Amazon S3 evaluates the object ACL.
If bucket and object owners are the same, access to the object can be granted in the bucket policy, which is evaluated at the bucket context.
If the owners are different, the object owners must use an object ACL to grant permissions.
If the AWS account that owns the object is also the parent account to which the IAM user belongs, it can configure object permissions in a user policy, which is evaluated at the user context.
Permission Delegation
If an AWS account owns a resource, it can grant permissions on that resource to another AWS account.
That account can then delegate those permissions, or a subset of them, to users in the account. This is referred to as permission delegation.
But an account that receives permissions from another account cannot delegate permission cross-account to another AWS account.
If the bucket owner wants to grant another AWS account permission to an object that it does not own, it cannot do so through cross-account permissions; instead, it needs to define an IAM role that the other AWS account can assume to gain access, as sketched below.
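A minimal sketch (boto3, Python) of that role-based pattern, assuming a placeholder role ARN in the bucket owner's account and placeholder bucket/key names: the other account assumes the role and uses the temporary credentials for the S3 access.
import boto3

sts = boto3.client("sts")

# Assume a role (placeholder ARN) defined in the bucket owner's account.
assumed = sts.assume_role(
    RoleArn="arn:aws:iam::111122223333:role/S3ObjectAccessRole",
    RoleSessionName="cross-account-s3-access",
)
creds = assumed["Credentials"]

# Use the temporary credentials to access the object.
s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
s3.get_object(Bucket="example-bucket", Key="shared/object.txt")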
Objects are redundantly stored on multiple devices across multiple facilities in an S3 region.
Amazon S3 PUT and PUT Object copy operations synchronously store the data across multiple facilities before returning SUCCESS.
Once the objects are stored, S3 maintains its durability by quickly detecting and repairing any lost redundancy.
S3 also regularly verifies the integrity of data stored using checksums. If Amazon S3 detects data corruption, it is repaired using redundant data.
In addition, S3 calculates checksums on all network traffic to detect corruption of data packets when storing or retrieving data.
Data protection against accidental overwrites and deletions can be added by enabling Versioning to preserve, retrieve and restore every version of the object stored
S3 also provides the ability to protect data in-transit (as it travels to and from S3) and at rest (while it is stored in S3).
Data in-transit
S3 allows protection of data in-transit by enabling communication via SSL or using client-side encryption
Data at Rest
S3 supports both client side encryption and server side encryption for protecting data at rest
Using Server-Side Encryption, S3 encrypts the object before saving it on disks in its data centers and decrypts it when the object is downloaded
Using Client-Side Encryption, you can encrypt data client-side and upload the encrypted data to S3. In this case, you manage the encryption process, the encryption keys, and related tools.
Server-side encryption is about data encryption at rest
Server-side encryption encrypts only the object data. Any object metadata is not encrypted.
S3 handles the encryption (as it writes to disks) and decryption (when you access the objects) of the data objects.
There is no difference in the access mechanism for encrypted and unencrypted objects; encryption and decryption are handled transparently by S3
Server-Side Encryption with Amazon S3-Managed Keys (SSE-S3)
Each object is encrypted with a unique data key employing strong multi-factor encryption.
SSE-S3 encrypts the data key with a master key that is regularly rotated.
S3 server-side encryption uses one of the strongest block ciphers available, 256-bit Advanced Encryption Standard (AES-256), to encrypt the data.
Whether or not objects are encrypted with SSE-S3 can't be enforced when they are uploaded using pre-signed URLs, because the only way to specify server-side encryption is through the AWS Management Console or through an HTTP request header.
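A minimal boto3 (Python) sketch of requesting SSE-S3 on upload; the bucket and key names are placeholders.
import boto3

s3 = boto3.client("s3")

# Request SSE-S3 by setting the x-amz-server-side-encryption header to AES256.
s3.put_object(
    Bucket="example-bucket",
    Key="docs/confidential.txt",
    Body=b"sensitive data",
    ServerSideEncryption="AES256",
)

# Reads need no extra parameters; decryption is handled transparently by S3.
obj = s3.get_object(Bucket="example-bucket", Key="docs/confidential.txt")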
Server-Side Encryption with AWS KMS-Managed Keys (SSE-KMS)
SSE-KMS is similar to SSE-S3, but it uses AWS Key Management Service (KMS), which provides additional benefits along with additional charges:
KMS is a service that combines secure, highly available hardware and software to provide a key management system scaled for the cloud.
KMS uses customer master keys (CMKs) to encrypt the S3 objects.
Master key is never made available.
KMS enables you to centrally create encryption keys, define the policies that control how keys can be used.
Allows auditing of key usage, to prove keys are being used correctly, by inspecting logs in AWS CloudTrail.
Allows keys to be temporarily disabled and re-enabled.
Allows keys to be rotated regularly.
Security controls in AWS KMS can help meet encryption-related compliance requirements.
SSE-KMS enables separate permissions for the use of an envelope key (that is, a key that protects the data’s encryption key) that provides added protection against unauthorized access of the objects in S3.
SSE-KMS provides the option to create and manage encryption keys yourself, or use a default customer master key (CMK) that is unique to you, the service you’re using, and the region you’re working in.
Creating and Managing your own CMK gives you more flexibility, including the ability to create, rotate, disable, and define access controls, and to audit the encryption keys used to protect your data.
A customer managed CMK can generate a plaintext data key and an encrypted copy of that data key. Sensitive data is encrypted using the plaintext data key; after encryption, the plaintext data key should be deleted to avoid any inappropriate use, while the encrypted data key and the encrypted data are stored in S3 buckets.
Data keys used to encrypt your data are also encrypted and stored alongside the data they protect and are unique to each object.
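A minimal boto3 (Python) sketch of requesting SSE-KMS on upload; the bucket, key, and KMS key ID are placeholders. If SSEKMSKeyId is omitted, S3 uses the AWS managed key for S3 in that account and region.
import boto3

s3 = boto3.client("s3")

# Request SSE-KMS and name a specific customer managed key (placeholder key ID).
s3.put_object(
    Bucket="example-bucket",
    Key="docs/confidential.txt",
    Body=b"sensitive data",
    ServerSideEncryption="aws:kms",
    SSEKMSKeyId="1234abcd-12ab-34cd-56ef-1234567890ab",
)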
Process flow
An application or AWS service client requests an encryption key to encrypt data and passes a reference to a master key under the account.
Client requests are authenticated based on whether they have access to use the master key.
A new data encryption key is created, and a copy of it is encrypted under the master key.
Both the data key and encrypted data key are returned to the client.
Data key is used to encrypt customer data and then deleted as soon as is practical.
Encrypted data key is stored for later use and sent back to AWS KMS when the source data needs to be decrypted.
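The same envelope flow can be exercised directly against KMS. A hedged boto3 (Python) sketch with a placeholder key ID; the actual local encryption step is omitted since the choice of cipher is up to the client.
import boto3

kms = boto3.client("kms")

# Steps 1-4: request a new data key under the master key (placeholder ID);
# KMS returns the plaintext key and a copy encrypted under the master key.
resp = kms.generate_data_key(
    KeyId="1234abcd-12ab-34cd-56ef-1234567890ab",
    KeySpec="AES_256",
)
plaintext_key = resp["Plaintext"]       # used to encrypt the data locally, then discarded
encrypted_key = resp["CiphertextBlob"]  # stored alongside the encrypted data

# Steps 5-6: later, send the encrypted data key back to KMS to recover the plaintext key.
decrypted = kms.decrypt(CiphertextBlob=encrypted_key)
assert decrypted["Plaintext"] == plaintext_key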
Server-Side Encryption with Customer-Provided Keys (SSE-C)
Encryption keys can be managed and provided by the Customer and S3 manages the encryption, as it writes to disks, and decryption, when you access the objects.
When you upload an object, the encryption key is provided as part of the request; S3 uses that encryption key to apply AES-256 encryption to the data and then removes the encryption key from memory.
When you download an object, the same encryption key must be provided as part of the request. S3 first verifies the encryption key and, if it matches, decrypts the object before returning it.
As each object and each object’s version can be encrypted with a different key, you are responsible for maintaining the mapping between the object and the encryption key used.
SSE-C requests must be made over HTTPS, and S3 rejects any SSE-C request made over HTTP.
For security considerations, AWS recommends that any key sent erroneously over HTTP be considered compromised and be discarded and rotated.
S3 does not store the encryption key provided. Instead, it stores a randomly salted HMAC value of the encryption key which can be used to validate future requests. The salted HMAC value cannot be used to derive the value of the encryption key or to decrypt the contents of the encrypted object. That means, if you lose the encryption key, you lose the object.
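A minimal boto3 (Python) sketch of SSE-C over HTTPS; the key below is generated locally purely as a placeholder for whatever key management the customer uses, and the SDK computes the accompanying key-MD5 header automatically.
import os
import boto3

s3 = boto3.client("s3")

# Customer-provided 256-bit key (placeholder); you are responsible for
# storing it and mapping it to the object.
customer_key = os.urandom(32)

# Upload: the key travels with the request (HTTPS only) and is used for AES-256.
s3.put_object(
    Bucket="example-bucket",
    Key="docs/confidential.txt",
    Body=b"sensitive data",
    SSECustomerAlgorithm="AES256",
    SSECustomerKey=customer_key,
)

# Download: the same key must be supplied again, or the request fails.
obj = s3.get_object(
    Bucket="example-bucket",
    Key="docs/confidential.txt",
    SSECustomerAlgorithm="AES256",
    SSECustomerKey=customer_key,
)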
Client-Side Encryption
Encryption master keys are completely maintained on the client side
Uploading Object
Amazon S3 encryption client (e.g. AmazonS3EncryptionClient in the AWS SDK for Java) locally generates a random one-time-use symmetric key (also known as a data encryption key or data key).
Client encrypts the data encryption key using the customer-provided master key.
Client uses this data encryption key to encrypt the data of a single S3 object (for each object, the client generates a separate data key).
Client then uploads the encrypted data to Amazon S3 and also saves the encrypted data key and its material description as object metadata (x-amz-meta-x-amz-key) in Amazon S3 by default
Downloading Object
Client first downloads the encrypted object from Amazon S3 along with the object metadata.
Using the material description in the metadata, the client first determines which master key to use to decrypt the encrypted data key.
Using that master key, the client decrypts the data key and uses it to decrypt the object
Client-side master keys and your unencrypted data are never sent to AWS.
If the master key is lost the data cannot be decrypted.
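A conceptual Python sketch of the client-side pattern only; this is not the AWS SDK encryption client and does not reproduce its data-key wrapping or metadata format. It simply shows encrypt-locally-then-upload under a locally held master key, using the third-party cryptography package; bucket and key names are placeholders.
import boto3
from cryptography.fernet import Fernet

s3 = boto3.client("s3")

# Placeholder client-side master key; in practice it never leaves the client.
master_key = Fernet.generate_key()

# Encrypt locally, then upload only the ciphertext; S3 never sees the plaintext or the key.
ciphertext = Fernet(master_key).encrypt(b"sensitive data")
s3.put_object(Bucket="example-bucket", Key="docs/confidential.enc", Body=ciphertext)

# Download and decrypt locally with the same master key.
obj = s3.get_object(Bucket="example-bucket", Key="docs/confidential.enc")
plaintext = Fernet(master_key).decrypt(obj["Body"].read())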
Suppose the new VPC endpoint should only be used to communicate with a specific S3 bucket, while the S3 bucket should only allow read/write operations that come from this VPC endpoint.
Use a VPC Endpoint policy for Amazon S3 to restrict access to the S3 Bucket “my-bucket” so that the VPC Endpoint is only allowed to perform S3 actions on “my-bucket”.
For the S3 bucket “my-bucket”, use an S3 bucket policy that denies all actions if the source VPC Endpoint is not equal to the endpoint ID that is created.
The following VPC Endpoint policy is needed:
{
"Statement": [
{
"Sid": "Access-to-my-bucket-only",
"Principal": "*",
"Action": [
"s3:GetObject",
"s3:PutObject"
],
"Effect": "Allow",
"Resource": ["arn:aws:s3:::my-bucket",
"arn:aws:s3:::my-bucket/*"]
}
]
}
The following S3 Bucket policy is also required:
{
"Version": "2012-10-17",
"Id": "Policy1415115909152",
"Statement": [
{
"Sid": "Access-to-specific-VPCE-only",
"Principal": "*",
"Action": "s3:*",
"Effect": "Deny",
"Resource": ["arn:aws:s3:::my-bucket",
"arn:aws:s3:::my-bucket/*"],
"Condition": {
"StringNotEquals": {
"aws:sourceVpce": "vpce-1a2b3c4d"
}
}
}
]
}
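The endpoint policy above can be applied to an existing gateway endpoint; a boto3 (Python) sketch, reusing the document's example bucket and endpoint ID as placeholders.
import json
import boto3

ec2 = boto3.client("ec2")

endpoint_policy = {
    "Statement": [
        {
            "Sid": "Access-to-my-bucket-only",
            "Principal": "*",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Effect": "Allow",
            "Resource": ["arn:aws:s3:::my-bucket", "arn:aws:s3:::my-bucket/*"]
        }
    ]
}

# Replace the policy on the existing gateway endpoint (placeholder endpoint ID).
ec2.modify_vpc_endpoint(
    VpcEndpointId="vpce-1a2b3c4d",
    PolicyDocument=json.dumps(endpoint_policy),
)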
S3 Batch Operations performs large-scale operations on Amazon S3 objects.
Batch operations are performed from the AWS Management Console or through the API.
You can label and control access to your S3 Batch Operations jobs.
The objects for the batch operations are specified through an Amazon S3 inventory report or a custom CSV file.
These files, known as manifest files, contain a list of object keys that you want Amazon S3 to act on.
Each row in the file includes the bucket name, object key, and (optionally) the object version.
Version IDs must be included for all objects or omitted for all objects.
Object keys must be URL-encoded.
S3 Batch Operations supports the following operations:
Put object copy
Initiate restore object
Put object ACL
Put object tagging
Manage Object Lock retention dates
Manage Object Lock legal hold
Run a custom Lambda operation
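A hedged boto3 (Python) sketch of creating a Batch Operations job via the s3control API; the account ID, role ARN, manifest object and its ETag, report bucket, and tag values are all placeholders, and the manifest is assumed to be a CSV of bucket/key pairs.
import boto3

s3control = boto3.client("s3control")

s3control.create_job(
    AccountId="111122223333",                                       # placeholder account ID
    ConfirmationRequired=False,
    Priority=10,
    RoleArn="arn:aws:iam::111122223333:role/BatchOperationsRole",   # placeholder role
    Operation={
        "S3PutObjectTagging": {
            "TagSet": [{"Key": "project", "Value": "archive"}]
        }
    },
    Manifest={
        "Spec": {
            "Format": "S3BatchOperations_CSV_20180820",
            "Fields": ["Bucket", "Key"],
        },
        "Location": {
            "ObjectArn": "arn:aws:s3:::example-bucket/manifests/manifest.csv",
            "ETag": "60e460c9d1046e73f7dde5043ac3ae85",              # placeholder ETag
        },
    },
    Report={
        "Bucket": "arn:aws:s3:::example-bucket",
        "Format": "Report_CSV_20180820",
        "Enabled": True,
        "Prefix": "batch-reports",
        "ReportScope": "AllTasks",
    },
)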
Multiple Concurrent PUTs/GETs
S3 scales to support very high request rates. If the request rate grows steadily, S3 automatically partitions the buckets as needed to support higher request rates.
S3 can achieve at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix in a bucket.
Workloads that are GET-intensive
CloudFront can be used for performance optimization and can help by
distributing content with low latency and high data transfer rate.
caching the content and thereby reducing the number of direct requests to S3
providing multiple endpoints (Edge locations) for data availability
available in two flavors as a Web distribution or an RTMP distribution
For fast data transport over long distances between a client and an S3 bucket, use Amazon S3 Transfer Acceleration. Transfer Acceleration uses the globally distributed edge locations in CloudFront to accelerate data transport over geographic distances.
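A boto3 (Python) sketch of enabling and then using Transfer Acceleration; the bucket name and local file are placeholders.
import boto3
from botocore.config import Config

# One-time: enable Transfer Acceleration on the bucket.
boto3.client("s3").put_bucket_accelerate_configuration(
    Bucket="example-bucket",
    AccelerateConfiguration={"Status": "Enabled"},
)

# Route subsequent transfers through the accelerate endpoint.
s3_accel = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
s3_accel.upload_file("large-file.bin", "example-bucket", "uploads/large-file.bin")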
PUTs/GETs for Large Objects
AWS allows parallelizing PUT/GET requests to improve upload and download performance, as well as the ability to recover in case of failures.
For PUTs, Multipart upload (sketched after this list) can help improve the uploads by
performing multiple uploads at the same time and maximizing network bandwidth utilization.
quick recovery from failures, as only the part that failed to upload needs to be re-uploaded
ability to pause and resume uploads
begin an upload before the Object size is known
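A minimal multipart upload sketch using boto3's transfer manager (Python), which splits the file into parts and uploads them in parallel once the size crosses the threshold; the file, bucket, and key names are placeholders.
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# 16 MB parts, up to 10 parts uploaded concurrently; only a failed part
# is retried rather than the whole object.
config = TransferConfig(
    multipart_threshold=16 * 1024 * 1024,
    multipart_chunksize=16 * 1024 * 1024,
    max_concurrency=10,
)
s3.upload_file("large-file.bin", "example-bucket", "uploads/large-file.bin", Config=config)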
For GETs, the Range HTTP header (sketched after this list) can help improve the downloads by
allowing the object to be retrieved in parts instead of the whole object
quick recovery from failures, as only the part that failed to download needs to be retried.
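A minimal ranged GET sketch with boto3 (Python), fetching only the first 8 MB of an object; names are placeholders, and further ranges could be fetched in parallel.
import boto3

s3 = boto3.client("s3")

# Fetch only the first 8 MB; additional byte ranges can be downloaded
# in parallel and retried independently on failure.
part = s3.get_object(
    Bucket="example-bucket",
    Key="uploads/large-file.bin",
    Range="bytes=0-8388607",
)
data = part["Body"].read()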
List Operations
Object key names are stored lexicographically in Amazon S3 indexes, making it hard to sort and manipulate the contents of a LIST response
S3 maintains a single lexicographically sorted list of indexes
Build and maintain a secondary index outside of S3, e.g. in DynamoDB or RDS, to store, index and query the object metadata rather than performing LIST operations on S3.
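A paginated LIST sketch with boto3 (Python), retrieving up to 1,000 keys per call under a key-name prefix; the bucket and prefix are placeholders.
import boto3

s3 = boto3.client("s3")

# list_objects_v2 returns at most 1,000 keys per call; the paginator
# follows the continuation tokens automatically.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="example-bucket", Prefix="logs/2023/"):
    for obj in page.get("Contents", []):
        print(obj["Key"], obj["Size"])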
Security
Use Versioning
can be used to protect from unintended overwrites and deletions
allows the ability to retrieve and restore deleted objects or rollback to previous versions
Enable additional security by configuring a bucket to enable MFA (Multi-Factor Authentication) delete.
Versioning does not prevent bucket deletion; data must still be backed up, because if the bucket is accidentally or maliciously deleted, the data is lost.
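A boto3 (Python) sketch of enabling versioning; MFA delete is shown commented out because it can only be enabled by the root user presenting an MFA token, and the bucket name and MFA serial are placeholders.
import boto3

s3 = boto3.client("s3")

# Turn on versioning so overwrites and deletes create new versions / delete markers.
s3.put_bucket_versioning(
    Bucket="example-bucket",
    VersioningConfiguration={"Status": "Enabled"},
)

# MFA delete (requires root credentials and an MFA token; values are placeholders):
# s3.put_bucket_versioning(
#     Bucket="example-bucket",
#     VersioningConfiguration={"Status": "Enabled", "MFADelete": "Enabled"},
#     MFA="arn:aws:iam::111122223333:mfa/root-account-mfa-device 123456",
# )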
Use Cross Region replication feature to backup data to a different region.
When using a VPC with S3, use VPC S3 endpoints, as they
are horizontally scaled, redundant, and highly available VPC components
help establish a private connection between VPC and S3 and the traffic never leaves the Amazon network.
Cost
Optimize S3 storage cost by selecting an appropriate storage class for objects.
Configure appropriate lifecycle management rules to move objects to different storage classes and expire them.
Tracking
Use Event Notifications to be notified of any PUT or DELETE requests on the S3 objects
Use CloudTrail, which helps capture specific API calls made to S3 from the AWS account and delivers the log files to an S3 bucket.
Use CloudWatch to monitor the Amazon S3 buckets, tracking metrics such as object counts and bytes stored and configure appropriate actions.
Storing Large Items
If your application needs to store more data in an item than the DynamoDB size limit permits (400 KB), you can try compressing one or more large attributes or breaking the item into multiple items (efficiently indexed by sort keys).
You can also store the item as an object in Amazon S3 and store the Amazon S3 object identifier in your DynamoDB item.
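A sketch of that pointer pattern with boto3 (Python); the table, bucket, key, and local file names are placeholders, and the table is assumed to have a doc_id partition key.
import boto3

s3 = boto3.client("s3")
table = boto3.resource("dynamodb").Table("Documents")   # placeholder table

# Read a large payload from a placeholder local file.
with open("report.pdf", "rb") as f:
    large_payload = f.read()

# Store the large payload in S3 ...
s3.put_object(Bucket="example-bucket", Key="documents/doc-123.bin", Body=large_payload)

# ... and keep only a small pointer in the DynamoDB item, well under the 400 KB limit.
table.put_item(Item={
    "doc_id": "doc-123",
    "s3_bucket": "example-bucket",
    "s3_key": "documents/doc-123.bin",
})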
If you notice a significant increase in the number of HTTP 503-slow down responses received for Amazon S3 PUT or DELETE object requests to a bucket that has versioning enabled, you might have one or more objects in the bucket for which there are millions of versions.
When you have objects with millions of versions, Amazon S3 automatically throttles requests to the bucket to protect the customer from an excessive amount of request traffic, which could potentially impede other requests made to the same bucket.
To determine which S3 objects have millions of versions, use the Amazon S3 Inventory tool. The inventory tool generates a report that provides a flat file list of the objects in a bucket.