Created in a region (by default US East (N. Virginia))
Name - must be globally unique and DNS-compliant
Features
Object Versioning
CORS (cross-origin resource sharing)
Event notification
Lifecycle
Logging
Object locking
Select - use SQL on certain object types (CSV, JSON, etc.)
Access control - policy (bucket level) & ACL (bucket & object level)
Storage class (object level)
Encryption (object level)
Replication
Requester Pays (the requester pays for downloads; no anonymous access allowed)
Tagging (bucket)
Transfer acceleration (uses CloudFront edge infrastructure)
Static website hosting
BitTorrent support (limited regions, for distribution only)
Billing & Usage Report
S3 default encryption: set default encryption on a bucket so that all new objects are encrypted:
Existing objects - not affected (to encrypt them, can use a single Batch Operations job)
New PUT without encryption information - encrypted with the bucket default
New PUT with encryption information - encrypted as the request specifies
Using server-side encryption with either Amazon S3-managed keys (SSE-S3) or customer master keys (CMKs) stored in AWS Key Management Service (AWS KMS).
SSE-KMS option subject to KMS rate limit
Can use client-side encryption
Use a customer master key (CMK) stored in AWS Key Management Service (AWS KMS).
Use a master key you store within your application.
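A bucket's default encryption is expressed as a server-side encryption configuration. A minimal sketch (SSE-S3 shown; for SSE-KMS, swap `SSEAlgorithm` to `aws:kms` and add a `KMSMasterKeyID`):

```json
{
  "ServerSideEncryptionConfiguration": {
    "Rules": [
      {
        "ApplyServerSideEncryptionByDefault": {
          "SSEAlgorithm": "AES256"
        }
      }
    ]
  }
}
```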
Fundamental entities stored in a bucket; an object comprises the data itself and metadata (standard & custom);
Identification (in bucket): key + version ID
Subresource - object specific additional information
ACL
torrent (BT protocol)
Meta
System defined, only system modifies
User defined, key/value pairs, key stored in lowercase
Tag
supported at object level; replication copies tags to the replica
can be used (in addition) in lifecycle, permission (IAM policy)
One object, one key (bucket + key + version ID -> unique ID of object)
UTF-8 string (max 1,024 bytes) - still wise to use only safe characters
Flat, no hierarchy by nature
Hierarchy inferred (prefix & delimiter '/')
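The inferred hierarchy can be illustrated with a toy listing function (not an AWS API; names below are my own) that groups keys sharing a prefix up to the first delimiter, the way a delimited listing rolls up "folders":

```python
def list_keys(keys, prefix="", delimiter="/"):
    """Return (objects, common_prefixes) the way a delimited listing would."""
    objects, common_prefixes = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Everything up to the first delimiter is rolled up as a "folder"
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return objects, sorted(common_prefixes)

keys = ["photos/2023/a.jpg", "photos/2024/b.jpg", "readme.txt"]
print(list_keys(keys))                    # -> (['readme.txt'], ['photos/'])
print(list_keys(keys, prefix="photos/"))  # -> ([], ['photos/2023/', 'photos/2024/'])
```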
A bucket lives in one region and never moves
Object level storage class
Change / set:
Put
For existing - Put Object - Copy
Through lifecycle
Replication
INTELLIGENT_TIERING - moves objects between tiers automatically based on access patterns
Transition between classes has constraints (see under lifecycle section)
Amazon S3 Analytics – Storage Class Analysis - tools to analyze and help decide class
Subresources (something dependent on a resource):
bucket: lifecycle, website, versioning, logging, policy, ACL, cors
object: ACL, restore (from archive), torrent
Ownership:
bucket - the creator AWS account
object:
Uploaded with IAM credentials - owned by the account the user / role belongs to (for cross-account uploads, the uploading account)
When the bucket owner is not the uploader:
bucket owner pays the bill;
bucket owner can control access to or delete the object - even without owning it
bucket owner can archive / restore the object - even without owning it
Accounts involved:
Requester (may be public - anonymous)
Bucket owner
Object owner
Request Authentication
Authenticated (with signature)
Unauthenticated
By 'anonymous', the special canonical ID '65a011a29cdf8ec533ec3d1ccaae921c'
An anonymous requester may operate on objects & modify object ACLs (see Authorization below)
Relevant policies:
Requester related user policy
Bucket based:
bucket policy & ACL
Object based:
object ACL
Authorization (see more details below):
This is akin to a child (IAM user) who wants to play with a toy (object or bucket) that belongs to someone else (the object owner & bucket owner).
The child must get permission both from
a parent (the user's own AWS account, via user policy)
and from the toy owner (the object owner, and the bucket owner).
Guidelines - when to use what mechanism (recommended scenario)
User policy
if preferred
Bucket Policy
want to manage cross-account permissions for all Amazon S3 permissions (not just simple read/write permissions in ACL)
or if preferred
Bucket ACL
(the only way to) grant write permission to the Amazon S3 Log Delivery group to write access log objects to your bucket
Object ACL
when it is the only way to do it - the bucket is owned by someone else
when needs object level control
grant another account object level ACL control
Access Analyzer for S3 - tools for analyze access control
Collect all policies into a set (user policy, bucket policy, ACL for bucket and object)
Evaluate in user context (only when requester is IAM user/role, NOT root user, not anonymous):
The parent account where the user belongs is the context (not necessarily the bucket owner, not necessarily the object owner)
Evaluate the user policy (attached to the user)
If the parent account owns the resource (a bucket or an object), also evaluates the resource policy (bucket policy, bucket ACL, object ACL)
User must have the permission (from the parent account) to perform the action
Bucket context (applies whether the operation targets the bucket or an object)
Evaluates policies owned by the bucket owner
For a bucket operation:
the bucket owner must have granted permission
For an object operation:
the bucket owner must not have explicitly DENIED the request
Object context (only if the operation targets an object)
Evaluates policies owned by the object owner
If bucket owner = object owner, access can be granted by bucket policy & ACL
If bucket owner != object owner, the object owner MUST grant access explicitly
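The evaluation contexts above can be sketched as a toy function (an assumption-laden simplification of my own, not the real IAM evaluator): an explicit deny in any context wins, and otherwise every applicable context must grant:

```python
def authorize(user_ctx, bucket_ctx, object_ctx=None):
    """Each context is a set of decisions, e.g. {'allow'} or {'deny'}.
    object_ctx is only evaluated for object-level operations."""
    contexts = [user_ctx, bucket_ctx] + ([object_ctx] if object_ctx is not None else [])
    if any("deny" in c for c in contexts):   # explicit deny always wins
        return "denied"
    if all("allow" in c for c in contexts):  # every context must grant
        return "allowed"
    return "denied"                          # implicit deny

# User policy allows, bucket owner grants, object owner grants
print(authorize({"allow"}, {"allow"}, {"allow"}))  # -> allowed
# Bucket owner explicitly denies an object operation
print(authorize({"allow"}, {"deny"}, {"allow"}))   # -> denied
```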
Public access is granted to buckets and objects through access control lists (ACLs), access point policies, bucket policies, or any combination of these.
Meaning of Public:
An ACL is public if it grants any permission to the AllUsers or AuthenticatedUsers group
A policy is (1) assumed public (2) until it can be determined to be non-public
An access point is public if its network origin is Internet and its policy grants public access
Provides four settings (can be any combined):
BlockPublicAcls
IgnorePublicAcls
BlockPublicPolicy
RestrictPublicBuckets
Any combination of the settings can be applied to:
to individual access points, buckets, or entire AWS accounts
Recommend that you turn on all four settings for block public access for your account (by default blocked).
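Turning on all four settings corresponds to a configuration like the following sketch (the boto3/REST parameter shape):

```json
{
  "PublicAccessBlockConfiguration": {
    "BlockPublicAcls": true,
    "IgnorePublicAcls": true,
    "BlockPublicPolicy": true,
    "RestrictPublicBuckets": true
  }
}
```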
IAM policies (principal, action, resource, etc.)
Bucket policies supplement, and in many cases, replace ACL-based access policies.
Attach to: bucket
Granularity:
Bucket level
Subset of objects (match key using wildcards, variable, etc., so to control keys with prefix/suffix etc.)
ACLs use an S3-specific XML grammar; only basic read / write / read-ACP / write-ACP permissions; grantees identified by canonical account ID (with a special ID for anonymous)
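A minimal bucket policy sketch granting cross-account read on a key prefix (bucket name, account ID, and prefix are placeholders):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowReadUnderPrefix",
      "Effect": "Allow",
      "Principal": {"AWS": "arn:aws:iam::123456789012:root"},
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::example-bucket/public/*"
    }
  ]
}
```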
Pre-signed URL - can allow others (without AWS account) get/upload object
Get
Upload - up to 5 GB in a single PUT; beyond that the multipart API must be used; multipart is also recommended for (1) parallel upload for acceleration (2) resuming over unstable networks
Copy - may across region, may be used to change something (encryption, class etc.); can use single operation or multi-part; can use batch
List - can filter with prefix
Delete - can delete single or multiple objects; if bucket versioning is enabled, can target a specific version or issue a non-versioned delete; also be aware of MFA-delete-enabled buckets
Select - works on objects stored in CSV, JSON, or Apache Parquet format. It also works with objects that are compressed with GZIP or BZIP2 (for CSV and JSON objects only), and server-side encrypted objects.
Restore archived - from GLACIER or DEEP_ARCHIVE classes
Query Archived
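The single-PUT vs. multipart decision can be sketched with a hypothetical helper (names are my own) applying S3's documented limits: parts of 5 MB-5 GB (the last part may be smaller), at most 10,000 parts, single PUT capped at 5 GB:

```python
MIB = 1024 * 1024
MIN_PART, MAX_PART, MAX_PARTS = 5 * MIB, 5 * 1024 * MIB, 10_000
SINGLE_PUT_LIMIT = 5 * 1024 * MIB

def plan_upload(total_size, part_size=100 * MIB):
    """Return 1 for a single PUT, else the number of multipart parts."""
    if total_size <= SINGLE_PUT_LIMIT:
        return 1  # a plain PUT works (multipart may still help on flaky links)
    if not MIN_PART <= part_size <= MAX_PART:
        raise ValueError("part size must be between 5 MiB and 5 GiB")
    parts = -(-total_size // part_size)  # ceiling division
    if parts > MAX_PARTS:
        raise ValueError("too many parts; increase part_size")
    return parts

print(plan_upload(10 * MIB))        # small object -> 1 (single PUT)
print(plan_upload(6 * 1024 * MIB))  # 6 GiB with 100 MiB parts -> 62
```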
HTTPS only; built for access at scale; supports only certain object operations (e.g. GetObject, PutObject, not bucket operations); has distinct permission & network controls.
Access point policy:
Work together with bucket policy
NetworkOrigin: Internet or VPC
VpcConfiguration: specify VPC
The policy must name the access point explicitly as a resource:
"arn:aws:s3:us-west-2:123456789012:accesspoint/example-vpc-ap/object/*"
PublicAccessBlockConfiguration - block public access from Internet
IAM attributes (for policy matching)
s3:DataAccessPointArn
s3:DataAccessPointAccount - account of access point owner
s3:AccessPointNetworkOrigin (Internet / VPC)
Using:
Access point has ARN
arn:aws:s3:us-west-2:123456789012:accesspoint/test (access point named 'test', * can be used)
arn:aws:s3:region:account-id:accesspoint/access-point-name/object/resource (objects in bucket through access point)
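A minimal access point policy sketch granting GetObject through the access point named 'test' (account ID and user name are placeholders):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {"AWS": "arn:aws:iam::123456789012:user/example-user"},
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:us-west-2:123456789012:accesspoint/test/object/*"
    }
  ]
}
```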
read-after-write consistency for PUTs of new objects
if you HEAD or GET the key (to check whether the object exists) before creating it, S3 provides only eventual consistency for that read-after-write
eventual consistency for overwrite PUTs and DELETEs
Updates to a single key are atomic (a read might return old data but never corrupted / partial data)
be aware of the eventual-consistency model (an immediate read / query after an update may return old data / status)
no locking - of two PUTs to the same key, the one with the latest timestamp wins
REST
Style:
Virtual host style
http://bucket.s3-aws-region.amazonaws.com
http://bucket.s3.amazonaws.com (fading out, don't use)
Path style
Region-specific endpoint, http://s3.aws-region.amazonaws.com/bucket
US East (N. Virginia) Region endpoint, http://s3.amazonaws.com/bucket
Access Point
AWS SDK
SOAP - still works over HTTPS but is deprecated
Supports IPv6
Programs that make requests against buckets created using the <CreateBucketConfiguration> API must support redirects. Additionally, some clients that do not respect DNS TTLs might encounter issues. (request routing)
Between service & on-premise clients:
AWS Site-to-site VPN
Direct connect
Between AWS Resources in same region
VPC Endpoint
Amazon CloudWatch Alarms: watch metrics, trigger alarms
AWS CloudTrail Logs: records actions (requests) by users / roles
Amazon S3 Access Logs:
AWS Trusted Advisor:
One operation, tons of objects...
Basic concepts:
Job - all information of what to do
Operation - single operation the job to do
PUT copy object
PUT object tagging
PUT object ACL
Initiate Glacier restore
Invoke an AWS Lambda function
Manifest (objects to process in job)
CSV-formatted Amazon S3 Inventory report
simple CSV format
Task - represents a single call to a single object: one task for each object
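A manifest in the simple CSV format lists one object per line as bucket,key, optionally followed by a version ID (names and the version ID below are placeholders):

```csv
example-bucket,photos/a.jpg
example-bucket,photos/b.jpg
example-bucket,docs/report.pdf,EXAMPLE-VERSION-ID
```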
Uses a set of rules; if versioning is used, rules can target current & non-current versions; NOT supported on MFA-delete-enabled buckets
Logging: lifecycle actions are not captured by CloudTrail (they are not API calls); they are captured by CloudWatch
Rule:
ID
Status: Enabled / Disabled
Filter
by key prefix
by tag
combination of prefix and tag
all (empty filter)
Action
Transition action - change storage class
Expiration action - expire object (can be deleted)
Non-versioned bucket - permanently removes the object
Versioning-enabled:
non-current version objects are not affected
current version:
if it is a delete marker (object already deleted):
if other non-current versions exist - no action
if no other version exists (delete marker only) - remove the marker
if a normal object (not a delete marker):
add a delete marker as current, making the object non-current (version preserved)
Versioning-suspended:
places a delete marker with null version ID
deletes the object with the null version ID
NoncurrentVersionTransition (for how long to stay until to another class)
NoncurrentVersionExpiration (for how long to keep until permanently delete)
AbortIncompleteMultipartUpload - abort incomplete upload
ExpiredObjectDeleteMarker - delete marker when no other version exists
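A lifecycle rule combining the elements above might look like this sketch (IDs, prefix, and day counts are placeholders; the shape follows the REST/boto3 lifecycle configuration):

```json
{
  "Rules": [
    {
      "ID": "archive-logs",
      "Status": "Enabled",
      "Filter": {"Prefix": "logs/"},
      "Transitions": [
        {"Days": 30, "StorageClass": "STANDARD_IA"},
        {"Days": 90, "StorageClass": "GLACIER"}
      ],
      "Expiration": {"Days": 365},
      "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7}
    }
  ]
}
```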
Can cross account (original & replica owned differently)
Can cross region.
Supports Replication Time Control (replicates within 15 minutes)
Minimum configuration:
destination bucket
role to assume to replicate
Encryption:
If source encrypted with SSE-S3 or SSE-KMS, same encryption setting for replica in destination
If source not encrypted, encrypt replica with destination default encryption (be aware ETag change)
(more details: see documentation)
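A minimal replication configuration sketch with the time control mentioned above (role, rule ID, and bucket names are placeholders):

```json
{
  "Role": "arn:aws:iam::123456789012:role/replication-role",
  "Rules": [
    {
      "ID": "replicate-all",
      "Status": "Enabled",
      "Priority": 1,
      "Filter": {},
      "DeleteMarkerReplication": {"Status": "Disabled"},
      "Destination": {
        "Bucket": "arn:aws:s3:::example-replica-bucket",
        "ReplicationTime": {"Status": "Enabled", "Time": {"Minutes": 15}},
        "Metrics": {"Status": "Enabled", "EventThreshold": {"Minutes": 15}}
      }
    }
  ]
}
```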
Use CORS to instruct the browser to allow requests from certain origins
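A minimal CORS configuration sketch allowing GET/PUT from one origin (the origin is a placeholder):

```json
{
  "CORSRules": [
    {
      "AllowedOrigins": ["https://www.example.com"],
      "AllowedMethods": ["GET", "PUT"],
      "AllowedHeaders": ["*"],
      "MaxAgeSeconds": 3000
    }
  ]
}
```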
Supported; check the documentation.
Uses CloudFront's edge infrastructure
Speed comparison tool available
To work:
Enable it on the bucket
Use the accelerated endpoints (bucket.s3-accelerate.amazonaws.com)
If enabled, can apply a write-once-read-many (WORM) model:
Locks an object for a period of time (or indefinitely) against overwrite / delete
Helps meet regulatory requirements
Usage:
retention period - locked for a fixed period
legal hold - locked until the hold is removed
Enabled at bucket level.
Version ID:
null - if not versioned
random string - system generated
Work with:
Noncurrent expiration lifecycle policy to expire old version
Works with SNS, SQS & Lambda
Storage (size, duration, storage class)
Request (storage class, type of request, volume)
Transfer in / out (with some exceptions)
Management / replications (inventory, tagging, etc.)
Standard
IA - Infrequent Access
cheaper storage
charged per retrieval
One Zone-IA
even cheaper
stored in a single AZ (reduced resilience)
Intelligent-Tiering
moves objects between access tiers automatically
charged for monitoring
Glacier
Glacier deep archive
Outposts - on-premises