CloudFront

General

CloudFront is a web service that speeds up distribution of static and dynamic web content and streaming content to end users.
It delivers the content through a worldwide network of data centers called edge locations.
It is an easy and cost-effective way to distribute content with low latency and high data transfer speeds: each user request is routed to the edge location that can best serve the content, which provides the lowest latency.
CloudFront dramatically reduces the number of network hops that users' requests must pass through, which helps improve performance and provides lower latency and higher data transfer rates.
It is a good choice for distributing frequently accessed static content.
"Cache behavior" allows you to define path patterns that are matched against each request. A default (*) pattern is created, and additional cache behaviors can be added with patterns that take priority over the default path.
Viewer Protocol Policy can be configured to define the access protocol allowed. It can be HTTP and HTTPS, HTTPS only, or HTTP redirected to HTTPS.
Object expiration determines how long objects stay in a CloudFront cache before CloudFront fetches them again from the origin.
If you plan to increase the cache duration in CloudFront for certain dynamic content, modify the application to add a Cache-Control header that controls how long the objects stay in the CloudFront cache.
CloudFront allows you to configure caching based on a URL path pattern when you create a new distribution. By partitioning the S3 bucket by folder (month, year, location, and so on), you can create different caching rules for your different files.
You can control the origin and path of the content, time to live (TTL), and control the user access using trusted signers.
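The path-pattern priority described above can be sketched as a first-match lookup. This is only an illustration: the behavior list and its settings are hypothetical, and Python's fnmatch is used as a stand-in for CloudFront's matcher (fnmatch also understands [...] character sets, which is an extra not described in these notes).

```python
from fnmatch import fnmatchcase

# Hypothetical cache behaviors in priority order; the default (*) comes last.
BEHAVIORS = [
    ("/images/*.jpg", {"ttl_seconds": 86400}),
    ("/api/*", {"ttl_seconds": 0}),
    ("*", {"ttl_seconds": 3600}),
]

def match_behavior(path, behaviors=BEHAVIORS):
    """Return the (pattern, settings) pair of the first matching behavior."""
    for pattern, settings in behaviors:
        # Case-sensitive match; '*' matches any run of characters, '?' one character.
        if fnmatchcase(path, pattern):
            return pattern, settings
    raise LookupError("no behavior matched; a default '*' pattern should exist")

print(match_behavior("/images/2024/cat.jpg")[0])   # /images/*.jpg
print(match_behavior("/index.html")[0])            # *
```

Because the first match wins, more specific patterns need to be listed before broader ones.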

Benefits

CloudFront eliminates the expense and complexity of operating a network of cache servers in multiple sites across the internet and eliminates the need to over-provision capacity in order to serve potential spikes in traffic.
CloudFront also provides increased reliability and availability because copies of objects are held in multiple edge locations around the world.
CloudFront keeps persistent connections with the origin servers so that those files can be fetched from the origin servers as quickly as possible.
It uses techniques such as collapsing simultaneous viewer requests at an edge location for the same file into a single request to the origin server, which reduces the load on the origin.
CloudFront integrates with AWS WAF, a web application firewall that helps protect web applications from attacks by allowing rules configured based on IP addresses, HTTP headers, and custom URI strings.

Configuration & Content Delivery

Configuration:
1. Origin servers need to be configured to provide the files for distribution. An origin server stores the original, definitive version of the objects and can be an AWS-hosted service, for e.g. S3 or EC2, or an on-premises server.
2. Files (also called objects) are added/uploaded to the origin servers with public read permissions or with permissions restricted to an Origin Access Identity (OAI).
3. Create a CloudFront distribution, which tells CloudFront which origin servers to get the files from when users request the files.
4. CloudFront sends the distribution configuration to all the edge locations.
5. The website can be used with the CloudFront-provided domain name or a custom alternate domain name.
6. The origin server can be configured to limit access protocols, define caching behavior, and add headers to the files to set the TTL or the expiration time.
Content delivery to users:
1. When a user accesses the website, file or object, DNS routes the request to the CloudFront edge location that can best serve the user's request with the lowest latency.
2. CloudFront returns the object immediately if the requested object is present in the cache at the edge location.
3. If the requested object does not exist in the cache at the edge location, CloudFront requests the object from the origin server and returns it to the user as soon as it starts receiving it.
4. When the object reaches its expiration time, on the next request CloudFront checks with the origin server for a newer version; if the cached copy is still the latest, CloudFront keeps serving the same object.
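The delivery steps above can be modelled with a toy edge cache. This is a rough sketch, not how CloudFront is implemented: the EdgeCache class, the dict that stands in for the origin, and the version strings are all hypothetical, and a fake clock is injected so the example runs without waiting.

```python
import time

class EdgeCache:
    """Toy model of one edge location: path -> (body, version, expires_at)."""

    def __init__(self, origin, ttl_seconds, clock=time.monotonic):
        self.origin = origin      # dict path -> (body, version); stands in for the origin
        self.ttl = ttl_seconds
        self.clock = clock
        self.store = {}

    def get(self, path):
        now = self.clock()
        entry = self.store.get(path)
        if entry is None:
            body, version = self.origin[path]            # miss: fetch from origin
            self.store[path] = (body, version, now + self.ttl)
            return body, "miss"
        body, version, expires_at = entry
        if now < expires_at:
            return body, "hit"                           # fresh: serve from the edge
        new_body, new_version = self.origin[path]        # expired: check with origin
        if new_version == version:
            self.store[path] = (body, version, now + self.ttl)
            return body, "revalidated"                   # unchanged, like a 304
        self.store[path] = (new_body, new_version, now + self.ttl)
        return new_body, "refreshed"                     # newer copy, like a 200

now = [0.0]
origin = {"/a.css": ("body-v1", "v1")}
edge = EdgeCache(origin, ttl_seconds=60, clock=lambda: now[0])
print(edge.get("/a.css"))    # ('body-v1', 'miss')
print(edge.get("/a.css"))    # ('body-v1', 'hit')
now[0] = 61
print(edge.get("/a.css"))    # ('body-v1', 'revalidated')
origin["/a.css"] = ("body-v2", "v2")
now[0] = 200
print(edge.get("/a.css"))    # ('body-v2', 'refreshed')
```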

Delivery Methods

Web distributions
1. Supports both static and dynamic content using HTTP or HTTPS.
2. Supports multimedia content on demand using progressive download and Apple HTTP Live Streaming (HLS).
3. Supports a live event, such as a meeting, conference, or concert, in real time. For live streaming, the distribution can be created automatically using an AWS CloudFormation stack.
4. Origin servers can be either an Amazon S3 bucket or an HTTP server, for e.g. a web server or an AWS ELB.
RTMP distributions
1. Support streaming of media files using Adobe Media Server and the Adobe Real-Time Messaging Protocol (RTMP), and must use an S3 bucket as the origin.
2. To stream media files, two types of files are needed: media files & a media player.
3. End users view media files using the media player that is provided, not a media player installed locally on their computer or device.
4. When an end user streams the media file, the media player begins to play the file content while the file is still being downloaded from CloudFront.
5. The media file is not stored locally on the end user's system.
6. Two CloudFront distributions are required: a Web distribution for the media player and an RTMP distribution for the media files.

Media player and Media files can be stored in same origin S3 bucket or different buckets

Origin

Each origin is either an S3 bucket or an HTTP server.
For an HTTP server as the origin, the domain name of the resource needs to be mapped and the files must be publicly readable.
For an S3 bucket, use the bucket URL or the static website endpoint URL; the files either need to be publicly readable or secured using OAI.
Origin access restriction, for S3 only, can be configured using an Origin Access Identity to prevent direct access to the S3 objects.
A distribution can have multiple origins (for e.g. one per bucket), with one or more cache behaviors that route requests to each origin. The path pattern in a cache behavior determines which requests are routed to the origin (S3 bucket) that is associated with that cache behavior.

Origin Policy

The protocol policy that you want CloudFront to use when fetching objects from your origin server.
HTTP Only: CloudFront uses only HTTP to access the origin.
HTTPS Only: CloudFront uses only HTTPS to access the origin.
Match Viewer: CloudFront communicates with your origin using HTTP or HTTPS, depending on the protocol of the viewer request. CloudFront caches the object only once even if viewers make requests using both HTTP and HTTPS protocols.

Viewer Protocol Policy

Choose the protocol policy that you want viewers to use to access your content in CloudFront edge locations:
HTTP and HTTPS: Viewers can use both protocols.
Redirect HTTP to HTTPS: Viewers can use both protocols, but HTTP requests are automatically redirected to HTTPS requests.
HTTPS Only: Viewers can only access your content if they're using HTTPS.
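As a small sketch of the three policies above, the function below decides what happens to one viewer request. The function name and return shape are hypothetical; the policy strings mirror the values CloudFront's API uses for this setting, and the 301 and 403 responses reflect CloudFront's documented behavior for redirected and rejected HTTP requests. The 200 is only a placeholder for "request proceeds normally".

```python
VALID_POLICIES = {"allow-all", "redirect-to-https", "https-only"}

def apply_viewer_policy(policy, scheme, host, path):
    """Return (status_code, redirect_location) for one viewer request."""
    if policy not in VALID_POLICIES:
        raise ValueError(f"unknown viewer protocol policy: {policy}")
    if scheme == "https" or policy == "allow-all":
        return 200, None                       # placeholder: request proceeds normally
    if policy == "redirect-to-https":
        return 301, f"https://{host}{path}"    # HTTP request redirected to HTTPS
    return 403, None                           # https-only rejects plain HTTP

print(apply_viewer_policy("redirect-to-https", "http",
                          "d111111abcdef8.cloudfront.net", "/index.html"))
```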

HTTPS Connection

Between CloudFront & viewers, the distribution can be configured to allow both HTTP and HTTPS requests, to require HTTPS only, or to redirect all HTTP requests to HTTPS.
Between CloudFront & origin, the distribution can be configured so that CloudFront fetches objects from the origin using HTTP, using HTTPS, or using the protocol that the viewer used to request the objects.
For S3 as a website origin, the protocol has to be HTTP.
For an S3 bucket, the default Origin Protocol Policy is Match Viewer and cannot be changed. So when CloudFront is configured to require HTTPS between the viewer and CloudFront, it automatically uses HTTPS to communicate with S3.
CloudFront can also be configured to work with HTTPS for alternate domain names by:
1. Serving HTTPS requests using dedicated IP addresses
  - CloudFront associates the alternate domain name with a dedicated IP address, and the certificate is associated with the IP address. When a request arrives for that IP address, CloudFront uses the IP address to identify the distribution and the certificate to return.
  - An additional monthly charge (of about $600/month) is incurred for using a dedicated IP address.
2. Serving HTTPS requests using SNI
  - SNI custom SSL relies on the SNI extension of the TLS protocol, which allows multiple domains to be served over the same IP address by including the hostname that viewers are trying to connect to.
  - Because the IP address is not dedicated, CloudFront can't determine from the IP address alone which domain the request is for.
  - Browsers that support SNI automatically get the domain name from the request URL & add it to a new field in the request header. Older browsers do not support it.
  - When CloudFront receives an HTTPS request from a browser that supports SNI, it finds the domain name in the request header and responds to the request with the applicable SSL/TLS certificate, and the viewer and CloudFront perform SSL negotiation.
  - SNI custom SSL is available at no additional cost beyond standard CloudFront data transfer and request fees.
3. For "End-to-End HTTPS connections", a certificate needs to be applied both between the viewers and CloudFront & between CloudFront and the origin, with the following requirements:
  - HTTPS between viewers and CloudFront: certificate issued by a trusted certificate authority (CA), certificate provided by AWS Certificate Manager (ACM), or self-signed certificate.
  - HTTPS between CloudFront and a custom origin: if the origin is not an ELB load balancer, the certificate must be issued by a trusted CA. For an ELB load balancer, a certificate provided by ACM can be used.

Allowed HTTP methods

CloudFront supports GET, HEAD, OPTIONS, PUT, POST, PATCH and DELETE to get, add, update, and delete objects, and to get object headers.
A cache behavior can instead allow only GET and HEAD, or only GET, HEAD and OPTIONS, to use CloudFront only to get objects, get object headers, or retrieve a list of the options supported by the origin.
POST, PUT operations can also be performed for e.g. submitting data from a web form, which are directly proxied back to the Origin server.
CloudFront only caches responses to GET and HEAD requests and, optionally, OPTIONS requests. CloudFront does not cache responses to PUT, POST, PATCH, DELETE request methods and these requests are directed to the origin.
PUT, POST http methods also help for accelerated content uploads, as these operations will be sent to the origin e.g. S3 via the CloudFront edge location, improving efficiency, reducing latency, and allowing the application to benefit from the monitored, persistent connections that CloudFront maintains from the edge locations to the origin servers.
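The method handling described in this section can be summarized in a few lines. This is a minimal sketch: the set names and the three return labels are hypothetical, chosen only to show which requests are cache candidates and which are always passed to the origin.

```python
# Hypothetical names for the three allowed-method choices described above.
ALLOWED_METHOD_SETS = {
    "get-head": {"GET", "HEAD"},
    "get-head-options": {"GET", "HEAD", "OPTIONS"},
    "all": {"GET", "HEAD", "OPTIONS", "PUT", "POST", "PATCH", "DELETE"},
}

def classify_request(method, allowed="all", cache_options=False):
    """Label a request as rejected, cacheable, or proxied to the origin."""
    method = method.upper()
    if method not in ALLOWED_METHOD_SETS[allowed]:
        return "rejected"
    if method in {"GET", "HEAD"} or (method == "OPTIONS" and cache_options):
        return "cacheable"
    return "proxied-to-origin"      # PUT, POST, PATCH, DELETE are never cached

for m in ("GET", "OPTIONS", "POST"):
    print(m, classify_request(m))
```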

Improving CloudFront Edge Caches

Control the cache max-age: To increase the cache hit ratio, the origin can be configured to add a Cache-Control max-age directive to the objects.
Caching Based on Query String Parameters
- For Web distributions, configure CloudFront to forward only the query strings for which your origin returns unique objects, and always use the same case for the parameters and the same parameter order (for e.g. for request a=x&b).
- For RTMP distributions, when CloudFront requests an object from the origin server, it removes any query string parameters.
Caching Based on Cookie Values
- For Web distributions, by default CloudFront doesn't consider cookies when caching objects in edge locations.
- Caching performance can be improved by configuring CloudFront to forward only specified cookies instead of forwarding all cookies. For e.g. if the request has 2 cookies with 3 possible values, CloudFront would cache all possible combinations even if the response takes into account only a single cookie.
  - Cookie names and values are both case sensitive, so it is better to stick with the same case.
  - Create separate cache behaviors for static and dynamic content, and configure CloudFront to forward cookies to the origin only for dynamic content.
  - If possible, create separate cache behaviors for dynamic content for which cookie values are unique for each user (such as a user ID) and dynamic content that varies based on a smaller number of unique values, which reduces the number of combinations.
- For RTMP distributions, CloudFront cannot be configured to process cookies. When CloudFront requests an object from the origin server, it removes any cookies before forwarding the request to your origin. If your origin returns any cookies along with the object, CloudFront removes them before returning the object to the viewer.
Caching Based on Request Headers
- By default, CloudFront doesn't consider headers when caching your objects in edge locations.
- Configuring header-based caching does not change the headers that CloudFront forwards, only whether CloudFront caches objects based on the header values.
- Caching performance can be improved by:
  - Forwarding and caching based only on specified headers instead of forwarding and caching based on all headers.
  - Avoiding caching based on request headers that have large numbers of unique values.
- If CloudFront is configured to forward all headers to your origin, CloudFront doesn't cache the objects associated with this cache behavior.
- For RTMP distributions, CloudFront cannot be configured to cache based on header values.
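The query-string advice above (same case, same parameter order, only the parameters that matter) can be applied where links are generated. The sketch below is one way to do it with the standard library; normalize_url is a hypothetical helper, and lowercasing parameter names assumes the origin treats names case-insensitively.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def normalize_url(url, forwarded_params):
    """Keep only the listed query parameters, lowercase their names, sort them."""
    parts = urlsplit(url)
    wanted = {name.lower() for name in forwarded_params}
    pairs = [
        (name.lower(), value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() in wanted
    ]
    pairs.sort()    # stable parameter order gives one cache key per variant
    return urlunsplit(parts._replace(query=urlencode(pairs)))

a = normalize_url("https://d111111abcdef8.cloudfront.net/img.jpg?b=2&A=1&utm=x", ["a", "b"])
b = normalize_url("https://d111111abcdef8.cloudfront.net/img.jpg?a=1&B=2", ["a", "b"])
print(a)        # https://d111111abcdef8.cloudfront.net/img.jpg?a=1&b=2
print(a == b)   # True
```

Both request variants collapse to one URL, so the edge caches a single copy instead of two.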

Object Caching & Expiration

Low expiration time helps serve content that changes frequently and high expiration time helps improve performance and reduce load on the origin.
After the expiration time, CloudFront checks with the origin whether it still has the latest version:
- If the cache already has the latest version, the origin returns a 304 status code (Not Modified).
- If the CloudFront cache does not have the latest version, the origin returns a 200 status code (OK) and the latest version of the object.
If an object in an edge location isn't frequently requested, CloudFront might evict the object (remove it before its expiration date) to make room for objects that have been requested more recently.
By default, each object automatically expires after 24 hours.
For Web distributions, the default behavior can be changed as follows:
- For the entire path pattern, the cache behavior can be configured by setting the Minimum TTL, Maximum TTL and Default TTL values.
- For individual objects, the origin can be configured to add a Cache-Control max-age or Cache-Control s-maxage directive, or an Expires header field, to the object.
- AWS recommends using the Cache-Control max-age directive over the Expires header to control object caching behavior.
- CloudFront uses only the value of Cache-Control max-age if both the Cache-Control max-age directive and the Expires header are specified.
- HTTP Cache-Control or Pragma header fields in a GET request from a viewer can't be used to force CloudFront to go back to the origin server for the object.
- By default, when the origin returns an HTTP 4xx or 5xx status code, CloudFront caches these error responses for five minutes and then submits the next request for the object to the origin to see whether the requested object is available and the problem has been resolved.
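How the TTL settings and the origin's Cache-Control header interact can be approximated as a clamp. The sketch below is a simplification under stated assumptions: effective_ttl is a hypothetical helper, the default values (0, 24 hours, one year) are assumed defaults, s-maxage is given priority over max-age, and the Expires header and the no-cache, no-store and private directives are deliberately ignored.

```python
def parse_directives(cache_control):
    """Split a Cache-Control value into a {name: value} dict (names lowercased)."""
    directives = {}
    for part in cache_control.split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.strip().lower()] = value.strip()
    return directives

def effective_ttl(cache_control, min_ttl=0, default_ttl=86400, max_ttl=31536000):
    """Approximate how many seconds an edge cache keeps an object."""
    directives = parse_directives(cache_control or "")
    for name in ("s-maxage", "max-age"):       # shared-cache directive first
        value = directives.get(name, "")
        if value.isdigit():
            # Origin value is honored, but clamped to [min_ttl, max_ttl].
            return max(min_ttl, min(int(value), max_ttl))
    # No usable directive: use the default TTL, never below the minimum.
    return max(min_ttl, default_ttl)

print(effective_ttl("public, max-age=60"))         # 60
print(effective_ttl("max-age=10", min_ttl=300))    # 300
print(effective_ttl(None))                         # 86400
```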

- For RTMP distributions
  - Cache-Control or Expires headers can be added to objects to change the amount of time that CloudFront keeps objects in edge caches before it forwards another request to the origin.
  - Minimum duration is 3600 seconds (one hour). If you specify a lower value, CloudFront uses 3600 seconds.

Restrict Access - Serving Private Content

To securely serve private content, require users to access it by using special CloudFront signed URLs or signed cookies, which support the following restrictions:
- End date and time, after which the URL is no longer valid
- Start date and time, when the URL becomes valid
- IP address or range of addresses that can access the URLs

Require that users access the S3 content only using CloudFront URLs, not S3 URLs. Requiring CloudFront URLs isn't mandatory, but is recommended to prevent users from bypassing the restrictions specified in signed URLs or signed cookies.

Signed URLs or signed cookies can also be used with CloudFront when an HTTP server is the origin. This requires the content to be publicly accessible, so the direct URL of the content must not be shared.
Restrictions on the origin can be applied by:

- For S3, using an Origin Access Identity to grant access to the content only to CloudFront, through bucket policies or object ACLs, and removing any other access permissions.
- For an HTTP server, having CloudFront add a custom header, which the origin can use to verify that the request has come from CloudFront.

Trusted Signer

- To create signed URLs or signed cookies, at least one AWS account (trusted signer) is needed that has an active CloudFront key pair, which should be rotated frequently.
- Once an AWS account is added as a trusted signer to the distribution, CloudFront starts to require that users use signed URLs or signed cookies to access the objects.
- The private key from the trusted signer's key pair is used to sign a portion of the URL or the cookie. When someone requests a restricted object, CloudFront compares the signed portion of the URL or cookie with the unsigned portion to verify that the URL or cookie hasn't been tampered with. CloudFront also validates that the URL or cookie is valid, for e.g. that the expiration date and time hasn't passed.
- A maximum of 5 trusted signers can be assigned for each cache behavior or RTMP distribution.

Signed URLs vs Signed Cookies;

- Use signed URLs in the following cases:
  - For RTMP distribution as signed cookies aren’t supported
  - To restrict access to individual files, for e.g., an installation download for your application.
  - Your users are using a client (for example, a custom HTTP client) that doesn't support cookies.
- Use signed cookies in the following cases:
  - Provide access to multiple restricted files, for e.g., all of the video files in HLS format or all of the files in the subscribers’ area of a website.
  - Don’t want to change the current URLs.

Canned Policy vs Custom Policy

A canned policy or a custom policy is a policy statement, used by signed URLs, that defines the restrictions, for e.g. the expiration date and time.
CloudFront validates the expiration time at the start of the event.
If a user is downloading a large object and the URL expires, the download still continues; the same applies to RTMP distributions.
However, if the user is using range GET requests, or skips to another position while streaming video, which might trigger another event, the request would fail.
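To make the restrictions concrete, the sketch below builds a policy statement of the shape CloudFront expects: a canned-style policy carries only the expiry, while a custom-style policy may add a start time and a source IP range. The helper names are hypothetical. Signing the policy with the trusted signer's private key is left out on purpose, since it needs an RSA library that the standard library doesn't provide.

```python
import base64
import json

def build_policy(resource_url, expires_epoch, starts_epoch=None, source_ip=None):
    """Build a policy statement as compact JSON (no extra whitespace)."""
    condition = {"DateLessThan": {"AWS:EpochTime": expires_epoch}}
    if starts_epoch is not None:
        condition["DateGreaterThan"] = {"AWS:EpochTime": starts_epoch}
    if source_ip is not None:
        condition["IpAddress"] = {"AWS:SourceIp": source_ip}
    policy = {"Statement": [{"Resource": resource_url, "Condition": condition}]}
    return json.dumps(policy, separators=(",", ":"))

def cloudfront_b64(data):
    """Base64-encode bytes, then swap the characters that are unsafe in URLs."""
    encoded = base64.b64encode(data).decode("ascii")
    return encoded.replace("+", "-").replace("=", "_").replace("/", "~")

url = "https://d111111abcdef8.cloudfront.net/private/video.mp4"   # example URL
custom = build_policy(url, expires_epoch=1767225600,
                      starts_epoch=1767139200, source_ip="192.0.2.0/24")
print(custom)
print(cloudfront_b64(custom.encode("utf-8"))[:40] + "...")
```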

Serving Compressed Files

CloudFront can be configured to automatically compress files of certain types and serve the compressed files when viewer requests include Accept-Encoding: gzip in the request header.
Downloads are faster because the files are smaller as well as less expensive as the cost of CloudFront data transfer is based on the total amount of data served.
If serving from a custom origin, the origin itself can be configured to:
- Compress files, with or without CloudFront compression.
- Compress file types that CloudFront doesn't compress.
If the origin returns a compressed file, CloudFront detects compression by the Content-Encoding header value and doesn’t compress the file again.
Compression Steps:

1. CloudFront distribution is created and configured to compress content.
2. A viewer requests a compressed file by adding the Accept-Encoding: gzip header to the request.
3. At the edge location, CloudFront checks the cache for a compressed version of the file that is referenced in the request.
4. If the compressed file is already in the cache, CloudFront returns the file to the viewer and skips the remaining steps.
5. If the compressed file is not in the cache, CloudFront forwards the request to the origin server (S3 bucket or a custom origin)
6. Even if CloudFront has an uncompressed version of the file in the cache, it still forwards a request to the origin.
7. Origin server returns an uncompressed version of the file
8. CloudFront determines whether the file is compressible:
  - A type that CloudFront compresses.
  - File size between 1,000 and 10,000,000 bytes.
  - Response must include a Content-Length header for CloudFront to determine the size within valid compression limits. If the Content-Length header is missing, CloudFront won’t compress the file.
  - Value of the Content-Encoding header on the file must not be gzip, i.e. the origin must not have already compressed the file.
9. If the file is compressible, CloudFront compresses it, returns the compressed file to the viewer, and adds it to the cache.
10. The viewer uncompresses the file.
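The checks in step 8 can be sketched as a single predicate. This is illustrative only: should_compress is a hypothetical helper, headers are assumed to arrive as plain dicts with exactly these key spellings, and COMPRESSIBLE_TYPES is a small subset rather than the full list of types CloudFront compresses.

```python
# Illustrative subset of compressible content types; the real list is longer.
COMPRESSIBLE_TYPES = {"text/html", "text/css", "application/javascript", "application/json"}
MIN_BYTES, MAX_BYTES = 1_000, 10_000_000

def should_compress(request_headers, response_headers):
    """Apply the step 8 checks to one origin response."""
    if "gzip" not in request_headers.get("Accept-Encoding", ""):
        return False                       # viewer did not ask for gzip
    if response_headers.get("Content-Encoding", "").lower() == "gzip":
        return False                       # origin already compressed the file
    content_type = response_headers.get("Content-Type", "").split(";")[0].strip().lower()
    if content_type not in COMPRESSIBLE_TYPES:
        return False
    length = response_headers.get("Content-Length")
    if length is None or not length.isdigit():
        return False                       # size unknown, so no compression
    return MIN_BYTES <= int(length) <= MAX_BYTES

viewer = {"Accept-Encoding": "gzip, deflate"}
print(should_compress(viewer, {"Content-Type": "text/css", "Content-Length": "5000"}))   # True
print(should_compress(viewer, {"Content-Type": "text/css"}))                             # False
```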

Distribution Details

Price Class
- Cost for each edge location varies and the price charged for serving the requests also varies.
- CloudFront edge locations are grouped into geographic regions, and regions have been grouped into price classes.
  - Default Price Class – includes all the regions
  - Another price class includes most regions (the United States; Europe; Hong Kong, Korea, and Singapore; Japan; and India regions) but excludes the most-expensive regions.
  - A third price class includes only the least-expensive regions (the United States and Europe regions)
- Price class can be selected to lower the cost but this would come only at the expense of performance (higher latency), as CloudFront would serve requests only from the selected price class edge locations.
- CloudFront may, sometimes, serve request from a region not included within the price class, however you would be charged the rate for the least-expensive region in your selected price class.
Alternate Domain Names (CNAMEs)
- CloudFront by default assigns a domain name for the distribution for e.g. d111111abcdef8.cloudfront.net
- An alternate domain name, also known as a CNAME, lets you use your own custom domain name for links to objects.
- Both web and RTMP distributions support alternate domain names.
- CloudFront supports * wildcard at the beginning of a domain name instead of specifying subdomains individually. However, wildcard cannot replace part of a subdomain name for e.g. *domain.example.com, or cannot replace a subdomain in the middle of a domain name for e.g. subdomain.*.example.com
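The wildcard rule above is easy to check mechanically. The helper below is a hypothetical sketch that only enforces the placement rule for *; it does not validate anything else about the domain name.

```python
def is_valid_alternate_name(name):
    """Allow '*' only as the entire leftmost label, e.g. '*.example.com'."""
    labels = name.split(".")
    if len(labels) < 2 or any(label == "" for label in labels):
        return False
    if "*" in labels[0] and labels[0] != "*":
        return False                                        # e.g. '*domain.example.com'
    return all("*" not in label for label in labels[1:])    # e.g. 'subdomain.*.example.com'

for candidate in ("*.example.com", "*domain.example.com", "subdomain.*.example.com"):
    print(candidate, is_valid_alternate_name(candidate))
```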

Geo Restriction (Geoblocking)
- Geo restriction can help allow or prevent users in selected countries from accessing the content.
- Either allow only users in a whitelist of specified countries to access the content, or deny access to users in a blacklist of specified countries.
Geo restriction can be used to restrict access to all of the files that are associated with a distribution and to restrict access at the country level.
CloudFront responds to a request from a viewer in a restricted country with an HTTP status code 403 (Forbidden).
Use a third-party geolocation service, if access is to be restricted to a subset of the files that are associated with a distribution or to restrict access at a finer granularity than the country level.
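The whitelist and blacklist modes can be sketched as a single check. geo_check is a hypothetical helper, and it assumes the viewer's country has already been resolved to a two-letter code. The 403 comes from the text above, while 200 is a placeholder for "request proceeds normally".

```python
def geo_check(country_code, restriction_type, countries):
    """Return the HTTP status a viewer from country_code would receive."""
    listed = country_code.upper() in {c.upper() for c in countries}
    if restriction_type == "whitelist":
        allowed = listed            # only listed countries get through
    elif restriction_type == "blacklist":
        allowed = not listed        # listed countries are denied
    elif restriction_type == "none":
        allowed = True
    else:
        raise ValueError(f"unknown restriction type: {restriction_type}")
    return 200 if allowed else 403

print(geo_check("DE", "whitelist", ["US", "DE"]))   # 200
print(geo_check("FR", "whitelist", ["US", "DE"]))   # 403
```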

CloudFront with Amazon S3

For an RTMP distribution, S3 bucket is the only supported origin and custom origins cannot be used.
Using CloudFront over S3 has the following benefits:
- Can be more cost effective if the objects are frequently accessed, because at higher usage the price for CloudFront data transfer is much lower than the price for S3 data transfer.
- Downloads are faster with CloudFront than with S3 alone because the objects are stored closer to the users.
When the bucket is moved to a different region, CloudFront can take up to an hour to update its records to include the change of region when both of the following are true:
- Origin Access Identity (OAI) is used to restrict access to the bucket.
- Bucket is moved to an S3 region that requires Signature Version 4 for authentication.
Origin Access Identity:

- With S3 as origin and no OAI, objects in S3 must be granted public read permissions, and hence the objects are accessible from both S3 and CloudFront.
- Even though CloudFront does not expose the underlying S3 URL, it can become known to users if it is shared directly or used by applications.
- When using CloudFront signed URLs or signed cookies to provide access to the objects, it is necessary to prevent users from having direct access to the S3 objects.
- Users accessing S3 objects directly would bypass the controls provided by CloudFront signed URLs or signed cookies, for e.g. control over the date and time after which a user can no longer access the content and the IP addresses that can be used to access content. In addition, CloudFront access logs are less useful because they're incomplete.
- Origin Access Identity (OAI) can be used to prevent users from directly accessing objects from S3. It is a special CloudFront user that can be created and associated with the distribution.
- S3 bucket/object permissions need to be configured to provide access only to the Origin Access Identity.
- When users access the object through CloudFront, it uses the OAI to fetch the content on the users' behalf.
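A bucket policy that grants read access only to the OAI might look like the sketch below. The bucket name and OAI ID are made-up placeholders, and the statement is a minimal read-only grant rather than a complete production policy.

```python
import json

def oai_bucket_policy(bucket_name, oai_id):
    """Build an S3 bucket policy that lets one OAI read every object."""
    principal = f"arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity {oai_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "AllowCloudFrontOAIRead",
            "Effect": "Allow",
            "Principal": {"AWS": principal},
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket_name}/*",
        }],
    }

# 'example-bucket' and 'E1EXAMPLE' are placeholders, not real resources.
print(json.dumps(oai_bucket_policy("example-bucket", "E1EXAMPLE"), indent=2))
```

For the restriction to be effective, any public-read grants on the bucket or its objects would also have to be removed, as noted above.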

Working with Objects

- CloudFront can be configured to add or modify custom headers in the requests it forwards to the origin, to:
  - Validate the user is not accessing the origin directly, bypassing the CDN.
  - Identify the CDN from which the request was forwarded, if more than one CloudFront distribution is configured to use the same origin.
  - If users use viewers that don't support CORS, configure CloudFront to forward the Origin header to the origin. That will cause the origin to return the Access-Control-Allow-Origin header for every request.

Adding & Updating Objects: CloudFront starts distributing objects when they are first accessed. Objects can be updated either by:
- Overwriting the original object.
- Creating a different version and updating the links exposed to the user.

- For updating objects, it's recommended to use versioning, so that the links change when the objects are updated, forcing a refresh.
- With versioning, there is no need to wait for an object to expire before CloudFront begins to serve a new version of it, there is no difference in consistency, and there is no object invalidation cost to pay.

Removing/Invalidating Objects

- Objects, by default, are removed upon expiry (TTL) and the latest object is fetched from the origin.
- Objects can also be removed from the edge cache before they expire, either by changing the object name (versioning) or by invalidating the object from edge caches.

For Web distributions, if your objects need to be updated frequently, changing the object name (versioning) is recommended over invalidating objects, because versioning:
- Enables you to control which object a request returns, even when the user has a version cached locally or behind a caching proxy. If an object is invalidated, the user might continue to see the old version until it expires from those caches.
- Makes it easier to analyze the results of object changes, as CloudFront access logs include the names of the objects.
- Provides a way to serve different versions to different users.
- Simplifies rolling forward & back between object revisions.
- Is less expensive, as there are no charges for invalidating objects.
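One common way to version object names is to embed a short content hash, so that a changed file automatically gets a new URL. This is just one possible scheme (a sequence number or date would also work), and versioned_name is a hypothetical helper.

```python
import hashlib
from pathlib import PurePosixPath

def versioned_name(path, content):
    """Insert a short content hash before the file extension."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))

old = versioned_name("/js/ab.js", b"console.log('v1');")
new = versioned_name("/js/ab.js", b"console.log('v2');")
print(old)
print(new)
print(old != new)   # True: changed content yields a new object name
```

Pages then link to the new name, and the old object simply ages out of the edge caches.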

- Invalidating objects from the cache
  - Explicitly selected objects (or multiple objects, using the * wildcard) can be invalidated before they expire to force a refresh, for e.g. /js/ab.js for a single object or /js/* for multiple objects.
  - A specified number of invalidation paths (the first 1,000) can be submitted each month for free. For any invalidation paths beyond that monthly allotment, a fee is charged for each submitted path.

For RTMP distribution, objects served cannot be invalidated

Access Logs

CloudFront can be configured to create log files that contain detailed information about every user request that CloudFront receives.
Access logs are available for both web and RTMP distributions.
An S3 bucket can be specified where CloudFront would save the files.
CloudFront delivers access logs for a distribution periodically, up to several times an hour.
CloudFront usually delivers the log file for that time period to the S3 bucket within an hour of the events that appear in the log. Note, however, that some or all log file entries for a time period can sometimes be delayed by up to 24 hours.

CloudFront Cost

CloudFront charges are based on actual usage of the service in four areas:
- Data Transfer Out to Internet: charges are applied for the volume of data transferred out of the CloudFront edge locations, measured in GB. Data transfer out from AWS origin (e.g., S3, EC2, etc.) to CloudFront are no longer charged.
- HTTP/HTTPS Requests: Number of HTTP/HTTPS requests made for the content.
- Invalidation Requests: Per path in the invalidation request. A path listed in the invalidation request represents the URL (or multiple URLs if the path contains a wildcard character) of the object you want to invalidate from CloudFront cache.
- Dedicated IP Custom SSL certificates associated with a CloudFront distribution: $600 per month for each custom SSL certificate associated with one or more CloudFront distributions using the Dedicated IP version of custom SSL certificate support, pro-rated by the hour.

Saving Cost

Data transfer from AWS origins to CloudFront edge locations (Amazon CloudFront "origin fetches") is free.

