add "@types/aws-lambda" (not official, but OK) as a dev dependency
TS Code:
import { APIGatewayProxyHandlerV2 } from "aws-lambda";
export const greetingLbFunc: APIGatewayProxyHandlerV2 = async (event, context) => {
    return { statusCode: 200, body: "Hello" };
};
Run without managing servers/containers. By default runs securely within a VPC. Scales on demand, from a few requests per day to thousands per second. Pay for compute time (not uptime as with EC2, though according to one article the ongoing cost is roughly the same). The relief from management responsibility comes at the cost of flexibility: cannot log into the instance (OS), or customize the OS or language runtime.
Invoke:
in response to events (data change in S3 or dynamoDB table);
API Gateway
with AWS SDKs
It's possible to call Lambda directly from the Internet with the SDKs, without going through API Gateway. See https://forum.serverless.com/t/convince-me-to-use-api-gateway-and-not-call-lambda-direct/3214/12
Building blocks:
Function: code & libraries
Event source: the trigger - an AWS service or a custom service that publishes events
Downstream resources: AWS services that the Lambda function calls
Log stream: automatically monitored and reported to CloudWatch; add logging statements in function code for custom log entries
AWS Serverless Application Model (SAM): a model to define serverless applications, supported by CloudFormation
Tools to create and test
Console
AWS CLI
SAM CLI - able to develop, test, analyze serverless applications LOCALLY before uploading, based on Docker & a customized Docker image (docker-lambda), very easy to install
Check document for specific environment at the time.
STATELESS environment, which demands STATELESS functions. Nothing lasts longer than the request (unless persisted externally).
OS - as of 2018/08 a Linux
Runtime - check document. As of 2018/08 Java 8
Libraries - AWS SDK always available, can access other private service / libraries (see Functions/access resources)
Network access - available
Environment variables - see document, some are reserved (set by AWS and cannot change) and some can be set and used.
Must be written in a stateless style
Authoring: use one of the supported languages; tools: console, IDE (Eclipse, Visual Studio), etc.
Programming Model: check the latest documentation; the model is the same regardless of language, covering:
handler - the method AWS Lambda calls first when it begins execution
context object - lets code interact with the environment (e.g. find out the remaining time before timeout)
logging
exceptions - communicate the result of execution
Handler (Node.js):
Invocation types
RequestResponse - return the result or use the callback
Event (asynchronous) - any result is discarded (investigate Dead Letter Queue)
Examples, when working together with asynchronous calls to other services such as DynamoDB:
exports.myHandler = function(event, context, callback) {... // the handler is a normal function; "=>" syntax is also fine
call "put" of DocumentClient asynchronously, like "dynamo.put(params, function (err, data){...})"
when a callback is supplied, as above, the request will be sent (but only after the current call stack exits), and the callback will be invoked
Lambda will wait until everything is done - so the handler does not need to wait
exports.createOrUpdateContact = async (event, context, callback) => {
the handler is an AsyncFunction which, when called, returns a Promise
if I don't use "await" for the DocumentClient.put method, the result comes back as null
why? in an async function the code block runs immediately, so "put" is indeed called; but when the function returns, the real result has not arrived from DynamoDB (the request may not even have been sent), so the handler's Promise resolves with null
WRONG EXAMPLE: await dynamo.put(params, function (err, data) {...}).promise();
when put() is called with a callback the request is sent, then "promise()" sends the request again!!! - this is why the callback is called twice, which confused me very much
CORRECT EXAMPLE: read the DocumentClient documentation - if a callback is not provided, the request is not sent. So the correct way in an async handler is: call "put" without a callback to obtain the AWS.Request, call "promise()" to send the request and obtain a Promise, then await that Promise.
Summary of handler functions
when working with a normal (callback-style) handler:
the handler can return first, with the request already sent and the callback registered; Lambda waits for completion
when working with an async handler:
an async function (AsyncFunction), when invoked, returns a Promise immediately
without "await", when the handler exits the Promise resolves immediately with null - so DO "await"
calling something callback-based (like the DynamoDB put method with a callback) will NOT work out of the box - convert the call to a Promise and "await" it
do NOT call the action (like "put") with a callback; DO call "promise()" as explained above
Deploying:
create a deployment package (use tools such as Jenkins for Node.js & Python, Maven for Java, etc.); if using the console, the console does that automatically
uploading - use the console, AWS CLI, or AWS SDK - all call the CreateFunction operation; in addition to the package, provide configuration
testing - from the console or AWS CLI (Invoke operation), or LOCALLY with SAM CLI (beta, not guaranteed to behave identically); the console provides sample event data that can be used
Monitoring & troubleshooting
CloudWatch - automatically logs all requests & the logs the function generates
SAM CLI (beta; there are problems with the Python module system, but they can be solved)
SAM is a template format to describe Lambda functions and resources
SAM CLI provides a local environment to develop and test serverless applications
$ sam local generate-event s3 ... // can generate a json event payload
$ sam local invoke function-name -e event_file.json // test a function locally with a payload
$ sam local start-api // start local API Gateway to test HTTP request/response, with hot reloading
by default uses proxy integration
validates constraints (timeout, memory...), honors security credentials (for remote calls)
$ sam init // provides a fully functional SAM application to bootstrap from
sam init;cd sam-app;cd hello_world;npm install;cd ..;sam local start-api // works for node 8
sam init --runtime java8;
$ sam validate // validates the template against the official SAM specification
$ sam package; sam deploy // call the aws CloudFormation package & deploy commands
SAM Local works with the AWS Toolkit for Eclipse to debug Lambda locally
For JavaScript it seems that "sam local" with an editor (Atom) is good enough. For compiled languages or larger projects, better to use an IDE: AWS Cloud9 (great for Node.js, at the moment not good for Java), Eclipse, Visual Studio
Configuring:
Resources - memory, 128MB~3008MB in 64MB increments (CPU allocated proportionally to memory); use console Configuration -> Advanced settings, or the CLI
timeout - default 3 seconds
IAM role (execution role) - the role AWS Lambda assumes when it executes the function
Handler name - entry point
ReservedConcurrentExecutions - controls concurrency for this particular function; ALL versions/aliases count toward it.
Access Resources
AWS services (public) - SDK automatically included, no need to bundle your own copy; credentials automatically set by AWS from the configured role
Non-AWS services - can include their SDK; use environment variables to store credential information
Private services/resources - by default Lambda runs in a Lambda-managed VPC and cannot access resources inside other VPCs; additional setup enables Lambda to create elastic network interfaces (ENIs) to connect to resources within a private VPC; Lambda can NOT connect to a Dedicated Tenancy VPC
Execution Model & Cold Start
Execution context: the temporary runtime environment that initializes any external dependencies (database connections, HTTP endpoints); setting it up adds some latency (bootstrapping, "cold start"). Lambda attempts to reuse it (freeze/thaw) for some time in anticipation of another invocation after the first. Consequences:
Declarations in function code (outside the handler) remain initialized - so check for an existing database connection before creating one.
/tmp provides 512 MB for caching across invocations if the context is reused
If a background process or callback did not complete before the handler returned, it will resume if the context is thawed from frozen - complete them before the code exits.
Do NOT assume the context is reused
Event Source - AWS services can generate events. In addition, user applications can also generate events - build your own event sources. Custom event sources use the Lambda Invoke operation; a user application can publish events using the AWS SDK. See the user guide for event sources & sample event data.
On-demand Invocation (calling Invoke operation, sync/async determined at calling time):
Synchronous
Asynchronous
Event source mapping - defines which events to publish and which Lambda function to invoke when they occur; custom applications can also be built to produce AWS-style source events.
AWS Service, Push model (all services except the poll-based ones) - mapping maintained with the source (examples: API Gateway, S3); need to grant the event source permission via a resource-based policy associated with the Lambda function ("Lambda Function Policy"). To see this in the console, check the "Designer" - the event source should show up there - and "view permissions" (the key icon) -> "Function policy"; or with the CLI: "aws lambda get-policy --function-name xxx"
AWS Service, Poll based (Kinesis, DynamoDB, SQS) - mapping maintained with Lambda; Lambda needs permission to poll the stream (but no permission is needed to invoke the function)
Custom application - just call Lambda with the Invoke operation; no additional permission is needed if the custom application uses the same account's credentials, otherwise cross-account permissions are required
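A hedged sketch of what a push-model function policy (as returned by "aws lambda get-policy") can look like, here granting S3 permission to invoke a function; the function name, account ID, and bucket ARN are placeholders:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowS3Invoke",
      "Effect": "Allow",
      "Principal": { "Service": "s3.amazonaws.com" },
      "Action": "lambda:InvokeFunction",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:my-func",
      "Condition": {
        "ArnLike": { "AWS:SourceArn": "arn:aws:s3:::my-bucket" }
      }
    }
  ]
}
```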
Retry behavior - on exception (timeout, failure to parse input, resources run out...)
On-demand invocation (non-event)
Synchronous - the caller receives an error (e.g. a 429 "too many requests" when throttled) and is responsible for retrying
Asynchronous - events are queued before invocation; Lambda retries twice with delays if unable to process; the event then goes to a Dead Letter Queue if one is configured, or is dropped.
Poll-based & stream-based (Kinesis or DynamoDB streams): on failure, Lambda retries the failing batch until the data expires, which can be up to 7 days. Blocking - new records are not processed until the old ones succeed (order preserved).
Poll-based & not stream-based (SQS): will NOT block other messages (order NOT preserved); retries until either (1) ultimate success or (2) the message retention period expires - then the message is discarded or goes to a Dead Letter Queue
Scaling Behavior
Concurrent execution count:
On-demand invocation (non-event): each event is a unit of concurrency; auto-scales up to account limits
Poll-based & stream-based (Kinesis or DynamoDB): concurrency = number of shards
Poll-based & not stream-based (SQS): automatic scaling; for details check the developer guide
Request rate: except for stream-based sources, the request rate (at which the function is invoked) = the event generation rate; for stream-based sources, the rate depends on the number of concurrent executions and how fast Lambda processes each batch
Scaling:
automatic scaling, subject to the Account Level Concurrent Execution Limit; in response to a burst, an immediate increase by a predetermined amount (depends on region), then at a rate of 500/minute. NOTE: VPC-based Lambdas are also subject to EC2's rate limit for providing Elastic Network Interfaces, see guide
beyond that rate, clients should expect throttling (e.g. a 502 with EC2ThrottledException for VPC functions) and should retry with backoff
Function-level concurrency control: can be used to throttle Lambda execution, or set to 1 to achieve serialization; ALL versions/aliases count. See https://www.jeremydaly.com/serverless-consumers-with-lambda-and-sqs-triggers/
My own test: positive. I created a test Lambda that (1) increments an index in a DynamoDB table (2) loops, with some sleep in between, fetching the same index repeatedly during the maximum 3-minute execution and checking whether it still equals the first value fetched. RESULT: I set the function-level concurrency to 1. If the Lambda service were loose about concurrency control, it would have allowed another instance to run at the same time and increment the index (from that other instance) during the loop. The actual result shows the index does not change, and another invocation returned "An error occurred (TooManyRequestsException) when calling the Invoke operation (reached max retries: 4): Rate Exceeded."
In the same test, with concurrency increased to 2, the contender gets invoked and the main worker reads the change made by the contender.
This means if I set worker concurrency to 1, I can forget about contention and need no locks before writes, since all operations are serialized.
Deployment methods (the automated ones all seem based on the SAM specification):
Lambda API - upload the package directly (not very automated)
AWS CLI "aws cloudformation deploy" command;
AWS Cloudformation;
AWS SAM
CodePipeline (CI/CD) & CodeBuild
Versioning & Aliases
Version - each has a unique ARN, immutable once published
Always there's $LATEST version
ARN: qualified (.....:$LATEST or a version number), or unqualified (without version); an unqualified ARN can be used anywhere, but an alias cannot be created from it
Publication: explicit, or together with a create/update operation (recommended when multiple developers work on one function, to avoid race conditions)
Delete a version
Alias - each has an ARN; a mutable pointer to a version; use aliases in configuration so changing the function version does not require changing every configuration occurrence - only the alias
About resource policies: granting access to the function does not necessarily grant access to every version; it depends on the ARN used in the policy
Alias routing-config : able to point to two versions with a percentage of routing to gradually shift traffic
Layers:
a bundled .zip -> extracted to the /opt dir in the execution environment
in the execution environment, certain paths include specific folders under /opt:
Node.js: "nodejs/node_modules" "nodejs/node14/node_modules"
PATH: /opt/bin
create layer:
versioned
permissions: by default private to AWS account
use (with cloud-formation)
in function (AWS::Serverless::Function) (SAM) properties, Layers array
in layer (AWS::Serverless::LayerVersion)
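A hedged sketch of the two SAM resources above in one template; the names, paths, and runtime are placeholders:

```yaml
Transform: AWS::Serverless-2016-10-31
Resources:
  SharedLibsLayer:
    Type: AWS::Serverless::LayerVersion
    Properties:
      LayerName: shared-libs            # placeholder name
      ContentUri: layers/shared-libs/   # contains nodejs/node_modules
      CompatibleRuntimes:
        - nodejs14.x
  GreetingFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: index.handler
      Runtime: nodejs14.x
      CodeUri: src/greeting/
      Layers:
        - !Ref SharedLibsLayer          # attaches the layer version
```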
SAM
Relationship with CloudFormation
Supported by CloudFormation
"Can also use conventional CloudFormation syntax" - a SAM template declares Transform: AWS::Serverless-2016-10-31 and CloudFormation expands the SAM resources
SAM resources and conventional CloudFormation resources can be mixed in the same template
Specification: https://github.com/awslabs/serverless-application-model
Examples given: basic function, dynamodb table, simple table, events, api, permissions
Example given: for Node.js
Automating
CodePipeline - automates the workflow from code change through testing to deployment; there's a tutorial to follow
CodeBuild - compile & unit-test, integrated with CodePipeline
CloudFormation
CodeDeploy - automatically deploys to EC2 instances, on-premises instances & serverless Lambda functions; integrates with various systems; avoids downtime; deploys to large fleets of instances and automatically tests the results
Gradual deployment - when deploying with SAM, CodeDeploy is used automatically, and with some configuration will (1) deploy the new version and create an alias pointing to it (2) gradually shift traffic to the new version until satisfied, or roll back (3) run pre/post-traffic test functions to verify correct results (4) roll back if a CloudWatch alarm triggers; see the developer guide
CloudWatch
Automatically tracks: number of requests, latency, errors
Set alarms with the metrics
Logs invocations (no need to config)
Insert logging statements for CloudWatch to track
X-Ray
See trail of invocations from event to downstream calls to identify issues and opportunity to optimize
Tagging - when there are large number of functions, use tag to group and filter them in administration tools, and to allocate cost to tags
CloudTrail API logging - log API calls (to lambda, from lambda...) (sounds like CloudWatch is about metrics and CloudTrail is about API calls)
Authentication & Access control - access as account root, IAM user, IAM role (for federated user, AWS service, EC2 instance); more to see user guide
Concurrency - account- and function-level concurrent execution limits, for cost control and/or rate matching; so even Lambda capacity can be configured to some degree.
Environment variables - pass settings without changing code; can be set with the console, CLI, or Lambda SDK; some variables are provided by the environment itself (like region)
Dead letter queues