Queue services:
- Standard: at-least-once delivery, best-effort ordering
- Be prepared to tolerate out-of-order messages and, on rare occasions (e.g. a server is unavailable for a period and becomes available later), duplicate messages. Design consumers to be idempotent.
- FIFO: exactly-once processing, strict first-in-first-out ordering, available in certain regions only
- Message Deduplication ID: once a message with a given ID is sent successfully, any messages sent with the same ID during the 5-minute deduplication interval are accepted but not delivered; applies to the entire queue
- Message Group ID: messages in the same group are always processed one by one, in strict order
- Receive Request Attempt ID: used for deduplication of ReceiveMessage calls
- Sequence Number
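The advice to design standard-queue consumers to be idempotent can be sketched roughly as follows. This is a minimal illustration, not an SQS API: the `IdempotentConsumer` class and the in-memory set of seen IDs are assumptions made for the example, and a real consumer would likely need a durable store shared by all workers.

```python
from typing import Callable, Dict, List, Set


class IdempotentConsumer:
    """Runs the handler at most once per message ID (hypothetical helper)."""

    def __init__(self, handler: Callable[[str], None]) -> None:
        self._handler = handler
        self._seen: Set[str] = set()  # assumption: in-memory only, for illustration

    def handle(self, message: Dict[str, str]) -> bool:
        """Process the message; return False if it was a duplicate delivery."""
        message_id = message["MessageId"]
        if message_id in self._seen:
            return False  # duplicate: skip the side effect
        self._handler(message["Body"])
        self._seen.add(message_id)  # record only after the handler succeeded
        return True


processed: List[str] = []
consumer = IdempotentConsumer(processed.append)
deliveries = [
    {"MessageId": "m-2", "Body": "second"},  # arrives out of order
    {"MessageId": "m-1", "Body": "first"},
    {"MessageId": "m-2", "Body": "second"},  # duplicate delivery
]
results = [consumer.handle(m) for m in deliveries]
```

Here the duplicate of `m-2` is skipped, so the side effect runs once per message regardless of delivery order.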
Queue Configuration (both):
- Visibility Timeout (time period, up to 12 hours, starting when a message is received, during which the message is invisible to other receiving components, to avoid duplicate handling)
- Can also be set for a single message or for multiple messages, but only with the SDK
- Message retention period (up to 14 days)
- Maximum message size (256 KB)
- Delivery delay: can be set at the queue level or at the message level (message timers)
- Receive message wait time (maximum time a long poll waits for a message to become available)
- Dead letter queue settings (redrive policy)
- The dead letter queue must be the same type as the source queue
- maxReceiveCount controls when a message is moved to the dead letter queue
- The message timestamp is not changed by the move, so retention is calculated from the time the message was enqueued in the original queue
- Server-Side Encryption settings
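As a rough sketch of how the settings above map to queue attributes, the helper below builds the attribute dictionary that a create-queue call expects. The function name and its defaults are assumptions for illustration; the attribute keys are the SQS attribute names, and the 20-second cap on wait time is the SQS limit rather than something stated in these notes.

```python
import json
from typing import Dict


def build_queue_attributes(
    dead_letter_queue_arn: str,
    max_receive_count: int = 5,
    visibility_timeout_s: int = 30,
    retention_s: int = 4 * 24 * 3600,
    delay_s: int = 0,
    wait_time_s: int = 20,
) -> Dict[str, str]:
    """Return SQS queue attributes (hypothetical helper); values must be strings."""
    if not 0 <= visibility_timeout_s <= 12 * 3600:
        raise ValueError("visibility timeout must be between 0 and 12 hours")
    if not 0 < retention_s <= 14 * 24 * 3600:
        raise ValueError("retention period must be at most 14 days")
    if not 0 <= delay_s <= 15 * 60:
        raise ValueError("delivery delay must be between 0 and 15 minutes")
    if not 0 <= wait_time_s <= 20:
        raise ValueError("receive wait time must be between 0 and 20 seconds")
    return {
        "VisibilityTimeout": str(visibility_timeout_s),
        "MessageRetentionPeriod": str(retention_s),
        "DelaySeconds": str(delay_s),
        "ReceiveMessageWaitTimeSeconds": str(wait_time_s),
        # The redrive policy is itself a JSON string inside the attribute map.
        "RedrivePolicy": json.dumps({
            "deadLetterTargetArn": dead_letter_queue_arn,
            "maxReceiveCount": str(max_receive_count),
        }),
    }


# The ARN below is a made-up placeholder.
attrs = build_queue_attributes("arn:aws:sqs:us-east-1:123456789012:orders-dlq")

# With boto3 installed and credentials configured, this could be passed as:
# boto3.client("sqs").create_queue(QueueName="orders", Attributes=attrs)
```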
Queue ID & other properties:
- Queue name (for FIFO must end with .fifo)
- Queue URL
Queue Operations:
- Create, configure, list, add permissions, manage tags, delete queue
- Send messages to queue
- Receive messages (polling)
- Delete message
- Subscribe a queue to an SNS topic
- Configure the queue as a Lambda trigger (on new messages)
- Both the queue and the Lambda function require certain permissions
- Purge queue
- Get the approximate number of messages in each state (per queue)
Short vs. Long Polling:
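- Short (default, when WaitTimeSeconds of the ReceiveMessage request is 0, or the queue's default wait time is 0):
- queries only a subset of servers, so a response can be empty even when messages exist (a false empty response)
- Long Polling
- reduces cost by cutting the number of empty responses and eliminating false empty responses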
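The difference between the two polling modes can be illustrated with a toy model. This is a simplified sketch, not how SQS is implemented: the `Servers` mapping and both functions are hypothetical, and the waiting part of long polling is left out so that only the "subset of servers" effect is shown.

```python
from typing import Dict, Iterable, List

# Hypothetical model: each server holds some of the queue's messages.
Servers = Dict[int, List[str]]


def short_poll(servers: Servers, sampled: Iterable[int]) -> List[str]:
    """Return the messages found on the sampled servers only."""
    found: List[str] = []
    for server_id in sampled:
        found.extend(servers.get(server_id, []))
    return found


def long_poll(servers: Servers) -> List[str]:
    """Query every server (the wait for new messages is not modelled)."""
    return short_poll(servers, servers.keys())


servers: Servers = {0: [], 1: [], 2: ["order-42"]}
short_result = short_poll(servers, sampled=[0, 1])  # misses server 2
long_result = long_poll(servers)
```

In this run the short poll comes back empty although a message exists, which is the false empty response that long polling avoids.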
Capacity: 3,000 messages/s with batching, 300/s without batching (these are the FIFO queue limits). Throughput depends on scaling producers and consumers horizontally. To scale, ensure the client has enough connections.
Message:
- Message ID: system-assigned, returned in the SendMessage response, max 100 characters; not used for deletion (which uses the receipt handle, meaning a message cannot be recalled once sent)
- Receipt handle: associated with each receipt of a message and used to delete it (so a message must be received before it can be deleted). A single message can have multiple receipt handles if it is received more than once; the most recent one must be provided, otherwise the message may not be deleted
- Message body
- Message attributes: name, type (String, Number, Binary, or custom in the form existing_type.custom), value; up to 10 per message
- Message timer: initial invisible period (0 to 15 minutes), i.e. delayed delivery
Message consumption:
- Cannot request a specific message; can set the maximum number of messages to receive (up to 10)
- Messages are not automatically deleted after retrieval. A separate request must be sent to delete them.
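The receive-then-delete flow, together with the visibility timeout and receipt handles described earlier, can be sketched with a small in-memory model. `ToyQueue` is a hypothetical class for illustration and not the SQS API; time is passed in explicitly, and the toy is stricter than the notes suggest in that a stale receipt handle never deletes.

```python
import itertools
from typing import Dict, List, Tuple


class ToyQueue:
    """In-memory model of receive/delete semantics (illustration only)."""

    def __init__(self, visibility_timeout: float) -> None:
        self.visibility_timeout = visibility_timeout
        self._bodies: Dict[str, str] = {}         # message ID -> body
        self._visible_at: Dict[str, float] = {}   # message ID -> time it reappears
        self._latest_handle: Dict[str, str] = {}  # message ID -> newest receipt handle
        self._counter = itertools.count(1)

    def send(self, body: str) -> str:
        message_id = f"msg-{next(self._counter)}"
        self._bodies[message_id] = body
        self._visible_at[message_id] = 0.0
        return message_id

    def receive(self, now: float, max_messages: int = 10) -> List[Tuple[str, str]]:
        """Return up to 10 (body, receipt_handle) pairs visible at `now`."""
        out: List[Tuple[str, str]] = []
        for message_id, body in self._bodies.items():
            if len(out) >= min(max_messages, 10):
                break
            if self._visible_at[message_id] <= now:
                handle = f"rh-{next(self._counter)}"
                self._latest_handle[message_id] = handle
                # Receiving hides the message; it is NOT deleted.
                self._visible_at[message_id] = now + self.visibility_timeout
                out.append((body, handle))
        return out

    def delete(self, receipt_handle: str) -> bool:
        """Delete by receipt handle; only the most recent handle is accepted."""
        message_id = next(
            (m for m, h in self._latest_handle.items() if h == receipt_handle), None
        )
        if message_id is None:
            return False
        del self._bodies[message_id]
        del self._visible_at[message_id]
        del self._latest_handle[message_id]
        return True


queue = ToyQueue(visibility_timeout=30)
queue.send("hello")
first = queue.receive(now=0)             # message handed out, now invisible
hidden = queue.receive(now=10)           # still inside the visibility timeout
second = queue.receive(now=31)           # timeout expired: received again
stale_ok = queue.delete(first[0][1])     # old receipt handle
fresh_ok = queue.delete(second[0][1])    # most recent receipt handle
remaining = queue.receive(now=100)
```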
Batching: one round trip to the service carrying multiple operations. Batch operations are: send, delete, and change message visibility.
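A batch call takes at most 10 entries, so a larger list of messages has to be split up first. The sketch below shows one way to do that; `to_send_batches` is a hypothetical helper, while the `Id` and `MessageBody` keys follow the entry shape of the SendMessageBatch request.

```python
from typing import Dict, Iterator, List

MAX_BATCH_ENTRIES = 10  # SQS batch calls accept at most 10 entries


def to_send_batches(bodies: List[str]) -> Iterator[List[Dict[str, str]]]:
    """Yield lists of SendMessageBatch entries, at most 10 per list."""
    for start in range(0, len(bodies), MAX_BATCH_ENTRIES):
        chunk = bodies[start:start + MAX_BATCH_ENTRIES]
        # "Id" only needs to be unique within a single batch request.
        yield [{"Id": str(i), "MessageBody": body} for i, body in enumerate(chunk)]


batches = list(to_send_batches([f"event-{n}" for n in range(23)]))
batch_sizes = [len(batch) for batch in batches]

# With boto3, each batch could then be sent in one round trip:
# sqs.send_message_batch(QueueUrl=queue_url, Entries=batch)
```

For 23 messages this yields three requests instead of 23.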
Metrics: send, receive, and delete metrics are available in CloudWatch
Read this: https://www.jeremydaly.com/serverless-consumers-with-lambda-and-sqs-triggers/ and this (the official announcement) https://aws.amazon.com/blogs/aws/aws-lambda-adds-amazon-simple-queue-service-to-supported-event-sources/
Also this https://serverless.com/blog/aws-lambda-sqs-serverless-integration/#batch-size-and-error-handling-with-the-sqs-integration
It seems that:
- The Lambda service (not your Lambda function or some hack) polls the SQS queue
- Such polling counts toward the receive count of the message
- The Lambda service then tries to invoke your Lambda function to handle the messages (the number of messages is configurable). However, this is subject to the function's concurrency setting.
- This means that if there's a redrive policy on the queue, a message could be moved to the dead letter queue before your Lambda function gets a chance to handle it (because the Lambda service received the message too many times without being able to hand it to your function)
- Error handling: returning an error from the Lambda function doesn't tell the Lambda service to send the message to the dead letter queue. So the function should handle the error itself and either move the message to the DLQ or log it somewhere.
- So the DLQ is tricky to work with in this use case. Messages can be moved to the DLQ without ever being handled by the Lambda FUNCTION, or the function may leave a message to be triggered repeatedly without it being sent to the DLQ. In the end, MAKE SURE the handler removes the message from the queue, on success or failure, after proper handling.
- (quoting the official announcement) "Just as a quick note here, our Lambda function timeout has to be lower than the queue’s visibility timeout in order to create the event mapping from SQS to Lambda" - interpretation: the visibility timeout starts counting when the messages are received. If it expires while the Lambda function is still working on the messages, they become visible again and can be received again, so they end up being received more than once. Good point!
- Works with SAM (see the example in the official announcement, or https://github.com/becloudway/aws-lambda-sqs-sam)
- No additional charges for the integration itself. However, the Lambda service continuously long-polls the SQS queue, and those long-poll calls are charged.
- The last article (referred to in the comments of the first article) mentions that although the Lambda service automatically removes messages from the queue if the function returns successfully, it is better to remove each message manually in the function once it has been processed, so the batch does not have to succeed or fail as a whole.
- There's also a 4th post referred to in the comments (and the comment itself) saying that throttling and polling are not well integrated. When the function is throttled, the number of "in flight" messages gets high and polling becomes more frequent (instead of less frequent, even though the Lambda function is not handling messages quickly enough)
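The points above about settling every message inside the function can be sketched as a handler that deals with each record individually. This is an illustration under assumptions, not the approach from the linked articles: `make_handler` and its three callbacks are hypothetical, and the delete callback is injected so the example runs without AWS access (in practice it might wrap a boto3 `delete_message` call). The record keys `body`, `receiptHandle` and `messageId` follow the shape of the SQS event that Lambda passes in.

```python
from typing import Any, Callable, Dict, List


def make_handler(
    process: Callable[[str], None],
    delete_message: Callable[[str], None],
    on_failure: Callable[[Dict[str, Any], Exception], None],
) -> Callable[[Dict[str, Any], Any], Dict[str, int]]:
    """Build a Lambda handler that settles every SQS record on its own."""

    def handler(event: Dict[str, Any], context: Any) -> Dict[str, int]:
        ok = failed = 0
        for record in event["Records"]:
            try:
                process(record["body"])
                ok += 1
            except Exception as exc:
                # Hypothetical hook: log the record or forward it to a DLQ.
                on_failure(record, exc)
                failed += 1
            # Remove the message either way so it is not redelivered forever.
            delete_message(record["receiptHandle"])
        return {"ok": ok, "failed": failed}

    return handler


deleted: List[str] = []
failures: List[str] = []


def process(body: str) -> None:
    if body == "bad":
        raise ValueError("cannot parse message")


handler = make_handler(
    process,
    deleted.append,
    lambda record, exc: failures.append(record["messageId"]),
)
event = {
    "Records": [
        {"messageId": "1", "receiptHandle": "rh-1", "body": "good"},
        {"messageId": "2", "receiptHandle": "rh-2", "body": "bad"},
    ]
}
summary = handler(event, None)
```

Because the handler returns normally even when a record fails, one bad message does not force the whole batch to be retried.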