"Loosely coupled systems"
The more loosely systems are coupled, the better they scale, the more fault tolerant they are, the fewer dependencies they have, and the faster you can innovate!
Event sourcing
Capture all changes to an application state as a sequence of events.
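A minimal sketch of the idea, assuming a hypothetical bank account whose only events are "deposited" and "withdrawn" (the class and event names are illustrative, not from any library). State is never updated in place; the balance is derived by replaying the log.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Event:
    kind: str    # hypothetical event names: "deposited" or "withdrawn"
    amount: int


@dataclass
class Account:
    events: List[Event] = field(default_factory=list)  # append-only log

    def record(self, event: Event) -> None:
        # Changes are only ever appended, never applied in place.
        self.events.append(event)

    def balance(self) -> int:
        # Current state is rebuilt by replaying every event in order.
        total = 0
        for event in self.events:
            if event.kind == "deposited":
                total += event.amount
            elif event.kind == "withdrawn":
                total -= event.amount
        return total


account = Account()
account.record(Event("deposited", 100))
account.record(Event("withdrawn", 30))
print(account.balance())
```

Because the log is the source of truth, the same replay can rebuild state at any earlier point by stopping partway through the event list.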
Application designers should build fault tolerance into the architecture and not expect the infrastructure to provide it for them.
According to the CAP theorem, it is impossible for a distributed system to simultaneously provide all three of the following guarantees:
Consistency
Availability
Partition tolerance
In the presence of a network partition, we must choose between consistency and availability.
An event bus
Allows publish-subscribe-style communication between components without requiring the components to explicitly register with one another (and thus be aware of each other).
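A minimal in-process sketch of such a bus, assuming a hypothetical `EventBus` class and an illustrative topic name `order.created`. Publishers and subscribers only know the topic, not each other.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

Handler = Callable[[Any], None]


class EventBus:
    def __init__(self) -> None:
        # Maps a topic name to the handlers subscribed to it.
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers[topic].append(handler)

    def publish(self, topic: str, payload: Any) -> int:
        # Deliver the payload to every subscriber; return how many received it.
        handlers = list(self._handlers.get(topic, []))
        for handler in handlers:
            handler(payload)
        return len(handlers)


bus = EventBus()
received = []
bus.subscribe("order.created", received.append)
bus.subscribe("order.created", lambda order: received.append(("audit", order)))
delivered = bus.publish("order.created", {"id": 1})
print(delivered, received)
```

A production bus would usually deliver asynchronously and across processes; this sketch only shows the decoupling.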
Microservice owns rollback
Every microservice exposes its own rollback method
When designing your microservices, it is best practice to implement not only the core function but also a rollback method and, if you use staged commits, to expose a commit method as well.
That way, in the event of a failure, the Transaction Manager service can initiate a rollback for all microservices.
Create a “Transaction Manager” microservice that notifies all relevant microservices to roll back or take action.
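The pattern above can be sketched roughly as follows. `FakeService` is a hypothetical stand-in for a real microservice client, and this manager makes one simplifying assumption: it rolls back only the services that had already completed, in reverse order.

```python
from typing import List


class FakeService:
    """Hypothetical stand-in for a microservice that exposes execute and rollback."""

    def __init__(self, name: str, fail: bool = False) -> None:
        self.name = name
        self.fail = fail
        self.state = "idle"

    def execute(self) -> None:
        if self.fail:
            raise RuntimeError(f"{self.name} failed")
        self.state = "done"

    def rollback(self) -> None:
        self.state = "rolled_back"


class TransactionManager:
    def run(self, services: List[FakeService]) -> bool:
        completed: List[FakeService] = []
        for service in services:
            try:
                service.execute()
                completed.append(service)
            except Exception:
                # On failure, ask every completed service to undo its work,
                # most recent first.
                for done in reversed(completed):
                    done.rollback()
                return False
        return True


orders = FakeService("orders")
payments = FakeService("payments", fail=True)
ok = TransactionManager().run([orders, payments])
print(ok, orders.state, payments.state)
```

In a real deployment the rollback calls would be network requests and would themselves need retries, so each rollback method should be safe to call more than once.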
Solution:
If asynchronous, use an event-driven approach with DynamoDB Streams
Focus on the data of interest
Use clean-up functions
Kinesis
Amazon SQS
Amazon SNS
Preventing I/O explosion
One way to overcome an I/O explosion is to introduce a cache.
Another solution is to use an inversion of control/dependency injection pattern.
The initial call contains everything the service needs to act on the request, so it does not have to call the other APIs.
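The caching option can be sketched with the standard library's `functools.lru_cache`. The function `fetch_name_from_api` is a hypothetical stand-in for a remote call, and the counter exists only to show how many downstream round trips are made.

```python
from functools import lru_cache

downstream_calls = 0  # counts simulated network round trips


def fetch_name_from_api(user_id: int) -> str:
    """Hypothetical stand-in for a call to another service's API."""
    global downstream_calls
    downstream_calls += 1
    return f"user-{user_id}"


@lru_cache(maxsize=1024)
def get_name(user_id: int) -> str:
    # Repeated lookups for the same id are served from the in-memory cache.
    return fetch_name_from_api(user_id)


names = [get_name(uid) for uid in (1, 2, 1, 1, 2)]
print(names, downstream_calls)
```

Five lookups result in only two downstream calls here. A real cache would also need an expiry or invalidation policy, which this sketch leaves out.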
To handle the complexity we need coordination and visibility
Solution: Step Functions
Head-of-line blocking refers to a performance-limiting phenomenon that occurs when a line of packets is held up by the first packet.
It can occur in network devices, network protocols (e.g., HTTP pipelining), or distributed systems (e.g., slow consumption of a message from a Kafka topic partition).
Example:
In HTTP/1.0, by default, each TCP connection carried a single HTTP request and its response, unless the Connection: keep-alive header was used.
This changed with HTTP/1.1, which made persistent connections the default behavior, with no need to specify the Connection: keep-alive header.
With pipelining over a persistent connection, responses must be returned in request order, so a slow first response holds up every response queued behind it.
Queues can be used to buffer these requests so that they can be completed after the congestion is cleared.
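A minimal sketch of that buffering, using the standard library's `queue.Queue`. The function names `accept` and `drain` and the capacity of 100 are illustrative assumptions.

```python
import queue

# Bounded buffer: requests wait here while the downstream system is congested.
buffer = queue.Queue(maxsize=100)


def accept(request: str) -> bool:
    """Enqueue a request; return False if the buffer is full."""
    try:
        buffer.put_nowait(request)
        return True
    except queue.Full:
        return False


def drain() -> list:
    """Process everything that was buffered, in arrival order."""
    completed = []
    while True:
        try:
            request = buffer.get_nowait()
        except queue.Empty:
            break
        completed.append(f"handled {request}")
        buffer.task_done()
    return completed


accepted = [accept(f"req-{i}") for i in range(3)]  # arrive during congestion
results = drain()                                  # run once congestion clears
print(accepted, results)
```

The bound matters: an unbounded queue only moves the overload into memory, so rejecting requests when the buffer is full is a deliberate choice here.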
Avoid single point of failure (SPoF)
Implement throttling
Design for failure
Implement caching
Ensure idempotency
Use data sharding
Apply decoupling
Implement circuit breakers
Manage statelessness
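As one illustration from the list above, a circuit breaker can be sketched as follows. The class name, thresholds, and the injectable `clock` parameter are assumptions made for this example, not any particular library's API.

```python
import time
from typing import Callable, Optional


class CircuitOpenError(Exception):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after  # seconds to wait before a trial call
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, func: Callable[[], object]) -> object:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit is open; call skipped")
            # Cool-down elapsed: allow one trial call (half-open state).
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # trip the breaker
            raise
        self.failures = 0  # any success closes the circuit fully
        return result
```

Usage is a matter of wrapping each outbound call, for example `breaker.call(lambda: client.get_order(42))`, where `client.get_order` is a hypothetical remote call. While the circuit is open, callers fail fast instead of piling more load onto a struggling dependency.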
Highly available systems
Redundancy at the component level