A MicroService should consist of services/Apps which are compliant with the 12-Factor App methodology. These factors are:
I. Codebase: One codebase tracked in revision control, many deploys. If you have many codebases then it is a distributed system, not an App – in that case, every component of the distributed system should be an App in its own right. If multiple codebases need to share the same code, create libraries for the shared code and follow the next point.
II. Dependencies: Explicitly declare and isolate dependencies. Never rely on system-wide packages/libraries/tools or on any undeclared/implicit resources.
III. Config: Store config in the environment – such that the entire codebase could be made open source without leaking any secret configuration value (see the sketch after this list).
IV. Backing services: Treat backing services (or simply services) as attached resources. Don't differentiate between local/in-house and third-party services – any service is just a resource accessed by a URL/credentials/locators specified in config. Since config is isolated from code, replacing one service with another should just be a config change; the code should remain untouched.
V. Build, release, run: Strictly separate the build and run stages. In the build phase only the codebase is built to create a binary. Config is kept separate – the release phase combines the two. If anything goes wrong after release (during run), we can roll back to a previous release or ship a hotfix and make another release – to make this possible, every release must have an immutable, incremental version.
VI. Processes: Execute the App as one or more stateless processes. To communicate with each other, the processes may share data via a stateful, persistent backing service (like a database). The App must not assume that data cached in any non-persistent medium will be available in the future. Concepts like sticky sessions are also not allowed, as they result in stateful processes.
VII. Port binding: Export services via port binding – the App binds to a port supplied at runtime (typically via the environment/config), and outside users send requests to this port.
VIII. Concurrency: Scale out via the process model. Each process may use internal threads, but in general create a separate process for each type of work – this helps horizontal scaling later. Don't daemonize the processes; always keep them running in the foreground.
IX. Disposability: Maximize robustness with fast startup and graceful shutdown – both startup and shutdown should be quick. Shutdown should also be graceful, and even if a sudden crash occurs, the App should be able to pick up the unfinished job it was doing once restarted.
X. Dev/prod parity: Keep development, staging, and production as similar as possible. Reduce the development-to-deployment time gap. Reduce the people gap by involving developers in Production deployment activities. Reduce the tool/library gap by having developers use the same tools/libraries during development as in Production.
XI. Logs: Treat logs as event streams. Ideally the App should just write events to stdout (or to a logging framework) and should not bother about creating a logfile. The execution environment will collect the logs from stdout (or from the framework) and channel them to specific destinations. Logs should be written in a format that can be fed to event-stream analyzers.
XII. Admin processes: Run admin/management tasks as one-off processes. These admin tasks (like DB migrations, fixing corrupt records/files, creating directories etc.) should be run using the same codebase and config as the App. Also, the scripts/code for these admin tasks should be part of the App codebase.
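A minimal Python sketch of factors III (Config), VII (Port binding) and XI (Logs) taken together; the DATABASE_URL and PORT variable names and their default values are illustrative assumptions, not something 12-Factor prescribes:

```python
# twelve_factor_sketch.py
# Minimal sketch: read config from the environment (factor III),
# bind to a port supplied at runtime (factor VII), and log to stdout (factor XI).
import logging
import os
import sys
from http.server import BaseHTTPRequestHandler, HTTPServer

# Factor XI: write log events to stdout; the execution environment collects them.
logging.basicConfig(stream=sys.stdout, level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("app")

# Factor III: config (including backing-service locators) comes from the environment.
DATABASE_URL = os.environ.get("DATABASE_URL", "postgres://localhost:5432/app")
PORT = int(os.environ.get("PORT", "8080"))  # Factor VII: port supplied at runtime


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        log.info("request received for %s", self.path)
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"hello from a 12-factor-style process\n")


if __name__ == "__main__":
    log.info("starting on port %s, using backing service %s", PORT, DATABASE_URL)
    HTTPServer(("", PORT), Handler).serve_forever()
```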
The goal of microservices is to increase the velocity of application releases, by decomposing the application into small autonomous services that can be deployed independently. A microservices architecture also brings some challenges. The design patterns shown here can help mitigate these challenges.
Microservices come with a few categories of challenges, and the design patterns below are tagged with the ones they address: Availability (av), Data Management (dm), Design and Implementation (di), Messaging (msg), Management and Monitoring (mm), Performance and Scalability (ps), Resiliency (rs), Security (sc).
Ambassador (di, mm): can be used to offload common client-connectivity tasks such as monitoring, logging, routing, authentication and security (such as TLS), retries and circuit-breaking in a language-agnostic way. Ambassador services are often deployed on the same host as the main application.
Anti-corruption Layer (di, mm): implements a façade between new and legacy applications, to ensure that the design of a new application is not limited by dependencies on legacy systems. It also rectifies/purifies/augments data coming from the legacy system, and can help to perform ETL tasks on legacy-system data.
Asynchronous Request-Reply (msg): Decouple backend processing from a frontend host, where backend processing needs to be asynchronous, but the frontend still needs a clear response. In such cases the client sends a request to the service, and the service quickly responds with HTTP 202 (Accepted). The response holds a location reference pointing to an endpoint that the client can poll to check for the result of the long-running operation. While the work is still pending, the status endpoint keeps returning HTTP 202. Once the work is complete, the status endpoint can either return a resource that indicates completion, or redirect to another resource URL. For example, if the asynchronous operation creates a new resource, the status endpoint would redirect to the URL for that resource.
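A minimal sketch of this flow, assuming Flask as the HTTP framework and an in-memory job table (both are illustrative choices, not part of the pattern itself):

```python
# async_request_reply.py - minimal sketch of the Asynchronous Request-Reply pattern.
import threading
import time
import uuid

from flask import Flask, jsonify

app = Flask(__name__)
jobs = {}  # job_id -> "pending" | result (in-memory stand-in for a real job store)


def long_running_work(job_id):
    time.sleep(10)                      # simulate slow backend processing
    jobs[job_id] = {"status": "done"}   # store the finished result


@app.route("/orders", methods=["POST"])
def accept_order():
    job_id = str(uuid.uuid4())
    jobs[job_id] = "pending"
    threading.Thread(target=long_running_work, args=(job_id,)).start()
    # Respond immediately with 202 Accepted and a Location header to poll.
    return "", 202, {"Location": f"/orders/status/{job_id}"}


@app.route("/orders/status/<job_id>")
def order_status(job_id):
    result = jobs.get(job_id)
    if result is None:
        return jsonify({"error": "unknown job"}), 404
    if result == "pending":
        return "", 202, {"Retry-After": "5"}   # work still in progress
    # Work is complete: return (or redirect to) the finished resource.
    return jsonify(result), 200
```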
Backends for Frontends (di): creates separate backend services for different types of clients, such as desktop and mobile. That way, a single backend service doesn't need to handle the conflicting requirements of various client types. This pattern can help keep each microservice simple, by separating client-specific concerns.
Bulkhead (rs): isolates critical resources, such as connection pools, memory, and CPU, for each workload or service. By using bulkheads, a single workload (or service) can't consume all of the resources and starve the others. This pattern increases the resiliency of the system by preventing cascading failures caused by one service.
Choreography (msg, ps): Have each component of the system participate in the decision-making about the workflow of a business transaction, instead of relying on a central point of control. Usually the client puts a message in a queue, and services consume that message and write back their responses to a queue (possibly the same queue, possibly a different one). Other services can further consume these new messages and act accordingly. Each service independently decides what to do on the basis of the message – no single Orchestrator dictates to the services here.
Circuit Breaker (rs): Handle faults that might take a variable amount of time to recover from, when connecting to a remote service or resource. This can improve the stability and resiliency of an application. A circuit breaker acts as a proxy for operations that might fail. The circuit breaker receives the client requests, propagates them to the actual service and monitors the responses – this is the regular (Closed) state of the circuit. If the service fails to respond, and the failures keep mounting beyond a threshold, the circuit moves to the Open state (for a specific time interval). Any request coming during this interval is immediately answered with an error by the breaker itself (the call is not forwarded to the actual service). After the time interval expires, the circuit goes to a Half-Open state: it allows a few selected requests through and closely observes the responses. If the responses are good, the circuit moves back to the Closed state; otherwise it moves back to the Open state.
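A minimal, single-process sketch of the three states; the failure threshold, open interval and trial-call count are illustrative assumptions:

```python
# circuit_breaker.py - minimal sketch of the Closed -> Open -> Half-Open states.
import time


class CircuitBreaker:
    def __init__(self, failure_threshold=3, open_interval=30, trial_calls=2):
        self.failure_threshold = failure_threshold
        self.open_interval = open_interval     # seconds to stay Open
        self.trial_calls = trial_calls         # successes needed in Half-Open
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = 0.0
        self.trial_successes = 0

    def call(self, operation):
        if self.state == "OPEN":
            if time.time() - self.opened_at < self.open_interval:
                raise RuntimeError("circuit is open - failing fast")
            self.state = "HALF_OPEN"           # interval expired: try a few calls
            self.trial_successes = 0
        try:
            result = operation()
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self):
        self.failures += 1
        if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
            self.state = "OPEN"                # trip the circuit
            self.opened_at = time.time()

    def _on_success(self):
        if self.state == "HALF_OPEN":
            self.trial_successes += 1
            if self.trial_successes >= self.trial_calls:
                self.state = "CLOSED"          # responses look good again
                self.failures = 0
        else:
            self.failures = 0
```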
CQRS (dm, di, ps): The Command Query Responsibility Segregation pattern asks for segregating the database-write (command) and database-read (query) services. Sometimes the data models used for writes and reads can be completely different. Sometimes even the database to write to can be separated from the database to read from – in this case, as soon as data is written into the write-db, an event is published in the same transaction, and the service(s) responsible for the read-db listen to this event and asynchronously update the read-db. This type of segregation promotes eventual consistency and the use of different technologies for writes and reads (e.g. the write-db can be an RDBMS whereas the read-db can be NoSQL or some kind of materialized view).
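A minimal in-memory sketch of the command/query split with an event published on every write; the in-process "event bus" and the order domain are illustrative stand-ins for real messaging and storage infrastructure:

```python
# cqrs_sketch.py - writes go through the command side, which publishes an event;
# the query side maintains its own read-optimized (materialized) view.
write_db = []          # system of record (command side)
read_view = {}         # materialized view (query side)
subscribers = []       # simple in-process "event bus"


def publish(event):
    for handler in subscribers:
        handler(event)


# Command side: change state, then publish what happened.
def place_order(order_id, amount):
    write_db.append({"order_id": order_id, "amount": amount})
    publish({"type": "OrderPlaced", "order_id": order_id, "amount": amount})


# Query side: keep a denormalized view updated from events.
def on_order_placed(event):
    if event["type"] == "OrderPlaced":
        read_view[event["order_id"]] = {"amount": event["amount"], "status": "PLACED"}


def get_order(order_id):
    return read_view.get(order_id)


subscribers.append(on_order_placed)

if __name__ == "__main__":
    place_order("o-1", 250)
    print(get_order("o-1"))   # {'amount': 250, 'status': 'PLACED'}
```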
Deployment Stamp (av, ps): The deployment stamp pattern involves deploying multiple independent copies of application components, including data stores. Each individual copy is called a stamp, or sometimes a service unit or scale unit. This approach can improve the scalability of your solution, allow you to deploy instances across multiple regions, and separate your customer data – this pattern helps horizontal scaling.
Distributed Tracing (mm): In a microservice architecture, requests often span multiple services. Each service handles a request by performing one or more operations across multiple services. When troubleshooting, it is worth having a trace ID so that a request can be traced end to end. The solution is to introduce a transaction/request ID. The following approach can be used (a sketch follows the list):
Assign each external request a unique external request-id.
Pass the external request-id to all services involved in handling the request.
Include the external request-id in all log messages.
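A minimal sketch of this approach; the X-Request-ID header name is a common convention, assumed here rather than mandated, and the service calls are simulated:

```python
# request_id_propagation.py - assign a request-id at the edge, pass it to every
# downstream call, and include it in every log line.
import logging
import sys
import uuid

logging.basicConfig(stream=sys.stdout, level=logging.INFO,
                    format="%(asctime)s request_id=%(request_id)s %(message)s")
log = logging.getLogger("tracing")


def handle_edge_request(headers):
    # Assign a unique id at the edge if the caller did not send one.
    request_id = headers.get("X-Request-ID") or str(uuid.uuid4())
    log.info("received order request", extra={"request_id": request_id})
    call_downstream_service("inventory", request_id)
    call_downstream_service("payment", request_id)
    return request_id


def call_downstream_service(name, request_id):
    # Pass the same id onward in the outgoing headers...
    outgoing_headers = {"X-Request-ID": request_id}
    # ...and include it in every log message of this hop.
    log.info("calling %s with headers %s", name, outgoing_headers,
             extra={"request_id": request_id})


if __name__ == "__main__":
    handle_edge_request({})   # no incoming id, so a new one is generated
```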
Event Sourcing (dm, ps): Instead of storing just the current state of the data in a domain, use an append-only store to record the full series of actions taken on that data. The store acts as the system of record and can be used to materialize the domain objects. This can simplify tasks in complex domains, by avoiding the need to synchronize the data model and the business domain, while improving performance, scalability, and responsiveness. It can also provide consistency for transactional data, and maintain full audit trails and history that can enable compensating actions.
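A minimal sketch of an append-only event store replayed to materialize current state; the bank-account domain and event types are illustrative:

```python
# event_sourcing_sketch.py - the append-only event store is the system of record,
# and current state is materialized by replaying (folding over) the events.
event_store = []   # append-only list of events for one account


def append(event):
    event_store.append(event)


def current_balance():
    # Materialize the domain object from the full event history.
    balance = 0
    for event in event_store:
        if event["type"] == "Deposited":
            balance += event["amount"]
        elif event["type"] == "Withdrawn":
            balance -= event["amount"]
    return balance


if __name__ == "__main__":
    append({"type": "Deposited", "amount": 100})
    append({"type": "Withdrawn", "amount": 30})
    append({"type": "Deposited", "amount": 50})
    print(current_balance())   # 120 - and the full history/audit trail is retained
```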
Federated Identity (sc): Delegate authentication to an external identity provider. This can simplify development, minimize the requirement for user administration, and improve the user experience of the application.
Gateway Aggregation (di, mm): aggregates requests to multiple individual microservices into a single request, reducing chattiness between consumers and services. This pattern is useful when a client must make multiple calls to different backend systems to perform a single business operation. Suppose a client has to call service A, take its response and send it as a request to service B, then take that response and send it as a request to service C, and finally get the final response from C. We can apply a Gateway Aggregator to do this request–response propagation, so that the client only makes one call to the gateway and gets the final response from it.
Gateway Offloading (di, mm): enables each microservice to offload shared service functionality, such as the use of SSL certificates, authentication, authorization, throttling etc. to an API gateway.
Gateway Routing (di, mm): routes requests to multiple microservices using a single endpoint, so that consumers don't need to manage many separate endpoints. For example, a page in an e-commerce website may contain information from multiple services (product, seller, review, payment, search, cart, order history etc). To render the page we would have to send individual requests to all these services – but sending so many requests from the client is cumbersome, so the page should instead send a single request to a gateway, which in turn sends separate calls to the individual services, collects and combines their responses, and returns a single response to the requester.
Health Endpoint Monitoring (av, mm, rs): Implement functional checks in an application that external tools can access through exposed endpoints at regular intervals. This can help to verify that applications and services are performing correctly.
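A minimal sketch of such an endpoint, again assuming Flask; the database check is a placeholder for a real functional check:

```python
# health_endpoint.py - expose a functional check that external monitors can poll.
from flask import Flask, jsonify

app = Flask(__name__)


def database_is_reachable():
    # Placeholder for a real functional check (e.g. a cheap "SELECT 1" query).
    return True


@app.route("/health")
def health():
    checks = {"database": database_is_reachable()}
    healthy = all(checks.values())
    # External monitoring tools poll this endpoint at regular intervals
    # and alert on any non-200 response.
    status_code = 200 if healthy else 503
    return jsonify({"status": "UP" if healthy else "DOWN", "checks": checks}), status_code
```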
Observability patterns: Log Aggregation – Consider a use case where an application consists of multiple services. Requests often span multiple service instances, and each service instance generates a log file in a standardized format. We need a centralized logging service that aggregates the logs from each service instance, so that users can search and analyze them and configure alerts that are triggered when certain messages appear in the logs. For example, PCF has a log aggregator that collects logs from each component of the PCF platform (router, controller, Diego, etc.) along with the applications; AWS CloudWatch does the same.
Performance Metrics – When the service portfolio grows in a microservice architecture, it becomes critical to keep a watch on transactions so that patterns can be monitored and alerts sent when an issue happens. A metrics service is required to gather statistics about individual operations; it should aggregate the metrics of an application service and provide reporting and alerting. There are two models for aggregating metrics:
· Push — the service pushes metrics to the metrics service, e.g. New Relic, AppDynamics.
· Pull — the metrics service pulls metrics from the service, e.g. Prometheus.
Queue based Load Leveling (av, msg, ps, rs): Use a queue that acts as a buffer between a task and a service it invokes in order to smooth intermittent heavy loads that can cause the service to fail or the task to time out. This can help to minimize the impact of peaks in demand on availability and responsiveness for both the task and the service.
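A minimal in-process sketch using a queue as the buffer; the burst size and processing rate are illustrative:

```python
# load_leveling.py - tasks post work to a queue at a bursty rate, while the
# service consumes at its own steady pace, so peaks do not overwhelm it.
import queue
import threading
import time

work_queue = queue.Queue()   # the buffer between the task and the service


def bursty_producer():
    for i in range(20):            # a sudden burst of 20 requests
        work_queue.put(f"task-{i}")
    print("burst submitted")


def steady_consumer():
    while True:
        task = work_queue.get()
        time.sleep(0.2)            # the service processes at its own steady rate
        print("processed", task)
        work_queue.task_done()


if __name__ == "__main__":
    threading.Thread(target=steady_consumer, daemon=True).start()
    bursty_producer()
    work_queue.join()              # wait until the backlog drains
```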
Retry (rs): Enable an application to handle transient failures when it tries to connect to a service or network resource, by transparently retrying a failed operation. This can improve the stability of the application. If a service fails to respond to a request, the system can follow one of three strategies: cancel the request and return an error response; keep retrying the request at fixed intervals for a specific number of times; or retry with an increasing (often exponential) delay between attempts, possibly adding random jitter.
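A minimal sketch of the last strategy (exponential backoff with jitter); the attempt count, delays and the choice to treat ConnectionError as the transient failure are illustrative assumptions:

```python
# retry_with_backoff.py - retry a failing operation with exponentially growing
# delays plus random jitter, giving up after a fixed number of attempts.
import random
import time


def call_with_retry(operation, max_attempts=5, base_delay=0.5):
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionError as exc:                      # retry only transient failures
            if attempt == max_attempts:
                raise                                       # give up: propagate the error
            delay = base_delay * (2 ** (attempt - 1))       # exponential backoff
            delay += random.uniform(0, base_delay)          # plus random jitter
            print(f"attempt {attempt} failed ({exc}); retrying in {delay:.1f}s")
            time.sleep(delay)
```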
Sidecar (di, mm): deploys helper components of an application as a separate container or process to provide isolation and encapsulation. Applications and services often require related functionality, such as monitoring, logging, configuration, and networking services. These peripheral tasks can be implemented as separate components or services. Often the Ambassador pattern is implemented as a sidecar.
Strangler (di, mm): supports incremental refactoring/migration of an application, by gradually replacing specific pieces of functionality with new services. Completely replacing a complex system can be a huge undertaking. Often, you will need a gradual migration to a new system, while keeping the old system to handle features that haven't been migrated yet. However, running two separate versions of an application means that clients have to know where particular features are located. Every time a feature or service is migrated, clients need to be updated to point to the new location.
Throttling/Rate Limiting: control the consumption of resources by clients. It can be done by blocking a client temporarily, by stopping a non-critical service temporarily in order to free system resources for other, more important services, or by adding load balancers/message queues to distribute load more evenly across services.
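A minimal token-bucket sketch of per-client throttling; the capacity and refill rate are illustrative assumptions, and the token bucket is only one of several possible throttling algorithms:

```python
# rate_limiter.py - allow each client a budget of requests, refilled over time;
# requests beyond the budget are rejected (or could be delayed/queued).
import time


class TokenBucket:
    def __init__(self, capacity=10, refill_per_second=5):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill tokens proportionally to the time elapsed, up to capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_per_second)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True          # request is within the client's budget
        return False             # throttle this request


buckets = {}   # one bucket per client id


def handle_request(client_id):
    bucket = buckets.setdefault(client_id, TokenBucket())
    return "OK" if bucket.allow() else "429 Too Many Requests"
```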
Deployment Strategies
Recreate: The most classical way. Version A is shut down, then version B is rolled out – this needs downtime and a server restart.
Ramped (also known as rolling-update or incremental): Version B is slowly rolled out, replacing version A – that means for an intermediate period both versions are live, with the count of A instances gradually decreasing and B gradually increasing.
Blue/Green: Version B is released alongside version A, then the traffic is switched to version B in a single shot at the load-balancer level.
Canary: Version B is released to a subset of users, then we proceed to a full rollout. Here too, for a brief period both versions are live together, as in a Ramped deployment – but the difference is that the percentage of requests going to A is gradually reduced and the percentage going to B is gradually increased (the request volume is manipulated, not the server instances as in Ramped).
A/B testing: Version B is released to a subset of users under a specific condition. If the statistics collected from Version B users look OK, then all users are moved to it. A practical scenario would be to deploy Version B only to mobile-phone users while desktop users continue with Version A. Or release Version B only for Europe while the rest of the world continues with Version A – and finally, when results from Version B users look OK, switch everybody to Version B.
Shadow: Version B receives real-world traffic alongside version A without impacting the responses returned to users. This technique is fairly complex to set up and has special requirements, especially with egress traffic. For example, given a shopping-cart platform, if you shadow-test the payment service you can end up having customers pay twice for their order. In this case, you can solve it by creating a mock service that replicates the response from the provider.
Service Mesh
In a MicroService world, many instances of many services communicate with each other – as these instances are variable in number, it becomes very difficult for every instance to keep track of the other instances (any instance may go down at any time, a new instance may come up at any time, a path in the network may become unstable at any time, or an existing instance may become slow in responding due to heavy load).
A service mesh is an architectural form that addresses these challenges. It aims to dynamically connect these microservices in a way that reduces administrative and programming overhead. A service mesh automatically takes care of discovering and connecting services on a moment to moment basis so that both human developers and individual microservices don’t have to. Think of a service mesh as the equivalent of software-defined networking (SDN).
A service-mesh can do the following:
Authenticate and authorize requests coming from the outside world, as well as from other services.
Keep track of the health of the services and the traffic load, and do software-level load balancing.
Encrypt traffic.
Service discovery.
Circuit break if needed.
Sometimes Kubernetes is confused with a service mesh – in fact Kubernetes' "Service" resources are a very basic kind of service mesh, as they provide dynamic service discovery and round-robin load balancing for requests.
Sometimes API Gateways are also confused with service-mesh. But an API gateway stands between a group of microservices and the “outside” world, routing service requests as necessary so that the requester doesn’t need to know that it’s dealing with a microservices-based application. A service mesh, on the other hand, mediates requests “inside” the microservices app, with the various components being fully aware of their environment.
The most popular service-mesh mechanism used these days is the "sidecar proxy" – every microservice container in a mesh of this type has a corresponding proxy container. All of the logic required for service-to-service communication is abstracted out of the microservice and put into the sidecar.
A few popular service-mesh implementations:
Linkerd (pronounced "linker-dee") – Released in 2016, and thus the oldest of these offerings. Created by Buoyant (founded by ex-Twitter engineers).
Envoy – Created by Lyft. It serves in the "data plane"; to fully utilize it you need a mesh component that works in the "control plane".
Istio – Created by Lyft, IBM and Google – works in the "control plane" and works naturally with Envoy as its data plane.
CAP Theorem simplified
If you have ever worked with any NoSQL database, you must have heard about the CAP theorem. Eric Brewer presented this conjecture at the Symposium on Principles of Distributed Computing back in 2000.
Let's start the story. Srinivas started a restaurant. After careful examination, he started taking delivery orders over the phone. He hired a few delivery boys whom he got at very cheap rates.
Day 5: Srinivas chose to take the calls himself while sitting at the billing counter. The morning was a lull period, but from 7 pm onwards he started getting many calls. Whatever order he gets, he writes on a piece of paper, gives it to the kitchen and… boom… it is cooked (well, not every time) and delivered to the customer. Around 8:30 pm he saw a customer walking up to him, gasping for breath and apparently angry (maybe hungry inside): "I have been calling for the last 30 minutes. Your phone is always engaged. I had to walk for 20 minutes to come here to place the order. I am not happy."
Idea time: Srinivas was not happy and was shaken to the core. After some disturbed sleep and thinking time, he got a brilliant idea: "Let me hire one more operator who can take the calls. If one line is engaged, the other person will pick up." It took a week to onboard the new person, during which he dealt with fuming customers. This is improving "Availability".
Day 15: Now the new employee Raj is on board and Srinivas is delighted. Customers' waiting time on the call has drastically reduced: if one line is engaged, calls are automatically transferred to the second line. Between Srinivas and Raj things are working well – they are able to take orders and process them.
Day 27: At 8:00 pm Srinivas got a call from a customer: "I placed an order 45 minutes back. What is the status?" Srinivas took his phone number and name and looked at his own order list – he doesn't have it. He looked at Raj next to him, but Raj is busy taking other orders and can't be disturbed. Srinivas apologised and asked the customer to wait for 2 minutes. The customer was already unhappy, and making him wait made him furious. He said "Cancel my order" and disconnected the phone. God-fearing Srinivas is again distressed.
Idea time: Srinivas thought a bit more about it. He also realized this kind of situation could happen to Raj as well. After some thinking time… Eureka – he found a solution. The next day he agreed with Raj to exchange order details as soon as they take orders. For example, order number 223 was taken by Srinivas: he keeps the original order and passes a copy of the order details to Raj. Similarly, order number 224 was taken by Raj, and he passes a copy to Srinivas. Now they both have all the order details, and if a customer asks for the status later, either of them can answer without keeping the customer waiting. This is "Consistency".
Day 283: Everything has gone well so far and the business has increased multifold. Now he has 3 people taking orders and one kitchen. Srinivas and Raj are not doing this work anymore; the new team is Suma, Ramesh and Supriya – young, vibrant and nonchalant. As per the previous process, each of them updates the other two on the orders they take.
Day 289: All well and good until one fine day, like in a Bollywood movie, Supriya fell in love with Ramesh and Ramesh fell in love with Suma. Things started becoming complicated and Supriya started feeling like a loser; things became worse with time. Both Ramesh and Suma stopped communicating the order details to Supriya, and Supriya did the same. This led to broken communication, and pretty much everything went back to day 1. There is no "Partition tolerance". The only way the service can be made both available and consistent is by getting rid of either Ramesh and Suma, or Supriya, or by making them work together. Otherwise you can keep the system "Available" but with inconsistent data.
Let's come back to the reality of our IT world.
CAP stands for Consistency, Availability and Partition Tolerance.
Consistency (C): All nodes see the same data at the same time. What you write is what you get to read.
Availability (A): A guarantee that every request receives a response about whether it was successful or failed. Whether you want to read or write, you will get some response back.
Partition tolerance (P): The system continues to operate despite arbitrary message loss or failure of part of the system. Irrespective of a communication breakdown among the nodes, the system still works.
The CAP theorem is often misunderstood: it is not "any 2 out of 3". The key point is that P is not visible to your customer; it is the technology concern that enables C and A. The customer can only experience C and A.
P is driven by wires, electricity, software and hardware, over which none of us has full control, so P may often not be met. As long as P holds, there is no challenge with A and C (apart from latency issues). The problem comes when P is not met, i.e. a partition actually occurs. Then we have two choices to make.
AP: When a partition occurs (P is not met), the system remains available, but with possibly inconsistent data.
CP: When a partition occurs (P is not met), the system is not fully available, but the data is consistent.
Following is the famous CAP triangle and some popular databases: