If you have ever faced Redis timeout issues, you probably know that there is no single cause. You have to investigate and understand your particular situation. I have seen Redis timeouts in all my projects, and every time the cause and the fix were different. In this article, I describe the most probable causes and solutions. I will update this article if I find something new.
1. Burst of traffic
In my opinion, this is the most common reason, and it is related to how ThreadPool is implemented in .NET.
ThreadPool provides two types of threads: Worker and IOCP. Each has a defined minimum. By default, the minimum is set to the number of processors on the system. For example, on Azure P2v2, which has two cores, the minimum is 2.
ThreadPool provides new threads on demand until it reaches the minimum. After that, it throttles the rate at which it injects new threads to one thread per 500 milliseconds. StackExchange.Redis has a dedicated internal pool of threads, and if that pool fails to keep up, additional work is offered to the global ThreadPool.
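To see which minimums your application actually starts with, you can log the current ThreadPool settings. This is only a minimal sketch: the class name ThreadPoolInfo and the output format are my own assumptions, while ThreadPool.GetMinThreads and ThreadPool.GetAvailableThreads are standard .NET APIs.

```csharp
using System;
using System.Threading;

// Hypothetical helper, not part of StackExchange.Redis or Sitecore.
public static class ThreadPoolInfo
{
    // Formats the current minimum and free thread counts for logging.
    public static string Describe()
    {
        int minWorker, minIocp, freeWorker, freeIocp;
        ThreadPool.GetMinThreads(out minWorker, out minIocp);
        ThreadPool.GetAvailableThreads(out freeWorker, out freeIocp);

        return string.Format(
            "cores={0}, WORKER: (Min={1},Free={2}), IOCP: (Min={3},Free={4})",
            Environment.ProcessorCount, minWorker, freeWorker, minIocp, freeIocp);
    }
}
```

Writing the result of ThreadPoolInfo.Describe() to the log at startup makes it easy to check whether the defaults match the number of cores on your plan.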
Redis exceptions contain useful information about the number of minimum and busy threads, for example:
Timeout performing EVAL, inst: 187, mgr: Inactive, err: never, queue: 74, qu: 1, qs: 73, qc: 0, wr: 0, wq: 0, in: 65536, ar: 0, IOCP: (Busy=1,Free=999,Min=2,Max=1000), WORKER: (Busy=80,Free=32687,Min=2,Max=32767), clientName: RD281878737F80
The number of busy Worker threads is 80 and the minimum is 2. So 78 threads * 500 ms = 39 s: the 80th thread waited at least 39 s to be created. This particular exception was thrown because the Redis operation waited too long for a free thread. Additionally, when a new thread is added to the ThreadPool, or an existing one becomes free to process work, there is no guarantee that it will pick up the Redis operation. It can pick up an incoming ASP.NET request instead.
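The arithmetic above can be sketched in code. The helper below pulls the Busy and Min values out of a timeout message and applies the rule of one new thread per 500 ms. The class and method names are hypothetical, the regular expression assumes only the message layout shown in the example above, and the result is a rough lower bound rather than a measurement.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for reading thread stats from a timeout message.
public static class TimeoutMessageStats
{
    // Matches e.g. "WORKER: (Busy=80,Free=32687,Min=2,Max=32767)".
    private static readonly Regex PoolPattern = new Regex(
        @"(?<name>IOCP|WORKER): \(Busy=(?<busy>\d+),Free=\d+,Min=(?<min>\d+),Max=\d+\)");

    // Extracts Busy and Min for the given pool ("IOCP" or "WORKER").
    // Returns false if the message has no section for that pool.
    public static bool TryParse(string message, string poolName, out int busy, out int min)
    {
        busy = 0;
        min = 0;
        foreach (Match m in PoolPattern.Matches(message))
        {
            if (m.Groups["name"].Value == poolName)
            {
                busy = int.Parse(m.Groups["busy"].Value);
                min = int.Parse(m.Groups["min"].Value);
                return true;
            }
        }
        return false;
    }

    // Estimates the time needed to inject the threads above the minimum,
    // assuming one new thread per 500 ms as described in this article.
    public static TimeSpan EstimateInjectionDelay(int busy, int min)
    {
        int extraThreads = Math.Max(0, busy - min);
        return TimeSpan.FromMilliseconds(500 * extraThreads);
    }
}
```

For the exception above, parsing the WORKER section gives Busy=80 and Min=2, and EstimateInjectionDelay(80, 2) returns 39 seconds.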
For all these reasons, when the application gets a burst of traffic that needs more threads than the minimum, Redis can hit timeouts.
We can change the minimum number of Worker and IOCP threads for the system. We can do that in the <processModel> configuration element (https://docs.microsoft.com/en-us/previous-versions/dotnet/netframework-4.0/7w2sway1(v=vs.100)), but on Azure we need to call the
ThreadPool.SetMinThreads API during application startup. In Sitecore, the best place is to put a processor to the
Fortunately, Sitecore provided a patch with a processor that allows configuring these values: https://kb.sitecore.net/articles/464570. It also reduces the maximum size of the ThreadPool to prevent CPU overuse, and it limits the maximum number of concurrent requests to prevent
What values are correct?
The Microsoft documentation recommends setting the minimum to 200 or 300 threads and then testing and tweaking as needed. These values are split across all cores. On the other hand, Sitecore's patch sets the minimum to 15 Worker threads per core and 10 IOCP threads per core. My preference is to use Sitecore's default and then tweak as needed.
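If you want to set the values yourself, the per-core numbers can be turned into a ThreadPool.SetMinThreads call roughly like this. This is a minimal sketch and not the code from Sitecore's patch: the class name ThreadPoolTuning and its members are hypothetical, and the constants are simply the per-core values quoted above. ThreadPool.SetMinThreads takes totals for the whole process, so the per-core values are multiplied by the core count.

```csharp
using System;
using System.Threading;

// Hypothetical startup helper; not Sitecore's implementation.
public static class ThreadPoolTuning
{
    // Per-core minimums quoted in this article for Sitecore's patch.
    public const int WorkerThreadsPerCore = 15;
    public const int IocpThreadsPerCore = 10;

    // Converts a per-core value into a process-wide total.
    public static int TotalFor(int perCore, int cores)
    {
        return perCore * cores;
    }

    // Call once during application startup.
    // Returns false if the runtime rejected the requested values.
    public static bool Apply()
    {
        int cores = Environment.ProcessorCount;
        int minWorker = TotalFor(WorkerThreadsPerCore, cores);
        int minIocp = TotalFor(IocpThreadsPerCore, cores);
        return ThreadPool.SetMinThreads(minWorker, minIocp);
    }
}
```

On a two-core plan this requests a minimum of 30 Worker threads and 20 IOCP threads. It is worth logging the return value of Apply(), because SetMinThreads leaves the settings unchanged when it returns false.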
2. High CPU
If your CD server has periods when CPU utilization is around 100%, there may not be enough CPU power to process Redis threads during those periods. In one of my projects we use the jpegoptim processor, and at that time we were on the S2 plan. During image optimization with jpegoptim, the CPU went to 100% for as long as 60 s for some very large, high-quality images. After looking at logs and reading metrics, we found a correlation between high CPU, image optimization and Redis timeouts.
In our case, we scaled up the plan from S2 to P1V2 (which costs the same amount of money) and our problems disappeared. The P1V2 has only one core, but the CPU is much better, so it handles image optimization much more quickly.
3. Bandwidth limit on Redis
If you transfer a large amount of data to and from Redis, you can hit a bandwidth limit. Microsoft provides a nice table with the observed bandwidth limits for the various pricing tiers: https://docs.microsoft.com/en-us/azure/azure-cache-for-redis/cache-faq#cache-performance. To see how much data is transferred to and from Redis, browse to your cache instance in the Azure portal, go to the Metrics blade and add two metrics: Cache Read and Cache Write with
The obvious solution is to scale your Redis up to get more bandwidth. Another solution is to review your application and check whether you can store less data in