Troubleshooting Redis Connection Reset Error: Causes and Solutions in Spring Boot with Lettuce Client

Last month, several interfaces in production began returning abnormal responses. The production logs showed the following error: Redis connection reset.

Redis connection reset

The production Redis client is Lettuce, the default client in Spring Boot, configured without a connection pool. The error connection reset by peer means that the other end of the TCP connection has torn the connection down: the server side has already terminated the current Redis connection, but the client is not yet aware of it. When the next request comes in and Lettuce sends a command over that dead connection, the call fails with connection reset by peer.

Normally, a server that closes a connection sends a FIN packet to notify the client. When I captured the server’s TCP traffic with tcpdump, however, I saw that after a period of inactivity (about 10 minutes) the Redis server’s connection received an RST packet that appeared to come from the client. I was capturing packets on the client with Wireshark at the same time, and the client never sent an RST to the server. This odd behavior suggests that something on the server side restricts TCP connections and forcibly drops those that stay idle for too long. With that, I had reproduced the occasional connection reset by peer error seen on the production Redis connection.
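To make the FIN versus RST distinction concrete, here is a minimal sketch that uses only JDK sockets, with no Redis involved. The class name ResetDemo and the method observeClose are made up for illustration. It opens a throwaway loopback connection, closes the accepted side either normally or abortively (SO_LINGER set to 0, which makes close() send an RST), and reports what the client's read observes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class; plain JDK sockets stand in for the Redis connection.
final class ResetDemo {

    private ResetDemo() {}

    /** Reports what a client read() observes after the peer closes the connection. */
    static String observeClose(boolean abortive) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            if (abortive) {
                // Linger time 0 turns close() into an abortive close (RST instead of FIN).
                accepted.setSoLinger(true, 0);
            }
            accepted.close();
            try {
                int b = client.getInputStream().read();
                return b == -1 ? "clean close: read returned -1" : "unexpected data";
            } catch (IOException e) {
                // An RST from the peer surfaces here as an exception on the client.
                return "reset: " + e.getMessage();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(observeClose(false));
        System.out.println(observeClose(true));
    }
}
```

With a normal close the client should simply see the end of the stream. With the abortive close the read should fail with an exception whose message is typically some variant of Connection reset, which is the same class of error that Lettuce reported.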

Now that it’s clear the error is caused by Redis connections being interrupted after prolonged inactivity, how do we resolve this bug?

My first thought was to solve it with retries, but it turned out not to be that simple. Here’s the code:

    // Query Redis
    public <T> T getCacheObject(final String key) {
        try {
            ValueOperations<String, T> operation = redisTemplate.opsForValue();
            return operation.get(key);
        } catch (Exception e) {
            log.error(e.getMessage(), e);
            return retryGetCacheObject(key, 3);
        }
    }

    // Retry querying Redis
    public <T> T retryGetCacheObject(final String key, int retryCount) {
        try {
            log.info("retryGetCacheObject, key:{}, retryCount:{}", key, retryCount);
            if (retryCount <= 0) {
                return null;
            }
            Thread.sleep(200L);
            retryCount--;
            ValueOperations<String, T> operation = redisTemplate.opsForValue();
            return operation.get(key);
        } catch (Exception e) {
            log.error(e.getMessage(), e);
            return retryGetCacheObject(key, retryCount);
        }
    }

The code above retries up to 3 times, 200 milliseconds apart, after the first Redis query throws an exception. In practice, all three retries failed with the same connection reset by peer error, presumably because each retry reused the same broken connection instead of acquiring a new one.

So the real problem is how to create a new connection to replace the one that has failed.

Let’s cut to the chase with the code:

    // Lettuce connection factory
    @Autowired
    private LettuceConnectionFactory lettuceConnectionFactory;

    /**
     * Retrieve the basic cache object.
     *
     * @param key Cache key
     * @return The data corresponding to the cache key
     */
    public <T> T getCacheObject(final String key) {
        try {
            ValueOperations<String, T> operation = redisTemplate.opsForValue();
            return operation.get(key);
        } catch (Exception e) {
            log.error(e.getMessage(), e);
            return retryGetCacheObject(key, 1);
        }
    }

    public <T> T retryGetCacheObject(final String key, int retryCount) {
        try {
            log.info("retryGetCacheObject, key:{}, retryCount:{}", key, retryCount);
            if (retryCount <= 0) {
                return null;
            }
            // Discard the broken connection so the next call opens a new one
            lettuceConnectionFactory.resetConnection();
            Thread.sleep(200L);
            retryCount--;
            ValueOperations<String, T> operation = redisTemplate.opsForValue();
            return operation.get(key);
        } catch (Exception e) {
            log.error(e.getMessage(), e);
            return retryGetCacheObject(key, retryCount);
        }
    }

When fetching data over the current Redis connection fails after the timeout interval has been exceeded, the exception is caught and the retry method calls lettuceConnectionFactory.resetConnection() to reset the connection. A new connection is then created, the data is fetched over it, and the client gets a normal response. LettuceConnectionFactory is the Spring Data Redis connection factory for Lettuce, used here without a pool. It provides methods to obtain, initialize, and reset connections: lettuceConnectionFactory.getConnection(), lettuceConnectionFactory.initConnection(), and lettuceConnectionFactory.resetConnection(). Setting timeout in the Spring Boot configuration limits each Redis data retrieval to 2 seconds, which keeps the interface's request time at around 2 seconds.

  redis:
    xx: xx
    timeout: 2000
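The retry-and-reset pattern described above can also be written as a loop, with the reset step passed in as a callback so that it can be exercised without a Redis server. What follows is a sketch under assumptions, not the project's code: ResettingRetry and its call method are hypothetical names, and the fake connection in main only simulates a connection the server has dropped. Unlike the version above, it rethrows the last failure instead of returning null.

```java
import java.util.function.Supplier;

// Hypothetical helper; not part of Spring Data Redis or Lettuce.
final class ResettingRetry {

    private ResettingRetry() {}

    /**
     * Runs the operation once, then up to maxRetries more times.
     * Before each retry it calls resetConnection and sleeps backoffMillis.
     */
    static <T> T call(Supplier<T> operation, Runnable resetConnection,
                      int maxRetries, long backoffMillis) {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        RuntimeException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            if (attempt > 0) {
                resetConnection.run(); // drop the broken connection first
                sleep(backoffMillis);
            }
            try {
                return operation.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and move on to the next attempt
            }
        }
        throw last; // every attempt failed
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("Interrupted while waiting to retry", e);
        }
    }

    public static void main(String[] args) {
        boolean[] stale = {true}; // simulates a connection the server already dropped
        String value = call(
                () -> {
                    if (stale[0]) {
                        throw new IllegalStateException("Connection reset by peer");
                    }
                    return "cached-value";
                },
                () -> stale[0] = false, // stands in for resetting the connection
                1, 10L);
        System.out.println(value); // prints "cached-value"
    }
}
```

In the setup described here, the operation would presumably be the redisTemplate.opsForValue().get(key) call and the reset callback would be lettuceConnectionFactory.resetConnection(). Rethrowing after the last attempt lets the caller tell a cache miss apart from a connection that could not be restored.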

With this, the occasional disconnections of the non-pooled Lettuce client in our Spring Boot production project are considered solved.

Finally, here is the project where this is used in practice: newbeemall, a mybatis plus version of the newbee-mall platform that implements coupon collection, Alipay sandbox payment, added back-end search, and RedisSearch word segmentation retrieval.