Introduction
At Meesho, we pride ourselves on running our tech infrastructure in the most cost-efficient manner while achieving high availability and performance. We recently focused on optimizing Redis for one of our use cases where infra costs were ballooning.
We drove a 90% reduction in our Redis infrastructure costs. Continue reading to learn about the strategies we employed to achieve this outcome and how you can apply these cost-saving techniques to your own projects!
Problem Statement
At Meesho, we provide suppliers with the opportunity to showcase advertisements on products, enabling them to increase the visibility of products that might otherwise go unnoticed by users. The primary objective of this feature is to enhance their Return on Investment (ROI). Nonetheless, we acknowledge the significance of preserving a smooth user experience. Striking a balance, we developed a filtering mechanism that screens products for users based on their previous interactions within a set time period. This ensures that users are not inundated with repetitive content, all the while empowering suppliers to effectively reach their target audience.
To successfully implement the Filter Mechanism, we required the following essential components:
- Increment Method: A method capable of incrementing and setting the interactions in a single call to minimize Input/Output Operations Per Second (IOPS) and optimize efficiency.
- TTL (Time-to-Live): The ability to set a time-based expiration for keys, ensuring that interaction data is automatically removed after the set time period, preventing unnecessary data buildup.
- High Throughput and Low Response Time Data Source: Given the significant volume of traffic we encounter, real-time filtering is crucial at serving time. We therefore need to fetch this data with minimal response time from a data source that can sustain high throughput.
To meet the specified requirements, Redis proved to be an ideal choice, offering all the necessary components. Additionally, since the stored data is not mission-critical, it can be stored as a cache without the need for persistent storage.
Solutions
Generation - 1: Implementation using Redis Key-Value
In the initial stages of development, the following algorithm was constructed:
- A combined Redis key consisting of UserId and ProductId, with a Time-To-Live (TTL) set at 24 hours.
- The number of interactions for a product by a user is stored as the associated value.
- Whenever an interaction event for a product by a user is received, the corresponding value is incremented.
- When serving, products whose value exceeds a specified threshold are filtered out.
Redis Schema for this algorithm:
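To make the schema concrete, here is a minimal sketch of this generation in code, assuming Spring Data Redis; the class name and the "interactions:{userId}:{productId}" key format are illustrative choices of ours, not from our production code:

```java
import java.util.concurrent.TimeUnit;
import org.springframework.data.redis.core.StringRedisTemplate;

public class Gen1InteractionStore {

    private final StringRedisTemplate redis;

    public Gen1InteractionStore(StringRedisTemplate redis) {
        this.redis = redis;
    }

    // One Redis key per (user, product) pair, e.g. "interactions:42:1001" -> "3"
    public void recordInteraction(String userId, String productId) {
        String key = "interactions:" + userId + ":" + productId;
        Long count = redis.opsForValue().increment(key); // INCR creates the key at 1
        if (count != null && count == 1L) {
            // First interaction created the key; expire it after 24 hours.
            // Note: INCR + EXPIRE are two round trips here, a cost the final
            // approach removes with a Lua script.
            redis.expire(key, 24, TimeUnit.HOURS);
        }
    }

    // At serving time, filter out products whose count exceeds the threshold
    public boolean shouldFilter(String userId, String productId, long threshold) {
        String value = redis.opsForValue().get("interactions:" + userId + ":" + productId);
        return value != null && Long.parseLong(value) > threshold;
    }
}
```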
The subsequent section outlines the memory utilization in Redis using the aforementioned algorithm for a single user:
In the above graph, the X-axis shows the number of products a user has interacted with, and the Y-axis shows the memory consumption in Redis. It is evident that as user-product interactions increase, memory usage rises sharply. This poses a concern, particularly at scale, where the rate of memory consumption becomes substantial.
After conducting an in-depth analysis of the implemented algorithm, we discovered that the number of keys in Redis grows significantly with the number of interactions. This stems from our choice of a combined key comprising both ProductId and UserId: the total number of keys in Redis equals the number of users multiplied by the number of products. In Redis, each key incurs an overhead of approximately 40 bytes for its maintenance within the Redis bucket. As a result, a higher number of keys directly translates to increased memory consumption.
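As a rough, hypothetical illustration of that overhead: 1 million users each interacting with 100 products yields 100 million combined keys, and at roughly 40 bytes of overhead per key, that is already about 4 GB of memory spent purely on key bookkeeping, before a single counter value is stored.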
Our objective, then, was to devise a new data structure that could effectively reduce the number of keys in Redis while still meeting our operational requirements.
Generation - 2: Implementation using Redis Hash
In our quest to devise a data structure that reduces the sheer volume of keys, we planned to break up the concatenation of UserId and ProductId in the key: with only UserId (or only ProductId) as the primary key, the total number of keys in Redis no longer grows as interactions between a fixed set of users and products increase, which in turn reduces memory consumption.
Redis offers a few data structures whose individual items can be accessed directly, such as Redis Set and Redis Hash. For our use case, we picked Redis Hash, as we needed to store the mapping of ProductId to interaction count against the primary key, which is UserId. Details on Redis Hash can be found here.
In essence, Redis Hash is a data type for values stored within Redis. It resembles a dictionary stored as a value, whose individual fields can be accessed directly via Redis commands.
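For example, with Spring Data Redis (the key and field names here are hypothetical), individual fields of a hash can be written and read without touching the rest of the dictionary:

```java
import org.springframework.data.redis.core.StringRedisTemplate;

public class HashAccessExample {

    // Fields of the hash stored at key "user:42" are accessed directly,
    // without fetching the whole dictionary.
    static void demo(StringRedisTemplate redis) {
        redis.opsForHash().increment("user:42", "product:7", 1);       // HINCRBY
        Object count = redis.opsForHash().get("user:42", "product:7"); // HGET
        System.out.println(count); // "1" on the first call
    }
}
```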
We decided to make UserId the primary key and store the mapping of ProductId to interaction count as the value because of the following challenges.
Challenges with using Redis Hash as data type in Redis
Challenge - 1: hash-max-ziplist-entries
Redis Hashes are internally optimized using a ziplist, which is essentially a specially encoded doubly linked list optimized for memory savings. Redis preserves this encoding until the number of fields in the hash exceeds the configuration parameter known as hash-max-ziplist-entries. More details on the internals can be found in a blog post here.
Once this threshold is crossed, the hash's fields transition to being maintained in a Redis bucket, much like the primary Redis keys themselves. Consequently, it becomes paramount to carefully tune the hash-max-ziplist-entries parameter to your specific requirements: setting it too high can increase response times for Redis commands, since fields are stored (and scanned linearly) in list form for hash sizes up to the hash-max-ziplist-entries limit. We collected data that highlights the impact on space consumption in Redis as the hash size varies across different hash-max-ziplist-entries values.
For additional insights on this configuration, please refer to the official Redis documentation and an informative blog post by Peter Bengtsson.
The following graphs depict the increase in memory against the increase in Redis Hash size for different values of hash-max-ziplist-entries.
As the graphs show, once the number of products stored in the Redis Hash grows beyond hash-max-ziplist-entries, memory consumption shoots up.
With UserId as the primary key, the hash size is naturally bounded: a user can only interact with a limited number of products within the specified time period.
We opted to retain the 'hash-max-ziplist-entries' parameter at its default value of 512. This choice aligns with our data analysis, which indicates that 95% of users on our platform interact with no more than 512 ad products in the required time period. Consequently, any potential impact from the remaining 5% of users exceeding this limit is negligible, and it ensures that our Redis operations continue to deliver optimal response times.
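For reference, this parameter can be set in redis.conf or changed at runtime; a minimal snippet with the value we ran with (note that newer Redis versions rename the parameter to hash-max-listpack-entries):

```
# redis.conf: hashes stay ziplist-encoded up to this many fields
hash-max-ziplist-entries 512

# or at runtime, from redis-cli:
# CONFIG SET hash-max-ziplist-entries 512
```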
Challenge - 2: Time-to-Live (TTL)
Fields inside a Redis Hash cannot have individual TTLs. A TTL can only be assigned to the primary Redis key, and once it expires, the entire Redis Hash linked to that key is removed. This is a problem wherever you want to control the expiry of fields inside the hash individually rather than at the primary-key level: no matter when an item was added to the hash, all stored items are evicted together when the primary key expires. Choosing the primary key correctly is therefore important.
With UserId as the primary key, we could expire all of a user's interactions together after the given time period; with ProductId as the primary key, by contrast, we would not have been able to control expiry at the user level.
Implementation Approach
In light of the challenges mentioned above, we have implemented the following algorithm:
- The Redis key now corresponds to the UserId, set with a TTL (Time-To-Live) spanning the specified time period.
- The value associated with this Redis key is a Redis Hash, wherein the ProductId serves as the hash field and its value represents the interactions on the product.
- Whenever a user interacts with a particular product, the respective Redis Hash value is incremented.
- During product retrieval, we filter out products with values surpassing a predefined threshold.
Redis Schema for this algorithm:
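As a sketch of this schema and the serving-side filter (again with Spring Data Redis; the class name and the "user:{userId}" key prefix are illustrative assumptions of ours):

```java
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;
import org.springframework.data.redis.core.StringRedisTemplate;

public class Gen2InteractionReader {

    private final StringRedisTemplate redis;

    public Gen2InteractionReader(StringRedisTemplate redis) {
        this.redis = redis;
    }

    // Key: "user:{userId}" (TTL lives on this key).
    // Value: a hash of productId -> interaction count.
    // A single HGETALL returns every product the user has interacted with.
    public Set<String> productsToFilter(String userId, long threshold) {
        Map<Object, Object> counts = redis.opsForHash().entries("user:" + userId);
        return counts.entrySet().stream()
                .filter(e -> Long.parseLong((String) e.getValue()) > threshold)
                .map(e -> (String) e.getKey())
                .collect(Collectors.toSet());
    }
}
```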
Redis Lua Script to Increment Hash Field and Set Expiry
We opted for a Lua script to increment the hash field and set the expiry because, otherwise, we would need to make two separate calls to Redis for each interaction: one to increment and another to set the expiry of the primary key. However, large Lua scripts present a challenge in Redis: while a script with many commands runs, other commands are blocked until it finishes, which can hurt the application's Redis performance. To mitigate this, we designed our script to include only two commands.
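A sketch of such a script, reconstructed from the description below; the KEYS/ARGV layout is our own choice:

```lua
-- KEYS[1] : the user's hash key
-- ARGV[1] : the product id (hash field)
-- ARGV[2] : the increment amount
-- ARGV[3] : TTL in seconds for the whole hash key
local count = redis.call('HINCRBY', KEYS[1], ARGV[1], ARGV[2])
if count == 1 then
    -- The field was just created; set the key-level expiry
    redis.call('EXPIRE', KEYS[1], ARGV[3])
end
return count
```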
In the above Lua script, we use the Redis hincrby command to increment a specified field within a hash data structure. If the incremented value becomes 1 (indicating that the field was previously non-existent), we set an expiry using the expire command on the hash key to control its lifetime in Redis.
Configuring RedisScript in Spring Boot
In your Spring Boot application, create a configuration class to load and configure the Lua script as a RedisScript bean.
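A minimal configuration sketch, assuming the script above is saved at src/main/resources/scripts/increment_and_expire.lua (a path and bean name we chose for illustration):

```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ClassPathResource;
import org.springframework.data.redis.core.script.DefaultRedisScript;
import org.springframework.scripting.support.ResourceScriptSource;

@Configuration
public class RedisScriptConfig {

    // Loads the Lua script from the classpath and exposes it as a bean.
    // The script returns the incremented count, hence the Long result type.
    @Bean
    public DefaultRedisScript<Long> incrementAndExpireScript() {
        DefaultRedisScript<Long> script = new DefaultRedisScript<>();
        script.setScriptSource(new ResourceScriptSource(
                new ClassPathResource("scripts/increment_and_expire.lua")));
        script.setResultType(Long.class);
        return script;
    }
}
```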
Lua Script Execution
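Execution then goes through RedisTemplate#execute, which sends the script to Redis; a sketch under the same illustrative names as above:

```java
import java.util.Collections;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.data.redis.core.script.RedisScript;
import org.springframework.stereotype.Service;

@Service
public class InteractionWriter {

    private final StringRedisTemplate redis;
    private final RedisScript<Long> incrementAndExpireScript;

    public InteractionWriter(StringRedisTemplate redis,
                             RedisScript<Long> incrementAndExpireScript) {
        this.redis = redis;
        this.incrementAndExpireScript = incrementAndExpireScript;
    }

    // One round trip: HINCRBY + conditional EXPIRE, executed atomically in Redis
    public Long recordInteraction(String userId, String productId, long ttlSeconds) {
        return redis.execute(incrementAndExpireScript,
                Collections.singletonList("user:" + userId),  // KEYS[1]
                productId, "1", String.valueOf(ttlSeconds));  // ARGV[1..3]
    }
}
```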
Conclusion
When we implemented the final approach with no alteration in data volume, our total Redis memory consumption decreased dramatically, from 1 terabyte to under 100 gigabytes, yielding a remarkable 90% cost reduction on Redis. This illustrates the substantial space optimization achievable when Redis Hash is used in the appropriate context.
In the above graph, the X-axis shows time and the Y-axis shows memory consumption in Redis. Memory consumption with Redis Hash dropped significantly compared to the key-value approach.
When utilizing Redis Hash and its ziplist encoding, it is essential to consider the challenges we discussed: the trade-off between performance and space optimization, which requires careful tuning, and the constraint of having a common TTL for all entries within a single hash.
Appendix
- https://redis.io/docs/data-types
- https://scalegrid.io/blog/introduction-to-redis-data-structures-hashes
- https://redis.io/docs/management/optimization/memory-optimization/
- https://www.peterbe.com/plog/understanding-redis-hash-max-ziplist-entries