I know quick-commerce is the rage nowadays, but I still prefer buying my groceries at the supermarket. That’s just me 🙏🏻
When I went shopping last weekend, the sabjiwala offered me the usual: cauliflower, spinach, brinjal, etc. But I was hovering near avocados, so I was immediately offered broccoli, kiwis, and other imported fruits and vegetables.
In essence, this is also how product recommendations work on our app (with a hundred more complications 😝).
We recommend products to a user (either a direct customer or a reseller) based on previous interactions on the app. But that’s on the user end. On the other hand, we also onboard thousands of sellers and their catalogue of products every day.
Freshly onboarded catalogues haven’t had any interaction with a user yet, so there is no relevance signal to recommend them on — these catalogues won’t surface for anyone.
This phenomenon is commonly referred to as the ‘cold start recommendation problem’.
This is the story of how our Data Science team solved it.
Catalogue Discovery Pipeline
We have a set system for onboarding new catalogues on our platform — without following this predefined process, a catalogue won’t be shown to users.
Sellers use our user-friendly dashboard to upload catalogues.
Sellers also provide images, description, and other relevant properties like primary category and secondary categories for the catalogue.
Quality Assurance & Quality Control
We conduct various quality checks on uploaded catalogues to ensure they have the right images, descriptions, and details. Catalogues that don’t meet our criteria are rejected at this step.
Catalogue Uniqueness Assessment
Uploaded catalogues vary in their similarity to products already available on the platform — a new catalogue may contain one or more products similar to existing items.
We assess each catalogue for newness and uniqueness and assign it a corresponding label. This step helps identify new trends and brand-new types of products from our suppliers, which users on the platform may not have seen before.
Of course, we’re assisted by machine learning algorithms during this process, but that’s for another blog!
Catalogue Activation & Recommendation
After the cataloguing and quality checks are completed, the catalogue is activated on the platform and can be picked up by downstream systems for various recommendation, search and ads use cases.
These downstream tasks ingest the new activated catalogues in batch mode with different frequencies. In practice, some of the steps described above can happen in parallel, e.g. QA & QC and Catalogue Uniqueness Assessment, but are presented linearly for simplicity of explanation.
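The onboarding flow above could be sketched roughly as follows (a minimal sketch; the class, field names, and check logic are illustrative assumptions, not our production pipeline):

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Catalogue:
    catalogue_id: str
    images: List[str]
    description: str
    primary_category: str
    qa_passed: bool = False
    uniqueness_label: Optional[str] = None
    active: bool = False

def run_qa_qc(cat: Catalogue) -> bool:
    # Hypothetical checks: at least one image and a non-empty description.
    return bool(cat.images) and bool(cat.description.strip())

def assess_uniqueness(cat: Catalogue) -> str:
    # Placeholder: the real assessment compares against existing items with ML.
    return "brand_new"

def onboard(cat: Catalogue) -> Catalogue:
    cat.qa_passed = run_qa_qc(cat)
    if not cat.qa_passed:
        return cat  # rejected: never activated, never recommended
    cat.uniqueness_label = assess_uniqueness(cat)
    cat.active = True  # now visible to downstream batch consumers
    return cat
```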
Current Recommendation Approach
All activated catalogues in a predetermined historical window are passed to the candidate generator. This creates a list of candidate catalogue sets.
The candidate sets are assigned to users through a scalable match-making algorithm. This algorithm heuristically maximises a fairness objective.
Point to note: catalogue sets may have overlapping catalogues, but since the sets are uniquely assigned to users, catalogues are not duplicated for the same user within a batch.
Finally, the assigned catalogues are ranked based on recency, weighted on catalogue uniqueness and are then served to the users along with recommendations on the Home Page feed.
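The ranking step — recency weighted by uniqueness — could be sketched like this (the weight table, field names, and scoring form are assumptions for illustration, not the production formula):

```python
from datetime import date

# Assumed uniqueness weights; the real weights are internal business inputs.
UNIQUENESS_WEIGHT = {"brand_new": 2.0, "new_variant": 1.5, "similar": 1.0}

def rank_assigned_catalogues(catalogues, today):
    """Rank newer catalogues higher, boosted by their uniqueness label."""
    def score(cat):
        age_days = (today - cat["activation_date"]).days
        recency = 1.0 / (1.0 + age_days)  # newer catalogue => larger score
        return recency * UNIQUENESS_WEIGHT.get(cat["uniqueness"], 1.0)
    return sorted(catalogues, key=score, reverse=True)
```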
In a nutshell, the new catalogue recommendation problem is defined as:
The objective we maximise is the sum over p_ij*x_ij — the catalogue assignment objective — subject to target engagement constraints (of which there can be one or many), the number of recommendations per user, and the age of the catalogue on the platform.
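Written out, the assignment problem reads roughly as follows (p_ij and x_ij are from the text above; N, T, and the constraint functions g_k are our shorthand, not the exact production formulation):

```latex
\begin{aligned}
\max_{x}\quad & \sum_{i \in \mathrm{users}} \sum_{j \in \mathrm{catalogues}} p_{ij}\, x_{ij} \\
\text{s.t.}\quad & g_k(x) \ \ge\ b_k, \quad k = 1, \dots, K
    && \text{(target engagement constraints)} \\
& \sum_{j} x_{ij} \ \le\ N \quad \forall\, i
    && \text{(recommendations per user)} \\
& x_{ij} = 0 \ \text{ if } \mathrm{age}(j) > T
    && \text{(catalogue age on platform)} \\
& x_{ij} \in \{0, 1\}
\end{aligned}
```

Here x_ij indicates whether catalogue j is assigned to user i, and p_ij is the corresponding priority score.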
When the same catalogue pool can be served through multiple real estates, the engagement constraints can be at the platform level.
In our case, we have a view-based, business-driven fairness objective for the different uniqueness groups defined during the catalogue assessment process.
We maximise this fairness objective globally using heuristics.
While the objective is defined at the (user, catalogue) level, we found scalable ways of achieving it at the (user, catalogue_set) level.
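One way to picture a set-level heuristic (purely a sketch; the real algorithm, data shapes, and targets are internal) is to greedily hand each user the catalogue set whose uniqueness group is currently furthest below its target share of assignments:

```python
from collections import defaultdict

def assign_sets(users, catalogue_sets, target_share):
    """Greedy sketch: assign each user the catalogue set whose uniqueness
    group is currently furthest below its target share of assignments."""
    counts = defaultdict(int)  # assignments made so far, per uniqueness group
    total = 0
    assignment = {}
    for user in users:
        def deficit(cs):
            share = counts[cs["group"]] / total if total else 0.0
            return target_share[cs["group"]] - share
        best = max(catalogue_sets, key=deficit)
        assignment[user] = best["set_id"]
        counts[best["group"]] += 1
        total += 1
    return assignment
```

Over a batch, the realised share of each uniqueness group converges towards its target, which is the fairness behaviour described above.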
This assignment of catalogue sets to users runs as a scheduled batch job throughout the day on our Spark clusters. It also prioritises the new catalogues, which are then activated for recommendations.
At serving time, the catalogue engagement constraints are checked; in case of violations, the catalogues are filtered out by the system.
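A serving-time check could be as simple as the following (the per-catalogue view cap is a hypothetical stand-in for the real engagement constraints):

```python
def filter_at_serving(catalogues, view_cap):
    """Drop catalogues whose engagement constraint is already violated;
    here the constraint is a hypothetical per-catalogue view cap."""
    return [
        c for c in catalogues
        if c["views_so_far"] < view_cap.get(c["id"], float("inf"))
    ]
```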
For our new catalogue discovery, we measure our recommendation performance as the percentage of catalogues which meet our fairness objectives.
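For instance, the metric could be computed along these lines (the field name and a single global view target are illustrative assumptions; the real objectives vary by uniqueness group):

```python
def pct_meeting_objective(catalogues, view_target):
    """Percentage of catalogues that have reached their view target."""
    if not catalogues:
        return 0.0
    met = sum(1 for c in catalogues if c["views"] >= view_target)
    return 100.0 * met / len(catalogues)
```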
The plot in the following figure shows this trend against the catalogue activation date:
There is a drop towards the end because catalogues activated on a given date may not be able to meet the objective on the same day. However, as time progresses, performance with respect to the objectives improves.
Our current approach gives new catalogues a fair chance to be represented on the platform. But our new catalogue recommendations are not yet personalised for end users.
As a next step, we are exploring approaches where we can give personalised recommendations while meeting the fairness criteria for the newly onboarded catalogues.
In the ideal case, we want to have a personalisation objective which aligns with our fairness objectives rather than work against it as a trade-off.
There’s a balance between discovery and serving users more of what they already like. Managing that balance is what our team does, and we want to turn this art into a science.
Sounds challenging? Want to help build this personalisation for our millions of users? Head to our careers site to check out our openings!