Interpretable Attribute-based Action-aware Bandits for Within-Session Personalization in E-commerce

Authors: Xu Liu, Congzhe Su, Amey Barapatre, Xiaoting Zhao, Diane Hu, Chu-Cheng Hsieh and Jingrui He.

Presented at: Knowledge Management in eCommerce Workshop, in conjunction with The Web Conference (WWW) 2021 (Ljubljana, Slovenia)

Abstract: When shopping online, buyers often express and refine their purchase preferences by exploring different items in the product catalog based on varying attributes, such as color, size, shape, and material. As such, it is increasingly important for e-commerce ranking systems to quickly learn a buyer’s fine-grained preferences and re-rank items based on their most recent activity within the session. In this paper, we propose an Online Personalized Attribute-based Re-ranker (OPAR), a lightweight, within-session personalization approach using multi-armed bandits (MAB). As the buyer continues on their shopping mission and interacts with different products in an online shop, OPAR learns which attributes the buyer likes and dislikes, forming an interpretable user preference profile and improving re-ranking performance over time, within the same session. By representing each arm in the MAB as an attribute, we reduce the complexity space (compared with modeling preferences at the item level) while offering more fine-grained personalization (compared with modeling preferences at the product category level). We naturally extend this formulation to weight attributes differently in the reward function, depending on how the buyer interacts with the item (e.g. click, add-to-cart, purchase). We train and evaluate OPAR on a real-world e-commerce search ranking system, benchmark it against four state-of-the-art baselines on eight datasets, and show an improvement in ranking performance across all tasks.
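
The idea in the abstract (one bandit arm per attribute, with rewards weighted by the type of buyer action) can be illustrated with a minimal sketch. This is not the authors' implementation: the Beta-style pseudo-counts, the action weights, the rule for penalizing attributes of items that were shown but not engaged with, and all names (`AttributeBandit`, `ACTION_WEIGHTS`, the toy catalog) are assumptions made for illustration. For determinism the sketch scores attributes by their posterior mean rather than by sampling.

```python
from collections import defaultdict

# Hypothetical action weights: stronger purchase signals count more.
# The values are illustrative assumptions, not taken from the paper.
ACTION_WEIGHTS = {"click": 1.0, "add_to_cart": 2.0, "purchase": 4.0}


class AttributeBandit:
    """One arm per attribute, each tracked with Beta-style pseudo-counts."""

    def __init__(self, prior_alpha=1.0, prior_beta=1.0):
        # alpha accumulates positive evidence, beta negative evidence.
        self.alpha = defaultdict(lambda: prior_alpha)
        self.beta = defaultdict(lambda: prior_beta)

    def update(self, shown_items, engaged_item, action):
        """Update the profile after the buyer acts on one of the shown items.

        shown_items maps item id -> list of attribute strings.
        """
        weight = ACTION_WEIGHTS[action]
        engaged_attrs = set(shown_items[engaged_item])
        # Attributes of the engaged item gain action-weighted positive evidence.
        for attr in engaged_attrs:
            self.alpha[attr] += weight
        # Assumption: attributes that appear only on ignored items
        # receive one unit of negative evidence each time they are skipped.
        for item, attrs in shown_items.items():
            if item == engaged_item:
                continue
            for attr in set(attrs) - engaged_attrs:
                self.beta[attr] += 1.0

    def preference(self, attr):
        """Posterior-mean preference for an attribute, in (0, 1)."""
        a, b = self.alpha[attr], self.beta[attr]
        return a / (a + b)

    def rerank(self, items):
        """Order item ids by the mean preference of their attributes."""
        def score(item):
            attrs = items[item]
            if not attrs:
                return 0.0
            return sum(self.preference(a) for a in attrs) / len(attrs)
        return sorted(items, key=score, reverse=True)


# Toy session: three items shown, the buyer adds the blue mug to the cart.
catalog = {
    "mug_red": ["color:red", "material:ceramic"],
    "mug_blue": ["color:blue", "material:ceramic"],
    "cup_blue": ["color:blue", "material:glass"],
}
bandit = AttributeBandit()
bandit.update(catalog, engaged_item="mug_blue", action="add_to_cart")
ranking = bandit.rerank(catalog)
profile = {attr: bandit.preference(attr) for attr in sorted(bandit.alpha)}
```

The `profile` dictionary is what makes the approach interpretable in this sketch: each attribute carries a single liked/disliked score that can be read directly. A bandit that explores would draw each score from its Beta distribution instead of using the mean.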
