Qing Da (Q. Da), www.daqings.net

LinkedIn / Google Scholar / DBLP


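Beyond my main job, I take on some internal training for new algorithm engineers: I was a co-author of the 2019 group-wide new-hire course "Search, Recommendation and Advertising: An Introduction to the Algorithm System" and its lead speaker for the Hangzhou session, and in the following year (2020) I served as head teacher of the new-hire algorithm class of the Search and Recommendation Business Unit. Externally, I have served as a reviewer for journals and conferences in the field, including TNNLS, AAAI, IJCAI and ICML, as an invited speaker at the first Reinforcement Learning Forum of the China Computer Federation (CCF) in 2016, and as the lecturer for the reinforcement learning chapter of the Tianchi deep learning course.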


Bachelor's: Nanjing University, Department of Computer Science and Technology, 2006-09 to 2010-06


Master's: Nanjing University, Department of Computer Science and Technology, Institute of Machine Learning and Data Mining (LAMDA), 2010-09 to 2013-06


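Ph.D.: Nanjing University, Department of Computer Science and Technology, Institute of Machine Learning and Data Mining (LAMDA), 2013-09 to 2015-01, left without completing the degree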



2015.01 – 2016.06: Senior Algorithm Engineer, Search Business Unit, Alibaba


2016.07 – 2017.12: Algorithm Expert, Search Business Unit, Alibaba


2018.01 – 2020.07: Senior Algorithm Expert, AI International Business Unit, Alibaba


2020.08 – present: Principal Algorithm Expert, AE Technology Department, Alibaba



1. VirtualTaobao [AAAI’19], GitHub.

This project provides VirtualTaobao simulators trained on real data from Taobao, one of the largest online retail platforms. On Taobao, when a customer enters a query, the recommendation system returns a list of items based on the query and the customer profile. The system is expected to return a list that customers are likely to click on.

With the VirtualTaobao simulator, one can access a “live” environment that behaves like the real Taobao environment. Virtual customers are generated one at a time: each virtual customer starts a query, and the recommendation system must return a list of items. The virtual customer then decides whether to click the items in the list, much as a real customer would.
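The interaction described above can be sketched as a simple loop. This is a minimal toy stand-in, not the actual VirtualTaobao package or its API: `ToyRetailSimulator`, `VirtualCustomer`, `recommend_first_k` and the click rule (a customer clicks only if one preferred item is shown) are all hypothetical names and assumptions made for illustration.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VirtualCustomer:
    """Hypothetical virtual customer: a query plus one preferred item."""
    query: str
    preferred_item: int


class ToyRetailSimulator:
    """Toy environment (NOT the real VirtualTaobao API): one customer at a time."""

    def __init__(self, n_items: int = 20, seed: int = 0) -> None:
        self.n_items = n_items
        self.rng = random.Random(seed)
        self.customer: Optional[VirtualCustomer] = None

    def reset(self) -> VirtualCustomer:
        # Generate the next virtual customer, who starts a query.
        self.customer = VirtualCustomer(
            query="category-%d" % self.rng.randrange(5),
            preferred_item=self.rng.randrange(self.n_items),
        )
        return self.customer

    def step(self, item_list: List[int]) -> int:
        # Assumed click rule: 1 click if the preferred item is displayed, else 0.
        if self.customer is None:
            raise RuntimeError("call reset() before step()")
        return int(self.customer.preferred_item in item_list)


def recommend_first_k(customer: VirtualCustomer, n_items: int, k: int) -> List[int]:
    """Placeholder policy: ignore the customer and show the first k items."""
    return list(range(min(k, n_items)))


def run_sessions(env: ToyRetailSimulator, n_sessions: int, k: int) -> int:
    """Run the customer -> query -> item list -> click loop and count clicks."""
    total_clicks = 0
    for _ in range(n_sessions):
        customer = env.reset()
        items = recommend_first_k(customer, env.n_items, k)
        total_clicks += env.step(items)
    return total_clicks


if __name__ == "__main__":
    env = ToyRetailSimulator(n_items=20, seed=0)
    print("clicks:", run_sessions(env, n_sessions=100, k=5))
```

In a reinforcement learning setting, the placeholder policy would be replaced by a learned one, and the click count would serve as the reward signal.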

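How VirtualTaobao was trained is described in the AAAI’19 paper "Virtual-Taobao: Virtualizing real-world online retail environment for reinforcement learning" (item 13 in the publication list below).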


2. SESim, GitHub.

SESim is an E-commerce search engine simulation platform for examining models. It supplies a missing piece that connects the evaluation of learning-to-rank (LTR) research with the business objectives of real-world applications. SESim can examine models in a simulated E-commerce environment with dynamic responses, and its framework can be easily extended to other scenarios in which items and users have different features. We hope to see the development of a dynamic dataset that facilitates industrial LTR research in the future.

A typical industrial search engine produces a display list from a user query in three stages. First, the search engine retrieves items related to the user's intent (i.e. the user query). Then the ranker orders these items with a fine-tuned deep LTR model. Finally, the re-ranker rearranges the items to achieve business goals such as diversity and advertising. Our proposed simulation platform SESim contains all three stages. In our work we replace queries with category indices, so SESim retrieves items from a desensitized item database by category index. After that, a customizable ranker and a customizable re-ranker produce the final item list. SESim also allows us to study joint learning of multiple models; we leave this as future work and focus on the correct evaluation of a single model.
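The three stages above can be sketched as a small pipeline. This is an illustrative sketch under stated assumptions, not SESim's code: the `Item` fields, the stored `score` (a stand-in for a deep LTR model's output) and the per-brand diversity rule in `rerank_diverse` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Item:
    """Hypothetical desensitized item record."""
    item_id: int
    category: int
    brand: str
    score: float  # stand-in for the output of a learned LTR model


def retrieve(items: List[Item], category_index: int) -> List[Item]:
    """Stage 1: retrieve candidates by category index (used in place of a query)."""
    return [it for it in items if it.category == category_index]


def rank(candidates: List[Item], scorer: Callable[[Item], float]) -> List[Item]:
    """Stage 2: order candidates by model score, highest first."""
    return sorted(candidates, key=scorer, reverse=True)


def rerank_diverse(ranked: List[Item], max_per_brand: int = 1) -> List[Item]:
    """Stage 3 (assumed business rule): defer items from over-represented brands."""
    kept: List[Item] = []
    deferred: List[Item] = []
    seen: Dict[str, int] = {}
    for it in ranked:
        count = seen.get(it.brand, 0)
        if count < max_per_brand:
            kept.append(it)
            seen[it.brand] = count + 1
        else:
            deferred.append(it)
    return kept + deferred


def search(
    items: List[Item],
    category_index: int,
    scorer: Callable[[Item], float],
    reranker: Callable[[List[Item]], List[Item]],
) -> List[Item]:
    """Full pipeline with a customizable ranker and a customizable re-ranker."""
    return reranker(rank(retrieve(items, category_index), scorer))


ITEMS = [
    Item(0, 1, "a", 0.90),
    Item(1, 1, "a", 0.80),
    Item(2, 1, "b", 0.70),
    Item(3, 2, "c", 0.95),
    Item(4, 1, "c", 0.10),
]

if __name__ == "__main__":
    result = search(ITEMS, 1, lambda it: it.score, rerank_diverse)
    print([it.item_id for it in result])
```

Passing the scorer and re-ranker as arguments mirrors the idea that both stages are customizable, so a single model can be swapped in and evaluated while the rest of the pipeline stays fixed.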

Besides the set of real items, two important modules let SESim faithfully reflect the behavior of real users. The virtual user module generates embeddings of virtual users and their queries, following the paradigm of the Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP). The feedback module takes the display list and the user's information as input and outputs the user's feedback on the display list. To model the users' decision process, we train the feedback module with Generative Adversarial Imitation Learning (GAIL). To diversify behaviors, we consider clicking and purchasing, two of the most important kinds of user feedback in E-commerce.
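For reference, the standard WGAN-GP critic loss is reproduced below. Reading $x$ as a real user-and-query embedding and $\tilde{x}$ as a generated one is an assumption about how the virtual user module instantiates it; $D$ is the critic, $P_r$ and $P_g$ are the real and generated distributions, and $\lambda$ is the penalty weight.

```latex
L_D = \mathbb{E}_{\tilde{x} \sim P_g}\big[D(\tilde{x})\big]
    - \mathbb{E}_{x \sim P_r}\big[D(x)\big]
    + \lambda \, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}
      \Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big],
\qquad
\hat{x} = \epsilon x + (1 - \epsilon)\tilde{x},\quad \epsilon \sim U[0, 1]
```

The last term penalizes the critic when its gradient norm at interpolated points departs from 1, which is how WGAN-GP enforces the Lipschitz constraint without weight clipping.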

How SESim was trained is described in "Imitate The World: A Search Engine Simulation Platform" (item 4 in the publication list below).


  1. Wen-Ji Zhou, Yunan Ye, Qing Da, Yinfu Feng, Anxiang Zeng, Han Yu and Chunyan Miao. Policy Gradient Matching for Recommendation Systems. Submitted to CIKM’21.
  2. Chenlin Shen, Guangda Huzhang, Yu-Hang Zhou, Shen Liang, Qing Da. A General Traffic Shaping Protocol in E-Commerce. Submitted to CIKM’21.
  3. Xuesi Wang, Guangda Huzhang, Qianying Lin, Qing Da. Learning-To-Ensemble by Contextual Rank Aggregation in E-Commerce. Submitted to CIKM’21.
  4. Yongqing Gao, Guangda Huzhang, Weijie Shen, Yawen Liu, Wen-Ji Zhou, Qing Da, Yang Yu. Imitate The World: A Search Engine Simulation Platform. Submitted to CIKM’21, CoRR abs/2107.07693.
  5. Wenya Zhu, Yinghua Zhang, Yu Zhang, Yu-Hang Zhou, Yinfu Feng, Qing Da, Yuxiang Wu and Xiaoyu Lv. DHA: Product Title Generation with Discriminative Hierarchical Attention for E-commerce. Submitted to CIKM’21.
  6. Wenya Zhu, Xiaoyu Lv, Baosong Yang, Yinghua Zhang, Xu Yong, Dayiheng Liu, Linlong Xu, Yinfu Feng, Haibo Zhang, Qing Da and Weihua Luo. CLPR-9M: An E-Commerce Dataset for Cross-Lingual Product Retrieval. Submitted to EMNLP’21.
  7. Junmei Hao, Jingcheng Shi, Qing Da, Anxiang Zeng, Yujie Dun, Xueming Qian, Qianying Lin. Diversity Regularized Interests Modeling for Recommender Systems. Submitted to TMM. CoRR abs/2103.12404.
  8. Yanshi Wang, Jie Zhang, Qing Da, Anxiang Zeng. Delayed Feedback Modeling for the Entire Space Conversion Rate Prediction, 2020, CoRR abs/2011.11826.
  9. Guangda Huzhang, Zhen-Jia Pang, Yongqing Gao, Yawen Liu, Weijie Shen, Wen-Ji Zhou, Qianying Lin, Qing Da, An-Xiang Zeng, Han Yu, Yang Yu, Zhi-Hua Zhou. AliExpress Learning-To-Rank: Maximizing Online Model Performance without Going Online. IEEE Transactions on Knowledge and Data Engineering (TKDE). CoRR abs/2003.11941.
  10. Yu-Hang Zhou, Peng Hu, Chen Liang, Huan Xu, Guangda Huzhang, Yinfu Feng, Qing Da, Xinshang Wang, An-Xiang Zeng. A Primal-Dual Online Algorithm for Online Matching Problem in Dynamic Environments. In: Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI-21).
  11. Anxiang Zeng, Han Yu, Qing Da, Yusen Zhan, Chunyan Miao. Accelerating E-Commerce Search Engine Ranking by Contextual Factor Selection. In: Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI-20 / IAAI-20), New York, USA, 2020. PDF
  12. Pengcheng Li, Runze Li, Qing Da, An-Xiang Zeng, Lijun Zhang. Improving Multi-Scenario Learning to Rank in E-commerce by Exploiting Task Relationships in the Label Space. In: Proceedings of the 29th International Conference on Information and Knowledge Management (CIKM’20), Virtual Event, Ireland, 2020. PDF
  13. Jing-Cheng Shi, Yang Yu, Qing Da, Shi-Yong Chen, An-Xiang Zeng. Virtual-Taobao: Virtualizing real-world online retail environment for reinforcement learning. In: Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI’19), Honolulu, HI, 2019. PDF
  14. Feiyang Pan, Qingpeng Cai, An-Xiang Zeng, Chun-Xiang Pan, Qing Da, Hualin He, Qing He, Pingzhong Tang. Policy Optimization with Model-based Explorations. In: Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI’19), Honolulu, HI, 2019. PDF
  15. Hua-Lin He, Chun-Xiang Pan, Qing Da, An-Xiang Zeng. Speeding up the Metabolism in E-commerce by Reinforcement Mechanism Design. In: Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD’18), Dublin, Ireland, 2018. PDF
  16. Shi-Yong Chen, Yang Yu, Qing Da, Jun Tan, Hai-Kuan Huang and Hai-Hong Tang. Stabilizing reinforcement learning in dynamic environment with application to online recommendation. In: Proceedings of the 24th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD’18) (Research Track), London, UK, 2018. PDF
  17. Yujing Hu, Qing Da, Anxiang Zeng, Yang Yu, Yinghui Xu. Reinforcement Learning to Rank in E-Commerce Search Engine: Formalization, Analysis, and Application. In: Proceedings of the 24th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD’18) (Applied Track), London, UK, 2018. PDF
  18. Yang Yu, Shi-Yong Chen, Qing Da, Zhi-Hua Zhou. Reusable reinforcement learning via shallow trails. IEEE Transactions on Neural Networks and Learning Systems, 2018, 29(6): 2204-2215. PDF
  19. Yang Yu, Peng-Fei Hou, Qing Da, and Yu Qian. Boosting nonparametric policies. In: Proceedings of the 2016 International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS’16), Singapore, 2016, pp.477-484. PDF
  20. Yang Yu and Qing Da. PolicyBoost: Functional policy gradient with ranking-based reward objective. In: Proceedings of AAAI Workshop on AI and Robotics (AIRob’14), Quebec City, Canada, 2014. PDF
  21. Qing Da, Yang Yu, and Zhi-Hua Zhou. Learning with Augmented Class by Exploiting Unlabeled Data. In: Proceedings of the 28th AAAI Conference on Artificial Intelligence (AAAI’14), Quebec City, Canada, 2014. PDF
  22. Qing Da, Yang Yu, and Zhi-Hua Zhou. Napping for Functional Representation of Policy. In: Proceedings of the 2014 International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS’14), Paris, France, 2014. PDF
  23. Qing Da, Yang Yu, and Zhi-Hua Zhou. Self-Practice Imitation Learning from Weak Policy. In: Proceedings of the 2nd IAPR International Workshop on Partially Supervised Learning (PSL’13), Nanjing, China, 2013, pp.9-20. PDF