Career Profile

I co-founded and currently serve as CTO of Soda Technology, a data-driven fashion e-commerce platform. At Soda, we are excited about the opportunity to evolve the fashion shopping experience and revolutionize fashion inventory management.

As CTO, I manage Soda’s technology roadmap and computing infrastructure and lead the engineering team. I also take an active role as a product manager and developer on software and data products.

Prior to co-founding Soda, I was a Member of Technical Staff, Data Scientist, at Bell Labs, where I worked in the Data Monetization Group. I received my Ph.D. in statistics from Carnegie Mellon University, advised by Stephen E. Fienberg. My thesis work focused on differential privacy and privacy-preserving data analysis.

I am passionate about developing practical solutions to help weave machine learning and statistics into technology products. I am often fascinated by how we interact with data in our daily lives.

I am authorized to work in the US.


Co-founder, CTO

2016 - Present
Soda Technology, Hangzhou, China

Member of Technical Staff, Data Scientist

2015 - 2016
Nokia Bell Labs, Murray Hill, NJ, USA


Background and Technical Overview
Soda Technology provides a styling service that sends clothes directly to customers based on their preferences. It is similar to styling services like Stitch Fix, but it aims to further optimize operational efficiency and to emphasize branding.

I am proud that Soda is one of the first adopters of lean and agile startup principles in China. With only a small engineering team, we are able to focus on developing new products and features while maintaining highly reliable and scalable production environments by (1) practicing DevOps and DataOps, (2) being cloud-native, and (3) using the serverless framework for most projects.

Recommendation System as a Service

Soda’s business model relies heavily on the recommendation system I built, which selects clothes for each customer with the objective of maximizing customer spending. The recommendation system employs a fine-tuned collaborative filtering model, and it outperforms human selections by 50% in terms of revenue.

Data Lake, ETL, and Business Intelligence

At Soda, we put considerable effort into organizing data and improving data literacy. We developed our own data lake and ETL solutions. We educate decision-makers on how to use Google Data Studio for business intelligence, helping them produce more accessible and sophisticated analyses and reports.

ERP System

Soda’s entire operation runs on an in-house ERP system, which manages orders, shipping, inventory, marketing, and payment. The system also features extensive permission management and exception handling, and it plays a crucial role in ensuring Soda’s management and operational efficiency.

iOS & WeChat Apps

The Soda app is much more complex than traditional e-commerce apps because of the multi-stage nature of Soda’s service and the fast-changing trends in the Chinese market. The iOS app is developed in React Native, and the WeChat Mini Program is developed in Vue.js.


Publications

  • "Scalable privacy-preserving data sharing methodology for genome-wide association studies." In: Journal of Biomedical Informatics. DOI:10.1016/j.jbi.2014.01.008. arXiv:1401.5193.
    Yu, F., Fienberg, S.E., Slavković, A.B., Uhler, C.

  • "Differentially-private logistic regression for detecting multiple-SNP association in GWAS databases." In: Privacy in Statistical Databases. DOI:10.1007/978-3-319-11257-2_14. arXiv:1407.8067.
    Yu, F., Rybar, M., Uhler, C., Fienberg, S.E.

  • "Scalable privacy-preserving data sharing methodology for genome-wide association studies: an application to iDASH healthcare privacy protection challenge." In: BMC Medical Informatics and Decision Making. DOI:10.1186/1472-6947-14-S1-S3.
    Yu, F., Ji, Z.

  • "O Privacy, Where Art Thou?: Genomics and Privacy." In: CHANCE. DOI:10.1080/09332480.2015.1042736.
    Slavković, A.B., Yu, F.

  • "A unified framework for evaluating online user treatment effectiveness, with advertising applications." In: KDD 2014: Proceedings of the 2nd Workshop of User Engagement Optimization.
    Wang, P., Meytlis, M., Yu, F., Yang, J.

  • "Whole exome sequencing reveals minimal differences between cell line and whole blood derived DNA." In: Genomics. DOI:10.1016/j.ygeno.2013.05.005.
    Schafer, C.M., [and 13 others, including Yu, F.]

Invited Talks

  • Practical methods for privacy-preserving genome-wide association study data sharing. Joint Statistical Meetings. Seattle. 2015.
  • Differentially-private logistic regression for detecting multiple-SNP association in GWAS databases. Privacy in Statistical Databases. Eivissa. 2014.
  • Privacy-preserving data sharing methodology for genome-wide association studies. Joint Statistical Meetings. Boston. 2014.
  • Healthcare data privacy protection competition. UC San Diego. San Diego. 2014.