Career Profile

I am a data scientist, ML engineer, and academically trained statistician. I currently work in the exciting field of ML-platform-as-a-service, bringing my experience and insight to building tools for data scientists, MLOps engineers, and ML infrastructure engineers. In my latest effort, I am building a vector-native database that will power the next generation of massive-scale machine learning applications.

I am experienced in machine learning, statistical modeling, MLOps, data engineering, and distributed systems. I pride myself on my sensitivity to user experience and product usability, and I excel at abstracting technical work for diverse audiences. I strive to build data products and data ecosystems that enable people of all technical backgrounds to make data-informed decisions.

Experience

HyperCube

Research Engineer
2020 - Present

Machine Learning Database: I am part of the R&D team developing a next-generation, massive-scale machine learning database optimized for deep learning applications and complex MLOps workloads. We have built a flexible and scalable ML platform with low-latency, high-throughput networking and storage layers. I also contribute to the development of state-of-the-art approximate nearest neighbor (ANN) search algorithms and their implementations. I am the main contributor to the Jupyter notebooks that showcase new machine learning paradigms made possible by vector-native databases.

Product Lead: I lead the effort to create a delightful user experience for data scientists, ML engineers, and infrastructure engineers. I am responsible for scoping the product’s UX for different personas, building the product’s Python client, writing documentation, and creating demos.

DS and MLOps Consultant: I actively participate in sales motions and after-sales support.

Soda Technology

Co-founder, CTO
2016 - 2019

Recommendation Service: I build the recommendation system that powers Soda’s daily operation, which sends merchandise directly to customers based on their preferences. The recommendation system employs fine-tuned collaborative filtering models, and it outperforms human selections by 50% in terms of revenue.

Data Engineering and Business Intelligence: I architect and lead Soda’s data lake and ETL development. I also increase the company’s data literacy by helping decision-makers learn how to use Google Data Studio for better business intelligence analyses.

ERP & CRM Systems: We build in-house ERP & CRM systems for managing warehouse operations and user relations, which prove crucial in ensuring the company’s operational efficiency and hence cost savings.

iOS & WeChat Apps: I develop and maintain Soda’s iOS app (React Native) and WeChat Mini Program (Vue.js). In addition, I am responsible for the UX of both apps.

Nokia Bell Labs

Member of Technical Staff, Data Scientist
2015 - 2016

I am part of the data monetization team. We build POCs to demonstrate novel applications using telecom data. Projects include (1) user segmentation analysis based on cell phone browsing data and (2) city bike lane planning.

Publications

  • "Scalable privacy-preserving data sharing methodology for genome-wide association studies." In: Journal of Biomedical Informatics. DOI:10.1016/j.jbi.2014.01.008. arXiv:1401.5193.
  • Yu, F., Fienberg, S.E., Slavković, A.B., Uhler, C.

  • "Differentially-private logistic regression for detecting multiple-SNP association in GWAS databases." In: Privacy in Statistical Databases. DOI:10.1007/978-3-319-11257-2_14. arXiv:1407.8067.
  • Yu, F., Rybar, M., Uhler, C., Fienberg, S.E.

  • "Scalable privacy-preserving data sharing methodology for genome-wide association studies: an application to iDASH healthcare privacy protection challenge." In: BMC Medical Informatics and Decision Making. DOI:10.1186/1472-6947-14-S1-S3.
  • Yu, F., Ji, Z.

  • "O Privacy, Where Art Thou?: Genomics and Privacy." In: CHANCE. DOI:10.1080/09332480.2015.1042736.
  • Slavković, A.B., Yu, F.

  • "A unified framework for evaluating online user treatment effectiveness, with advertising applications." In: KDD 2014: Proceedings of the 2nd Workshop on User Engagement Optimization.
  • Wang, P., Meytlis, M., Yu, F., Yang, J.

  • "Whole exome sequencing reveals minimal differences between cell line and whole blood derived DNA." In: Genomics. DOI:10.1016/j.ygeno.2013.05.005.
  • Schafer, C.M., [and 13 others, including Yu, F.]

Invited Talks

  • Practical methods for privacy-preserving genome-wide association study data sharing. Joint Statistical Meetings. Seattle. 2015.
  • Differentially-private logistic regression for detecting multiple-SNP association in GWAS databases. Privacy in Statistical Databases. Eivissa. 2014.
  • Privacy-preserving data sharing methodology for genome-wide association studies. Joint Statistical Meetings. Boston. 2014.
  • Healthcare data privacy protection competition. UC San Diego. San Diego. 2014.