Career Profile

I started out as an academically trained statistician, became a machine learning applied scientist, and most recently have been architecting ML infrastructure and designing MLOps workflows. On a typical day, my job is to address the following challenges:

Data is messy. Users of ML systems are experimentalists. Data pipelines and ML models break. Training-serving skew is always lurking. Legacy code and infrastructure never fail to get in the way.

Here’s my preparatory work: I architect ML infrastructure and design MLOps workflows in my current and previous jobs; I worked at an ML-platform-as-a-service company, building a distributed database for massive-scale machine learning applications; and I co-founded a retail startup powered by a recommendation system, where I built the ML models and serving infrastructure and fused machine learning with the internal ERP and warehouse management systems.

Experience

Upstart

Staff Software Engineer
2021 - Present

Keywords: MLOps, ML platform.

Drive efforts to modernize the ML platform and MLOps practices within the company.

Pinecone

Founding Engineer
2020 - 2021

Keywords: MLOps, recommendation system, ANN search, technical marketing.

As a founding engineer, I contribute to developing cutting-edge approximate nearest neighbor (ANN) search algorithms that outperform state-of-the-art open-source solutions. In addition, I am responsible for designing the user experience (UX) and ensuring Pinecone is a delightful tool to use for data scientists and machine learning engineers.

Soda Technology

Co-founder, CTO
2016 - 2019

Keywords: recommendation system, business intelligence, MLOps, data engineering.

I build a retail business based on a recommendation system, which outperforms humans by 50% in terms of revenue. I lead development of complex yet efficient in-house ERP & CRM systems, and infuse them with machine learning, which translates into substantial cost savings.

I foster the company’s culture of data-driven decision-making by encouraging employees to improve their data literacy and by helping decision-makers learn how to use Google Data Studio for better business intelligence analyses.

Nokia Bell Labs

Member of Technical Staff
2015 - 2016

I am part of the data monetization team. We build POCs to demonstrate novel applications using telecom data. Projects include (1) user segmentation analysis based on cell phone browsing data and (2) city bike lane planning.

Publications

  • "Scalable privacy-preserving data sharing methodology for genome-wide association studies." In: Journal of Biomedical Informatics. DOI:10.1016/j.jbi.2014.01.008. arXiv:1401.5193.
  • Yu, F., Fienberg, S.E., Slavković, A.B., Uhler, C.

  • "Differentially-private logistic regression for detecting multiple-SNP association in GWAS databases." In: Privacy in Statistical Databases. DOI:10.1007/978-3-319-11257-2_14. arXiv:1407.8067.
  • Yu, F., Rybar, M., Uhler, C., Fienberg, S.E.

  • "Scalable privacy-preserving data sharing methodology for genome-wide association studies: an application to iDASH healthcare privacy protection challenge." In: BMC Medical Informatics and Decision Making. DOI:10.1186/1472-6947-14-S1-S3.
  • Yu, F., Ji, Z.

  • "O Privacy, Where Art Thou?: Genomics and Privacy." In: CHANCE. DOI:10.1080/09332480.2015.1042736.
  • Slavković, A.B., Yu, F.

  • "A unified framework for evaluating online user treatment effectiveness, with advertising applications." In: KDD 2014: Proceedings of the 2nd Workshop of User Engagement Optimization.
  • Wang, P., Meytlis, M., Yu, F., Yang, J.

  • "Whole exome sequencing reveals minimal differences between cell line and whole blood derived DNA." In: Genomics. DOI:10.1016/j.ygeno.2013.05.005.
  • Schafer, C.M., [and 13 others, including Yu, F.]

Invited Talks

  • Practical methods for privacy-preserving genome-wide association study data sharing. Joint Statistical Meetings. Seattle. 2015.
  • Differentially-private logistic regression for detecting multiple-SNP association in GWAS databases. Privacy in Statistical Databases. Eivissa. 2014.
  • Privacy-preserving data sharing methodology for genome-wide association studies. Joint Statistical Meetings. Boston. 2014.
  • Healthcare data privacy protection competition. UC San Diego. San Diego. 2014.