About Me
I am Cheng, currently based in Tokyo as a Data and AI Consultant at arcbricks, where I lead cross-border teams that use Databricks and data to drive digital transformation.
I am a Master’s graduate in Information Science from Cornell University. Previously, I received my Bachelor’s degree in International Business from National Taiwan University with a minor in Sociology.
My experiences span data analysis, management consulting, brand strategy, and academic research, and I am interested in how to leverage data-driven insights in diverse contexts.
Education
- Master’s in Information Science, Cornell University, 2024 - 2025
- Bachelor’s in International Business, National Taiwan University, 2018 - 2023
- Minor in Sociology, National Taiwan University, 2018 - 2023
- Business Analytics Program, National Taiwan University, 2018 - 2023
Work Experience
- Data and AI Consultant at arcbricks, Oct. 2025 - Present (Tokyo, Japan)
- Consultant Intern (part-time assistant) at beBit TECH, Feb. 2022 - Jun. 2022 (Taipei, Taiwan)
- Strategy Intern at Redpeak, Aug. 2021 - Dec. 2021 (Taipei, Taiwan)
Research Experience
- Research Assistant under the supervision of Professor Hsuan-Wei Lee at Lehigh University, Nov. 2024 - Present
- Research Assistant under the supervision of Professor Nanyi Bi at National Taiwan University, Oct. 2023 - Jul. 2024
- Research Assistant under the supervision of Professor Yen-Hsin Cheng at the Institute of Sociology of Academia Sinica, Oct. 2022 - Dec. 2022
Project Experience
Ecolab Food Safety Audit Question Recommendation System
- Constructed a recommendation system that recommends actions for over 600 FDA audit questions for Ecolab, an industry leader in water, hygiene, and infection prevention solutions
- Leveraged data parsing and embedding techniques to preprocess data from over five different data sources
- Used Databricks to build a scalable zero-shot recommendation system that returns recommended actions, reasoning, citations, and sources for each audit question
Estimation and Visualization of Flood Damage in Taiwan (link)
- Aggregated data from five different sources, compiling over seven million flood observation entries
- Utilized spatial and temporal data to identify 84 unique flood events across Taiwan
- Developed a dynamic Tableau dashboard to visualize estimated flood damage over five years
Customer Segmentation and Business Analysis Report Based on Transaction Data (link)
- Generated a business analysis report from 541,909 transaction records of an online retailer based in England
- Defined four behavioral indexes (RFM Indexes, Activity Index) as features for K-Means Clustering and identified three distinct clusters
- Delivered three levels of next-step business strategy suggestions based on the segmentation results and the overall patterns in the data
Prediction of League of Legends Game Outcome with Supervised Learning Methods (link)
- Predicted the outcome of League of Legends games based on in-game data from a dataset with 28 features across 24,224 entries.
- Trained supervised learning models (Logistic Regression, Linear Discriminant Analysis, Random Forest, and XGBoost), with all four models reaching a final prediction accuracy of over 78%.
- Leveraged hyperparameter tuning methods such as Grid Search and Bayesian Optimization to further optimize model performance.
News Category Classification with Natural Language Processing and Deep Learning (link)
- Classified news entries by category based on their headline and description, using a dataset of over 200,000 entries and both classical machine learning and deep learning models.
- Implemented Principal Component Analysis to perform dimensionality reduction and feature transformation.
- Applied Logistic Regression, Support Vector Machine, and Multilayer Perceptron with Bayesian Optimization hyperparameter tuning, achieving a final prediction accuracy of 74% on the best-performing model.
Impact of Xiaohongshu Usage on Young Female’s Body Dissatisfaction (link)
- Analyzed how social media influences females’ body dissatisfaction, using Xiaohongshu as a case study, under the supervision of Professor Nanyi Bi from the Department of Information Management at National Taiwan University.
- Conducted bottom-up thematic analysis of 11 in-depth interviews in MAXQDA, constructing 9 code groups with over 100 codes.
- Crafted a conceptual model depicting the interplay between social media usage, body dissatisfaction, and body comparison while considering the characteristics of Xiaohongshu’s content and usage patterns.
Software, Skills, and Languages
Programming Languages
- Python (excellent), R (excellent), SQL (excellent), C++ (intermediate), VBA (intermediate)
Skills
- Supervised Learning Methods: Regression, Discriminant Analysis, Support Vector Machine, K-Nearest Neighbors, Multilayer Perceptron, Random Forest, XGBoost
- Unsupervised Learning Methods: Principal Component Analysis, K-Means Clustering, Latent Dirichlet Allocation, Latent Class Analysis
- Generative AI Engineering: Embedding Pipelines and Evaluation, RAG Architecture Design, Prompt Engineering, Evaluation and Experiment Management with MLflow
- Data Engineering: Data Modeling, ETL/ELT, Workflow Orchestration, Medallion Architecture Design, Slowly Changing Dimensions (SCD), Quarantining
Software
- Data Platforms: Databricks (Data Modeling, Data Pipeline Engineering, Model Management with MLflow, Feature Store, Dashboards)
- Project Management: Agile/Waterfall Project Management with Azure DevOps, Jira
- Version Control: GitHub
- Business Intelligence: WhereScape RED (ETL and Data Modeling), Tableau (Data Visualization)
- Statistical Software: STATA, SPSS
Credentials
Languages
- Mandarin (native), English (native/bilingual proficiency, C2), Japanese (professional working proficiency, JLPT N1)