Franz Louis Cesista leloykun

Hi! I'm Franz Louis Cesista フランズ

Building something 👨‍🍳🚀 • Former Machine Learning (AI) Research Scientist, Full-Stack Software Engineer, & Data Engineer at Expedock Software Inc. • 2x IOI & 2x ICPC World Finalist • Mathematics at the Ateneo de Manila University

At Expedock, I was in charge of researching, building, training, and managing the deploy of hundreds of multi-modal machine learning models fine-tuned for information extraction on semi-structured documents in the logistics industry. More generally, I was also responsible for improving our entire ML system--from making our data collection jobs more robust, to managing our data warehouse and feature stores, to building the charts and dashboards we present to our customers.

Recently, I've also been exploring model inference optimizations on more lower-level abstractions. I know how to implement most machine learning building blocks in C++ (see my implementation of Meta's LLamaV2 in C++ and Flash Attention 1 & 2 in CUDA). At Expedock, I also worked on reducing the memory consumption of PyTorch (and its CUDA kernels) so we could run more inference jobs in parallel per GPU instance. Tl;dr: I'm very comfortable working on every level of abstraction in machine learning.

Before Expedock, I studied Mathematics at the Ateneo de Manila University. I also dabbled a lot in competitive programming. In fact, I managed to be a 2-time IOI and a 2-time ICPC World Finalist representing the Philippines.

Tech Stack

Layer	Tools
Cloud
Infra
DB
Backend
API
Frontend
ML Platform
ML Inference Server
ML APIs
ML Frameworks
Data Viz

Research Interests

Information Retrieval from Semi-Structured Documents. Research on information retrieval (colloquially, "Search") mostly focus on purely text-based documents and structured documents--both of which are now largely solved problems. For context, structured documents are PDFs, scanned documents, screenshots of excel sheets, etc. where (1) the borders of the tables (if present) and (2) the ordering of the word-blocks are very clear. But most real-world documents, especially in the logistics industry, are semi-structured. That is, documents where either (1) the tables don't have very clear borders (or may even be implicit tables) and/or (2) the word-blocks are scattered all over the place. This is surprisingly a very difficult problem and even the big cloud platforms (GCP, AWS, & Azure) are having difficulty handling such documents. But it can be very profitable if you can get it right--hence why Expedock is now a multi-million $$ startup.
ML on Non-Euclidean Geometry. More specifically, I'm interested in embedding high-dimensional data into lower-dimensional non-euclidean spaces. Although embedding into euclidean spaces, $\mathbb{R}^n$, is good enough for most cases, there are cases where non-euclidean spaces might be more appropriate. For example:
- Embedding hierarchical data such as the phylogenetic tree-representation of single-cell specialization data. Real-world hierarchical data are usually tree-like with near-constant branching factors. Thus, they grow exponentially with respect to the depth (e.g. the $k^{th}$-level of a binary tree has $2^{k}$ nodes). However, euclidean spaces, $\mathbb{R}^n$, only grow polynomially with respect to $n$. On the other hand, negatively-curved spaces such as the poincare disc grow exponentially. Thus, it's better to embed hierarchical data into them - we just need to be careful with floating-point errors.
- Embedding complex cyclical data. In my stint at ExoraPH, I used to uncover the lower-dimensional, torus-like structure of the Philippine's energy supply-and-demand curves.
Geometric Deep Learning. I'm interested in unifying various concepts in machine learning through the lens of the Erlangen Program. I'm especially fascinated with the following:
- How we can derive linear regression, convolution, the attention mechanism, and message-passing from the geometric transformations we want our models to preserve. For example:
  - If we want translation-invariance, then we have to use convolutions as they're the only family of transformations that are translation-invariant.
  - If we want color- and shade-invariance, then we can use batch-normalization.
  - If we let the weights of the convolutions to be learnable (and depend on the neighbors' weights), then we'd end up with the attention mechanism. And
  - If we generalize the attention mechanism to all graph structures (not just regular graphs), then we'd end up with message-passing.
- In almost all unsupervised learning models, we just fix two of (a) the manifold $X$, (b) the metric on the manifold $d_X$, and (c) the probability measure $\mu_X$ over the metric space $(X, d_X)$ and then try to estimate the remaining one of the three. For example:
  - In dimensional reduction, we usually fix $d_{X, p}(x, y) = \sqrt[p]{\sum_i (x_i - y_i)^p}$ and $\mu_X =$ the uniform distribution (such as in UMAP) then try to find a low-dimenional manifold $X$ that preserves the local distances in the original graph as much as possible.
  - In metric learning, we usually fix $X = \mathbb{R}^n$ and $\mu_X =$ the uniform distribution then try to find $d_X$ such that similar datapoints are close together and dissimilar datapoints are far way from each other. And, finnaly,
  - In density estimation, we usually fix $X = \mathbb{R}^n$ and $d_{X, p}(x, y) = \sqrt[p]{\sum_i (x_i - y_i)^p}$ then try to find the probability distribution $\mu_X$ of our dataset.

If you're interested in collaborating on a research project with me, just email me at [email protected]

Porfolio [WIP]

Please visit my personal website at leloykun.github.io for a more detailed portfolio.

Personal Projects

Project	Description
	ProgVar Library is a collection of algorithms, data structures, and other useful information for competitive programming. It also contains the team notebook our team used to reach the ICPC World Finals twice-in-a-row. I lead the team in maintaining the project.

Open-source Contributions

Project	Description
	A C++ implementation of Meta's Llama2 generative large-language model. I also optimized the original C implementation by adding parallelization on the multi-head attention component.
	A minimal implementation of Flash Attention 1 & 2 in just ~350 lines of CUDA code. This is still a work-in-progress, but the ultimate goal is to implement the various variations of Hyperbolic Attention in CUDA.
	An AutoML tool that supports multi-modal inputs. I helped trace and squash a bug which prevented interpretability metrics (such as permutation importances) on quantile regressors from being calculated.
	An automated topic modelling tool. I added customizability options to the visualizations.
	NOI.PH's (the Philippine's National Olympiad of Informatics) online judge and problem manager. I added developer tools. I also help out younger developers in our private discord server.