Tabular Foundation Models and In-Context Learning

Foundation models — large models pretrained on broad data and adaptable to diverse downstream tasks — have transformed NLP and computer vision. With the emergence of tabular foundation models, this trend is beginning to reshape tabular data analysis, traditionally dominated by classical statistics.

A leading example is TabPFN v2: a transformer pretrained entirely on synthetic data, claimed to outperform all previous methods for regression and classification on datasets with up to 10,000 samples, while also supporting data generation, density estimation, and fine-tuning.

What is in-context learning?

In-context learning (ICL) can be reframed statistically as meta-learning: instead of fitting a separate model per dataset, one pretrains a model to learn a mapping from datasets to quantities of interest. Given a new dataset, the model applies this mapping immediately — with no retraining required.

TabPFN implements ICL as amortized Bayesian inference:

A prior is placed on joint distributions over covariate–label pairs.
A dataset is assumed to be drawn i.i.d. from a single fixed joint distribution, itself sampled from this prior.
For a new test point x, the model approximates the posterior predictive distribution by integrating over all plausible data-generating distributions consistent with the observed data.
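
In symbols, the three steps above can be sketched as follows. The notation is ours and is introduced only for illustration: \pi is the prior over joint distributions P, p_P is the density of P, and \mathcal{D}_n = \{(x_i, y_i)\}_{i=1}^{n} is the observed dataset.

```latex
\[
p(y \mid x, \mathcal{D}_n)
  = \int p_P(y \mid x)\, \pi(P \mid \mathcal{D}_n)\, dP,
\qquad
\pi(P \mid \mathcal{D}_n) \;\propto\; \pi(P) \prod_{i=1}^{n} p_P(x_i, y_i).
\]
```

The integral averages the conditional prediction p_P(y \mid x) over every candidate distribution P, weighted by how well P explains the observed data. The pretrained network approximates the left-hand side directly rather than evaluating the integral for each new dataset.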

Because pretraining uses synthetic data generated from a prior, TabPFN is called a prior-fitted network (PFN).
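
To make the idea of averaging over plausible data-generating distributions concrete, here is a toy sketch. It is not TabPFN. It assumes a deliberately simple setting: binary labels with no covariates, and a Uniform(0, 1) prior on the success probability theta. The function names are hypothetical. The posterior predictive is approximated by weighting prior draws by their likelihood, then checked against the known closed form (k + 1) / (n + 2).

```python
import random


def posterior_predictive_mc(labels, n_prior_draws=20000, seed=0):
    """Monte Carlo estimate of P(y_new = 1 | labels) under a Uniform(0, 1) prior."""
    rng = random.Random(seed)
    n = len(labels)
    k = sum(labels)  # number of ones observed
    weighted_sum = 0.0
    weight_total = 0.0
    for _ in range(n_prior_draws):
        theta = rng.random()  # one candidate data-generating distribution
        # Likelihood of the observed labels under this candidate.
        likelihood = theta ** k * (1.0 - theta) ** (n - k)
        # Under this candidate, P(y_new = 1) is simply theta.
        weighted_sum += likelihood * theta
        weight_total += likelihood
    return weighted_sum / weight_total


def posterior_predictive_exact(labels):
    """Closed form for the same model: (k + 1) / (n + 2)."""
    return (sum(labels) + 1) / (len(labels) + 2)


observed = [1, 1, 0, 1, 1, 0, 1, 1]
approx = posterior_predictive_mc(observed)
exact = posterior_predictive_exact(observed)
print(f"Monte Carlo: {approx:.3f}  exact: {exact:.3f}")
```

Here the averaging is recomputed from scratch for each dataset. A PFN instead amortizes this cost: it is pretrained on many datasets simulated from the prior, so that at test time a single forward pass stands in for the whole computation.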

Evaluating TabPFN beyond supervised prediction

In recent work, we evaluated TabPFN’s capabilities beyond supervised prediction and found it outperforms specialized methods in:

Semi-supervised parameter estimation
Prediction under covariate shift
Heterogeneous treatment effect estimation

It even surpasses LASSO in sparse regression and breaks robustness–efficiency trade-offs in classification. These findings suggest the ICL/PFN paradigm has the potential to supersede existing approaches across a wide range of statistical tasks.

Key paper: Zhang, Q.*, Tan, Y. S.*, Tian, Q.*, and Li, P. (2025). TabPFN: One Model to Rule Them All? Major revision at JASA. [arXiv] [Code]

PFN for clustering

Clustering becomes substantially more challenging when both the number of clusters and the cluster assignments are unknown. Most classical methods treat these two problems separately — fixing the number of clusters in advance, or selecting it through BIC, cross-validation, or heuristic elbow methods — but there is no universally reliable criterion.

In TabClustPFN, we use the PFN framework to address this from a different angle. Our key idea: reformulate clustering as a joint structured prediction problem, where the model simultaneously infers both the number of clusters and the cluster memberships. Rather than being chosen in a separate preprocessing step, the number of clusters is embedded directly into the prediction target.

We construct prior datasets where both the number of clusters and cluster structures vary, enabling the PFN to learn a mapping from raw data to latent partition structure in a fully amortized manner. The result: a unified inference procedure that outputs both the number of clusters and assignments in a single forward pass, without iterative optimization or repeated model fitting.
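
As a rough illustration of what one draw from such a prior could look like, the sketch below generates synthetic clustering tasks in which the number of clusters, the cluster centers, and the within-cluster spread all vary. This is not the prior used in TabClustPFN. The function name, the Gaussian cluster shapes, and every distributional choice are assumptions made for this example only.

```python
import random


def sample_prior_clustering_task(max_clusters=5, n_points=60, dim=2, seed=None):
    """Draw one synthetic task: (points, labels, n_clusters).

    Hypothetical prior: K uniform on {1, ..., max_clusters}, Gaussian centers,
    and a shared isotropic within-cluster spread.
    """
    if n_points < max_clusters:
        raise ValueError("n_points must be at least max_clusters")
    rng = random.Random(seed)
    n_clusters = rng.randint(1, max_clusters)  # both endpoints inclusive
    centers = [[rng.gauss(0.0, 5.0) for _ in range(dim)] for _ in range(n_clusters)]
    spread = rng.uniform(0.3, 1.0)
    # Give every cluster at least one point so that K matches the labels.
    labels = list(range(n_clusters))
    labels += [rng.randrange(n_clusters) for _ in range(n_points - n_clusters)]
    rng.shuffle(labels)
    points = [[rng.gauss(mu, spread) for mu in centers[c]] for c in labels]
    return points, labels, n_clusters


# A tiny "pretraining corpus": the input is `points`; the joint prediction
# target is the pair (n_clusters, labels).
corpus = [sample_prior_clustering_task(seed=s) for s in range(3)]
for points, labels, n_clusters in corpus:
    print(f"{len(points)} points, K = {n_clusters}, clusters used = {len(set(labels))}")
```

Because each simulated task comes with both its true number of clusters and its true assignments, a network trained on many such tasks can be supervised on the two jointly, which is what allows a single forward pass to return both at inference time.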

Key paper: Zhao, T., Wang, G., Tan, Y. S., and Zhang, Q. (2025). TabClustPFN: A Prior-Fitted Network for Tabular Data Clustering. Preprint. [arXiv] [Code]