Efficient Foundation Models

Structuring context improves inference within foundation models, just as it does in classical statistical models.

This project investigates how explicit contextual structure, long exploited in statistical graphical models, can make foundation models more efficient, modular, and interpretable.

We focus on making large models practical for real-world deployment by aligning their internal mechanisms with the same principles that make classical models statistically efficient: modularity, conditional independence, and context-aware adaptation.

This work supports:

  • Structured contextual inference, by building architectural and training constraints that reflect known structure (see the sketch after this list).
  • Modular reasoning, by decomposing complex predictions into composable parts.
  • Deployment readiness, through lower-latency models that retain contextual sensitivity.
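One way to picture the first bullet is to encode a known conditional-independence structure directly as an attention mask. The PyTorch sketch below is only an illustration of that general idea, not the project's actual architecture; `grouped_attention_mask` and the toy `group_ids` are hypothetical names introduced here.

```python
import torch
import torch.nn.functional as F

def grouped_attention_mask(group_ids: torch.Tensor) -> torch.Tensor:
    """Boolean mask that lets tokens attend only within their own context
    group, i.e. a block-diagonal conditional-independence structure."""
    # group_ids: (seq_len,) integer group label per token
    return group_ids.unsqueeze(0) == group_ids.unsqueeze(1)  # (seq, seq)

# Toy example: 6 tokens split into two independent context blocks.
group_ids = torch.tensor([0, 0, 0, 1, 1, 1])
mask = grouped_attention_mask(group_ids)

q = k = v = torch.randn(1, 4, 6, 32)  # (batch, heads, seq, head_dim)
out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
print(out.shape)  # torch.Size([1, 4, 6, 32])
```

Constraining attention this way is the architectural analogue of a factorized graphical model: tokens in different blocks are treated as conditionally independent given their own context.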

Recent work includes FastCache (Liu et al., 2025), Memory-Keyed Attention (Liu et al., 2025), and the ongoing development of FastLM, a framework for efficient, composable LLMs.
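FastCache's title points at replacing repeated transformer computation with a learnable linear approximation when activations change little between diffusion steps. The sketch below illustrates that general caching idea only, under assumptions of my own; `LinearApproxCache`, the `tol` threshold, and the relative-drift test are hypothetical and are not the published algorithm.

```python
import torch
import torch.nn as nn

class LinearApproxCache(nn.Module):
    """Sketch: wrap a transformer block so that, when its input has moved
    little since the last full pass, a cheap learned linear map is applied
    instead of recomputing the full block."""

    def __init__(self, block: nn.Module, dim: int, tol: float = 0.05):
        super().__init__()
        self.block = block
        self.approx = nn.Linear(dim, dim)   # learnable linear approximation
        self.tol = tol
        self.prev_x = None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.prev_x is not None:
            drift = (x - self.prev_x).norm() / self.prev_x.norm()
            if drift < self.tol:            # small change: skip the block
                return self.approx(x)
        self.prev_x = x.detach()
        return self.block(x)                # large change: full computation

# Usage with a generic transformer layer standing in for a DiT block.
block = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
cached = LinearApproxCache(block, dim=64)
x = torch.randn(2, 16, 64)
print(cached(x).shape)  # first call always runs the full block
```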



References

2025

  1. FastCache: Fast Caching for Diffusion Transformer Through Learnable Linear Approximation
    Dong Liu, Jiayi Zhang, Yifan Li, et al.
    CVPR Workshop: Another Brick in the AI Wall: Building Practical Solutions from Theoretical Foundations (CVPR BASE 2025), 2025
  2. MKA: Memory-Keyed Attention for Efficient Long-Context Reasoning
    Dong Liu, Yanxuan Yu, Xuhong Wang, et al.
    ICML Workshop on Long Context Foundation Models (LCFM), 2025