The large-scale structure of the universe is governed by the gravitational evolution of dark matter, which forms an intricate cosmic web of filaments, expansive voids, and massive galaxy clusters. High-resolution N-body simulations, such as the Quijote suite, are the standard method for producing theoretical predictions of this structure, but they demand millions of CPU hours, which makes extensive parameter exploration computationally prohibitive.

To overcome this bottleneck, our group has previously used deep generative modeling as an efficient emulator. The foundational project [1] demonstrated that Equivariant Score-Based Generative Models can learn to distribute dark matter halos across varying cosmological parameter configurations. By treating each simulated cosmology as a graph-based point cloud of halos and incorporating domain-specific knowledge such as periodic boundary conditions and physical symmetries, the model recovered clustered distributions from a uniform noise prior. However, that approach faced scaling limits and focused primarily on generating spatial coordinates, which leaves a clear path for expansion.
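To make the role of periodic boundary conditions concrete, the following is a minimal sketch of how a halo point cloud might be turned into a nearest-neighbour graph inside a periodic simulation box. It is an illustration under stated assumptions, not the construction used in the prior work: the function names, the k-nearest-neighbour rule, and the dense O(N^2) distance matrix are all hypothetical choices made for brevity.

```python
import numpy as np


def minimum_image(delta, box_size):
    """Wrap displacement components into [-box_size/2, box_size/2)."""
    return (delta + 0.5 * box_size) % box_size - 0.5 * box_size


def periodic_knn_edges(positions, box_size, k):
    """Link every halo to its k nearest neighbours under periodic distances.

    positions: array of shape (N, 3) with coordinates in [0, box_size).
    Returns (senders, receivers), two integer arrays of length N * k.
    Assumption: a dense (N, N) distance matrix, fine for small N only.
    """
    n = positions.shape[0]
    # Pairwise displacements, wrapped so halos near opposite faces are close.
    delta = minimum_image(positions[:, None, :] - positions[None, :, :], box_size)
    dist = np.linalg.norm(delta, axis=-1)
    np.fill_diagonal(dist, np.inf)  # exclude self-loops
    neighbours = np.argsort(dist, axis=1)[:, :k]
    receivers = np.repeat(np.arange(n), k)
    senders = neighbours.reshape(-1)
    return senders, receivers


# Two halos near opposite faces of the box are neighbours once wrapped.
positions = np.array([[1.0, 500.0, 500.0],
                      [999.0, 500.0, 500.0],
                      [400.0, 500.0, 500.0]])
senders, receivers = periodic_knn_edges(positions, box_size=1000.0, k=1)
```

In this toy example the first two halos are separated by 2 length units through the box face rather than 998 across it, so they become each other's nearest neighbour. At the scale of a full halo catalog, a spatial index such as a cell list or a periodic KD-tree would be needed in place of the dense matrix.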

Image credit: simulations were performed by Andrey Kravtsov (The University of Chicago) and Anatoly Klypin (New Mexico State University) at the National Center for Supercomputing Applications. Visualizations were created by Andrey Kravtsov.
Project Description and Objectives
This MSc project builds directly on the existing equivariant score-based framework and aims to significantly scale up the generation of cosmological structures. The successful candidate will upgrade the generative paradigm to state-of-the-art Flow Matching models. Flow Matching offers an efficient, simulation-free framework for training Continuous Normalizing Flows (CNFs): a network regresses a velocity field along prescribed probability paths, such as straight optimal-transport paths between noise and data, so no ODE has to be solved during training. This often yields faster inference, better computational scalability, and easier training dynamics than traditional diffusion or score-based models.

The new project will push the boundaries of machine learning in cosmology through two major extensions:

Massive Graph Scaling: Extend the architecture to process the full 100,000 (100k) dark matter halos available in the standard Quijote cosmological simulations. Modeling such large point clouds will require researching and implementing highly scalable Graph Neural Network (GNN) or Graph Transformer backbones that can handle the computational cost of much larger graphs.

Full Range of Halo Properties: While the prior work primarily modeled 3D spatial coordinates to evaluate simple spatial clustering, this project will expand the generative scope to the full set of physical dark matter halo properties. The model will co-generate halo masses, velocity vectors, angular momentum (spin), and concentration metrics alongside the halo positions.
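The Flow Matching training signal described above can be sketched for halo positions in a periodic box as follows. This is a hedged illustration, not the project's prescribed method. It assumes straight-line conditional paths taken along the shortest periodic displacement, a uniform noise prior as in the earlier score-based model, and independent pairing of noise and data points. The names `flow_matching_pair`, `flow_matching_loss`, and `predict_velocity` are hypothetical, with `predict_velocity` standing in for the GNN or Graph Transformer backbone.

```python
import numpy as np


def wrap_displacement(delta, box_size):
    """Shortest periodic displacement, components in [-box_size/2, box_size/2)."""
    return (delta + 0.5 * box_size) % box_size - 0.5 * box_size


def flow_matching_pair(x0, x1, t, box_size):
    """Sample point and regression target on a straight periodic path.

    x0: noise positions, x1: data positions, both of shape (N, 3).
    The path is x_t = x0 + t * v (wrapped into the box), with constant
    target velocity v equal to the shortest displacement from x0 to x1.
    """
    velocity = wrap_displacement(x1 - x0, box_size)
    x_t = (x0 + t * velocity) % box_size
    return x_t, velocity


def flow_matching_loss(predict_velocity, x1, box_size, rng):
    """Mean squared error between predicted and target velocities.

    predict_velocity(x_t, t) is a hypothetical model callable.
    Assumption: noise is uniform in the box and paired with data at random.
    """
    x0 = rng.uniform(0.0, box_size, size=x1.shape)
    t = rng.uniform()
    x_t, target = flow_matching_pair(x0, x1, t, box_size)
    return float(np.mean((predict_velocity(x_t, t) - target) ** 2))


# A path that crosses the box face: from x = 990 to x = 10 in a box of size 1000.
x0 = np.array([[990.0, 500.0, 500.0]])
x1 = np.array([[10.0, 500.0, 500.0]])
x_mid, v = flow_matching_pair(x0, x1, t=0.5, box_size=1000.0)

# Loss of a trivial model that always predicts zero velocity.
rng = np.random.default_rng(0)
loss = flow_matching_loss(lambda x, t: np.zeros_like(x), x1, 1000.0, rng)
```

Because the regression target is available in closed form, each training step needs only one forward pass of the model. Extending the sketch to masses, velocities, spin, and concentration would mean concatenating those features to the state, with a non-periodic path for the non-positional components. That design choice is left open here.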
[1] Onuțu, D. A., Zhao, Y., Vanschoren, J., & Menkovski, V. (2025, September). Score Matching on Large Geometric Graphs for Cosmology Generation. In International Conference on Discovery Science (pp. 17-31). Cham: Springer Nature Switzerland.
Vlado Menkovski