GeLoRA: A Theoretical Framework for Efficient Fine-Tuning of Large Language Models

Large language models (LLMs) have become indispensable in natural language processing (NLP), excelling at practical tasks like summarization, question answering, and sentiment analysis. Despite their success, they lack a solid theoretical foundation: we still poorly understand their generalization, reasoning, and representation capabilities. The complexity of training dynamics, particularly in their high-dimensional parameter spaces, makes it challenging to clearly understand how these models work, ensure their reliability, or develop improved fine-tuning methods.

In our work, Geometric Low-Rank Adaptation (GeLoRA), we aim to bridge this gap by establishing a theoretical framework that connects the geometry of data representations to the training dynamics of parameter spaces. Our approach provides new insights into how fine-tuning works, shedding light on the inner mechanisms of LLMs and offering a more efficient and principled way to optimize their performance.


GeLoRA – Methodology

Fine-tuning LLMs is resource-intensive, as traditional methods require updating millions of parameters. LoRA (Low-Rank Adaptation) [1] improves efficiency by freezing the pre-trained weights and learning each weight update as the product of two small low-rank matrices, which capture the most significant directions of change in a compressed form. However, LoRA does not provide a systematic method for determining the optimal rank of these matrices for each layer, leaving practitioners to rely on trial and error or heuristics to find a good rank configuration [2, 3, 4, 5]. This lack of guidance can lead to suboptimal performance and increased experimentation time.
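To make the low-rank idea concrete, here is a minimal PyTorch sketch of a LoRA-style linear layer: the pre-trained weight is frozen and only the two small matrices A and B are trained, so the number of trainable parameters scales with the rank r rather than with the full weight matrix. The class name and hyperparameter defaults are illustrative choices, not taken from the LoRA paper.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA-style layer: y = W0 x + (alpha / r) * B A x, with W0 frozen."""
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)   # freeze the pre-trained weight
        self.base.bias.requires_grad_(False)
        # Low-rank update: B (out x r) @ A (r x in); only these are trained.
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, r))  # zero init: no change at start
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)
```

With in_features = out_features = 768 and r = 8, the trainable update has about 12K parameters instead of the roughly 590K in the full weight matrix.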

GeLoRA introduces a principled rank selection mechanism that dynamically adapts the LoRA ranks based on the intrinsic geometry of data representations. Specifically:

Step 1 – Data Geometry Analysis: Using the Two-Nearest-Neighbors (TwoNN) [6] method, GeLoRA estimates the intrinsic dimensions of the input and output representations at each layer of the model. These intrinsic dimensions describe how the data manifold evolves as it passes through each layer of the model.
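As a rough illustration of Step 1, the sketch below estimates the intrinsic dimension of a set of layer activations with a simplified maximum-likelihood variant of TwoNN. The original method of Facco et al. [6] fits the empirical distribution of neighbor-distance ratios and discards the largest ones, so this closed-form estimate is only an approximation of it.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def twonn_intrinsic_dimension(X):
    """Simplified TwoNN estimate (after Facco et al., 2017) via maximum likelihood.

    X: (n_samples, n_features) array of layer activations.
    Returns an estimate of the intrinsic dimension of the point cloud.
    """
    # Distances to the two nearest neighbors of each point (excluding itself).
    nn = NearestNeighbors(n_neighbors=3).fit(X)
    dist, _ = nn.kneighbors(X)           # dist[:, 0] is the distance to the point itself (0)
    r1, r2 = dist[:, 1], dist[:, 2]
    mu = r2 / r1                         # ratio of second to first neighbor distance
    mu = mu[np.isfinite(mu) & (mu > 1)]  # guard against duplicate points
    # Under the TwoNN model the ratios follow a Pareto law with exponent d,
    # whose maximum-likelihood estimate is d = N / sum(log mu).
    return len(mu) / np.sum(np.log(mu))
```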

Step 2 – Dynamic Rank Adjustment: GeLoRA determines the LoRA rank for each layer by analyzing the difference in intrinsic dimensions between its input and output. Layers requiring higher expressive power are assigned greater ranks, while those with simpler representations are given lower ranks, reducing the number of parameters. To maintain stability and ensure optimal performance, a consistent scaling factor is applied across all layers.
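The exact rank formula is not spelled out in this summary, but one plausible heuristic consistent with the description above is to scale each layer's rank with the gap between its input and output intrinsic dimensions, using a single scaling factor shared by all layers. The function below is a hypothetical illustration of that idea, not the paper's formula.

```python
def assign_lora_ranks(intrinsic_dims, scale=2.0, r_min=1, r_max=64):
    """Hypothetical rank schedule: larger input/output dimension gaps get larger ranks.

    intrinsic_dims: TwoNN estimates at successive layer boundaries
                    (intrinsic_dims[i] is the input of layer i, [i+1] its output).
    scale: global scaling factor applied uniformly to all layers.
    """
    ranks = []
    for d_in, d_out in zip(intrinsic_dims[:-1], intrinsic_dims[1:]):
        gap = abs(d_out - d_in)                  # how much the manifold changes at this layer
        r = int(round(scale * max(gap, 1.0)))    # never drop below a minimal rank
        ranks.append(min(max(r, r_min), r_max))
    return ranks
```

With scale=2, for example, a layer whose intrinsic dimension moves from 12 to 19 would receive rank 14, while a layer that barely reshapes the manifold falls back to the minimum rank.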

Step 3 – Efficient Fine-Tuning: Using the pre-computed rank pattern, GeLoRA fine-tunes the model with an optimal balance between computational efficiency and model expressivity.
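In practice, a pre-computed rank pattern like this can be plugged into standard parameter-efficient fine-tuning tooling. The sketch below assumes the Hugging Face peft library and its rank_pattern option for per-module ranks; the module names and rank values are placeholders, and GeLoRA's own implementation may wire the ranks in differently.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForSequenceClassification

# Hypothetical per-layer ranks produced by the geometric analysis
# (keys are patterns over module names; values override the default rank).
rank_pattern = {
    "encoder.layer.0.attention.self.query": 2,
    "encoder.layer.0.attention.self.value": 2,
    "encoder.layer.11.attention.self.query": 8,
    "encoder.layer.11.attention.self.value": 8,
}

model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)
config = LoraConfig(
    r=4,                                   # default rank for layers not listed above
    lora_alpha=16,                         # shared scaling factor across layers
    target_modules=["query", "value"],
    rank_pattern=rank_pattern,             # per-layer ranks from the geometric analysis
)
model = get_peft_model(model, config)
model.print_trainable_parameters()
```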


GeLoRA – Advantages

By linking the geometry of the data manifold to the training dynamics, GeLoRA provides a theoretically grounded explanation for how fine-tuning impacts LLM performance. This connection enables:

  • Efficiency: GeLoRA uses far fewer trainable parameters than traditional methods, making it more scalable for large models.

  • Consistency: While some methods excel on specific tasks, GeLoRA consistently delivers robust average performance across diverse benchmarks, striking a balance between accuracy and efficiency.

  • Rigor: GeLoRA provides a theoretically grounded method for parameter tuning, eliminating much of the trial-and-error typically involved in fine-tuning.


GeLoRA – Empirical Results

GeLoRA’s performance has been validated on a variety of NLP benchmarks:

  1. GLUE Benchmark [7]:

    • For a similar number of trainable parameters, GeLoRA achieved an average score of 87.92, outperforming methods like LoRA (maximum average score 86.95) and adapters (maximum average score 86.74). 

    • GeLoRA used only 0.1M–0.13M parameters per task, compared to 0.65M parameters for adapters, 0.22M parameters for AdaLoRA, and 184M parameters for full fine-tuning.

  2. SQuAD (Question Answering) [8]:

    • On SQuAD v1.1, GeLoRA achieved an F1 score of 92.84 and an Exact Match (EM) score of 86.72, surpassing alternative methods while using fewer trainable parameters.

    • On SQuAD v2.0, GeLoRA achieved an F1 score of 86.25 and an Exact Match (EM) score of 83.15, outperforming alternative methods such as AdaLoRA, while using fewer trainable parameters.

  3. Computational Savings:

    • GeLoRA cut training times by up to 50% compared to other adaptive LoRA variants like SoRA [2] and AdaLoRA [5], offering significantly better computational efficiency.


GeLoRA – Limitations

Although GeLoRA reduces the computational load during fine-tuning by shifting it to a preprocessing step that estimates intrinsic dimensions, this preprocessing can itself be resource-intensive for large datasets. The TwoNN estimator scales between O(N log N) and O(N²) in the number of samples, depending on the nearest-neighbor search used, so the time and memory required grow quickly with dataset size, making it difficult to apply GeLoRA directly to large-scale tasks. To address this limitation, we propose the following solutions:

  • Data Subsampling: Compute the intrinsic dimensions on a carefully chosen subset of the dataset, which provides a reliable approximation while reducing computational cost (a sketch combining this with mini-batch averaging appears after this list).

  • Mini-Batch Estimation: Perform intrinsic dimension computations on mini-batches of data. However, care must be taken to control the variance introduced by smaller sample sizes and the scaling effects of the data.

  • Alternative Estimators: Leverage faster intrinsic dimension estimators like persistent homology [9], which can operate in linear time by exploiting GPU parallelism for significant speedups. Persistent homology captures both local and global data geometry, making it a promising alternative to TwoNN for scaling GeLoRA to larger datasets.
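As a rough sketch of the first two mitigations, the function below (reusing the twonn_intrinsic_dimension helper from the methodology sketch above) averages TwoNN estimates over several random subsamples. The subset size and number of repetitions are illustrative choices, and the standard deviation returned alongside the mean is one simple way to monitor the variance introduced by smaller samples.

```python
import numpy as np

def subsampled_intrinsic_dimension(X, n_subsets=10, subset_size=2000, seed=0):
    """Estimate the intrinsic dimension on random subsamples and average.

    Averaging over several subsamples keeps the cost roughly proportional to
    subset_size while damping the variance any single small sample would add.
    """
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_subsets):
        idx = rng.choice(len(X), size=min(subset_size, len(X)), replace=False)
        estimates.append(twonn_intrinsic_dimension(X[idx]))
    return float(np.mean(estimates)), float(np.std(estimates))
```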


Broader Vision

Our primary aim with GeLoRA is not only to optimize the fine-tuning of LLMs but also to establish a solid theoretical foundation for understanding their behavior. By examining the relationship between the geometry of the data (the data manifold) and the model's parameters (the parameter space), we hope to gain deeper insights into how LLMs operate and how data influences their training dynamics.

Although GeLoRA may slightly underperform on certain specific tasks, it consistently achieves strong average results across a wide range of benchmarks. Its blend of efficiency, scalability, and robust theoretical grounding makes GeLoRA a promising advancement toward developing better fine-tuning methods and enhancing our understanding of LLMs.

Looking forward, we plan to improve the process of estimating intrinsic dimensions, explore other techniques like persistent homology, and apply GeLoRA to more complex scenarios involving multiple tasks or specialized domains. We believe that integrating geometric insights with scalable optimization is key for creating efficient AI systems.


References

[1] Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, & Weizhu Chen. (2021). LoRA: Low-Rank Adaptation of Large Language Models.

[2] Ning Ding, Xingtai Lv, Qiaosen Wang, Yulin Chen, Bowen Zhou, Zhiyuan Liu, & Maosong Sun. (2023). Sparse Low-rank Adaptation of Pre-trained Language Models.

[3] Hu, Y., Xie, Y., Wang, T., Chen, M., & Pan, Z. (2023). Structure-Aware Low-Rank Adaptation for Parameter-Efficient Fine-Tuning. Mathematics, 11(20), 4317. https://doi.org/10.3390/math11204317

[4] Zequan Liu, Jiawen Lyn, Wei Zhu, Xing Tian, & Yvette Graham. (2024). ALoRA: Allocating Low-Rank Adaptation for Fine-tuning Large Language Models.

[5] Qingru Zhang, Minshuo Chen, Alexander Bukharin, Nikos Karampatziakis, Pengcheng He, Yu Cheng, Weizhu Chen, & Tuo Zhao. (2023). AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning.

[6] Facco, E., d’Errico, M., Rodriguez, A., & Laio, A. (2017). Estimating the intrinsic dimension of datasets by a minimal neighborhood information. Scientific Reports, 7(1).

[7] Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, & Samuel R. Bowman. (2019). GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding.

[8] Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, & Percy Liang. (2016). SQuAD: 100,000+ Questions for Machine Comprehension of Text.

[9] Michael G. Rawson. (2022). Linear Run Time of Persistent Homology Computation with GPU Parallelization.
