publications | Mojtaba Valipour

2024

QDyLoRA: Quantized Dynamic Low-Rank Adaptation for Efficient Large Language Model Tuning

Hossein Rajabzadeh, Mojtaba Valipour, Marzieh Tahaei, and 4 more authors

2024


Finetuning large language models requires huge GPU memory, restricting the choice to acquire larger language models. While the quantized version of the Low-Rank Adaptation technique, named QLoRA, significantly alleviates this issue, finding the efficient LoRA rank is still challenging. Moreover, QLoRA is trained on a pre-defined rank and, therefore, cannot be reconfigured for its lower ranks without requiring further fine-tuning steps. This paper proposes QDyLoRA (Quantized Dynamic Low-Rank Adaptation), an efficient quantization approach for dynamic low-rank adaptation. QDyLoRA combines the advantages of QLoRA with Dynamic LoRA to efficiently finetune LLMs on a set of pre-defined LoRA ranks. QDyLoRA enables fine-tuning Falcon-40b for ranks 1 to 64 on a single 32 GB V100 GPU through one round of fine-tuning. Experimental results show that QDyLoRA is competitive with QLoRA and outperforms it when employing its optimal rank.

2023

DyLoRA: Parameter-Efficient Tuning of Pre-trained Models using Dynamic Search-Free Low-Rank Adaptation

Mojtaba Valipour, Mehdi Rezagholizadeh, Ivan Kobyzev, and 1 more author

In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 2023


With the ever-growing size of pretrained models (PMs), fine-tuning them has become more expensive and resource-hungry. As a remedy, low-rank adapters (LoRA) keep the main pretrained weights of the model frozen and just introduce some learnable truncated SVD modules (so-called LoRA blocks) to the model. While LoRA blocks are parameter-efficient, they suffer from two major problems: first, the size of these blocks is fixed and cannot be modified after training (for example, if we need to change the rank of LoRA blocks, then we need to re-train them from scratch); second, optimizing their rank requires an exhaustive search and effort. In this work, we introduce a dynamic low-rank adaptation (DyLoRA) technique to address these two problems together. Our DyLoRA method trains LoRA blocks for a range of ranks instead of a single rank by sorting the representation learned by the adapter module at different ranks during training. We evaluate our solution on different natural language understanding (GLUE benchmark) and language generation tasks (E2E, DART and WebNLG) using different pretrained models such as RoBERTa and GPT with different sizes. Our results show that we can train dynamic search-free models with DyLoRA at least 4 to 7 times (depending on the task) faster than LoRA without significantly compromising performance. Moreover, our models can perform consistently well on a much larger range of ranks compared to LoRA.

SortedNet, a place for every network and every network in its place: Towards a generalized solution for training many-in-one neural networks

Mojtaba Valipour, Mehdi Rezagholizadeh, Hossein Rajabzadeh, and 3 more authors

arXiv preprint arXiv:2309.00255, 2023

Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference Using Sorted Fine-Tuning (SoFT)

Parsa Kavehzadeh, Mojtaba Valipour, Marzieh Tahaei, and 3 more authors

arXiv preprint arXiv:2309.08968, 2023


2021

Fine-tuning and training of DenseNet for histopathology image representation using TCGA diagnostic slides

Abtin Riasatian, Morteza Babaie, Danial Maleki, and 8 more authors

Medical Image Analysis, 2021


Feature vectors provided by pre-trained deep artificial neural networks have become a dominant source for image representation in recent literature. Their contribution to the performance of image analysis can be improved through fine-tuning. As an ultimate solution, one might even train a deep network from scratch with the domain-relevant images, a highly desirable option which is generally impeded in pathology by lack of labeled images and the computational expense. In this study, we propose a new network, namely KimiaNet, that employs the topology of the DenseNet with four dense blocks, fine-tuned and trained with histopathology images in different configurations. We used more than 240,000 image patches with 1000×1000 pixels acquired at 20× magnification through our proposed “high-cellularity mosaic” approach to enable the usage of weak labels of 7126 whole slide images of formalin-fixed paraffin-embedded human pathology samples publicly available through The Cancer Genome Atlas (TCGA) repository. We tested KimiaNet using three public datasets, namely TCGA, endometrial cancer images, and colorectal cancer images, by evaluating the performance of search and classification when corresponding features of different networks are used for image representation. We also designed and trained multiple convolutional batch-normalized ReLU (CBR) networks. The results show that KimiaNet provides superior results compared to the original DenseNet and smaller CBR networks when used as a feature extractor to represent histopathology images.

SymbolicGPT: A generative transformer model for symbolic regression

Mojtaba Valipour, Bowen You, Maysum Panju, and 1 more author

arXiv preprint arXiv:2106.14131, 2021


Symbolic regression is the task of identifying a mathematical expression that best fits a provided dataset of input and output values. Due to the richness of the space of mathematical expressions, symbolic regression is generally a challenging problem. While conventional approaches based on genetic evolution algorithms have been used for decades, deep learning-based methods are relatively new and an active research area. In this work, we present SymbolicGPT, a novel transformer-based language model for symbolic regression. This model exploits the advantages of probabilistic language models like GPT, including strength in performance and flexibility. Through comprehensive experiments, we show that our model performs strongly compared to competing models with respect to accuracy, running time, and data efficiency.

2016

Using Machine Learning approaches to detect opponent formation

Ehsan Asali, Mojtaba Valipour, Nader Zare, and 3 more authors

In 2016 Artificial Intelligence and Robotics (IRANOPEN), 2016

2015

Assessing the role of AR-based content in improving learning performance considering Felder-Silverman learning style

Maryam Tayefeh Mahmoudi, Kambiz Badie, and Mojtaba Valipour

In 2015 International Conference on Interactive Collaborative Learning (ICL), 2015