Matrix-Free Approximate Curvature (M-FAC) Pruning

The paper Efficient Matrix-Free Approximations of Second-Order Information, with Applications to Pruning and Optimization by Elias Frantar, Eldar Kurtic, and Assistant Professor Dan Alistarh of IST Austria introduces the Matrix-Free Approximate Curvature (M-FAC) pruning method. M-FAC builds on advances from the WoodFisher pruning paper, using first-order information (gradients) to efficiently approximate the corresponding second-order information and determine the optimal weights to prune. The algorithm is shown to outperform magnitude pruning as well as other second-order pruning techniques on a variety of one-shot and gradual pruning tasks.

Using M-FAC with SparseML

SparseML makes it easy to apply the M-FAC pruning algorithm in sparsification recipes to improve pruning recovery by providing an MFACPruningModifier. The MFACPruningModifier supports the same settings as the magnitude pruning modifiers, plus extra settings for the M-FAC algorithm: num_grads, fisher_block_size, and available_gpus. Ideal values depend on the system the pruning will run on and the model to be pruned.

Example M-FAC Recipe

The following is an example MFACPruningModifier to be used in place of other pruning modifiers in a recipe:

pruning_modifiers:
  - !MFACPruningModifier
    params: __ALL_PRUNABLE__
    init_sparsity: 0.05
    final_sparsity: 0.85
    start_epoch: 1.0
    end_epoch: 61.0
    update_frequency: 4.0
    num_grads: {0.0: 256, 0.5: 512, 0.75: 1024, 0.83: 1400}
    fisher_block_size: 10000
    available_gpus: ["cuda:0"]
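
For context, a recipe containing this modifier is applied to a PyTorch training loop through SparseML's ScheduledModifierManager. The following is a minimal sketch of that typical pattern, not a complete script; the recipe path, model, optimizer, and steps_per_epoch value are placeholders for illustration:

import torch
from torchvision.models import resnet50
from sparseml.pytorch.optim import ScheduledModifierManager

# Placeholder model and optimizer for illustration
model = resnet50()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Load the recipe and wrap the optimizer so pruning steps are
# applied automatically on the recipe's schedule
manager = ScheduledModifierManager.from_yaml("recipe.yaml")
optimizer = manager.modify(model, optimizer, steps_per_epoch=100)

# ... run the normal training loop here; M-FAC pruning is applied
# between the recipe's start_epoch and end_epoch ...

manager.finalize(model)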

num_grads

The M-FAC algorithm approximates second-order information from first-order gradients. num_grads specifies the number of recent gradient samples of the model to store while training.

This value can be an int, in which case that constant value is used throughout pruning. Alternatively, the value can be a dictionary mapping float sparsity levels (between 0.0 and 1.0) to the number of gradients to store once that sparsity level is reached. If a dictionary is used, 0.0 must be included as a key to set the base number of gradients to store (i.e. {0.0: 64, 0.5: 128, 0.75: 256}).

Storing gradients can be expensive: for a dense model, each additional gradient sample stored requires roughly as much memory as the entire model itself. This is why the dictionary option allows more gradients to be stored as the model becomes more sparse.

If an M-FAC pruning run is unexpectedly killed, the likely cause is that the gradient storage requirements exceeded the system's RAM. A safe rule of thumb is to set the initial number of gradients no greater than one quarter of the available CPU RAM divided by the model size.
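
As a worked example of that rule of thumb, the following sketch estimates a safe starting num_grads. It is not part of SparseML; psutil and the resnet50 model are assumptions made for illustration, and gradients are assumed to be stored as float32:

import psutil
import torch
from torchvision.models import resnet50

# Example model for illustration; substitute the model being pruned
model = resnet50()

# One stored gradient sample costs roughly one float32 (4 bytes) per
# parameter, about the same footprint as the dense model itself
grad_bytes = sum(p.numel() for p in model.parameters()) * 4

# Rule of thumb: use no more than a quarter of the available CPU RAM
available_bytes = psutil.virtual_memory().available
safe_num_grads = int(available_bytes / 4 // grad_bytes)

print(f"~{grad_bytes / 1e6:.0f} MB per gradient sample")
print(f"safe initial num_grads: {safe_num_grads}")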

fisher_block_size

To limit the computational cost of calculating second-order information, the M-FAC algorithm can compute a block-diagonal approximation of the second-order matrix; with a sufficient block size, this still generates the information needed for pruning.

The fisher_block_size specifies this block size. If GPUs are used to perform the M-FAC computations, each GPU should have enough spare memory during training to hold num_grads * fisher_block_size values, so that each block can be stored and computed sequentially on a GPU.

The default block size is 2000, and generally block sizes between 1000 and 10000 may be ideal. If None is provided, the full matrix will be computed without blocks.
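
As a quick illustration of that memory requirement, assuming float32 storage (an assumption, since the storage type is not stated above), the extra GPU memory per block can be estimated as follows:

# Estimate the extra GPU memory M-FAC needs to hold one block,
# using the settings from the example recipe above
num_grads = 1400          # final value from the num_grads schedule
fisher_block_size = 10000
bytes_per_value = 4       # assumes float32 storage

extra_bytes = num_grads * fisher_block_size * bytes_per_value
print(f"extra GPU memory per block: ~{extra_bytes / 1e6:.0f} MB")  # ~56 MB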

available_gpus

available_gpus is a list of GPU device names to perform the M-FAC computation on. If not provided, the computation will run on the CPU.

Tutorials

Tutorials for using M-FAC with SparseML are provided in the tutorials directory. Currently there are tutorials available for one-shot and gradual pruning with M-FAC.

Need Help?

For Neural Magic Support, sign up or log in to our Neural Magic Community Slack. Bugs, feature requests, or additional questions can also be posted to our GitHub Issue Queue.