Is `consolidated.safetensors` needed when using sharded weights for inference?

#11
by bugwei - opened

I noticed that the repository includes both sharded and consolidated versions of the model weights (`model-0000n-of-00010.safetensors` and `consolidated.safetensors`).
I'm using Hugging Face Transformers to run inference, and as far as I know, Transformers loads the sharded weights by default.
Could anyone clarify in what scenario the consolidated file is needed?
Given my use case (inference with Transformers using the sharded weights), would it be safe to remove the consolidated file from the repo?
The reason I'm asking is that the repository's disk usage is significantly larger than the model size would suggest (the weights are effectively stored twice), and I'd like to reduce it if possible.
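For what it's worth, even without changing the repo, the consolidated file can be skipped at download time. This is a minimal sketch assuming `huggingface_hub`'s `snapshot_download` and its `ignore_patterns` glob filter; the hypothetical file list below just mirrors the layout described above to show what the pattern would match:

```python
from fnmatch import fnmatch

# Hypothetical listing mirroring the repo layout described above:
# 10 shards plus the consolidated copy and a config file.
files = [f"model-{i:05d}-of-00010.safetensors" for i in range(1, 11)]
files += ["consolidated.safetensors", "config.json"]

# huggingface_hub's snapshot_download(repo_id, ignore_patterns=["consolidated*"])
# applies glob semantics like this: the pattern skips only the
# consolidated copy and keeps every shard.
kept = [f for f in files if not fnmatch(f, "consolidated*")]

print(kept)  # shards + config.json, no consolidated.safetensors
```

That keeps local disk usage down for Transformers inference, though it obviously doesn't shrink the repo itself.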

Thanks in advance!