facebook/sapiens-pretrain-0.6b-torchscript

Image Feature Extraction


	---
	language: en
	license: cc-by-nc-4.0
	---

	# Sapiens-0.6b-torchscript

	## Model Card for Sapiens

	Sapiens is a family of vision transformers pretrained on 300 million human images at 1024 x 1024 image resolution. The pretrained models, when finetuned for human-centric vision tasks, generalize to in-the-wild conditions.

	## Model Details

	### Model Description
	Sapiens-0.6b natively supports 1K high-resolution inference and is easy to adapt to individual tasks by fine-tuning from weights pretrained on over 300 million in-the-wild human images. The resulting models generalize well to in-the-wild data, even when labeled data is scarce or entirely synthetic. The simple model design also scales: performance across tasks improves as the parameter count grows from 0.3 to 2 billion. Sapiens consistently surpasses existing baselines across various human-centric benchmarks.

	- Developed by: Meta
	- Model type: Vision Transformers
	- License: Creative Commons Attribution-NonCommercial 4.0
	- Model Size: 0.6b
	- Task: pretrain
	- Format: torchscript
	- File: sapiens_0.6b_epoch_1600_torchscript.pt2
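
The checkpoint listed above can be fetched programmatically. The following is a minimal sketch, assuming the `huggingface_hub` package is installed and that the repository id is `facebook/sapiens-pretrain-0.6b-torchscript` (taken from this page's title); the helper name `download_checkpoint` is hypothetical and not part of the Sapiens codebase.

```python
# Repository id and file name as they appear on this model card.
REPO_ID = "facebook/sapiens-pretrain-0.6b-torchscript"
FILENAME = "sapiens_0.6b_epoch_1600_torchscript.pt2"


def download_checkpoint(repo_id: str = REPO_ID, filename: str = FILENAME) -> str:
    """Download the checkpoint into the local Hugging Face cache and return its path.

    Hypothetical helper for illustration; it only wraps hf_hub_download.
    """
    # Imported lazily so the constants above can be used without the package.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=repo_id, filename=filename)


# Example (requires network access):
# checkpoint_path = download_checkpoint()
```

The returned path points into the local cache, so repeated calls should not download the file again.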

	### Model Sources

	- Repository: [https://github.com/facebookresearch/sapiens](https://github.com/facebookresearch/sapiens)
	- Paper: [https://arxiv.org/abs/2408.12569](https://arxiv.org/abs/2408.12569)

	## Uses

	The pretrained 0.6b model can be used for feature extraction, for fine-tuning on downstream tasks, or as a starting point for training new models.
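
Feature extraction with the TorchScript file might look like the sketch below. It is illustrative rather than the official pipeline, and it rests on several assumptions not stated in this card: that the `.pt2` file loads with `torch.jit.load`, that the model takes a `1 x 3 x 1024 x 1024` float tensor, and that inputs are normalized with the per-channel mean and standard deviation shown (on the 0-255 scale). The function names `preprocess` and `extract_features` are hypothetical. Check the repository linked under Model Sources for the reference preprocessing.

```python
import numpy as np

# ASSUMPTION: normalization constants on the 0-255 scale; not given in this card.
MEAN = np.array([123.5, 116.5, 103.5], dtype=np.float32)
STD = np.array([58.5, 57.0, 57.5], dtype=np.float32)
INPUT_SIZE = 1024  # the card states a 1024 x 1024 pretraining resolution


def preprocess(image_rgb: np.ndarray) -> np.ndarray:
    """Turn an HxWx3 uint8 RGB image (already resized) into a 1x3xHxW float32 batch."""
    if image_rgb.shape != (INPUT_SIZE, INPUT_SIZE, 3):
        raise ValueError(
            f"expected shape ({INPUT_SIZE}, {INPUT_SIZE}, 3), got {image_rgb.shape}"
        )
    x = (image_rgb.astype(np.float32) - MEAN) / STD  # per-channel normalization
    x = x.transpose(2, 0, 1)[None]                   # HWC -> 1xCxHxW
    return np.ascontiguousarray(x)


def extract_features(checkpoint_path: str, batch: np.ndarray, device: str = "cpu"):
    """Run the TorchScript model on a preprocessed batch and return its raw output.

    ASSUMPTION: the checkpoint is loadable with torch.jit.load; the structure
    of the output (tensor, tuple, ...) is not documented in this card.
    """
    import torch  # imported lazily so preprocess() works without PyTorch

    model = torch.jit.load(checkpoint_path, map_location=device).eval()
    with torch.inference_mode():
        return model(torch.from_numpy(batch).to(device))


# Example, assuming the checkpoint has been downloaded locally:
# image = np.zeros((1024, 1024, 3), dtype=np.uint8)  # stand-in for a real image
# features = extract_features("sapiens_0.6b_epoch_1600_torchscript.pt2", preprocess(image))
```

Keeping preprocessing in NumPy and loading the model lazily makes it easy to check input shapes before committing memory to a 0.6b-parameter model.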