	---
	language: en
	license: cc-by-nc-4.0
	---

	# Sapiens-1b-torchscript

	## Model Card for Sapiens

	Sapiens is a family of vision transformers pretrained on 300 million human images at 1024 x 1024 image resolution. The pretrained models, when finetuned for human-centric vision tasks, generalize to in-the-wild conditions.

	## Model Details

	### Model Description
	Sapiens models natively support 1K high-resolution inference and are easy to adapt to individual tasks by fine-tuning models pretrained on over 300 million in-the-wild human images. The resulting models generalize well to in-the-wild data, even when labeled data is scarce or entirely synthetic. Our simple model design also scales: performance across tasks improves as the parameter count grows from 0.3 to 2 billion. Sapiens consistently surpasses existing baselines across various human-centric benchmarks.

	- Developed by: Meta
	- Model type: Vision Transformer
	- License: Creative Commons Attribution-NonCommercial 4.0
	- Model Size: 1b
	- Task: pretrain
	- Format: torchscript
	- File: sapiens_1b_epoch_173_torchscript.pt2



	### Model Sources

	- Repository: [https://github.com/facebookresearch/sapiens](https://github.com/facebookresearch/sapiens)
	- Paper: [https://arxiv.org/abs/2408.12569](https://arxiv.org/abs/2408.12569)

	## Uses

	The pretrained 1b model can be used for feature extraction, for fine-tuning on downstream tasks, or as a starting point for training new models.
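
As a rough illustration of the feature-extraction use case, the sketch below loads the TorchScript file with `torch.jit.load` and runs one image through it. Several details are assumptions rather than facts from this card: the local path to the checkpoint, the 1024 x 1024 input size (chosen to match the pretraining resolution stated above), the normalization statistics, and the helper names `load_model`, `preprocess`, and `extract_features`. The structure of the model output is also not documented here, so inspect it before relying on it.

```python
import torch
import torch.nn.functional as F

# File name taken from this card; its location on disk is an assumption.
CHECKPOINT = "sapiens_1b_epoch_173_torchscript.pt2"
# Assumed input size, matching the 1024 x 1024 pretraining resolution.
INPUT_SIZE = (1024, 1024)
# Assumed per-channel RGB statistics on the 0-255 scale;
# check the repository configs before relying on them.
MEAN = (123.5, 116.5, 103.5)
STD = (58.5, 57.0, 57.5)


def load_model(path: str = CHECKPOINT, device: str = "cpu") -> torch.jit.ScriptModule:
    """Load a TorchScript archive and switch it to inference mode."""
    model = torch.jit.load(path, map_location=device)
    return model.eval()


def preprocess(image: torch.Tensor) -> torch.Tensor:
    """Turn an H x W x 3 uint8 RGB image into a normalized 1 x 3 x H' x W' float batch."""
    x = image.permute(2, 0, 1).unsqueeze(0).float()
    x = F.interpolate(x, size=INPUT_SIZE, mode="bilinear", align_corners=False)
    mean = torch.tensor(MEAN).view(1, 3, 1, 1)
    std = torch.tensor(STD).view(1, 3, 1, 1)
    return (x - mean) / std


@torch.inference_mode()
def extract_features(model: torch.jit.ScriptModule, image: torch.Tensor, device: str = "cpu"):
    """Run one image through the model and return its raw output unchanged."""
    batch = preprocess(image).to(device)
    return model(batch)
```

With the checkpoint downloaded next to the script, a call such as `extract_features(load_model(), image)` returns whatever the exported module produces for an `H x W x 3` uint8 tensor `image`. Pass `device="cuda"` to both helpers to run on a GPU.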