Update README.md (#2)
- Update README.md (03264fce1fb06e8e03e95f2d80e578b2da0667d6)
- Update README.md (57e3f36085a4c38e0fc67cb4ff092483292b589f)

README.md (CHANGED):
---
tags:
- computer_vision
- pose_estimation
- animal_pose_estimation
- deeplabcut
---

# MODEL CARD:

## Model Details

• SuperAnimal-Quadruped model developed by the [M.W. Mathis Lab](http://www.mackenziemathislab.org/) in 2023, trained to predict quadruped pose from images. Please see [Shaokai Ye et al. 2023](https://arxiv.org/abs/2203.07436) for details.

• The model is an HRNet-w32 trained on our Quadruped-80K dataset.

• It was trained within the DeepLabCut framework. Full training details can be found in Ye et al. 2023. You can use this model with our lightweight loading package, [DLCLibrary](https://github.com/DeepLabCut/DLClibrary). Here is an example usage:
```python
from pathlib import Path

from dlclibrary import download_huggingface_model

# Create a target directory (any path works) and download the model weights into it.
model_dir = Path("./superanimal_quadruped_model")
model_dir.mkdir()
download_huggingface_model("superanimal_quadruped", model_dir)
```
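Once downloaded, the weights can be run on video from within DeepLabCut itself. A minimal sketch, assuming a DeepLabCut version (≥ 2.3) that ships `video_inference_superanimal`; the optional keyword arguments (video adaptation, scale lists, cutoffs) differ between releases, and the video path below is only a placeholder:

```python
import deeplabcut

# Sketch: run SuperAnimal-Quadruped on a video without any project-specific training.
# The video path is a placeholder; check the DeepLabCut docs for your installed
# version for the optional arguments (e.g., video adaptation) this call accepts.
deeplabcut.video_inference_superanimal(
    ["/path/to/quadruped_video.mp4"],
    "superanimal_quadruped",
)
```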

## Intended Use

• Intended to be used for pose estimation of quadruped images taken from a side view. The model serves as a better starting point than ImageNet weights on downstream datasets such as AP-10K.

• Intended for academic and research professionals working in fields related to animal behavior, such as neuroscience and ecology.

• Not suitable as a zero-shot model for applications that require high keypoint precision, but it can be fine-tuned with minimal data to reach human-level accuracy. Also not suitable for videos that look dramatically different from those we show in the paper.

## Factors

• Based on the known robustness issues of neural networks, the relevant factors include the lighting, contrast, and resolution of the video frames. The presence of other objects might also cause false detections and erroneous keypoints. When two or more animals are extremely close, the top-down detector may find only one animal if the model is used without further fine-tuning or a method such as BUCTD (Zhou et al. 2023 ICCV).

## Metrics

• Mean Average Precision (mAP)
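For pose estimation, mAP is computed COCO-style: predictions are matched to ground truth by object keypoint similarity (OKS) rather than box IoU, and precision is averaged over OKS thresholds from 0.50 to 0.95. A minimal sketch of the OKS score for a single animal; the per-keypoint falloff constants `k` below are placeholders, since the official COCO values are defined only for human keypoints:

```python
import numpy as np

def oks(pred_xy, gt_xy, visible, area, k):
    """Object keypoint similarity between one predicted and one ground-truth animal.

    pred_xy, gt_xy : (N, 2) keypoint coordinates in pixels
    visible        : (N,) bool, True where the ground-truth keypoint is annotated
    area           : object scale s**2 (e.g., segmentation or bounding-box area)
    k              : (N,) per-keypoint falloff constants (dataset-specific placeholders)
    """
    d2 = np.sum((pred_xy - gt_xy) ** 2, axis=1)       # squared pixel distances
    sim = np.exp(-d2 / (2.0 * area * k**2 + 1e-9))    # per-keypoint similarity in [0, 1]
    return float(sim[visible].mean())                 # average over annotated keypoints
```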

## Evaluation Data

• In the paper we benchmark on AP-10K, AnimalPose, Horse-10, and iRodent using a leave-one-out strategy. Here, we provide the model that has been trained on all datasets (see below); therefore, it should be considered "fine-tuned" on all of the animal training data listed below. This model is meant for production and evaluation in downstream scientific applications.

## Training Data:

The model was trained jointly on the following datasets:
- **AwA-Pose** Quadruped dataset, see full details at (1).
- **AnimalPose** See full details at (2).
- **AcinoSet** See full details at (3).
- **Horse-30** Horse-30 dataset; the benchmark task is called Horse-10. See full details at (4).
- **StanfordDogs** See full details at (5, 6).
- **AP-10K** See full details at (7).
- **iRodent** We utilized the iNaturalist API functions for scraping observations with the taxon ID of Suborder Myomorpha (8). The functions allowed us to filter the large number of observations down to the ones with photos under the CC BY-NC license (a sketch of such a query is shown after this list). The most common types of rodents from the collected observations are Muskrat (Ondatra zibethicus), Brown Rat (Rattus norvegicus), House Mouse (Mus musculus), Black Rat (Rattus rattus), Hispid Cotton Rat (Sigmodon hispidus), Meadow Vole (Microtus pennsylvanicus), Bank Vole (Clethrionomys glareolus), Deer Mouse (Peromyscus maniculatus), White-footed Mouse (Peromyscus leucopus), and Striped Field Mouse (Apodemus agrarius). We then generated segmentation masks over target animals in the data by processing the media through an algorithm we designed that uses a Mask Region-Based Convolutional Neural Network (Mask R-CNN) (9) model with a ResNet-50-FPN backbone (10), pretrained on the COCO datasets (11). The 443 processed images were then manually labeled with both pose annotations and segmentation masks. iRodent data is banked at https://zenodo.org/record/8250392.
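The exact scraping code is not part of this card; as a rough sketch of the kind of iNaturalist query described above, the public v1 REST API can filter observations by taxon and by photo license (the taxon ID below is a placeholder, not the actual Suborder Myomorpha identifier):

```python
import requests

# Sketch of an iNaturalist v1 API query: observations for one taxon, restricted to
# CC BY-NC photos. TAXON_ID is a placeholder, not the real Myomorpha identifier.
TAXON_ID = 0
resp = requests.get(
    "https://api.inaturalist.org/v1/observations",
    params={"taxon_id": TAXON_ID, "photo_license": "cc-by-nc", "per_page": 200},
    timeout=30,
)
resp.raise_for_status()
photo_urls = [p["url"] for obs in resp.json()["results"] for p in obs.get("photos", [])]
```

Similarly, the Mask R-CNN segmentation step could be approximated with the COCO-pretrained ResNet-50-FPN Mask R-CNN that ships with torchvision; this is an illustration of the described pipeline, not the authors' exact code:

```python
import torch
import torchvision

# COCO-pretrained Mask R-CNN with a ResNet-50-FPN backbone, as described above.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT").eval()
image = torch.rand(3, 480, 640)          # stand-in for a scraped photo, values in [0, 1]
with torch.no_grad():
    masks = model([image])[0]["masks"]   # (num_detections, 1, H, W) soft masks
```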

Here is an image with the keypoint guide:

<p align="center">
<img src="https://images.squarespace-cdn.com/content/v1/57f6d51c9f74566f55ecf271/1690988780004-AG00N6OU1R21MZ0AU9RE/modelcard-SAQ.png?format=1500w" width="95%">
</p>

## Ethical Considerations

• No experimental data was collected for this model; all datasets used are cited.

## Caveats and Recommendations

• The model may have reduced accuracy in scenarios with extremely varied lighting conditions or atypical animal characteristics not well represented in the training data.

• Please note that each dataset was labeled by separate labs and separate individuals; therefore, while we map names to a unified pose vocabulary, there will be annotator bias in keypoint placement (see Ye et al. 2023 for our Supplementary Note on annotator bias).

• Note that the dataset is highly diverse across species, but collectively it has more representation of domesticated animals like dogs, cats, horses, and cattle.

• If performance is not as good as you need it to be, we recommend first trying video adaptation (see Ye et al. 2023), or fine-tuning these weights with your own labeling.

## License

Modified MIT.

Copyright 2023 by Mackenzie Mathis, Shaokai Ye, and contributors.

Permission is hereby granted to you (hereafter "LICENSEE") a fully-paid, non-exclusive, and non-transferable license for academic, non-commercial purposes only (hereafter "LICENSE") to use the "MODEL" weights (hereafter "MODEL"), subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software:

This software may not be used to harm any animal deliberately.

LICENSEE acknowledges that the MODEL is a research tool. THE MODEL IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE MODEL OR THE USE OR OTHER DEALINGS IN THE MODEL.

If this license is not appropriate for your application, please contact Prof. Mackenzie W. Mathis ([email protected]) and/or the TTO office at EPFL ([email protected]) for a commercial use license.

Please cite **Ye et al.** if you use this model in your work: https://arxiv.org/abs/2203.07436v2.

## References
1. Prianka Banik, Lin Li, and Xishuang Dong. A novel dataset for keypoint detection of quadruped animals from images. ArXiv, abs/2108.13958, 2021.
2. Jinkun Cao, Hongyang Tang, Haoshu Fang, Xiaoyong Shen, Cewu Lu, and Yu-Wing Tai. Cross-domain adaptation for animal pose estimation. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pages 9497–9506, 2019.
3. Daniel Joska, Liam Clark, Naoya Muramatsu, Ricardo Jericevich, Fred Nicolls, Alexander Mathis, Mackenzie W. Mathis, and Amir Patel. AcinoSet: A 3D pose estimation dataset and baseline models for cheetahs in the wild. 2021 IEEE International Conference on Robotics and Automation (ICRA), pages 13901–13908, 2021.
4. Alexander Mathis, Thomas Biasi, Steffen Schneider, Mert Yuksekgonul, Byron Rogers, Matthias Bethge, and Mackenzie W. Mathis. Pretraining boosts out-of-domain robustness for pose estimation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 1859–1868, 2021.
5. Aditya Khosla, Nityananda Jayadevaprakash, Bangpeng Yao, and Li Fei-Fei. Novel dataset for fine-grained image categorization. In First Workshop on Fine-Grained Visual Categorization, IEEE Conference on Computer Vision and Pattern Recognition, Colorado Springs, CO, June 2011.
6. Benjamin Biggs, Thomas Roddick, Andrew Fitzgibbon, and Roberto Cipolla. Creatures great and SMAL: Recovering the shape and motion of animals from video. In Asian Conference on Computer Vision, pages 3–19. Springer, 2018.
7. Hang Yu, Yufei Xu, Jing Zhang, Wei Zhao, Ziyu Guan, and Dacheng Tao. AP-10K: A benchmark for animal pose estimation in the wild. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2), 2021.
8. iNaturalist. GBIF Occurrence Download. https://doi.org/10.15468/dl.p7nbxt. iNaturalist, July 2020.
9. Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, pages 2961–2969, 2017.
10. Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. Feature pyramid networks for object detection, 2016.
11. Tsung-Yi Lin, Michael Maire, Serge J. Belongie, Lubomir D. Bourdev, Ross B. Girshick, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: Common objects in context. CoRR, abs/1405.0312, 2014.