Tim77777767 committed on
Commit
1e86821
·
1 Parent(s): 75346d6

README.md created

Files changed (2)
  1. .gitattributes +33 -0
  2. README.md +102 -0
.gitattributes CHANGED
@@ -1,3 +1,36 @@
  *.bin filter=lfs diff=lfs merge=lfs -text
  *.pth filter=lfs diff=lfs merge=lfs -text
  *.png filter=lfs diff=lfs merge=lfs -text
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
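Each `.gitattributes` entry above routes files matching a pattern through Git LFS. As a rough illustration of which file names the slash-free patterns would catch, the sketch below uses Python's `fnmatch` on the base name. This only approximates Git's own matching rules, the pattern list is a hand-picked subset of the entries above, and the helper name `is_lfs_tracked` is hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Subset of the slash-free patterns from the .gitattributes diff above.
LFS_PATTERNS = ["*.bin", "*.pth", "*.png", "*.safetensors", "*.tar.*", "*.zip", "*tfevents*"]


def is_lfs_tracked(path, patterns=LFS_PATTERNS):
    """Approximate check: does the file's base name match any LFS pattern?"""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)


for candidate in ["weights/model.safetensors", "data.tar.gz", "README.md"]:
    print(candidate, is_lfs_tracked(candidate))
```

Patterns that contain a slash, such as `saved_model/**/*`, are matched against the full path by Git and are deliberately left out of this sketch.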
README.md ADDED
@@ -0,0 +1,102 @@
+ # SegFormer++
+
+ Paper: [Segformer++: Efficient Token-Merging Strategies for High-Resolution Semantic Segmentation](https://arxiv.org/abs/2405.14467)
+
+ ![image](docs/figures/segmentation.png)
+
+ ![image](docs/figures/pose.png)
+
+ ## Abstract
+
+ Utilizing transformer architectures for semantic segmentation of high-resolution images is hindered by the attention's quadratic computational complexity in the number of tokens. A solution to this challenge involves decreasing the number of tokens through token merging, which has exhibited remarkable enhancements in inference speed, training efficiency, and memory utilization for image classification tasks. In this paper, we explore various token merging strategies within the framework of the SegFormer architecture and perform experiments on multiple semantic segmentation and human pose estimation datasets. Notably, without model re-training, we, for example, achieve an inference acceleration of 61% on the Cityscapes dataset while maintaining the mIoU performance. Consequently, this paper facilitates the deployment of transformer-based architectures on resource-constrained devices and in real-time applications.
+
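To make the abstract's argument concrete, the sketch below merges each 2x2 neighbourhood of a token grid by averaging and counts the query-key pairs of full self-attention before and after. It is a plain-Python illustration under simplifying assumptions, not the paper's or the repository's implementation: the names `merge_2x2` and `attention_pairs` are hypothetical, plain (unreduced) self-attention is assumed, and averaging neighbours is only one simple merging rule.

```python
def attention_pairs(num_tokens):
    """Number of query-key pairs in full self-attention over num_tokens tokens."""
    return num_tokens * num_tokens


def merge_2x2(grid):
    """Average every non-overlapping 2x2 block of a token grid.

    grid is a list of rows; each row is a list of feature vectors
    (lists of floats). Height and width are assumed to be even.
    """
    height, width = len(grid), len(grid[0])
    merged = []
    for r in range(0, height, 2):
        row = []
        for c in range(0, width, 2):
            block = [grid[r][c], grid[r][c + 1], grid[r + 1][c], grid[r + 1][c + 1]]
            # Element-wise mean of the four feature vectors in the block.
            row.append([sum(values) / 4.0 for values in zip(*block)])
        merged.append(row)
    return merged


# Toy 4x4 grid of 2-dimensional tokens (values are arbitrary).
tokens = [[[float(r), float(c)] for c in range(4)] for r in range(4)]
merged = merge_2x2(tokens)

tokens_before = len(tokens) * len(tokens[0])
tokens_after = len(merged) * len(merged[0])
print(tokens_before, attention_pairs(tokens_before))
print(tokens_after, attention_pairs(tokens_after))
```

Cutting the token count by a factor of 4 cuts the number of attention pairs by a factor of 16, which is why merging pays off most on high-resolution inputs.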
+ ## Results and Models
+
+ Memory refers to the VRAM requirements during the training process.
+
+ ### Inference on Cityscapes (MiT-B5)
+
+ The weights of the Segformer (Original) model were used to get the inference results.
+
+ | Method | mIoU | Speed-Up | config | download |
+ |-----------------------------------|------:|---------:|--------------------------------------------------------------------------------------------|----------------------------------------------------------------|
+ | Segformer (Original) | 82.39 | - | [config](mmsegmentation/local_configs/cityscapes/B5/segformer-cityscapes-b5-default.py) | [model](https://mediastore.rz.uni-augsburg.de/get/yzE65lzm6N/) |
+ | Segformer++<sub>HQ</sub> (ours) | 82.31 | 1.61 | [config](mmsegmentation/local_configs/cityscapes/B5/segformer-cityscapes-b5-bsm-hq.py) | [model](https://mediastore.rz.uni-augsburg.de/get/yzE65lzm6N/) |
+ | Segformer++<sub>fast</sub> (ours) | 82.04 | 1.94 | [config](mmsegmentation/local_configs/cityscapes/B5/segformer-cityscapes-b5-bsm-fast.py) | [model](https://mediastore.rz.uni-augsburg.de/get/yzE65lzm6N/) |
+ | Segformer++<sub>2x2</sub> (ours) | 81.96 | 1.90 | [config](mmsegmentation/local_configs/cityscapes/B5/segformer-cityscapes-b5-n2d-2x2.py) | [model](https://mediastore.rz.uni-augsburg.de/get/yzE65lzm6N/) |
+ | Segformer (Downsampling) | 77.31 | 6.51 | [config](mmsegmentation/local_configs/cityscapes/B5/segformer-cityscapes-b5-downsample.py) | [model](https://mediastore.rz.uni-augsburg.de/get/yzE65lzm6N/) |
+
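The "61% inference acceleration" quoted in the abstract and the 1.61 in the Speed-Up column appear to describe the same quantity. The sketch below shows the conversion, assuming Speed-Up is the throughput ratio relative to Segformer (Original); the README does not define the column explicitly, and the helper names are hypothetical. The numbers are copied from the inference table above.

```python
BASELINE_MIOU = 82.39  # Segformer (Original) on Cityscapes

# (mIoU, Speed-Up) pairs from the inference table.
RESULTS = {
    "Segformer++ HQ": (82.31, 1.61),
    "Segformer++ fast": (82.04, 1.94),
    "Segformer++ 2x2": (81.96, 1.90),
    "Segformer (Downsampling)": (77.31, 6.51),
}


def acceleration_percent(speed_up):
    """Throughput gain over the baseline, in percent."""
    return (speed_up - 1.0) * 100.0


def relative_runtime(speed_up):
    """Time per image as a fraction of the baseline's time per image."""
    return 1.0 / speed_up


for method, (miou, speed_up) in RESULTS.items():
    print(
        f"{method}: {acceleration_percent(speed_up):.0f}% acceleration, "
        f"{relative_runtime(speed_up):.2f}x baseline runtime, "
        f"{miou - BASELINE_MIOU:+.2f} mIoU"
    )
```

Read this way, the HQ variant trades 0.08 mIoU for the 61% acceleration, while plain downsampling is far faster but loses about 5 mIoU.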
+ ### Training on Cityscapes (MiT-B5)
+
+ | Method | mIoU | Speed-Up | Memory (GB) | config | download |
+ |-----------------------------------|------:|---------:|------------:|--------------------------------------------------------------------------------------------|----------------------------------------------------------------|
+ | Segformer (Original) | 82.39 | - | 48.3 | [config](mmsegmentation/local_configs/cityscapes/B5/segformer-cityscapes-b5-default.py) | [model](https://mediastore.rz.uni-augsburg.de/get/yzE65lzm6N/) |
+ | Segformer++<sub>HQ</sub> (ours) | 82.19 | 1.40 | 34.0 | [config](mmsegmentation/local_configs/cityscapes/B5/segformer-cityscapes-b5-bsm-hq.py) | [model](https://mediastore.rz.uni-augsburg.de/get/i8fY8uXJrV/) |
+ | Segformer++<sub>fast</sub> (ours) | 81.77 | 1.55 | 30.5 | [config](mmsegmentation/local_configs/cityscapes/B5/segformer-cityscapes-b5-bsm-fast.py) | [model](https://mediastore.rz.uni-augsburg.de/get/cmG974iAxt/) |
+ | Segformer++<sub>2x2</sub> (ours) | 82.38 | 1.63 | 31.1 | [config](mmsegmentation/local_configs/cityscapes/B5/segformer-cityscapes-b5-n2d-2x2.py) | [model](https://mediastore.rz.uni-augsburg.de/get/p0uMKbw531/) |
+ | Segformer (Downsampling) | 79.24 | 2.95 | 10.0 | [config](mmsegmentation/local_configs/cityscapes/B5/segformer-cityscapes-b5-downsample.py) | [model](https://mediastore.rz.uni-augsburg.de/get/73zkKSO21t/) |
+
+ ### Training on ADE20K (640x640) (MiT-B5)
+
+ | Method | mIoU | Speed-Up | Memory (GB) | config | download |
+ |-----------------------------------|------:|---------:|------------:|---------------------------------------------------------------------------------------|----------------------------------------------------------------|
+ | Segformer (Original) | 49.72 | - | 33.7 | [config](mmsegmentation/local_configs/ade20k/B5/segformer-ade20k640-b5-default.py) | [model](https://mediastore.rz.uni-augsburg.de/get/nKEjUHNAfK/) |
+ | Segformer++<sub>HQ</sub> (ours) | 49.77 | 1.15 | 29.2 | [config](mmsegmentation/local_configs/ade20k/B5/segformer-ade20k640-b5-bsm-hq.py) | [model](https://mediastore.rz.uni-augsburg.de/get/Odyie8usgj/) |
+ | Segformer++<sub>fast</sub> (ours) | 49.10 | 1.20 | 28.0 | [config](mmsegmentation/local_configs/ade20k/B5/segformer-ade20k640-b5-bsm-fast.py) | [model](https://mediastore.rz.uni-augsburg.de/get/K0IGkx4O2s/) |
+ | Segformer++<sub>2x2</sub> (ours) | 49.35 | 1.26 | 27.2 | [config](mmsegmentation/local_configs/ade20k/B5/segformer-ade20k640-b5-n2d-2x2.py) | [model](https://mediastore.rz.uni-augsburg.de/get/w5_Pxx4Q5C/) |
+ | Segformer (Downsampling) | 46.71 | 1.89 | 12.4 | [config](mmsegmentation/local_configs/ade20k/B5/segformer-ade20k640-b5-downsample.py) | [model](https://mediastore.rz.uni-augsburg.de/get/dFVvZQL6iL/) |
+
+ ### Training on JBD
+
+ | Method | PCK@0.1 | PCK@0.05 | Speed-Up | Memory (GB) | config | download |
+ |-----------------------------------|--------:|---------:|---------:|------------:|---------------------------------------------------------------------|----------------------------------------------------------------|
+ | Segformer (Original) | 95.20 | 90.65 | - | 40.0 | [config](mmpose/local_configs/jbd/B5/segformer-jump-b5-default.py) | [model](https://mediastore.rz.uni-augsburg.de/get/psolrWXLLp/) |
+ | Segformer++<sub>HQ</sub> (ours) | 95.18 | 90.51 | 1.19 | 36.0 | [config](mmpose/local_configs/jbd/B5/segformer-jump-b5-bsm-hq.py) | [model](https://mediastore.rz.uni-augsburg.de/get/jx1eyecMLF/) |
+ | Segformer++<sub>fast</sub> (ours) | 94.58 | 89.87 | 1.25 | 34.6 | [config](mmpose/local_configs/jbd/B5/segformer-jump-b5-bsm-fast.py) | [model](https://mediastore.rz.uni-augsburg.de/get/K0IGkx4O2s/) |
+ | Segformer++<sub>2x2</sub> (ours) | 95.17 | 90.16 | 1.27 | 33.4 | [config](mmpose/local_configs/jbd/B5/segformer-jump-b5-n2d-2x2.py) | [model](https://mediastore.rz.uni-augsburg.de/get/HumKbSB1vI/) |
+
+ ### Training on MS COCO
+
+ | Method | PCK@0.1 | PCK@0.05 | Speed-Up | Memory (GB) | config | download |
+ |-----------------------------------|--------:|---------:|---------:|------------:|----------------------------------------------------------------------|----------------------------------------------------------------|
+ | Segformer (Original) | 95.16 | 87.61 | - | 13.5 | [config](mmpose/local_configs/coco/B5/segformer-coco-b5-default.py) | [model](https://mediastore.rz.uni-augsburg.de/get/ZOgj2NmQLy/) |
+ | Segformer++<sub>HQ</sub> (ours) | 94.97 | 87.35 | 0.97 | 13.1 | [config](mmpose/local_configs/coco/B5/segformer-coco-b5-bsm-hq.py) | [model](https://mediastore.rz.uni-augsburg.de/get/oAH5IlPxG8/) |
+ | Segformer++<sub>fast</sub> (ours) | 95.02 | 87.37 | 0.99 | 12.9 | [config](mmpose/local_configs/coco/B5/segformer-coco-b5-bsm-fast.py) | [model](https://mediastore.rz.uni-augsburg.de/get/3E2mMNLAAn/) |
+ | Segformer++<sub>2x2</sub> (ours) | 94.98 | 87.36 | 1.24 | 12.3 | [config](mmpose/local_configs/coco/B5/segformer-coco-b5-n2d-2x2.py) | [model](https://mediastore.rz.uni-augsburg.de/get/rzlgKC5XLc/) |
+
+ ## Install SegFormer++ without MMSegmentation/MMPose
+
+ **Step 0.** Prerequisites
+
+ - PyTorch: 2.3 (CUDA 12.1) (older versions should also work)
+
+ **Step 1.** Clone the repository
+
+     git clone https://huggingface.co/TimM77/SegformerPlusPlus
+
+ **Step 2.** Install the required packages
+
+ ```shell
+ cd SegformerPlusPlus
+ pip install .
+ ```
+
+ **Step 3.** Run SegFormer++
+
+ Run the default SegFormer++ with:
+
+     python3 -m segformer_plusplus.start_cityscape_benchmark
+
+ Run it with custom parameters:
+
+     python3 -m segformer_plusplus.start_cityscape_benchmark --backbone [b1-b5] --head [bsm_hq, bsm_fast, n2d_2x2] --checkpoint [Path/To/Checkpoint]
+
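When scripting several benchmark runs, it can help to assemble the Step 3 command programmatically. The sketch below builds the argument list from the options listed in the README. It assumes that `[b1-b5]` means the five values `b1` through `b5` and that omitted options fall back to the module's defaults. The helper `build_benchmark_command` is hypothetical and not part of the package.

```python
import sys

# Option values as listed in the README; "b1-b5" is read as b1..b5 (assumption).
BACKBONES = ("b1", "b2", "b3", "b4", "b5")
HEADS = ("bsm_hq", "bsm_fast", "n2d_2x2")
MODULE = "segformer_plusplus.start_cityscape_benchmark"


def build_benchmark_command(backbone=None, head=None, checkpoint=None):
    """Return an argv list for the benchmark module; None leaves an option unset."""
    if backbone is not None and backbone not in BACKBONES:
        raise ValueError(f"backbone must be one of {BACKBONES}, got {backbone!r}")
    if head is not None and head not in HEADS:
        raise ValueError(f"head must be one of {HEADS}, got {head!r}")

    command = [sys.executable, "-m", MODULE]
    if backbone is not None:
        command += ["--backbone", backbone]
    if head is not None:
        command += ["--head", head]
    if checkpoint is not None:
        command += ["--checkpoint", str(checkpoint)]
    return command


command = build_benchmark_command(backbone="b5", head="bsm_hq")
print(" ".join(command))
```

Once the package is installed, the returned list can be passed to `subprocess.run(command, check=True)`. Validating the option values up front keeps a typo from surfacing only after the model has started loading.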
+ ## Citation
+ ```bibtex
+ @article{kienzle2024segformer++,
+ title={Segformer++: Efficient Token-Merging Strategies for High-Resolution Semantic Segmentation},
+ author={Kienzle, Daniel and Kantonis, Marco and Sch{\"o}n, Robin and Lienhart, Rainer},
+ journal={IEEE International Conference on Multimedia Information Processing and Retrieval (MIPR)},
+ year={2024}
+ }
+ ```