Hervé Bredin
committed on
Commit · db94671
1 Parent(s): e900ca9
feat: rename /paper to /reproducible_research
- README.md +23 -41
- {paper → reproducible_research}/dihard3_custom_split/development.txt +0 -0
- {paper → reproducible_research}/dihard3_custom_split/train.txt +0 -0
- {paper → reproducible_research}/expected_outputs/osd/AMI.development.rttm +0 -0
- {paper → reproducible_research}/expected_outputs/osd/AMI.test.rttm +0 -0
- {paper → reproducible_research}/expected_outputs/osd/DIHARD.development.rttm +0 -0
- {paper → reproducible_research}/expected_outputs/osd/DIHARD.test.rttm +0 -0
- {paper → reproducible_research}/expected_outputs/osd/VoxConverse.development.rttm +0 -0
- {paper → reproducible_research}/expected_outputs/osd/VoxConverse.test.rttm +0 -0
- {paper → reproducible_research}/expected_outputs/rsg/AMI.development.rttm +0 -0
- {paper → reproducible_research}/expected_outputs/rsg/AMI.test.rttm +0 -0
- {paper → reproducible_research}/expected_outputs/rsg/DIHARD.development.rttm +0 -0
- {paper → reproducible_research}/expected_outputs/rsg/DIHARD.test.rttm +0 -0
- {paper → reproducible_research}/expected_outputs/rsg/VoxConverse.development.rttm +0 -0
- {paper → reproducible_research}/expected_outputs/vad/AMI.development.rttm +0 -0
- {paper → reproducible_research}/expected_outputs/vad/AMI.test.rttm +0 -0
- {paper → reproducible_research}/expected_outputs/vad/DIHARD.development.rttm +0 -0
- {paper → reproducible_research}/expected_outputs/vad/DIHARD.test.rttm +0 -0
- {paper → reproducible_research}/expected_outputs/vad/VoxConverse.development.rttm +0 -0
- {paper → reproducible_research}/expected_outputs/vad/VoxConverse.test.rttm +0 -0
- {paper → reproducible_research}/expected_outputs/vbx/AMI.rttm +0 -0
- {paper → reproducible_research}/expected_outputs/vbx/DIHARD.rttm +0 -0
- {paper → reproducible_research}/expected_outputs/vbx/VoxConverse.rttm +0 -0
- {paper → reproducible_research}/report.pdf +0 -0
README.md
CHANGED
|
@@ -19,13 +19,9 @@ inference: false
|
|
| 19 |
|
| 20 |
# pyannote.audio // speaker segmentation
|
| 21 |
|
| 22 |
-
This model is described in the technical report *[End-to-end speaker segmentation for overlap-aware resegmentation](paper/report.pdf)*, by Hervé Bredin and Antoine Laurent.
|
| 23 |
-
|
| 24 |

|
| 25 |
|
| 26 |
-
|
| 27 |
-
|
| 28 |
-
If you use this model for academic research, please consider citing the `pyannote.audio` library:
|
| 29 |
|
| 30 |
```bibtex
|
| 31 |
@inproceedings{Bredin2020,
|
|
@@ -40,7 +36,8 @@ If you use this model for academic research, please consider citing the `pyannot
|
|
| 40 |
|
| 41 |
## Support
|
| 42 |
|
| 43 |
-
|
| 44 |
|
| 45 |
## Requirements
|
| 46 |
|
|
@@ -90,16 +87,6 @@ pipeline.instantiate(HYPER_PARAMETERS)
|
|
| 90 |
vad = pipeline("audio.wav")
|
| 91 |
```
|
| 92 |
|
| 93 |
-
In order to reproduce results of the [technical report](paper/report.pdf), one should use the following hyper-parameter values:
|
| 94 |
-
|
| 95 |
-
Dataset | `onset` | `offset` | `min_duration_on` | `min_duration_off`
|
| 96 |
-
----------------|---------|----------|-------------------|-------------------
|
| 97 |
-
AMI Mix-Headset | 0.851 | 0.430 | 0.115 | 0.146
|
| 98 |
-
DIHARD3 | 0.855 | 0.292 | 0.036 | 0.001
|
| 99 |
-
VoxConverse | 0.883 | 0.688 | 0.106 | 0.526
|
| 100 |
-
|
| 101 |
-
We also provide the [expected output](tree/main/paper/expected_outputs/vad) on those three datasets in RTTM format.
|
| 102 |
-
|
| 103 |
### Overlapped speech detection
|
| 104 |
|
| 105 |
```python
|
|
@@ -109,16 +96,6 @@ pipeline.instantiate(HYPER_PARAMETERS)
|
|
| 109 |
osd = pipeline("audio.wav")
|
| 110 |
```
|
| 111 |
|
| 112 |
-
In order to reproduce results of the [technical report](paper/report.pdf), one should use the following hyper-parameter values:
|
| 113 |
-
|
| 114 |
-
Dataset | `onset` | `offset` | `min_duration_on` | `min_duration_off`
|
| 115 |
-
----------------|---------|----------|-------------------|-------------------
|
| 116 |
-
AMI Mix-Headset | 0.552 | 0.311 | 0.131 | 0.180
|
| 117 |
-
DIHARD3 | 0.564 | 0.264 | 0.158 | 0.080
|
| 118 |
-
VoxConverse | 0.617 | 0.387 | 0.367 | 0.334
|
| 119 |
-
|
| 120 |
-
We also provide the [expected output](tree/main/paper/expected_outputs/osd) on those three datasets in RTTM format.
|
| 121 |
-
|
| 122 |
### Resegmentation
|
| 123 |
|
| 124 |
```python
|
|
@@ -126,27 +103,32 @@ from pyannote.audio.pipelines import Resegmentation
|
|
| 126 |
pipeline = Resegmentation(segmentation="pyannote/segmentation",
|
| 127 |
diarization="baseline")
|
| 128 |
pipeline.instantiate(HYPER_PARAMETERS)
|
| 129 |
```
|
| 130 |
|
| 131 |
-
|
| 132 |
|
| 133 |
-
|
| 134 |
----------------|---------|----------|-------------------|-------------------
|
| 135 |
AMI Mix-Headset | 0.542 | 0.527 | 0.044 | 0.705
|
| 136 |
DIHARD3 | 0.592 | 0.489 | 0.163 | 0.182
|
| 137 |
VoxConverse | 0.537 | 0.724 | 0.410 | 0.563
|
| 138 |
|
| 139 |
-
|
| 140 |
-
|
| 141 |
-
[VBx RTTM files](tree/main/paper/expected_outputs/vbx) are also provided in this repository for convenience:
|
| 142 |
-
|
| 143 |
-
```python
|
| 144 |
-
from pyannote.database.utils import load_rttm
|
| 145 |
-
vbx = load_rttm("paper/expected_outputs/vbx/DIHARD.rttm")
|
| 146 |
-
resegmented_vbx = pipeline({"audio": "DH_EVAL_000.wav",
|
| 147 |
-
"baseline": vbx["DH_EVAL_000"]})
|
| 148 |
-
```
|
| 149 |
-
|
| 150 |
-
|
| 151 |
-
We also provide the [expected output](tree/main/paper/expected_outputs/rsg) on those three datasets in RTTM format.
|
| 152 |
|
|
|
|
| 19 |
|
| 20 |
# pyannote.audio // speaker segmentation
|
| 21 |
|
| 22 |

|
| 23 |
|
| 24 |
+
Model from *[End-to-end speaker segmentation for overlap-aware resegmentation](reproducible_research/report.pdf)*, by Hervé Bredin and Antoine Laurent.
|
| 25 |
|
| 26 |
```bibtex
|
| 27 |
@inproceedings{Bredin2020,
|
|
|
|
| 36 |
|
| 37 |
## Support
|
| 38 |
|
| 39 |
+
For commercial enquiries and scientific consulting, please contact [me](mailto:[email protected]).
|
| 40 |
+
For [technical questions](https://github.com/pyannote/pyannote-audio/discussions) and [bug reports](https://github.com/pyannote/pyannote-audio/issues), please check [pyannote.audio](https://github.com/pyannote/pyannote-audio) Github repository.
|
| 41 |
|
| 42 |
## Requirements
|
| 43 |
|
|
|
|
| 87 |
vad = pipeline("audio.wav")
|
| 88 |
```
|
| 89 |
|
| 90 |
### Overlapped speech detection
|
| 91 |
|
| 92 |
```python
|
|
|
|
| 96 |
osd = pipeline("audio.wav")
|
| 97 |
```
|
| 98 |
|
| 99 |
### Resegmentation
|
| 100 |
|
| 101 |
```python
|
|
|
|
| 103 |
pipeline = Resegmentation(segmentation="pyannote/segmentation",
|
| 104 |
diarization="baseline")
|
| 105 |
pipeline.instantiate(HYPER_PARAMETERS)
|
| 106 |
+
resegmented_baseline = pipeline({"audio": "audio.wav", "baseline": baseline})
|
| 107 |
+
# where `baseline` should be provided as a pyannote.core.Annotation instance
|
| 108 |
```
|
| 109 |
|
| 110 |
+
## Reproducible research
|
| 111 |
+
|
| 112 |
+
In order to reproduce the results of the paper ["End-to-end speaker segmentation for overlap-aware resegmentation
|
| 113 |
+
"](reproducible_research/report.pdf), use the following hyper-parameters:
|
| 114 |
|
| 115 |
+
Voice activity detection | `onset` | `offset` | `min_duration_on` | `min_duration_off`
|
| 116 |
+
----------------|---------|----------|-------------------|-------------------
|
| 117 |
+
AMI Mix-Headset | 0.851 | 0.430 | 0.115 | 0.146
|
| 118 |
+
DIHARD3 | 0.855 | 0.292 | 0.036 | 0.001
|
| 119 |
+
VoxConverse | 0.883 | 0.688 | 0.106 | 0.526
|
| 120 |
+
|
| 121 |
+
Overlapped speech detection | `onset` | `offset` | `min_duration_on` | `min_duration_off`
|
| 122 |
+
----------------|---------|----------|-------------------|-------------------
|
| 123 |
+
AMI Mix-Headset | 0.552 | 0.311 | 0.131 | 0.180
|
| 124 |
+
DIHARD3 | 0.564 | 0.264 | 0.158 | 0.080
|
| 125 |
+
VoxConverse | 0.617 | 0.387 | 0.367 | 0.334
|
| 126 |
+
|
| 127 |
+
VBx resegmentation | `onset` | `offset` | `min_duration_on` | `min_duration_off`
|
| 128 |
----------------|---------|----------|-------------------|-------------------
|
| 129 |
AMI Mix-Headset | 0.542 | 0.527 | 0.044 | 0.705
|
| 130 |
DIHARD3 | 0.592 | 0.489 | 0.163 | 0.182
|
| 131 |
VoxConverse | 0.537 | 0.724 | 0.410 | 0.563
|
| 132 |
|
| 133 |
+
Expected outputs (and VBx baseline) are also provided in the `/reproducible_research` sub-directories.
|
| 134 |
|
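The three tables added under `## Reproducible research` give one set of values per task and dataset, and the README snippets pass such a set to `pipeline.instantiate(HYPER_PARAMETERS)`. As a minimal sketch, the values can be kept in a nested dictionary and looked up by task and dataset. The dictionary layout, the `vad` / `osd` / `rsg` keys (borrowed from the `expected_outputs` sub-directory names) and the `hyper_parameters_for` helper are assumptions made for illustration; they are not part of `pyannote.audio`.

```python
# Values copied from the "Reproducible research" tables in README.md.
# The nested layout and the task keys are illustrative assumptions.
COLUMNS = ("onset", "offset", "min_duration_on", "min_duration_off")

REPORT_VALUES = {
    "vad": {  # voice activity detection
        "AMI Mix-Headset": (0.851, 0.430, 0.115, 0.146),
        "DIHARD3": (0.855, 0.292, 0.036, 0.001),
        "VoxConverse": (0.883, 0.688, 0.106, 0.526),
    },
    "osd": {  # overlapped speech detection
        "AMI Mix-Headset": (0.552, 0.311, 0.131, 0.180),
        "DIHARD3": (0.564, 0.264, 0.158, 0.080),
        "VoxConverse": (0.617, 0.387, 0.367, 0.334),
    },
    "rsg": {  # VBx resegmentation
        "AMI Mix-Headset": (0.542, 0.527, 0.044, 0.705),
        "DIHARD3": (0.592, 0.489, 0.163, 0.182),
        "VoxConverse": (0.537, 0.724, 0.410, 0.563),
    },
}


def hyper_parameters_for(task: str, dataset: str) -> dict:
    """Return the table row for one task/dataset pair as a dict (hypothetical helper)."""
    try:
        row = REPORT_VALUES[task][dataset]
    except KeyError as error:
        raise KeyError(
            f"no hyper-parameters for task={task!r}, dataset={dataset!r}"
        ) from error
    return dict(zip(COLUMNS, row))


HYPER_PARAMETERS = hyper_parameters_for("vad", "DIHARD3")
print(HYPER_PARAMETERS)
# The result is what the README snippets would pass along:
# pipeline.instantiate(HYPER_PARAMETERS)
```

Keeping the rows as tuples next to a single `COLUMNS` definition mirrors the table layout, which makes it easier to compare the code against the README by eye.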
{paper → reproducible_research}/dihard3_custom_split/development.txt
RENAMED
|
File without changes
|
{paper → reproducible_research}/dihard3_custom_split/train.txt
RENAMED
|
File without changes
|
{paper → reproducible_research}/expected_outputs/osd/AMI.development.rttm
RENAMED
|
File without changes
|
{paper → reproducible_research}/expected_outputs/osd/AMI.test.rttm
RENAMED
|
File without changes
|
{paper → reproducible_research}/expected_outputs/osd/DIHARD.development.rttm
RENAMED
|
File without changes
|
{paper → reproducible_research}/expected_outputs/osd/DIHARD.test.rttm
RENAMED
|
File without changes
|
{paper → reproducible_research}/expected_outputs/osd/VoxConverse.development.rttm
RENAMED
|
File without changes
|
{paper → reproducible_research}/expected_outputs/osd/VoxConverse.test.rttm
RENAMED
|
File without changes
|
{paper → reproducible_research}/expected_outputs/rsg/AMI.development.rttm
RENAMED
|
File without changes
|
{paper → reproducible_research}/expected_outputs/rsg/AMI.test.rttm
RENAMED
|
File without changes
|
{paper → reproducible_research}/expected_outputs/rsg/DIHARD.development.rttm
RENAMED
|
File without changes
|
{paper → reproducible_research}/expected_outputs/rsg/DIHARD.test.rttm
RENAMED
|
File without changes
|
{paper → reproducible_research}/expected_outputs/rsg/VoxConverse.development.rttm
RENAMED
|
File without changes
|
{paper → reproducible_research}/expected_outputs/vad/AMI.development.rttm
RENAMED
|
File without changes
|
{paper → reproducible_research}/expected_outputs/vad/AMI.test.rttm
RENAMED
|
File without changes
|
{paper → reproducible_research}/expected_outputs/vad/DIHARD.development.rttm
RENAMED
|
File without changes
|
{paper → reproducible_research}/expected_outputs/vad/DIHARD.test.rttm
RENAMED
|
File without changes
|
{paper → reproducible_research}/expected_outputs/vad/VoxConverse.development.rttm
RENAMED
|
File without changes
|
{paper → reproducible_research}/expected_outputs/vad/VoxConverse.test.rttm
RENAMED
|
File without changes
|
{paper → reproducible_research}/expected_outputs/vbx/AMI.rttm
RENAMED
|
File without changes
|
{paper → reproducible_research}/expected_outputs/vbx/DIHARD.rttm
RENAMED
|
File without changes
|
{paper → reproducible_research}/expected_outputs/vbx/VoxConverse.rttm
RENAMED
|
File without changes
|
{paper → reproducible_research}/report.pdf
RENAMED
|
File without changes
|
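Because every file listed above moved from `paper/` to `reproducible_research/` without content changes, scripts that still reference the old location (for example `paper/expected_outputs/vbx/DIHARD.rttm` in the removed README snippet) only need their leading directory rewritten. A minimal sketch, assuming repository-relative POSIX-style paths; the `migrate_path` helper is hypothetical and not part of this repository:

```python
from pathlib import PurePosixPath

OLD_ROOT = "paper"
NEW_ROOT = "reproducible_research"


def migrate_path(path: str) -> str:
    """Rewrite a repository-relative path whose first component is the old directory.

    Paths that do not start with OLD_ROOT are returned unchanged.
    """
    parts = PurePosixPath(path).parts
    if parts and parts[0] == OLD_ROOT:
        return str(PurePosixPath(NEW_ROOT, *parts[1:]))
    return path


for old in (
    "paper/report.pdf",
    "paper/expected_outputs/vbx/DIHARD.rttm",
    "README.md",
):
    print(old, "->", migrate_path(old))
```

Matching on the first path component rather than on a string prefix avoids rewriting unrelated names that merely begin with `paper`.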