setup: update most dependencies (#1323)

- setup: switch to torch 2.0+ and lightning 2.0+
- setup: switch to torchaudio 2.0+ and soundfile 0.12+
- setup: switch to pyannote.core 5.0+ and pyannote.database 5.0+
- setup: switch to speechbrain 0.5.14+
- BREAKING(task): rename `Segmentation` task to `SpeakerDiarization`
- BREAKING(task): remove support for variable chunk duration
- BREAKING(pipeline): remove `SpeakerSegmentation` pipeline (in favor of `SpeakerDiarization` pipeline)
- BREAKING(pipeline): remove support `FINCHClustering` and `HiddenMarkovModelClustering`
- BREAKING: drop support for Python 3.7

pyannote · Apr 17, 2023 · 9faa8fc
1 parent 7121a73
commit 9faa8fc
Showing 20 changed files with 75 additions and 859 deletions.
diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml
@@ -2,9 +2,9 @@ name: Tests
 
 on:
   push:
-    branches: [ develop ]
+    branches: [develop]
   pull_request:
-    branches: [ develop ]
+    branches: [develop]
 
 jobs:
   build:
@@ -13,28 +13,28 @@ jobs:
     strategy:
       matrix:
         os: [ubuntu-latest]
-        python-version: [3.7, 3.8, 3.9]
+        python-version: [3.8, 3.9, "3.10"]
     steps:
-    - uses: actions/checkout@v2
-    - name: Set up Python ${{ matrix.python-version }}
-      uses: actions/setup-python@v2
-      with:
-        python-version: ${{ matrix.python-version }}
-    - name: Install libsndfile
-      if: matrix.os == 'ubuntu-latest'
-      run: |
-        sudo apt-get install libsndfile1
-    - name: Install pyannote.audio
-      run: |
+      - uses: actions/checkout@v2
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v2
+        with:
+          python-version: ${{ matrix.python-version }}
+      - name: Install libsndfile
+        if: matrix.os == 'ubuntu-latest'
+        run: |
+          sudo apt-get install libsndfile1
+      - name: Install pyannote.audio
+        run: |
           pip install -e .[dev,testing]
-    - name: Test with pytest
-      run: |
+      - name: Test with pytest
+        run: |
           export PYANNOTE_DATABASE_CONFIG=$GITHUB_WORKSPACE/tests/data/database.yml
           pytest --cov-report=xml
-    - name: Upload coverage to Codecov
-      uses: codecov/codecov-action@v1
-      with:
-        file: ./coverage.xml
-        env_vars: PYTHON
-        name: codecov-pyannote-audio
-        fail_ci_if_error: false
+      - name: Upload coverage to Codecov
+        uses: codecov/codecov-action@v1
+        with:
+          file: ./coverage.xml
+          env_vars: PYTHON
+          name: codecov-pyannote-audio
+          fail_ci_if_error: false
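
The quotes around `"3.10"` in the new matrix are presumably deliberate: an unquoted `3.10` is a YAML float and would be read as `3.1`. A minimal sketch of the effect in plain Python, assuming `float` is a fair stand-in for how a YAML loader resolves an unquoted numeric scalar (the helper name is hypothetical):

```python
def as_unquoted_yaml_number(text: str) -> str:
    """Mimic reading an unquoted numeric scalar as a float, then printing it back.

    Assumption: the YAML loader resolves unquoted `3.10` to a float,
    which is what `float()` models here.
    """
    return str(float(text))


versions = ["3.8", "3.9", "3.10"]
round_tripped = [as_unquoted_yaml_number(v) for v in versions]

# "3.8" and "3.9" survive the round trip; "3.10" does not.
changed = [v for v, r in zip(versions, round_tripped) if v != r]
```

Only `"3.10"` ends up in `changed`, which is why it is the one entry that needs quoting.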
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -2,12 +2,18 @@
 
 ## Version 3.0 (xxxx-xx-xx)
 
-  - setup: switch to pyannote.database 5.0
   - feat(task): add support for label scope in speaker diarization task (from pyannote.database 5.0)
   - feat(task): add support for missing classes in multi-label segmentation task (from pyannote.database 5.0)
   - improve(task): load metadata as tensors rather than pyannote.core instances
+  - setup: switch to torch 2.0+ and lightning 2.0+
+  - setup: switch to torchaudio 2.0+ and soundfile 0.12+
+  - setup: switch to pyannote.core 5.0+ and pyannote.database 5.0+
+  - setup: switch to speechbrain 0.5.14+
   - BREAKING(task): rename `Segmentation` task to `SpeakerDiarization`
   - BREAKING(task): remove support for variable chunk duration
+  - BREAKING(pipeline): remove `SpeakerSegmentation` pipeline (in favor of `SpeakerDiarization` pipeline)
+  - BREAKING(pipeline): remove support `FINCHClustering` and `HiddenMarkovModelClustering`
+  - BREAKING: drop support for Python 3.7
 
 ## Version 2.1.1 (2022-10-27)
 

diff --git a/README.md b/README.md
@@ -30,31 +30,21 @@ for turn, _, speaker in diarization.itertracks(yield_label=True):
 # ...
 ```
 
-## What's new in `pyannote.audio` 2.x?
+## Highlights
 
-For version 2.x of `pyannote.audio`, [I](https://herve.niderb.fr) decided to rewrite almost everything from scratch.
-Highlights of this release are:
-
-- :exploding_head: much better performance (see [Benchmark](#benchmark))
-- :snake: Python-first API
 - :hugs: pretrained [pipelines](https://hf.co/models?other=pyannote-audio-pipeline) (and [models](https://hf.co/models?other=pyannote-audio-model)) on [:hugs: model hub](https://huggingface.co/pyannote)
+- :exploding_head: state-of-the-art performance (see [Benchmark](#benchmark))
+- :snake: Python-first API
 - :zap: multi-GPU training with [pytorch-lightning](https://pytorchlightning.ai/)
 - :control_knobs: data augmentation with [torch-audiomentations](https://github.com/asteroid-team/torch-audiomentations)
-- :boom: [Prodigy](https://prodi.gy/) recipes for model-assisted audio annotation
 
 ## Installation
 
-Only Python 3.8+ is officially supported (though it might work with Python 3.7)
+Only Python 3.8+ is supported.
 
 ```bash
-conda create -n pyannote python=3.8
-conda activate pyannote
-
-# pytorch 1.11 is required for speechbrain compatibility
-# (see https://pytorch.org/get-started/previous-versions/#v1110)
-conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 -c pytorch
-
-pip install pyannote.audio
+# install from develop branch
+pip install -qq https://github.com/pyannote/pyannote-audio/archive/refs/heads/develop.zip
 ```
 
 ## Documentation

diff --git a/doc/source/index.rst b/doc/source/index.rst
@@ -9,6 +9,9 @@ Installation
 
 ::
 
+  $ conda create -n pyannote python=3.10
+  $ conda activate pyannote
+  $ conda install pytorch torchvision torchaudio -c pytorch
   $ pip install pyannote.audio
 
 

diff --git a/pyannote/audio/cli/train.py b/pyannote/audio/cli/train.py
@@ -26,7 +26,7 @@
 
 import hydra
 from hydra.utils import instantiate
-from lightning_lite.utilities.seed import seed_everything
+from lightning.pytorch import seed_everything
 from omegaconf import DictConfig, OmegaConf
 
 # from pyannote.audio.core.callback import GraduallyUnfreeze
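
Code outside this repository that still has to run against both the old and the new import location of `seed_everything` could resolve it lazily. The sketch below is one way to do that, not something this commit does: `import_first` and `SEED_EVERYTHING_CANDIDATES` are hypothetical names, and only the two module paths come from the diff above.

```python
import importlib
from typing import Any, Iterable, Tuple


def import_first(candidates: Iterable[Tuple[str, str]]) -> Any:
    """Return the first attribute importable from a list of (module, attribute) pairs."""
    errors = []
    for module_name, attribute in candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attribute)
        except (ImportError, AttributeError) as exc:
            # Remember why this candidate failed and try the next one.
            errors.append(f"{module_name}.{attribute}: {exc}")
    raise ImportError("no candidate could be imported:\n" + "\n".join(errors))


# Both locations are taken from the diff: new first, old as fallback.
SEED_EVERYTHING_CANDIDATES = [
    ("lightning.pytorch", "seed_everything"),
    ("lightning_lite.utilities.seed", "seed_everything"),
]

# Usage (requires one of the two packages to be installed):
#   seed_everything = import_first(SEED_EVERYTHING_CANDIDATES)
#   seed_everything(42)
```

The commit itself simply switches to the new import, which is the simpler choice once lightning 2.0+ is a hard requirement.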

diff --git a/pyannote/audio/cli/train_config/trainer/default.yaml b/pyannote/audio/cli/train_config/trainer/default.yaml
@@ -2,9 +2,7 @@
 _target_: pytorch_lightning.Trainer
 accelerator: auto
 accumulate_grad_batches: 1
-auto_scale_batch_size: False
-auto_lr_find: False
-benchmark: False
+benchmark: null # TODO: automatically set to True when using fixed duration chunks
 deterministic: False
 check_val_every_n_epoch: 1
 devices: auto
@@ -13,7 +11,7 @@ enable_checkpointing: True
 enable_model_summary: True
 enable_progress_bar: True
 fast_dev_run: False
-gradient_clip_val: 0
+gradient_clip_val: null
 gradient_clip_algorithm: norm
 limit_predict_batches: 1.0
 limit_test_batches: 1.0
@@ -25,16 +23,13 @@ max_steps: -1
 max_time: null
 min_epochs: 1
 min_steps: null
-move_metrics_to_cpu: False
-multiple_trainloader_mode: max_size_cycle
 num_nodes: 1
 num_sanity_val_steps: 2
 overfit_batches: 0.0
 precision: 32
 profiler: null
 reload_dataloaders_every_n_epochs: 0
-replace_sampler_ddp: True
+use_distributed_sampler: True # TODO: check what this does exactly
 strategy: null
 sync_batchnorm: False
-track_grad_norm: -1
 val_check_interval: 1.0
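
User-maintained trainer configs written for the previous defaults will carry keys that this diff deletes or renames. A minimal sketch of a migration helper, assuming the config has already been loaded into a plain `dict`; the function and constant names are hypothetical, while the key names are exactly those touched in the hunk above:

```python
from typing import Any, Dict

# Keys deleted from default.yaml in this commit.
REMOVED_KEYS = {
    "auto_scale_batch_size",
    "auto_lr_find",
    "move_metrics_to_cpu",
    "multiple_trainloader_mode",
    "track_grad_norm",
}

# Keys renamed in this commit (old name -> new name).
RENAMED_KEYS = {"replace_sampler_ddp": "use_distributed_sampler"}


def migrate_trainer_config(old: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `old` with removed keys dropped and renamed keys updated."""
    new: Dict[str, Any] = {}
    for key, value in old.items():
        if key in REMOVED_KEYS:
            continue
        new[RENAMED_KEYS.get(key, key)] = value
    # The diff changes the "no clipping" default from 0 to null (None in Python).
    if new.get("gradient_clip_val") == 0:
        new["gradient_clip_val"] = None
    return new


legacy = {
    "auto_lr_find": False,
    "replace_sampler_ddp": True,
    "gradient_clip_val": 0,
    "max_epochs": 10,
}
migrated = migrate_trainer_config(legacy)
```

The helper leaves `benchmark` alone: the diff changes its default from `False` to `null`, but an explicit user value is assumed to be intentional.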
diff --git a/pyannote/audio/core/model.py b/pyannote/audio/core/model.py
@@ -35,8 +35,8 @@
 import torch.optim
 from huggingface_hub import hf_hub_download
 from huggingface_hub.utils import RepositoryNotFoundError
+from lightning_fabric.utilities.cloud_io import _load as pl_load
 from pyannote.core import SlidingWindow
-from lightning_lite.utilities.cloud_io import _load as pl_load
 from pytorch_lightning.utilities.model_summary import ModelSummary
 from semver import VersionInfo
 from torch.utils.data import DataLoader
@@ -523,9 +523,6 @@ def val_dataloader(self) -> DataLoader:
     def validation_step(self, batch, batch_idx):
         return self.task.validation_step(batch, batch_idx)
 
-    def validation_epoch_end(self, outputs):
-        return self.task.validation_epoch_end(outputs)
-
     def configure_optimizers(self):
         return torch.optim.Adam(self.parameters(), lr=1e-3)
 

diff --git a/pyannote/audio/core/task.py b/pyannote/audio/core/task.py
@@ -23,31 +23,23 @@
 
 from __future__ import annotations
 
-from functools import partial
-
-import scipy.special
-
-try:
-    from functools import cached_property
-except ImportError:
-    from backports.cached_property import cached_property
-
 import multiprocessing
 import sys
 import warnings
 from dataclasses import dataclass
 from enum import Enum
+from functools import cached_property, partial
 from numbers import Number
-from typing import Dict, List, Optional, Sequence, Text, Tuple, Union
+from typing import Dict, List, Literal, Optional, Sequence, Text, Tuple, Union
 
 import pytorch_lightning as pl
+import scipy.special
 import torch
 from pyannote.database import Protocol
 from torch.utils.data import DataLoader, Dataset, IterableDataset
 from torch_audiomentations import Identity
 from torch_audiomentations.core.transforms_interface import BaseWaveformTransform
 from torchmetrics import Metric, MetricCollection
-from typing_extensions import Literal
 
 from pyannote.audio.utils.loss import binary_cross_entropy, nll_loss
 from pyannote.audio.utils.protocol import check_protocol
@@ -447,9 +439,6 @@ def val_dataloader(self) -> Optional[DataLoader]:
     def validation_step(self, batch, batch_idx: int):
         return self.common_step(batch, batch_idx, "val")
 
-    def validation_epoch_end(self, outputs):
-        pass
-
     def default_metric(self) -> Union[Metric, Sequence[Metric], Dict[str, Metric]]:
         """Default validation metric"""
         msg = f"Missing '{self.__class__.__name__}.default_metric' method."
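
With Python 3.7 dropped, `cached_property` and `Literal` are both available from the standard library, which is why the `backports.cached_property` fallback and the `typing_extensions` import can go. A small sketch of the two stdlib features, where `ChunkSpec`, `Subset`, and the attribute names are hypothetical and unrelated to the actual `Task` class:

```python
from functools import cached_property
from typing import Literal

# Literal restricts the accepted values for static type checkers (Python 3.8+).
Subset = Literal["train", "development", "test"]


class ChunkSpec:
    """Hypothetical example class, not part of pyannote.audio."""

    def __init__(self, total_duration: float, chunk_duration: float, subset: Subset) -> None:
        self.total_duration = total_duration
        self.chunk_duration = chunk_duration
        self.subset = subset
        self.computations = 0  # counts how often the property body actually runs

    @cached_property
    def num_chunks(self) -> int:
        # Computed on first access, then stored on the instance.
        self.computations += 1
        return int(self.total_duration // self.chunk_duration)


spec = ChunkSpec(total_duration=60.0, chunk_duration=5.0, subset="train")
first = spec.num_chunks
second = spec.num_chunks  # served from the cache, body is not re-run
```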

diff --git a/pyannote/audio/pipelines/__init__.py b/pyannote/audio/pipelines/__init__.py
@@ -24,13 +24,11 @@
 from .overlapped_speech_detection import OverlappedSpeechDetection
 from .resegmentation import Resegmentation
 from .speaker_diarization import SpeakerDiarization
-from .speaker_segmentation import SpeakerSegmentation
 from .voice_activity_detection import VoiceActivityDetection
 
 __all__ = [
     "VoiceActivityDetection",
     "OverlappedSpeechDetection",
-    "SpeakerSegmentation",
     "SpeakerDiarization",
     "Resegmentation",
     "MultiLabelSegmentation",