Dataset Card for NSynth
The NSynth dataset is an audio dataset containing over 300,000 musical notes across over 1000 commercially-sampled instruments, distinguished by pitch, timbre, and envelope. Each recording was made by playing and holding a musical note for three seconds and letting it decay for one second. The collection of four-second recordings ranges over every pitch on a standard MIDI piano (or as many as possible for the given instrument), played at five different velocities. This dataset was created as an attempt to establish a high-quality entry point into audio machine learning, in response to the surge of breakthroughs in generative modeling of images due to the abundance of approachable image datasets (MNIST, CIFAR, ImageNet). NSynth is meant to be both a benchmark for audio ML and a foundation to be expanded on with future datasets.
Dataset Description
Since some instruments are not capable of producing all 88 pitches in the MIDI piano's range, there is an average of 65.4 pitches per instrument. Furthermore, the commercial sample packs occasionally contain duplicate sounds across multiple velocities, leaving an average of 4.75 unique velocities per pitch.
Each of the notes is annotated with three additional pieces of information based on a combination of human evaluation and heuristic algorithms:
- Source: The method of sound production for the note’s instrument. This can be one of acoustic or electronic for instruments that were recorded from acoustic or electronic instruments, respectively, or synthetic for synthesized instruments.
Index | ID |
---|---|
0 | acoustic |
1 | electronic |
2 | synthetic |
- Family: The high-level family of which the note’s instrument is a member. Each instrument is a member of exactly one family. See the complete list of families and their frequencies by source below.
Index | ID |
---|---|
0 | bass |
1 | brass |
2 | flute |
3 | guitar |
4 | keyboard |
5 | mallet |
6 | organ |
7 | reed |
8 | string |
9 | synth_lead |
10 | vocal |
Family | Acoustic | Electronic | Synthetic | Total |
---|---|---|---|---|
Bass | 200 | 8387 | 60368 | 68955 |
Brass | 13760 | 70 | 0 | 13830 |
Flute | 6572 | 35 | 2816 | 9423 |
Guitar | 13343 | 16805 | 5275 | 35423 |
Keyboard | 8508 | 42645 | 3838 | 54991 |
Mallet | 27722 | 5581 | 1763 | 35066 |
Organ | 176 | 36401 | 0 | 36577 |
Reed | 14262 | 76 | 528 | 14866 |
String | 20510 | 84 | 0 | 20594 |
Synth Lead | 0 | 0 | 5501 | 5501 |
Vocal | 3925 | 140 | 6688 | 10753 |
Total | 108978 | 110224 | 86777 | 305979 |
- Qualities: Sonic qualities of the note. See below for descriptions of the qualities, and here for information on co-occurrences between qualities.
Index | ID | Description |
---|---|---|
0 | bright | A large amount of high frequency content and strong upper harmonics. |
1 | dark | A distinct lack of high frequency content, giving a muted and bassy sound. Also sometimes described as ‘Warm’. |
2 | distortion | Waveshaping that produces a distinctive crunchy sound and presence of many harmonics. Sometimes paired with non-harmonic noise. |
3 | fast_decay | Amplitude envelope of all harmonics decays substantially before the ‘note-off’ point at 3 seconds. |
4 | long_release | Amplitude envelope decays slowly after the ‘note-off’ point, sometimes still present at the end of the 4-second sample. |
5 | multiphonic | Presence of overtone frequencies related to more than one fundamental frequency. |
6 | nonlinear_env | Modulation of the sound with a distinct envelope behavior different than the monotonic decrease of the note. Can also include filter envelopes as well as dynamic envelopes. |
7 | percussive | A loud non-harmonic sound at note onset. |
8 | reverb | Room acoustics that were not able to be removed from the original sample. |
9 | tempo-synced | Rhythmic modulation of the sound to a fixed tempo. |
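The index column in the qualities table is what ties a binary `qualities` vector to its list of quality IDs. A minimal sketch of that mapping follows; the helper names `decode_qualities` and `encode_qualities` are hypothetical and not part of the dataset or its loading script.

```python
# Quality IDs in index order, copied from the qualities table above.
QUALITY_IDS = [
    "bright", "dark", "distortion", "fast_decay", "long_release",
    "multiphonic", "nonlinear_env", "percussive", "reverb", "tempo-synced",
]


def decode_qualities(vector):
    """Return the IDs of the qualities flagged with 1 in a binary vector."""
    if len(vector) != len(QUALITY_IDS):
        raise ValueError(f"expected {len(QUALITY_IDS)} entries, got {len(vector)}")
    return [name for name, flag in zip(QUALITY_IDS, vector) if flag == 1]


def encode_qualities(names):
    """Inverse of decode_qualities: build the binary vector from a list of IDs."""
    present = set(names)
    unknown = present - set(QUALITY_IDS)
    if unknown:
        raise ValueError(f"unknown quality IDs: {sorted(unknown)}")
    return [1 if name in present else 0 for name in QUALITY_IDS]


if __name__ == "__main__":
    print(decode_qualities([0, 1, 0, 0, 0, 0, 0, 0, 0, 0]))
```

A vector with only index 1 set decodes to the single ID `dark`, and encoding that list gives back the same vector.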
Dataset Sources
- Homepage: https://magenta.tensorflow.org/datasets/nsynth
- Paper: https://arxiv.org/abs/1704.01279
Uses
This dataset has been widely used to train audio-generation models, some of which have been used by high-profile artists. Another natural application is classification: identifying instruments, or perhaps sonic qualities, which could be useful for tasks such as music recommendation. See here for one such example (a work in progress).
Dataset Structure
The dataset has three splits:
- Train: A training set with 289,205 examples. Instruments do not overlap with valid or test.
- Valid: A validation set with 12,678 examples. Instruments do not overlap with train.
- Test: A test set with 4,096 examples. Instruments do not overlap with train.
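As a quick sanity check, the split sizes listed above can be summed and compared with the 305,979 total in the family table. The sketch below uses only the counts stated in this card; the variable names are illustrative.

```python
# Split sizes as listed above.
SPLIT_SIZES = {"train": 289_205, "valid": 12_678, "test": 4_096}

# Total number of notes across all splits.
total = sum(SPLIT_SIZES.values())

# Fraction of the whole dataset held by each split.
split_share = {name: size / total for name, size in SPLIT_SIZES.items()}

if __name__ == "__main__":
    for name, size in SPLIT_SIZES.items():
        print(f"{name:>5}: {size:>7,} examples ({split_share[name]:.1%})")
    print(f"total: {total:,}")
```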
See below for descriptions of the features.
Feature | Type | Description |
---|---|---|
note | int64 | A unique integer identifier for the note. |
note_str | str | A unique string identifier for the note in the format <instrument_str>-<pitch>-<velocity> . |
instrument | int64 | A unique, sequential identifier for the instrument the note was synthesized from. |
instrument_str | str | A unique string identifier for the instrument this note was synthesized from in the format <instrument_family_str>-<instrument_production_str>-<instrument_name> . |
pitch | int64 | The 0-based MIDI pitch in the range [0, 127]. |
velocity | int64 | The 0-based MIDI velocity in the range [0, 127]. |
sample_rate | int64 | The samples per second for the audio feature. |
qualities | [int64] | A binary vector representing which sonic qualities are present in this note. |
qualities_str | [str] | A list of IDs of the qualities present in this note, selected from the sonic qualities list. |
instrument_family | int64 | The index of the instrument family this instrument is a member of. |
instrument_family_str | str | The ID of the instrument family this instrument is a member of. |
instrument_source | int64 | The index of the sonic source for this instrument. |
instrument_source_str | str | The ID of the sonic source for this instrument. |
audio | {'path': str, 'array': [float], 'sampling_rate': int64} | A dictionary containing a path to the corresponding audio file, a list of audio samples represented as floating point values in the range [-1,1], and the sampling rate. |
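The `pitch` feature is a MIDI note number, so it can be converted to a frequency or a note name when that is more convenient. The sketch below assumes twelve-tone equal temperament with A4 = MIDI 69 = 440 Hz and the convention that MIDI 60 is C4; this card states neither, and some tools label MIDI 60 as C3. The function names are hypothetical.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def _check_pitch(pitch):
    """Reject values outside the MIDI range [0, 127]."""
    if not 0 <= pitch <= 127:
        raise ValueError(f"MIDI pitch must be in [0, 127], got {pitch}")


def midi_to_hz(pitch, a4_hz=440.0):
    """Frequency of a MIDI pitch, assuming equal temperament and A4 = MIDI 69."""
    _check_pitch(pitch)
    return a4_hz * 2.0 ** ((pitch - 69) / 12)


def midi_to_name(pitch):
    """Note name of a MIDI pitch, assuming MIDI 60 is written as C4."""
    _check_pitch(pitch)
    return f"{NOTE_NAMES[pitch % 12]}{pitch // 12 - 1}"


if __name__ == "__main__":
    for pitch in (21, 60, 69, 108):
        print(pitch, midi_to_name(pitch), round(midi_to_hz(pitch), 2))
```

Under these assumptions, the 88 keys of a standard piano correspond to MIDI pitches 21 (A0) through 108 (C8).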
An example instance generated with the loading script (note that this differs from the example instance on the homepage, as the script integrates the audio into the respective JSON files):
{'note': 84147,
'note_str': 'bass_synthetic_033-035-050',
'instrument': 417,
'instrument_str': 'bass_synthetic_033',
'pitch': 35,
'velocity': 50,
'sample_rate': 16000,
'qualities': [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
'qualities_str': ['dark'],
'instrument_family': 0,
'instrument_family_str': 'bass',
'instrument_source': 2,
'instrument_source_str': 'synthetic',
'audio': {'path': '/root/.cache/huggingface/datasets/downloads/extracted/335ef507846fb65b0b87154c22cefd1fe87ea83e8253ef1f72648a3fdfac9a5f/nsynth-test/audio/bass_synthetic_033-035-050.wav',
'array': array([0., 0., 0., ..., 0., 0., 0.]),
'sampling_rate': 16000}
}
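The identifier strings in the example instance can be split back into their fields. In the example, `instrument_str` joins its parts with underscores (`bass_synthetic_033`) while `note_str` appends pitch and velocity with hyphens, so the sketch below splits on those characters from the right. The helper names are hypothetical, and the four-second clip length comes from the dataset description at the top of this card.

```python
def parse_note_str(note_str):
    """Split a note_str such as 'bass_synthetic_033-035-050' into its fields."""
    # The last two hyphen-separated fields are pitch and velocity.
    instrument_str, pitch, velocity = note_str.rsplit("-", 2)
    # Split from the right so that a family containing an underscore
    # (e.g. 'synth_lead') stays in one piece.
    family, source, name = instrument_str.rsplit("_", 2)
    return {
        "instrument_str": instrument_str,
        "instrument_family_str": family,
        "instrument_source_str": source,
        "instrument_name": name,
        "pitch": int(pitch),
        "velocity": int(velocity),
    }


def expected_num_samples(sample_rate, duration_s=4):
    """Number of samples in a clip of the given length (4 s per the description)."""
    return sample_rate * duration_s


if __name__ == "__main__":
    print(parse_note_str("bass_synthetic_033-035-050"))
    print(expected_num_samples(16000))
```

For the example instance, the parsed pitch and velocity are 35 and 50, matching its `pitch` and `velocity` features. A four-second clip at 16,000 samples per second has 64,000 samples, which can be compared against `len(example['audio']['array'])`.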
Potential Shortcomings
There are quite a few family-source pairings with little or no representation. While this is understandable in some cases (there is no acoustic Synth Lead, for instance), it may be problematic in others: there are no synthetic brass, string, or organ samples, and fewer than 100 electronic samples each for brass, flute, reed, and string. This can be particularly troublesome in classification problems, as there may not be sufficient data for a model to correctly distinguish between sources for a particular family of instruments. In music generation, on the other hand, these disparities may yield a bias toward the use of one source over others for a given family.
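The under-represented pairings can be listed directly from the family-by-source table. The sketch below copies the counts from that table and uses the threshold of 100 mentioned above; the function name `sparse_pairings` is illustrative.

```python
SOURCES = ("acoustic", "electronic", "synthetic")

# Note counts per family, in SOURCES order, copied from the family table.
COUNTS = {
    "bass": (200, 8387, 60368),
    "brass": (13760, 70, 0),
    "flute": (6572, 35, 2816),
    "guitar": (13343, 16805, 5275),
    "keyboard": (8508, 42645, 3838),
    "mallet": (27722, 5581, 1763),
    "organ": (176, 36401, 0),
    "reed": (14262, 76, 528),
    "string": (20510, 84, 0),
    "synth_lead": (0, 0, 5501),
    "vocal": (3925, 140, 6688),
}


def sparse_pairings(counts, threshold=100):
    """Return (family, source, count) for every pairing below the threshold."""
    return [
        (family, source, n)
        for family, row in counts.items()
        for source, n in zip(SOURCES, row)
        if n < threshold
    ]


if __name__ == "__main__":
    for family, source, n in sparse_pairings(COUNTS):
        label = "none" if n == 0 else f"{n} notes"
        print(f"{family:<10} {source:<10} {label}")
```

Raising `threshold` gives a quick view of which additional pairings might need resampling or class weighting in a classifier.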
Citation
Jesse Engel, Cinjon Resnick, Adam Roberts, Sander Dieleman, Douglas Eck,
Karen Simonyan, and Mohammad Norouzi. "Neural Audio Synthesis of Musical Notes
with WaveNet Autoencoders." 2017.
BibTeX:
@misc{nsynth2017,
Author = {Jesse Engel and Cinjon Resnick and Adam Roberts and
Sander Dieleman and Douglas Eck and Karen Simonyan and
Mohammad Norouzi},
Title = {Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders},
Year = {2017},
Eprint = {arXiv:1704.01279},
}
Dataset Card Authors
John Gillen