Python Copilot Audio Training using Class with Knowledge Graphs
This dataset is a subset of the matlok python copilot datasets. Please refer to the Multimodal Python Copilot Training Overview for more details on how to use this dataset.
Details
Each class method has a question mp3 and an answer mp3: one voice reads the question and another voice reads the answer. Both mp3s are stored in the parquet dbytes column, alongside the file_path identifier of the associated source code.
- Rows: 135496
- Size: 284.6 GB
- Data type: mp3
- Format: narrated Alpaca-style question and answer pairs, read by two voices
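As a rough sizing aid, the row count and total size above imply an average payload per row. The sketch below assumes "GB" means decimal gigabytes (10^9 bytes) and that the size is spread evenly across rows; neither is stated in this card, so treat the result as an estimate only.

```python
# Figures taken from the list above.
ROWS = 135_496
TOTAL_GB = 284.6

# Assumption: decimal units (1 GB = 1e9 bytes, 1 MB = 1e6 bytes).
total_bytes = TOTAL_GB * 1e9
avg_mb_per_row = total_bytes / ROWS / 1e6

print(f"average size per row: about {avg_mb_per_row:.1f} MB")
```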
Schema
{
"audio_path": "string",
"audio_type": "string",
"dbytes": "string",
"dbytes_len": "int64",
"file_path": "string",
"file_path_len": "int64",
"lang": "string",
"lang_len": "int64",
"recsize": "int64"
}
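A minimal sketch of how a loaded row could be checked against the schema above. The helper name `schema_problems` and the sample values are hypothetical and only for illustration; the column names and types are the ones listed in the schema, with "string" mapped to Python `str` and "int64" mapped to Python `int`.

```python
# Column names and types copied from the schema above.
EXPECTED_SCHEMA = {
    "audio_path": str,
    "audio_type": str,
    "dbytes": str,
    "dbytes_len": int,
    "file_path": str,
    "file_path_len": int,
    "lang": str,
    "lang_len": int,
    "recsize": int,
}


def schema_problems(row):
    """Return a list of human-readable mismatches between row and the schema.

    Hypothetical helper: an empty list means every expected column is
    present and has the expected Python type.
    """
    problems = []
    for column, expected_type in EXPECTED_SCHEMA.items():
        if column not in row:
            problems.append(f"missing column: {column}")
        elif not isinstance(row[column], expected_type):
            actual = type(row[column]).__name__
            problems.append(
                f"{column}: expected {expected_type.__name__}, got {actual}"
            )
    return problems
```

For example, calling `schema_problems(row)` on a row yielded by the dataset returns an empty list when the row matches the schema, which makes it usable as a quick sanity check before any audio decoding.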
How to use the dataset
from datasets import load_dataset
ds = load_dataset("matlok/python-audio-copilot-training-using-class-knowledge-graphs-2024-01-27", data_dir="files")
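Because the full dataset is 284.6 GB, it may be more practical to stream a few rows than to download everything first. The sketch below uses the `streaming=True` option of `load_dataset`; the helper name `first_rows` is hypothetical, and the split name "train" is an assumption that this card does not confirm.

```python
from itertools import islice

REPO_ID = "matlok/python-audio-copilot-training-using-class-knowledge-graphs-2024-01-27"


def first_rows(n=3, split="train"):
    """Stream the first n rows without downloading the whole dataset.

    Assumption: the split is named "train"; adjust if the repository
    uses a different split name.
    """
    # Imported here so that defining the helper does not require the library.
    from datasets import load_dataset

    ds = load_dataset(REPO_ID, data_dir="files", split=split, streaming=True)
    # A streaming dataset is iterable, so islice stops after n rows.
    return list(islice(ds, n))
```

Calling `first_rows(2)` needs network access and returns a list of row dictionaries whose keys should match the schema above.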