How to Create Audio from a .CSV File

Create audio content at scale using AI

How do we do bulk operations with the API?

One of the powers of an API is the ability to programmatically execute at scale.

So one of the inevitable questions that comes up is: "I can do a small example, but how do I do a bigger one?"

In this how-to guide we'll discuss:

  • Loading a CSV into Python
  • Iterating over the elements of that CSV
  • Generating fully mastered audio from the content in that CSV

We use a local CSV in this example, but you could also pull the rows from Airtable, Google Sheets or a similar service via their APIs with not much more work.

You can look at the full example here

Turning a CSV into audio files

So let's take a task that you might have: turning a CSV into lots of mastered audio files. For the purposes of this guide we'll use a freely available CSV from Wikipedia.

For this example we'll use the top songs streamed in the UK. We've lightly edited it to make the file a bit easier to parse.

📘

Learn more about mastering

You can learn more about mastering at Smart Mixing and Mastering.

Save the file locally

You'll need to save the CSV file locally if you're following along at home. Alternatively, you can use a different CSV or another file.

Set up your Python file

First we import audiostack and the other libraries we need.


import csv
import os

import audiostack

Then we'll want to read the CSV file in.

This function is pretty simple: it opens a CSV file and reads its contents.

def read_large_csv(file_path):
    """
    Reads a large CSV file and returns a nested list containing the rows and columns.

    Args:
        file_path (str): The path to the CSV file.

    Returns:
        list: A nested list containing the rows and columns of the CSV file.
    """
    data = []

    # Open the CSV file; newline="" lets the csv module handle line endings itself
    with open(file_path, "r", encoding="utf-8", newline="") as file:
        reader = csv.reader(file)

        # Iterate over each row in the CSV file
        for row in reader:
            data.append(row)

    return data

📘

Alternatives

You may want to use something like a generator if you have a large CSV file, so that rows are read one at a time instead of all at once.
CSV parsing is a deep topic. If you want to read more about it, have a look at pandas; you could also replace this function with pd.read_csv.
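
The generator idea mentioned above could look something like the sketch below. It uses only the standard library; the name iter_csv_rows is our own and is not part of the AudioStack SDK. Rows are yielded one at a time, so the whole file never has to sit in memory.

```python
import csv
from typing import Iterator, List


def iter_csv_rows(file_path: str) -> Iterator[List[str]]:
    """Yield the rows of a CSV file one at a time (hypothetical helper)."""
    # newline="" is what the csv module documentation recommends for file objects
    with open(file_path, "r", encoding="utf-8", newline="") as file:
        for row in csv.reader(file):
            yield row
```

You would then loop with for i, row in enumerate(iter_csv_rows("sample_2.csv")) instead of indexing into a list.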

Creating the scripts

First we'll iterate through the content, which is a list of rows, each row being a list of strings.
Then we'll apply some formatting to the strings. This is a slightly hacky way to get each row into a script.

📘

Learn the AudioStack concepts

You can read more in How does AudioStack work? and A deeper dive into the AudioStack Architecture.

The first recipe we follow is to:

  1. Load in the CSV
  2. Iterate across the content and apply some formatting to the strings
  3. Turn the result into Content

Let's dig into each step and explain a bit more with code.

  1. Load in the CSV

content = read_large_csv("sample_2.csv")
  2. Iterate across the content and apply some formatting to the strings

for i, row in enumerate(content):

    # Join the columns into one string, collapsing line breaks and repeated spaces
    my_str = " ".join(" ".join(map(str, row)).split())

    # Wrap the text in a section tag, built on one line so no cleanup is needed
    script_content = (
        f'<as:section name="main" soundsegment="main">{my_str}</as:section>'
    )
    print(script_content)
  3. Turn these strings into Scripts in the AudioStack API (still inside the loop, once per row)
script = audiostack.Content.Script.create(
    scriptText=script_content,
    scriptName=i,
    moduleName="dynamicVoiceOver",
    projectName="dynamicVoiceOver",
)
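
The string formatting in step 2 can also be pulled out into a small function, which makes it easy to check on its own before any API call is made. This is only a sketch: build_script_text is a name we made up, and the row passed to it at the end is placeholder data, not a line from the real CSV.

```python
from typing import Sequence


def build_script_text(row: Sequence[str]) -> str:
    """Turn one CSV row into AudioStack script text (hypothetical helper)."""
    # Join the columns, then collapse line breaks and repeated spaces
    text = " ".join(" ".join(str(cell) for cell in row).split())
    # Same section tag as in the loop above
    return f'<as:section name="main" soundsegment="main">{text}</as:section>'


# Placeholder row, just to show the shape of the result
print(build_script_text(["1", "Example Song\nTitle", "Example Artist"]))
```

The returned string can be passed straight to scriptText in audiostack.Content.Script.create.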

Create the text to speech

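It's very simple: you pass the script as scriptItem, along with a voice, to Speech.TTS.create. VOICE here is a variable you define yourself, holding the name of the voice you want to use.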

speech = audiostack.Speech.TTS.create(scriptItem=script, voice=VOICE)

You could also look at the other parameters and apply them. You can learn more in Create a text-to-speech resource.

Apply the production step

mix = audiostack.Production.Mix.create(
    speechItem=speech, soundTemplate="sound_affects"
)

We create a mix and apply the soundTemplate sound_affects. You can find other sound templates at library.audiostack.ai, or even upload your own file.

Deliver this file

When you've created your file you'll want to deliver it somewhere. Here we encode the mix as a high quality MP3 and print its URL, from which you can save the file locally.


delivery = audiostack.Delivery.Encoder.encode_mix(
    productionItem=mix,
    preset="mp3_high",
    public=True,
)
print("MP3 file URL:", delivery.url)

The script then repeats these steps for each remaining row, so you end up with one mastered audio file per row in your CSV file.
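
Put together, the whole loop might look like the sketch below. The four audiostack calls are the ones shown in this guide; everything else (the function names, the voice argument, passing the row number to scriptName as a string, and collecting failures instead of stopping on the first error) is our own assumption rather than part of the SDK. The audiostack import sits inside make_audio so the batching logic can be tried out without the package installed.

```python
from typing import Callable, Iterable, List, Sequence, Tuple


def make_audio(script_text: str, name: str, voice: str) -> str:
    """Run one script through the steps in this guide and return the file URL."""
    import audiostack  # imported here so run_batch can be used without it

    script = audiostack.Content.Script.create(
        scriptText=script_text,
        scriptName=name,
        moduleName="dynamicVoiceOver",
        projectName="dynamicVoiceOver",
    )
    speech = audiostack.Speech.TTS.create(scriptItem=script, voice=voice)
    mix = audiostack.Production.Mix.create(
        speechItem=speech, soundTemplate="sound_affects"
    )
    delivery = audiostack.Delivery.Encoder.encode_mix(
        productionItem=mix, preset="mp3_high", public=True
    )
    return delivery.url


def run_batch(
    rows: Iterable[Sequence[str]],
    make_one: Callable[[str, str], str],
) -> Tuple[List[str], List[Tuple[int, str]]]:
    """Call make_one(script_text, name) per row; return (urls, failures)."""
    urls: List[str] = []
    failures: List[Tuple[int, str]] = []
    for i, row in enumerate(rows):
        # Same formatting as earlier: join columns, collapse whitespace, wrap
        text = " ".join(" ".join(row).split())
        script_text = (
            f'<as:section name="main" soundsegment="main">{text}</as:section>'
        )
        try:
            urls.append(make_one(script_text, str(i)))
        except Exception as exc:  # keep going so one bad row doesn't stop the batch
            failures.append((i, str(exc)))
    return urls, failures
```

With the SDK set up and VOICE defined, you could call run_batch(read_large_csv("sample_2.csv"), lambda text, name: make_audio(text, name, VOICE)) and then check the failures list for any rows that need another attempt.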

We could also send this to another system using our API; consider that an exercise for the reader.

Future work

There are a bunch of ways this could be improved, like using different voices or different sound templates, or even experimenting with something more advanced like Reduce length of speech to fit in a target using silence removal and time stretching with pitch preservation.