GUI text-based speech and music editor for creating radio/audio stories





View the Project on GitHub ucbvislab/speecheditor

Adding speech tracks

Adding your own speech tracks to the speech editor requires some additional setup.


You need to get HTK 3.4. First, register here: http://htk.eng.cam.ac.uk/register.shtml

Once you have a username and password, run this in the vagrant box:

$ sh alignment-setup.sh

This will prompt you for your HTK username and password. It will then download and install HTK 3.4 and p2fa-vislab (a wrapper for HTK's HVite).

Acquiring tracks

If you don't have any speech tracks of your own, you can find free readings of classic works of literature on LibriVox, or you can download famous speeches. The nice thing about speeches and classic literature is that you can usually find an accompanying text transcript online too.

The transcript file should be a plain text file that contains a verbatim transcript of the speech. If you want, you can indicate the speaker at the beginning of every line with the name and a colon. You can write the transcript by hand or use an online service like CastingWords or Rev for a minimum of $1 per minute of audio; be sure to request verbatim transcripts.

The speecheditor comes with a sample track, static/speechtracks/short-test-track.mp3. Here's what the accompanying transcript (static/speechtracks/short-test-track.txt) looks like:

Steve: This is a test track that comes with the speech editor.

Amy: You can have more than one person talking in the audio track.

Steve: I haven't aligned it yet so you can get practice using the alignment tool. Once you actually run that alignment you'll be able to view it in the speech editor web interface.
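
If you want to sanity-check a transcript before running the alignment, a small script along these lines may help. It is only a sketch: parse_transcript is a hypothetical helper, not part of the speecheditor, and the rule it uses to spot a speaker name (a short prefix of at most three words before the first colon) is an assumption rather than the rule the alignment tool applies.

```python
def parse_transcript(text):
    """Split transcript text into (speaker, words) pairs.

    Hypothetical helper, not part of speecheditor. A line counts as
    labeled when it starts with a short name (at most three words,
    an assumed heuristic) followed by a colon. Other lines get None.
    """
    turns = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines between speaker turns
        name, sep, rest = line.partition(":")
        if sep and name.strip() and rest.strip() and len(name.split()) <= 3:
            turns.append((name.strip(), rest.strip()))
        else:
            turns.append((None, line))
    return turns


sample = (
    "Steve: This is a test track that comes with the speech editor.\n"
    "\n"
    "Amy: You can have more than one person talking in the audio track.\n"
)

for speaker, words in parse_transcript(sample):
    print(speaker, "->", words)
```

Lines that come back with no speaker are not necessarily wrong, since speaker labels are optional, but an unexpected one can point to a typo in a name or a missing colon.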

Analyzing your tracks

Once you have successfully run alignment-setup.sh, you can analyze your own speech tracks.

Add your new speech track mp3 file at /speecheditor/static/speechtracks/{track-name}.mp3. Also add the text transcript of the speech track at /speecheditor/static/speechtracks/{track-name}.txt.
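
Before starting the analysis, it can be worth confirming that both files are in place under the same track name. The sketch below assumes you run it from the speecheditor directory so that the relative path static/speechtracks resolves; missing_inputs is a hypothetical helper, not something the project provides.

```python
import os


def missing_inputs(track_name, tracks_dir="static/speechtracks"):
    """Return the input files that are absent for track_name.

    Hypothetical helper, not part of speecheditor. tracks_dir is
    assumed to be relative to the current working directory.
    """
    expected = [
        os.path.join(tracks_dir, track_name + ".mp3"),  # the audio
        os.path.join(tracks_dir, track_name + ".txt"),  # the transcript
    ]
    return [path for path in expected if not os.path.isfile(path)]


missing = missing_inputs("short-test-track")
if missing:
    print("Missing input files: " + ", ".join(missing))
else:
    print("Both input files are present.")
```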

Then, in the vagrant box, run

$ cd vagrant
$ python analyze_speech.py {track-name}

Once this finishes (note: it may take a while if the speech is long), your track will show up in the new composition dialog in the speech editor.

So, to align the included test track, run

$ cd vagrant
$ python analyze_speech.py short-test-track

This creates a few files. static/speechtracks/{track-name}.transcript is the text in transcript format. static/speechtracks/{track-name}.json and static/speechtracks/{track-name}-breaths.json are JSON files in the alignment format; they contain the alignment between the text and the speech. The -breaths version also has alignments for detected breaths.
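
To confirm that the analysis produced its output, you could check for the three files named above. This is a minimal sketch, again assuming it runs from the speecheditor directory; expected_outputs is a hypothetical helper. It only verifies that the JSON files parse and makes no assumption about the fields inside the alignment format.

```python
import json
import os


def expected_outputs(track_name, tracks_dir="static/speechtracks"):
    """List the files the analysis step is described as creating.

    Hypothetical helper, not part of speecheditor.
    """
    suffixes = [".transcript", ".json", "-breaths.json"]
    return [os.path.join(tracks_dir, track_name + s) for s in suffixes]


for path in expected_outputs("short-test-track"):
    if not os.path.isfile(path):
        print(path + ": missing")
    elif path.endswith(".json"):
        # Only check that the file is valid JSON; its schema is not assumed.
        with open(path) as handle:
            json.load(handle)
        print(path + ": found, valid JSON")
    else:
        print(path + ": found")
```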

You can now load the speecheditor in the browser and select {track-name} from the list of tracks.