Subtitling with GPAC

Warning

GPAC's wiki has moved to wiki.gpac.io.

This github wiki will no longer be updated.

You can contribute to GPAC's documentation here.

There are plenty of subtitle formats and plenty of types and categories of subtitles. GPAC's support for subtitles is based on the support for the ISO Base Media File Format (ISOBMFF).

The ISOBMFF considers that any data that produces human readable text to be used as subtitles, closed captions, is, well, ...subtitles. It further considers that there are two major classes of subtitle formats: formats which require only text processing capabilities (text decoding, text layout) and formats which also require image processing capabilities. These classes are identified by the Track Handler Type.

The handler type is a code given with 4 ASCII characters. Formats which require only text processing are stored in tracks identified by the handler text. Formats which may require also image processing are identified by the handler subt.

GPAC supports both classes of tracks. The choice of which track handler type to use is not left to the content creator or to the packager. It is decided by the specification defining the carriage of that subtitle format in ISO tracks. Since tracks of a given handler type may be used to store different possible formats, there is a need to identify that format when processing the file at a high level (i.e. without decoding the subtitle frames or before the file is transmitted). This is done by the so-called Sample Entry Code.

This sample entry code is also a 4 ASCII character code. So identifying a track type requires at least the couple ('handler type', 'sample entry code'). In this post and the followings, I'll use the syntax <handler-type>:<sample-entry-code> to identify a track type. Some sample entry codes are very specific to a particular format. Some other are generic formats. In fact, any one can define and register its sample entry code for its specific format. A registry of those identifiers is maintained by the MPEG Registration Authority. Here is a list of subtitle formats and their associated identifiers from the MP4RA site:

Identifier	Description
text:tx3g	Tracks containing samples whose payload is binary data according to the 3GPP Timed Text format defined by 3GPP/MPEG.
sbtl:tx3g	Apple specific identifiers for so-called "Subtitle media". The payload is the same as text:tx3g.
text:text	Apple specific identifiers for so-called "Text media". The payload is similar to text:tx3g and sbtl:tx3g with some differences, and this is not officially registered on MP4RA.
clcp:c608 and clcp:c708	Apple specific identifiers for so-called "Closed Captioning media". Not supported by GPAC (import/export and playback, DASHing may work).This is not officially registered on MP4RA.
text:wvtt	Tracks containing samples whose payload is binary data defined by MPEG that encapsulates W3C WebVTT subtitles.
subt:stpp	Tracks containing samples whose payload are XML documents. This format is defined by MPEG. All samples carry one entire XML document and use the same XML language. Further information stored in the Sample Entry box (such as namespace) and possibly in the XML samples is required to precisely identify the XML languages of those subtitles. This is currently used to carry TTML, SMPTE-TT or EBU-TT (more on GPAC support for EBU-TTD) but may be used by any other XML format. A particular version of this format is adopted by DECE.
text:stxt	Tracks containing samples whose payload is raw text. This format is defined by MPEG. Additional sample entry information (namely mime type) is required to identify the type of text data. This is only used experimentally for the moment (in particular in GPAC).
subt:sbtt	Similar to text:stxt, but for "subtitles". It is defined also by MPEG but not yet used.

MP4Box supports all these types as described in the following figure:

MP4Box subtitle import/export capabilities

The associated command lines using MP4Box are as follows:

Importing GPAC Timed Text XML as a 3GPP Timed Text track (text:tx3g):
```
MP4Box -add file.ttxt output.mp4
```
Exporting a 3GPP Timed Text track as GPAC Timed Text XML (assuming 1 is the trackId of the track):
```
MP4Box -ttxt 1 output.mp4
```
Importing SRT subtitles as a 3GPP Timed Text track:
```
MP4Box -add file.srt output.mp4
```
Exporting a 3GPP Timed Text track as an SRT file:
```
MP4Box -srt 1 output.mp4
```
Converting GPAC Timed Text XML to SVG:
```
MP4Box -svg file.ttxt
```
Converting SRT to SVG:
```
MP4Box -svg file.srt
```
Exporting a 3GPP Timed Text Track as SVG (not yet possible).
Importing WebVTT content as a WVTT track:
```
MP4Box -add file.vtt output.mp4
```
Exporting a WebVTT file from a WVTT track:
```
MP4Box -raw 1 output.mp4
```
Importing a TTML file as a STPP Track:
```
MP4Box -add file.ttml output.mp4
```
Exporting an STPP Track as a TTML document.
/!\ Not available yet as a track reconstruction but you can extract the individual samples (will generate one TTML output per MP4 sample):

```
MP4Box -raws 1 output.mp4
```

Importing an SRT file as WVTT track:

```
MP4Box -add file.srt:fmt=VTT output.mp4
```

Note that it is also possible using a combination of those steps to convert TTXT or SRT to WebVTT.

As a consequence of the packaging, it is possible to create DASH content using those formats.

Given that importers and exporters of MP4Box are simply wrappers on a filter session, the following syntaxes are also possible:

gpac -i subtitle.srt -o sub.mp4
gpac -i subtitle.srt -o sub.vtt
gpac -i subtitle.vtt -o sub.srt
gpac -i subtitle.mp4 -o sub.vtt
...

Apple and QuickTime specifics

You need to change the QT handler name to stbl, using:

MP4Box -i SRC:hdlr=sbtl ...

The alternate group shall be set if multiple subs are present e.g.:

gpac -i video.264 \
   -i audio.aac:#udta_tagc="public.accessibility.describes-video":#udta_name="English (describes video)":#Language=en \
   -i sub-fr.srt:#Language=fr:#AltGroup=2:#StreamSubtype=sbtl:#Disable \
   -i sub-en.srt:#Language=en:#AltGroup=2:#StreamSubtype=sbtl \
  -o result.mp4

Rendering support

GPAC support rendering of text:tx3g, text:wvtt and subt:stpp. The former is natively supported while the latter are supported through JavaScript by converting WebVTT or TTML samples to SVG.

Rendering is automatically activated when using the compositor filter (e.g. gpac -gui or gpac -mp4c). It is also possible to render subtitles to image or video sequences:

gpac -i source.vtt -o dump_$num$.png
gpac -i source.vtt -o dump.264

HOME » HOWTOs » Subtitling

Introduction
EBU-TTD

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Subtitling with GPAC

Apple and QuickTime specifics

Rendering support

Clone this wiki locally