Based on a question asked in our class Forum, we worked through an interesting XSL transformation case study together in class.
I hope you remember the process by which we solved it: Maria framed the problem well, Andrea and Todd found useful
information via Google that helped me work it out in front of the class in real time, Jay suggested an idea based on his project work,
and Rachel added technical information from reading she had been doing. Whoever said that too many cooks spoil the broth
wasn't referring to our class process.

I am a little late in providing the documentation you requested, but here it is (you had to work on your projects anyway)...

XSLT process for XML to text document

We want to convert an XML document into a plain-text document in the SRT format used for subtitle encoding on DVDs.
The SRT format is documented well, by example, at http://www.matroska.org/technical/specs/subtitles/srt.html.
We start with an XML document we have created that follows a subtitle markup language we imagined for this purpose:

<?xml version="1.0"?>
<subtitles>
    <subtitle>
        <index>1</index>
        <start_time>
            <hour>00</hour>
            <minute>01</minute>
            <second>20</second>
            <millisecond>476</millisecond>
        </start_time>
        <end_time>
            <hour>00</hour>
            <minute>01</minute>
            <second>22</second>
            <millisecond>504</millisecond>
        </end_time>
        <text>I don't like you.
        Mr. Lieutenant.</text>
    </subtitle>
    <subtitle>
        <index>2</index>
        <start_time>
            <hour>00</hour>
            <minute>02</minute>
            <second>20</second>
            <millisecond>476</millisecond>
        </start_time>
        <end_time>
            <hour>00</hour>
            <minute>02</minute>
            <second>22</second>
            <millisecond>501</millisecond>
        </end_time>
        <text>Very good, Lieutenant.</text>
    </subtitle>
</subtitles>

Then we write an XSL Transform that converts our XML document into a valid SRT document. The transform is itself
a well-formed XML document that follows the 1999 XSL Transform (XSLT 1.0) specification:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text" omit-xml-declaration="yes" />
    <xsl:template match="/">
        <xsl:for-each select="subtitles/subtitle">
            <xsl:value-of select="index"/><xsl:text>&#10;</xsl:text>
            <xsl:value-of select="start_time/hour"/>:<xsl:value-of select="start_time/minute"/>:<xsl:value-of select="start_time/second"/>,<xsl:value-of select="start_time/millisecond"/> --> <xsl:value-of select="end_time/hour"/>:<xsl:value-of select="end_time/minute"/>:<xsl:value-of select="end_time/second"/>,<xsl:value-of select="end_time/millisecond"/><xsl:text>&#10;</xsl:text>
            <xsl:value-of select="text"/><xsl:text>&#10;</xsl:text><xsl:text>&#10;</xsl:text>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>

We can run the XML file through the XSL Transform by using the SAXON XSLT services, just as we practiced in
weeks 6-8 of our class. The resulting output, shown at the end of this page, appears to me to be valid SRT. But, of course,
because SRT is not an XML-based language, we cannot rely on basic XML validation services to confirm that; we would
have to write our own validator.

Things of interest in the XSL Transform above:

  • There is a specific xsl:output element we can use to let the XSLT processor know that we intend to create text output instead
    of XML output. We use a method="text" attribute to explicitly declare our intention to create a text document (as the
    SRT format requires).
  • We use an xsl:text element to put line feeds into the SRT document where they are required. The &#10; character
    reference stands for a line feed character, so we place it after the index, after the timing line, and twice after the
    subtitle text (the second one produces the blank line that separates subtitles).

The output:

1
00:01:20,476 --> 00:01:22,504
I don't like you.
    Mr. Lieutenant.

2
00:02:20,476 --> 00:02:22,501
Very good, Lieutenant.
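
Since SRT is not XML, checking output like this means writing our own validator, as mentioned earlier. Here is a minimal sketch of what one could look like in Python. It is not something we built in class, and the rules it checks are assumptions inferred from the example output rather than taken from a formal specification: blocks are separated by blank lines, and each block has a numeric index line, a timing line, and at least one line of text. The names validate_srt, to_ms, and TIMING are made up for this illustration.

```python
import re

# Timing line as it appears in the example output: HH:MM:SS,mmm --> HH:MM:SS,mmm
TIMING = re.compile(
    r"^(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})$"
)


def to_ms(h, m, s, ms):
    """Convert hour, minute, second, millisecond strings to milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def validate_srt(text):
    """Return a list of problems found; an empty list means none were found.

    Assumed rules (inferred from the example, not from a formal spec):
    blank lines separate blocks; each block has an index line, a timing
    line, and at least one line of subtitle text.
    """
    problems = []
    # strip() drops leading/trailing blank lines before splitting into blocks.
    blocks = re.split(r"\n\s*\n", text.strip())
    for n, block in enumerate(blocks, start=1):
        lines = block.split("\n")
        if len(lines) < 3:
            problems.append(f"block {n}: expected index, timing, and text lines")
            continue
        if not lines[0].strip().isdigit():
            problems.append(f"block {n}: index is not a number: {lines[0]!r}")
        match = TIMING.match(lines[1].strip())
        if match is None:
            problems.append(f"block {n}: bad timing line: {lines[1]!r}")
        else:
            fields = match.groups()
            # The first four fields are the start time, the last four the end time.
            if to_ms(*fields[:4]) > to_ms(*fields[4:]):
                problems.append(f"block {n}: start time is after end time")
    return problems


# The output of our transform, copied from above.
sample = (
    "1\n"
    "00:01:20,476 --> 00:01:22,504\n"
    "I don't like you.\n"
    "    Mr. Lieutenant.\n"
    "\n"
    "2\n"
    "00:02:20,476 --> 00:02:22,501\n"
    "Very good, Lieutenant.\n"
    "\n"
)

print(validate_srt(sample))  # an empty list means no problems were found
```

A check like this only catches structural slips (a missing line feed, a malformed timestamp, reversed times). It says nothing about whether a particular player will accept the file.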