Maintained by: David J. Birnbaum (djbpitt@gmail.com) Last modified: 2013-08-24T10:51:08+0000
XSLT is a programming language designed to transform XML documents into other formats. This workshop will provide introductory information that will enable participants to begin using XSLT to manage their own XML documents. It also provides, for those without prior XSLT experience, the background needed to understand the 10:00 a.m. workshop on Visualizing structural similarity with plectograms and XSLT.
This workshop covers only those basic features of XSLT that are needed to understand and
write the transformation script used in the next workshop, Visualizing structural
similarity with plectograms and XSLT.
It is not a general introduction to XSLT,
and it emphasizes features of the language that are important for plectogram generation
but that otherwise may play a relatively small part in most XSLT programming tasks. In
particular, plectograms are produced using a pull (procedural) architecture
that depends on <xsl:for-each>, which is appropriate for generating
plectograms but tends to be the wrong strategy for most XSLT processing, which is
better suited to a push (declarative) model.
The topics described here are:
<xsl:for-each>
<xsl:if>
count() function
An XSLT stylesheet is a program that converts an XML document into something else. In these workshops we’ll use XSLT to convert a list of values in an XML document into an SVG plectogram. The input document is gregory.xml and the desired output is gregory.svg. To view the SVG properly, you need a completely up-to-date version of Chrome, Firefox, Internet Explorer (Windows only), Opera, or Safari. Internet Explorer and Safari render the SVG correctly but do not support animation, which means that the text colors won’t change when you mouse over different parts of the image, as they will in the other browsers. Firefox and Opera have rendering bugs that sometimes cause the coloring to misfire. Chrome should display all features of the SVG image reliably.
An XSLT stylesheet has the following superstructure:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"> </xsl:stylesheet>
The code you write and insert between these tags can be used to transform your input XML document (in this case, into SVG).
An XML comment starts with <!-- and ends with -->. It can
contain any characters, including markup characters, except a sequence of two hyphens
(--). You should use comments to document any feature of your XSLT stylesheet
that might not be obvious to someone else. That someone else may be you, too, which is
to say that you may understand your code when you write it, but if you aren’t confident
that you’ll understand it just as well if you go away for six months and then take
another look, you should add comments as a way of helping yourself maintain your code
over time.
XSLT is a set of elements that cause code to be executed. XPath is the language XSLT uses
to identify the specific parts of the input document that are to be processed at a
particular time and in a particular way. In the discussion of XSLT template
rules below, the value of the @match attribute is an XPath
pattern that tells the system to apply the template rule whenever it
encounters content in the input document that matches the pattern.
XSLT is a transformation language that is written in XML syntax. This means that the
programming instructions take the form of XML elements, all of which begin with
xsl:, e.g.:
<xsl:template match="/">
The xsl: namespace prefix (see below) identifies the element as code to be
executed by the XSLT processor. XML elements not in the XSLT namespace are called
literal result elements, and they are created verbatim in the output. For
example:
<xsl:template match="novel"> <title><xsl:apply-templates/></title> </xsl:template>
matches a <novel> element in the input document and, for each such
element, creates a new <title> element in the output. Between the
start and end tags of that new <title> element, it processes (applies
templates to) whatever was contained between the start and end tags of the
<novel> element in the input XML document. The <xsl:template> and
<xsl:apply-templates> elements, which are in the XSLT namespace, are
instructions to be executed during the transformation, and no elements by those
names are written into the output. The <title> element, which is not in the XSLT
namespace, is a literal result element, and causes an element called <title>
(that is, the start and end tags) to be written into the output.
As was explained above, in the preceding example, the value of the @match
attribute on the <xsl:template> element is an XPath path
expression that says that this XSLT template (set of instructions) should be
applied whenever a <novel> element is encountered in the input XML
document. XSLT thus uses XPath to navigate around the input document.
XPath contains path expressions, such as novel in the example above, as well
as predicates and functions. A predicate, which is written in square brackets
as part of an XPath expression, filters the results of a path expression. For example,
a path expression like novel as the value of the @match
attribute of a template rule would match any <novel> in the input
document. A numerical predicate like novel[2] would match only
the second of a sequence of <novel> elements. A predicate like
novel[@language = 'Russian'] would match only <novel> elements that have a
@language attribute, the value of which is Russian. It would, therefore, match
<novel language="Russian"> but not <novel language="Ukrainian"> (different value
of the @language attribute) or <novel> (no @language attribute).
Predicates can be combined, nested, and chained.
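As a minimal sketch of how a predicate works inside a template rule, the stylesheet below assumes a hypothetical input document in which several <novel> elements carry a @language attribute. The <library> wrapper mentioned in the comment is our own invention for illustration and is not part of the workshop files.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <!-- Assumed input: <library> with several <novel language="..."> children -->

    <!-- The predicate restricts this rule to Russian novels -->
    <xsl:template match="novel[@language = 'Russian']">
        <title><xsl:apply-templates/></title>
    </xsl:template>

    <!-- Every other <novel> matches this less specific rule, which writes nothing -->
    <xsl:template match="novel"/>
</xsl:stylesheet>
```

When both rules match the same element, the rule whose pattern has a predicate takes precedence because it has a higher default priority, so Russian novels produce a <title> and all other novels are suppressed.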
All XML elements are either in a namespace or in no namespace. In an XSLT stylesheet,
XSLT instructions are in the XSLT namespace, which is expressed by prefixing the element
names with xsl:. When using XSLT to create SVG, the output SVG elements
must be in the SVG namespace. We do this by creating an <svg> element
in that namespace, and declaring that the namespace is a default that applies to
everything contained in the generated SVG. This saves us the trouble of having to write
an svg: prefix on all of our SVG elements. The code looks like:
<xsl:template match="/"> <svg xmlns="http://www.w3.org/2000/svg" width="100%" height="100%"> </svg> </xsl:template>
This template rule says that when the document node (the very top level of the input
document, represented by the slash character [/] in XPath) is encountered,
an <svg> element should be created in the output stream in the SVG
namespace. The xmlns declaration in the <svg> literal
result element says that that element, plus all of the elements inside it (unless
specified otherwise), are in the SVG namespace. This means that any SVG elements then
created (rectangles, lines, text, etc.) will not require an explicit namespace
declaration, and will automatically be in the SVG namespace.
XSLT defines variables using the <xsl:variable> element with two
attributes: @name is the name that will be used to refer to the variable
and @select is the value of the variable. XSLT variables can be of
different types, but the ones we use in these workshops all have single values that are
either numbers or strings. For example, in:
<xsl:variable name="cellWidth" select="60"/>
we declare a variable called cellWidth with a value of 60. To refer
to a variable later in the program, precede its name with a dollar sign. For example,
the SVG element:
<rect x="10" y="20" height="10" width="{$cellWidth}"/>
creates a rectangle at particular X and Y coordinates with a height of 10 units and a
width of the value of cellWidth, which we’ve defined as 60. The
curly braces are needed around the variable to tell the system to interpret the
value; without the curly braces, the literal string $cellWidth would be written into
the output, which is not a valid width, and the rectangle would not be displayed.
As was described in the preparatory reading, all XML attribute values must be enclosed in quotation marks. If the value of a variable is a string, it requires an additional set of quotation marks, e.g.:
<xsl:variable name="background-color" select="'#f4eee2'"/>
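Note the single quotation marks inside the double quotation marks as the value of the
@select attribute. Without those extra quotation marks, the system
would try to evaluate #f4eee2 as an XPath expression instead of treating it as a
string. A bare name such as pink would be read as a request for a child element called
pink, which doesn’t exist, and #f4eee2 is not a valid XPath expression at all, so the
stylesheet would raise an error.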
We declare all constant numerical and string values as variables at the beginning of our program because it makes the code easier to read and maintain. For example, we need to refer to the cell width in several places in our program, and if we had written the raw numerical value into our code and then decided we wanted to change it, we would need to make the change in several places. If we declare the value once in a variable and use the variable wherever the value is required, we can make just one change (in the variable declaration) and have it take effect everywhere that variable is used in our code.
Variables can be set to the value of expressions, and not just numerical and string constants. For example, our plectogram-generation code includes the following variable declaration:
<xsl:variable name="cellMidHeight" select="$cellHeight div 2"/>
This variable is used to find the middle of the cell height, so that we can draw
connecting lines joining cells at their midpoints. div is an XPath operator
for division. By making the value of the cellMidHeight variable dependent
on the value of the cellHeight variable, we can ensure that when we change
cellHeight, cellMidHeight will change automatically along
with it.
(We don’t need curly braces around the expression in this case because it involves an attribute of an XSLT element. We need curly braces only when we are creating non-XSLT elements, such as the SVG rectangle above.)
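The variable techniques described above can be combined in one small stylesheet. This is only a sketch: the value 20 for cellHeight and the fixed coordinates are assumptions chosen for illustration, not necessarily the values used in the workshop stylesheet.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <!-- Numeric constants (the values are assumptions) -->
    <xsl:variable name="cellWidth" select="60"/>
    <xsl:variable name="cellHeight" select="20"/>
    <!-- String constant: note the inner single quotation marks -->
    <xsl:variable name="background-color" select="'#f4eee2'"/>
    <!-- Derived value: changes automatically when cellHeight changes -->
    <xsl:variable name="cellMidHeight" select="$cellHeight div 2"/>

    <xsl:template match="/">
        <svg xmlns="http://www.w3.org/2000/svg" width="100%" height="100%">
            <!-- Curly braces are needed in attributes of literal result elements -->
            <rect x="10" y="20" width="{$cellWidth}" height="{$cellHeight}"
                stroke="black" fill="{$background-color}"/>
            <!-- Horizontal line through the vertical midpoint of the rectangle -->
            <line x1="10" y1="{20 + $cellMidHeight}"
                x2="{10 + $cellWidth}" y2="{20 + $cellMidHeight}" stroke="black"/>
        </svg>
    </xsl:template>
</xsl:stylesheet>
```

With these values the line is drawn at a Y position of 30, halfway down the rectangle. Changing cellHeight in its one declaration moves the line along with the rectangle's new midpoint.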
<xsl:for-each> element
The <xsl:for-each> element uses the @select attribute to
collect a sequence of nodes and process each one in turn. For example:
<xsl:for-each select="text"> </xsl:for-each>
rounds up all of the nodes of type <text> that are children of the
current context (see below) and does something to each of them. You specify what should
be done for each of those <text> elements between the start and end
tags of the <xsl:for-each> element. We can use this strategy in
plectogram generation to generate a new rectangular cell in the plectogram for each
<text> element in the input XML document.
<xsl:if> element
The <xsl:if> element runs a test, and the code between the start and
end tags is executed only if the test succeeds. For example:
<xsl:if test="@language = 'Russian'"> </xsl:if>
will execute any code between the start and end tags only if the current element (inside
an <xsl:for-each>, for example) has a @language
attribute, the value of which is Russian. We use <xsl:if> in
plectogram generation to draw a connecting line from a cell on the left side only if
there is a cell on the right side that has the same value.
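To see <xsl:for-each> and <xsl:if> working together, consider the following sketch. It assumes an input document shaped like the one implied by the XPath expressions in this workshop, with <branch> elements that contain <text> children. The <tradition> root element, the variable values, and the test for empty cells are our own assumptions for illustration.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <!-- Assumed input:
         <tradition>
             <branch version="N"><text>Ep. 101</text><text>Or. 2</text></branch>
         </tradition> -->
    <xsl:variable name="cellWidth" select="60"/>
    <xsl:variable name="cellHeight" select="20"/>

    <xsl:template match="/">
        <svg xmlns="http://www.w3.org/2000/svg" width="100%" height="100%">
            <!-- Visit each <text> child of branch N in document order -->
            <xsl:for-each select="//branch[@version = 'N']/text">
                <!-- Draw a cell only if the <text> element is not empty -->
                <xsl:if test="normalize-space(.) != ''">
                    <rect x="10" y="{position() * $cellHeight}"
                        width="{$cellWidth}" height="{$cellHeight}"
                        stroke="black" fill="white"/>
                </xsl:if>
            </xsl:for-each>
        </svg>
    </xsl:template>
</xsl:stylesheet>
```

Inside the loop, position() returns 1 for the first <text> element, 2 for the second, and so on, so each rectangle is drawn one cell height below the previous one.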
Both XPath and XSLT keep track of what they consider the current context, or where the program is in the process of navigating a sequence of elements or other nodes. For example, in the path expression:
//branch/text
the leading double slash establishes the initial XPath context as the document node and
tells the system to look at all of its descendants and select those descendants that are
of type <branch>. The next slash is another step in the path, and it
changes the XPath current context to the results of the previous step. That is, the new
XPath current context is the sequence of <branch> elements returned
by the first step. The XPath processor then sets the current context to each of those
<branch> elements in turn and collects any child elements it
might have of type <text>. In other words, each step in the path
expression determines the sequence of nodes to be used as the current context at the
next step on the path.
XPath shorthand for the current context is a dot (.). For example:
//branch/text[. = "1"]
tells the system to find all of the <branch> elements, then to find
all of the <text> elements that are children of those <branch> elements, and
then to filter those <text> elements by keeping only the ones that have a value
equal to the string 1. The dot inside the predicate means "take the current context,
which is the <text> element of interest at the moment, and check
whether its value is equal to 1."
The XSLT current context is represented by the current() function (nothing
can go inside the parentheses, but they nonetheless have to be there). Consider the
following code:
<xsl:for-each select="//branch[@version = 'N']/text">
    <xsl:if test="//branch[@version='M']/text[. = current()]">
        <line x1="{$xOffset + $cellWidth}" y1="{position() * 20 + 10}"
            x2="{$xOffset + $interMsSpace}"
            y2="{//branch[@version='M']//text[. = current()]/(count(preceding-sibling::text) + 1) * 20 + 10}"
            stroke="{$ink}" stroke-width="2"/>
    </xsl:if>
</xsl:for-each>
This code draws connecting lines between cells in the left and right columns (versions N
and M, respectively) when those cells have the same value. In other words, if the fifth
<text> element in version N and the twelfth <text> element in version M have the
same value, the code draws a line between them.
For the moment ignore the instructions that actually draw the connecting line, so that we
can concentrate on how the <xsl:if> element works. The first line of
the code snippet, the <xsl:for-each> element, finds all of the
<branch> elements in the input document (there are two, one for
branch N and one for branch M). It then filters them to retain only the one for branch
N, which has a @version attribute with a value of N. It then takes
all the <text> element children of that branch and does something
with them.
What it does first is test whether there is a <text> element in branch
M that has the same value as the current <text> that we’re looking at
in branch N, because it will draw a line only when it finds a match. The XPath that
serves as the value of the @test attribute in the
<xsl:if> element starts by finding both branches, filters the
results of that first step with a predicate that retains only branch M, and then takes
all of its <text> node children. It then has to test whether any of
those children have the same value as the current value it is looking at over in branch
N. The XPath context is the step in the path expression, that is, each of the
<text> elements in branch M. What it has to compare those to is
the value of the <xsl:for-each> element, which is each of the
<text> elements in branch N. The XSLT current context is the
current value of the <xsl:for-each>, which is an XSLT element; that
is, the XSLT current context is something from branch N. The XPath current context,
though, is determined by the XPath expression in which it occurs, which begins in branch
M. The XSLT current context is represented by the current() function and
the XPath current context is represented by the dot.
count() function
The XPath count() function counts the number of items in whatever is
expressed by the XPath expression between the parentheses. For example:
count(preceding-sibling::text)
counts the number of elements of type <text> that are preceding
siblings of the current context node. We use this strategy in plectogram generation to
position the connecting lines between the left and right columns. The Y position of the
connecting line of a <text> element with a particular value can be
calculated partially by counting the number of preceding sibling
<text> nodes (that is, how far down it is in its column) and
multiplying that count by the height of each cell.
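The arithmetic can be checked with a small stylesheet that writes out plain text instead of SVG. This is a sketch under assumptions: the cell height of 20 is a value we chose, and the input is assumed to have a <branch version="M"> element with <text> children, as in the expressions above.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:output method="text"/>
    <!-- Assumed cell height -->
    <xsl:variable name="cellHeight" select="20"/>

    <xsl:template match="/">
        <xsl:for-each select="//branch[@version = 'M']/text">
            <!-- The content of the cell -->
            <xsl:value-of select="."/>
            <xsl:text>: </xsl:text>
            <!-- Row number (preceding siblings plus one) times the cell height -->
            <xsl:value-of select="(count(preceding-sibling::text) + 1) * $cellHeight"/>
            <xsl:text>&#10;</xsl:text>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
```

For a branch M with three <text> children, the first has no preceding siblings, the second has one, and the third has two, so the numbers written after the cell contents are 20, 40, and 60.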
Scalable vector graphics (SVG) is an XML schema, or tag set, for describing graphic images. The scalable part means that as the image is stretched or shrunk, it is always optimally clear and crisp, and it does not become pixelated, no matter what the resolution. The reason this happens is that SVG relies on vector graphics, that is, it draws a line, for example, by specifying the endpoints, rather than by describing which pixels along the line are colored and which are not. Because SVG describes the topology of an image (how to draw it), rather than a bitmap (which pixels are colored or not at a particular resolution), it draws the image in a way that is optimized for the size at which it is being rendered at that moment. This makes SVG a very convenient format for images on the web, where users expect to be able to resize their browser windows.
SVG is expressed in XML, which is to say that when you view the source of an SVG image, what you see is the underlying XML markup. This makes SVG a natural choice to visualize information that is encoded initially in XML, since one can use XSLT to transform textual XML into SVG, or graphic, XML. In this workshop we will transform tables of contents of manuscripts into a plectogram, a graphic map of correspondences between branches of a textual tradition.
This workshop covers only those features of SVG that are needed to understand how plectograms work. For a general tutorial on SVG, see http://www.w3schools.com/svg/svg_intro.asp.
SVG is a schema (tag set and rules for its use), according to which an SVG document has a
root element called <svg> in the SVG namespace and uses only certain
other elements and attributes, all in that same namespace, and only in certain ways. The
exoskeleton of an SVG document looks like:
<svg xmlns="http://www.w3.org/2000/svg" width="100%" height="100%"> </svg>
As we note above, the xmlns looks like an attribute, but technically it’s a
namespace declaration, and it asserts that the <svg> element and
everything inside it is in the SVG namespace unless you specify otherwise. Inside the
SVG document you can use only elements from the SVG schema, and they must be used as the
schema requires (that is, they must occur only where the schema permits, and they can
contain only elements and attributes allowed by the schema).
SVG writes into a Cartesian coordinate space that is different from the one to which we’re accustomed. The X and Y axes cross at the origin, which has a value of (0,0), and the X axis is negative to the left of the origin and positive to the right, as we expect. The Y axis, though, is negative above the origin and positive below it. This means that, contrary to our expectations, an object with a larger Y value is rendered lower on the screen than an object with a smaller Y value. For plectogram purposes, we’re going to draw only in the lower right quadrant, so all of our X and Y values will be positive.
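The direction of the Y axis is easy to verify with a small hand-written SVG document. The coordinates and colors here are arbitrary values chosen for illustration.

```xml
<svg xmlns="http://www.w3.org/2000/svg" width="300" height="200">
    <!-- Small Y value: drawn near the top of the image -->
    <rect x="10" y="10" width="60" height="20" stroke="black" fill="pink"/>
    <!-- Larger Y value: drawn lower down, not higher up -->
    <rect x="10" y="100" width="60" height="20" stroke="black" fill="lightblue"/>
</svg>
```

Opening this file in a browser should show the pink rectangle above the light blue one, even though the light blue one has the larger Y value.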
Plectograms use three SVG elements, as follows:
<rect> (draw a rectangle)
An SVG rectangle must specify the X and Y coordinates of the upper left corner
(attributes @x and @y) as well as a width and height
(@width and @height). Furthermore, it uses the SVG
attributes @stroke (color of the border),
@stroke-width (width of the border), and @fill
(color of the interior). Consider
<rect x="100" y="150" width="200" height="50" stroke="green" stroke-width="5" fill="pink"/>
This creates a rectangle with the upper left corner at position (100,150) with a
width of 200 units and a height of 50 units. Since values on the Y axis increase
as one moves downward, the height also grows
downward, which is to say that the four corners of this rectangle are
(100,150), (300,150), (100,200), and (300,200). @stroke specifies
the color of the border, @stroke-width specifies the width of the
border, and @fill specifies the color of the interior.
For our plectogram generator we use the following code to create the rectangular cells:
<rect x="{$branch * $interMsSpace + $xOffset}" y="{position() * $cellHeight}" width="{$cellWidth}" height="{$cellHeight}" stroke="{$ink}" stroke-width="{$strokeWidth}" fill="{$background-color}"/>
This code is called inside an <xsl:for-each> loop, and each
time it’s called it draws the rectangle in a different place because the
variables and functions used to calculate the position change inside that loop.
As explained earlier, the values are inside curly braces so that the XPath code
will be interpreted, with the value entered into the output stream. Without the
curly braces, the literal characters would be written into the output stream,
and since some of them are illegal (not the expected numbers, colors, etc.), the
image would not be rendered by the browser.
<text> (place some text)
Text in SVG is created with a <text> element, which specifies
the starting coordinates with @x and @y
attributes. The color of the text is specified with the @stroke
attribute. For example:
<text x="100" y="200" stroke="blue">Hi, mom!</text>
creates a <text> element that renders the string Hi, mom!
in blue at position (100,200). For our plectogram generator we
use:
<text x="{$branch * $interMsSpace + $xOffset + $textOffset}" y="{(position() + 1) * $cellHeight - $textOffset}" stroke="{$ink}"> <xsl:value-of select="."/> </text>
As with the rectangle above, this element is located inside an
<xsl:for-each> loop that generates a new
<text> element for each cell in the output.
<line> (draw a line)
After we’ve created a column for each branch of the tradition with SVG
<rect> elements and labeled them with
<text> elements, we draw connecting lines between cells
that have the same value with SVG <line> elements. An SVG
<line> element has the following structure:
<line x1="100" y1="100" x2="200" y2="200" stroke="red" stroke-width="2"/>
An SVG <line> element specifies the two endpoints of the line
with the @x1, @y1, @x2, and @y2 attributes, using @stroke to specify the color
of the line and @stroke-width to specify the width. The preceding
code draws a red line from (100,100) to (200,200) that is two units wide.
For the plectogram generation code we use the following construction:
<line x1="{$xOffset + $cellWidth}" y1="{position() * 20 + 10}" x2="{$xOffset + $interMsSpace}" y2="{//branch[@version='M']/text[. = current()]/(count(preceding-sibling::text) + 1) * $cellHeight + $cellMidHeight}" stroke="{$ink}" stroke-width="2"/>
This code runs inside an <xsl:for-each> loop, which examines
every cell in the N branch of the tradition. The loop first checks to see
whether there is a matching cell in the M branch. If so, it draws a line between
the matching cells. The complicated XPath in the value of the @y2
attribute works as follows:
The first step, //branch, finds all of the <branch> elements in the document.
There are two <branch> elements, one for branch N and one for branch M. The predicate [@version='M'] takes all (both) of the <branch> elements and filters them, retaining only the one that has a @version attribute with the value M.
The next step, /text, finds all of the children of the M <branch> element that are of type <text>. There is one such element for each sermon in the branch.
The predicate [. = current()] filters those <text> elements, retaining only certain ones: those whose value is equal to that of the cell currently being examined in branch N.
<set> (animate an SVG image)
The SVG <set> element is part of the SVG schema that supports
animation, or changes in the image in response to, among other things,
user-generated mouse events. A <set> element looks like:
<set attributeName="fill" to="red" begin="N12.mouseover" end="N12.mouseout"/>
The <set> element is placed between the start and end tags of
whatever element it is supposed to affect. If the <set>
element above is placed between the start and end tags of an SVG
<rect> element (which describes a rectangle), it will
change the fill (interior) color of that rectangle to red when the user mouses
over an element with an @id attribute that has the value
N12, and the rectangle will return to its original color when the user
mouses out of that element.
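A complete, hand-written SVG document shows the trigger mechanism in isolation. This is a sketch with arbitrary coordinates and colors; only the <set> element is taken from the example above, and, as noted earlier, not every browser supports this kind of animation.

```xml
<svg xmlns="http://www.w3.org/2000/svg" width="300" height="100">
    <!-- The trigger: mouse events on this rectangle are referenced by its @id -->
    <rect id="N12" x="10" y="10" width="60" height="20" stroke="black" fill="white"/>
    <!-- The target: its fill turns red while the mouse is over the N12 rectangle -->
    <rect x="150" y="10" width="60" height="20" stroke="black" fill="white">
        <set attributeName="fill" to="red" begin="N12.mouseover" end="N12.mouseout"/>
    </rect>
</svg>
```

Note that the <set> element sits inside the rectangle it changes, while the @begin and @end attributes name the element that triggers the change. The two can be the same element or, as here, different ones.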
For plectogram generation we include <set> elements inside
each <rect> and <line> element,
instructing the SVG rendering engine to change the fill color of the rectangle
(or the stroke color of the line) when the user mouses over the rectangle itself,
or the <text> element used to label it, or a rectangle with
the same text in the other branch of the tradition. The effect is that
when the user mouses over a rectangle, it changes color, as does a
correspondingly labeled rectangle in the other branch of the tradition and the
line that connects them.
For pedagogical reasons we’ve simplified the examples above by omitting the
animation, and for that reason we also didn’t generate @id
attributes for the SVG elements we’ve been creating. For an animated plectogram,
it’s necessary to assign unique @id attribute values to any element
that is intended to serve as a trigger for an animation effect, since the values
of the @begin and @end attributes of a
<set> element need to know which elements can trigger
that effect. In a more complete XSLT stylesheet that supports animation, we
create a rectangle with:
<rect id="{concat($version,$stripped)}" x="{$branch * $interMsSpace + $xOffset}" y="{position() * $cellHeight}"
    width="{$cellWidth}" height="{$cellHeight}" stroke="{$ink}" stroke-width="2" fill="{$background-color}">
    <set attributeName="fill" to="{$color}"
        begin="{concat('N',$stripped,'.mouseover')}" end="{concat('N',$stripped,'.mouseout')}"/>
    <set attributeName="fill" to="{$color}"
        begin="{concat('M',$stripped,'.mouseover')}" end="{concat('M',$stripped,'.mouseout')}"/>
    <set attributeName="fill" to="{$color}"
        begin="{concat('N',$stripped,'text.mouseover')}" end="{concat('N',$stripped,'text.mouseout')}"/>
    <set attributeName="fill" to="{$color}"
        begin="{concat('M',$stripped,'text.mouseover')}" end="{concat('M',$stripped,'text.mouseout')}"/>
</rect>
For each cell in the plectogram, we generate a rectangle and assign it a unique
@id attribute. Because an @id cannot contain spaces, and because periods confuse
the animation triggers, we take the textual content
of the cell and strip those characters, using the stripped version (which we
assign to the variable $stripped) to build the @id
value and the identifiers used inside the @begin and
@end attributes. The concat() XPath function
inside those attributes concatenates its arguments, which in this case are the
string that identifies the branch (N or M), the stripped version
of the textual content of the rectangle (e.g., Ep101 for Ep. 101),
and the string .mouseover or .mouseout. The four
<set> elements color the rectangle when one mouses over
it, the text inside it, or the corresponding rectangle and text in the
plectogram column for the other branch of the tradition. The
<line> elements, created further down, contain the same
triggers.