University of Kansas digital humanities workshop


Maintained by: David J. Birnbaum (djbpitt@gmail.com) [Creative Commons BY-NC-SA 3.0 Unported License] Last modified: 2015-09-24T21:11:00+0000


Formatting echeloned poetry

The problem

The Russian poet Vladimir Vladimirovič Majakovskij laid out a large portion of his verse in echeloned lines, which can be considered a graphic representation of his declamatory style. One poem that utilizes this typographic layout is Majakovskij’s Stixi o sovetskom pasporte, translated into English as My Russian passport. Echeloned layout was not unique to Majakovskij; it was, for example, copied by the US poet Frank O’Hara in his Ave Maria.

There are several challenges to reproducing echeloned layout in HTML. One is that HTML collapses sequences of space characters into a single space, which makes it impossible to use multiple regular spaces to control the amount of indentation. This can be overcome by using non-breaking spaces, which are not collapsed, but the developer must then know how many non-breaking space characters to use for each indentation. Alternatively, the developer can use regular spaces inside a <pre> element (or apply the CSS white-space property, for example with the value pre, to a different element), which prevents the collapse of consecutive white-space characters.

Either strategy is relatively easy with a monospaced font, since the developer can just count the characters in the preceding line(s), but developers typically prefer proportional (variable-width) fonts because those are more culturally appropriate and expected in the representation of poetic texts. The sample at http://pishi-stihi.ru/stihi-o-sovetskom-pasporte-mayakovskij.html uses a monospaced font inside a <pre> element, and you can view the source to see the use of regular space characters. With a proportional font it’s easy to get it wrong, as at https://www.marxists.org/subject/art/literature/mayakovsky/1929/my-soviet-passport.htm, where the echeloned lines overlap where they shouldn’t, and you can view the source to see the use of non-breaking space characters.

A third option is to hard-code the amount of indentation for each line using CSS, but that requires calculating the exact measurement, and it is not clear how that process could be automated, which means that it must be done manually for each individual line. This is what http://feb-web.ru/feb/mayakovsky/texts/ms0/msa/msa-068-.htm does; you can see the underlying HTML and CSS by viewing the source.

How to think about the problem

What we want is to indent each line by the length of the preceding line(s) in an environment where that length is difficult to calculate. One simple type of solution, then, would be to use the length of the preceding line(s) without knowing (or having to calculate) what that length is.

Suppose we have just the first echeloned set from the English version of the poem:

I'd tear
        like a wolf
                    at bureaucracy.

In the example above, we’ve measured the length of the preceding line(s) in the most direct way possible: by not measuring them at all. Instead, we’ve reproduced the actual text of the preceding line(s) in the place where the indentation is needed, so that the width of the indentation is exactly the width of that text, and we’ve made the place-filling text invisible by using the CSS visibility property with the value hidden. Hidden visibility does more than make the text invisible to the human eye (something we could also have achieved with white text on a white background): it also makes the text invisible to searching, selecting, etc. (try searching for or selecting the hidden text in a browser), so the text behaves as if it didn’t exist except that it still occupies space, which is exactly the combination of behaviors we need. Here’s how it looks under the hood, with the markup exposed:

I'd tear
<span class="echeloned">I'd tear</span>like a wolf
<span class="echeloned">I'd tear like a wolf</span>at bureaucracy.

and the associated CSS reads:

.echeloned {visibility: hidden;}

From plain text to XML

The most common form in which we’re likely to encounter echeloned poetry is plain text, so let’s assume that we’ve found some online and want to convert it to formatted HTML. We can do the entire transformation from the plain text using XSLT, but in order to concentrate here on creating the HTML output, let’s first convert the plain text to XML using find-and-replace operations inside <oXygen/>. Here’s the procedure, using the English-language example at https://www.marxists.org/subject/art/literature/mayakovsky/1929/my-soviet-passport.htm:

  1. Navigate to https://www.marxists.org/subject/art/literature/mayakovsky/1929/my-soviet-passport.htm and select the text of the body of the poem (ignore the title and everything else on the page for now; you can add those to the output later). Open a new Text document in <oXygen/> and paste the text into it.
  2. Open the Find-and-replace dialog in <oXygen/> (ctrl-f in Windows; cmd-f in Mac) and check Case sensitive, Wrap around, and Regular expression. Under Regular expression, be sure that Dot matches all is unchecked (which is the default).
  3. In the Find box, type .+ (a period followed by a plus sign) and in the Replace with box type <line>\0</line>. Then hit the Replace All button. This has the effect of wrapping every line in <line> tags. (If you aren’t familiar with using regular expressions, you can read about how this works by exploring the tutorials at http://www.regular-expressions.info/quickstart.html.)
  4. Insert a <poem> start tag at the very beginning and a </poem> end tag at the very end. Save the file with the filename poem.xml (the .xml extension is important), close it out of <oXygen/>, and then open it. The reason you have to close and reopen it again is that you created it as plain text, and even though you’ve saved it as an XML file, <oXygen/> doesn’t know that it’s XML (and therefore can’t transform it with XSLT) until you reopen it as XML.
  5. Note that the number of spaces at the beginning of the indented lines probably looks wrong. You don’t have to worry about that. The only important detail is that indented lines must begin with at least one space; it doesn’t matter how many spaces they have, since we’re going to implement the level of indentation according to the length of the preceding line(s), and we’re not going to count or measure the space characters anyway. All we need to know is which lines are indented (at all) and which are not, and the presence or absence of leading space characters tells us that.
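
The paragraph before the list notes that the whole conversion could also be done from the plain text in XSLT. A minimal sketch of that route follows, as an illustration only and not the procedure used in this tutorial. It assumes that the pasted text has been saved as poem.txt in the same directory as the stylesheet; the file name, the parameter name source, and the template name main are all hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:output method="xml" indent="yes"/>
    <!-- Assumption: the plain text was saved as poem.txt next to this stylesheet -->
    <xsl:param name="source" select="'poem.txt'"/>
    <!-- Named entry point, so that no XML input document is needed -->
    <xsl:template name="main">
        <poem>
            <!-- Split the text at line ends and skip empty lines, much as .+ does -->
            <xsl:for-each
                select="tokenize(unparsed-text($source, 'UTF-8'), '\r?\n')[. ne '']">
                <line>
                    <xsl:value-of select="."/>
                </line>
            </xsl:for-each>
        </poem>
    </xsl:template>
</xsl:stylesheet>
```

A processor that can start from a named template is needed to run this (Saxon, for example, accepts -it:main on the command line). One difference from the find-and-replace route is that <xsl:value-of> escapes any & or < characters in the text automatically.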

How to think about the problem, part 2

Each line that has no leading space characters should begin at the left margin, and subsequent lines should be indented by the aggregated length of all of the preceding line(s), but only going back to the first preceding non-indented line. In other words, each non-indented line plus any following indented lines can be considered an indentation group, and the group ends and the next one begins as soon as another non-indented line is encountered (or at the end of the poem).
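
To check that the indentation groups come out as expected before formatting anything, one option is to make them visible in the XML. The sketch below anticipates the <xsl:for-each-group> element explained in the next section and simply wraps each group in a <group> element; that element name is hypothetical and is not used anywhere else in this tutorial.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:output method="xml" indent="yes"/>
    <xsl:template match="/poem">
        <poem>
            <!-- A new group starts at every line that has no leading space -->
            <xsl:for-each-group select="line"
                group-starting-with="line[not(starts-with(., ' '))]">
                <!-- <group> is a hypothetical wrapper, used only for inspection -->
                <group>
                    <xsl:copy-of select="current-group()"/>
                </group>
            </xsl:for-each-group>
        </poem>
    </xsl:template>
</xsl:stylesheet>
```

Applied to the first echeloned set above (with the second and third lines beginning with a space), this would place all three <line> elements inside a single <group>.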

Grouping in XSLT

Once we understand our indentation groups conceptually, we can identify them programmatically by using the XSLT 2.0 <xsl:for-each-group> element. The version we’ll use is the following:

<xsl:for-each-group select="line" group-starting-with="line[not(starts-with(., ' '))]">

This creates a new group for each sequence of lines that begins with a non-indented line. We then iterate over each of the lines in the group with:

<xsl:for-each select="current-group()">

For each line in the group we create a <p>. We then test whether the line we’re processing at the moment is the first member of the group, that is, the non-indented line. We use a negative test (is the position of the current line within the group not 1?), and if the test succeeds, we create a <span> with a @class attribute that has the value echeloned. Inside that <span>, we apply templates to all members of the current group that precede the current line in document order (the << node-comparison operator). Whether or not we have created a <span>, we always end the processing of a line by applying templates to the contents of the current line itself, so that it will be rendered normally:

<xsl:for-each select="current-group()">
    <p>
        <xsl:if test="position() ne 1">
            <span class="echeloned">
                <xsl:apply-templates
                    select="current-group()[. &lt;&lt; current()]"/>
            </span>
        </xsl:if>
        <xsl:apply-templates/>
    </p>
</xsl:for-each>

Here is the full XSLT stylesheet:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs" version="2.0"
    xmlns="http://www.w3.org/1999/xhtml">
    <xsl:output method="xml" indent="yes" doctype-system="about:legacy-compat"/>
    <xsl:template match="/">
        <html>
            <head>
                <title>Echeloned layout example</title>
                <style type="text/css">
                    p{
                        margin-top:0;
                        margin-bottom:0;
                    }
                    .echeloned{
                        visibility:hidden;
                    }</style>
            </head>
            <body>
                <xsl:for-each-group select="//line"
                    group-starting-with="//line[not(starts-with(., ' '))]">
                    <xsl:for-each select="current-group()">
                        <p>
                            <xsl:if test="position() ne 1">
                                <span class="echeloned">
                                    <xsl:apply-templates
                                        select="current-group()[. &lt;&lt; current()]"/>
                                </span>
                            </xsl:if>
                            <xsl:apply-templates/>
                        </p>
                    </xsl:for-each>
                </xsl:for-each-group>
            </body>
        </html>
    </xsl:template>
</xsl:stylesheet>

You can try it yourself with poem.xml and poem.xsl. The output of our transformation is at poem.xhtml. If you run the transformation inside the <oXygen/> XSLT debugger, the formatted HTML output will look garbled because the HTML viewer in the debugger (which isn’t designed as a full-featured HTML browser) doesn’t support the CSS visibility property. You can nonetheless see in the text view that the output is correct, and if you save the output of the transformation and open it in a real browser, the indentation will be correct.
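
The node-order test in the stylesheet above can also be expressed in terms of position within the group. The following variation is a sketch of that alternative, not the method used in this tutorial: it stores the position of the current line in a variable (the name pos is hypothetical) and selects the earlier members of the group with the standard subsequence() function. For the poem.xml input described above it should select the same lines as the original.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
    xmlns="http://www.w3.org/1999/xhtml">
    <xsl:output method="xml" indent="yes" doctype-system="about:legacy-compat"/>
    <xsl:template match="/">
        <html>
            <head>
                <title>Echeloned layout example</title>
                <style type="text/css">
                    p { margin-top: 0; margin-bottom: 0; }
                    .echeloned { visibility: hidden; }
                </style>
            </head>
            <body>
                <xsl:for-each-group select="//line"
                    group-starting-with="line[not(starts-with(., ' '))]">
                    <xsl:for-each select="current-group()">
                        <!-- Remember where this line sits within its group -->
                        <xsl:variable name="pos" select="position()"/>
                        <p>
                            <xsl:if test="$pos ne 1">
                                <span class="echeloned">
                                    <!-- Earlier lines of the group: items 1 through $pos - 1 -->
                                    <xsl:apply-templates
                                        select="subsequence(current-group(), 1, $pos - 1)"/>
                                </span>
                            </xsl:if>
                            <xsl:apply-templates/>
                        </p>
                    </xsl:for-each>
                </xsl:for-each-group>
            </body>
        </html>
    </xsl:template>
</xsl:stylesheet>
```

The variable is needed because position() inside a predicate would refer to the item being filtered rather than to the line being processed. Which of the two formulations reads more clearly is largely a matter of taste.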