Maintained by: David J. Birnbaum (djbpitt@gmail.com) Last modified: 2015-09-25T19:30:36+0000
David J. Birnbaum
University of Pittsburgh
Email: djbpitt@gmail.com
URL: http://www.obdurodon.org
Jeffrey A. Rydberg-Cox
University of Missouri, Kansas City
Email: rydbergcoxj@umkc.edu
URL: http://daedalus.umkc.edu
These two workshops are designed to help digital humanists with basic XML experience refine their skills in document analysis, markup, and XML processing. The morning workshop (Creating literary and linguistic annotation, 3 hours of instruction) concentrates on document analysis and XML annotation at the advanced-beginner level. The afternoon workshop (Using literary and linguistic annotation once you’ve created it, 3 hours of instruction) introduces the use of XPath and XSLT to transform and query XML documents. The two workshops are independent of each other; participants may register for either or for both. Both workshops will take place in Watson Library room 455 and are part of the University of Kansas DH Forum 2015: Peripheries, barriers, hierarchies.
This workshop will concentrate on document analysis, project design, and making markup decisions in complex cases, such as those involving overlap or dependencies on external documents. Examples will be drawn from data supplied in advance by participants and from other sources. Participants should already have hands-on experience tagging XML documents and should already have read or re-read An even gentler introduction to XML.
9:00–9:30: Brief re-introduction to XML, using Julius Segall’s Wo ist des Armen Vaterland?
9:30–10:15: Working session #1, using Jarring Prov. 105 (Turki legal document from 1855–56)
This document includes missing text due to holes in the manuscript, marginalia, a seal, and interesting indigenous paper. Possible topics for discussion may include:
ibetween them)?
ʰ, have a dedicated Unicode code point, but some others don’t. What’s the best way to represent them (e.g., by wrapping them in tags so that they will later be rendered as superscripts)? (For Unicode values see Unicode 8.0 Character Code Charts. To identify the Unicode code points of an existing digital text, see Richard Ishida’s Unicode code converter. Mac OS X users can download earthlingsoft’s free Unicode checker application.)
Prov.11
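For the question above about characters such as ʰ, a short script can report which non-ASCII characters a transcription already contains. The following is a minimal sketch in Python that uses only the standard unicodedata module. The function name and the sample string are illustrative assumptions, not part of the workshop materials.

```python
import unicodedata


def describe_code_points(text):
    """List each distinct non-ASCII character with its code point and Unicode name."""
    seen = []
    for ch in text:
        # Keep only characters outside ASCII, in order of first appearance.
        if ord(ch) > 127 and ch not in seen:
            seen.append(ch)
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<no name>"))
        for ch in seen
    ]


# Invented sample string, not taken from the workshop documents.
for ch, code_point, name in describe_code_points("kʰan"):
    print(code_point, name)  # prints: U+02B0 MODIFIER LETTER SMALL H
```

A report of this kind may help when deciding which characters can stay as plain Unicode and which need markup.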
10:30–11:15: Working session #2, using Krieg und Frieden (1917-09-08)
Goals: encode as much of the formatting as possible in order to provide truly diplomatic transcriptions. Issues include multiple columns, centering-related issues, indentation of individual lines, footnotes and annotations.
11:15–12:00: Working session #3, using Rudyard Kipling’s The truce of the bear (1914-08-31)
Goals: encode as much of the formatting as possible in order to provide truly diplomatic transcriptions. Issues include multiple columns, centering-related issues, indentation of individual lines, footnotes and annotations.
This poem is particularly challenging but fairly typical of the poems we are finding in periodicals: centered poem title, centered prose after the title and before the poem, two-column layout, unusual indentation and spacing within the poem text, etc.
This workshop will introduce participants to querying and transforming XML documents using XPath and XSLT and to validating documents using XPath and Schematron. Examples will be drawn from data supplied in advance by participants and from other sources. Participants should already have hands-on experience tagging XML documents and should already have read or re-read An even gentler introduction to XML and What can XPath do for me? Although it is not required, those who are interested may read ahead in our introductory XSLT and Schematron tutorials. The first workshop is not a prerequisite for the second; participants may enroll in either or in both.
1:00–1:30: Processing XML with XSLT
1:30–2:15: Working session #1, using Jarring Prov. 105 (Turki legal document from 1855–56)
This document includes missing text due to holes in the manuscript, marginalia, a seal, and interesting indigenous paper. Processing tasks include managing editor annotations (seal, holes, marginalia, paper, etc.).
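As a rough illustration of what managing editor annotations can involve, the sketch below queries annotations and produces a plain reading text. It uses Python's standard xml.etree.ElementTree module, which supports only a limited subset of XPath, rather than the XSLT used in the workshop. The element and attribute names (doc, line, gap, note, seal, reason, type) and the sample content are hypothetical; the actual markup is decided during the working sessions.

```python
import xml.etree.ElementTree as ET

# Invented sample; the element names are assumptions, not the workshop's schema.
SAMPLE = """<doc>
  <line n="1">text before <gap reason="hole"/> text after</line>
  <line n="2">more text <note type="marginalia">side remark</note> and the rest</line>
  <seal/>
</doc>"""


def reading_text(line):
    """Return a line's text with notes dropped and gaps shown as [...]."""
    parts = [line.text or ""]
    for child in line:
        if child.tag == "gap":
            parts.append("[...]")
        elif child.tag != "note":
            parts.append("".join(child.itertext()))
        # Text that follows the child element still belongs to the line.
        parts.append(child.tail or "")
    # Collapse the whitespace left behind where a note was removed.
    return " ".join("".join(parts).split())


root = ET.fromstring(SAMPLE)
holes = root.findall(".//gap[@reason='hole']")  # XPath-style query for holes
for line in root.findall("line"):
    print(line.get("n"), reading_text(line))
```

Keeping the annotations in the XML and filtering them at processing time means the same source can yield both a clean reading view and a view that shows the editorial detail.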
Possible topics for discussion:
<orth>
tier. What’s the best way to process that? And conversely, what’s the best way to write the output of a function into a comment?
ibetween them)?
ʰ, have a dedicated Unicode code point, but some others don’t. What’s the best way to represent them (e.g., by wrapping them in tags so that they will later be rendered as superscripts)?
Using XSLT to create HTML section of our HTML basics tutorial.)
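One possible approach to the superscript question above is to convert characters that Unicode itself classifies as superscript forms into tagged markup, so that superscripts with and without a dedicated code point end up represented the same way. The following is a minimal sketch in Python, not the workshop's XSLT solution. It assumes a hypothetical sup element name and relies on the `<super>` compatibility decomposition recorded in the Unicode Character Database.

```python
import unicodedata
from xml.sax.saxutils import escape


def tag_superscripts(text, tag="sup"):
    """Replace dedicated superscript characters with <tag>base</tag> markup."""
    out = []
    for ch in text:
        # For example, unicodedata.decomposition("ʰ") is "<super> 0068".
        parts = unicodedata.decomposition(ch).split()
        if parts and parts[0] == "<super>":
            base = "".join(chr(int(cp, 16)) for cp in parts[1:])
            out.append(f"<{tag}>{escape(base)}</{tag}>")
        else:
            # Escape &, < and > so the result is safe to embed in XML.
            out.append(escape(ch))
    return "".join(out)
```

For example, `tag_superscripts("kʰ")` returns `k<sup>h</sup>`. Characters without such a decomposition pass through unchanged and would still need to be tagged by hand.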
2:30–3:15: Working session #2, using Krieg und Frieden (1917-09-08)
Goals: encode as much of the formatting as possible in order to provide truly diplomatic transcriptions. Issues include multiple columns, centering-related issues, indentation of individual lines, footnotes and annotations.
3:15–4:00: Working session #3, using Rudyard Kipling’s The truce of the bear (1914-08-31)
Goals: encode as much of the formatting as possible in order to provide truly diplomatic transcriptions. Issues include multiple columns, centering-related issues, indentation of individual lines, footnotes and annotations.
This poem is particularly challenging but fairly typical of the poems we are finding in periodicals: centered poem title, centered prose after the title and before the poem, two-column layout, unusual indentation and spacing within the poem text, etc.
One of the hardest formatting challenges for indented poetry is echeloned lines, of the sort popularized by Vladimir Vladimirovič Majakovskij and Frank O’Hara. See our tutorial on Formatting echeloned poetry for how to deal with that sort of indentation in a plain-text-to-XML-to-HTML transformation process.
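To make the plain-text to XML to HTML idea concrete, here is a minimal sketch in Python. It is not the method described in the tutorial. It assumes that indentation in the plain text is made of spaces only, and the names it uses (poem, l, indent, and the CSS margin in ch units) are hypothetical choices for illustration.

```python
import xml.etree.ElementTree as ET


def lines_to_xml(plain_text):
    """Turn indented plain-text lines into <l indent="N"> elements."""
    poem = ET.Element("poem")
    for raw in plain_text.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        stripped = raw.lstrip(" ")
        # Record the number of leading spaces as data instead of as whitespace.
        line = ET.SubElement(poem, "l", indent=str(len(raw) - len(stripped)))
        line.text = stripped.rstrip()
    return poem


def xml_to_html(poem):
    """Render each line as an HTML paragraph indented by its recorded amount."""
    div = ET.Element("div", {"class": "poem"})
    for line in poem.findall("l"):
        p = ET.SubElement(div, "p", style=f"margin-left: {line.get('indent')}ch")
        p.text = line.text
    return ET.tostring(div, encoding="unicode")


# Invented placeholder lines, not a quotation from any poem.
sample = "First\n     second\n          third\n"
html = xml_to_html(lines_to_xml(sample))
```

Storing the indentation as an attribute keeps it from being lost when whitespace is normalized, and leaves the choice of how to render it to the final HTML step.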