HackML — the vocabulary

Ian K Tindale’s mark–up language — HackML

This is a vocabulary list for Ian K Tindale’s simple mark–up language for article writing. As you can see, there’s not a lot to it. The majority of usage will only use a small amount of this — the rest of it can be ignored freely.

About half of it is ‘meta’ stuff — reasonably esoteric and by no means regularly used (by most people’s standards). Nevertheless, it should be found to be within easy comprehension. The current DTD for HackML is viewable here. The home page for HackML is back here.

The root element is one called ‘article’. This reflects the scope of this language: simply a quick way of slapping down an article with the minimum of time and hassle, while still getting a fair amount of ‘tagging–up’ done in the process.

Most elements have a class and id attribute (exceptions include the ‘article’, ‘meta’ and ‘q’ elements, and all of the ‘dateline’ children), which work in the expected manner. In the descriptions below, I describe only the additional attributes.

article

The root element of HackML. Everything is within an ‘article’.

<!--© Ian K Tindale 2004. Latest revision timestamp: 19:20 25th July 2004-->

<!-- Note: 'figure' is no longer a 'meta' element, but occurs zero or more times after the 'copy' element. The 'photo' element is now the 'picture' element (which is what the 'figure' element was originally called). The 'tabs' element is no more. The 'list' element is no more (just the 'item' element). Coming soon: losing the 'cols' attribute of the 't' element, and subgrouping the common elements among 'copy' and 'figure' into one group: %passage ('q', 'crosshead', 'i', 'b', 't' and 'item') -->

meta

The meta element describes things pertaining to the article, but not within the flow of the galley, as it were. Elements within the copy can point to meta items here to derive content from.

title

Holds the title of the article. This is not necessarily the same as the banner headline, although if it is, the banner can point to here via the source attribute, which is an IDREF. (Not mandatory, as there can only be one ‘title’ element, and if there’s only one ‘banner’ element, and it is empty, it might as well default to the contents of the ‘title’ element.)

 
author

Person wot wrote the stuff. Again, not necessarily the same as the byline, but if it is, the byline can simply point to here via the source attribute, which is an IDREF.

Attribute: role

The ‘role’ attribute is optionally used to indicate a relationship the author might have, particularly if there is more than one author involved in an article’s creation.

 
publication

Specifies the intended publication that this article is destined for.

 
dateline

The dateline meta element is essentially for holding canonical time references, which might link to an anchored reference in the copy. In the first instance, it would be used to hold the date of this article as meta information.

However, it could also conceivably be used for pretty much any date or time, and offers a single point of reference for time-related anchors in the copy. Hence, if an important date changes, it changes everywhere equally all at once. This also allows the presentation style and logic of date presentations to be altered without altering the original copy.

Attribute: role

If there is only one dateline, it is likely to be assumed that this contains the date of writing of the article. If there is more than one dateline, there needs to be a way of sorting out what they all mean. The ‘dateline’ element group can be legitimately used for pretty much any date or time appearing in the article (as well as further meta uses such as editing timestamps). The ‘role’ attribute gives the author the chance to define this information. The suggested value for date of writing or submission might be the string ‘article’.

year

The year.

month

The month, numerically or as words. Transformation can convert to and from a full named version quite simply.

date

The date of the month, numerically, or in words (transformation would have to derive the logic of ‘first’, ‘second’, ‘third’ until we get to the ‘th’s).
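The ‘first’, ‘second’, ‘third’ logic is small enough to sketch. This is only an illustration in Python of what a transformation might do when turning a numeric date into its ordinal form; the function name `ordinal` is my own invention and nothing in HackML specifies it.

```python
def ordinal(n):
    """Return a day-of-month number with its English ordinal suffix."""
    # 11, 12 and 13 take 'th' even though they end in 1, 2 and 3.
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


for day in (1, 2, 3, 4, 11, 21, 25):
    print(ordinal(day))
```

Going the other way, from words to a number, would need a lookup table of the spelled-out forms, which is left out here.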

day

The day of the week, in words. Nobody will ever get it right if you try enforcing a numerical way of numbering days of the week. There are too many standards of where a week starts, causing confusion, so don’t even bother. The transformation will have to look for the minimum string that can define a day of the week and make sense of that.

Shouldn’t be too difficult, once lower–cased and trimmed back. This allows the writer to write things like: ‘Wed’; ‘wednesday’; ‘Friday’; ‘thurs.’ as they will. It should be noted that if this data isn’t present, it should still be able to be derived from the year, month and date.
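A rough sketch of that matching, in Python, might look like the following. It assumes English day names and treats any unambiguous prefix as good enough; the function names `normalise_day` and `derive_day` are hypothetical, and a real transformation may well choose different rules.

```python
import datetime

DAYS = ["monday", "tuesday", "wednesday", "thursday",
        "friday", "saturday", "sunday"]


def normalise_day(text):
    """Map a loosely typed day such as 'Wed' or 'thurs.' to its full name."""
    key = text.strip().lower().rstrip(".")
    matches = [d for d in DAYS if key and d.startswith(key)]
    # Ambiguous input ('t', 's') or unrecognised input yields None.
    return matches[0] if len(matches) == 1 else None


def derive_day(year, month, date):
    """Derive the day of the week when no 'day' element is present."""
    # datetime.date.weekday() counts Monday as 0, matching DAYS above.
    return DAYS[datetime.date(year, month, date).weekday()]


print(normalise_day("thurs."))
print(derive_day(2004, 7, 25))
```

The second function covers the case where the writer leaves the day out altogether and it has to be worked out from the other dateline children.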

hour

Hour in 24hr format, or 12hr format if followed by am or pm.
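As an illustration only, a transformation could reduce both forms to a single 24hr number along these lines. The function name `parse_hour` and the decision to return None for anything it doesn’t recognise are my assumptions, not part of HackML.

```python
import re


def parse_hour(text):
    """Return the hour as 0-23, or None if the text is not a usable hour."""
    match = re.fullmatch(r"\s*(\d{1,2})\s*([ap]m)?\s*", text.lower())
    if not match:
        return None
    hour, suffix = int(match.group(1)), match.group(2)
    if suffix is None:
        # No am or pm: treat the number as a 24hr value.
        return hour if 0 <= hour <= 23 else None
    if not 1 <= hour <= 12:
        return None
    # 12am is midnight (0) and 12pm is noon (12).
    return hour % 12 + (12 if suffix == "pm" else 0)


for sample in ("19", "7pm", "12am", "12 PM"):
    print(sample, "->", parse_hour(sample))
```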

minute

A number from 0 to 59

second

A number from 0 to 59

words

Words placed within this group of elements simply act as ‘hints’ or ‘markers’ to a transformation. It might be that the existence of a word listed here causes some kind of treatment to be applied to that word wherever it appears elsewhere, either on its first occurrence or on all occurrences.

Certain words in the copy might need to be visually treated a special way each time they are encountered. For example, a person’s name or a product name, according to a house style, might need to be in bold and italic on the first occurrence, and in italics for all subsequent occurrences.

Listing it here once allows the transformation to search for each occurrence of the listed word and transform it accordingly without the author having to specify the same treatment or style over and over each time they type it.
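To make the idea concrete, here is a minimal sketch in Python of the house style mentioned above (bold and italic on the first occurrence, italics afterwards). The function name `treat` is hypothetical, and the HTML–style output tags are purely an assumption for the sake of the example, since HackML doesn’t say what the transformation should produce.

```python
import re


def treat(text, word):
    """Wrap the first occurrence of word in bold italic, the rest in italic."""
    count = 0

    def replace(match):
        nonlocal count
        count += 1
        template = "<b><i>{}</i></b>" if count == 1 else "<i>{}</i>"
        return template.format(match.group(0))

    # \b keeps 'HackML' from matching inside a longer word.
    return re.sub(r"\b" + re.escape(word) + r"\b", replace, text)


print(treat("HackML is small. I like HackML.", "HackML"))
```

Words that begin or end with punctuation would need a more careful pattern than the simple word boundary used here.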

There are four child elements of ‘words’, simply as convenient groupings which may be treated differently by different style sheets. Words may also be placed loose, directly within the ‘words’ element, where they will all be treated by the style sheet in the same way as each other with no differentiation.

Attribute: occurrence

The ‘occurrence’ attribute is entirely optional, as this sort of thing should take place in the design of the transformation, dictated by the publication house style. It is intended to imply whether a global presentational treatment of the list of contents of the ‘words’ should apply to the first, or all occurrences. Likely to be a rarely used attribute.

buzzword

List of buzzwords — technical names, acronyms, important and pertinent words particular to the topic of the article, not already covered by the ‘person’, ‘product’, or ‘organisation’ categories.

Attribute: occurrence

The ‘occurrence’ attribute is intended to imply whether a global presentational treatment of the list of contents of the ‘words’ should apply to the first, or all occurrences.

person

List of person names for treatment by transformation. Quite often it is a house style that the first occurrence of a person’s name should be emboldened or italicised, for example.

Another thing is that it is considered ‘bad mannered’ to break a person’s name across lines. Thus, non–breaking spaces can be substituted by transformation into any names on this list.
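The substitution itself is simple. A sketch in Python, assuming the names have already been read out of the ‘person’ list into plain strings (the function name `protect_names` is made up for this example):

```python
NBSP = "\u00a0"  # Unicode non-breaking space


def protect_names(text, names):
    """Replace the spaces inside each listed name with non-breaking spaces."""
    for name in names:
        text = text.replace(name, name.replace(" ", NBSP))
    return text


protected = protect_names("Written by Ian K Tindale in 2004.", ["Ian K Tindale"])
print(repr(protected))
```

Spaces outside the listed names are left alone, so normal line breaking still applies to the rest of the copy.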

Attribute: occurrence

The ‘occurrence’ attribute is intended to imply whether a global presentational treatment of the list of contents of the ‘words’ should apply to the first, or all occurrences.

product

Product names, brand names, song names, intellectual property items, book titles, films, works of art, that kind of thing.

Attribute: occurrence

The ‘occurrence’ attribute is intended to imply whether a global presentational treatment of the list of contents of the ‘words’ should apply to the first, or all occurrences.

organisation

Organisation and company names. Pretty much any body of people that perform some identifiable task and perhaps have an identifiable group boundary.

Attribute: occurrence

The ‘occurrence’ attribute is intended to imply whether a global presentational treatment of the list of contents of the ‘words’ should apply to the first, or all occurrences.

banner

The banner headline is the main headline. It is often the title of the piece, in which case it need not contain any actual text, but can instead simply point to the meta title using the source attribute, which is an IDREF (or, if there is no content and the attribute is left out, it should do this by default).

For example, a future ‘design engine’ might look specifically for the content of the banner, subjecting it to a complex and highly stylised typographic design treatment. This designed graphic could be substituted for the textual content of the banner element, typically occupying a prominent place in the layout.

Attribute: source

The ‘source’ attribute is an IDREF. It is used to reference another element by its ID value. In this case, it is optional. If the banner element contains no text, and has a source attribute that points to the title element in meta, it should act as if it contains what the title element contains. If there is no source attribute, the layout engine should assume it wants the content of the title element anyway. If the source attribute points to the ID of some other element that contains text, it should use that (the results might be strange, but nevertheless, that might be what the author wants). If the banner headline contains any usable text, it should be used regardless.
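The order of precedence described here can be sketched in Python with the standard library’s ElementTree. This is only one reading of the rules: the function name `banner_text`, the sample article and the choice to fall back to the title when the source points nowhere are all my assumptions, and the exact nesting should be checked against the DTD.

```python
import xml.etree.ElementTree as ET

SAMPLE = (
    '<article><meta><title id="t1">HackML explained</title></meta>'
    '<banner source="t1"/><copy>Some words.</copy></article>'
)


def banner_text(article):
    """Resolve the banner: own text, then the source IDREF, then the title."""
    banner = article.find("banner")
    own = "".join(banner.itertext()).strip()
    if own:
        return own  # usable text in the banner is used regardless
    source = banner.get("source")
    if source:
        for element in article.iter():
            if element.get("id") == source:
                return "".join(element.itertext()).strip()
    # No source (or a dangling one): default to the meta title.
    title = article.find("meta/title")
    return "".join(title.itertext()).strip() if title is not None else ""


print(banner_text(ET.fromstring(SAMPLE)))
```

The same pattern would serve for the byline, with the primary author element as the default in place of the title.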

 
strap

A strap differs from a crosshead in that it is a secondary level to the banner, but isn’t necessarily present in the flow of the galley. It will be a higher level than any crosshead, however.

Attribute: source

The ‘source’ attribute is an IDREF. It is used to reference another element by its ID value. In this case, it is optional. If the strap element contains no text, and has a source attribute that points to the identified element, it should act as if it contains what that element contains. Typically, this might point to a pullquote. If there is no source attribute and no strap element content, then you get no strapline.

 
byline

The attribution line. In most cases, it will simply point to the meta author using the source attribute, which is an IDREF (or, if there is no content and the attribute is left out, it should do this by default). In cases where a publication pseudonym is used, the ‘byline’ can hold that text directly.

Attribute: source

The ‘source’ attribute is an IDREF. It is used to reference another element by its ID value. In this case, it is optional. If the byline element contains no text, and has a source attribute that points to an author element in meta, it should act as if it contains what that author element contains. If there is no source attribute, the layout engine should assume it wants the content of the primary author element anyway. If the source attribute points to the ID of some other element that contains text, it should use that (the results might be strange, but nevertheless, that might be what the author wants). If the byline contains any usable text, it should be used regardless.

 
standfirst

A large and prominent passage of text that occurs between the banner and the body copy. It is not part of the flow of the galley, and occupies a higher level than body copy. Its purpose is partially to sum up the point of the article as a whole, and partially to entice the reader to actually begin reading this article.

It might be that this is also one of the most visible parts of the article from an external point of view. For example, a contents page system which knows the pagination, might look at an article for the banner, byline and standfirst, using the standfirst as the summary on the contents listing (as style might dictate).

 
copy

The single most important element in HackML. This is where the article itself is written. This element contains all of the body copy that would flow in a galley. It doesn’t, however, contain anything that sits outside of galley flow.

For example, the main headline types, picture block captions, various other boxout copy, will live in either meta element types or figure elements. A bare minimum HackML article would simply consist of an article element, containing a copy element, in turn containing all the words that the author writes.
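For illustration, such a bare minimum article is shown below as a string, parsed with Python’s standard ElementTree just to confirm that it is well–formed. The wording of the copy is invented, and being well–formed is not the same as being valid against the DTD.

```python
import xml.etree.ElementTree as ET

# An 'article' containing a 'copy' containing the author's words.
MINIMAL_ARTICLE = "<article><copy>All the words that the author writes.</copy></article>"

root = ET.fromstring(MINIMAL_ARTICLE)
copy = root.find("copy")
print(root.tag, ">", copy.tag, ">", copy.text)
```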

Note: the ‘q’, ‘i’, ‘b’, ‘crosshead’, ‘t’ and ‘l’ child elements are the same as those found as child elements of the ‘figure’ element, but are described here.

q

The ‘q’ element (which stands for “quad”), together with the ‘article’ and ‘copy’ elements, is among the most fundamental of elements in HackML. The ‘q’ element is used to separate paragraphs. It is an empty element — that is, it is not a container.

Type a paragraph, bang a quad in, type another paragraph, bang another quad in, and so on. If nothing else comes along to end a paragraph (like the various kinds of crosshead, tabs or lists), then this acts as the most basic paragraph separation mechanism.

It is designed to be easy to use, as well as harking back to the days of inline–code driven typesetting terminology, where quadding was the mechanism for ending a paragraph and determining the justification direction (the latter of which this doesn’t – that is entirely down to the stylesheet or layout engine to decide).
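Because the quad is an empty element sitting in mixed content, a transformation has to gather the text on either side of it. A minimal sketch in Python using the standard ElementTree follows; the function name `paragraphs` and the sample copy are my own, and other paragraph–ending elements such as crossheads are simply treated as inline text here to keep it short.

```python
import xml.etree.ElementTree as ET

SAMPLE = ("<article><copy>First paragraph.<q/>"
          "Second paragraph.<q/>Third.</copy></article>")


def paragraphs(copy):
    """Split the text of a 'copy' element at each empty 'q' element."""
    paras, current = [], [copy.text or ""]
    for child in copy:
        if child.tag == "q":
            paras.append("".join(current).strip())
            current = []
        else:
            # Keep the text of inline containers such as 'i' and 'b'.
            current.append("".join(child.itertext()))
        # Text following any child element lives in its 'tail'.
        current.append(child.tail or "")
    paras.append("".join(current).strip())
    return [p for p in paras if p]


for para in paragraphs(ET.fromstring(SAMPLE).find("copy")):
    print(para)
```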

 
t

The ‘t’ element is a simple tabulation element. Just bang one in where you need a tab.

It is left up to the intelligent design of the transformation to work out how many columns and rows a table is implied to have, by being aware of successive lines containing any ‘t’ elements, each line delimited by a quad element.

Tables are able to be combined freely with list items to form quite structurally rich formats of content, using only a seemingly few constructional elements.
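One way a transformation might infer a table is sketched below in Python. It is deliberately naive: every quad–delimited line becomes a row, whereas a real transformation would only treat successive lines that actually contain ‘t’ elements as a table, and list items are ignored altogether. The function name `rows` and the sample data are assumptions for the example.

```python
import xml.etree.ElementTree as ET

SAMPLE = ("<copy>Name<t/>Role<q/>"
          "Ann<t/>Editor<q/>"
          "Bob<t/>Designer<q/></copy>")


def rows(copy):
    """Return each quad-delimited line as a list of tab-separated cells."""
    table, row, cell = [], [], [copy.text or ""]
    for child in copy:
        if child.tag == "t":
            # A tab closes the current cell.
            row.append("".join(cell).strip())
            cell = []
        elif child.tag == "q":
            # A quad closes the current cell and the current row.
            row.append("".join(cell).strip())
            table.append(row)
            row, cell = [], []
        else:
            cell.append("".join(child.itertext()))
        cell.append(child.tail or "")
    last = "".join(cell).strip()
    if row or last:
        table.append(row + [last])
    return table


table = rows(ET.fromstring(SAMPLE))
column_count = max(len(r) for r in table)
print(len(table), "rows,", column_count, "columns")
```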

l

The ‘l’ element simply begins a list item, which is terminated with a ‘q’ element, perhaps with another ‘l’ element or so, and perhaps also some ‘t’ elements along the way.

This apparently simple scheme leaves it up to the intelligent design of transformation to detect complex list (and table) constructions over consecutive lines.

i

The i container marks out a specific passage of text within the copy as being important. The transformation might want to take the content of the i element and apply a style to it, for example, italicisation. It doesn’t, however, stand for italics, it stands for ‘important’.

b

The b container marks out a specific passage of text within the copy as being bloody important. The transformation might want to take the content of the b element and apply a style to it, for example, emboldening. It doesn’t, however, stand for bold, it stands for ‘bloody-important’.

crosshead

These are sub-headlines, appearing within the galley. In other situations, crossheads are often also known to many people as subheads. Where the two are distinguished, crossheads are generally centred within the column and subheads typically appear ranged left.

pullquote

The pullquote container marks out a specific passage of text within the copy. The idea is that a suitable transformation can pick out these marked pullquotes and place them yet again, in addition to their galley flow presence, as separate and especially styled floating items in the layout.

Attribute: anchor

The ‘anchor’ attribute is an IDREF. It is used to identify which particular anchor element the duplicated and styled copy of the pullquote should point to, and hence appear near. If no anchor attribute is specified, the layout engine should simply treat unanchored pullquotes in loose order of appearance.

anchor

Quite an important element. This links a specific point in the copy to a specific asset in the out-of-galley elements, such as figures and pullquotes.

figure

Figures (typically pictures, illustrations, but also purely textual box–outs) are (possibly counter–intuitively) described here, rather than in the copy. This is because they are part of the assets of an article job–bag, but don’t necessarily live within the flow of the galley copy. They can be anchored to a specific point in the flow of galley, using an anchor.

This keeps it ‘near’ as a hint to the layout system. If tight positional relevance is not required, any anchoring can be omitted and the figure simply goes where it fits best, in something loosely resembling the order provided. Hopefully a layout engine would take care of unanchored pictures as well as anchored ones to present them in a nice layout style.

Note: the ‘q’, ‘i’, ‘b’, ‘crosshead’, ‘t’ and ‘l’ child elements are the same as those found as child elements of the ‘copy’ element, and are described there.

Attribute: anchor

The ‘anchor’ attribute is an IDREF. It is used to identify which particular anchor element the figure block should point to, and hence appear near. If no anchor attribute is specified, the layout engine should simply treat unanchored figures in loose order of appearance.
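A small sketch in Python shows how a layout engine might sort anchored figures from unanchored ones. It assumes that ‘anchor’ is an empty element carrying an id and that each figure has an id of its own; the function name `figure_anchors` and the sample article are invented for the example.

```python
import xml.etree.ElementTree as ET

SAMPLE = (
    '<article><copy>Intro.<anchor id="a1"/> More words.</copy>'
    '<figure id="f1" anchor="a1"><caption>Near the intro</caption></figure>'
    '<figure id="f2"><caption>Goes wherever it fits</caption></figure>'
    '</article>'
)


def figure_anchors(article):
    """Map each figure id to the anchor id it points at, or None if loose."""
    anchor_ids = {a.get("id") for a in article.iter("anchor")}
    placement = {}
    for figure in article.findall("figure"):
        ref = figure.get("anchor")
        # A missing or dangling reference leaves the figure unanchored.
        placement[figure.get("id")] = ref if ref in anchor_ids else None
    return placement


print(figure_anchors(ET.fromstring(SAMPLE)))
```

Figures that come back as None would then be laid out in loose order of appearance.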

picture

A link to a picture (photo or illustration) object. Control over the sizes of photographs or illustrations is a big question mark at the moment. I suggest that layout engines try and take the given size and the ‘bigness’ attribute as a combined hint.

Attribute: source

The ‘source’ attribute is actually a lot more permissive than it seems it should be. Ideally, it should hold the path to an asset such as a photo or illustration file, which will be used at the layout and composition stage. Typically, this path should be relative. In actual reality, people will mess this up and put absolute paths to directories on their own machines in there, etc. The transformation should try to figure out whether things are missing or not, and ask for them before processing can occur.
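The kind of checking meant here could be sketched in Python as follows. The function name `check_source`, the three result labels and the idea of a base directory for the job–bag are all assumptions made for the example; HackML itself says nothing about how the check is done.

```python
from pathlib import Path, PureWindowsPath


def check_source(source, base_dir):
    """Classify a picture 'source' value as 'absolute', 'missing' or 'ok'."""
    # Catch absolute paths in both Unix and Windows styles, whichever
    # machine the transformation happens to be running on.
    if Path(source).is_absolute() or PureWindowsPath(source).is_absolute():
        return "absolute"
    return "ok" if (Path(base_dir) / source).is_file() else "missing"


print(check_source("C:\\Users\\me\\Desktop\\photo.jpg", "."))
```

Anything reported as ‘absolute’ or ‘missing’ would be queried with the author before layout goes ahead.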

Attribute: bigness

The ‘bigness’ attribute is also extremely permissive. It should allow an author to indicate roughly how big a picture should be presented. It most certainly doesn’t allow an author precise control over size (hence the name of the attribute). That’s the designer’s job.

 
credit

Simply contains the name of the person or entity that is responsible for this photograph, so as to attribute ownership of the photo to them. Could be derived into a copyright line, or a byline, if so chosen by the design of the transformation.

caption

The caption text that will accompany an illustration, photo, list, table or passage of text. In a typical usage, the rest of the figure might be perhaps boxed in, whereas the caption would sit outside of the box.

q

(described under ‘copy’ element)

t

(described under ‘copy’ element)

l

(described under ‘copy’ element)

i

(described under ‘copy’ element)

b

(described under ‘copy’ element)

crosshead

(described under ‘copy’ element)

This might seem a simple collection of elements, but there is a heavy reliance on intelligent design of transformations, and even the vague and as yet undeveloped notion of a layout engine.

The whole article is presented as a sort of ‘job bag’, where the various assets are bundled up either in or outside the flow of galley that is represented by the contents of the ‘copy’ element. Layout can be hinted by means of the ‘anchor’ elements.