To simplify the task of annotating regular structures (e.g., tables or lists), the Mangrove system provides the reglist macro command. This document describes how to use it.
The reglist command can help to annotate any collection of items (such as publications, events, or interests). To be concrete, this document describes how to annotate all "events" in a table with a single template statement, referred to as a regular list, or simply a reglist. This simplifies annotation and makes it easier to maintain the annotations when you edit your document.
Assumptions:
Using such a template amounts to adding a reglist macro that describes the structure of a single recurring element (e.g., a row in a table).
This "definition" should be written before the very first actual event in the table.
For example, if the first row of the table contains the column headings, the template should immediately follow it; otherwise it should be right after the <table> tag.
The position of the closing reglist tag defines the scope of the macro, i.e., the semantic structure defined in the template will be applied only to the rows enclosed between the pair of opening and closing reglist tags.
The following HTML fragment is annotated with a reglist macro. The relevant code is the pair of HTML comments that contain the opening and closing reglist tags.
<html xmlns:uw="http://www.cs.washington.edu/research/semweb/vocab#v1_0">
<table>
<tr>
<th>Date</th>
<th>Topic</th>
<th>Presenter</th>
<th>Paper</th>
<th>Additional Notes</th>
</tr>
<!-- <reglist="<tr><uw:event>
<td><uw:date>...</uw:date>
<td><uw:topic>...</uw:topic>
<td><uw:presenter>...</uw:presenter>
<td><uw:paper>...</uw:paper>
<td>...
</uw:event></tr>"> -->
<tr>
<td>Feb 3, 2003</td>
<td>Semantic Web</td>
<td>Luke McDowell</td>
<td>Evolving the Semantic Web with Mangrove</td>
<td>cookies provided...</td>
</tr>
...
<!-- </reglist> -->
</table>
</html>
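For comparison, here is a minimal sketch of the other placement case described above, a table with no heading row, where the template comes right after the <table> tag. The two-column layout is a hypothetical simplification for illustration; the tags and the sample data are reused from the example above.

```html
<table>
<!-- No heading row, so the template comes right after the <table> tag. -->
<!-- <reglist="<tr><uw:event>
<td><uw:date>...</uw:date>
<td><uw:topic>...</uw:topic>
</uw:event></tr>"> -->
<tr>
<td>Feb 3, 2003</td>
<td>Semantic Web</td>
</tr>
<!-- </reglist> -->
</table>
```

As in the example above, the enclosing <html> tag is assumed to declare the uw name space.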
<reglist> Element
The reglist element has its own syntax and is very similar to a regular expression. Let's take a look at the details.
The reglist element is enclosed in an HTML comment. (In comparison, the regular semantic tags can be added directly among other HTML tags.) Note that the comment contains no other tags except the reglist itself.
The reglist does not have the “uw:” prefix. The reason is that the reglist element is a macro command, and it is available in all user-defined name spaces.
The element is closed by a matching closing reglist tag.
Let’s go through the above example with the table and explore the actual syntax of the element.
In this case the reglist element describes the structure of a single row from the table. (The very first and the last tag from the value of the reglist define that scope.)
The string representing the value of the reglist element is actually the skeleton of a row from the table (the HTML elements) with additional semantic tags and the special symbol “...” (without the quotes).
The sample reglist tells the semantic parser to treat each row in this table as a <uw:event> object.
The data in the first column should be interpreted as a date for that event, the next one as the event’s topic, etc. The last column does not have any semantic tags (i.e., we have only ‘<td>...’).
This means that there is a column in the table, but we do not want to annotate its contents, or there is no suitable semantic tag for it in the name space we are using currently.
The symbol “...” is a place holder for the data which is present in the actual table.
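To make this mapping concrete, the sample row from the table above can be read as if it had been annotated by hand as shown below. This is only an illustrative sketch: the exact expansion the Mangrove parser performs internally is an assumption, and the placement of the closing </td> tags is chosen here simply to keep the fragment well-formed.

```html
<!-- Hypothetical hand-annotated equivalent of the first data row. -->
<tr><uw:event>
<td><uw:date>Feb 3, 2003</uw:date></td>
<td><uw:topic>Semantic Web</uw:topic></td>
<td><uw:presenter>Luke McDowell</uw:presenter></td>
<td><uw:paper>Evolving the Semantic Web with Mangrove</uw:paper></td>
<td>cookies provided...</td>
</uw:event></tr>
```

The last cell carries no semantic tag, mirroring the bare ‘<td>...’ entry in the template. With the reglist macro, none of this per-row tagging has to be written or maintained by hand.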
The order of the semantic tags and their neighboring HTML tags could be switched, i.e., it is correct to use "<tr><uw:event>" or "<uw:event><tr>".
Here is an example of a reglist
element used for annotation of an HTML list.
The reglist element is not supported by the GUI tagger, so if you would like to use it, you have to edit the source of your web page by hand (i.e., with a text editor).
The reglist tag and the semantic tags inside it are ignored by traditional browsers and thus will not disrupt the look and feel of your web page. Likewise, HTML formatting tags are ignored by the semantic parser, so you can tag your data without having to make any other changes to the HTML.
Check that the xmlns:uw name space is declared in the <html> tag as specified above.
Check that the reglist element doesn't have a "uw:" prefix and that it is inside an HTML comment.
Check that there is a closing reglist element (also inside a comment).
Check that the structure given in the reglist tag corresponds exactly to the actual structure of the HTML tags in the table. E.g., if you have 3 columns in the table, describe all of them in the reglist value; if you use closing HTML tags in the reglist value, check that all of their corresponding columns have that closing tag.
Check that each event has a <uw:date> (including a year) and a <uw:topic>.