Developing A Topic Map Programming Model

Khalil Ahmed

Principal Consultant
Ontopia AS

Table of Contents

Justification for Developing a Topic Map Programming Model
Reduced Developer Learning Curve
Application Portability
Enable Library Development
Relationship to TMQL
Requirements of a Topic Map Programming Model
Simplicity
Practicality
Additional Constraints
Architecture Proposals
API-1: DOM Extension Architecture
API-2: Graph-based Architecture
API-3: XTM Conceptual Model Based Architecture
API-4: Modified Conceptual Model-Based Architecture
API Analysis
XTM event-based model
Conclusions
Comparison of the APIs
Future Work
Acknowledgements

Abstract

Topic maps provide a standard data model for describing complex, interconnected information. ISO/IEC 13250:2000 [ISO] and XTM 1.0 [XTMS] provide standard serializations of that data model using SGML and XML syntax respectively. Now that the problem of interchange has been addressed, we believe it is time for the topic map community to consider the development of a standard programming interface. A standard programming model would flatten the learning curve for developers and protect the investment that businesses make in bespoke topic map application development.

The first wave of topic map applications has been the 'engines'. These are software applications which are, in general, capable of parsing one or both of the standard serialization syntaxes and then exposing the parsed data via an Application Programming Interface (API), which may or may not directly reflect the conceptual model of topic maps. However, every topic map engine has a different API, reflecting the different approaches its developers have taken to representing the topic map data model. A one-size-fits-all API is probably an unachievable goal. Even so, we feel it is necessary to develop an API that is full-featured enough to meet the majority of topic map application development requirements. Such an API should also provide a solid foundation for more advanced APIs, for topic map applications, and for facilities such as TMQL.

One of the primary factors in the rapid adoption of XML was the early development of a pair of standardized APIs for access to and manipulation of XML data, namely SAX and the DOM. Such standardized APIs have many benefits for developers, including a reduced learning curve and the ability to move code from one parser implementation to another. It is our belief that, in order to bring topic maps fully into the mainstream, a standardized API for access to and manipulation of topic map data structures is also required. This paper explores some possible forms of API for topic maps, with the aim of developing a standard API which meets the requirements of maximum simplicity, maximum practicality and minimal size. We explore possible means of representation, including an extension of the Document Object Model, an abstract graph-based representation and an object-oriented interface based on the XTM Conceptual Model.

Justification for Developing a Topic Map Programming Model

The topic map programming model embodies two major pieces of work: the development of a topic map data model, and the development of an object-oriented API for mainstream object-oriented languages such as Java, C++ and Python. In our opinion, the latter is the important end-product of this process, and it is the API that we focus on in this paper. Our justification for focussing on the API is that it should be possible to map any data model which supports a complete implementation of the topic map concepts to a standard API which provides access to topic map constructs.

Standard APIs are in general a Good Thing. The benefits of a standard API are reduced developer learning curve, application portability and the enabling of widespread library development.

Reduced Developer Learning Curve

By introducing a standard API for topic maps, a developer can take her knowledge from one implementation of a topic map system to another without having to spend significant amounts of time learning a new API. Additionally, by standardising on a single API, training material and developer support communities need not be restricted to the vendors. This would make it possible for a much larger body of training material to be made available to the topic map "newbie". An extant example of this is the number of DOM API training courses which are available (a quick Google search lists 3,220 hits for the search string "DOM API Training").

Application Portability

For organisations investing in the development of topic map applications, the existence of a standard API provides a degree of protection for that investment. At present, moving a topic map application between systems would require a complete rewrite of the customised code. So, despite having invested in a portable format for information organisation and exchange, customers are still locked in to a particular vendor's implementation by the APIs used to develop their bespoke applications or application extensions.

Enable Library Development

A standard API for topic map access and manipulation would also allow developers to create higher-level application development libraries and tools which are portable across all systems implementing that API. This could enable the development of high-level toolkits such as standard indexing and querying APIs, toolkits for topic map generation from meta data sources and topic map visualisation and navigation applications. Again, XML's SAX and DOM show the way, with applications ranging from transformation (XSLT) to content management applications (Cocoon and Zope) all being created on top of these standard access and manipulation APIs.

Standard APIs also make it easier to integrate the technology into widespread consumer applications such as web browsers and operating systems.

Relationship to TMQL

Topic Map Query Language (TMQL) is a proposed work item for both ISO and XTM. TMQL will provide a standardised language for topic map query and update, similar in scope to SQL for relational database systems. The purposes of TMQL and of a standard topic map API overlap, in that both attempt to define a standard means of data access and manipulation. The current TMQL proposal defines operations on topic maps which return topic maps as their 'result set'. To be useful to a client application, such result sets would still require a representation in a data model and APIs for accessing that data. In this way a standard topic map API would be a natural adjunct to TMQL, providing a JDBC-like API for manipulating the results of a TMQL query.

The TMQL effort will also produce a rigorous formal model of the topic map abstract data type. This model will be based on the topic maps Conceptual Model presented in the XTM 1.0 Specification [XTMS], and on a set of common operations which the topic map user community finds useful. At the time of writing, this is planned work for the TMQL group, and no initial version was available on which to base a proposal for a programming model. However, any community effort to develop a standard programming model should pay close attention to this formal model as it becomes available.

Requirements of a Topic Map Programming Model

The requirements for the programming model are generated from the need to get a potentially very wide audience of developers up to speed with topic map technology as quickly as possible. For this reason, our requirements focus on two areas: Simplicity and Practicality.

Simplicity

The topic map programming model should be simple to explain and easy to learn. This means that the number of classes and operations should ideally be kept low and that the API should only focus on covering the topic map model itself. Complex extensions such as query languages and inference rules are definitely out of scope. It was more difficult to draw the line in the grey area of other, more common, extensions such as transitive association types and type hierarchies. However, we believe that these common extensions should be easily implemented on top of a core API which is focused only on the representation of the topic map model and so have no place in the API we propose to develop.
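The following Python sketch illustrates how such an extension could sit on top of a core API that knows only topics and associations. It computes a type hierarchy in a separate layer. All names (`Topic`, `Association`, `supertypes`) are hypothetical. For brevity, association and role types are plain strings rather than topics.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)
class Topic:
    """Core-API topic (compared by identity)."""
    id: str

@dataclass(eq=False)
class Association:
    """Core-API association: a type plus (role, player) pairs."""
    type: str
    members: list = field(default_factory=list)

def supertypes(topic, associations):
    """Transitive closure of superclass-subclass associations, computed in a
    layer above the core API using only its public properties."""
    result, stack = [], [topic]
    while stack:
        current = stack.pop()
        for assoc in associations:
            if assoc.type != "superclass-subclass":
                continue
            roles = dict(assoc.members)          # role name -> player
            parent = roles.get("superclass")
            if roles.get("subclass") is current and parent is not None and parent not in result:
                result.append(parent)
                stack.append(parent)             # keep climbing the hierarchy
    return result

# Usage: dog -> mammal -> animal
animal, mammal, dog = Topic("animal"), Topic("mammal"), Topic("dog")
assocs = [
    Association("superclass-subclass", [("superclass", animal), ("subclass", mammal)]),
    Association("superclass-subclass", [("superclass", mammal), ("subclass", dog)]),
]
print([t.id for t in supertypes(dog, assocs)])  # ['mammal', 'animal']
```

The closure is computed from the core objects on demand, so the core API needs no knowledge of type hierarchies. A production layer would probably cache the result in an index.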

To further simplify the programming model, it was felt to be important that the constructs in the programming model should be closely parallel to those constructs found in the syntax. By making parallel structures in the programming model to those which exist in the topic map syntax, we enable programmers who have looked through the XTM specification to quickly get a 'feel' for the API.

Practicality

The API can readily support common topic map operations such as import and export of a serialised form of the topic map, association traversal and direct manipulation of topic map constructs. Building further functionality, such as merging, indexing and filtering operations, into this core API would sacrifice the simplicity principle. Instead, a solid core should make the common extensions easy to implement in a layered manner. The precedent for this can be clearly seen in the XML family of standards, where linking (XLink), path expressions (XPath) and more complex operations such as transformation (XSLT) are all typically implemented upon the DOM abstraction of the XML document.

The programming model must support all those topic map operations which are the most fundamental parts of topic map applications. This means that the programming model must provide the means to achieve the following:

  • Topic map parsing - APIs for reading from a serialized syntax into some internal data structure.

  • Topic map manipulation - APIs for manipulating the constructs of topic maps (topics, associations, occurrences etc.)

  • Topic map serialization - the reverse of parsing, these are APIs for creating a serialized version of the internal data structures used to represent the topic map.

The parsing and serialization of topic maps can be considered "utility" functions which are useful additions to a core topic map API. Serialization is merely the process of walking a topic map data structure using the API and outputting the appropriate XML syntax to represent the constructs found there. Parsing is the reverse, although the parsing process must also include the processing of the raw parsed data into a topic map in accordance with the requirements of the XTM Specification [XTMS].
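The Python sketch below gives a rough illustration of serialization as a walk over the API. It assumes a hypothetical minimal model: `Topic` with base names, and `Association` with member players. It emits a much-simplified subset of XTM 1.0 markup using the standard `xml.etree.ElementTree` module. It is not a complete or validating XTM writer: scope, occurrences, member roles and typing are all omitted.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

XTM_NS = "http://www.topicmaps.org/xtm/1.0/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("", XTM_NS)          # XTM as the default namespace
ET.register_namespace("xlink", XLINK_NS)

def q(ns, tag):
    """Clark-notation qualified name, as used by ElementTree."""
    return f"{{{ns}}}{tag}"

@dataclass
class Topic:
    id: str
    base_names: list = field(default_factory=list)  # names in the unconstrained scope

@dataclass
class Association:
    id: str
    players: list = field(default_factory=list)     # member Topics (roles omitted)

def serialize(topics, associations):
    """Walk the model through its public properties and emit XTM-style markup."""
    root = ET.Element(q(XTM_NS, "topicMap"))
    for topic in topics:
        t = ET.SubElement(root, q(XTM_NS, "topic"), id=topic.id)
        for name in topic.base_names:
            bn = ET.SubElement(t, q(XTM_NS, "baseName"))
            ET.SubElement(bn, q(XTM_NS, "baseNameString")).text = name
    for assoc in associations:
        a = ET.SubElement(root, q(XTM_NS, "association"), id=assoc.id)
        for player in assoc.players:
            member = ET.SubElement(a, q(XTM_NS, "member"))
            # References to topics become xlink:href fragment identifiers
            ET.SubElement(member, q(XTM_NS, "topicRef"), {q(XLINK_NS, "href"): "#" + player.id})
    return ET.tostring(root, encoding="unicode")

t1, t2 = Topic("t1", ["Puccini"]), Topic("t2", ["Tosca"])
xml_text = serialize([t1, t2], [Association("a1", [t1, t2])])
```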

Such a programming model should also provide a solid foundation on which ancillary standards and systems may be implemented. The precedent for this is clearly set by the XML family of standards: systems which implement XSLT and XPath are typically built upon a DOM implementation. The ancillary standards for topic maps are not yet fully defined, but would seem likely to include a query language (TMQL).

Additional Constraints

To keep the core topic map programming model manageable, we decided to limit the level of functionality it would be expected to provide. For this reason, the models developed here do not directly address:

  • Representation of the mergeMap construct and the topic map merging process

  • Representation of templating mechanisms.

  • Indexing of topic map constructs.

  • Support for transitive association types or type hierarchies.

  • Maintenance of lexical information about the source XTM document

In many respects, these excluded features are common requirements for a topic map programming model, regardless of its form. This paper concentrates on the potential differences between various forms of programming model rather than on what those forms have in common. However, the excluded features do serve as a useful guide in determining the practicality of a given model, as these are all higher-level functions which it must be possible to build on the core programming model.

The development and analysis work presented here also does not delve into the extremely important API issues relating to object deletion, referential integrity, transaction support, duplicate suppression and the handling of merges which take place as a result of manipulating API objects directly. Again, these are all issues which must be addressed regardless of the API used.

Architecture Proposals

In this section we present each of the architectures considered as candidates for a topic map programming model.

API-1: DOM Extension Architecture

The XML Document Object Model (DOM) [DOM] provides a simple abstraction of an XML document as a tree (or collection of trees) consisting of nodes which represent the XML document markup and content. It is popular with developers because of its simplicity - especially for a developer already familiar with the concepts of XML markup - and because of its functionality - for example, being able to locate the DOM node which has a specific ID attribute value, or locating the set of nodes representing elements with a specific tag name.

API-1 is a topic map programming API developed as an extension of the DOM, similar to the HTML extension which is part of the DOM specification. The DOM Node class provides basic node hierarchy operations, such as the insertion and deletion of nodes, management of a node's child list and support for both depth-first and breadth-first traversal. The DOM Element class is derived from the DOM Node class, and the topic map extension provides a set of additional classes, all derived from the DOM Element class. Most of these classes provide no functionality beyond a labelling function (returning a distinct value for the nodeType property). There are two exceptions. TopicMap, AddressableSubject and NonAddressableSubject return the URI of the base address of the topic map, the addressable subject or the subject indicator respectively. TopicReference returns the locator used in the reference and the type of reference, with distinct values for references made directly to the topic and references made via a subject indicator reference. The UML diagram in Figure 1 shows the class structure of this architecture. The DOM classes Node, Document and Element are shown in this diagram along with some of their public functions, to give a feel for the range of operations such an implementation would make available to the programmer.

Developing The API

The principal design issue in developing this architecture is the handling of topic references within the constraints of a tree-based architecture. The data model which we are attempting to represent is not a tree, but a graph of interconnected topics. In the general case, a tree cannot represent a graph without a construct for cross-linking tree nodes which are not in a direct parent-child relationship. We provide this construct in the form of a TopicReference Node, which is a surrogate for a Topic Node. The TopicReference Node must provide a function to resolve the reference to a Topic Node (which will be a direct child of the TopicMap Node).
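The run-time resolution that a TopicReference Node must perform can be sketched with Python's standard `xml.dom.minidom` as the underlying DOM. The function `resolve_topic_ref` is hypothetical. It assumes the `xlink:href` attribute holds a same-document fragment reference such as `#t2`, and it searches only the direct `topic` children of the `topicMap` element. In a real extension this would be a method of the TopicReference class.

```python
from xml.dom import minidom, Node

XLINK = "http://www.w3.org/1999/xlink"

def resolve_topic_ref(ref):
    """Return the <topic> element a <topicRef> points at, or None."""
    href = ref.getAttributeNS(XLINK, "href")
    if not href.startswith("#"):
        return None  # external references are out of scope for this sketch
    target_id = href[1:]
    topic_map = ref.ownerDocument.documentElement
    # Topics are direct children of the topicMap element, so a flat scan suffices
    for child in topic_map.childNodes:
        if (child.nodeType == Node.ELEMENT_NODE
                and child.localName == "topic"
                and child.getAttribute("id") == target_id):
            return child
    return None

doc = minidom.parseString(
    '<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/" '
    'xmlns:xlink="http://www.w3.org/1999/xlink">'
    '<topic id="t1"/><topic id="t2"/>'
    '<association><member><topicRef xlink:href="#t2"/></member></association>'
    '</topicMap>'
)
ref = doc.getElementsByTagName("topicRef")[0]
print(resolve_topic_ref(ref).getAttribute("id"))  # t2
```

If the target's id attribute is changed, the reference is left dangling and no error is raised. This shows why references resolved from attribute values are hard to keep consistent.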

Another decision, common to the development of all the architectures, concerns the representation of syntactic short-cut constructs. Examples are the <instanceOf> element (a short-cut for creating a type-instance association between two topics) and names (which are privileged forms of occurrence). Here we choose to match the form of DOM extensions such as the HTML DOM and represent the syntax directly. This means that type/association equivalences and other syntactic short-cuts are directly reflected in the model.

The third issue regards the representation of subjects which are not directly represented by topics in the topic map. Such subjects may be referenced from <subjectIndicatorRef> elements in parent elements such as <subjectIdentity>, <instanceOf> and <member>. For this model, we elect to reify all subjects referenced in the topic map. This means that when a <subjectIndicatorRef> is imported into this model, if its parent is any element other than a <subjectIdentity> element, it is represented by a new Topic Node with no children and a single value in its subjectIndicatorsList property (the value of the <subjectIndicatorRef>'s href attribute).
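Below is a minimal sketch of this import rule. It uses Python's standard `xml.etree.ElementTree` for parsing, with a hypothetical `TopicNode` class standing in for API-1's Topic node. The sketch also makes an assumption not stated above: it creates only one stub topic per distinct `href`, so that repeated references to the same subject do not produce duplicates. The `example.org` subject indicators are placeholders.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

XTM = "{http://www.topicmaps.org/xtm/1.0/}"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

@dataclass
class TopicNode:
    """Stand-in for API-1's Topic node: no children, just subject indicators."""
    subject_indicators_list: list = field(default_factory=list)

def reify_referenced_subjects(root):
    """Create a child-less TopicNode for each subject indicator referenced
    from any element other than <subjectIdentity>."""
    stubs = {}
    for parent in root.iter():
        if parent.tag == XTM + "subjectIdentity":
            continue  # these indicators identify an existing <topic>
        for child in parent:
            if child.tag == XTM + "subjectIndicatorRef":
                href = child.get(XLINK_HREF)
                if href not in stubs:  # assumption: one stub per distinct href
                    stubs[href] = TopicNode([href])
    return list(stubs.values())

xtm = ('<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/" '
       'xmlns:xlink="http://www.w3.org/1999/xlink"><topic id="t1">'
       '<instanceOf><subjectIndicatorRef xlink:href="http://example.org/psi/composer"/></instanceOf>'
       '<subjectIdentity><subjectIndicatorRef xlink:href="http://example.org/psi/puccini"/></subjectIdentity>'
       '</topic></topicMap>')
stubs = reify_referenced_subjects(ET.fromstring(xtm))
print([s.subject_indicators_list for s in stubs])  # [['http://example.org/psi/composer']]
```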

The following table and UML diagram illustrate the form of the proposed DOM extension programming model. The table shows the proposed node types for the DOM extension, with an indication of the expected containment hierarchy (the expected parent of an instance of the node type) and a mapping to the XTM element that the node type represents. [1] The UML diagram shows that the classes representing topic map constructs include a set of convenience attributes in each class. These attributes would almost certainly be supplemented by convenience functions if the API were developed further.

Node Type (Type in the DOM) | Node Parent | Represents
TopicMap (Element) | None | topicMap
Type (Element) | Topic, Occurrence, Association, Member | instanceOf and roleSpec
Topic (Element) | TopicMap | topic
SubjectIdentity (Element) | Topic | subjectIdentity
AddressableSubject | SubjectIdentity | subjectIdentity/resourceRef
NonAddressableSubject | SubjectIdentity | subjectIdentity/subjectIndicatorRef
BaseName (Element) | Topic | baseName
Occurrence (Element) | Topic | occurrence
Scope (Element) | Association, BaseName, Occurrence | scope
Name (Element) | BaseName, VariantName | baseNameString, variantName/resourceData
Variant (Element) | BaseName, Variant | variant
Parameters (Element) | Variant | parameters
Reference (Element) | Occurrence, Variant | resourceRef
Association (Element) | TopicMap | association
Member (Element) | Association | members
TopicReference (Element) | Member, Scope, Parameters | topicRef, subjectIndicatorRef

Figure 1. UML for API-1

API Analysis

API-1 offers complete coverage of the XTM syntax, with a class for each of the elements defined in the XTM DTD. As the API is node-based, most of the classes exist only to label node types. It would be possible to remove many of the classes shown in Figure 1. However, these classes provide the essential hook for extensibility and for the development of more complete APIs on a consistent base. Including the DOM classes Document, Node and Element, the API consists of 19 classes and at least 16 functions. More functions are defined for the DOM classes than are shown in the diagram, but these 16 are the minimum needed to traverse and manipulate the topic map.

Figure 2 shows a simple topic map represented in the programming model of API-1. The associations shown in red between TopicRef objects and Topic objects are generated by resolving each TopicRef to the Topic it references. The topic map represented by this data structure consists of a single association (assoc) between two topics (topic1 and topic2). Both topics have a single base name in the unconstrained scope, and one of them has an occurrence. The association and the roles of the association are typed by published subjects which are indicated by the reifying topics (rt1, rt2 and at1). This API requires a total of 27 objects to represent the topic map. The large number of programming constructs is due to the closeness of API-1 to the somewhat verbose XTM syntax: every <topicRef> and <instanceOf> element in the XTM syntax of the topic map must have a matching construct in the programming model.

Figure 2. API-1 representation of a sub-type/super-type relationship between two named topics (one with an occurrence)

A more serious criticism of this model is that the ordered-tree model of the DOM is not suitable for representing a topic map. The DOM regards node order as significant, which is not the case for a topic map. DOM NodeLists are ordered lists, not sets, and so do not directly support operations such as duplicate suppression, which are required for a complete implementation of the topic map model. Most seriously of all, the DOM provides no explicit support for making references between nodes. This leaves two options. Either the extension API must define such support, which could be regarded as breaking the DOM model to support topic maps, or references can only be supported through the manipulation of DOM Attribute Node values. The API presented here does not create explicit references between nodes. Instead, it relies on the run-time resolution of attribute values to resolve TopicRef nodes to their referenced Topic nodes. This form of syntax-based reference is awkward for the developer to create. Maintaining referential integrity would also be more difficult than in a system which uses direct object-to-object references.

API-2: Graph-based Architecture

The graph-based topic map API architecture is developed from the 4th December draft of the XTM Processing Model [XTMP]. The XTM Processing Model document defines the processing of a topic map document into a graph structure which consists of just three types of nodes and four types of connecting arc between nodes. The nodes represent the basic elements of the abstract data model of topic maps: topics, associations and scopes. The connecting arcs are:

  • Scope Participant Arcs which connect a scope node to a topic node which defines one of the subjects in that scope.

  • Association Scope Arcs which connect an association node to the scope node which defines the context within which the association is considered to be valid.

  • Association Member Arcs which connect an association node to a topic node which plays a role in the association. These arcs are optionally labelled by a further topic node which characterises the role being played in the association.

  • Association Template Arcs which connect an association node to a topic node which defines a template for the association, that constrains the roles and the role players which may be used in the association.

The processing model also defines the concept of the Subject Identity Point, a binding point where all topics with the same subject are merged. The concept of merging is central to topic maps, and subject and subject identity are pivotal to this concept. A subject may be represented in two distinct ways. The first is by reference to the addressable object which is the subject (the subject constituting resource). The second is by reference to an addressable object which describes the subject (the subject indicator resource). A subject identity point is shared by all subject indicator resources which describe the same subject, and by the subject constituting resource which is the subject, if such a resource exists.

Developing the API

In developing the XTMP model into an API, we have elected to create classes representing each of the node types, and class associations to represent the arcs. However, this approach makes it impossible to capture the 'label' property of an Association Member Arc. The Association Member Arc therefore has to be promoted to a first-class object and given a property to represent this label.

The second decision concerns the representation of the Subject Identity Point. A Subject Identity Point consists of at most one subject constituting resource and zero or more subject indicator resources. The XTM Processing Model defines rules which require that a consistent topic map contains only one topic node for each Subject Identity Point. This one-to-one relationship means that the properties of a Subject Identity Point (its subject constituting and subject indicating resources) may be expressed as properties of the TNode class. Doing this does not prevent the API from representing inconsistent topic maps. Any such topic map would simply include more than one TNode with the same value for either subjectIndicatingResource or subjectConstitutingResource.

The third issue concerns the representation of the class-instance relationship. The XTMP model uses a templating mechanism to define the core association types. These are required to express class-instance relationships, topic-occurrence relationships and other fundamental relationships of the topic map data model. For this reason, we need to break with our previously imposed constraint against including templating mechanisms in the programming model. We therefore include a template property for an ANode, which references the TNode that defines the association template.

Finally an object is required to represent the entire graph with all of its nodes and arcs. This is provided by the TopicMap class which simply serves as a container of topic nodes, scope nodes and association nodes.
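The Python sketch below summarises these decisions as a set of API-2 classes. The class names TNode, ANode and TopicMap follow the text. `SNode` and `Member` are our own hypothetical names for the scope node and the promoted Association Member Arc, and locators are plain URI strings. The `inconsistent_subjects` helper shows how an inconsistent topic map remains representable: it simply reports locators claimed by more than one TNode.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional

@dataclass(eq=False)
class TNode:
    """Topic node; also carries the properties of its Subject Identity Point."""
    subject_constituting_resource: Optional[str] = None
    subject_indicating_resources: set = field(default_factory=set)

@dataclass(eq=False)
class SNode:
    """Scope node; each participant is a Scope Participant Arc to a TNode."""
    participants: set = field(default_factory=set)

@dataclass(eq=False)
class Member:
    """Association Member Arc promoted to an object so it can carry a role label."""
    player: TNode
    role: Optional[TNode] = None

@dataclass(eq=False)
class ANode:
    """Association node with its scope, member and template arcs."""
    scope: SNode
    members: list = field(default_factory=list)
    template: Optional[TNode] = None

@dataclass
class TopicMap:
    """Container for every node in the graph."""
    topics: list = field(default_factory=list)
    scopes: list = field(default_factory=list)
    associations: list = field(default_factory=list)

    def inconsistent_subjects(self):
        """Locators claimed by more than one TNode, i.e. violations of the
        one-topic-per-Subject-Identity-Point rule."""
        claims = defaultdict(list)
        for t in self.topics:
            keys = {("indicator", loc) for loc in t.subject_indicating_resources}
            if t.subject_constituting_resource is not None:
                keys.add(("constituting", t.subject_constituting_resource))
            for key in keys:
                claims[key].append(t)
        return {key for key, ts in claims.items() if len(ts) > 1}

# Two TNodes sharing a subject indicator: representable, but inconsistent
psi = "http://example.org/psi/opera"
t1 = TNode(subject_indicating_resources={psi})
t2 = TNode(subject_indicating_resources={psi})
tm = TopicMap(topics=[t1, t2], associations=[ANode(scope=SNode(), members=[Member(t1), Member(t2)])])
print(tm.inconsistent_subjects())  # {('indicator', 'http://example.org/psi/opera')}
```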

The UML diagram of this API is shown in Figure 3. This proposal is very liberal in allowing almost all references between classes to be traversed bidirectionally. It is arguable that bidirectional traversal of properties should be left out of the core programming model subsystem, with these reverse look-ups delegated to an indexing subsystem built on top of the core model. However, the bidirectional nature of these relationships is part of the essence of the topic map, especially when viewing that topic map as a graph.

Figure 3. UML Diagram of API-2
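The Python sketch below shows one way to delegate reverse look-ups to an indexing layer. The core objects, hypothetical `TNode` and `ANode` classes, hold only one-directional links from an association to its role players. A separate `RoleIndex` derives the topic-to-association direction. This is an illustrative design, not part of API-2 as proposed.

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass(eq=False)
class TNode:
    name: str

@dataclass(eq=False)
class ANode:
    # Core model: only association -> role player links are stored.
    players: list = field(default_factory=list)

class RoleIndex:
    """Reverse look-up (topic -> associations it plays a role in),
    built on top of the core model rather than stored in it."""

    def __init__(self, associations):
        self._by_player = defaultdict(list)
        for assoc in associations:
            for player in assoc.players:
                self._by_player[player].append(assoc)

    def associations_of(self, topic):
        # .get avoids inserting empty entries for unknown topics
        return list(self._by_player.get(topic, []))

a, b, c = TNode("a"), TNode("b"), TNode("c")
x, y = ANode([a, b]), ANode([b, c])
index = RoleIndex([x, y])
print(len(index.associations_of(b)))  # 2
```

The trade-off is that such an index must be rebuilt, or updated incrementally, whenever the core model changes. That is exactly the kind of bookkeeping a layered design moves out of the core.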

Model Analysis

The programming model developed for API-2 is extremely minimal. It certainly has the desired property of being small, with just 6 classes and only 27 functions. However, this simplicity is achieved at a cost to practicality, as shown by the collaboration diagram in Figure 4. The diagram shows a simple topic map similar to that shown in Figure 2 for API-1, except that the occurrence of one of the topics is omitted to keep the diagram clear. Without this occurrence, 31 API objects are required to express the topic map (this total includes the TopicMap object, which is not shown in the diagram). Adding the occurrence would require an extra 6 objects to express the topic-occurrence association template and an extra 5 to represent the topic-occurrence association. This brings the total number of objects required to 42. In practical use, the API complicates the job of the programmer, who must be familiar with both the XTM Processing Model and the XTM Syntax specification to create and manipulate topic maps.

On the positive side, API-2 treats all syntactic constructs except <scope>, <member> and <subjectIdentity> as TNodes, so reification of topic map constructs other than <topic>s is easily implemented. API-2 also includes full support for the templating mechanism described by the XTM Processing Model, a feature which is not an integral part of any of the other APIs developed here.

Figure 4. API-2 representation of a sub-type/super-type relationship between two named topics (one with an occurrence)

API-3: XTM Conceptual Model Based Architecture

The XTM 1.0 Specification [XTMS] includes an annex which describes the conceptual model implemented by the specification. As a record of 'what was in the minds' of the group which produced the XTM Specification, this document provides important input into the process of developing a programming model. To examine if this model is sufficient for a programming model, we present here a programming model developed upon the XTM Conceptual Model.

The programming model presented as a pair of UML diagrams below is derived directly from the UML diagrams presented in the XTM 1.0 Specification annex. The first diagram shows the top-level class hierarchy of the API. The class Subject provides the explicit representation of the reification function of topics. Its relationship with the class Class is used to represent the various type-instance relationships which exist in the XTM model and which are represented syntactically by the <instanceOf> and <roleSpec> elements.

Figure 5. UML Class Diagram of API-3 - Upper Hierarchy

Figure 6. UML Class Diagram of API-3 - Main Classes

Model Analysis

The Conceptual Model clearly defines the relationship between topics and real-world objects by the use of the Subject, NonAddressableSubject and Resource classes. However, these additional constructs add three extra classes, complicating the API for developers and causing API-3 to diverge from the XTM syntax. In fact, despite consisting of 12 classes and 36 functions, API-3 does not cover the whole syntax, as the <variantName> construct is not represented. To represent it, a <variantName> must be considered as an occurrence of a topic with a fixed role type. The scope of that occurrence is the union of two sets: the subjects referenced from the <parameters> elements of its ancestor <variant> elements, and those referenced from the <scope> element of its ancestor <baseName> element. From a conceptual perspective this is clean, as the two forms (a <variantName> and an <occurrence> of a specific type) are equivalent and it is redundant to include both in the model. From a programmer's perspective, however, the model has two weaknesses. To locate and manipulate a topic's variant names, the programmer must iterate or search through all of its occurrences. The model also cannot create the nested hierarchy of variant names which the XTM syntax provides.
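The scope computation required by this mapping of <variantName> onto occurrences can be sketched in Python. The `BaseName` and `Variant` classes are hypothetical simplifications, with subjects represented as plain strings. `effective_scope` walks up through the enclosing <variant> elements to the <baseName>, collecting <parameters> along the way.

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass
class BaseName:
    scope: frozenset                      # subjects referenced from the <scope> of <baseName>

@dataclass
class Variant:
    parameters: frozenset                 # subjects referenced from this variant's <parameters>
    parent: Union["Variant", BaseName]    # enclosing <variant> or <baseName>
    name: Optional[str] = None            # content of the <variantName>, if any

def effective_scope(variant):
    """Scope for the occurrence that stands in for variant.name: the union of
    <parameters> of the variant and all enclosing variants, plus the
    <baseName> scope."""
    scope = set()
    node = variant
    while isinstance(node, Variant):
        scope |= node.parameters
        node = node.parent
    return frozenset(scope | node.scope)   # node is now the BaseName

bn = BaseName(scope=frozenset({"english"}))
v1 = Variant(parameters=frozenset({"sort"}), parent=bn)
v2 = Variant(parameters=frozenset({"display"}), parent=v1, name="PUCCINI")
print(sorted(effective_scope(v2)))  # ['display', 'english', 'sort']
```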

The way in which API-3 expresses the class-instance relationship also diverges from the XTM syntax. API-3 allows any Subject instance to be in a class-instance relationship with zero or more Class instances (each of which is a NonAddressableResource). This is an accurate reflection of the underlying model of topic maps. However, the XTM syntax defines class-instance relationships in a different way: it reifies the Subject and the Classes as Topics, and then defines a class-instance relationship between the reifying topics. A programming model should support this syntactic mechanism more directly, for two reasons. It would simplify import and export of XTM syntax data, and it would improve the mapping between syntax and programming model for developers already familiar with the XTM syntax and the mechanism of reification. That said, Figure 7 shows how much simpler the direct representation makes the collection of objects required to express a sub-type/super-type relationship. The core concepts of sub-type, super-type and the sub-type/super-type association are represented as three Class objects, without the need to create topics to reify the subjects. (Class is derived from Subject, so these objects may have zero or more SubjectIndicators.) This means that only 16 objects are required to express the topic map (including the TopicMap object, which is not shown in the diagram for clarity).

Figure 7. API-3 representation of a sub-type/super-type relationship between two named topics (one with an occurrence)
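The extra work that this divergence places on import and export can be sketched in Python. `Subject`, `Class` and `class_instance_associations` are hypothetical names modelled loosely on API-3, and the association is returned as a plain dictionary rather than an API object. We assume the XTM 1.0 core published subject identifiers for class-instance, class and instance under `core.xtm`. The `example.org` indicator is a placeholder.

```python
from dataclasses import dataclass, field

CORE = "http://www.topicmaps.org/xtm/1.0/core.xtm#"  # assumed XTM 1.0 core PSI base

@dataclass(eq=False)
class Subject:
    """API-3 style subject with direct links to the classes it is an instance of."""
    subject_indicators: list = field(default_factory=list)
    classes: list = field(default_factory=list)

class Class(Subject):
    """A subject that is used as a type."""

def class_instance_associations(subject, topic_ids):
    """Expand API-3's direct subject->class links into the association form the
    XTM syntax requires. topic_ids maps each Subject to the id of the topic
    that reifies it; the syntax cannot express the link without such topics."""
    return [
        {
            "type": CORE + "class-instance",
            "members": {CORE + "class": topic_ids[cls], CORE + "instance": topic_ids[subject]},
        }
        for cls in subject.classes
    ]

composer = Class(subject_indicators=["http://example.org/psi/composer"])
puccini = Subject(classes=[composer])
assocs = class_instance_associations(puccini, {composer: "composer", puccini: "puccini"})
print(assocs[0]["members"][CORE + "instance"])  # puccini
```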

API-4: Modified Conceptual Model-Based Architecture

API-3 shows great promise as a base from which to develop a practical topic map programming model. To refine the model, we need to reduce the complexity of the representation of the relationship between a topic and a subject and we need to add the necessary classes to enable the syntactic constructs of <variant> and <variantName> to be represented more explicitly in the programming model.

Developing The API

To simplify the upper hierarchy of API-3, we choose instead to implement the Topic/Locator relationship from API-2. We remove Class, Subject, Resource and NonAddressableResource, and add the Locator class. Locator becomes the type of the subject and subjectDescriptor properties. These properties are renamed from subjectConstitutingResource and subjectIndicatingResource respectively, to match the current state of the XTM Specification more closely.
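A minimal Python sketch of the simplified API-4 upper hierarchy follows. The `Locator` and `Topic` classes and the `same_subject` helper are hypothetical. The Locator here holds only an address string; a real Locator might also record its notation. `same_subject` applies only the subject-identity part of the XTM merging rules, which a merging layer outside the core could reuse.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass(frozen=True)
class Locator:
    """Address of a resource; compared and hashed by value."""
    address: str

@dataclass(eq=False)
class Topic:
    """API-4 topic: subject identity expressed directly through Locators."""
    subject: Optional[Locator] = None                      # the addressable subject itself
    subject_descriptors: set = field(default_factory=set)  # resources indicating the subject

def same_subject(a, b):
    """True if the topics share an addressable subject or a subject descriptor."""
    if a.subject is not None and a.subject == b.subject:
        return True
    return bool(a.subject_descriptors & b.subject_descriptors)

t1 = Topic(subject_descriptors={Locator("http://example.org/psi/puccini")})
t2 = Topic(subject_descriptors={Locator("http://example.org/psi/puccini")})
print(same_subject(t1, t2))  # True
```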

Having removed Subject and Class from the class hierarchy of the API, we must replace the now-removed class-instance association for at least Topic, Association and TopicCharacteristic. Any class-instance relationship in a topic map may be represented as an association between a topic reifying the typed node and the topic or topics which reify the class of the node. However, the most common syntactic form is the <instanceOf> child element of the typed node, which is equivalent to defining such an association. The mechanism for representing type-instance associations in the programming model is therefore directly related to the mechanism used for representing the reification of topic map constructs (such as associations, members, occurrences and so on).

The means of representing reification of topic map constructs within a programming model may be broadly divided into "implicit reification" and "explicit reification". Implicit reification requires that the programming model's class hierarchy acknowledge that all topic map constructs may be reified and so may exhibit any of the properties of a Topic. Typically, this is done by making the Topic class a super-class of all other classes representing topic map constructs. Implicit reification is already a part of API-2. Explicit reification requires that the programmer control reification by creating a Topic object which regards another topic map construct as its subject indicator. Typically, an explicit reification programming model provides no direct support for reification beyond a means to uniquely and persistently identify any topic map construct. API-1 and API-3 are examples of explicit reification programming models.
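Explicit reification can be sketched as follows. The only support the model gives is a unique, persistent identifier on every construct. Reifying a construct means creating a Topic whose subject indicator points at that identifier, much as an XTM topic points at an element's id. All names here are hypothetical: Construct, reify, find_reifier and the document URI.

```python
import itertools
from dataclasses import dataclass, field

_next_id = itertools.count(1)


@dataclass(eq=False)
class Construct:
    """Every construct gets a unique identifier: the only support explicit reification needs."""
    id: str = field(default_factory=lambda: f"id{next(_next_id)}")


@dataclass(eq=False)
class Association(Construct):
    type_ref: str = ""


@dataclass(eq=False)
class Topic(Construct):
    subject_indicators: list[str] = field(default_factory=list)
    base_names: list[str] = field(default_factory=list)


def reify(construct: Construct, document_uri: str) -> Topic:
    """The programmer creates the reifying Topic by hand, pointing at the construct's id."""
    return Topic(subject_indicators=[f"{document_uri}#{construct.id}"])


def find_reifier(topics: list, construct: Construct, document_uri: str):
    # Reifiers can only be found by searching for a subject indicator that names the construct.
    target = f"{document_uri}#{construct.id}"
    return next((t for t in topics if target in t.subject_indicators), None)
```

The search in find_reifier is the price of this approach: nothing on the Association object itself records that it has been reified.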

Figure 8. Modified API-3 Class hierarchy with Implicit Reification

Figure 9. Modified API-3 Class hierarchy with Explicit Reification

The implicit reification programming model has the advantage that a programmer is not required to perform any operations to establish the reification of a topic map construct. If she wants to give an Occurrence object a BaseName, she can do so simply through the functions inherited from the Topic class. The explicit reification model has the advantage of being more closely related to the XTM syntax. In this case, however, maintaining a close relationship with the syntax exposes the developer to one of the weaknesses of the serialized form. A divergence from the syntax may therefore be justified: it delivers far greater functionality, and the relationship between the syntactic form of reification and its representation in the programming model remains very clear. To support implicit reification of all topic map constructs, we choose to make Topic a super-type of Association and TopicCharacteristic.
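Implicit reification can be illustrated with the following sketch, which assumes the API-4 arrangement described above: Topic is a super-type of Association and TopicCharacteristic. As a result, an Occurrence can be named through inherited functions, with no reification step. The method and attribute names are hypothetical. Base names are simplified to plain strings rather than BaseName objects.

```python
class Topic:
    """Hypothetical base class: every construct is a Topic, so every construct can be named."""

    def __init__(self) -> None:
        self.base_names: list[str] = []  # simplified: strings instead of BaseName objects
        self.occurrences: list["Occurrence"] = []

    def add_base_name(self, name: str) -> None:
        self.base_names.append(name)

    def add_occurrence(self, resource: str) -> "Occurrence":
        occ = Occurrence(self, resource)
        self.occurrences.append(occ)
        return occ


class TopicCharacteristic(Topic):
    def __init__(self, parent: Topic) -> None:
        super().__init__()
        self.parent = parent  # the topic this characteristic belongs to


class Occurrence(TopicCharacteristic):
    def __init__(self, parent: Topic, resource: str) -> None:
        super().__init__(parent)
        self.resource = resource


class Association(Topic):
    def __init__(self) -> None:
        super().__init__()
        self.members: list[tuple[Topic, Topic]] = []  # (role type, player) pairs


# Naming an occurrence uses the same inherited call as naming a topic.
puccini = Topic()
puccini.add_base_name("Giacomo Puccini")
bio = puccini.add_occurrence("http://example.org/puccini-bio.html")
bio.add_base_name("Biography of Puccini")
```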

To support the class-instance association in its commonly used syntactic short-cut form (using <instanceOf> to represent a class-instance association in the unconstrained scope), we define the classes property of a Topic as a collection of Topic objects. The classes property represents only the class-instance associations in the unconstrained scope. To get all types in a particular scope, we define a helper operation getTypes(Scope s). It returns all of the Topic objects that play the role of class in a class-instance association in which the current topic plays the instance role, where the association member characteristics of each role are in the scope s. An equivalent setTypes() function is not necessary, as this operation may be implemented by creating an Association object which links the Topic object and its type. Because reification has already been addressed by deriving all other topic map constructs from Topic, the solution for Topic applies equally to all other topic map constructs represented in the API.
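The classes property and getTypes(Scope s) can be sketched as follows. classes is simply getTypes applied to the unconstrained (empty) scope. Typing a topic is just creating an Association, which is why no setTypes() is required. This is a sketch under simplifying assumptions. The scope is attached to the association as a whole rather than to the member characteristics. Each role has a single player. The PSIs and helper names (get_types, add_type) are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical PSIs for the class-instance association type and its two roles.
CLASS_INSTANCE = "http://example.org/psi/class-instance"
CLASS_ROLE = "http://example.org/psi/class"
INSTANCE_ROLE = "http://example.org/psi/instance"

UNCONSTRAINED = frozenset()  # the empty (unconstrained) scope


@dataclass(eq=False)
class Association:
    type: str
    scope: frozenset
    members: dict  # role PSI -> player Topic (simplified: one player per role)


@dataclass(eq=False)
class TopicMap:
    associations: list = field(default_factory=list)


@dataclass(eq=False)
class Topic:
    topic_map: TopicMap

    def get_types(self, scope: frozenset) -> list:
        """Topics playing the class role opposite this topic in class-instance associations in `scope`."""
        return [
            a.members[CLASS_ROLE]
            for a in self.topic_map.associations
            if a.type == CLASS_INSTANCE
            and a.scope == scope
            and a.members.get(INSTANCE_ROLE) is self
        ]

    @property
    def classes(self) -> list:
        # The <instanceOf> short-cut: only associations in the unconstrained scope.
        return self.get_types(UNCONSTRAINED)


def add_type(instance: Topic, cls: Topic, scope: frozenset = UNCONSTRAINED) -> Association:
    """Stands in for setTypes(): typing a topic just creates an Association."""
    assoc = Association(CLASS_INSTANCE, scope, {CLASS_ROLE: cls, INSTANCE_ROLE: instance})
    instance.topic_map.associations.append(assoc)
    return assoc
```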

To complete the coverage of the XTM syntax in API-4, two new classes are added, Variant and VariantName. Variant is derived from TopicCharacteristic and VariantName from TopicNode. As both BaseName and Variant share the property of a list of child Variants, the API is extended to introduce a common super-class, VariantContainer. The resulting API class diagram is shown in Figure 10.
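The variant-related part of the hierarchy can be sketched as follows. TopicCharacteristic and TopicNode are empty stand-ins, since their full definitions are outside this sketch. VariantContainer is assumed to sit between TopicCharacteristic and both BaseName and Variant, so that Variant still derives from TopicCharacteristic as the text requires. Attribute and method names are hypothetical. Parameters are represented as PSI strings rather than topics.

```python
from typing import Optional


class TopicCharacteristic:
    """Stub standing in for the API-4 TopicCharacteristic class."""


class TopicNode:
    """Stub standing in for the API-4 TopicNode class."""


class VariantName(TopicNode):
    # Mirrors <variantName>: either a <resourceRef> or inline <resourceData>.
    def __init__(self, resource_ref: Optional[str] = None, resource_data: Optional[str] = None) -> None:
        self.resource_ref = resource_ref
        self.resource_data = resource_data


class VariantContainer(TopicCharacteristic):
    """Common super-class for the two constructs that own child <variant> elements."""

    def __init__(self) -> None:
        self.variants: list["Variant"] = []

    def add_variant(self, parameters: frozenset, name: Optional[VariantName] = None) -> "Variant":
        v = Variant(parameters, name)
        self.variants.append(v)
        return v


class BaseName(VariantContainer):
    def __init__(self, base_name_string: str) -> None:
        super().__init__()
        self.base_name_string = base_name_string


class Variant(VariantContainer):
    def __init__(self, parameters: frozenset, variant_name: Optional[VariantName] = None) -> None:
        super().__init__()
        self.parameters = parameters  # contents of <parameters>, here as PSI strings
        self.variant_name = variant_name
```

Because both BaseName and Variant inherit add_variant, nested variants need no extra code.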

Figure 10. API-4 final class diagram

API Analysis

API-4 maintains a very close mapping to the XTM syntax. Every syntactic construct can be mapped to a class or property in the API, which in most cases shares the name of the syntactic construct. The only construct without a direct mapping is <subjectIdentity>, whose content is represented by the subjectIndicators and subject properties of the Topic class. This complete coverage is costly in terms of additional classes and functions, bringing the total size of this API to 11 classes and 32 functions. Much of the complexity of the API is contained in the Topic class (still only 9 functions), which is the super-type for most of the other classes. Figure 11 shows that for our simple example, API-4 proves no more complex than API-3, requiring 16 objects (including the TopicMap object, which is not shown) to represent the topic map.

Figure 11. API-4 representation of a sub-type/super-type relationship between two named topics (one with an occurrence)

XTM event-based model

This API differs from the others proposed in this paper in that it does not represent the parsed topic map as an object model of interconnected objects representing the entire topic map at once. Instead, it takes an event-based view of topic maps and makes the topic map available to the application through a sequence of method calls. This means that unless the application itself builds a structure representing the topic map, it will never have the entire topic map available at once.

The difference between the other models presented in this paper and the event-based model is analogous to the difference between the DOM and the Simple API for XML (SAX) [SAX]. And as experience with XML has shown, it can be valuable to have two standard APIs that take different approaches to presenting a data model.

XTM, however, differs from XML in that most of the information in an XTM document is expressed as references. An event-based view of an XTM document is therefore not a complete solution for parsing and processing an XTM file. It can still be useful for encapsulating much of the hard work of interpreting an XTM document, allowing several different kinds of object structures to be built on top of this API. This kind of API has two principal uses. The first is as a consistent interface to a topic-map-generating data source (which may or may not be an XML file) that serves as the input to a topic map processor. The second is as an interface for enumerating the contents of a processed topic map for any purpose which requires a traversal of the entire topic map data structure.

The following UML diagram shows one potential implementation of an XTM event-based API. The entire API can be encapsulated in a single 'TopicMapHandler' interface which defines notification functions for the start and end of every XTM syntactic construct. A more complete API might also include an error handling interface similar to that defined for the SAX API.

Figure 12. UML Diagram of an XTM Event-Based API

The interface shown in Figure 12 requires all references from <instanceOf>, <scope>, <subjectIdentity>, <roleSpec> and <parameters> elements to be made using the Ref data type. The Ref data type encapsulates a referenced URI and a type which indicates whether the URI references a topic, a subject indicator or an addressable subject. Nesting of topic map constructs (e.g. a <baseName> in a <topic>) is handled as follows: the startXXX() function of the nested construct is called before the endXXX() function of the containing construct. The order of calls for a topic map containing a single named topic is therefore: startTopicMap(), startTopic(), startBaseName(), endBaseName(), endTopic(), endTopicMap().
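The Ref data type and the nesting order can be sketched in Python as follows. This is a hypothetical rendering, not the interface of Figure 12 itself. Method names follow the startXXX()/endXXX() pattern in snake_case, and only a handful of constructs are covered. The RefType member names and the emit_named_topic driver are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class RefType(Enum):
    TOPIC = "topic"                              # e.g. <topicRef>
    SUBJECT_INDICATOR = "subject-indicator"      # e.g. <subjectIndicatorRef>
    ADDRESSABLE_SUBJECT = "addressable-subject"  # e.g. <resourceRef>


@dataclass(frozen=True)
class Ref:
    """A referenced URI plus the kind of thing the URI points at."""
    uri: str
    type: RefType


class TopicMapHandler:
    """Base handler with no-op notifications (subset of constructs only)."""
    def start_topic_map(self) -> None: pass
    def end_topic_map(self) -> None: pass
    def start_topic(self, topic_id: str) -> None: pass
    def end_topic(self) -> None: pass
    def start_instance_of(self, ref: Ref) -> None: pass
    def end_instance_of(self) -> None: pass
    def start_base_name(self, name: str) -> None: pass
    def end_base_name(self) -> None: pass


class RecordingHandler(TopicMapHandler):
    """Logs each notification so the nesting order can be inspected."""

    def __init__(self) -> None:
        self.calls: list = []
        self.refs: list = []

    def start_topic_map(self) -> None: self.calls.append("startTopicMap")
    def end_topic_map(self) -> None: self.calls.append("endTopicMap")
    def start_topic(self, topic_id: str) -> None: self.calls.append("startTopic")
    def end_topic(self) -> None: self.calls.append("endTopic")
    def end_instance_of(self) -> None: self.calls.append("endInstanceOf")
    def start_base_name(self, name: str) -> None: self.calls.append("startBaseName")
    def end_base_name(self) -> None: self.calls.append("endBaseName")

    def start_instance_of(self, ref: Ref) -> None:
        self.calls.append("startInstanceOf")
        self.refs.append(ref)


def emit_named_topic(handler: TopicMapHandler, topic_id: str, name: str, types=()) -> None:
    """Hypothetical driver: fires the events for a topic map holding one named topic."""
    handler.start_topic_map()
    handler.start_topic(topic_id)
    for ref in types:  # <instanceOf> precedes <baseName> inside <topic> in XTM 1.0
        handler.start_instance_of(ref)
        handler.end_instance_of()
    handler.start_base_name(name)  # nested: starts before end_topic() is called
    handler.end_base_name()
    handler.end_topic()
    handler.end_topic_map()
```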

The focus and application of the event-based API are completely different from those of the other APIs presented, so no analysis or comparison with them is given here. However, an event-based API is needed for all of the applications described in this section. If such an API is closely based on the XTM syntax, it also provides a practical interface for the serialization and parsing of XTM files, two of the requirements of a standard programming model. An event-based API is therefore a necessary adjunct to all of the other APIs presented here.

Conclusions

What is striking about the topic map paradigm is the number of different ways in which the same underlying conceptual model can be represented, both syntactically and programmatically. These differences do not (we hope) reflect different understandings of the topic map model. Rather, they stem from the desire to create "short-cuts" for various topic map model constructs. Reification is a good example, as is the choice to represent a class-instance relationship with a short-cut element syntax rather than the 'pure' form of an association between two topics. A topic map programming model is another area where the same trade-off between simplicity and practicality must be made. However, its requirements are subtly different from those of a syntax, and for this reason the programming model and the syntax model will probably never fully converge.

Comparison of the APIs

The following table shows a side-by-side comparison of the four APIs developed here. The API size index is computed as (number of functions + (number of classes × 2)); constructors and destructors are not included in the function count. The representation complexity index is the number of distinct objects required to represent our sample topic map of two named topics, one with a single occurrence, in a sub-type/super-type relationship. The syntax convergence index is a subjective rating from 0 (poor convergence) to 3 (perfect convergence).

Table 1. Comparison of APIs

API      Size Index   Representation Complexity   Syntax Convergence
API-1    54           27                          2.5
API-2    45           42                          0
API-3    61           16                          1
API-4    54           16                          2
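The size index formula can be checked against Table 1 with a small calculation. The following hypothetical helper, which is not part of any of the APIs, recomputes the API-4 figure from the 11 classes and 32 functions reported in its analysis. The class count is weighted double, so adding one class to an API costs as much in the index as adding two functions.

```python
def size_index(functions: int, classes: int) -> int:
    """API size index: functions + (classes * 2), excluding constructors and destructors."""
    return functions + classes * 2


# API-4 figures quoted in its analysis: 11 classes and 32 functions.
api4 = size_index(functions=32, classes=11)
print(api4)  # 54, matching the API-4 row of Table 1
```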

A purely statistical comparison such as this is somewhat inconclusive, especially with such a limited set of use cases to determine the Representation Complexity index. In fact, the real reasons for choosing one API over another turn out to be not so much size and representational complexity. They are rather convergence with the XTM syntax and having a basic data model suitable for the representation of a topic map.

As explained in the analysis of that model, API-1 is not really suitable for representing a topic map. It is based on a programming model originally developed for the representation of an ordered tree data structure rather than the graph structure of a topic map. The additional mechanisms which would be required to support the set operations and node reference operations are beyond the scope of the DOM specification. They would also make a DOM extension for topic maps considerably more heavy-weight than the HTML extension. For these reasons, API-1 is felt to be a weak candidate.

API-2 captures the graph nature of the topic map and encapsulates the complete model in a very small API. However, the lack of constructs for directly representing and manipulating common topic map objects such as occurrences and baseNames is a serious drawback. Requiring a programmer to be familiar with both the syntax and a processing model makes the learning curve for this API steeper than for any of the others presented here. This issue alone makes API-2 unattractive as a standard programming model, although its simplicity makes it an interesting candidate for a data model for topic map storage. It was felt that if API-2 were implemented, it would need a higher-level API layered on top. That layer would provide direct access to, and manipulation of, topic map constructs other than topic, association and scope. Given that, the final API might well look more like API-3 or API-4.

The difference between API-3 and API-4 is relatively small, the treatment of the class-instance association being the major departure. The treatment in API-3, making Class a first-class object, is a divergence from the XTM syntax. Given that a class can only be represented by a topic in the syntax, Class would seem to be unnecessary in a programming model, so in this respect API-4 is to be preferred to API-3. API-4 is also more complete in its coverage of the syntax, providing classes to represent the <variant> and <variantName> constructs.

It is acknowledged that the comparisons presented here are limited in scope, and it would be hard, on the basis of these results alone, to defend the selection of any one of the presented programming models over the others. However, this preliminary work suggests that the modified conceptual model with implicit reification, as presented in API-4, may be the most effective of the models.

Future Work

The work presented here is acknowledged to be light on analysis. Much more rigorous assessment of the practicality of the different programming models needs to be undertaken, both in terms of more complex representation examples and in source code measurements for implementation of topic map processing functions under the different models. In addition, other formal models of the topic map paradigm are under development: Topicmaps.org is in the process of developing a standard topic map processing model; and the TMQL effort will also involve the development of a formal model of the topic map abstract data-type based on the conceptual model and on a set of common operations which the topic map user community finds useful. Both of these efforts must be considered in any further work towards the development of a standard topic map API.

Acknowledgements

I would like to thank the following people for their input into and comments on this paper during its development: Lars Marius Garshol, Ann M. Wrightson, Steve Pepper. I would also like to thank the contributors to and authors of the different programming models and topic map data models which have been used as the starting point for developing the APIs presented here. I would also like to take this opportunity to state that any criticisms made of the APIs in this paper should not be construed as a reflection of my opinion on the usefulness of any of the source data/programming models for the purpose for which they were intended.

Bibliography

ISO. ISO/IEC 13250 Topic Maps. ISO/IEC JTC1 SC34.

XTMS. XML Topic Maps (XTM) 1.0 (http://www.topicmaps.org/xtm/1.0/). TopicMaps.Org. 10th February 2001.

XTMP. XML Topic Maps (XTM) Processing Model 1.0 (http://www.topicmaps.org/xtm/1.0/xtmp1-20001204.html). TopicMaps.Org. 4th December 2000.

DOM. Document Object Model (DOM) Level 1 Specification (Second Edition) (http://www.w3.org/TR/2000/WD-DOM-Level-1-20000929/). World-Wide Web Consortium. 29th September 2000.

SAX. Simple API for XML (http://www.megginson.com/SAX/Java/index.html). XML-DEV. 5th May 2000.



[1] Where a node type represents an element in context, the context is shown using XPath slash-separated path syntax.