Suggested schema change
It should be possible to record multiple collection types for a collection. (A similar issue for services is canvassed in Multiple service types.)
Problem this suggestion addresses
Only one collection type can be recorded (currently "collection", "dataset", "registry", "repository", and "catalogueOrIndex" are suggested). This is a problem because these categories are not mutually exclusive, so more than one valid choice could be made for the same collection, resulting in inconsistent metadata. A given collection may potentially be described using all of these types.
The core types ("collection" and "dataset") are sufficient for distinguishing structured data from other data. The other types are descriptive of the functions of different kinds of aggregations and can be regarded more as descriptive keywords, of value primarily for discovery and search.
One or more collection types should be permitted.
Also for discussion: what collection types need to be specified? Who would use this information, and for what purpose?
Identified by
ANDS Staff (Sally Goodenough)
RIF-CS schema components affected
Collection types
Impact on content providers
No impact; the change would allow but not require multiple types to be provided.
Pros
Over time this will result in more meaningful metadata to support searching.
Cons
Technical options
Option A:
A new field, “secondaryType”, is added to <collection>, with cardinality 0..* and values drawn from the Service Type Vocabulary.
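<xsd:element name="secondaryType" minOccurs="0" maxOccurs="unbounded" type="secondaryTypeType">
  <xsd:annotation>
    <xsd:documentation>Type(s) relevant to the collection.</xsd:documentation>
  </xsd:annotation>
</xsd:element>
<xsd:complexType name="secondaryTypeType">
  <xsd:annotation>
    <xsd:documentation>A value taken from a controlled vocabulary indicating the secondary type of the collection.</xsd:documentation>
  </xsd:annotation>
  <xsd:simpleContent>
    <xsd:extension base="xsd:string"/>
  </xsd:simpleContent>
</xsd:complexType>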
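As an illustration of Option A, the sketch below parses a simplified, namespace-free collection fragment and reads the primary type together with any secondaryType values. The sample record, the helper name collection_types, and the choice to carry the vocabulary term as element text are assumptions made for this example, not part of the proposal.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified record: real RIF-CS documents are namespaced and
# carry further elements that are omitted here.
SAMPLE = """
<collection type="dataset">
  <secondaryType>repository</secondaryType>
  <secondaryType>catalogueOrIndex</secondaryType>
</collection>
"""

def collection_types(xml_text):
    """Return (primary_type, [secondary types]) for one <collection> element."""
    root = ET.fromstring(xml_text)
    primary = root.get("type")
    # Cardinality 0..*: findall returns an empty list when none are present.
    secondary = [el.text.strip() for el in root.findall("secondaryType") if el.text]
    return primary, secondary

primary, secondary = collection_types(SAMPLE)
print(primary, secondary)
```

Because the existing type attribute is untouched, a consumer that ignores secondaryType still sees a single primary type.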
Business rules would need to be defined to determine how multiple types would be used in faceted display groupings relating to collection type resulting from a search in RDA.
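One possible business rule, sketched here purely for discussion, is to count a collection under every type it declares, so that facet totals may exceed the number of records. The record layout and the facet_counts helper are hypothetical; how RDA would actually group results is not specified in this proposal.

```python
from collections import Counter

def facet_counts(records):
    """Count each record once under every distinct type it declares."""
    counts = Counter()
    for rec in records:
        # Primary type plus any secondary types; the set avoids counting a
        # record twice when the same type appears in both places.
        types = {rec["type"], *rec.get("secondary_types", [])}
        counts.update(types)
    return counts

# Hypothetical records used only to exercise the rule.
records = [
    {"key": "coll-1", "type": "dataset", "secondary_types": ["repository"]},
    {"key": "coll-2", "type": "dataset"},
    {"key": "coll-3", "type": "collection", "secondary_types": ["registry", "repository"]},
]
counts = facet_counts(records)
print(dict(counts))
```

An alternative rule would facet on the primary type only and treat secondary types as search keywords; the choice between the two is the kind of decision the business rules would need to settle.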
Changes would be required to “Register My Data” to allow multiple selects from the “Type” drop-down list.
Pros:
No changes to existing feeds and no legacy issues
Cons:
Implies a hierarchy of typing that may not suit all providers
Issues with displaying faceted search results in RDA based on subtyping
Option B:
Collection records are still constrained to have only one type; a separate collection record is contributed for each type that applies to the collection. The different collection records are identified as referring to the same collection instance through a shared identifier or location.
Pros:
No change to existing schema or feeds.
Cons:
Difficulty in displaying these collections accurately in RDA based on the facet of type
Option C:
Enable multiple types to be defined for a collection by allowing space-delimited values to be passed as a single string in the type attribute. This solution requires no changes to the RIF-CS schema definition, as the type attribute is already of type string.
Changes would be required to “Register My Data” to allow multiple selects from the “Type” drop-down list.
Business rules would need to be defined to determine how multiple types would be used in faceted display groupings relating to collection type resulting from a search in RDA.
Pros:
This option raises no legacy data problems and requires no migration or transition arrangements.
Cons:
Some systems will strongly prefer to maintain a primary service type. (In VITRO, for example, the plan is to assign each service type to a subclass of Service.)
More difficult to treat service type as a controlled vocabulary: validation will involve tokenising the type string.
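The tokenising step mentioned in the last point can be sketched as follows for the collection type attribute of Option C. The allowed set simply reuses the five collection types suggested earlier in this proposal; the function name and the decision to reject duplicates are assumptions made for illustration.

```python
# Types suggested in the proposal; a real vocabulary may differ.
ALLOWED_TYPES = {"collection", "dataset", "registry", "repository", "catalogueOrIndex"}

def validate_type_attribute(value):
    """Split a space-delimited type string and check each token.

    Returns (tokens, errors); an empty errors list means the value is valid.
    """
    tokens = value.split()  # splits on any run of whitespace
    errors = []
    if not tokens:
        errors.append("type attribute is empty")
    seen = set()
    for token in tokens:
        if token not in ALLOWED_TYPES:
            errors.append(f"unknown type: {token}")
        elif token in seen:
            errors.append(f"duplicate type: {token}")
        seen.add(token)
    return tokens, errors

print(validate_type_attribute("dataset repository"))
print(validate_type_attribute("dataset Dataset"))
```

A consumer that compares the whole attribute value against a single term would not match a value such as "dataset repository", so every consumer of the type attribute would need the same tokenising step.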