
All times are UTC + 10 hours [ DST ]




Clarify semantics for "collection" and "dataset" 
ANDS Staff

Joined: Thu Feb 10, 2011 11:18 am
Posts: 76
Suggested schema change
Clarify semantics for collection types "collection" and "dataset" by adopting the following expanded definitions:
“collection”: real-world objects together with the metadata required to analyse, interpret and re-use them. Examples are museum items or collections, biological samples, geological samples, analogue images, analogue audio recordings, mixed collections of data and other kinds of objects.
“dataset”: digital research datasets stored and managed within computer systems, and over which computation can occur, together with the metadata required to analyse, interpret and re-use them. Examples are datasets stored in relational databases, scientific observations stored in observing instruments or related systems, digital representations of images, digital audio recordings.

Problem this suggestion addresses

Difficulty in selecting appropriate collection type.

RIF-CS schema components affected
Collection types "collection" and "dataset".

Impact on content providers

No intention to alter existing content.

Pros

Clearer definitions should allow easier type selection.

Cons
None identified.

Technical options
No system changes required.


Wed Jun 22, 2011 2:30 pm
ANDS Partners

Joined: Sun Jun 06, 2010 8:15 pm
Posts: 5
I have a problem with the physical / digital split implied in the proposed definition. For example, I would call an aggregation of digitised images like the one at http://espace.library.uq.edu.au/collection/UQ:3521 a collection rather than a dataset (but maybe my interpretation is incorrect).

Another possibility is to adopt the Dublin Core Type Vocabulary definitions of these terms (http://dublincore.org/documents/dcmi-type-vocabulary/):

Collection: An aggregation of resources. A collection is described as a group; its parts may also be separately described.

Dataset: Data encoded in a defined structure. Examples include lists, tables, and databases. A dataset may be useful for direct machine processing.

These Dublin Core definitions overlap (a dataset is a collection of data/facts in a defined structure). Not sure if that is a problem or not.

Some questions I have for moving forward:
* What is the purpose of making the distinction?
* Are we aiming for non-overlapping definitions?

Nigel Ward


Tue Aug 09, 2011 11:20 am

Joined: Fri Oct 09, 2009 9:55 am
Posts: 57
I think Nigel's comments show the difficulty in distinguishing between the collection types.

As you say, "These Dublin Core definitions overlap (a dataset is a collection of data/facts in a defined structure). Not sure if that is a problem or not."

I think it is; see the discussion at viewtopic.php?f=151&t=1511, where I have brought together discussion on this issue and the proposals for multiple collection types or multiple service types.

Questions raised:

* What is the purpose of making the distinction? From the ANDS perspective, types are used in facet displays in Research Data Australia. Multiple types will allow resources to be listed under more than one facet where appropriate, which will improve discovery. From the contributor perspective, types assist in description, but, I think, only if choosing one is a painless decision, hence the suggestion to allow multiple types to be assigned.

* Are we aiming for non-overlapping definitions? As discussed in my other post, yes.

If the categories overlap, this decreases data quality.

You would not be happy with a survey that asked you to put your age into a category and then offered 20-30, 30-40, 40-50 and so on, because if you are, for example, 30, you don't know which category to choose. Well-constructed surveys would offer 20-29, 30-39 and so on.

Categories need to be mutually exclusive and comprehensive - a place for everything and everything in its (one and only possible) place.
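To make the age-band point concrete, here is a minimal Python sketch. The function names and the sample bands are my own illustration, not part of any ANDS tool or the RIF-CS schema. It simply counts how many bands a given age falls into: overlapping bands give two answers for age 30, while non-overlapping bands such as 20-29, 30-39 give exactly one.

```python
def matching_bins(age, bins):
    """Return every (low, high) band, both ends inclusive, that contains age."""
    return [(low, high) for (low, high) in bins if low <= age <= high]


def is_partition(values, bins):
    """True if every value falls into exactly one band (no overlap, no gap)."""
    return all(len(matching_bins(v, bins)) == 1 for v in values)


# Hypothetical survey bands, for illustration only.
overlapping = [(20, 30), (30, 40), (40, 50)]
exclusive = [(20, 29), (30, 39), (40, 49)]

print(matching_bins(30, overlapping))  # [(20, 30), (30, 40)]
print(matching_bins(30, exclusive))    # [(30, 39)]
print(is_partition(range(20, 50), overlapping))  # False
print(is_partition(range(20, 50), exclusive))    # True
```

The same test applies to collection types: if a reasonable record can match both "collection" and "dataset", the types do not partition the records.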

If we can't do this, and I think for collection types it looks increasingly as if we will not be able to, then we should offer multiple collection types as an option instead, to cater for the overlap.
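As a rough sketch of what multiple collection types would mean for facet displays, the Python below groups records under every type assigned to them. The record keys and the `build_facets` helper are hypothetical; this is not how Research Data Australia is implemented, only an illustration of the idea that one record may appear under more than one facet.

```python
from collections import defaultdict


def build_facets(records):
    """Map each type to the list of record keys that carry that type."""
    facets = defaultdict(list)
    for key, types in records.items():
        for collection_type in types:
            facets[collection_type].append(key)
    return dict(facets)


# Hypothetical records; the first is assigned both types.
records = {
    "digitised-images": ["collection", "dataset"],
    "rock-samples": ["collection"],
    "sensor-readings": ["dataset"],
}

facets = build_facets(records)
print(facets["collection"])  # ['digitised-images', 'rock-samples']
print(facets["dataset"])     # ['digitised-images', 'sensor-readings']
```

With a single type per record, the contributor would have to pick one facet for "digitised-images" and the record would be missing from the other.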

_________________
Sally Goodenough
Australian National Data Service
W. K. Hancock Building (#43)
The Australian National University
Canberra, ACT, 0200, AUSTRALIA
http://www.ands.org.au

E: sally.goodenough@ands.org.au
P:+612 6125 1176
M: 0466 579 618


Tue Aug 09, 2011 11:39 am