The Data Catalog Collaboration Project-Basic Science (DCCP-BS) is a working group with the objective of creating best practices for curating basic science-related records into the DCCP catalogs. Members of this DCCP subgroup include subject specialists, catalogers, and data/metadata librarians from the Universities of Pittsburgh, Maryland-Baltimore, and North Carolina.
The group formed after members realized that contributors at the various DCCP institutions were interpreting the field definitions of existing metadata elements differently when curating basic science-related data catalog entries. In hindsight this is not surprising: the original DCCP metadata schema focused on human subject datasets and did not sufficiently capture information specific to animal research and basic science datasets.
The data catalog metadata schema used by the DCCP was created at NYU by analyzing and comparing existing metadata schemas that focus on indexing research data, specifically:
Elements were selected from these schemas based on their relevance and applicability to the datasets described within the data catalog. One of our main goals was to ensure that our metadata could be transferred seamlessly to future national data discovery systems from the NIH and others once those systems become available. The existing DCCP metadata schema and documentation are available here. Since the schema's creation, DCCP members have gradually adapted and modified the metadata to accommodate new types of datasets.
The DCCP-BS has begun by focusing on issues related to GEO records and will address other types of records as necessary. The GEO accession record example shown here highlights the metadata fields under discussion by the DCCP-BS.
Data Type -- Defined as “the type of data collected or created.” Current category list is limited to nine options, including Genetic/Genomic; DCCP-BS suggestion: addition of Genetic/Genomic sub-categories to capture more specific data types such as Microarray or Sequence Reads, as well as the flexibility to add more sub-categories as necessary.
Subject of Study -- Defined as “the (strain of the) species of the subject of the study.” Current metadata subfields are limited to Species and Strain; DCCP-BS suggestion: addition of Tissue/Cell Line as a subfield.
Equipment -- Defined as “the name, URL, and contextual information about equipment used to collect or create data.” Current data entry is free text; DCCP-BS suggestion: standardized equipment names/URLs/descriptions to be shared between DCCP institutions to ease workload and facilitate consistency.
Software -- Defined as “the name, URL, and contextual information about software used to collect, create, or analyze data.” Current data entry is free text; DCCP-BS suggestion: standardized software names/URLs/descriptions to be shared between DCCP institutions to ease workload and facilitate consistency, as well as the addition of a subfield for software Version.
Study Type -- Defined as “the type of study used to collect the data.” Current category list is limited to Observational and Interventional; DCCP-BS suggestion: addition of a category to capture “bench research” (e.g., Empirical) in addition to the current clinically-defined options.
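To make the five suggestions above more concrete, here is a minimal sketch of how a record might look if they were adopted. It is an illustration only, not the DCCP schema or its implementation: the class names, attribute names, and validation rules are all assumptions, and the sub-category list contains only the two examples named above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed controlled vocabularies. Only "Microarray" and "Sequence Reads" are
# named in the post; the set is mutable so more sub-categories can be added.
GENOMIC_SUBCATEGORIES = {"Microarray", "Sequence Reads"}
# "Empirical" is the proposed bench-research category.
STUDY_TYPES = {"Observational", "Interventional", "Empirical"}


@dataclass
class DataType:
    category: str                      # e.g. "Genetic/Genomic"
    subcategory: Optional[str] = None  # proposed: finer-grained data type

    def __post_init__(self):
        if self.subcategory is None:
            return
        # In this sketch, sub-categories exist only under Genetic/Genomic.
        if self.category != "Genetic/Genomic":
            raise ValueError("sub-categories are only sketched for Genetic/Genomic")
        if self.subcategory not in GENOMIC_SUBCATEGORIES:
            raise ValueError(f"unknown sub-category: {self.subcategory}")


@dataclass
class SubjectOfStudy:
    species: str
    strain: Optional[str] = None
    tissue_cell_line: Optional[str] = None  # proposed new subfield


@dataclass
class Equipment:
    name: str  # proposed: drawn from a list shared between institutions
    url: Optional[str] = None
    description: Optional[str] = None


@dataclass
class Software:
    name: str  # proposed: drawn from a list shared between institutions
    url: Optional[str] = None
    description: Optional[str] = None
    version: Optional[str] = None  # proposed new subfield


@dataclass
class BasicScienceRecord:
    title: str
    data_type: DataType
    subject: SubjectOfStudy
    study_type: str
    equipment: List[Equipment] = field(default_factory=list)
    software: List[Software] = field(default_factory=list)

    def __post_init__(self):
        if self.study_type not in STUDY_TYPES:
            raise ValueError(f"unknown study type: {self.study_type}")


# Hypothetical record; every value here is a placeholder, not real catalog data.
example = BasicScienceRecord(
    title="Example expression dataset",
    data_type=DataType("Genetic/Genomic", "Microarray"),
    subject=SubjectOfStudy("Mus musculus", strain="C57BL/6", tissue_cell_line="liver"),
    study_type="Empirical",
    software=[Software("ExampleAnalysisTool", version="1.0")],
)
```

Keeping the vocabularies as plain sets reflects the group's wish for flexibility: adding a new sub-category would be a one-line change rather than a schema revision.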
All of the group’s suggestions on new metadata elements and their intended use will be reviewed by DCCP members and then added to the documentation if approved. This effort serves to improve metadata documentation for basic science datasets and will continue to evolve as we engage with researchers striving to make their datasets discoverable.
Have questions or comments? Leave them below or send them to the DCCP-BS coordinator, Carrie Iwema (firstname.lastname@example.org).