Stories

Librarians practicing what we preach: Making our Library Research Discoverable through the Pitt Data Catalog

Academic and hospital libraries that offer data services often provide guidance and training on data sharing and reuse, covering topics such as:

  • funder/journal requirements for sharing

  • the benefits of data sharing such as enhanced transparency and reproducibility and the potential to find new collaborators

  • proper documentation to accompany a dataset

  • how to identify appropriate data repositories and evaluate them to determine the most suitable

  • locating existing datasets for reuse

In 2017, we at the University of Pittsburgh Health Sciences Library System decided it was time to “practice what we preach.” Over the next two years, we deposited four datasets from our own research into the repository figshare. Our initial goals were to:

  • understand and document the data deposit workflow in order to assist researchers

  • facilitate requests from colleagues to share our data and survey instruments

  • make unpublished results discoverable

  • track the usage of our data

  • model best practices to researchers and librarian colleagues

Given the new data sharing policy for the Journal of the Medical Library Association that will go into effect October 2019, we believe this last goal is of particular importance.

As the University of Pittsburgh is one of the nine partners of the Data Catalog Collaboration Project (DCCP), in addition to depositing our datasets we also added a metadata record for each dataset to the Pitt Data Catalog. Available datasets to date:

These records increase the visibility of our data (one of the mission statements of the DCCP) and provide an additional access point.

This blog post is adapted from the MLA presentation: Ratajeski, M.A. and Iwema, C.L. (2019, May). Practicing What We Preach: Making Our Own Research Data Open Access. Lightning Talk presented at Medical Library Association Annual Conference, Chicago, IL.

DCCP at MLA '19: Check out our Slides, Notes, Posters and More!

It was an eventful week in Chicago for MLA ’19!

While we wish everyone had been able to make it to the conference, we know that isn’t always possible, so we have uploaded all of the slides, posters, and notes related to the DCCP and our work. Below, we list a description of each presentation, the slides or poster, and a person to contact if you have any questions.

The DCCP Information Session

Kevin Read presenting at the DCCP Information Session at MLA ‘19

  • Provided information about what it means to join the DCCP, implementing the Data Catalog, and how different institutions are using the catalog for their specific needs

  • Link to slides

  • Link to notes

  • Contact: Kevin Read, DCCP Project Lead: kevin.read@med.nyu.edu

Paper presentation: From Conception to Action: Elevating Library Projects through Collaboration between Librarians and Developers

  • Demonstrates how developers and librarians have worked together on the Data Catalog and on other library projects, and provides tips on how to improve collaboration between developers and librarians

  • Link to the slides

  • Contact: Ian Lamb, Solutions Developer, ian.lamb@nyulangone.org

Paper presentation: Developing Workflows to Facilitate the Sharing of Electronic Health Record Data

  • Discusses how NYU created a process to include Electronic Health Record (EHR) data in the NYU Data Catalog. Outlines the workflow and provides example records for EHR data in the NYU Data Catalog

  • Link to the slides

  • Contact: Nicole Contaxis, NYU Data Catalog Coordinator: nicole.contaxis@nyulangone.org

Paper presentation: Creating Institution Specific Resources on Data Transfer and Data Sharing

  • Illustrates how NYU supplements its work on the NYU Data Catalog with ongoing projects that help researchers transfer and share their data while remaining in compliance with national regulations, funder and publisher requirements, and institutional policy

  • Link to the slides

  • Contact: Nicole Contaxis, NYU Data Catalog Coordinator: nicole.contaxis@nyulangone.org

Poster: A Multisite Collaboration to Improve Data Curation and Discovery in Academic Health Sciences Centers

  • Provided information on what the Data Catalog Collaboration is, what our goals are, and ways that the Data Catalog is used at participating institutions

  • Contact: Kevin Read, DCCP Project Lead: Kevin.Read@med.nyu.edu

  • Link to the poster

Poster: Outreach Strategies and Researchers’ Motivations for Sharing Data through a Data Catalog

  • Demonstrated why researchers share data through the Data Catalog as well as the outreach strategies employed at different institutions in the DCCP

  • Link to the poster

  • Contact: Melissa A. Ratajeski, Pitt Data Catalog Lead and Coordinator of Data Services at the University of Pittsburgh Health Sciences Library System, mar@pitt.edu

Poster: Using the PubMed Central Data Availability Search Filter and an Institutional Data Catalog to Make Data more Discoverable

  • Illustrates how NYU is using the PubMed Central (PMC) Data Availability Search filter to add new datasets to the NYU Data Catalog. Includes the workflow and an example record

  • Link to the poster

  • Contact: Nicole Contaxis, NYU Data Catalog Coordinator, nicole.contaxis@nyulangone.org

Harlem Health Advocacy Partners and a Case Study in Data Re-Use

In the fall of this year, a Research and Data Librarian at the NYU Health Sciences Library, Fred LaPolla, was brought in to help teach an Intensive Research Practicum for Primary Care Residents. Dr. Colleen Gillespie, the Director of the Division of Education Quality in the Institute for Innovations in Medical Education and an Associate Professor in the Department of Medicine, led the practicum and wanted residents to ask a question of a secondary dataset, analyze the data, present the results, and write up a draft of a manuscript in 10 days. Prior to the beginning of the practicum, LaPolla pointed Dr. Gillespie to the NYU Data Catalog, and she was able to contact Dr. Lorna Thorpe about the Harlem Health Advocacy Partners Data Set.

“West 125th Street looking west from Seventh Avenue, Harlem, New York City” From the Schomburg Center for Research in Black Culture, Photographs, and Prints Division, The New York Public Library. 1946.

The Harlem Health Advocacy Partners (HHAP) dataset was collected in five public housing developments in Harlem, New York City, where the chronic disease burden is high. Two rounds of data collection were performed: first, a telephone survey of 1,633 individuals, and second, an interventional study of 370 individuals. The variables collected across these two rounds included age, gender, race/ethnicity, employment status, health insurance, self-reported general health, self-reported mental health, level of physical activity, smoker status, BMI, blood pressure, level of social connectedness, and specific health conditions including asthma, diabetes, hypertension, and depression. Previous articles published with this data include “A Place-Based Community Health Worker Program: Feasibility and Early Outcomes, New York City, 2015,” published in the American Journal of Preventive Medicine.

After completing the practicum, the residents worked together with Dr. Gillespie, Dr. Thorpe, and Mr. LaPolla to submit the manuscript for publication as co-authors. This case study in data re-use illustrates how the NYU Data Catalog fits into the data ecosystem, building connections between researchers and helping people locate relevant datasets. It also illustrates how important data re-use can be to young researchers and students, as it gives them access to data without the high cost of collecting it themselves or paying for it.