
Cataloging Software and 3D Models in the Pitt Data Catalog

When the Health Sciences Library System at the University of Pittsburgh launched the Pitt Data Catalog last spring, we wanted to provide researchers with flexible options for advertising and sharing their data. Now that the catalog has grown to describe more than 20 Pitt-created datasets, that flexibility has led our collection development in surprising and exciting directions. We have recently added our first records describing software code and 3D models, all created by Dr. Charles C. Horn.

Dr. Horn is an associate professor of medicine who studies gut-brain communication, particularly via the vagus nerve. His research makes use of several open-source software packages, which he demonstrates in his paper (with David M. Rosenberg) “Neurophysiological analytics for all! Free open-source software tools for documenting, analyzing, visualizing, and sharing using electronic notebooks.” Electrophysiological data used to demonstrate the software tools are available in the publication’s data supplements and on GitHub, where Dr. Horn has also uploaded scripts and a Docker image containing tools to make neurophysiological data analysis easier. Pitt Data Catalog records linking to those software/data packages include:

Dr. Horn has also designed several printable 3D models for experimental apparatuses in electrophysiology. The files shared through the NIH 3D Print Exchange include printable files in a variety of formats, photos, and assembly instructions. The 3D model records in the Pitt Data Catalog are:

From a collections standpoint, expanding our catalog to include software and 3D models is a logical consequence of our mission to collect Pitt-authored data, especially in computational fields where relatively few data products fit the definition of a traditional “dataset.” So far, the metadata schema of the Data Catalog Collaboration Project (DCCP) has proven flexible enough to accommodate these new entity types, but we may pursue some software-specific modifications if the need arises. Shortly after Pitt published these records, NYU added its own first software record, so this may be the beginning of a collaboration-wide trend or a new working group, similar to the DCCP Basic Science Working Group.

Data in the News: ProPublica and the U.S. Health and Retirement Study

As the year winds down and we all recover from the busy holiday season, ProPublica has published an article on the ways in which employers push older U.S. workers out of their jobs. The article, “If You’re Over 50, Chances Are The Decision to Leave a Job Won’t be Yours,” by Peter Gosselin, uses data from the U.S. Health and Retirement Study (HRS) from the University of Michigan. Gosselin refers to the HRS as the “premier source of quantitative information about aging in America,” as it provides longitudinal data on about 20,000 people in the United States aged 50 and older.

The NYU Data Catalog includes datasets collected outside of NYU (e.g. by the U.S. Census Bureau or by other universities) in order to help researchers locate datasets that they may not otherwise know about. The HRS is one of the external datasets included in the NYU Data Catalog, and two faculty members act as local experts on the dataset for other researchers at NYU. While not all instances of the Data Catalog include local experts, at NYU we include information on researchers who have already worked on a dataset in order to encourage collaboration at the institution. Local experts are institutional researchers with experience using the dataset who agree to help guide researchers as they decide whether a dataset can answer their questions or provide meaningful information.

What the ProPublica article demonstrates (as well as the many articles in PubMed that feature the dataset) is that a single dataset can be used to investigate a wide variety of questions, if the analysis is done properly. For example, while Gosselin uses the dataset to investigate how U.S. workers are pushed out of their jobs and the financial ramifications of this practice, Virginia Chang, a researcher in the College of Global Public Health at NYU, has used it to investigate the effects of obesity on the survival rates of common acute illnesses.

The Data Catalog was designed to increase cross-disciplinary research and collaboration, and Gosselin’s article illustrates how research data can benefit the public when many people with different areas of expertise have access to it.

Data Catalog Collaboration Project receives CTSA Great Team Science Contest Award for Top Importance

What is team science?

Team science is a collaborative effort to address scientific challenges that leverages the strengths and expertise of professionals trained in different fields. One of the overarching goals of the Clinical and Translational Science Awards (CTSA) given to select institutions is to promote team science by establishing mechanisms through which biomedical researchers can collaborate, learn why team science is important, and develop evaluation measures to assess teamwork in biomedical research contexts.

About the award

Last week, the Data Catalog Collaboration Project (DCCP) found out that it had received an award from the CTSA Great Team Science Contest, which asked CTSA-funded hubs to submit examples of team science successes to be evaluated by a review panel and presented at the fall meeting. Each application was scored in a number of categories, including overall score, top importance, top innovation, and top impact. A total of 170 applications were submitted, and the DCCP received the highest score in the Top Importance category. I was able to present the topic at the Fall CTSA Program Meeting, where I could discuss the value of the data catalog approach with leaders in biomedical translational research. The people I spoke to were most interested in how the data catalog can make hard-to-find research datasets, scattered across storage locations throughout their institution, more discoverable through a single system.

Expanding our reach beyond libraries

From our perspective, the most exciting part about receiving this award was that a community extending well beyond the realm of libraries judged our approach to be the most important project. That approach consists of having libraries implement local data catalogs, establishing collaborations between librarians and developers to improve data discovery, fostering partnerships with our local institutional research initiatives, and making concerted efforts to lower the barriers to data sharing for the research community. This is a considerable achievement because the other submitted projects were very strong and addressed a diverse range of team science initiatives. The DCCP has long advocated for ensuring that institutional research data is discoverable, available, and usable regardless of where it is stored, and this award is an acknowledgement that the broader biomedical research community agrees.

The DCCP has grown to eight libraries working to improve institutional data discovery, and this award can serve as evidence of its value to libraries or broader institutions interested in improving data discovery. The DCCP members all provide a great service to their institutions and to the other libraries participating in this effort. If you are interested in being a part of this effort, please reach out to us.