Finding Data To Index: When the Data Availability Statement Leads Nowhere

This blog post is the final part of a series on using the “has data avail” filter on PubMed Central (PMC) to identify a wide range of institutional datasets, and on what we at NYU learned about our institution’s data sharing practices from this exercise. To learn more about the background of this project and how we pulled the bibliographic data used, please refer to our first post. This post discusses additional findings related to the bibliometric data we pulled from PMC.

Unsavory Researcher Behavior

When investigating Data Availability Statements (DAS), we learned how researchers use repositories, use data that is available through application to a consortium, and make their data available in Supporting Information Files. Yet we also found examples of unsavory researcher behavior. Several authors listed their data as available in non-existent repositories. For example, one researcher stated that his data was available at an institutional data access point that does not exist. Other researchers listed their data as available on their lab websites, yet when librarians examined those websites, no data was available.

Uninformed Researcher Behavior

Other Data Availability Statements seemed to demonstrate a lack of understanding of what constitutes “data” and what should be included in a statement. One statement reads, “No datasets were generated or analyzed during the current study,” even though the researchers took samples and analyzed them in the publication. Other statements did not give enough information for a reader to track down the data described. For example, one stated, “NLM has access to all the data and data are available upon request.” With so little information, it seems unlikely that the data could be located and reused in a meaningful way.

What Librarians Can Do

While it may be easy to assume that all of these researchers are bad actors, it is also possible that they need more guidance in order to write helpful and meaningful statements. As librarians, we can advocate for better statements by explaining what the DAS is meant to accomplish: to guide other researchers to the data for reuse or replication. Templates could help, but data varies immensely across disciplines and projects, so no single template will fit every case. Explaining the logic behind the DAS will allow researchers to work out what information is necessary within the boundaries of their project and their domain.