Cataloging EHR Data: Experiences at NYU Langone Health

The Opportunity

The implementation of Electronic Health Record (EHR) systems has allowed researchers to leverage clinical data for research purposes. At NYU Langone Health, researchers are able to work with administrators to pull data from the EHR system and study the patient population of NYU Langone Health as well as the health care services offered here.

DataCore, the department that supports clinical data management needs at the institution, pulls these datasets from NYU’s EHR system. However, these EHR requests are not visible to the institutional research community. The NYU Health Sciences Library saw that these requests would be a good use case for the Data Catalog and that the Data Catalog could increase researcher awareness of DataCore’s service.

So, we suggested a collaboration with DataCore to index these EHR datasets in the NYU Data Catalog in order to make them, and the capabilities of our institutional systems, discoverable to others. In turn, DataCore could also utilize the NYU Data Catalog to avoid redundant researcher requests, as researchers could reference NYU Data Catalog records before submitting a pull request.

The Process

The process of creating these records and indexing the data pulls from the EHR system in the NYU Data Catalog happened in three stages:

  1. Gaining access to DataCore’s EHR pull request database;

  2. Working directly with researchers to locate enough volunteers in order to prove the concept;

  3. Having the NYU Data Catalog written into DataCore policy and indexing all EHR system data pulls.

Crucial to the later stages was access to the information DataCore collects on researchers when they request a pull. DataCore thus provided the NYU Data Catalog team with access to their request form data, which is stored in REDCap. Data collected from researchers includes information about their study, the specifics of the pull request, any grant funding, IRB numbers, and additional context that could help during the curation process.
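To illustrate how request form data of this kind could feed the curation process, here is a minimal sketch in Python. It assumes the form data has been exported as a CSV file. The column names, sample rows, and draft record fields are all hypothetical, since the actual structure of DataCore's form and of the catalog's records is not described in this post. The title pattern simply imitates the example record shown later in this post.

```python
import csv
import io

# Hypothetical export: these column names and values are made up for
# illustration and do not reflect DataCore's actual REDCap form.
SAMPLE_EXPORT = """request_id,study_title,pull_description,grant_number,irb_number
101,Example study A,Demographics and diagnoses,GRANT-EXAMPLE-1,IRB-EXAMPLE-1
102,Example study B,Medication orders,,IRB-EXAMPLE-2
"""


def draft_record(row):
    """Map one request-form row to a draft catalog record (a plain dict)."""
    record = {
        # Assumed title convention: "<study title>: NYU Langone Health EHR"
        "title": f"{row['study_title'].strip()}: NYU Langone Health EHR",
        "description": row["pull_description"].strip(),
        "irb_number": row["irb_number"].strip(),
        "source_request_id": row["request_id"].strip(),
    }
    # Grant funding is optional, so only include it when present.
    grant = row["grant_number"].strip()
    if grant:
        record["grants"] = [grant]
    return record


def drafts_from_export(csv_text):
    """Read a CSV export and return one draft record per request."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [draft_record(row) for row in reader]


drafts = drafts_from_export(SAMPLE_EXPORT)
for draft in drafts:
    print(draft["title"])
```

A draft like this would only be a starting point. A curator would still need to review each record and fill in details with the researcher.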

Access to this data allowed the NYU Data Catalog team to reach out to researchers who had requested EHR data to see if they would be willing to participate and also to help index the records about the data pulls that each researcher requested. After locating enough volunteers to create ten Data Catalog records of this type of dataset, the NYU Data Catalog team presented the project to our institutional Informatics Research Steering Committee, with the goal of having the NYU Data Catalog written into the DataCore request form and making participation mandatory for all researchers who request EHR pulls for research through DataCore.

After this meeting, the Informatics Research Steering Committee made it mandatory for EHR pulls done through DataCore to be indexed in the NYU Data Catalog. By completing this final stage, the NYU Data Catalog team could index all past and future EHR datasets generated by DataCore.

The Solution & Next Steps

At the time of this post, we have completed records for 103 of the 143 EHR datasets pulled for research purposes at NYU Langone Health.

Below is a diagram of the workflow followed for researchers, DataCore staff, and the NYU Data Catalog team. The diagram demonstrates the flow of information from researcher to DataCore to the NYU Data Catalog team, including from researchers who are using the NYU Data Catalog to re-use an EHR dataset request that has already been cataloged.


Below is a screenshot of a record for the dataset, “Characteristics and Treatment of Dermatomyositis: NYU Langone Health EHR.” This record describes an EHR pull that was a part of a project to better understand the relevant characteristics of patients with dermatomyositis. The record was generated through the process described above.


Beyond the indexing itself, several ongoing tasks related to this project provide avenues for growth for the NYU Data Catalog and the Data Catalog Collaboration Project (DCCP) more broadly.

Mass Upload of Data Catalog Records

Due to the number of datasets in the DataCore backlog, the NYU Data Catalog team began to work on a way to batch upload datasets. This functionality had been discussed at several DCCP meetings, but it was this project that pushed the group to build it. With input from the DCCP, the developer took the lead on this aspect of the project. Although the mass upload tool requires some additional work, completed datasets can now be uploaded to the NYU Data Catalog (and any of the DCCP catalogs) in one fell swoop.

Tracking Requests & Publications

With the continued input and assistance from DataCore, we hope to implement better tracking for EHR dataset requests and future publications that result from use of the EHR data. Tracking EHR dataset requests will allow us to better understand the people who use the data catalog, while tracking future publications that result from the use of the data will allow us to better understand how researchers leverage EHR datasets for their research.

Tracking future publications will be part of an ongoing project at NYU Langone Health that is not currently tied to the Data Catalog. The Faculty Bibliography (FacBib) contains citation data for all publications of staff and faculty at NYU Langone Health. The NYU Data Catalog team is working with FacBib data to devise ways to track publications that result from particular EHR pulls so that this information can be shared with our institution.

Continued Collaboration

The NYU Data Catalog team will continue to index EHR datasets, and several other DCCP members are currently considering similar projects. As more DCCP members move forward in this area, we will be able to compare workflows and processes so that we can better catalog these datasets, minimize redundant effort, and learn from each other’s expertise.