UC Health Epic Data for Research Policy Update

What are the changes, and when are they going into effect?

Effective May 1, 2025:

  1. Fully identified datasets (containing PHI) based on IRB waivers will no longer be provided to researchers, except in exceptional cases approved on an ad hoc basis by the appropriate UC Health Digital Council committee.
  2. Fully identified datasets (containing PHI) based on IRB waivers stored on UC resources (servers or local desktops) will be progressively migrated to the UC Health data center in a phased approach.
  3. Limited and de-identified datasets stored on UC resources (servers or local desktops) will be progressively migrated to the UC Health data center in a phased approach.

What does this mean for researchers?

  1. Researchers will no longer be able to receive a fully identified dataset based solely on IRB waivers.
    1. Instead, de-identified or, in specific cases, limited datasets (as defined by HIPAA) will be provided, with pseudo-identifiers replacing the actual PHI/PII. The UC Center for Health Informatics (CHI) will maintain patient re-identification maps in case the study needs additional data linkage (e.g., for recruitment, data augmentation, or de-duplication).
    2. If a researcher has an exceptional reason to need a fully identified dataset, they must submit a formal request to the CHI with a detailed justification explaining why Protected Health Information (PHI) is necessary. The CHI will route the request to the appropriate UC Health Digital Council committee.
  2. Existing datasets and ongoing studies with recurring datasets will be progressively migrated to the UC Health data center or another data center with the appropriate business agreements with UC Health.
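
The pseudo-identifier approach described in item 1 can be illustrated with a minimal Python sketch. This is not the CHI's actual procedure: the function name pseudonymize, the field names mrn and pseudo_id, and the "P-" prefix are all hypothetical, chosen only to show how a re-identification map can be kept separate from the dataset a researcher receives.

```python
import secrets


def pseudonymize(records, id_field="mrn"):
    """Replace the identifier in each record with a random pseudo-identifier.

    Returns (pseudonymized_records, reid_map). The reid_map links each
    pseudo-identifier back to the original identifier; in this sketch it
    is assumed to stay with the data custodian and is never shared.
    """
    forward = {}   # original id -> pseudo id, so one patient keeps one pseudo id
    reid_map = {}  # pseudo id -> original id, held back for approved re-linking
    out = []
    for rec in records:
        original = rec[id_field]
        if original not in forward:
            pseudo = "P-" + secrets.token_hex(8)
            while pseudo in reid_map:  # regenerate in the unlikely case of a collision
                pseudo = "P-" + secrets.token_hex(8)
            forward[original] = pseudo
            reid_map[pseudo] = original
        # Copy every field except the real identifier, then attach the pseudo id.
        new_rec = {k: v for k, v in rec.items() if k != id_field}
        new_rec["pseudo_id"] = forward[original]
        out.append(new_rec)
    return out, reid_map
```

Because the same patient always receives the same pseudo-identifier within one run, records can still be de-duplicated or linked within the dataset without exposing the original identifier.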

What are the phases of the data migration plan?

Phase I:

Fully identified datasets that reside on UC-managed servers and currently hold UC Health’s data for unconsented patients will be identified and cataloged through surveys and other appropriate methods. Once the datasets have been identified, representatives from the CHI and the research teams responsible for them will develop a plan to move the data in accordance with the updated guidelines.

During this phase, a limited dataset can be generated upon request at no additional charge and provided to the research team. In that case, a copy of the original dataset will be archived on UC Health-approved servers.

If, in the future, researchers need access to the fully identified datasets and have an approved justification for their use, the CHI will assist researchers in obtaining access to the archived datasets at UC Health.

Researchers will continue to have access to de-identified or limited datasets hosted on UC servers.

This Policy Update provides definitions of limited and de-identified datasets below. The CHI has procedures and governance to permit the re-linking of limited or de-identified data for recruitment, data linkage, data updates, and other uses.

Phase II:

Following the plans developed in Phase I, all datasets will be moved to or retained on UC Health-approved servers. A web portal will allow researchers to access their datasets from within the UC or UC Health networks. Data will be maintained securely within a database management system.

In the future, any researcher who prefers to perform their statistical analysis securely will be provided access to an enclave (virtual server) with statistical software, such as RStudio and Jupyter Notebook, for analysis.

Will I be charged for converting previously extracted, fully identified datasets into a limited or de-identified dataset?

The UC Center for Health Informatics will not charge for converting an existing fully identified dataset into a limited or de-identified dataset. For a limited time, the CHI will also not charge for storing those datasets on UC Health-approved servers.

How do I request data now and in the future?

Researchers will continue to utilize our Service Portal as before.

Researchers must submit a detailed data request outlining the purpose of their research, the inclusion/exclusion criteria, the type of data needed, and the justification for the request for any fully identified data.

All requests will be reviewed and approved by the Director of the UC Center for Health Informatics.

Who do I contact for more information?

For more information or questions, please contact the UC Center for Health Informatics at combmichi@uc.edu.

What are limited and de-identified datasets?

Limited Datasets

A limited dataset under HIPAA is a set of identifiable healthcare information that excludes certain direct identifiers but may still include some indirect identifiers. These datasets can be used for research, public health activities, and healthcare operations without obtaining prior patient authorization, provided a data use agreement is in place.

Key Characteristics:
  • Includes: City, state, ZIP code, and dates of service.
  • Excludes: Names, street addresses, phone numbers, social security numbers, email addresses, medical record numbers, and other direct identifiers.
  • Usage: Can be shared with entities that have signed a data use agreement, ensuring the data is used only for specified purposes and not re-identified.

De-identified Datasets

De-identified data is health information from which all 18 personal identifiers specified by HIPAA have been removed, so that there is no reasonable basis to believe the information can be used to identify an individual. This type of data is no longer considered PHI under HIPAA and can be used more freely for research and other purposes.

Key Characteristics:
  • Excludes: All 18 HIPAA-specified identifiers, including names, geographic subdivisions smaller than a state, all elements of dates (except year) directly related to an individual, and other unique identifying numbers or characteristics.
  • Usage: Can be used without restrictions related to HIPAA, as it no longer contains identifiable information.

Summary of Differences

  • Identifiability: Limited datasets still contain some indirect identifiers and are considered identifiable under HIPAA, whereas de-identified datasets have all 18 HIPAA-specified identifiers removed.
  • Regulatory Requirements: Limited datasets require a data use agreement and are subject to HIPAA Privacy Rule standards, while de-identified datasets are not subject to these restrictions.
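
To make the distinction concrete, here is a minimal Python sketch of how a single record might look in each form. It is illustrative only and is not a compliance tool: the field names (name, mrn, city, zip_code, and so on) are hypothetical, and the sketch handles only the few identifiers listed above rather than all 18 HIPAA-specified identifiers.

```python
from datetime import date

# Hypothetical field names standing in for a few direct identifiers.
DIRECT_IDENTIFIERS = {"name", "street_address", "phone", "ssn", "email", "mrn"}
# Geographic fields smaller than a state, which a de-identified record drops.
SUB_STATE_GEOGRAPHY = {"city", "zip_code"}


def to_limited(record):
    """Drop direct identifiers; keep city, state, ZIP code, and full dates."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}


def to_deidentified(record):
    """Also drop sub-state geography and reduce every date to its year."""
    out = {}
    for key, value in to_limited(record).items():
        if key in SUB_STATE_GEOGRAPHY:
            continue
        if isinstance(value, date):
            value = value.year  # keep only the year element of the date
        out[key] = value
    return out


record = {
    "name": "Jane Doe",
    "mrn": "000123",
    "city": "Springfield",
    "state": "OH",
    "zip_code": "12345",
    "service_date": date(2024, 3, 5),
    "diagnosis": "I10",
}
limited = to_limited(record)
deidentified = to_deidentified(record)
```

In this sketch, limited retains the city, state, ZIP code, and full service date, while deidentified retains only the state, the service year, and the diagnosis.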