Along with CRO partner MMS Holdings, TrialAssure presented a poster titled “Automating Data Anonymization Procedures with Software” at the 2019 DIA Global Clinical Trial Disclosure and Data Transparency Conference in Bethesda, Maryland.
As clinical data sharing becomes commonplace, standards for anonymization techniques have developed, and opportunities have arisen to automate much of the data anonymization process. The poster gave an overview of how anonymization teams can use configurable software systems to drive this automation.
The team built a standard rule set for applying transformation methods to standard SDTM and ADaM variables. As datasets were processed, the rule library was augmented with “custom” variables specific to a study, product, or sponsor, along with their associated anonymization methods.
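The poster does not publish its rule set, so the following Python is only a minimal sketch of the general idea, assuming a simple design: standard rules keyed by variable name, plus custom rules registered under a scope (sponsor, product, or study) that override the standard ones. The variable names are real SDTM DM-domain variables, but the transformation assigned to each, the helper names (`RuleLibrary`, `pseudonymize`, `age_band`), the scope labels, and the HMAC key are hypothetical and are not TrialAssure's actual rules.

```python
import hashlib
import hmac
from typing import Callable, Dict, Iterable, List, Set, Tuple

Record = Dict[str, object]
Transform = Callable[[object], object]

# Sentinel: a transform returns DROP to remove the variable entirely.
DROP = object()

# Hypothetical key for illustration only; a real pipeline would load a managed secret.
SECRET_KEY = b"replace-with-a-managed-secret"


def pseudonymize(value: object) -> str:
    """Replace an identifier with a keyed, one-way code (HMAC-SHA256, first 12 hex chars)."""
    digest = hmac.new(SECRET_KEY, str(value).encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]


def drop(value: object) -> object:
    """Remove the variable from the output record."""
    return DROP


def keep(value: object) -> object:
    """Retain the value unchanged."""
    return value


def age_band(value: object) -> str:
    """Generalize an exact age into a 10-year band, top-coding at 90."""
    age = int(value)
    if age >= 90:
        return "90+"
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


# Illustrative standard rules for a few SDTM DM-domain variables (assumed, not the poster's rules).
STANDARD_RULES: Dict[str, Transform] = {
    "USUBJID": pseudonymize,
    "SUBJID": pseudonymize,
    "SITEID": pseudonymize,
    "BRTHDTC": drop,
    "AGE": age_band,
    "SEX": keep,
    "RACE": keep,
}


class RuleLibrary:
    """Standard rules plus custom rules registered per scope (sponsor, product, or study)."""

    def __init__(self, standard: Dict[str, Transform]) -> None:
        self.standard = dict(standard)
        self.custom: Dict[str, Dict[str, Transform]] = {}

    def add_custom(self, scope: str, variable: str, transform: Transform) -> None:
        self.custom.setdefault(scope, {})[variable] = transform

    def rules_for(self, scopes: Iterable[str]) -> Dict[str, Transform]:
        # Later (more specific) scopes override earlier ones and the standard set.
        merged = dict(self.standard)
        for scope in scopes:
            merged.update(self.custom.get(scope, {}))
        return merged

    def apply(self, records: List[Record], scopes: Iterable[str]) -> Tuple[List[Record], Set[str]]:
        rules = self.rules_for(scopes)
        unmapped: Set[str] = set()
        output: List[Record] = []
        for record in records:
            new_record: Record = {}
            for variable, value in record.items():
                transform = rules.get(variable)
                if transform is None:
                    # Conservative default: drop variables with no rule and report them for review.
                    unmapped.add(variable)
                    continue
                result = transform(value)
                if result is not DROP:
                    new_record[variable] = result
            output.append(new_record)
        return output, unmapped
```

For example, `RuleLibrary(STANDARD_RULES).apply(records, ["sponsor:ACME", "study:S1"])` returns the transformed records and the set of variables that had no rule. Registering a rule with `add_custom("study:S1", "XYZCLIN", ...)` for a hypothetical study-specific variable would cover it on the next run, which is one way such a library could grow as datasets are processed. Dropping unmapped variables by default is a deliberately conservative choice for this sketch.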
Simply by reusing the standard rule set and automation tools, the team immediately realized an 80% increase in efficiency compared with customizing SAS anonymization macros. The library of custom variables provided further downstream benefits, as many of these variables and their anonymization rules were reused in later studies. A risk assessment algorithm embedded in the software is also important because it provides justification for the anonymization rules applied.
The tool allowed the team to assess risk in real time and make informed decisions when determining a rule for one or more variables. The risk of re-identification for each study depends on various quantitative parameters, such as the contextual risk score and the publicly available quasi-identifiers (e.g., age, sex, race) in the study data and in similar studies conducted at that time.
By compiling information from questionnaires, clinical trial registries, and the data itself, the team was able to compare a quantitative risk score against a previously estimated risk threshold. This ensured that the anonymized study data fell within an acceptable level of risk, reducing the risk of re-identifying individual participants while retaining maximum data utility.
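How the software computes its risk score is not described beyond the inputs listed above. As a hedged illustration, the sketch below assumes a common two-factor model: a data risk computed from quasi-identifiers, where each record's risk is 1 divided by the number of records sharing its quasi-identifier values, multiplied by a contextual risk (an assumed probability of a re-identification attempt, such as one derived from a questionnaire). The function names, the `measure` options, and any threshold passed in are illustrative assumptions, not TrialAssure's method.

```python
from collections import Counter
from typing import Dict, List, Sequence

Record = Dict[str, object]


def equivalence_class_sizes(records: List[Record], quasi_identifiers: Sequence[str]) -> List[int]:
    """For each record, count how many records share its quasi-identifier values."""
    keys = [tuple(record.get(q) for q in quasi_identifiers) for record in records]
    counts = Counter(keys)
    return [counts[key] for key in keys]


def data_risk(records: List[Record], quasi_identifiers: Sequence[str], measure: str = "average") -> float:
    """Per-record risk is 1 / (equivalence class size); summarize as the average or maximum."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    if not sizes:
        return 0.0
    risks = [1.0 / size for size in sizes]
    if measure == "average":
        return sum(risks) / len(risks)
    if measure == "maximum":
        return max(risks)
    raise ValueError(f"unknown measure: {measure!r}")


def overall_risk(records: List[Record], quasi_identifiers: Sequence[str],
                 contextual_risk: float, measure: str = "average") -> float:
    """Assumed two-factor model: contextual risk (probability of an attempt) x data risk."""
    if not 0.0 <= contextual_risk <= 1.0:
        raise ValueError("contextual_risk must be between 0 and 1")
    return contextual_risk * data_risk(records, quasi_identifiers, measure)


def within_threshold(records: List[Record], quasi_identifiers: Sequence[str],
                     contextual_risk: float, threshold: float, measure: str = "average") -> bool:
    """True if the estimated overall risk does not exceed the previously estimated threshold."""
    return overall_risk(records, quasi_identifiers, contextual_risk, measure) <= threshold
```

For instance, with quasi-identifiers `["AGE", "SEX", "RACE"]`, a dataset in which four records fall into two pairs and one record is unique has an average data risk of 0.6. With an assumed contextual risk of 0.2, the overall risk of 0.12 would exceed an illustrative threshold of 0.09, signaling that the rules for one or more quasi-identifiers should be tightened (for example, wider age bands) and the check re-run.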
The team found that a dedicated software solution provided numerous benefits over a traditional SAS-based approach, even one augmented with automation techniques. Most of the efficiency is gained during the setup stages, when a dataset can be quickly analyzed and processed according to a predefined standard rule set for anonymization. An anonymization tool with embedded risk assessment allows teams to assess the risk of re-identification, adjust the anonymization rules accordingly, and effectively produce datasets that are ready to be shared.