API Reference

DeIdentification.DeIdDicts — Method.

DeIdDicts(maxdays, shiftyears, dateformat)

Structure containing the dictionaries for project-level mappings:

Primary ID -> Research ID
Research ID -> DateShift number of days
Research ID -> Salt value
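
To make the three mappings concrete, here is a minimal Python sketch of a container with the same shape. It is an illustration only, not the package's Julia definition: the class name `DeIdDictsSketch`, the field names `id_map`, `dateshift_map`, and `salt_map`, and all sample values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DeIdDictsSketch:
    """Hypothetical stand-in for the three project-level mappings."""
    maxdays: int
    shiftyears: int
    dateformat: str
    # primary ID (or its digest) -> research ID
    id_map: dict = field(default_factory=dict)
    # research ID -> number of days the dates are shifted
    dateshift_map: dict = field(default_factory=dict)
    # research ID -> salt value
    salt_map: dict = field(default_factory=dict)


# Sample values are made up for illustration.
dicts = DeIdDictsSketch(maxdays=30, shiftyears=0, dateformat="y-m-d")
dicts.id_map["a1b2c3"] = 1
dicts.dateshift_map[1] = -12
dicts.salt_map[1] = "9f86d081"
```

Keying the date shift and the salt on the research ID means that once a primary ID has been resolved, everything else about that subject can be looked up from a single value.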

DeIdentification.ProjectConfig — Method.

ProjectConfig(config_file::String)

Structure containing the project-level settings from the configuration YAML file. This includes an array of the FileConfig structures that hold dataset-level settings.

DeIdentification.build_config — Method.

build_config(data_dir::String, config_file::String)

Interactively guides the user through writing a configuration YAML file for DeIdentification. The data_dir should contain one of each type of dataset you expect to deidentify (e.g. the data directory ./test/data contains pat.csv, med.csv, and dx.csv). The config builder reads the header of each CSV file and asks, column by column, for the output name and deidentification type. The results are written to config_file.

DeIdentification.build_config_from_csv — Method.

build_config_from_csv(project_name::String, file::String)

Generates a configuration YAML file from a CSV file that defines the mappings. The CSV file needs at least three named columns: Source Table, which gives the name of the CSV file the data will be read from; Field, which gives the name of the field in the data source; and Method, which gives the method to apply (one of Hash - Research ID, Hash, Hash & Salt, Date Shift, or Drop).

Any column renames and pre- or post-processing will need to be added manually to the file.
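
As an illustration of the expected layout, the Python sketch below parses a small mapping CSV with the three required columns and groups the fields by source table and method. The rows (table names, field names) are invented for the example, and the grouping function is hypothetical; it is not how the package builds its YAML.

```python
import csv
import io
from collections import defaultdict

# Hypothetical mapping file; only the three column names come from the docs.
mapping_csv = """Source Table,Field,Method
pat.csv,ID,Hash - Research ID
pat.csv,DOB,Date Shift
pat.csv,SSN,Drop
med.csv,ID,Hash - Research ID
med.csv,MRN,Hash & Salt
"""


def group_by_table_and_method(text):
    """Return {source table: {method: [field, ...]}} from mapping CSV text."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in csv.DictReader(io.StringIO(text)):
        grouped[row["Source Table"]][row["Method"]].append(row["Field"])
    # Convert to plain dicts so missing keys raise instead of silently appearing.
    return {table: dict(methods) for table, methods in grouped.items()}


grouped = group_by_table_and_method(mapping_csv)
```

Each source table ends up with one list of fields per method, which is roughly the per-dataset shape a generated configuration would need.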

DeIdentification.deidentify — Method.

deidentify(cfg::ProjectConfig)

This is the constructor for the DeIdentified struct. This type stores arrays of DeIdDataFrame variables while keeping a common salt_dict and dateshift_dict across DeIdDataFrames. The salt_dict tracks which salt was used on which cleartext; this is only needed for re-identification. The id_dict argument is a dictionary mapping the hash digest of each original primary ID to its new research ID.

DeIdentification.deidentify — Method.

deidentify(config_path)

Runs the entire pipeline: processes the configuration YAML file, de-identifies the data, and writes the data to disk. Returns the dictionaries containing the mappings.

DeIdentification.FileConfig — Type.

FileConfig(name, filename, colmap, rename_cols)

Structure containing configuration information for each dataset in the configuration YAML file. The colmap maps column names to their deidentification action (e.g. hash, salt, drop).

DeIdentification.dateshift_val! — Method.

dateshift_val!(dicts, val, pid)

Shifts the dates in fields containing dates. Dates are shifted by at most the maximum number of days specified in the project config. All dates for the same primary key are shifted by the same number of days. Missing values are left missing.
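
The behaviour described above can be sketched in Python as follows. This is a conceptual illustration, not the package's code: the function name, its arguments, and the choice to draw the shift uniformly from [-maxdays, maxdays] are all assumptions.

```python
import random
from datetime import date, timedelta


def dateshift_val(shift_by_id, val, pid, maxdays, rng):
    """Shift `val` by the per-ID offset, creating the offset on first use."""
    if val is None:
        # Missing values stay missing.
        return None
    if pid not in shift_by_id:
        # Assumption: the offset is uniform in [-maxdays, maxdays].
        shift_by_id[pid] = rng.randint(-maxdays, maxdays)
    return val + timedelta(days=shift_by_id[pid])


rng = random.Random(0)
shifts = {}
admit = dateshift_val(shifts, date(2020, 1, 1), 1, 30, rng)
discharge = dateshift_val(shifts, date(2020, 1, 11), 1, 30, rng)
```

Because both dates share the same ID, they move by the same offset, so the interval between them (10 days here) is preserved.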

DeIdentification.deid_file! — Method.

deid_file!(dicts, file_config, project_config, logger)

Reads the raw file and deidentifies it per the file configuration and project configuration. Writes the deidentified data to a CSV file and updates the global dictionaries tracking identifier mappings.

DeIdentification.getcurrentdate — Method.

getcurrentdate()

Returns the current date as a string conforming to ISO8601 basic format.

This is used to generate filenames in a cross-platform compatible way.
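
For reference, the ISO 8601 basic format omits the separators used in the extended format, so a calendar date is written as YYYYMMDD. The Python sketch below shows that form and why it is convenient in filenames. The function name and the filename pattern are hypothetical, and whether the package's string also carries a time component is not stated here.

```python
from datetime import date


def basic_iso_date(d):
    """Format a date in ISO 8601 basic form (YYYYMMDD, no separators)."""
    return d.strftime("%Y%m%d")


stamp = basic_iso_date(date(2019, 3, 7))
# Hypothetical filename pattern; no ':' or '/' that some filesystems reject.
filename = f"deid_{stamp}.csv"
```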

DeIdentification.hash_salt_val! — Method.

hash_salt_val!(dicts, val, pid)

Salts and hashes fields containing unique identifiers. Hashing is done in place using SHA256 and a 64-bit salt. Missing values are left missing.
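
A minimal Python sketch of salted SHA256 hashing is shown below. It is not the package's implementation: the function name and arguments are hypothetical, and it assumes one salt per ID, encoded as 16 hex characters (64 bits) and prepended to the value before hashing.

```python
import hashlib
import secrets


def hash_salt_val(salt_by_id, val, pid):
    """Return the SHA256 hex digest of salt + value, reusing the salt per ID."""
    if val is None:
        # Missing values stay missing.
        return None
    if pid not in salt_by_id:
        # 8 random bytes = 64 bits, stored as 16 hex characters.
        salt_by_id[pid] = secrets.token_hex(8)
    salted = salt_by_id[pid] + str(val)
    return hashlib.sha256(salted.encode("utf-8")).hexdigest()


salts = {}
first = hash_salt_val(salts, "MRN-0001", 1)
second = hash_salt_val(salts, "MRN-0001", 1)
```

Storing the salt alongside the ID is what makes the digest reproducible across files, and it is also what would allow re-identification by someone holding the salt dictionary.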

DeIdentification.setrid — Method.

setrid(val, dicts)

Maps the value passed (a hex string) to a human-readable integer ID. A new ID is generated if the value hasn't been seen before; otherwise the existing ID is used.
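
The lookup-or-create behaviour can be sketched in Python as below. The numbering scheme (sequential integers starting at 1) and the plain dictionary argument are assumptions made for the illustration; the package may assign IDs differently.

```python
def setrid(val, id_map):
    """Return the integer ID for `val`, assigning the next one if unseen."""
    if val not in id_map:
        # Assumption: IDs are sequential and start at 1.
        id_map[val] = len(id_map) + 1
    return id_map[val]


id_map = {}
rid_a = setrid("3fa9c1", id_map)        # unseen value, gets a new ID
rid_b = setrid("77be02", id_map)        # another unseen value
rid_a_again = setrid("3fa9c1", id_map)  # seen before, same ID returned
```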

DeIdentification.write_dicts — Method.

write_dicts(deid_dicts)

Writes the DeIdDicts structure to file. The dictionaries are written as JSON. The files are written to the output_path specified in the configuration YAML.

DeIdentification.write_yaml — Method.

write_yaml(file::String, yml::AbstractDict)

Recursively writes a YAML object to file. A YAML object is a dictionary, which can contain arrays of YAML objects. See YAML.jl for more on the format.
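
To show what recursing over such a structure looks like, here is a small Python sketch that renders a nested dictionary as YAML lines. It is a simplified illustration rather than the package's writer: the function name is hypothetical, the sample keys are invented, strings are not quoted or escaped, and empty dictionaries inside arrays are not handled.

```python
def yaml_lines(node, indent=0):
    """Render a dict (which may hold dicts and lists of dicts) as YAML lines."""
    pad = " " * indent
    lines = []
    for key, value in node.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.extend(yaml_lines(value, indent + 2))
        elif isinstance(value, list):
            lines.append(f"{pad}{key}:")
            for item in value:
                if isinstance(item, dict):
                    sub = yaml_lines(item, indent + 4)
                    # The first key of the mapping shares its line with the dash.
                    lines.append(f"{pad}  - {sub[0].lstrip()}")
                    lines.extend(sub[1:])
                else:
                    lines.append(f"{pad}  - {item}")
        else:
            lines.append(f"{pad}{key}: {value}")
    return lines


# Hypothetical configuration fragment.
doc = {"project": "demo", "datasets": [{"name": "pat", "hash_cols": ["ID"]}]}
lines = yaml_lines(doc)
text = "\n".join(lines) + "\n"
```

Writing `text` to a file would complete the picture; the recursion itself is the part that mirrors the description above.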
