API Reference
DeIdentification.DeIdDicts
— Method.
DeIdDicts(maxdays, shiftyears, dateformat)
Structure containing the dictionaries for project-level mappings:
- Primary ID -> Research ID
- Research ID -> DateShift number of days
- Research ID -> Salt value
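The three mappings can be pictured as plain dictionaries. The following is a minimal sketch, not the package's actual struct: the type name SketchDicts, its field names, and the key and value types are all assumptions made for illustration.

```julia
# Hypothetical stand-in for the project-level mappings; the field names and
# types are assumptions, not DeIdentification's actual definition.
struct SketchDicts
    id::Dict{String,Int}         # primary ID -> research ID
    dateshift::Dict{Int,Int}     # research ID -> date shift in days
    salt::Dict{Int,String}       # research ID -> salt value
end

# Convenience constructor that starts with three empty dictionaries.
SketchDicts() = SketchDicts(Dict{String,Int}(), Dict{Int,Int}(), Dict{Int,String}())

dicts = SketchDicts()
dicts.id["a1b2c3"] = 1          # first primary ID seen becomes research ID 1
dicts.dateshift[1] = -17        # every date for research ID 1 moves back 17 days
dicts.salt[1] = "example-salt"  # salt reused for every value tied to research ID 1
```

Keying the date shift and salt by research ID means that one lookup of the primary ID is enough to reach every other project-level value for that individual.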
DeIdentification.ProjectConfig
— Method.
ProjectConfig(config_file::String)
Structure containing the project-level settings from the configuration YAML file, including an array of FileConfig structures that hold the dataset-level settings.
DeIdentification.build_config
— Method.
build_config(data_dir::String, config_file::String)
Interactively guides the user through writing a configuration YAML file for DeIdentification. The data_dir should contain one of each type of dataset you expect to deidentify (e.g. the data directory ./test/data contains pat.csv, med.csv, and dx.csv). The config builder reads the header of each CSV file and asks, column by column, for the output name and deidentification type. The results are written to config_file.
DeIdentification.build_config_from_csv
— Method.
build_config_from_csv(project_name::String, file::String)
Generates a configuration YAML file from a CSV file that defines the mappings. The CSV file needs at least three named columns:
- Source Table: the name of the CSV file the data will be read from
- Field: the name of the field in the data source
- Method: the method to apply (one of Hash - Research ID, Hash, Hash & Salt, Date Shift, or Drop)
Any column renames and pre- or post-processing need to be added to the generated file manually.
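A mapping CSV with the three required columns might look like the one written below. This is an illustrative sketch only: the field names (patient_id, birth_date, and so on), the exact form of the Source Table values, the project name, and the file name mapping.csv are hypothetical. The final call is left commented out because it requires the package and real data files.

```julia
# Write a small example mapping file with the three required columns.
# Field names, Source Table values, and the file name are hypothetical.
mapping = """
Source Table,Field,Method
pat.csv,patient_id,Hash - Research ID
pat.csv,birth_date,Date Shift
pat.csv,ssn,Drop
med.csv,patient_id,Hash - Research ID
med.csv,order_id,Hash & Salt
dx.csv,encounter_id,Hash
"""

mapping_file = joinpath(mktempdir(), "mapping.csv")
write(mapping_file, mapping)

# With DeIdentification loaded, the configuration could then be generated with:
# build_config_from_csv("my_project", mapping_file)
```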
DeIdentification.deidentify
— Method.
deidentify(cfg::ProjectConfig)
This is the constructor for the DeIdentified struct. We use this type to store arrays of DeIdDataFrame variables while keeping a common salt_dict and dateshift_dict across all DeIdDataFrames. The salt_dict tracks which salt was used on which cleartext value, which is only needed for re-identification. The id_dict argument is a dictionary mapping the hash digest of each original primary ID to its new research ID.
DeIdentification.deidentify
— Method.
deidentify(config_path)
Runs the entire pipeline: processes the configuration YAML file, de-identifies the data, and writes the de-identified data to disk. Returns the dictionaries containing the mappings.
DeIdentification.FileConfig
— Type.
FileConfig(name, filename, colmap, rename_cols)
Structure containing the configuration information for each dataset in the configuration YAML file. The colmap maps column names to their deidentification action (e.g. hash, salt, drop).
DeIdentification.dateshift_val!
— Method.
dateshift_val!(dicts, val, pid)
Shifts the dates in date fields. Each date is shifted by at most the maximum number of days specified in the project config, and all dates for the same primary key are shifted by the same number of days. Missing values are left missing.
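The per-key behaviour described above can be sketched with the Dates standard library. This is not the package's implementation: the helper name sketch_dateshift!, the dictionary keyed directly by primary ID, the symmetric range of shifts, and the default of 30 days are assumptions made for illustration.

```julia
using Dates, Random

# Hypothetical helper: look up (or draw once) the shift for a primary ID,
# then apply that same shift to the date. Missing values pass through.
function sketch_dateshift!(shifts::Dict{Int,Int}, val, pid::Int;
                           maxdays::Int=30,
                           rng::AbstractRNG=Random.default_rng())
    val === missing && return missing
    ndays = get!(shifts, pid) do
        rand(rng, -maxdays:maxdays)   # drawn only the first time pid is seen
    end
    return val + Day(ndays)
end

shifts = Dict{Int,Int}()
d1 = sketch_dateshift!(shifts, Date(2020, 1, 15), 1)
d2 = sketch_dateshift!(shifts, Date(2020, 3, 1), 1)
```

Because both calls use primary ID 1, d1 and d2 are offset from their inputs by the same number of days, so the interval between the two dates is preserved.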
DeIdentification.deid_file!
— Method.
deid_file!(dicts, file_config, project_config, logger)
Reads a raw file and deidentifies it according to the file configuration and project configuration. Writes the deidentified data to a CSV file and updates the global dictionaries tracking identifier mappings.
DeIdentification.getcurrentdate
— Method.
getcurrentdate()
Returns the current date as a string conforming to the ISO 8601 basic format.
This is used to generate filenames in a cross-platform compatible way.
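The ISO 8601 basic format drops the separators used in the extended format, which avoids characters such as the colon that are not allowed in Windows file names. A minimal sketch with the Dates standard library follows. The helper name sketch_datestamp, the date-only format string, and the file naming scheme are assumptions, and the package's exact output may differ.

```julia
using Dates

# ISO 8601 basic format omits the separators: YYYYMMDD rather than YYYY-MM-DD.
# Hypothetical helper; the package's exact format string may differ.
sketch_datestamp(d::Date=today()) = Dates.format(d, dateformat"yyyymmdd")

stamp = sketch_datestamp(Date(2020, 1, 5))
filename = "deid_pat_" * stamp * ".csv"   # hypothetical file naming scheme
```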
DeIdentification.hash_salt_val!
— Method.
hash_salt_val!(dicts, val, pid)
Salts and hashes fields containing unique identifiers. Hashing is done in place using SHA-256 and a 64-bit salt. Missing values are left missing.
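Salting and hashing can be sketched with the SHA and Random standard libraries. The helper name sketch_hash_salt!, the way the salt is generated and encoded, the choice to append the salt to the value, and the dictionary keyed by primary ID are assumptions, not the package's implementation. Unlike the in-place method described above, this sketch returns the digest.

```julia
using SHA, Random

# Hypothetical helper: salt and hash one value, reusing one salt per ID.
function sketch_hash_salt!(salts::Dict{Int,String}, val, pid::Int;
                           rng::AbstractRNG=Random.default_rng())
    val === missing && return missing
    salt = get!(salts, pid) do
        # 64 random bits rendered as 16 hexadecimal characters
        string(rand(rng, UInt64), base=16, pad=16)
    end
    # SHA-256 digest of value followed by salt, as a 64-character hex string
    return bytes2hex(sha256(string(val, salt)))
end

salts = Dict{Int,String}()
h1 = sketch_hash_salt!(salts, "MRN-0001", 1)
h2 = sketch_hash_salt!(salts, "MRN-0001", 1)
```

Storing the salt per ID keeps the digest stable across calls (h1 equals h2) while making it differ from an unsalted hash of the same value.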
DeIdentification.setrid
— Method.
setrid(val, dicts)
Maps the value passed (a hex string) to a human-readable integer ID. A new ID is generated if the value has not been seen before; otherwise the existing ID is used.
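The lookup-or-create behaviour can be sketched with a dictionary and get!. The helper name sketch_setrid!, its argument order, and the sequential numbering scheme are assumptions made for illustration; the package may assign IDs differently.

```julia
# Hypothetical helper: assign sequential integer IDs to hex digests.
function sketch_setrid!(ids::Dict{String,Int}, val::String)
    get!(ids, val) do
        length(ids) + 1   # next unused integer, computed before insertion
    end
end

ids = Dict{String,Int}()
a = sketch_setrid!(ids, "9f86d0")   # new value, gets a new ID
b = sketch_setrid!(ids, "2c26b4")   # another new value
c = sketch_setrid!(ids, "9f86d0")   # seen before, existing ID is reused
```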
DeIdentification.write_dicts
— Method.
write_dicts(deid_dicts)
Writes the DeIdDicts structure to file. The dictionaries are written as JSON files to the output_path specified in the configuration YAML.
DeIdentification.write_yaml
— Method.
write_yaml(file::String, yml::AbstractDict)
Recursively writes a YAML object to file. A YAML object is a dictionary, which can contain arrays of YAML objects. See YAML.jl for more on the format.
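The recursion over dictionaries and arrays can be sketched as follows. This is not the package's implementation: the helper name sketch_write_yaml, the two-space indentation, the choice to write to an IO stream rather than a file path, and the example keys are assumptions. The sketch also does no quoting or escaping of strings.

```julia
# Hypothetical recursive writer; handles scalars, nested dictionaries, and
# arrays of scalars or dictionaries. Strings are written without quoting.
function sketch_write_yaml(io::IO, yml::AbstractDict, indent::Int=0)
    pad = " "^indent
    for (key, val) in yml
        if val isa AbstractDict
            println(io, pad, key, ":")
            sketch_write_yaml(io, val, indent + 2)
        elseif val isa AbstractVector
            println(io, pad, key, ":")
            for item in val
                if item isa AbstractDict
                    # Render the nested dictionary four spaces deeper, then
                    # replace the first line's indentation with a "- " marker.
                    buf = IOBuffer()
                    sketch_write_yaml(buf, item, indent + 4)
                    text = String(take!(buf))
                    print(io, pad, "  - ", text[indent + 5:end])
                else
                    println(io, pad, "  - ", item)
                end
            end
        else
            println(io, pad, key, ": ", val)
        end
    end
end

cfg = Dict(
    "project" => "demo",
    "datasets" => [Dict("name" => "pat"), Dict("name" => "med")],
)
io = IOBuffer()
sketch_write_yaml(io, cfg)
yaml_text = String(take!(io))
```

A plain Dict does not guarantee key order, so the top-level keys may appear in either order in yaml_text.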