# REDCap Export and NDA Upload Preparation

## Overview

This chart shows the data flow for uploading data from REDCap and XNAT to the NDA (NIMH Data Archive). Each bounding box lists the relevant Dataorc commands. REDCap and XNAT are our primary data sources and where data are permanently stored within our system. We use the REDCap export record API to export selected fields in CSV format. Next, we apply the transformation rules to split the single CSV into multiple relational database tables stored in the Dataorc SQLite database, as shown in the flowchart. The transformation rules define how to divide different fields (and potentially fields appearing under different contexts[^1]) into different database tables.
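To make the idea concrete, here is a minimal hypothetical sketch of what the transformation step accomplishes, splitting one exported CSV into separate SQLite tables; the table names, field lists, and file names are made up, and the real behavior is driven entirely by Dataorc's transformation rules:

```python
# Hypothetical sketch only: split a single REDCap export CSV into separate
# SQLite tables. Table names, field lists, and paths are illustrative; the
# actual splitting is defined by the Dataorc transformation rules.
import sqlite3

import pandas as pd

export = pd.read_csv("redcap_export.csv")  # CSV from the REDCap export record API

# "Rules" mapping each database table to the exported fields it should hold
rules = {
    "demographics": ["record_id", "dob", "sex"],
    "cbcl": ["record_id", "redcap_event_name", "cbcl_total"],
}

with sqlite3.connect("dataorc.sqlite") as conn:
    for table, fields in rules.items():
        export[fields].to_sql(table, conn, if_exists="replace", index=False)
```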
During the transformation step, we can decide how strict we want to be about data types. For example, we can enforce that a field must always contain integers. However, this will cause the whole import to fail if even one value cannot be parsed as an integer. In general, we want to avoid this: once the data are in the SQL database, they are much easier to query, and we have more options for handling edge cases there. To handle those cases, we can define SQL queries that are applied to the SQLite database after import. One of the most common queries disables certain records from being exported.
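For instance, a custom query along these lines (the table and column names are hypothetical; the actual schema comes from your transformation rules) could flag pilot records so they are skipped at export time:

```python
# Hypothetical post-import cleanup query against the Dataorc SQLite database.
# The table name, flag column, and record pattern are illustrative only.
import sqlite3

with sqlite3.connect("dataorc.sqlite") as conn:
    conn.execute(
        "UPDATE demographics SET nda_export = 0 WHERE record_id LIKE 'PILOT%'"
    )
```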
At this point, we are ready to export clinical/behavioral instruments as CSV files for NDA upload. Usually, there are differences in field names and value representations between REDCap and NDA. New REDCap databases can be designed to minimize these differences, but sometimes this is not practical, since we always export raw choice values from REDCap. Sometimes we can address a difference with a simple mapping between the two sets of values; in other cases, we need more sophisticated operations, including calculations. All of these data manipulations are defined in the export config file.
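Conceptually, these manipulations come down to remapping coded values and computing derived fields; the sketch below illustrates the idea with hypothetical field names and codes (the real definitions live in the export config, not in code like this):

```python
# Hypothetical illustration of value remapping and a calculated field; the
# field names, codes, and formula are examples, not from a real export config.

SEX_MAP = {"1": "M", "2": "F"}  # REDCap raw choice codes -> NDA codes

def to_nda_row(redcap_row: dict) -> dict:
    return {
        "sex": SEX_MAP[redcap_row["sex"]],
        # Calculated field: age stored in years in REDCap, reported in months to NDA
        "interview_age": int(round(float(redcap_row["age_years"]) * 12)),
    }

print(to_nda_row({"sex": "2", "age_years": "10.5"}))  # {'sex': 'F', 'interview_age': 126}
```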
For neuroimaging data, Dataorc can generate image03 CSV files for both DICOM and NIfTI formats. In both cases, we require the data to be processed through the QuNex `import_dicom` step, after which we have sorted DICOM files, NIfTI files, NIfTI JSON sidecar files, and bvec and bval files for diffusion data. For NIfTI uploads, because certain DICOM fields are not written to the JSON sidecar by dcm2niix, we have to export them separately through the XNAT API.
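In other words, the metadata for an image03 row is assembled from two sources per scan, roughly as in this hypothetical sketch (Dataorc does this internally; the file names and DICOM fields here are placeholders):

```python
# Hypothetical sketch: merge dcm2niix JSON sidecar metadata with DICOM fields
# exported separately from XNAT. File names and field names are placeholders.
import csv
import json

with open("sub-001_T1w.json") as f:  # NIfTI JSON sidecar written by dcm2niix
    sidecar = json.load(f)

with open("xnat_dicom_fields.csv", newline="") as f:  # from the XNAT export step
    xnat_fields = {row["session"]: row for row in csv.DictReader(f)}

scan_meta = {
    "manufacturer": sidecar.get("Manufacturer"),
    "repetition_time": sidecar.get("RepetitionTime"),
    # Assumed to be absent from the sidecar, so taken from the XNAT export
    "patient_position": xnat_fields["sub-001"]["PatientPosition"],
}
print(scan_meta)
```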
Config files you need to prepare for a new study:

1. workspace config: `redcap` and `nda_upload` sections in the [workspace config file](docs/dataorc-workspace/workspace-config.md)
1. [transformation rules](docs/redcap-and-nda-prep/02_transformation_rules.md)
1. [SQL queries](docs/redcap-and-nda-prep/03_database_and_custom_queries.md)
1. [export config for each NDA instrument](docs/redcap-and-nda-prep/04_export-config.md)
1. QuNex mapping with the NDA sequence type (see the sketch after this list)
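For the last item, a hypothetical QuNex-sequence-to-NDA-scan-type mapping (the sequence names, scan types, and representation are examples only, not a prescribed Dataorc format) might look like:

```python
# Hypothetical mapping from QuNex sequence names to NDA image03 scan types;
# the actual mapping format and names depend on your study and Dataorc config.
QUNEX_TO_NDA_SCAN_TYPE = {
    "T1w": "MR structural (T1)",
    "T2w": "MR structural (T2)",
    "bold": "fMRI",
    "dwi": "MR diffusion",
}
```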
Once you have a study set up with the appropriate configurations for the data you need to upload, re-run the workflow with

`dataorc redcap-import-api <path to dataorc redcap-nda-config dir>`

which drops the old database and re-runs the steps of the workflow to create a database that is ready to generate the output products. Then make sure that all of the required sessions are downloaded from XNAT and available on the machine where you are running Dataorc. For NIfTI uploads, run

`dataorc xnat-export-dicom-fields --host <HOST> --project <PROJECT> --sessions <SESSIONS(optional)> --dicom-fields <DICOM_FIELDS(optional)> --output-file <OUTPUT_FILE>`

Then, if there are no other project-specific steps, you are ready to export the image03 CSV using a command such as

`dataorc nda-prep-export-image03 [OPTIONS] --sessionsfolder <SESSIONS_FOLDER> --sessions <SESSIONS> --mapping <MAPPING> <WORKSPACE_DIR>`

You need to run this command once per session to generate the image data.
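If you have many sessions, a small wrapper script can help; this is a hypothetical sketch that assumes the argument layout shown above, with placeholder paths, session IDs, and mapping file name:

```python
# Hypothetical per-session wrapper around the documented image03 export
# command. All paths, session IDs, and the mapping file name are placeholders.
import subprocess

SESSIONS = ["sub-001", "sub-002"]  # placeholder session IDs

for session in SESSIONS:
    subprocess.run(
        [
            "dataorc", "nda-prep-export-image03",
            "--sessionsfolder", "/data/sessions",    # placeholder path
            "--sessions", session,
            "--mapping", "qunex_nda_mapping.yaml",   # placeholder mapping file
            "/data/workspace",                       # placeholder <WORKSPACE_DIR>
        ],
        check=True,
    )
```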
Then you can create the NDA CSVs using

`dataorc nda-prep-export-csv [OPTIONS] <WORKSPACE_DIR>`

which generates the CSVs for all of the instruments defined for export.

You should then have the data required for validation and upload to the NDA.
[^1]: For example, we can potentially separate the same REDCap field appearing in a simple event, a repeating instrument, or a repeating event into different database tables. The original transformation rule system, as designed by IU-REDCap, technically allows this, and our transformation implementation handles it correctly. However, the NDA export algorithm is not designed to handle this.