# Dataorc

A mono-repo for all dataorc functionalities, including:

- XNAT
  - XNAT upload session (see Instructions for XNAT Uploads)
  - XNAT download session
  - XNAT upload resource
  - XNAT download resource
  - XNAT administration
  - XNAT QuNex-related functionalities
- REDCap
- NDA
  - Prepare csv
  - Download
- ETL and SQL queries related to clinical assessments and behavioral data
- QuNex log tracking
## Installation

To install dataorc, follow the installation instructions for your OS.

If you are going to upload data for the ProNET study, there is a special build created just for ProNET users; follow the installation instructions available in the ProNET Data Upload SOP.
## Development setup

### Requirements

#### git lfs

#### Rust

The easiest way to install Rust is to use `rustup`, a Rust version manager. Follow the instructions for your operating system on the `rustup` site. `rustup` will check the `rust-toolchain` file and automatically install and use the correct Rust version for you.
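For reference, a toolchain pin of this kind typically looks like the following; the channel and components shown here are illustrative, not the repo's actual pin:

```toml
# rust-toolchain.toml (illustrative values)
[toolchain]
channel = "1.75.0"                   # rustup installs and uses exactly this version
components = ["rustfmt", "clippy"]   # extra components installed alongside the toolchain
```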
#### cargo-hakari

Used for managing feature flags and improving build performance: https://docs.rs/cargo-hakari/latest/cargo_hakari/index.html

```shell
cargo install cargo-hakari --locked
```

To update, run:

```shell
cargo install cargo-hakari --locked --force
```
#### cargo-insta

https://insta.rs/docs/cli/

We use insta in several critical places where manually writing unit tests is not scalable. In `transform`, we use insta to check generated SQL queries and loaded tables. We also use it to test QuNex session and mapping file parsing. You don't need to have insta installed to run the tests, because we always use snapshot files. However, if you need to review and update those snapshots, you need to have cargo-insta installed.
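Assuming the standard cargo-insta workflow from the insta docs (these are stock cargo-insta commands, not dataorc-specific tooling), reviewing and updating snapshots looks like:

```shell
# install the snapshot review tool
cargo install cargo-insta --locked
# run the tests, recording changed snapshots as pending .snap.new files
cargo insta test
# interactively accept or reject the pending snapshots
cargo insta review
```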
#### cargo-edit

https://github.com/killercup/cargo-edit

More and more features have been moved from this project into cargo itself (`add` and `update` were also previously provided by this project). Right now we still need `cargo upgrade` from this project, which allows upgrading dependencies without manually editing `Cargo.toml` files.
#### Protobuf (protoc)

We store passwords in the system keychain/secure storage in a binary format defined as a protobuf message. The Rust `prost` library only needs `protoc` at build time, because `prost` generates Rust code from the protobuf definitions.
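As an illustration only (the repo's actual message definition is not shown here, and these field names are hypothetical), a credential message of this kind might look like:

```proto
syntax = "proto3";

package dataorc.credentials;

// Hypothetical message; the real dataorc definition may differ.
// prost generates a matching Rust struct from this at build time via protoc.
message StoredCredential {
  string username = 1;
  string password = 2;
  // e.g. the server this credential belongs to
  string server_url = 3;
}
```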
#### Docker & Docker Compose

This is for setting up a local standalone XNAT instance.

##### Start a local XNAT server

```shell
docker-compose up -d
```

Create a `.env` file with the following env variables:

```shell
XNAT_URL=http://127.0.0.1:8080
XNAT_USERNAME=admin
XNAT_PASSWORD=admin
```
## Testing

### Run all default unit tests

```shell
cargo test --workspace
```
### Run XNAT API integration tests

Prerequisites:

- An XNAT test instance must be up and running.
- A `.env` file with the corresponding information must exist in the repo root.
- Currently, we only read those env variables during tests. This may change when we start to use dataorc inside a container on XNAT.

Note: you should always run against an ephemeral XNAT instance, as some of the tests will create new accounts that cannot be (easily) removed.
### Testing API library functions with cargo

The function names of XNAT API integration tests should start with `api`.

```shell
cargo test --package xnat api -- --include-ignored
```

```
test client::user::test::api_create_user ... ok
test client::projects::test::api_create_project ... ok
test client::projects::test::api_manage_project_users ... ok
```
## Logging

Logging in dataorc is done with the `tracing` library. You can adjust the logging level through the `DATAORC_LOG` env variable. For more detailed documentation of the env filter syntax, see https://docs.rs/tracing-subscriber/latest/tracing_subscriber/struct.EnvFilter.html

```shell
DATAORC_LOG=dataorc_cli=info,error  # info level logging for the cli, error level for everything else (default)
DATAORC_LOG=debug                   # turn on debug globally
```

There are two special logging targets for debugging NDA csv export: `export_internal` and `export_internal_value`. They provide information about how a value is found and transformed, with and without the actual values.
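To make the directive syntax above concrete, here is a minimal sketch of the grammar that `EnvFilter`-style specs follow: a comma-separated list where `target=level` scopes a level to one crate/module and a bare level is the default. This is not tracing-subscriber's real parser, only an illustration.

```rust
// Sketch of EnvFilter-style directive parsing (illustration only).
// Returns (target, level) pairs; a `None` target means "default level".
fn parse_directives(spec: &str) -> Vec<(Option<String>, String)> {
    spec.split(',')
        .map(|d| match d.split_once('=') {
            // `target=level`: applies only to that crate/module target
            Some((target, level)) => (Some(target.trim().to_string()), level.trim().to_string()),
            // bare `level`: the default for every other target
            None => (None, d.trim().to_string()),
        })
        .collect()
}

fn main() {
    // The default DATAORC_LOG value from above
    let parsed = parse_directives("dataorc_cli=info,error");
    assert_eq!(parsed[0], (Some("dataorc_cli".to_string()), "info".to_string()));
    assert_eq!(parsed[1], (None, "error".to_string()));
    println!("{parsed:?}");
}
```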
## Workspace-hack

We currently have `cargo-hakari` enabled at the workspace level to manage the feature flags of all the dependencies. The CI pipeline ensures that the generated workspace-hack crate is always up-to-date.

When you add new dependencies, you should run:

```shell
cargo hakari generate
```

When you create a new crate, you should run:

```shell
cargo hakari manage-deps
```
## macOS CI pipeline

We use the docker containers provided by https://github.com/joseluisq/rust-linux-darwin-builder; please check the CI script for more information. Note that we have C dependencies (sqlite, lzma, ...), so setting the correct C compiler is important (x86_64 vs. arm64).
## Upgrading the Rust version and dependencies

To update the Rust version, you only need to change the version number in `rust-toolchain.toml`; `rustup`/`cargo` will pick up this change and use the correct version. Rust is on a 6-week release cycle.

`cargo update` will update the `Cargo.lock` file: https://doc.rust-lang.org/cargo/commands/cargo-update.html

To upgrade the dependency versions declared in `Cargo.toml`, you need to install `cargo-edit` and use `cargo upgrade`.
## Using dataorc in other projects

Before some crates can be used independently in other projects (e.g., NBM), we need to restructure the current workspace:

- The `dataorc-cli` crate cannot be included in the workspace.
- The `Cargo.lock` should not live at the workspace level but in `dataorc-cli` instead.
- Crates that need to be shared externally should not be directly tracked by workspace-hack; their dependencies should still be included transitively. See https://docs.rs/cargo-hakari/latest/cargo_hakari/config/index.html#final-excludes
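Per the cargo-hakari config docs linked above, dropping a crate from workspace-hack's direct tracking while keeping its dependencies in transitively is done with `final-excludes`. A sketch, with the crate name assumed for illustration:

```toml
# .config/hakari.toml (sketch; the crate name is hypothetical)
[final-excludes]
# Externally shared crates are removed from workspace-hack's direct
# dependency list; their own deps remain included via the traversal.
workspace-members = ["some-shared-crate"]
```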
## External documentation
XNAT API: https://wiki.xnat.org/display/XAPI/XNAT+REST+API+Directory
## Acknowledgment

The REDCap portion of this project is heavily inspired by the `redcap-etl` REDCap plugin developed by Indiana University. We use the same transformation rule syntax in this project.