Dataorc

A mono-repo for all dataorc functionalities including:

  • XNAT
    • XNAT upload session - Instructions for XNAT Uploads
    • XNAT download session
    • XNAT upload resource
    • XNAT administration
    • XNAT QuNex-related functionalities
  • REDCap
  • NDA
    • Prepare CSV
    • Download
  • ETL and SQL queries related to clinical assessments and behavioral data
  • QuNex log tracking

Installation

To install dataorc, follow the installation instructions for your OS:

Windows Installation

MacOS Installation

Linux Installation

If you are going to upload data for the ProNET study, there is a special build created just for ProNET users; follow the installation instructions available in the ProNET Data Upload SOP.

Development setup

Requirements

git lfs

Rust

The easiest way to install Rust is to use rustup, the Rust toolchain manager. Follow the instructions for your operating system on the rustup site. rustup reads the rust-toolchain.toml file and automatically installs and uses the correct Rust version for you.
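For reference, a rust-toolchain.toml pinning a toolchain looks like the sketch below; the channel value and components here are illustrative, not necessarily what this repo actually pins (check the file in the repo root).

```toml
# rust-toolchain.toml — illustrative example; the repo's own file
# is authoritative for the pinned version.
[toolchain]
channel = "1.74.0"
components = ["rustfmt", "clippy"]
```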

Cargo hakari

cargo-hakari manages feature flags and improves build performance: https://docs.rs/cargo-hakari/latest/cargo_hakari/index.html

cargo install cargo-hakari --locked

To update, run

cargo install cargo-hakari --locked --force 

Cargo insta

https://insta.rs/docs/cli/

We use insta in several critical places where manually writing unit tests is not scalable. In transform, we use insta to check generated SQL queries and loaded tables. We also use it to test QuNex session and mapping file parsing. You don't need insta installed to run the tests, because we always use snapshot files. However, if you need to review and update those snapshots, you need cargo-insta installed.

Cargo edit

https://github.com/killercup/cargo-edit

More and more features from this project have been moved into cargo itself (add and update were also previously provided by this project). Right now we still need cargo upgrade from this project, which upgrades dependencies without manually editing Cargo.toml files.

Protobuf (protoc)

We store passwords in the system keychain/secure storage in a binary format defined as a protobuf message. The Rust prost library only needs protoc, because prost generates plain Rust code from the .proto definitions at build time.
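For illustration, such a message might look like the following sketch; the actual message and field names used in this repo may differ:

```protobuf
// Hypothetical credential message — message and field names are
// illustrative only, not the repo's actual schema.
syntax = "proto3";

message StoredCredential {
  string username = 1;
  string password = 2;
}
```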

Docker & Docker Compose

This is for setting up a local standalone XNAT instance.

Start local xnat server

docker-compose up -d

Create a .env file with the following environment variables:

XNAT_URL=http://127.0.0.1:8080
XNAT_USERNAME=admin
XNAT_PASSWORD=admin

Testing

Run all default unit tests

cargo test --workspace

Run XNAT API integration tests

Prerequisite

  • An XNAT test instance must be up and running
  • The .env file with corresponding information must exist in the repo root.
  • Currently, we only read those env variables during tests. This may change when we start to use dataorc inside a container on XNAT. Note: you should always run against an ephemeral XNAT instance; some of the tests create new accounts that cannot be (easily) removed.

Testing API library functions with cargo

The function names of XNAT API integration tests should start with api.

cargo test --package xnat api -- --include-ignored
test client::user::test::api_create_user ... ok
test client::projects::test::api_create_project ... ok
test client::projects::test::api_manage_project_users ... ok

Logging

Logging in dataorc is done with the tracing library. You can adjust the logging level through the DATAORC_LOG env variable. For more detailed documentation for the env log filter feature look at https://docs.rs/tracing-subscriber/latest/tracing_subscriber/struct.EnvFilter.html

DATAORC_LOG=dataorc_cli=info,error # info level logging for cli, error level for everything else (default)
DATAORC_LOG=debug # turn on debug globally

There are two special logging targets for debugging NDA CSV export: export_internal and export_internal_value. They provide information about how each value is found and transformed, without and with the actual values, respectively.

Workspace-hack

We currently have cargo-hakari enabled at the workspace level to manage feature flags of all the dependencies. The CI pipeline ensures that the workspace-hack crate is always up-to-date. When you add new dependencies, you should run

cargo hakari generate

When you create a new crate, you should run

cargo hakari manage-deps

MacOS CI pipeline

We use Docker containers provided by https://github.com/joseluisq/rust-linux-darwin-builder. Please check the CI script for more information. Note that we have C dependencies (sqlite, lzma, ...), so setting the correct C compiler (x86_64 vs. arm64) is important.

Upgrade rust version and dependencies

To update the Rust version, you only need to change the version number in rust-toolchain.toml; rustup/cargo will pick up the change and use the correct version. Rust is on a 6-week release cycle.

To update dependencies within the version ranges already declared, run cargo update, which only rewrites the Cargo.lock file: https://doc.rust-lang.org/cargo/commands/cargo-update.html To upgrade the version requirements in the Cargo.toml files themselves, install cargo-edit and use cargo upgrade.

Using dataorc in other projects

Before some crates can be used independently in other projects (e.g., NBM), we need to restructure the current workspace. The dataorc-cli crate cannot be included in the workspace, and Cargo.lock should not live at the workspace level but in dataorc-cli instead. Crates that need to be shared externally should not be directly tracked by workspace-hack; their dependencies should still be included transitively. https://docs.rs/cargo-hakari/latest/cargo_hakari/config/index.html#final-excludes
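cargo-hakari supports this through the final-excludes option in its config file. A sketch, assuming hakari's default config location of .config/hakari.toml (the crate name below is a placeholder):

```toml
# .config/hakari.toml — sketch; "shared-crate" is a placeholder name.
# Crates listed here are dropped from workspace-hack after dependency
# resolution, so their transitive dependencies are still accounted for.
final-excludes = [
    "shared-crate",
]
```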

External documentation

XNAT API: https://wiki.xnat.org/display/XAPI/XNAT+REST+API+Directory

Acknowledgment

The REDCap portion of this project is heavily inspired by the redcap-etl REDCap plugin developed by Indiana University. We use the same transformation rule syntax in this project.