PyHBM

Description

PyHBM is a Python package that enables data harmonization in the human biomonitoring (HBM) field. The user can provide configuration in the well-known Excel format, based on simple instructions. The package can be installed and used through the command line.

Key Points

1. Overview

The Python package offers a comprehensive suite of tools for data validation and transformation. Its modular design ensures flexibility, while the configuration-driven approach simplifies the customization process for users.

2. Features

a. Configurability

  • Users can define data validation and transformation rules using a simple Excel file.
  • Configuration supports a wide range of data types, allowing for versatile use cases.
  • Intuitive syntax and template-driven approach make it accessible to users with varying levels of technical expertise.

b. Data Validation

  • Robust data validation functionalities ensure data quality and adherence to specified criteria.
  • Customizable validation rules can be easily applied based on the user-defined configuration.

c. Data Transformation

  • Transform data based on project-specific rules and conditions.

3. Usage Examples

a. Configuration File

VarnameDescriptionTypeUnitMissingsAllowedMinValueMaxValueAllowedValuesDecimalsAfterCommaConditionalFormulaRemarks
id_subjectUID of the subjectinteger 01inf     
heightheight of the subject at samplingdecimalcm130250 1   
iscedHighest education level of the subject (ISCED scale)categorical 1  1 = Low education (ISCED 0-2); 2 = Medium education (ISCED 3-4); 3 = High education (ISCED >=5) IF ageyears IS < 20 THEN isced IS not empty  

b. Data

id_subjectheightisced
11601
21702
31803

c. Usage (validation)

> pyhbm data/file.xlsx -cb config/codebook.xlsx

With additional flags for output paths (-o) and levels of logging output (-v, -vv).

d. Output

{
    "structural": [],
    "validation": [],
}

The output is in JSON format, separating structural and validation errors.

Relevant technologies

  • Languages:
    • Python
  • Libraries:
    • pandas
    • numpy
    • scipy
    • typer
    • simplejson
    • rdflib
    • rich
    • pytest
    • pytest-cov
    • Faker
    • hypothesis
    • mkdocs
    • ipykernel
  • Other
    • black
    • flake8
    • isort
    • pydocstyle

The package is used in an online tool, which you can find here:

Additional information on the harmonization process that is enabled through the package can be found here:

Web illustrations by Storyset