Getting started

Installation

The library is available on PyPI and can be installed with

$ pip install wadi

Requirements

WaDI requires Python version 3.9 or higher and has the following dependencies:

  • fuzzywuzzy

  • googletrans version 3.1 or higher

  • molmass

  • NumPy

  • Pandas

  • Pint
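Before installing, it can help to see which of these requirements are already present in the current environment. The following is a minimal sketch using only the standard library. It assumes that each package's import name is the lowercase form of the name listed above, and `missing_modules` is a hypothetical helper, not part of WaDI. It does not check package versions, so the googletrans version requirement still needs to be verified separately.

```python
import sys
from importlib.util import find_spec

# Assumed import names for the dependencies listed above
REQUIRED = ["fuzzywuzzy", "googletrans", "molmass", "numpy", "pandas", "pint"]


def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


# WaDI requires Python 3.9 or higher
python_ok = sys.version_info >= (3, 9)
missing = missing_modules(REQUIRED)

print("Python version OK:", python_ok)
print("Missing packages:", ", ".join(missing) if missing else "none")
```

Installing WaDI with pip normally pulls in its dependencies, so a check like this is mainly useful when troubleshooting an existing environment.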

WaDI workflow and concepts

A typical WaDI workflow imports data from a spreadsheet file, maps each feature name to its alias (the WaDI term for the desired name in the final output), and then performs operations such as unit conversion and merging of columns.
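To make the two central ideas concrete, the sketch below mimics alias mapping and unit conversion in plain Python. It is a conceptual illustration only and does not use or reproduce WaDI's API. The feature names, aliases, values and conversion factors are all made up for the example.

```python
# Hypothetical measurements as they might appear in a lab report
measurements = [
    {"feature": "Chloride", "value": 100.0, "unit": "mg/l"},
    {"feature": "Calcium (ICP-AES)", "value": 38.0, "unit": "mg/l"},
    {"feature": "Zinc", "value": 250.0, "unit": "µg/l"},
]

# Step 1: map each feature name to its alias (the desired output name)
aliases = {"Chloride": "Cl", "Calcium (ICP-AES)": "Ca", "Zinc": "Zn"}

# Step 2: convert every value to a common target unit (mg/l here)
to_mg_per_l = {"mg/l": 1.0, "µg/l": 1e-3}

result = {}
for m in measurements:
    # Fall back to the original name when no alias is defined
    alias = aliases.get(m["feature"], m["feature"])
    result[alias] = m["value"] * to_mg_per_l[m["unit"]]

print(result)
```

In WaDI itself, these steps are configured on the DataObject rather than written by hand. The user guide describes the actual methods.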

Minimal working example

This example demonstrates how to import an Excel file with stacked data. It only converts the data from ‘stacked’ to ‘wide’ format. A more elaborate version of this example is given in the user guide section.

# Import the library
In [1]: import wadi as wd

Get the folder containing the data that is used within this documentation.

In [2]: from wadi.documentation_helpers import get_data_dir

In [3]: DATA_DIRECTORY = get_data_dir()

# Create an instance of a WaDI DataObject, specify the log file name
In [4]: wdo = wd.DataObject(log_fname='minimal_usage.log', silent=True)

# Import the data. The 'c_dict' dictionary specifies the column names
# for the sample identifiers, feature names, concentrations and units.
In [5]: wdo.file_reader(DATA_DIRECTORY / 'stacked_data.xlsx',
   ...:     format='stacked',
   ...:     c_dict={'SampleId': 'Sample number',
   ...:             'Features': 'Parameter description',
   ...:             'Units': 'Unit description',
   ...:             'Values': 'Reported value',
   ...:     },
   ...: )

# Get the converted DataFrame
In [6]: df = wdo.get_converted_dataframe()

# Show the result
In [7]: df.head()
Out[7]: 
              1,2-Dichloroethane Chloride  ... Calcium         (ICP-AES) EC 20degC
                            µg/l     mg/l  ...                      mg/l      mS/m
Sample number                              ...                                    
23010701                  < 0.05    100.0  ...                       NaN       NaN
22122401                  < 0.05      NaN  ...                       NaN       NaN
22122402                     NaN     10.0  ...                      38.0      26.0

[3 rows x 6 columns]
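
For readers unfamiliar with the terms, the following sketch shows roughly what a stacked-to-wide conversion amounts to, using pandas alone. It is not WaDI's implementation. The input records are invented for illustration and only loosely modelled on the output above, and the column names are reused from the 'c_dict' in the example.

```python
import pandas as pd

# Hypothetical stacked data: one row per (sample, feature) measurement
stacked = pd.DataFrame(
    {
        "Sample number": [23010701, 23010701, 22122401, 22122402, 22122402],
        "Parameter description": [
            "1,2-Dichloroethane",
            "Chloride",
            "1,2-Dichloroethane",
            "Chloride",
            "EC 20degC",
        ],
        "Unit description": ["µg/l", "mg/l", "µg/l", "mg/l", "mS/m"],
        "Reported value": ["< 0.05", 100.0, "< 0.05", 10.0, 26.0],
    }
)

# Wide format: one row per sample, one column per (feature, unit) pair.
# Combinations that were not measured become NaN.
wide = stacked.pivot(
    index="Sample number",
    columns=["Parameter description", "Unit description"],
    values="Reported value",
)

print(wide)
```

Note that `DataFrame.pivot` raises an error when a sample contains the same feature and unit more than once, so real data may need deduplication or aggregation first.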