wadi.filereader module

class wadi.filereader.FileReader(log_fname='wadi.log', output_dir='wadi_output', silent=False, create_file=False)

Bases: WadiBaseClass

WaDI class for importing data files.

__call__(file_path, format='stacked', c_dict=None, mask=None, lod_column=None, extract_units_from_feature_name=False, pd_reader='read_excel', **kwargs)

This method provides an interface for the user to set the attributes that determine the FileReader object behavior.

Parameters:
  • file_path (str) – The file to be read.

  • format (str, optional) – Specifies if the data in the file are in ‘stacked’ or ‘wide’ format. Permissible formats are defined in VALID_FORMATS. The ‘gef’ format is not implemented (yet). Default: ‘stacked’

  • c_dict (dict, optional) – Only used when the format is ‘stacked’. This dictionary maps column names in the file to the compulsory column names defined in REQUIRED_COLUMNS_S. Default: DEFAULT_C_DICT

  • mask (str, optional) – Name of the column that contains True/False labels. These sometimes occur in stacked data files to indicate if a reported value is below or above the detection limit. If a valid column name is specified, the values marked with False are filtered out from the converted DataFrame. Only used when the format is ‘stacked’. Default: None

  • lod_column (str, optional) – Name of the column that indicates whether the reported measurement value is below or above the limit of detection (LOD). If a valid column name is specified, the symbol found in that column is prefixed to the measurement value. Only used when the format is ‘stacked’. Default: None

  • extract_units_from_feature_name (bool) – Indicates if the feature name also contains the units. Default: False

  • pd_reader (str, optional) – Name of the Pandas function to read the file. Must be a valid function name. Any Pandas reader function could be used in principle, but WaDI has only been tested with read_excel and read_csv. Default: ‘read_excel’.

  • **kwargs (dict, optional) – Dictionary with kwargs for the ‘pd_reader’ function. The kwargs can be a mix of WaDI specific keywords and valid keyword arguments for the ‘pd_reader’ function.
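The effect of mask and lod_column on stacked data can be sketched with plain Pandas. This is only an illustration of the behavior described above, not WaDI's implementation; the DataFrame and all of its column names ("Values", "keep", "lod") are hypothetical.

```python
import pandas as pd

# Hypothetical stacked data: one row per sample/feature combination.
raw = pd.DataFrame(
    {
        "SampleId": ["s1", "s1", "s2"],
        "Features": ["Cl", "Fe", "Cl"],
        "Values": [12.5, 0.1, 8.0],
        "keep": [True, True, False],  # plays the role of the mask column
        "lod": ["", "<", ""],         # plays the role of the lod_column
    }
)

# mask: rows labelled False are filtered out.
df = raw[raw["keep"]].copy()

# lod_column: the symbol is prefixed to the value, which becomes a string.
df["Values"] = df["lod"] + df["Values"].astype(str)

print(df["Values"].tolist())
```

Note that prefixing the symbol turns the numeric column into strings, so any numerical handling of the values has to happen in a later step.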

_execute()

This method imports the data from a file format readable by Pandas. Before calling the Pandas reader function, it checks the kwargs that the user specified through __call__.
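Because the kwargs can mix WaDI-specific keywords with keywords for the Pandas reader, they have to be separated at some point before the reader is called. A minimal sketch of such a split follows; the function name split_kwargs and the keyword names in WADI_KEYS are assumptions made for illustration and may not match what WaDI uses internally.

```python
# Assumed WaDI-specific keyword names; the real set may differ.
WADI_KEYS = {"units_row", "datatype"}


def split_kwargs(kwargs):
    """Separate the assumed WaDI keywords from those meant for the Pandas reader."""
    wadi_kwargs = {k: v for k, v in kwargs.items() if k in WADI_KEYS}
    pd_kwargs = {k: v for k, v in kwargs.items() if k not in WADI_KEYS}
    return wadi_kwargs, pd_kwargs


wadi_kwargs, pd_kwargs = split_kwargs(
    {"units_row": 1, "sheet_name": "data", "skiprows": [2]}
)
```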

_read_file(file_path, pd_reader_name, blocks)

This method calls the specified Pandas reader function to perform the actual data import from file_path. It imports a DataFrame with the data as well as lists with the measurement units and the datatypes (the latter two are not used when the data are in ‘stacked’ format).

Parameters:
  • file_path (str) – The file to be read.

  • pd_reader_name (str) – Name of the Pandas function to read the file.

  • blocks (list) – List with keyword arguments that specify (i) the number of the row with the units, (ii) the datatype and (iii) any kwargs for the pd_reader function. Note that (i) and (ii) do not apply to ‘stacked’ data.

Returns:

  • df (DataFrame) – Pandas DataFrame with the imported data

  • units (list) – List with the units for each column read.

  • datatypes (list) – List with the datatypes for each column read.

Raises:

ValueError – When index_col is a kwarg in one of the blocks.

Notes

The return values units and datatypes are used when the InfoTable is created for ‘wide’ format data. They are not used when the data format is ‘stacked’.
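For ‘wide’ data, the idea of returning a DataFrame together with a list of units can be sketched as below. This is a simplified stand-in that handles a single block of CSV text and leaves out the datatypes; the function read_wide and its parameter units_row are hypothetical names, and the index_col check only mirrors the documented ValueError.

```python
import io

import pandas as pd


def read_wide(text, units_row, **pd_kwargs):
    """Return (df, units) for wide-format CSV text with a separate units row."""
    if "index_col" in pd_kwargs:
        raise ValueError("index_col is not allowed in this sketch")
    # Read only the units row, without treating any row as a header.
    units = (
        pd.read_csv(io.StringIO(text), header=None, skiprows=units_row,
                    nrows=1, dtype=str)
        .iloc[0]
        .fillna("")
        .tolist()
    )
    # Read the data, skipping the units row so it does not end up in the values.
    df = pd.read_csv(io.StringIO(text), header=0, skiprows=[units_row],
                     **pd_kwargs)
    return df, units


csv_text = "SampleId,Cl,Fe\n,mg/l,ug/l\ns1,12.5,100\ns2,8.0,250\n"
df, units = read_wide(csv_text, units_row=1)
```

Reading the units in a separate pass keeps the data columns numeric, since the units strings never become part of the DataFrame.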

_read_single_row_as_list(file_path, pd_reader, pd_kwargs, row_number)

This method calls the specified Pandas reader function to read a single row from file_path.

Parameters:
  • file_path (str) – The file to be read.

  • pd_reader (str) – Name of the Pandas function to read the file.

  • pd_kwargs (dict) – Keyword arguments for the pd_reader function.

  • row_number (int) – The (zero-based) number of the row to read.

Returns:

result – List with the values read.

Return type:

list
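Reading a single row with a Pandas reader that is selected by name can be sketched as follows. This is an assumption about how such a helper might look, not the WaDI source; it relies on the header, skiprows and nrows keywords, which both read_csv and read_excel accept.

```python
import io

import pandas as pd


def read_single_row_as_list(file_path, pd_reader_name, pd_kwargs, row_number):
    """Read one zero-based row and return its values as a list of strings."""
    reader = getattr(pd, pd_reader_name)  # e.g. pd.read_csv
    # Override the keywords that control which rows are read.
    kwargs = {**pd_kwargs, "header": None, "skiprows": row_number,
              "nrows": 1, "dtype": str}
    return reader(file_path, **kwargs).iloc[0].tolist()


source = io.StringIO("Cl,Fe,EC\nmg/l,ug/l,mS/m\n12.5,100,45\n")
result = read_single_row_as_list(source, "read_csv", {}, 1)
```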