hydro.exchange#

Submodules#

Package Contents#

Classes#

ExchangeBottleFlag

Enum representing a WHP Bottle flag

ExchangeCTDFlag

Enum where members are also (and must be) ints

ExchangeSampleFlag

Enum where members are also (and must be) ints

FileType

Generic enumeration.

_ExchangeData

Dataclass containing exchange data which has been parsed into ndarrays

_ExchangeInfo

Low level dataclass containing the parts of an exchange file

CheckOptions

A simple typed namespace. At runtime it is equivalent to a plain dict.

Functions#

_bottle_get_params(params_units)

Given an ordered iterable of param, unit pairs, return the index of the column in the datafile for known WHP params.

_bottle_get_flags(params_units, whp_params)

Given an ordered iterable of param, unit pairs and WHPNames known to be in the file, return the column indices of the flags for the WHPNames.

_bottle_get_errors(params_units, whp_params)

Given an ordered iterable of param, unit pairs and WHPNames known to be in the file, return the column indices of the errors/uncertainties for the WHPNames.

_ctd_get_header(line[, dtype])

_is_all_dataarray(val)

flatten_cdom_coordinate(dataset)

Takes a dataset with a CDOM wavelength coordinate and explodes it back into individual variables

add_cdom_coordinate(dataset)

Find all the parameters in the cdom group and add their wavelength in a new coordinate

add_geometry_var(dataset)

Adds a CF-1.8 Geometry container variable to the dataset

add_profile_type(dataset, ftype)

Adds a profile_type string variable to the dataset.

finalize_ancillary_variables(dataset)

Turn the ancillary variable attr into a space separated string

combine_bottle_time(dataset)

Combine the bottle dates and times if present

check_is_subset_shape(a1, a2[, strict])

Ensure that the shape of the data in a2 is a subset (or strict subset) of the data shape of a1

check_flags(dataset[, raises])

Check WOCE flag values against their param and ensure that the param either has a value or is "nan"

_get_fill_locs(arr[, fill_values])

_extract_numeric_precisions(data)

Get the numeric precision of a printed decimal number

_is_valid_exchange_numeric(data)

_combine_dt_ndarray(date_arr[, time_arr, time_pad])

sort_ds(dataset)

Sorts the data values in the dataset

check_sorted(dataset)

Check that the dataset is sorted by the rules in sort_ds()

combine_dt(dataset[, is_coord, date_name, time_name, ...])

Combine the exchange style string variables of date and optionally time into a single variable containing real datetime objects

set_axis_attrs(dataset)

Set the CF axis attribute on our axis variables (XYZT)

set_coordinate_encoding_fill(dataset)

Sets the _FillValue encoding to None for 1D coordinate vars

_load_raw_exchange(filename_or_obj, *[, ...])

all_same(ndarr)

Test if all the values of an ndarray are the same value

read_exchange(filename_or_obj, *[, fill_values, ...])

Loads the data from filename_or_obj and returns a xr.Dataset with the CCHDO CF/netCDF structure

Attributes#

exception hydro.exchange.ExchangeDataFlagPairError(error_data)[source]#

Bases: ExchangeDataError

There is a mismatch between what the flag value expects, and the fill/data value.

Examples:

  • something with a flag of 9 has a non fill value

  • something with a flag of 2 has a fill value instead of data

Parameters:

error_data (xarray.Dataset) –

exception hydro.exchange.ExchangeDataInconsistentCoordinateError[source]#

Bases: ExchangeDataError

Error raised if the reported latitude, longitude, and date (and time) vary for a single profile.

A “profile” in an exchange file is a grouping of data rows which all have the same EXPOCODE, STNNBR, and CASTNO. The SAMPNO/CTDPRS is allowed/required to vary for a single profile and is what identifies samples within one profile.

exception hydro.exchange.ExchangeDataPartialCoordinateError[source]#

Bases: ExchangeDataError

Error raised if values for latitude, longitude, or pressure are missing.

It is OK by the standard to omit the time of day.

exception hydro.exchange.ExchangeDataPartialKeyError[source]#

Bases: ExchangeDataError

Error raised when there is no value for one (or more) of the following parameters.

  • EXPOCODE

  • STNNBR

  • CASTNO

  • SAMPNO (only for bottle files)

  • CTDPRS (only for CTD files)

These form the “composite key” which uniquely identifies a “row” of exchange data.

exception hydro.exchange.ExchangeDuplicateKeyError[source]#

Bases: ExchangeDataError

Error raised when there is a duplicate composite key in the exchange file.

This would occur if the exact values for the following parameters occur in more than one data row:

  • EXPOCODE

  • STNNBR

  • CASTNO

  • SAMPNO (only for bottle files)

  • CTDPRS (only for CTD files)
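As a rough illustration of the composite key idea, the sketch below finds duplicated keys among already parsed bottle rows. The list-of-dicts layout, the made-up EXPOCODE, and the helper name `find_duplicate_keys` are assumptions for this example and are not part of hydro.exchange, which works on ndarrays.

```python
from collections import Counter

# Hypothetical parsed bottle rows with made-up values.
rows = [
    {"EXPOCODE": "XXXX20200101", "STNNBR": "1", "CASTNO": 1, "SAMPNO": "1"},
    {"EXPOCODE": "XXXX20200101", "STNNBR": "1", "CASTNO": 1, "SAMPNO": "2"},
    {"EXPOCODE": "XXXX20200101", "STNNBR": "1", "CASTNO": 1, "SAMPNO": "2"},
]

# Fields forming the composite key for bottle files (CTD files use CTDPRS
# in place of SAMPNO).
BOTTLE_KEY = ("EXPOCODE", "STNNBR", "CASTNO", "SAMPNO")


def find_duplicate_keys(rows, key_fields=BOTTLE_KEY):
    """Return the composite keys that occur in more than one row."""
    counts = Counter(tuple(row[field] for field in key_fields) for row in rows)
    return [key for key, count in counts.items() if count > 1]


duplicates = find_duplicate_keys(rows)
```

A non-empty `duplicates` list corresponds to the situation in which the library would raise ExchangeDuplicateKeyError.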

exception hydro.exchange.ExchangeEncodingError[source]#

Bases: ExchangeError

Error raised when the bytes for some exchange file cannot be decoded as UTF-8.

exception hydro.exchange.ExchangeBOMError[source]#

Bases: ExchangeError

Error raised when the exchange file has a byte order mark.

exception hydro.exchange.ExchangeError[source]#

Bases: ValueError

This is the base exception which all the other exceptions derive from. It is a subclass of ValueError.

exception hydro.exchange.ExchangeInconsistentMergeType[source]#

Bases: ExchangeError

Error raised when the merge_ex method is called on mixed ctd and bottle exchange types

exception hydro.exchange.ExchangeMagicNumberError[source]#

Bases: ExchangeError

Error raised when the exchange file does not start with BOTTLE or CTD.
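The three file-level checks described above (byte order mark, UTF-8 decoding, and the leading BOTTLE or CTD) could be sketched as follows. The exception classes here are local stand-ins that only mirror the documented names and the ValueError base; the helper `sniff_exchange` and the order of the checks are assumptions, not the library's implementation.

```python
import codecs


class ExchangeError(ValueError):
    """Stand-in for the documented base exception."""


class ExchangeEncodingError(ExchangeError):
    pass


class ExchangeBOMError(ExchangeError):
    pass


class ExchangeMagicNumberError(ExchangeError):
    pass


def sniff_exchange(raw: bytes) -> str:
    """Hypothetical helper: return "BOTTLE" or "CTD" for raw file bytes."""
    # Reject a UTF-8 byte order mark before trying to decode.
    if raw.startswith(codecs.BOM_UTF8):
        raise ExchangeBOMError("file starts with a byte order mark")
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as err:
        raise ExchangeEncodingError("file is not valid UTF-8") from err
    # The first characters act as the "magic number" of the file.
    for magic in ("BOTTLE", "CTD"):
        if text.startswith(magic):
            return magic
    raise ExchangeMagicNumberError("file does not start with BOTTLE or CTD")
```

Because every stand-in derives from ExchangeError, a caller can catch all three failures with a single `except ExchangeError` (or `except ValueError`) clause.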

exception hydro.exchange.ExchangeDuplicateParameterError[source]#

Bases: ExchangeParameterError

Error raised when the same parameter/unit pair occurs more than once in the exchange file.

exception hydro.exchange.ExchangeParameterUnitAlignmentError[source]#

Bases: ExchangeParameterError

Error raised when there is a mismatch between the number of parameters and number of units in the exchange file.

exception hydro.exchange.ExchangeOrphanFlagError[source]#

Bases: ExchangeParameterError

Error raised when there exists a flag column with no corresponding parameter column.

exception hydro.exchange.ExchangeOrphanErrorError[source]#

Bases: ExchangeParameterError

Error raised when there exists an error column with no corresponding parameter column.

exception hydro.exchange.ExchangeParameterUndefError(error_data)[source]#

Bases: ExchangeParameterError

Error raised when the library does not have a definition for a parameter/unit pair in the exchange file.

Parameters:

error_data (List[Tuple[str, Optional[str]]]) –

exception hydro.exchange.ExchangeFlaglessParameterError[source]#

Bases: ExchangeParameterError

Error raised when a parameter has a flag column when it is not supposed to.

exception hydro.exchange.ExchangeFlagUnitError[source]#

Bases: ExchangeParameterError

Error raised if a flag column has non-empty units.

class hydro.exchange.ExchangeBottleFlag(flag)[source]#

Bases: enum.IntEnum

Enum representing a WHP Bottle flag

This flag represents information about the sampling device itself (i.e. the niskin bottle). It should only be used for “BTLNBR_FLAG_W” values and should never be used with CTD files.

property definition#

Prints the human readable flag definition

property cf_def#
property has_value#

Should the data this is a flag for contain a value

NOFLAG = 0#
NO_INFO = 1#
GOOD = 2#
LEAKING = 3#
BAD_TRIP = 4#
NOT_REPORTED = 5#
DISCREPANCY = 6#
UNKNOWN = 7#
PAIR = 8#
NOT_SAMPLED = 9#
class hydro.exchange.ExchangeCTDFlag(flag)[source]#

Bases: enum.IntEnum

Enum where members are also (and must be) ints

property definition#
property cf_def#
property has_value#
NOFLAG = 0#
UNCALIBRATED = 1#
GOOD = 2#
QUESTIONABLE = 3#
BAD = 4#
NOT_REPORTED = 5#
INTERPOLATED = 6#
DESPIKED = 7#
NOT_SAMPLED = 9#
class hydro.exchange.ExchangeSampleFlag(flag)[source]#

Bases: enum.IntEnum

Enum where members are also (and must be) ints

property definition#
property cf_def#
property has_value#
NOFLAG = 0#
MISSING = 1#
GOOD = 2#
QUESTIONABLE = 3#
BAD = 4#
NOT_REPORTED = 5#
MEAN = 6#
CHROMA_MANUAL = 7#
CHROMA_IRREGULAR = 8#
NOT_SAMPLED = 9#
hydro.exchange.CCHDO_VERSION[source]#
hydro.exchange.log[source]#
hydro.exchange.DIMS = ('N_PROF', 'N_LEVELS')[source]#
hydro.exchange.EXPOCODE[source]#
hydro.exchange.STNNBR[source]#
hydro.exchange.CASTNO[source]#
hydro.exchange.SAMPNO[source]#
hydro.exchange.DATE[source]#
hydro.exchange.TIME[source]#
hydro.exchange.LATITUDE[source]#
hydro.exchange.LONGITUDE[source]#
hydro.exchange.CTDPRS[source]#
hydro.exchange.BTLNBR[source]#
hydro.exchange.COORDS[source]#
hydro.exchange.FLAG_SCHEME[source]#
hydro.exchange.GEOMETRY_VARS = ('expocode', 'station', 'cast', 'section_id', 'time')[source]#
hydro.exchange.FILLS_MAP[source]#
class hydro.exchange.FileType[source]#

Bases: enum.Enum

Generic enumeration.

Derive from this class to define new enumerations.

CTD = 'C'[source]#
BOTTLE = 'B'[source]#
hydro.exchange.WHPNameIndex[source]#
hydro.exchange.WHPParamUnit[source]#
hydro.exchange._bottle_get_params(params_units)[source]#

Given an ordered iterable of param, unit pairs, return the index of the column in the datafile for known WHP params.

Exchange files have comma separated parameter names on one line, and the corresponding units on the next. This function will search for this name+unit pair in the built-in database of known WHP parameter names and return a mapping of WHPName to column indices.

It is currently an error for the parameter in a file to not be in the built in database.

This function will ignore uncertainty (error) columns and flag columns, those are parsed by other functions.

Warning

Convert semantically empty units (e.g. empty string, all whitespace) to None before passing into this function

Note

The parameter name database will convert unambiguous aliases to their canonical exchange parameter and unit pair.

Parameters:

params_units (tuple in the form (str, str) or (str, None)) – Paired (e.g. zip) parameter names and units

Returns:

Mapping of WHPName to column indices

Return type:

dict with keys of WHPName and values of int

Raises:

ExchangeParameterUndefError – if the parameter unit pair cannot be found in the built in database
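A minimal sketch of the lookup described above, under stated assumptions: `KNOWN_PARAMS` is a tiny hypothetical stand-in for the built-in database, plain strings replace WHPName objects, ValueError stands in for ExchangeParameterUndefError, and only flag columns are skipped (error columns are not handled here).

```python
# Hypothetical stand-in for the built-in parameter database:
# (name, unit) -> canonical name. The real library returns WHPName keys.
KNOWN_PARAMS = {
    ("EXPOCODE", None): "EXPOCODE",
    ("CTDPRS", "DBAR"): "CTDPRS",
    ("OXYGEN", "UMOL/KG"): "OXYGEN",
}
FLAG_SUFFIX = "_FLAG_W"  # suffix seen in "BTLNBR_FLAG_W"


def get_params(params_units):
    """Map known parameters to their column index, skipping flag columns."""
    found, unknown = {}, []
    for index, (name, unit) in enumerate(params_units):
        if name.endswith(FLAG_SUFFIX):
            continue  # flag columns are parsed by a separate function
        try:
            found[KNOWN_PARAMS[(name, unit)]] = index
        except KeyError:
            unknown.append((name, unit))
    if unknown:
        # The library raises ExchangeParameterUndefError in this situation.
        raise ValueError(f"undefined parameter/unit pairs: {unknown}")
    return found


names = ["EXPOCODE", "CTDPRS", "OXYGEN", "OXYGEN_FLAG_W"]
units = [None, "DBAR", "UMOL/KG", None]
columns = get_params(zip(names, units))
```

Note that the empty unit of EXPOCODE is passed as None, in line with the warning above about semantically empty units.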

hydro.exchange._bottle_get_flags(params_units, whp_params)[source]#

Given an ordered iterable of param, unit pairs and WHPNames known to be in the file, return the column indices of the flags for the WHPNames.

Exchange files can have status flags for some of the parameters. Flag columns must have no units. Some parameters must not have status flags, these include the spatiotemporal parameters (e.g. lat, lon, but also pressure) and the sample identifying parameters (expocode, station, cast, sample, but not bottle id).
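The rules in the paragraph above might be sketched like this. The `_FLAG_W` suffix is taken from the "BTLNBR_FLAG_W" example elsewhere on this page; the `FLAGLESS` set, the string keys, and the use of ValueError in place of the specific exchange exceptions are assumptions made for illustration.

```python
FLAG_SUFFIX = "_FLAG_W"  # suffix seen in "BTLNBR_FLAG_W"

# Hypothetical set of parameters that must not carry a status flag.
FLAGLESS = {
    "EXPOCODE", "STNNBR", "CASTNO", "SAMPNO",
    "LATITUDE", "LONGITUDE", "CTDPRS",
}


def get_flags(params_units, known_params):
    """Map each flagged parameter name to the column index of its flag."""
    flags = {}
    for index, (name, unit) in enumerate(params_units):
        if not name.endswith(FLAG_SUFFIX):
            continue
        if unit is not None:
            # Documented as ExchangeFlagUnitError in the library.
            raise ValueError(f"flag column {name} has units {unit!r}")
        param = name[: -len(FLAG_SUFFIX)]
        if param not in known_params:
            # Documented as ExchangeOrphanFlagError in the library.
            raise ValueError(f"flag column {name} has no parameter column")
        if param in FLAGLESS:
            # Documented as ExchangeFlaglessParameterError in the library.
            raise ValueError(f"{param} must not have a flag column")
        flags[param] = index
    return flags


params_units = [("CTDPRS", "DBAR"), ("OXYGEN", "UMOL/KG"), ("OXYGEN_FLAG_W", None)]
flag_columns = get_flags(params_units, {"CTDPRS": 0, "OXYGEN": 1})
```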

Parameters:
  • params_units (tuple in the form (str, str) or (str, None)) – Paired (e.g. zip) parameter names and units

  • whp_params (Mapping of WHPName to int) – Mapping of parameters known to be in the file, this is the output of _bottle_get_params()

Returns:

Mapping of WHPName to column indices for the status flag column

Return type:

dict with keys of WHPName and values of int

Raises:
hydro.exchange._bottle_get_errors(params_units, whp_params)[source]#

Given an ordered iterable of param, unit pairs and WHPNames known to be in the file, return the column indices of the errors/uncertainties for the WHPNames.

Some parameters may have uncertainties associated with them. This function finds those columns and pairs them with the correct parameter.

Note

There is no programmatic way to find the error columns for a given unit (e.g. no common suffix like the flags). This must be done via lookup in the built-in database of params.

Parameters:
  • params_units (tuple in the form (str, str) or (str, None)) – Paired (e.g. zip()) parameter names and units

  • whp_params (Mapping of WHPName to int) – Mapping of parameters known to be in the file, this is the output of _bottle_get_params()

Returns:

Mapping of WHPName to column indices for the error column

Return type:

dict with keys of WHPName and values of int

hydro.exchange._ctd_get_header(line, dtype=str)[source]#
hydro.exchange._is_all_dataarray(val)[source]#
Parameters:

val (List[Any]) –

Return type:

typing_extensions.TypeGuard[List[xarray.DataArray]]

hydro.exchange.flatten_cdom_coordinate(dataset)[source]#

Takes a dataset with a CDOM wavelength coordinate and explodes it back into individual variables

Parameters:

dataset (xarray.Dataset) –

Return type:

xarray.Dataset

hydro.exchange.add_cdom_coordinate(dataset)[source]#

Find all the parameters in the cdom group and add their wavelength in a new coordinate

Parameters:

dataset (xarray.Dataset) –

Return type:

xarray.Dataset

hydro.exchange.add_geometry_var(dataset)[source]#

Adds a CF-1.8 Geometry container variable to the dataset

This allows for compatibility with tools like GDAL

Parameters:

dataset (xarray.Dataset) –

Return type:

xarray.Dataset

hydro.exchange.add_profile_type(dataset, ftype)[source]#

Adds a profile_type string variable to the dataset.

This is for ODV compatibility

Warning

Currently mixed profile types are not supported

Parameters:
Return type:

xarray.Dataset

hydro.exchange.finalize_ancillary_variables(dataset)[source]#

Turn the ancillary variable attr into a space separated string

It is nice to have the ancillary variable be a list while things are being read into it

Parameters:

dataset (xarray.Dataset) –

hydro.exchange.combine_bottle_time(dataset)[source]#

Combine the bottle dates and times if present

Raises if only one is present

Parameters:

dataset (xarray.Dataset) –

hydro.exchange.check_is_subset_shape(a1, a2, strict='disallowed')[source]#

Ensure that the shape of the data in a2 is a subset (or strict subset) of the data shape of a1

For a given set of param, flag, and error arrays you would want to ensure that:

  • errors are a subset of params (strict is allowed)

  • params are a subset of flags (strict is allowed)

For string vars, the empty string is considered the “nothing” value. For WOCE flags, flag 9s should be converted to nans (depending on the scheme, flags 5 and 1 may not have param values).

Return a boolean array of invalid locations

Parameters:
  • a1 (numpy.typing.NDArray) –

  • a2 (numpy.typing.NDArray) –

Return type:

numpy.typing.NDArray[numpy.bool_]
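One possible reading of the subset rule is sketched below with plain lists: a location is invalid when a2 holds a value but a1 does not. `None` stands in for the “nothing” value, the helper name `invalid_locations` is hypothetical, and only the non-strict case is covered; the behavior of the `strict` argument is not modeled.

```python
def invalid_locations(a1, a2):
    """Return True wherever a2 has a value but a1 does not.

    `None` represents the "nothing" value (nan or the empty string in the
    real arrays). This covers only the non-strict subset check.
    """
    if len(a1) != len(a2):
        raise ValueError("a1 and a2 must have the same length")
    return [
        second is not None and first is None
        for first, second in zip(a1, a2)
    ]


# Errors should be a subset of params: the last error has no param value.
params = [34.5, None, 35.1, None]
errors = [0.01, None, None, 0.02]
invalid = invalid_locations(params, errors)
```

The returned list plays the role of the boolean array of invalid locations mentioned above.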

hydro.exchange.check_flags(dataset, raises=True)[source]#

Check WOCE flag values against their param and ensure that the param either has a value or is “nan” depending on the flag definition.

Return a boolean array of invalid locations?

Parameters:

dataset (xarray.Dataset) –
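The two mismatch examples given for ExchangeDataFlagPairError (a flag of 9 with a value, a flag of 2 with a fill) can be reproduced with a small sketch. It assumes that only flag 9 forbids a value, which is a simplification, since the flag enums expose a per-flag has_value property; `flag_mismatches` is a hypothetical name.

```python
import math

# Assumption for illustration: only flag 9 (NOT_SAMPLED) requires a fill value.
FLAGS_WITHOUT_VALUE = {9}


def flag_mismatches(values, flags):
    """Return True where the value/fill state disagrees with the flag."""
    result = []
    for value, flag in zip(values, flags):
        is_fill = math.isnan(value)
        expects_value = flag not in FLAGS_WITHOUT_VALUE
        # A mismatch is a fill where a value is expected, or the reverse.
        result.append(is_fill == expects_value)
    return result


nan = float("nan")
values = [34.5, nan, nan, 2.0]
flags = [2, 9, 2, 9]
mismatches = flag_mismatches(values, flags)
```

With `raises=True`, any True entry in such a result would presumably lead to an exception rather than a returned array.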

class hydro.exchange._ExchangeData[source]#

Dataclass containing exchange data which has been parsed into ndarrays

single_profile: bool[source]#
param_cols: Dict[cchdo.params.WHPName, numpy.ndarray][source]#
flag_cols: Dict[cchdo.params.WHPName, numpy.ndarray][source]#
error_cols: Dict[cchdo.params.WHPName, numpy.ndarray][source]#
param_precisions: Dict[cchdo.params.WHPName, numpy.typing.NDArray[numpy.int_]][source]#
error_precisions: Dict[cchdo.params.WHPName, numpy.typing.NDArray[numpy.int_]][source]#
comments: str[source]#
__post_init__()[source]#
set_expected(params, flags, errors)[source]#

Puts fill columns for expected params which are missing

This can occur when there are disjoint columns in CTD files

Parameters:
  • params (Set[cchdo.params.WHPName]) –

  • flags (Set[cchdo.params.WHPName]) –

  • errors (Set[cchdo.params.WHPName]) –

split_profiles()[source]#

Split into _ExchangeData instances that each contain a single profile

Done by looking at the expocode+station+cast composite keys
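The grouping idea can be sketched with plain tuples. The row layout, the made-up values, and the helper name `split_rows` are assumptions; the real method operates on the ndarray columns of the dataclass.

```python
from collections import defaultdict

# Hypothetical rows: (expocode, station, cast, pressure), made-up values.
rows = [
    ("XXXX20200101", "1", 1, 10.0),
    ("XXXX20200101", "1", 1, 20.0),
    ("XXXX20200101", "2", 1, 10.0),
    ("XXXX20200101", "1", 2, 5.0),
]


def split_rows(rows):
    """Group pressures by their expocode+station+cast key."""
    profiles = defaultdict(list)
    for expocode, station, cast, pressure in rows:
        profiles[(expocode, station, cast)].append(pressure)
    return dict(profiles)


profiles = split_rows(rows)
```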

str_lens()[source]#

Figure out the length of all the string params

The char size can vary by platform.

Return type:

Dict[cchdo.params.WHPName, int]

hydro.exchange._get_fill_locs(arr, fill_values=('-999',))[source]#
Parameters:

fill_values (Tuple[str, Ellipsis]) –

class hydro.exchange._ExchangeInfo[source]#

Low level dataclass containing the parts of an exchange file

property stamp[source]#

Returns the filestamp of the exchange file

e.g. “BOTTLE,20210301CCHSIOAMB”

property comments[source]#

Returns the comments of the exchange file with leading # stripped

property ctd_headers[source]#

Returns a dict of the CTD headers and their value

property data[source]#

Returns the data block of an exchange file as a tuple of strs. One line per entry.

property post_data[source]#

Returns any post data content as a tuple of strs

property _np_data_block[source]#
stamp_slice: slice[source]#
comments_slice: slice[source]#
ctd_headers_slice: slice[source]#
params_idx: int[source]#
units_idx: int[source]#
data_slice: slice[source]#
post_data_slice: slice[source]#
_raw_lines: Tuple[str, Ellipsis][source]#
params()[source]#

Returns a list of all parameters in the file (including CTD “headers”)

units()[source]#

Returns a list of all the units in the file (including CTD “headers”)

Will have the same shape as params

whp_params()[source]#

Parses the params and units for base parameters

Returns a dict with a WHPName to column index mapping

whp_flags()[source]#

Parses the params and units for flag values

Returns a dict with a WHPName to column index of flags mapping

whp_errors()[source]#

Parses the params and units for uncertainty values

Returns a dict with a WHPName to column index of errors mapping

finalize(fill_values=('-999',), precision_source='file')[source]#

Parse all the data into ndarrays of the correct dtype and shape

Returns an _ExchangeData dataclass

Return type:

_ExchangeData

classmethod from_lines(lines, ftype)[source]#

Figure out the line numbers/indices of the parts of the exchange file

Parameters:
  • lines (Tuple[str, Ellipsis]) –

  • ftype (FileType) –
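To make the slice and index attributes above more concrete, here is a rough sketch for a bottle file only. It assumes the usual exchange layout (stamp line, `#` comment lines, a parameter line, a unit line, data rows, then an `END_DATA` line); that layout and the helper `split_bottle_lines` are assumptions not spelled out on this page, and CTD headers are not handled.

```python
def split_bottle_lines(lines):
    """Hypothetical helper: locate the parts of a bottle exchange file."""
    stamp = slice(0, 1)
    # Comment lines directly follow the stamp and start with "#".
    index = 1
    while index < len(lines) and lines[index].startswith("#"):
        index += 1
    comments = slice(1, index)
    params_idx, units_idx = index, index + 1
    # Assumed terminator of the data block.
    end = lines.index("END_DATA")
    data = slice(units_idx + 1, end)
    post_data = slice(end + 1, None)
    return stamp, comments, params_idx, units_idx, data, post_data


lines = (
    "BOTTLE,20210301CCHSIOAMB",
    "# example comment",
    "EXPOCODE,STNNBR",
    ",",
    "XXXX20200101,1",
    "END_DATA",
    "trailing text",
)
stamp, comments, params_idx, units_idx, data, post_data = split_bottle_lines(lines)
```

Storing slices and indices rather than copies of the lines keeps the raw lines as the single source of truth, which matches the `*_slice` and `*_idx` attributes listed above.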

hydro.exchange._extract_numeric_precisions(data)[source]#

Get the numeric precision of a printed decimal number

Parameters:

data (numpy.typing.NDArray[numpy.str_]) –

Return type:

numpy.typing.NDArray[numpy.int_]
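As a scalar illustration of what “numeric precision of a printed decimal number” could mean, the sketch below counts the digits after the decimal point of one string. The helper `numeric_precision` is hypothetical; the real function works on whole string ndarrays and may treat edge cases differently.

```python
def numeric_precision(text: str) -> int:
    """Count the digits printed after the decimal point (0 if there is none)."""
    _, dot, fraction = text.strip().partition(".")
    return len(fraction) if dot else 0


precisions = [numeric_precision(value) for value in ["34.500", "-999", "2.1", "10."]]
```

Keeping the printed precision lets a writer reproduce trailing zeros such as `34.500`, which a float alone would not preserve.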

hydro.exchange._is_valid_exchange_numeric(data)[source]#
Parameters:

data (numpy.typing.NDArray[numpy.str_]) –

Return type:

numpy.bool_

hydro.exchange.ExchangeIO[source]#
hydro.exchange._combine_dt_ndarray(date_arr, time_arr=None, time_pad=False)[source]#
Parameters:
  • date_arr (numpy.typing.NDArray[numpy.str_]) –

  • time_arr (Optional[numpy.typing.NDArray[numpy.str_]]) –

Return type:

numpy.ndarray

hydro.exchange.sort_ds(dataset)[source]#

Sorts the data values in the dataset

Ensures that profiles are in the following order:

  • Earlier before later (time will increase)

  • Southerly before northerly (latitude will increase)

  • Westerly before easterly (longitude will increase)

The two xy sorts are essentially tie breakers for when we are missing “time”

Inside profiles:

  • Shallower before deeper (pressure will increase)

Parameters:

dataset (xarray.Dataset) –

Return type:

xarray.Dataset
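The ordering rules can be sketched with a tuple sort key. The list-of-dicts profile layout and the helper names are assumptions, each profile carries only a pressure list here (the real dataset reorders every variable along the level dimension), and missing times are not handled.

```python
from datetime import datetime

# Hypothetical profiles with made-up values.
profiles = [
    {"time": datetime(2020, 1, 2), "latitude": -10.0, "longitude": 150.0,
     "pressure": [50.0, 5.0]},
    {"time": datetime(2020, 1, 1), "latitude": -12.0, "longitude": 150.0,
     "pressure": [100.0, 10.0, 20.0]},
]


def sort_profiles(profiles):
    """Order profiles by time, latitude, longitude; pressures ascending."""
    ordered = sorted(
        profiles,
        key=lambda p: (p["time"], p["latitude"], p["longitude"]),
    )
    return [{**p, "pressure": sorted(p["pressure"])} for p in ordered]


def is_sorted(profiles):
    """Check that sorting would not change anything."""
    return profiles == sort_profiles(profiles)


result = sort_profiles(profiles)
```

Tuples compare element by element, so latitude and longitude only matter when the preceding elements are equal.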

hydro.exchange.check_sorted(dataset)[source]#

Check that the dataset is sorted by the rules in sort_ds()

Parameters:

dataset (xarray.Dataset) –

Return type:

bool

hydro.exchange.WHPNameAttr[source]#
hydro.exchange.combine_dt(dataset, is_coord=True, date_name=DATE, time_name=TIME, time_pad=False)[source]#

Combine the exchange style string variables of date and optionally time into a single variable containing real datetime objects

This will remove the time variable if present, then replace and rename the date variable. The date variable is replaced and renamed to maintain variable order in the xr.Dataset

Parameters:
  • dataset (xarray.Dataset) –

  • is_coord (bool) –

  • date_name (cchdo.params.WHPName) –

  • time_name (cchdo.params.WHPName) –

Return type:

xarray.Dataset
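For a single value, the combination might look like the sketch below. It assumes dates are written as YYYYMMDD and times as HHMM, and that `time_pad` means left-padding short times with zeros; these are assumptions for illustration, and the helper `combine` is hypothetical.

```python
from datetime import datetime


def combine(date, time=None, time_pad=False):
    """Parse "YYYYMMDD" and optional "HHMM" strings into a datetime."""
    if time is None:
        # Time of day may be omitted; fall back to midnight in this sketch.
        return datetime.strptime(date, "%Y%m%d")
    if time_pad:
        time = time.zfill(4)  # assumption: pad "930" to "0930"
    return datetime.strptime(date + time, "%Y%m%d%H%M")


date_only = combine("20200101")
with_time = combine("20200101", "0930")
padded = combine("20200101", "930", time_pad=True)
```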

hydro.exchange.set_axis_attrs(dataset)[source]#

Set the CF axis attribute on our axis variables (XYZT)

  • longitude = “X”

  • latitude = “Y”

  • pressure = “Z”, additionally, positive is down

  • time = “T”

Parameters:

dataset (xarray.Dataset) –

Return type:

xarray.Dataset
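The mapping listed above can be written down as a small table. In this sketch a dict of per-variable attribute dicts stands in for the dataset, the variable names are assumed to match the list, and `with_axis_attrs` is a hypothetical helper.

```python
AXIS_ATTRS = {
    "longitude": {"axis": "X"},
    "latitude": {"axis": "Y"},
    "pressure": {"axis": "Z", "positive": "down"},
    "time": {"axis": "T"},
}


def with_axis_attrs(attrs_by_var):
    """Return a copy of the per-variable attrs with axis attributes added."""
    updated = {name: dict(attrs) for name, attrs in attrs_by_var.items()}
    for name, axis_attrs in AXIS_ATTRS.items():
        if name in updated:
            updated[name].update(axis_attrs)
    return updated


attrs = with_axis_attrs({"pressure": {"units": "dbar"}, "salinity": {}})
```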

hydro.exchange.set_coordinate_encoding_fill(dataset)[source]#

Sets the _FillValue encoding to None for 1D coordinate vars

Parameters:

dataset (xarray.Dataset) –

Return type:

xarray.Dataset

hydro.exchange._load_raw_exchange(filename_or_obj, *, file_seperator=None, keep_seperator=True)[source]#
Parameters:
  • filename_or_obj (ExchangeIO) –

  • file_seperator (Optional[str]) –

Return type:

List[str]

hydro.exchange.all_same(ndarr)[source]#

Test if all the values of an ndarray are the same value

Parameters:

ndarr (numpy.ndarray) –

Return type:

numpy.bool_

class hydro.exchange.CheckOptions[source]#

Bases: TypedDict

A simple typed namespace. At runtime it is equivalent to a plain dict.

TypedDict creates a dictionary type that expects all of its instances to have a certain set of keys, where each key is associated with a value of a consistent type. This expectation is not checked at runtime but is only enforced by type checkers. Usage:

class Point2D(TypedDict):
    x: int
    y: int
    label: str

a: Point2D = {'x': 1, 'y': 2, 'label': 'good'}  # OK
b: Point2D = {'z': 3, 'label': 'bad'}           # Fails type check

assert Point2D(x=1, y=2, label='first') == dict(x=1, y=2, label='first')

The type info can be accessed via Point2D.__annotations__. TypedDict supports two additional equivalent forms:

Point2D = TypedDict('Point2D', x=int, y=int, label=str)
Point2D = TypedDict('Point2D', {'x': int, 'y': int, 'label': str})

By default, all keys must be present in a TypedDict. It is possible to override this by specifying totality. Usage:

class point2D(TypedDict, total=False):
    x: int
    y: int

This means that a point2D TypedDict can have any of the keys omitted. A type checker is only expected to support a literal False or True as the value of the total argument. True is the default, and makes all items defined in the class body be required.

The class syntax is only supported in Python 3.6+, while two other syntax forms work for Python 2.7 and 3.2+

flags: bool[source]#
hydro.exchange.read_exchange(filename_or_obj, *, fill_values=('-999',), checks=None, precision_source='file', file_seperator=None, keep_seperator=True)[source]#

Loads the data from filename_or_obj and returns a xr.Dataset with the CCHDO CF/netCDF structure

Parameters:
  • filename_or_obj (ExchangeIO) –

  • checks (Optional[CheckOptions]) –

Return type:

xarray.Dataset