add/remove param S04P merge example

Demonstrates a merge on S04P CTD data. This notebook was made to test the add/remove param functions on a real dataset.

[1]:
from zipfile import ZipFile
from collections import defaultdict

import xarray as xr
import requests

from cchdo.hydro.core import add_param, remove_param
import cchdo.hydro.accessors

The netCDF4 Python library really wants files on disk to read. There are alternatives and ways around this, but it’s easiest to just write the downloads out.

[2]:
ctd_dl = requests.get("https://cchdo.ucsd.edu/data/41655/320620180309_ctd.nc")
with open("320620180309_ctd.nc", "wb") as f:
    f.write(ctd_dl.content)
beamcp = requests.get("https://cchdo.ucsd.edu/data/14754/2018_S04P.zip")
with open("2018_S04P.zip", "wb") as f:
    f.write(beamcp.content)
[3]:
# Load the data, note that I did some exploring of the beamcp input before finalizing this for the "load step"
ctd_data = xr.load_dataset("320620180309_ctd.nc")
with ZipFile("2018_S04P.zip") as zf:
    beamcp = zf.read("2018_S04P.txt").decode("ascii").splitlines()
    beam_cp_cells = [line.split() for line in beamcp]
[4]:
# split the profiles into... profiles
profiles = defaultdict(list)
last_profile = None
for line in beam_cp_cells:
    if len(line) > 2:
        *_, station, cast = line
        last_profile = (station, cast)
        continue
    if last_profile is None:
        continue
    profiles[last_profile].append(line)
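
To make the splitting logic above easier to follow, here is a minimal sketch of the same loop run on a tiny made-up input. The sample layout (header rows with more than two cells ending in station and cast, followed by two-cell data rows) is an assumption inferred from the loop, not a copy of the real 2018_S04P.txt, and `split_profiles` is a hypothetical wrapper name.

```python
from collections import defaultdict

# Made-up sample text; the real file layout is only assumed to look like this.
sample_text = """\
units line
Station Cast 001 1
2.0 0.0123
4.0 0.0125
Station Cast 002 1
2.0 -8888
"""


def split_profiles(cells):
    """Group two-cell data rows under the most recent (station, cast) header."""
    profiles = defaultdict(list)
    last_profile = None
    for line in cells:
        if len(line) > 2:
            # Header row: the last two cells are the station and cast
            *_, station, cast = line
            last_profile = (station, cast)
            continue
        if last_profile is None:
            # Anything before the first header is ignored
            continue
        profiles[last_profile].append(line)
    return profiles


sample_cells = [line.split() for line in sample_text.splitlines()]
sample_profiles = split_profiles(sample_cells)
```

Note that station and cast stay as strings at this point, leading zeros included; they are only converted or cleaned up in later steps.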

The remove and add param functions are what was being tested here. The incoming CTDBEAMCP data replaces the raw CTDXMISS data.

[5]:
removed_raw_beamcp = remove_param(
    ctd_data, "CTDXMISS [VOLTS]", delete_param=True, require_empty=False
)
new_ctd = add_param(removed_raw_beamcp, "CTDBEAMCP [/METER]")

Make the merge_fq structure for merging. Note that the incoming values are kept as strings so that the extract precision functions can update the print format in the netCDF file.

[6]:
fq_json = []
expocode = new_ctd.expocode[0].item()
for (station, cast), profile in profiles.items():
    cast = int(cast)
    for row in profile:
        if row[1].startswith("-88"):
            continue
        # in the recalibrated ODF file, the last pressure level of station 119 was dropped
        if station == "119" and row[0] == "3210.0":
            continue
        fq_json.append(
            {
                "EXPOCODE": expocode,
                "STNNBR": station,
                "CASTNO": cast,
                "SAMPNO": row[0],
                "CTDBEAMCP [/METER]": row[1],
            }
        )
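
As a rough illustration of why the values stay as strings, the sketch below counts decimal places in the text form and compares that with what survives a round trip through float. The `decimal_places` helper is hypothetical and is not the library's precision extraction function; it also ignores exponent notation.

```python
def decimal_places(text: str) -> int:
    """Count the digits after the decimal point in a plain numeric string."""
    _, _, fraction = text.strip().partition(".")
    return len(fraction)


raw_value = "0.0100"  # made-up value as it might appear in the text file

# The string form still carries the reported precision
places_from_text = decimal_places(raw_value)

# Converting to float first discards the trailing zeros
places_after_float = decimal_places(repr(float(raw_value)))
```

Here the text form reports four decimal places, while the float round trip leaves only two, so a print format derived from the float would under-report the precision of the incoming data.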

This was not quite planned, but the station IDs need to be fixed to remove leading zeros. This was also done in the original merge by see in 2019.

[7]:
import numpy as np

new_ctd["station"][:] = np.strings.lstrip(new_ctd.station.values.astype(np.str_), "0")
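
For reference, the same stripping can be sketched in plain Python with `str.lstrip`, which removes only leading characters. The station values below are made up. One edge case worth knowing about is a station made entirely of zeros, which strips to an empty string; the guarded variant is an assumption about how one might handle that, not something this merge needed.

```python
# Made-up station labels for illustration
stations = ["001", "119", "020", "0"]

# Plain stripping: interior and trailing zeros are kept, but "0" becomes ""
stripped = [station.lstrip("0") for station in stations]

# Guarded variant: fall back to "0" when stripping removes everything
stripped_guarded = [station.lstrip("0") or "0" for station in stations]
```

The `np.strings` namespace used in the cell above requires NumPy 2.0 or later.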

Do the actual merge. This took about 40 seconds on an M1, which is kind of long…

[8]:
%%time
merged_ctd = new_ctd.cchdo.merge_fq(fq_json)
CPU times: user 2 s, sys: 45.6 ms, total: 2.05 s
Wall time: 2.07 s
[9]:
merged_ctd.attrs["comments"] = (
    f"Remerged CTDBEAMCP data into ODF resubmission\n{merged_ctd.comments}"
)
merged_ctd.attrs["cchdo_software_version"] = "hydro 1.0.2.9"

Write some output files to examine and share with colleagues

[10]:
merged_ctd.to_netcdf("s04p_merged_ctd.nc")
merged_ctd.cchdo.to_exchange("s04p_merged_ct1.zip")