add/remove param S04P merge example
This notebook demonstrates a merge on S04P CTD data. It was made to test the add_param and remove_param functions on a real dataset.
[1]:
from zipfile import ZipFile
from collections import defaultdict
import xarray as xr
import requests
from cchdo.hydro.core import add_param, remove_param
import cchdo.hydro.accessors
The netCDF4 Python library really wants to read from files on disk. There are alternatives and ways around this, but it is easy to just write the downloads out.
[2]:
ctd_dl = requests.get("https://cchdo.ucsd.edu/data/41655/320620180309_ctd.nc")
with open("320620180309_ctd.nc", "wb") as f:
    f.write(ctd_dl.content)
beamcp = requests.get("https://cchdo.ucsd.edu/data/14754/2018_S04P.zip")
with open("2018_S04P.zip", "wb") as f:
    f.write(beamcp.content)
[3]:
# Load the data, note that I did some exploring of the beamcp input before finalizing this for the "load step"
ctd_data = xr.load_dataset("320620180309_ctd.nc")
with ZipFile("2018_S04P.zip") as zf:
    beamcp = zf.read("2018_S04P.txt").decode("ascii").splitlines()
beam_cp_cells = [line.split() for line in beamcp]
[4]:
# split the profiles into... profiles
profiles = defaultdict(list)
last_profile = None
for line in beam_cp_cells:
    if len(line) > 2:
        *_, station, cast = line
        last_profile = (station, cast)
        continue
    if last_profile is None:
        continue
    profiles[last_profile].append(line)
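The grouping logic above can be tried on a few made-up rows. This is a minimal sketch: the sample rows are invented for illustration and only mimic the layout the loop assumes (header rows with more than two fields that end in station and cast, followed by two-field data rows). They are not taken from 2018_S04P.txt, and `split_profiles` is a hypothetical wrapper name.

```python
from collections import defaultdict

# Invented sample rows, not real S04P data: a header row has more than two
# fields and ends with the station and cast; data rows have two fields.
sample_rows = [
    ["1.0", "0.1"],  # data row before any header: skipped
    ["some", "header", "001", "1"],
    ["2.0", "0.123"],
    ["4.0", "0.125"],
    ["another", "header", "002", "1"],
    ["2.0", "0.200"],
]


def split_profiles(rows):
    """Group data rows under the most recent (station, cast) header."""
    profiles = defaultdict(list)
    last_profile = None
    for row in rows:
        if len(row) > 2:
            # Star-unpacking keeps only the final two fields of the header.
            *_, station, cast = row
            last_profile = (station, cast)
            continue
        if last_profile is None:
            # No header seen yet, so this row has no profile to belong to.
            continue
        profiles[last_profile].append(row)
    return dict(profiles)


grouped = split_profiles(sample_rows)
print(grouped)
```

Note that station and cast stay as strings at this point, including any leading zeros in the station id.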
The remove_param and add_param functions are what was being tested here. The incoming CTDBEAMCP data replaces the raw CTDXMISS data.
[5]:
removed_raw_beamcp = remove_param(
    ctd_data, "CTDXMISS [VOLTS]", delete_param=True, require_empty=False
)
new_ctd = add_param(removed_raw_beamcp, "CTDBEAMCP [/METER]")
Make the merge_fq structure for merging. Note that the incoming values are kept as strings so that the extract precision functions can update the print format in the netCDF file.
[6]:
fq_json = []
expocode = new_ctd.expocode[0].item()
for (station, cast), profile in profiles.items():
    cast = int(cast)
    for row in profile:
        if row[1].startswith("-88"):
            continue
        # in the recalibrated ODF file, the last pressure level of station 119 was dropped
        if station == "119" and row[0] == "3210.0":
            continue
        fq_json.append(
            {
                "EXPOCODE": expocode,
                "STNNBR": station,
                "CASTNO": cast,
                "SAMPNO": row[0],
                "CTDBEAMCP [/METER]": row[1],
            }
        )
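The record-building loop above can be sketched on invented data to show what the resulting list looks like and why the values stay as strings. Everything here is an assumption made for illustration: the profiles, the placeholder expocode, and the helpers `build_records` and `decimal_places` are hypothetical. In particular, `decimal_places` is only a stand-in for the idea of reading precision from text, not the cchdo.hydro implementation.

```python
# Invented profiles keyed by (station, cast); each row is a
# [pressure, beam cp] pair of strings. None of this is real S04P data.
profiles = {
    ("1", "1"): [["2.0", "0.1230"], ["4.0", "-888.0"]],
    ("2", "1"): [["2.0", "0.2000"]],
}
expocode = "EXAMPLE_EXPOCODE"  # placeholder, not a real expocode


def build_records(profiles, expocode, param="CTDBEAMCP [/METER]"):
    """Flatten the profiles into one dict per sample, keeping values as text."""
    records = []
    for (station, cast), rows in profiles.items():
        for pressure, value in rows:
            # Same skip rule as the loop above: drop values starting with -88.
            if value.startswith("-88"):
                continue
            records.append(
                {
                    "EXPOCODE": expocode,
                    "STNNBR": station,
                    "CASTNO": int(cast),
                    "SAMPNO": pressure,
                    param: value,
                }
            )
    return records


def decimal_places(text):
    """Hypothetical helper: count the digits after the decimal point."""
    _, _, fraction = text.partition(".")
    return len(fraction)


records = build_records(profiles, expocode)
as_text = records[0]["CTDBEAMCP [/METER]"]
round_tripped = str(float(as_text))
print(as_text, decimal_places(as_text))
print(round_tripped, decimal_places(round_tripped))
```

Converting the text to a float and back drops the trailing zero, so the number of reported decimal places can only be recovered from the original string.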
This was not quite planned, but the station ids need to be fixed to remove leading zeros. This was done in the original merge by see in 2019.
[7]:
import numpy as np
new_ctd["station"][:] = np.strings.lstrip(new_ctd.station.values.astype(np.str_), "0")
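The stripping step above can be illustrated with plain Python strings. This sketch assumes `np.strings.lstrip` applies the same rule as `str.lstrip` to each element; the station ids are invented, and `strip_leading_zeros` is a hypothetical helper that guards against an edge case the plain call does not handle.

```python
# Invented station ids, not values from the S04P file.
stations = ["001", "010", "119", "0"]

# Plain lstrip removes every leading "0", so an all-zero id becomes empty.
stripped = [s.lstrip("0") for s in stations]
print(stripped)


def strip_leading_zeros(station):
    """Strip leading zeros but keep a single "0" for an all-zero id."""
    return station.lstrip("0") or "0"


safe = [strip_leading_zeros(s) for s in stations]
print(safe)
```

Only leading zeros are removed; zeros inside or at the end of an id are left alone. Whether an all-zero station id can occur in this dataset is not something the notebook states, so the guard is optional.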
Do the actual merge. This took about 40 seconds on an M1, which is rather long, although the run recorded below finished in about 2 seconds.
[8]:
%%time
merged_ctd = new_ctd.cchdo.merge_fq(fq_json)
CPU times: user 2 s, sys: 45.6 ms, total: 2.05 s
Wall time: 2.07 s
[9]:
merged_ctd.attrs["comments"] = (
    f"Remerged CTDBEAMCP data into ODF resubmission\n{merged_ctd.comments}"
)
merged_ctd.attrs["cchdo_software_version"] = "hydro 1.0.2.9"
Write some output files to examine and share with colleagues
[10]:
merged_ctd.to_netcdf("s04p_merged_ctd.nc")
merged_ctd.cchdo.to_exchange("s04p_merged_ct1.zip")