Select and Interpolate Time Series

This process selects time series with sufficient data and returns continuous time series in which missing values (NAs) are replaced by linear interpolation (R function zoo::na.approx). The result is a complete time series without gaps. Time series with insufficient data are removed. The data are split into separate sub-tables for each season and HELCOM_ID, and a list is created to store the sub-tables of transparency (R function 'ts_selection_interpolation').
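To illustrate what replacing missing values by linear interpolation means, here is a minimal Python sketch. It is not the R implementation behind this process (which relies on zoo::na.approx). The function name `interpolate_linear` is hypothetical, the sketch assumes equally spaced time steps, and it leaves gaps at the start or end of a series untouched, which is a choice made for this sketch only.

```python
from typing import List, Optional


def interpolate_linear(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill interior gaps (None) by linear interpolation.

    Assumes equally spaced time steps. Leading and trailing gaps are
    left as None (an assumption of this sketch).
    """
    result = list(values)
    # Positions of the observed (non-missing) values.
    known = [i for i, v in enumerate(values) if v is not None]
    # Walk over consecutive pairs of observed points and fill the gap between them.
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            result[i] = values[left] + step * (i - left)
    return result


series = [2.0, None, None, 5.0, None, 7.0, None]
filled = interpolate_linear(series)
print(filled)  # [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, None]
```

Each gap is filled along the straight line between its nearest observed neighbours, so the two missing values between 2.0 and 5.0 become 3.0 and 4.0.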

daugava use case AquaINFRA ts_selection_interpolation R

Inputs
Id | Title | Data Type | Description
input_data | Input table | string | URL to the input table containing data for selection and interpolation. This table includes grouping variables (if applicable), the year (or other time identifier) and the value columns to be interpolated. For example, use the result from mean_by_group.
colnames_relevant | Column names identifying group(s) | string | Column name(s) describing relevant values in the dataset. These columns are treated as grouping identifiers, and a combination of all specified columns will be used to define unique groups. For each group, a separate time series is analyzed and interpolated individually.
missing_threshold_percentage | Threshold for missing values | number | Threshold for the allowed percentage of missing data points (NAs). For example, a value of 80 means series with more than 80% missing data will be removed. Example = "80".
colname_year | Column name for time | string | The name of the column containing the year (or other time identifier, such as quarter, month, or day). Example = "year".
colname_value | Column name for values | string | The name of the column containing the values to be considered for interpolation.
min_data_point | Minimum number of data points required | integer | The minimum number of data points required in a time series for it to be included in the interpolation process. Example = "10".
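The way the grouping columns, missing_threshold_percentage and min_data_point could work together is sketched below in Python. This is an illustration under assumptions, not the R code of the process: the function name `select_series` is hypothetical, the rows are invented example data, and "data points" is assumed to mean non-missing values.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def select_series(
    rows: List[dict],
    group_cols: List[str],
    value_col: str,
    missing_threshold_percentage: float,
    min_data_point: int,
) -> Dict[Tuple, List[dict]]:
    """Group rows by the given columns and keep only groups with enough data."""
    groups: Dict[Tuple, List[dict]] = defaultdict(list)
    for row in rows:
        # The combination of all grouping columns defines one time series.
        key = tuple(row[col] for col in group_cols)
        groups[key].append(row)

    kept = {}
    for key, members in groups.items():
        n_present = sum(1 for r in members if r[value_col] is not None)
        pct_missing = 100.0 * (len(members) - n_present) / len(members)
        # Remove series with MORE than the threshold missing, or too few values.
        if pct_missing <= missing_threshold_percentage and n_present >= min_data_point:
            kept[key] = members
    return kept


# Invented example data: two groups defined by season and HELCOM_ID.
years = [2000, 2001, 2002, 2003, 2004]
values_a = [1.0, None, 3.0, None, 5.0]     # 40% missing, 3 values present
values_b = [None, None, None, None, 2.0]   # 80% missing, 1 value present
rows = [
    {"season": "summer", "HELCOM_ID": hid, "year": y, "value": v}
    for hid, vals in (("A", values_a), ("B", values_b))
    for y, v in zip(years, vals)
]

kept = select_series(rows, ["season", "HELCOM_ID"], "value", 50, 3)
print(list(kept))  # [('summer', 'A')]
```

With a threshold of 50 and a minimum of 3 data points, only the first group survives. Raising the threshold to 80 and lowering the minimum to 1 would keep both, because a series is removed only when it has more than the threshold percentage missing.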
Outputs
Id | Title | Description
interpolated_time_series | Interpolated time series | A table containing continuous time series for each unique group defined by colnames_relevant, with missing values replaced by linear interpolation. Time series with insufficient data based on missing_threshold_percentage or min_data_point are excluded.

Execution modes

  • Synchronous
  • Asynchronous
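
As a rough sketch of how the two execution modes might be requested, the Python snippet below builds (but does not send) an execute request. It assumes the service follows the OGC API - Processes conventions (POST to /processes/{id}/execution, with the header Prefer: respond-async for asynchronous execution), which this page does not state explicitly. The base URL, the process id, the input table URL, the value column name and the comma-separated form of colnames_relevant are all hypothetical.

```python
import json
import urllib.request

# Hypothetical values: neither the base URL nor the exact process id is given here.
BASE_URL = "https://example.org/api"
PROCESS_ID = "ts-selection-interpolation"


def build_execute_request(inputs: dict, asynchronous: bool = False) -> urllib.request.Request:
    """Build an execute request; asynchronous mode adds a Prefer header."""
    headers = {"Content-Type": "application/json"}
    if asynchronous:
        # OGC API - Processes convention for requesting asynchronous execution.
        headers["Prefer"] = "respond-async"
    body = json.dumps({"inputs": inputs}).encode("utf-8")
    url = f"{BASE_URL}/processes/{PROCESS_ID}/execution"
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


inputs = {
    "input_data": "https://example.org/data/mean_by_group.csv",  # hypothetical URL
    "colnames_relevant": "season,HELCOM_ID",  # separator format is an assumption
    "missing_threshold_percentage": 80,
    "colname_year": "year",
    "colname_value": "value",  # hypothetical column name
    "min_data_point": 10,
}

sync_request = build_execute_request(inputs)
async_request = build_execute_request(inputs, asynchronous=True)
print(async_request.get_method(), async_request.full_url)
```

Passing either request to `urllib.request.urlopen` would send it. In asynchronous mode a service following these conventions would typically answer with a job reference to poll rather than the result itself.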