While working at Maersk, I encountered a power-to-speed model used for planning: essentially a polynomial inferred from a regression, taking a few extra inputs besides power and outputting speed. For this model, valid input power levels were defined. In this post I will explain why it seems useful to define both the operational range AND the data type of an inferred function in an operational setting.
A very basic implementation of the operational range could look like the following. If implemented in Python, I would also use dicts to specify the data types.
# dtype.py

# The range of variables in X.
X_RANGE = {'power': [0, 80000]}

# The range of variables in y.
Y_RANGE = {'speed': [0, 30]}

# The datatype of the inputs.
X_DTYPE = {'power': 'float64'}

# The datatype of the output.
Y_DTYPE = {'speed': 'float64'}
Once the operational range is defined, there are several useful things you can do, such as rejecting invalid inputs before they reach the model.
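For instance, a small validation helper can reject out-of-range or wrongly typed inputs up front. This is only a sketch: `validate_input` and its error messages are my own naming, not part of the original code base.

```python
# Minimal sketch of input validation against the operational range and
# dtypes. X_RANGE and X_DTYPE are repeated here so the example is
# self-contained.
import numpy as np

X_RANGE = {'power': [0, 80000]}
X_DTYPE = {'power': 'float64'}

def validate_input(X: dict) -> None:
    """Raise ValueError if any input is outside the operational range
    or does not have the expected dtype."""
    for name, (lo, hi) in X_RANGE.items():
        values = np.asarray(X[name])
        if values.dtype != np.dtype(X_DTYPE[name]):
            raise ValueError(
                f"{name}: expected dtype {X_DTYPE[name]}, got {values.dtype}")
        if (values < lo).any() or (values > hi).any():
            raise ValueError(
                f"{name}: values outside operational range [{lo}, {hi}]")

# Passes silently for valid input:
validate_input({'power': np.array([1000.0, 45000.0])})
```

Calling `validate_input({'power': np.array([90000.0])})` would raise a `ValueError`, since 90000 is above the defined maximum power.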
To be even more specific, one could specify a range of valid speed levels \([s_{min}(p), s_{max}(p)]\) for each power level \(p\). If both the prediction and the actual speed lie in this range, this naturally gives an upper bound on how bad the model can be:
\[ \lvert \text{prediction}(p) - \text{actual}(p) \rvert \leq s_{max}(p) - s_{min}(p) \]
With a sample of the data one can do something similar, but the bound above could be faster to compute and easier to standardize.
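To make the bound concrete, here is a sketch with hypothetical speed envelopes. The particular forms of `s_min` and `s_max` are purely illustrative assumptions, not the real model's envelopes:

```python
# Sketch of the worst-case error bound, using made-up (illustrative)
# per-power speed envelopes s_min and s_max.

def s_min(p: float) -> float:
    # Illustrative assumption: lower speed envelope grows with power.
    return 25.0 * (p / 80000.0) ** 0.5 - 3.0

def s_max(p: float) -> float:
    # Illustrative assumption: upper envelope is the lower one plus a margin.
    return s_min(p) + 4.0

def worst_case_error(p: float) -> float:
    # Any prediction and actual speed inside [s_min(p), s_max(p)]
    # differ by at most the width of the envelope.
    return s_max(p) - s_min(p)

worst_case_error(40000.0)
```

With these made-up envelopes the bound is simply the constant margin, but in general it would vary with the power level \(p\).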
A sample of the data with 1000 rows is, however, easy to keep as a pytest fixture, so I would probably do that anyway.
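A sketch of how such a fixture might look. The sample generator and the `predict` stand-in are fabricated for illustration; in practice the fixture would load the 1000 stored rows of real data and call the actual model:

```python
# test_model_range.py -- sketch of a pytest fixture holding a small data
# sample, plus a test that predictions stay inside Y_RANGE.
import numpy as np
import pytest

Y_RANGE = {'speed': [0, 30]}

def make_sample(n: int = 1000) -> dict:
    # Stand-in for loading a stored 1000-row sample of real data.
    rng = np.random.default_rng(0)
    return {'power': rng.uniform(0, 80000, size=n)}

@pytest.fixture
def sample():
    return make_sample()

def predict(power: np.ndarray) -> np.ndarray:
    # Illustrative stand-in for the inferred power-to-speed polynomial.
    return 30.0 * (power / 80000.0) ** (1.0 / 3.0)

def test_predictions_within_range(sample):
    speed = predict(sample['power'])
    lo, hi = Y_RANGE['speed']
    assert speed.min() >= lo and speed.max() <= hi
```

Keeping the fixture in the test suite means every change to the model is automatically checked against both the operational range and a realistic slice of data.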
Feel free to comment below. A GitHub account is required.