Currently the metrics we report for a StartUnit job are:
1. Overall time spent on the job, measured on the hirte manager
2. Time that the unit was in the "activating" state, read from the unit properties
3. Time it took for the systemd job to run, measured on the agent
However, when starting a unit that depends on another node (i.e. one that requires the proxy service), the time measured on the agent (3) is quite long, while the time spent in the "activating" state is much shorter.
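To make the discrepancy concrete, the sketch below splits the three existing measurements into the parts they leave unexplained. It is a minimal Python illustration, assuming all three values are in microseconds and that the agent's job time (3) contains the unit's activating time (2). `StartUnitTimings` and `unexplained_time` are hypothetical names, not part of hirte, and the numbers in the usage line are illustrative, not measurements.

```python
from dataclasses import dataclass


@dataclass
class StartUnitTimings:
    """The three measurements currently reported for one StartUnit job (microseconds)."""
    hirte_total_us: int  # (1) overall time, measured on the hirte manager
    activating_us: int   # (2) time in "activating", read from unit properties
    agent_job_us: int    # (3) systemd job run time, measured on the agent


def unexplained_time(t: StartUnitTimings) -> dict:
    """Return the time that none of the current metrics attributes to anything."""
    return {
        # Agent job time outside the unit's own activation, e.g. time spent
        # waiting for a proxied dependency on another node (assumption).
        "agent_not_activating_us": t.agent_job_us - t.activating_us,
        # Manager time outside the agent's job, e.g. D-Bus round trips.
        "manager_not_agent_us": t.hirte_total_us - t.agent_job_us,
    }


# Illustrative values only.
print(unexplained_time(StartUnitTimings(2_500_000, 150_000, 2_300_000)))
# -> {'agent_not_activating_us': 2150000, 'manager_not_agent_us': 200000}
```

With a proxy dependency, most of the time lands in the first bucket, and the existing metrics cannot say where it went.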
Please describe the solution you'd like
We need higher-resolution metrics for starting a service that uses a proxy.
The dependencies that are resolved via the proxy service are stored within the systemd units, so I think we need a new signal for this (e.g. ProxyExecutionTime). At the same time, the API can be aligned more consistently:
Signals:
ProxyExecutionTime(s node, s requesting_node, s method, t time_micros)
Each time a proxy is created, the time it takes is measured and this signal is emitted when it finishes.
AgentExecutionTime(s node, s unit, s method, t time_micros)
Same as AgentJobMetrics: reports the time the agent took to execute the request.
HirteExecutionTime(s node, s unit, s method, t time_micros)
Same as StartUnitMetrics, but reports only the overall time measured by hirte.
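The proposed signal payloads can be modelled as plain records; in the D-Bus signatures above, `s` is a string and `t` an unsigned 64-bit integer. The Python sketch below also shows one way to time a proxy's creation with a monotonic clock and build the ProxyExecutionTime payload from it. `timed_proxy` and its `create_proxy` callback are hypothetical; in hirte this logic would live in the C code and the payload would be emitted over D-Bus.

```python
import time
from dataclasses import dataclass


# Field order mirrors the proposed signatures (s, s, s, t).
@dataclass(frozen=True)
class ProxyExecutionTime:
    node: str
    requesting_node: str
    method: str
    time_micros: int


@dataclass(frozen=True)
class AgentExecutionTime:
    node: str
    unit: str
    method: str
    time_micros: int


@dataclass(frozen=True)
class HirteExecutionTime:
    node: str
    unit: str
    method: str
    time_micros: int


def timed_proxy(node, requesting_node, method, create_proxy):
    """Run create_proxy() (hypothetical) and return the payload that would be emitted."""
    start_ns = time.monotonic_ns()  # monotonic: unaffected by wall-clock changes
    create_proxy()
    elapsed_us = (time.monotonic_ns() - start_ns) // 1000
    return ProxyExecutionTime(node, requesting_node, method, elapsed_us)


sig = timed_proxy("node2", "node1", "start", lambda: time.sleep(0.01))
print(sig.requesting_node, "->", sig.node, sig.time_micros >= 9000)
```

A monotonic clock is used so that the measured duration cannot go negative if the system time is adjusted while the proxy is being created.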
In hirtectl, the signal callbacks collect the reported times per node and unit. Since ProxyExecutionTime carries the requesting_node field, the dependency chain/tree can be resolved.
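A minimal Python sketch of that collection step follows, assuming callbacks that receive the signal arguments in the proposed order. `MetricsCollector` and its `on_*` methods are hypothetical stand-ins for hirtectl's D-Bus signal handlers (hirtectl itself is written in C). Each ProxyExecutionTime adds an edge from `requesting_node` to `node`, so walking the edges from the node that started the unit yields the dependency tree.

```python
from collections import defaultdict


class MetricsCollector:
    """Hypothetical hirtectl-side collector for the proposed signals."""

    def __init__(self):
        self.agent_times = {}                # (node, unit) -> microseconds
        self.proxy_times = {}                # (requesting_node, node) -> microseconds
        self.dependents = defaultdict(list)  # requesting_node -> [node, ...]

    def on_agent_execution_time(self, node, unit, method, time_micros):
        self.agent_times[(node, unit)] = time_micros

    def on_proxy_execution_time(self, node, requesting_node, method, time_micros):
        # method is ignored here; a real collector might key on it as well.
        self.proxy_times[(requesting_node, node)] = time_micros
        if node not in self.dependents[requesting_node]:
            self.dependents[requesting_node].append(node)

    def tree(self, root, depth=0, seen=None):
        """Return the proxy dependency tree below root as indented lines."""
        seen = {root} if seen is None else seen
        lines = []
        for child in self.dependents.get(root, []):
            us = self.proxy_times[(root, child)]
            lines.append(f"{'  ' * depth}{root} -> {child}: {us} us")
            if child not in seen:  # guard against cycles between nodes
                seen.add(child)
                lines.extend(self.tree(child, depth + 1, seen))
        return lines


c = MetricsCollector()
c.on_agent_execution_time("node1", "app.service", "start", 2_300_000)
c.on_proxy_execution_time("node2", "node1", "start", 2_000_000)
c.on_proxy_execution_time("node3", "node2", "start", 500_000)
print("\n".join(c.tree("node1")))
# node1 -> node2: 2000000 us
#   node2 -> node3: 500000 us
```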
Once hirtectl receives HirteExecutionTime, all other measurements should already have been collected. A subsequent loop then collects the net time of each unit using GetUnitProperty (https://github.com/containers/hirte/blob/main/data/org.containers.hirte.Node.xml#L35-L40). Since hirtectl requires the method (start, stop, ...), it can retrieve the matching properties, e.g.
start: InactiveExitTimestampMonotonic and ActiveEnterTimestampMonotonic
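The net-time loop can be sketched in Python as below. The "start" property pair is taken from this proposal; the "stop" pair (ActiveExitTimestampMonotonic to InactiveEnterTimestampMonotonic, i.e. the "deactivating" phase) is an assumption based on systemd's state transitions. `get_unit_properties(node, unit)` is a hypothetical callable standing in for the GetUnitProperty D-Bus call and is assumed to return a dict of property values in microseconds.

```python
# Method -> (begin, end) systemd unit properties.
# "start" comes from the proposal; "stop" is an assumption.
PHASE_PROPERTIES = {
    "start": ("InactiveExitTimestampMonotonic", "ActiveEnterTimestampMonotonic"),
    "stop": ("ActiveExitTimestampMonotonic", "InactiveEnterTimestampMonotonic"),
}


def unit_net_time_us(method, props):
    """Net time a unit spent in the phase for this method, in microseconds."""
    begin_prop, end_prop = PHASE_PROPERTIES[method]
    begin, end = props[begin_prop], props[end_prop]
    # systemd reports 0 for timestamps that were never set; an end before the
    # begin means the end timestamp is from an earlier cycle.
    if begin == 0 or end < begin:
        raise ValueError(f"no complete {method} phase in unit properties")
    return end - begin


def net_times(units, method, get_unit_properties):
    """The subsequent loop: fetch properties for each (node, unit) and compute net time."""
    return {(node, unit): unit_net_time_us(method, get_unit_properties(node, unit))
            for node, unit in units}


props = {"InactiveExitTimestampMonotonic": 1_000_000,
         "ActiveEnterTimestampMonotonic": 1_150_000}
print(net_times([("node1", "app.service")], "start", lambda n, u: props))
# -> {('node1', 'app.service'): 150000}
```

Comparing this net time with the AgentExecutionTime of the same unit gives the share of agent time spent outside the unit's own activation.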