metrics: add more detailed measurements for starting proxy service #331

Open
mkemel opened this issue May 31, 2023 · 1 comment
Labels
backlog This is next up in priority enhancement New feature or request

Comments

@mkemel
Member

mkemel commented May 31, 2023

Please describe what you would like to see

Currently the metrics we report for a StartUnit job are:

  1. Overall time spent on the job, measured on hirte manager
  2. Time that the unit was in "activating" state, read from unit properties
  3. Time it took for systemd job to run, measured on agent


However, when starting a unit that depends on a unit on another node (i.e. one that requires the proxy service), the time measured on the agent (3) is quite long, while the time spent in the "activating" state (2) is much shorter. The current metrics do not show where the difference is spent.
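
To make the gap concrete, here is a minimal sketch with made-up numbers. The function name `unaccounted_micros` and the values are illustrative assumptions, not real measurements. It subtracts the "activating" time (2) from the agent job time (3), which gives the portion that the current metrics cannot attribute to anything:

```python
def unaccounted_micros(agent_job_micros: int, activating_micros: int) -> int:
    """Agent job time that is not explained by the unit's "activating" state.

    Both inputs are in microseconds. The result is clamped at zero because the
    two values come from different clocks/sources and may not line up exactly.
    """
    return max(agent_job_micros - activating_micros, 0)


# Hypothetical values for a unit that needs a proxy service on another node.
gap = unaccounted_micros(agent_job_micros=850_000, activating_micros=120_000)
print(f"time not covered by current metrics: {gap} us")
```

With a proxy dependency, most of that remainder is presumably spent waiting for the remote unit, but that is exactly what cannot be confirmed today.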

Please describe the solution you'd like

We need higher-resolution metrics for starting a service that requires a proxy.

@mkemel mkemel added the jira Issues that are synced to Jira label May 31, 2023
@engelmi
Member

engelmi commented Jun 5, 2023

Proposal

The dependencies that are resolved via the proxy service are stored within the systemd units, so I think we need a new signal for this (e.g. ProxyExecutionTime). At the same time, the API can be aligned more closely:

Signals:

  • ProxyExecutionTime(s node, s requesting_node, s method, t time_micros)
    Each time a proxy is being created, we measure the time it takes and emit a signal when it finishes
  • AgentExecutionTime(s node, s unit, s method, t time_micros)
    Same as the AgentJobMetrics, it returns the time the agent took executing the request.
  • HirteExecutionTime(s node, s unit, s method, t time_micros)
    Same as the StartUnitMetrics, but only returns the overall hirte time measured.
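
As a rough illustration of the payloads, the three signals could be modelled on the receiving side like this. It is only a sketch: the class layout and the `check_u64` helper are assumptions for illustration, not existing hirte code. The only thing taken from the signatures above is that `s` is a D-Bus string and `t` is an unsigned 64-bit integer.

```python
from dataclasses import dataclass

U64_MAX = 2**64 - 1  # D-Bus type 't' is an unsigned 64-bit integer


def check_u64(value: int) -> int:
    """Hypothetical helper: reject values that cannot be a D-Bus 't'."""
    if not 0 <= value <= U64_MAX:
        raise ValueError(f"time_micros out of range for uint64: {value}")
    return value


@dataclass(frozen=True)
class ProxyExecutionTime:
    node: str             # s
    requesting_node: str  # s
    method: str           # s, e.g. "start" or "stop"
    time_micros: int      # t

    def __post_init__(self) -> None:
        check_u64(self.time_micros)


@dataclass(frozen=True)
class AgentExecutionTime:
    node: str         # s
    unit: str         # s
    method: str       # s
    time_micros: int  # t

    def __post_init__(self) -> None:
        check_u64(self.time_micros)


@dataclass(frozen=True)
class HirteExecutionTime:
    node: str         # s
    unit: str         # s
    method: str       # s
    time_micros: int  # t

    def __post_init__(self) -> None:
        check_u64(self.time_micros)
```

Note that AgentExecutionTime and HirteExecutionTime share the same shape, while only ProxyExecutionTime carries `requesting_node`.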

In hirtectl, the callbacks collect the reported times for each node and unit and, since ProxyExecutionTime contains the requesting_node field, the dependency chain/tree can be resolved.
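
A minimal sketch of how the chain could be resolved from the collected ProxyExecutionTime records. It assumes that `requesting_node` is the node whose unit needed the dependency and `node` is the node the dependency was started on; the function names and node names are hypothetical.

```python
from collections import defaultdict


def build_dependency_tree(proxy_records):
    """Group proxy records by requesting node.

    Each record is a (node, requesting_node, method, time_micros) tuple,
    mirroring the proposed ProxyExecutionTime signal.
    """
    children = defaultdict(list)
    for node, requesting_node, _method, time_micros in proxy_records:
        children[requesting_node].append((node, time_micros))
    return dict(children)


def flatten(tree, root, depth=1, seen=None):
    """Return (depth, node, time_micros) rows in depth-first order from root."""
    seen = (seen or set()) | {root}
    rows = []
    for node, time_micros in tree.get(root, []):
        rows.append((depth, node, time_micros))
        if node not in seen:  # guard against cyclic dependencies
            rows.extend(flatten(tree, node, depth + 1, seen))
    return rows


# Hypothetical records: node-a needs a unit on node-b, which needs one on node-c.
records = [
    ("node-b", "node-a", "start", 300),
    ("node-c", "node-b", "start", 100),
]
tree = build_dependency_tree(records)
for depth, node, micros in flatten(tree, "node-a"):
    print("  " * depth + f"{node}: {micros} us")
```

The records can arrive in any order, since the tree is only walked once all of them have been collected.
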
When receiving the HirteExecutionTime in hirtectl, all other measurements should have been collected. In a subsequent loop the unit net time is collected for each unit by using the GetUnitProperty (https://github.com/containers/hirte/blob/main/data/org.containers.hirte.Node.xml#L35-L40). And since hirtectl requires the method (start, stop, ...) the required properties can be retrieved, e.g.

@engelmi engelmi added the enhancement New feature or request label Jul 26, 2023
@mkemel mkemel added backlog This is next up in priority and removed jira Issues that are synced to Jira labels Nov 21, 2023