
Create performance benchmarks for key pgroll features #408

Open
andrew-farries opened this issue Oct 16, 2024 · 2 comments
@andrew-farries (Collaborator)

Gather benchmark data for the following parts of pgroll:

  • Backfill duration - How long does it take to perform backfill operations on a table of some fixed size (say 10^7 rows)?
  • Effect of dual writes - What overhead do the up/down triggers incur on UPDATE-heavy tables?
  • read_schema query performance - Benchmark the performance of the read_schema query, which runs on every DDL statement to capture 'inferred' migrations.

Having these benchmarks in place would allow us to measure performance improvements over time and avoid regressions.
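
As a rough sketch of what the first two benchmarks could look like, Go's built-in `testing` package could drive them against a seeded Postgres instance. Everything below is illustrative: the table and column names, the `PGROLL_BENCH_DSN` variable, and the batch size are assumptions, not pgroll internals.

```go
package benchmarks

import (
	"database/sql"
	"os"
	"testing"

	_ "github.com/lib/pq" // Postgres driver
)

// openDB connects to the Postgres instance under test. The DSN comes from an
// environment variable (hypothetical name) so the same benchmarks can run
// locally or in CI.
func openDB(b *testing.B) *sql.DB {
	dsn := os.Getenv("PGROLL_BENCH_DSN")
	if dsn == "" {
		b.Skip("PGROLL_BENCH_DSN not set")
	}
	db, err := sql.Open("postgres", dsn)
	if err != nil {
		b.Fatal(err)
	}
	return db
}

// BenchmarkBackfill runs one backfill-style batched UPDATE per iteration over
// a pre-seeded table (e.g. 10^7 rows), approximating the per-batch cost of
// backfilling a new column.
func BenchmarkBackfill(b *testing.B) {
	db := openDB(b)
	defer db.Close()

	const batchSize = 10000
	const numBatches = 1000 // 10^7 rows / batchSize
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		start := (i % numBatches) * batchSize
		_, err := db.Exec(
			`UPDATE bench_table SET new_col = old_col WHERE id > $1 AND id <= $2`,
			start, start+batchSize,
		)
		if err != nil {
			b.Fatal(err)
		}
	}
}

// BenchmarkUpdateWithTriggers measures single-row UPDATE latency on a table
// that has pgroll's up/down triggers installed; running the same benchmark
// against a copy of the table without the triggers gives the dual-write
// overhead.
func BenchmarkUpdateWithTriggers(b *testing.B) {
	db := openDB(b)
	defer db.Close()

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		if _, err := db.Exec(`UPDATE bench_table SET value = value + 1 WHERE id = $1`, 1+i%10000000); err != nil {
			b.Fatal(err)
		}
	}
}
```

The read_schema benchmark could follow the same shape, timing the read_schema query once per iteration, and runs could be compared over time with a tool like benchstat.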

andrew-farries added this to the v1 milestone Oct 16, 2024
@ryanslade (Contributor)

I'd like to have a go at this.

In a perfect world we'd probably want to run these against every commit, but I imagine they may take a while to run and I don't want them to slow down getting changes into main. Maybe a compromise is to spin up an environment once a day and run the benchmarks against all new commits?

Apart from actually writing the benchmarks, we need to decide on a few things:

  • How often do we run them? I suggest once a day, as mentioned above.
  • Where do we run them? We may want to spin up a dedicated environment in EC2 so that the results are consistent.
  • Where do we store results? Since this is an open-source project, the results should ideally be public. Perhaps we can upload them to a wiki / docs area in this repo?

Anything else?

@andrew-farries (Collaborator, Author)

I think what you suggest is a good start. We want the benchmarks for a couple of reasons:

  • Guard against performance regressions
  • Have benchmarks available as part of the public documentation for the repository

I suggest running the benchmarks as a separate workflow that is automatically run on changes to main and that can also be invoked manually on branches.

A consistent environment in terms of hardware and probably also software (maybe run the benchmarks in a container) is a must too.

Results could be uploaded to object storage and pulled from there into our docs.
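
As a sketch of that last step, a small uploader run at the end of the workflow could push a JSON summary of the results to a bucket. The bucket name, key layout, and result fields below are placeholders, and the AWS SDK for Go v2 is just one possible client:

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"log"
	"os"
	"time"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

// Result is a hypothetical record for one benchmark in one run; a real
// uploader would parse these from `go test -bench` output.
type Result struct {
	Commit    string    `json:"commit"`
	Benchmark string    `json:"benchmark"`
	NsPerOp   float64   `json:"ns_per_op"`
	RunAt     time.Time `json:"run_at"`
}

func main() {
	ctx := context.Background()

	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		log.Fatal(err)
	}
	client := s3.NewFromConfig(cfg)

	// Placeholder values; a real run would fill these in from the benchmark output.
	results := []Result{
		{Commit: os.Getenv("GITHUB_SHA"), Benchmark: "BenchmarkBackfill", NsPerOp: 0, RunAt: time.Now().UTC()},
	}
	body, err := json.Marshal(results)
	if err != nil {
		log.Fatal(err)
	}

	// Key objects by date and commit so the docs site can pull a history of runs.
	key := fmt.Sprintf("benchmarks/%s-%s.json",
		time.Now().UTC().Format("2006-01-02"), results[0].Commit)
	_, err = client.PutObject(ctx, &s3.PutObjectInput{
		Bucket: aws.String("pgroll-benchmark-results"), // placeholder bucket name
		Key:    aws.String(key),
		Body:   bytes.NewReader(body),
	})
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("uploaded %d results to s3://pgroll-benchmark-results/%s", len(results), key)
}
```

The docs build could then read the JSON history back out of the bucket and render it.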
