
Listing options on cmdline #439

Open · wants to merge 30 commits into master
Conversation

@elshize (Member) commented Sep 28, 2020

I also got rid of the index types macro, replacing it with some template metaprogramming.

@elshize self-assigned this Sep 28, 2020
@codecov (bot) commented Sep 28, 2020

Codecov Report

Merging #439 (9969a8c) into master (a28100d) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #439   +/-   ##
=======================================
  Coverage   93.26%   93.26%           
=======================================
  Files          92       93    +1     
  Lines        4926     4941   +15     
=======================================
+ Hits         4594     4608   +14     
- Misses        332      333    +1     

Impacted Files                              Coverage Δ
include/pisa/mappable/mappable_vector.hpp   100.00% <ø> (ø)
include/pisa/index_types.hpp                100.00% <100.00%> (ø)
include/pisa/wand_data.hpp                   85.55% <100.00%> (ø)
include/pisa/wand_data_raw.hpp              100.00% <100.00%> (ø)
include/pisa/bit_vector.hpp                  97.54% <0.00%> (-0.41%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

try {
    with_index(app.index_encoding(), app.index_filename(), [&](auto index) {
        intersect<decltype(index), wand_raw_index>(
            app.index_filename(),
Member:

I am confused: if we do the mapping in the with_index function, why do we pass the index filename and not the already-mapped index?

Member Author:

We don't have the index here. That's the whole point: we don't know the type, and we can get away with it because we pass a template function that takes any index, and it's all handled in with_index. It's a workaround to avoid having ifs and elses everywhere we load an index.

Maybe load_and_run would be a better name?

I think I mentioned somewhere on Slack that there might be an alternative: write a function that loads an index into a std::variant. Then we would have to run:

std::visit([&](auto index) {
    // Do stuff.
}, load_index(...));

I don't see much advantage in this; it's essentially the same. It could make a difference if we want to pass the index around, which might be handy.

I actually suggest having both methods: one as a shortcut for just "run with this index type", and the other as "give me an index, I'll handle matching later on my own".

In the latter case, I also suggest having a custom function, hence my naming. I imagined having:

load_index(encoding, path);
with_index(index_as_variant, fn);
with_index(encoding, path, fn);

Not sure about the name, though. Maybe run_for_index? I'm open to suggestions.
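
As a rough, self-contained sketch of how the three helpers proposed above could relate (the index types IndexA/IndexB, the encoding strings, and the constructor bodies are placeholders, not PISA's actual code):

#include <stdexcept>
#include <string>
#include <utility>
#include <variant>

// Placeholder index types standing in for the real encodings
// (e.g. block_simdbp_index); a real constructor would map the index file.
struct IndexA {
    explicit IndexA(std::string const& /* path */) {}
};
struct IndexB {
    explicit IndexB(std::string const& /* path */) {}
};

using IndexVariant = std::variant<IndexA, IndexB>;

// "Give me an index, I'll handle matching later on my own."
inline auto load_index(std::string const& encoding, std::string const& path) -> IndexVariant
{
    if (encoding == "a") {
        return IndexA(path);
    }
    if (encoding == "b") {
        return IndexB(path);
    }
    throw std::invalid_argument("unknown encoding: " + encoding);
}

// Run fn on an already-loaded index, whatever its concrete type.
template <typename Fn>
void with_index(IndexVariant const& index, Fn&& fn)
{
    std::visit(std::forward<Fn>(fn), index);
}

// Shortcut: "run with this index type", i.e. load, then dispatch.
template <typename Fn>
void with_index(std::string const& encoding, std::string const& path, Fn&& fn)
{
    with_index(load_index(encoding, path), std::forward<Fn>(fn));
}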

try {
    with_index(app.index_encoding(), app.index_filename(), [&](auto index) {
        selective_queries<decltype(index)>(
            app.index_filename(), app.index_encoding(), app.queries());
Member:

Same here.

@elshize (Member Author) commented May 2, 2021

Overview

I refactored this to simplify the usage. There is now a single public way of retrieving a "type" object for an index based on its encoding:

auto index_type = IndexType::resolve("pef"); // or IndexType::resolve_profiling for the indexes with profiling

This is an object that internally holds a variant of index type markers, i.e., objects that have the index type encoded as a typedef. This is necessary because there are some situations where we need to know the type before creating the index itself, in particular when compressing an index.
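
To illustrate the "type marker" idea (this is a sketch of the concept, not the code in this PR; IndexA and IndexB are placeholders):

#include <variant>

// An empty tag type whose only job is to carry the index type as a typedef,
// so the type can be named before any index object is constructed
// (e.g. when compressing an index).
template <typename T>
struct IndexTypeMarker {
    using type = T;
};

// Placeholder index types; the real variant in pisa::detail lists the
// actual encodings.
struct IndexA {};
struct IndexB {};

using IndexTypeVariant = std::variant<IndexTypeMarker<IndexA>, IndexTypeMarker<IndexB>>;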

To run some code with the index type, we need to call:

index_type.execute([](auto&& index_type) {
    using Index = typename std::decay_t<decltype(index_type)>::type;
});

However, since most commonly we want to load an index and do something with it, there's another method that loads the index and runs some code on it:

index_type.load_and_execute(index_path, [](auto&& index) {
    // decltype(index) is something like `block_simdbp_index`
});

which is equivalent to:

index_type.execute([&index_path](auto&& index_type) {
    using Index = typename std::decay_t<decltype(index_type)>::type;
    auto index = Index(MemorySource::mapped_file(index_path));
});

Adding a new index type

All available index types are defined in the pisa::detail::IndexType variant in index_types.hpp. This is basically what the INDEX_TYPES macro used to be. The rest of the machinery is template code within the pisa::detail namespace in the same file.

When adding a new index type, one needs to edit the pisa::detail::index_name function to define its name (this is how we avoid macros). Note, however, that because of the constexpr code, if an index is added to pisa::detail::IndexType but not to index_name, it will fail to compile, so this won't lead to any bugs.
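
For illustration, a name-mapping function of this kind could look roughly like the sketch below; it reuses the placeholder types from the marker sketch above and is not the actual contents of index_types.hpp (the real signature may differ):

#include <string_view>
#include <type_traits>

// Map each index type to its encoding name at compile time. A type that is
// listed in the variant but has no branch here hits the static_assert and
// fails to compile, which is what gives the "won't lead to any bugs"
// guarantee mentioned above.
template <typename Index>
constexpr auto index_name() -> std::string_view
{
    if constexpr (std::is_same_v<Index, IndexA>) {
        return "a";
    } else if constexpr (std::is_same_v<Index, IndexB>) {
        return "b";
    } else {
        static_assert(sizeof(Index) == 0, "index type has no registered name");
    }
}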

@elshize (Member Author) commented May 2, 2021

Note: tests are failing because I'm waiting for #457 to be merged.

@elshize (Member Author) commented May 2, 2021

I'll also work on the in-code documentation.

@JMMackenzie (Member) left a comment:

It looks much cleaner; I think this is definitely a nice improvement. I left a minor comment about a function name. @amallia may want to also take a look here, since I missed the initial review for this one.

};

template <bool Profile, std::size_t I, typename Fn>
auto resolve_index_n(std::string_view encoding, Fn fn) -> bool
Member:

Why is this called resolve_index_n? Am I missing something?

Member Author:

Well, I suppose it's a bit inconsistent. What this one does is attempt to resolve the encoding at position I (a template parameter) and return true if it succeeds. I guess it would be clearer if I used N instead. I'll also add some docs to all of these today.
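
For context, the "resolve the encoding at position I" idea can be sketched roughly as follows, building on the placeholder IndexTypeVariant and index_name from the sketches above (the PR's actual implementation may differ; the Profile flag, which selects profiling index types in the real code, is ignored here):

#include <cstddef>
#include <string_view>
#include <variant>

// If the encoding name matches the I-th alternative of the marker variant,
// call fn with a marker of that alternative and report success; otherwise
// report failure so the caller can try the next position.
template <bool Profile, std::size_t I, typename Fn>
auto resolve_index_n(std::string_view encoding, Fn fn) -> bool
{
    using Marker = std::variant_alternative_t<I, IndexTypeVariant>;
    if (encoding == index_name<typename Marker::type>()) {
        fn(Marker{});
        return true;
    }
    return false;
}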

tools/app.hpp (conversation resolved)
@elshize (Member Author) commented May 3, 2021

GCC 7 keeps crashing with an internal compiler error, even though the same version compiles the code just fine locally. I haven't yet figured out what's wrong; I'll probably look at it again in a few days. It might be some issue with libstdc++ rather than the compiler alone, but I'm not sure.

@JMMackenzie (Member) commented:

Re: the failing builds, I think this is a memory consumption issue. For reference, here is the peak memory usage for single-threaded builds of the evaluate_queries target (which I think is the heaviest target we have):

  • master = 3.6 GB
  • this PR = 5.4 GB

This is using gcc 7.3.0 on Ubuntu.
So this PR does seem to be increasing the build memory substantially!

@elshize (Member Author) commented Jun 1, 2021

> Re: the failing builds, I think this is a memory consumption issue. For reference, here is the peak memory usage for single-threaded builds of the evaluate_queries target (which I think is the heaviest target we have):
>
>   • master = 3.6 GB
>   • this PR = 5.4 GB
>
> This is using gcc 7.3.0 on Ubuntu.
> So this PR does seem to be increasing the build memory substantially!

Thanks for looking into this. I'll look into it when I have a spare moment. If it can't be fixed, we'll need to look for other solutions; this is not an acceptable trade-off.

@elshize (Member Author) commented Jun 6, 2021

@JMMackenzie I just ran the whole compilation from master on my laptop, and I get 5.4 GB peak memory. Can you try again and make sure you have the current master and compile all tools and tests? This was a Debug build.

@JMMackenzie (Member) commented:

For posterity, using time -v:

target evaluate_queries
MASTER
gcc   = Maximum resident set size (kbytes): 3823100
clang = Maximum resident set size (kbytes): 4573132
LIST-OPTIONS
gcc   = Maximum resident set size (kbytes): 5651552
clang = Maximum resident set size (kbytes): 7066608

@JMMackenzie (Member) commented:

Looks like clang-tidy is running out of memory now, based on the logs:

Error running '/usr/bin/clang-tidy': Child killed
make[2]: *** [tools/CMakeFiles/evaluate_queries.dir/build.make:63: tools/CMakeFiles/evaluate_queries.dir/evaluate_queries.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:5563: tools/CMakeFiles/evaluate_queries.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
Error: Process completed with exit code 137.

@elshize (Member Author) commented Jun 12, 2021

Right, the problem from this PR still persists, but fixing the other problem makes some tests pass.
