
Listing options on cmdline #439

Open · wants to merge 30 commits into master
Conversation

@elshize (Member) commented Sep 28, 2020

I also got rid of the index types macro, replacing it with some template metaprogramming.

@elshize self-assigned this Sep 28, 2020
@codecov (bot) commented Sep 28, 2020

Codecov Report

Merging #439 (9969a8c) into master (a28100d) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #439   +/-   ##
=======================================
  Coverage   93.26%   93.26%           
=======================================
  Files          92       93    +1     
  Lines        4926     4941   +15     
=======================================
+ Hits         4594     4608   +14     
- Misses        332      333    +1     

Impacted Files                              Coverage Δ
include/pisa/mappable/mappable_vector.hpp   100.00% <ø> (ø)
include/pisa/index_types.hpp                100.00% <100.00%> (ø)
include/pisa/wand_data.hpp                   85.55% <100.00%> (ø)
include/pisa/wand_data_raw.hpp              100.00% <100.00%> (ø)
include/pisa/bit_vector.hpp                  97.54% <0.00%> (-0.41%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

try {
    with_index(app.index_encoding(), app.index_filename(), [&](auto index) {
        intersect<decltype(index), wand_raw_index>(
            app.index_filename(),
Member:

I am confused: if we do the mapping in the with_index function, why do we pass the index filename and not the already-mapped index?

Member Author:

We don't have the index here. That's the whole point: we don't know the type, and we can get away with it because we pass a template function that takes any index, and it's all handled in with_index. It's a workaround to avoid having ifs and elses everywhere we load an index.

Maybe load_and_run would be a better name?

I think I mentioned somewhere on Slack that there might be an alternative: write a function that loads an index into a std::variant. Then we would have to run:

std::visit([&](auto index) {
    // Do stuff.
}, load_index(...));

I don't see much advantage in this; it's essentially the same. It could make a difference if we want to pass the index around, which might be handy.

I actually suggest having both methods: one as a shortcut for just "run with this index type", and the other as "give me an index, I'll handle matching later on my own".

In the latter case, I also suggest having a custom function, hence my naming. I imagined having:

load_index(encoding, path);
with_index(index_as_variant, fn);
with_index(encoding, path, fn);

Not sure about the name, though. Maybe run_for_index? I'm open to suggestions.
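
As a rough, self-contained sketch of how the three helpers proposed above could relate (the index types IndexA/IndexB, the encoding strings, and the constructor bodies are placeholders, not PISA's actual code):

#include <stdexcept>
#include <string>
#include <utility>
#include <variant>

// Placeholder index types standing in for the real encodings
// (e.g. block_simdbp_index); a real constructor would map the index file.
struct IndexA {
    explicit IndexA(std::string const& /* path */) {}
};
struct IndexB {
    explicit IndexB(std::string const& /* path */) {}
};

using IndexVariant = std::variant<IndexA, IndexB>;

// "Give me an index, I'll handle matching later on my own."
inline auto load_index(std::string const& encoding, std::string const& path) -> IndexVariant
{
    if (encoding == "a") {
        return IndexA(path);
    }
    if (encoding == "b") {
        return IndexB(path);
    }
    throw std::invalid_argument("unknown encoding: " + encoding);
}

// Run fn on an already-loaded index, whatever its concrete type.
template <typename Fn>
void with_index(IndexVariant const& index, Fn&& fn)
{
    std::visit(std::forward<Fn>(fn), index);
}

// Shortcut: "run with this index type", i.e. load, then dispatch.
template <typename Fn>
void with_index(std::string const& encoding, std::string const& path, Fn&& fn)
{
    with_index(load_index(encoding, path), std::forward<Fn>(fn));
}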

try {
    with_index(app.index_encoding(), app.index_filename(), [&](auto index) {
        selective_queries<decltype(index)>(
            app.index_filename(), app.index_encoding(), app.queries());
Member:

Same here.

@elshize (Member Author) commented May 2, 2021

Overview

I refactored this to simplify the usage. There is now a single public way of retrieving a "type" object for an index based on its encoding:

auto index_type = IndexType::resolve("pef"); // or IndexType::resolve_profiling for the indexes with profiling

This is an object that internally holds a variant of index type markers, i.e., objects that have the index type encoded as a typedef. This is necessary because there are some situations where we need to know the type before creating the index itself, in particular when compressing an index.
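
To illustrate the "type marker" idea (this is a sketch of the concept, not the code in this PR; IndexA and IndexB are placeholders):

#include <variant>

// An empty tag type whose only job is to carry the index type as a typedef,
// so the type can be named before any index object is constructed
// (e.g. when compressing an index).
template <typename T>
struct IndexTypeMarker {
    using type = T;
};

// Placeholder index types; the real variant in pisa::detail lists the
// actual encodings.
struct IndexA {};
struct IndexB {};

using IndexTypeVariant = std::variant<IndexTypeMarker<IndexA>, IndexTypeMarker<IndexB>>;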

To run some code with the index type, we need to call:

index_type.execute([](auto&& index_type) {
    using Index = typename std::decay_t<decltype(index_type)>::type;
});

However, since most commonly we want to load an index and do something with it, there's another method that loads the index and runs some code on it:

index_type.load_and_execute(index_path, [](auto&& index) {
    // decltype(index) is something like `block_simdbp_index`
});

which is equivalent to:

index_type.execute([&index_path](auto&& index_type) {
    using Index = typename std::decay_t<decltype(index_type)>::type;
    auto index = Index(MemorySource::mapped_file(index_path));
});

Adding a new index type

All available index types are defined in the pisa::detail::IndexType variant in index_types.hpp. This is basically what the INDEX_TYPES macro used to be. The rest of the machinery is template code within the pisa::detail namespace in the same file.

When adding a new index type, one needs to edit the pisa::detail::index_name function to define its name (this is how we avoid macros). Note, however, that because of the constexpr code, if an index is added to pisa::detail::IndexType but not to index_name, it will fail to compile, so this won't lead to any bugs.
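
For illustration, a name-mapping function of this kind could look roughly like the sketch below; it reuses the placeholder types from the marker sketch above and is not the actual contents of index_types.hpp (the real signature may differ):

#include <string_view>
#include <type_traits>

// Map each index type to its encoding name at compile time. A type that is
// listed in the variant but has no branch here hits the static_assert and
// fails to compile, which is what gives the "won't lead to any bugs"
// guarantee mentioned above.
template <typename Index>
constexpr auto index_name() -> std::string_view
{
    if constexpr (std::is_same_v<Index, IndexA>) {
        return "a";
    } else if constexpr (std::is_same_v<Index, IndexB>) {
        return "b";
    } else {
        static_assert(sizeof(Index) == 0, "index type has no registered name");
    }
}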

@elshize (Member Author) commented May 2, 2021

Note: tests are failing because I'm waiting for #457 to be merged.

@elshize (Member Author) commented May 2, 2021

I'll also work on the in-code documentation.

@JMMackenzie (Member) left a comment:

It looks much cleaner; I think this is definitely a nice improvement. I left a minor comment about a function name. @amallia may want to also take a look here, since I missed the initial review for this one.

};

template <bool Profile, std::size_t I, typename Fn>
auto resolve_index_n(std::string_view encoding, Fn fn) -> bool
Member:

Why is this called resolve_index_n? Am I missing something?

Member Author:

Well, I suppose it's a bit inconsistent. What this one does is attempt to resolve the encoding at position I (a template parameter) and return true if it succeeds. I guess it would be clearer if I used N instead. I'll also add some docs to all of these today.
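
For context, the "resolve the encoding at position I" idea can be sketched roughly as follows, building on the placeholder IndexTypeVariant and index_name from the sketches above (the PR's actual implementation may differ; the Profile flag, which selects profiling index types in the real code, is ignored here):

#include <cstddef>
#include <string_view>
#include <variant>

// If the encoding name matches the I-th alternative of the marker variant,
// call fn with a marker of that alternative and report success; otherwise
// report failure so the caller can try the next position.
template <bool Profile, std::size_t I, typename Fn>
auto resolve_index_n(std::string_view encoding, Fn fn) -> bool
{
    using Marker = std::variant_alternative_t<I, IndexTypeVariant>;
    if (encoding == index_name<typename Marker::type>()) {
        fn(Marker{});
        return true;
    }
    return false;
}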

tools/app.hpp (conversation resolved)
@elshize (Member Author) commented May 3, 2021

GCC 7 keeps crashing with an internal compiler error, even though the same version compiles the code just fine locally. I haven't yet figured out what's wrong; I'll probably look at it again in a few days. It might be some issue with libstdc++ rather than the compiler alone, but I'm not sure.

@JMMackenzie (Member) commented:

Re: the failing builds, I think this is a memory consumption issue. For reference, here is the peak memory usage for single-threaded builds of the evaluate_queries target (which I think is the heaviest target we have):

  • master = 3.6 GB
  • this PR = 5.4 GB

This is using gcc 7.3.0 on Ubuntu.
So this PR does seem to be increasing the build memory substantially!

@elshize (Member Author) commented Jun 1, 2021

> Re: the failing builds, I think this is a memory consumption issue. For reference, here is the peak memory usage for single-threaded builds of the evaluate_queries target (which I think is the heaviest target we have):
>
>   • master = 3.6 GB
>   • this PR = 5.4 GB
>
> This is using gcc 7.3.0 on Ubuntu.
> So this PR does seem to be increasing the build memory substantially!

Thanks for looking into this. I'll look into it when I have a spare moment. If it can't be fixed, we'll need to look for other solutions; this is not an acceptable trade-off.

@elshize (Member Author) commented Jun 6, 2021

@JMMackenzie I just ran the whole compilation from master on my laptop, and I get 5.4 GB peak memory. Can you try again and make sure you have the current master and compile all tools and tests? This was a Debug build.

@JMMackenzie (Member) commented:

For posterity, using time -v:

target evaluate_queries
MASTER
gcc   = Maximum resident set size (kbytes): 3823100
clang = Maximum resident set size (kbytes): 4573132
LIST-OPTIONS
gcc   = Maximum resident set size (kbytes): 5651552
clang = Maximum resident set size (kbytes): 7066608

@JMMackenzie (Member) commented:

Looks like clang-tidy is running out of memory now, based on the logs:

Error running '/usr/bin/clang-tidy': Child killed
make[2]: *** [tools/CMakeFiles/evaluate_queries.dir/build.make:63: tools/CMakeFiles/evaluate_queries.dir/evaluate_queries.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:5563: tools/CMakeFiles/evaluate_queries.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
Error: Process completed with exit code 137.

@elshize (Member Author) commented Jun 12, 2021

Right, the problem from this PR still persists, but fixing the other problem makes some tests pass.
