host group mutex #23
base: master
Conversation
Caution
Inline review comments failed to post
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (6)
test/any/async_streaming.cpp (1)
142-142: Consider adding more comprehensive result verification

The current check `if (vals != std::vector<size_t>{3, 12}) return 1;` verifies the final state of `vals`. While this is a good basic check, consider adding more detailed assertions or error messages to provide better feedback in case of failure. You could enhance the verification like this:

```cpp
if (vals != std::vector<size_t>{3, 12}) {
  std::cerr << "Unexpected vals: ";
  for (const auto& val : vals) {
    std::cerr << val << " ";
  }
  std::cerr << std::endl;
  return 1;
}
```

inc/mkn/gpu/multi_launch.hpp (5)
53-53: Avoid unnecessary `const` qualifier on pass-by-value parameter

The `mode_` parameter is passed by value in the constructor. A `const` qualifier on a pass-by-value parameter has no effect on the caller, since the function operates on its own local copy. Consider removing the redundant `const`. Apply this diff:

```diff
- StreamFunction(Strat& strat_, StreamFunctionMode const mode_) : strat{strat_}, mode{mode_} {}
+ StreamFunction(Strat& strat_, StreamFunctionMode mode_) : strat{strat_}, mode{mode_} {}
```
61-63: Prefer passing fundamental types by value instead of by `const` reference

For fundamental types like `std::size_t`, passing by value is more efficient than passing by `const` reference due to the potential overhead associated with references. Consider changing the function parameters to pass by value. Apply this diff:

```diff
-std::size_t group_idx_modulo(std::size_t const& gs, std::size_t const& i) {
+std::size_t group_idx_modulo(std::size_t gs, std::size_t i) {
```
68-69: Prefer passing fundamental types by value instead of by `const` reference

In the constructor of `StreamGroupFunction`, `gs` is a `std::size_t`, a fundamental type. Passing it by value is generally more efficient than passing by `const` reference. Similarly, consider removing the unnecessary `const` from the `mode_` parameter. Apply this diff:

```diff
- StreamGroupFunction(std::size_t const& gs, Strat& strat_, StreamFunctionMode const mode_)
+ StreamGroupFunction(std::size_t gs, Strat& strat_, StreamFunctionMode mode_)
```
281-283: Prefer passing fundamental types by value instead of by `const` reference

In the constructor of `StreamHostGroupMutexFunction`, `gs` is a `std::size_t`. Passing fundamental types by value is more efficient than passing by `const` reference. Consider changing the parameter to pass by value. Apply this diff:

```diff
- StreamHostGroupMutexFunction(std::size_t const& gs, Strat& strat, Fn&& fn_)
+ StreamHostGroupMutexFunction(std::size_t gs, Strat& strat, Fn&& fn_)
```
342-344: Prefer passing fundamental types by value instead of by `const` reference

In the `host_group_mutex` method, `group_size` is a `std::size_t`. Passing fundamental types by value is more efficient than passing by `const` reference. Consider changing the parameter to pass by value. Apply this diff:

```diff
- This& host_group_mutex(std::size_t const& group_size, Fn&& fn) {
+ This& host_group_mutex(std::size_t group_size, Fn&& fn) {
```
🛑 Comments failed to post (1)
inc/mkn/gpu/multi_launch.hpp (1)
289-296: 🛠️ Refactor suggestion

Ensure mutex is properly acquired before using

In the `run` method, the `unique_lock` is constructed with `std::defer_lock` and `try_lock` is used. If the lock is not acquired, `fn(i)` is not called, and `strat.status[i]` is set to `SFS::FIRST` to retry. This logic may lead to busy waiting and potential performance issues if the mutex is heavily contended. Consider implementing a back-off strategy or using condition variables to avoid tight loops. Consider refactoring the code to include a back-off mechanism:

```diff
 void run(std::uint32_t const i) override {
   auto const gidx = Super::group_idx(i);
   std::unique_lock<std::mutex> lock(mutices[Super::group_idx(i)], std::defer_lock);
   if (lock.try_lock()) {
     fn(i);
     strat.status[i] = SFS::WAIT;  // done
   } else {
+    // Introduce a small sleep to prevent tight loop in case of contention
+    std::this_thread::sleep_for(std::chrono::milliseconds(1));
     strat.status[i] = SFS::FIRST;  // retry
   }
 }
```

Committable suggestion was skipped due to low confidence.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (5)
inc/mkn/gpu/multi_launch.hpp (4)
61-63: LGTM: New group_idx_modulo function added

The `group_idx_modulo` function is a useful addition for group-based operations. Its implementation is correct and improves code readability by encapsulating the group index calculation. Consider marking this function as `constexpr` and `inline` to allow for compile-time evaluation and potential performance improvements:

```diff
-std::size_t group_idx_modulo(std::size_t const& gs, std::size_t const& i) {
+constexpr inline std::size_t group_idx_modulo(std::size_t const& gs, std::size_t const& i) {
   return ((i - (i % gs)) / gs);
 }
```
65-75: LGTM: New StreamGroupFunction struct added

The `StreamGroupFunction` struct is a well-designed abstraction for group-based stream functions. It properly inherits from `StreamFunction` and provides useful group-related functionality. Consider marking the `group_idx` method as `constexpr` and `noexcept` for potential compile-time optimizations and to indicate it doesn't throw exceptions:

```diff
-  std::size_t group_idx(std::size_t const& i) const { return group_idx_modulo(group_size, i); }
+  constexpr std::size_t group_idx(std::size_t const& i) const noexcept { return group_idx_modulo(group_size, i); }
```
267-298: LGTM: New StreamHostGroupMutexFunction added for group-based mutex synchronization

The `StreamHostGroupMutexFunction` is a valuable addition for synchronizing host-side operations within groups. Its implementation using `std::mutex` is appropriate for host-side synchronization. Consider using `std::shared_mutex` instead of `std::mutex` if multiple readers are expected to access shared resources simultaneously. This could potentially improve performance in scenarios where read operations are more frequent than write operations:

```diff
-  std::vector<std::mutex> mutices;
+  std::vector<std::shared_mutex> mutices;
```

Also, to prevent potential busy-waiting, you might want to add a small sleep or yield when the lock isn't acquired:

```diff
 if (lock.try_lock()) {
   fn(i);
   strat.status[i] = SFS::WAIT;  // done
 } else {
   strat.status[i] = SFS::FIRST;  // retry
+  std::this_thread::yield();  // or std::this_thread::sleep_for(std::chrono::milliseconds(1));
 }
```
Line range hint 1-461: Overall: Excellent additions to enhance group-based operations and synchronization

The changes introduce well-designed group-based functionality and mutex synchronization to the GPU multi-launch framework. The new features are cohesively integrated with the existing codebase and provide valuable enhancements for parallel processing scenarios.
Consider adding more inline documentation or comments to explain the usage and benefits of the new group-based features and mutex synchronization. This would help users understand when and how to best utilize these new capabilities in their GPU multi-launch operations.
test/any/async_streaming.cpp (1)
135-137: Remove unnecessary `mutable` keyword in lambda

The lambda function passed to `.host()` does not modify any captured variables, yet it's marked as `mutable`. Unless there's a specific reason, consider removing the `mutable` keyword for clarity. Apply this diff:

```diff
-.host([&](auto i) mutable {
+.host([&](auto i) {
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
- inc/mkn/gpu/multi_launch.hpp (5 hunks)
- test/any/async_streaming.cpp (1 hunks)
🧰 Additional context used
🔇 Additional comments (5)
inc/mkn/gpu/multi_launch.hpp (3)
53-53: LGTM: Improved parameter passing in StreamFunction constructor

The change to pass `mode_` as a constant reference is a good practice. It potentially improves performance by avoiding unnecessary copies without changing the functionality.
Line range hint 222-265: LGTM: StreamGroupBarrierFunction updated to use new group-based abstraction

The changes to `StreamGroupBarrierFunction` properly integrate it with the new `StreamGroupFunction` base class. This improves code consistency and leverages the new group functionality. The `run` method now correctly uses the `group_idx` method from the base class, which is a good improvement in code reuse and consistency.
339-344: LGTM: New host_group_mutex method added to ThreadedStreamLauncher

The `host_group_mutex` method is a well-implemented addition to the `ThreadedStreamLauncher` class. It provides a convenient way to add group-based, mutex-synchronized host functions to the launcher. The method correctly creates and adds a new `StreamHostGroupMutexFunction` to the `fns` vector, properly forwarding the provided function and group size.

test/any/async_streaming.cpp (2)
119-152: Function `test_threaded_host_group_mutex` implementation looks correct

The implementation of the `test_threaded_host_group_mutex` function appears to correctly utilize the `ThreadedStreamLauncher` with `host_group_mutex` to synchronize host operations across groups. The logic aligns with the intended functionality.
139-139: ⚠️ Potential issue: Undefined function `group_idx_modulo`

The function `group_idx_modulo` used here is not defined within the provided code or the included headers. This will lead to a compilation error due to the undefined reference. Run the following script to check if `group_idx_modulo` is defined elsewhere in the codebase. Ensure that `group_idx_modulo` is properly defined or included from the appropriate header file.
test/any/async_streaming.cpp (Outdated)

```cpp
return test() + test_threaded() + test_threaded(6) + test_threaded_group_barrier() +
       test_threaded_host_group_mutex();
```
🛠️ Refactor suggestion
Check the cumulative return value in main
The `main` function returns the sum of the test function results. Since each test function returns `0` on success and `1` on failure, the cumulative return value could exceed `1`, making it harder to interpret the specific failure. Consider handling test results individually or using bitwise operations to identify specific test failures. For clearer result interpretation, you could modify `main` as follows:
```diff
-return test() + test_threaded() + test_threaded(6) + test_threaded_group_barrier() +
-       test_threaded_host_group_mutex();
+int result = 0;
+result |= test();
+result |= test_threaded() << 1;
+result |= test_threaded(6) << 2;
+result |= test_threaded_group_barrier() << 3;
+result |= test_threaded_host_group_mutex() << 4;
+return result;
```
This way, each test's result occupies a unique bit in the return value.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (3)
inc/mkn/gpu/multi_launch.hpp (3)
61-63: New `group_idx_modulo` function looks good, but consider adding a safety check

The new `group_idx_modulo` function is a good addition for calculating group indices. However, to prevent potential undefined behavior, consider adding a check for division by zero. Consider adding a check for `gs != 0` at the beginning of the function:

```diff
 std::size_t group_idx_modulo(std::size_t const& gs, std::size_t const& i) {
+  if (gs == 0) {
+    throw std::invalid_argument("Group size cannot be zero");
+  }
   return ((i - (i % gs)) / gs);
 }
```
267-298: Well-implemented `StreamHostGroupMutexFunction` with a suggestion for improvement

The new `StreamHostGroupMutexFunction` struct is a good addition that provides mutex-based synchronization for host group functions. The implementation is consistent with the existing code style and error handling. The `run` method's try-lock mechanism prevents deadlocks, which is good. However, it might lead to busy waiting in high-contention scenarios. Consider implementing a backoff strategy or using a condition variable to reduce CPU usage in case of high contention. For example:

```cpp
void run(std::uint32_t const i) override {
  std::unique_lock<std::mutex> lock(mutices[Super::group_idx(i)], std::defer_lock);
  if (lock.try_lock()) {
    fn(i);
    strat.status[i] = SFS::WAIT;  // done
  } else {
    // Implement exponential backoff
    std::this_thread::sleep_for(std::chrono::milliseconds(backoff_time));
    backoff_time = std::min(backoff_time * 2, max_backoff_time);
    strat.status[i] = SFS::FIRST;  // retry
  }
}
```

This approach would reduce CPU usage while still maintaining the non-blocking nature of the current implementation.
Line range hint 385-409: Improved efficiency in `get_work`, but potential race condition introduced

The changes to the `get_work` method improve efficiency by allowing work to resume from where it left off in previous calls. This is a good optimization for scenarios with many work items. However, the introduction of the shared `work_i` variable, which is modified outside the lock, could lead to race conditions in a multi-threaded environment. To address the potential race condition, consider one of the following approaches:

1. Move the `work_i = 0` reset inside the lock:

```diff
 std::pair<SFP, std::size_t> get_work() {
   std::scoped_lock<std::mutex> lk(work_);
   for (; work_i < datas.size(); ++work_i) {
     // ... existing code ...
   }
+  work_i = 0;
   if (check_finished()) done = 1;
   return std::make_pair(SFP::SKIP, 0);
 }
-  work_i = 0;
```

2. Use a thread-local variable instead of a shared member variable:

```diff
 std::pair<SFP, std::size_t> get_work() {
+  static thread_local std::size_t work_i = 0;
   std::scoped_lock<std::mutex> lk(work_);
   for (; work_i < datas.size(); ++work_i) {
     // ... existing code ...
   }
   work_i = 0;
   if (check_finished()) done = 1;
   return std::make_pair(SFP::SKIP, 0);
 }
-  std::size_t work_i = 0;
```

Either of these approaches would maintain the efficiency improvement while avoiding potential race conditions.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
- inc/mkn/gpu/multi_launch.hpp (8 hunks)
🧰 Additional context used
🔇 Additional comments (5)
inc/mkn/gpu/multi_launch.hpp (5)
53-53: Improved efficiency in `StreamFunction` constructor

The change to pass `mode_` as a const reference is a good optimization. It prevents unnecessary copying and ensures that the `mode_` parameter cannot be accidentally modified within the constructor.

65-75: Well-designed `StreamGroupFunction` struct

The new `StreamGroupFunction` struct is a good addition that provides a clear abstraction for group-based stream functions. It properly inherits from `StreamFunction` and introduces the necessary group-related functionality. The `group_idx` method effectively utilizes the `group_idx_modulo` function, maintaining consistency in the codebase.
Line range hint 222-265: Improved `StreamGroupBarrierFunction` with better inheritance

The changes to `StreamGroupBarrierFunction` are well-implemented:

- Inheriting from `StreamGroupFunction` improves code organization and reuse.
- The constructor properly initializes the base class with the group size.
- The `run` method now uses the `group_idx` method from the base class, which is more consistent.

These modifications enhance the overall design and maintainability of the code.
339-344: Well-implemented `host_group_mutex` method

The new `host_group_mutex` method in `ThreadedStreamLauncher` is a good addition. It provides a clean and consistent way to add mutex-based host group functions to the launcher. The implementation correctly creates and adds a new `StreamHostGroupMutexFunction` to the `fns` vector, maintaining consistency with other similar methods in the class.
Line range hint 1-445: Overall assessment of changes to `multi_launch.hpp`

The changes to this file significantly enhance the GPU multi-launch framework by introducing well-designed group-based abstractions and improving existing functionality. Key improvements include:

- The new `StreamGroupFunction` provides a solid foundation for group-based operations.
- The `StreamHostGroupMutexFunction` adds useful mutex-based synchronization for host group functions.
- The `ThreadedStreamLauncher` class has been extended with new capabilities that leverage these group-based abstractions.

While these changes are generally positive, there are a few areas that require attention:

- The `group_idx_modulo` function should include a check for division by zero.
- The locking mechanism in `StreamHostGroupMutexFunction` could be optimized to reduce potential busy waiting.
- The use of the shared `work_i` variable in `ThreadedStreamLauncher` introduces potential thread safety issues that should be addressed.

Addressing these points will further improve the robustness and efficiency of the framework. Overall, this update represents a significant step forward in the capabilities of the GPU multi-launch system.
```diff
@@ -383,6 +440,7 @@ struct ThreadedStreamLauncher : public StreamLauncher<Datas, ThreadedStreamLaunc
 private:
   bool joined = false, started = false, done = false;
+  std::size_t work_i = 0;
```
Reconsider the use of shared `work_i` variable

The addition of the `work_i` member variable introduces shared mutable state in a multi-threaded environment. This could lead to race conditions and unexpected behavior. Consider one of the following alternatives:

1. Make `work_i` a local variable within the `get_work` method if its state doesn't need to be preserved between calls.
2. If the state needs to be preserved, use a thread-local variable:

```cpp
std::pair<SFP, std::size_t> get_work() {
  static thread_local std::size_t work_i = 0;
  // ... rest of the method ...
}
```

3. If shared state is necessary, ensure all access to `work_i` is properly synchronized, preferably by keeping all operations on it within the locked section of `get_work`.
These alternatives would help prevent potential race conditions and improve the thread safety of the code.
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (5)
inc/mkn/gpu/rocm.hpp (1)
208-210: LGTM: Improved readability in alloc_managed function

The introduction of the `bytes` variable enhances code clarity by separating the size calculation from the `hipMallocManaged` call. This change improves readability and maintainability without altering the function's behavior. Consider adding a comment explaining the purpose of the `bytes` variable for even better clarity:

```diff
+  // Calculate total bytes to allocate
   auto const bytes = size * sizeof(T);
   KLOG(TRC) << "GPU alloced: " << size * sizeof(T);
   MKN_GPU_ASSERT(hipMallocManaged((void**)&p, bytes));
```
test/any/async_streaming.cpp (1)

142-142: Define expected values as constants for clarity

The comparison `if (vals != std::vector<size_t>{3, 12})` uses hardcoded magic numbers, which can hinder code readability. Consider defining these values as named constants or adding comments to explain their significance.

inc/mkn/gpu/multi_launch.hpp (3)
61-63: Simplify group index calculation

The `group_idx_modulo` function can be simplified. Currently, it calculates the group index using:

```cpp
return ((i - (i % gs)) / gs);
```

This expression is equivalent to integer division `i / gs` for integers. You can simplify the function to:

```cpp
return i / gs;
```

This makes the code clearer and may improve performance slightly.
61-63: Avoid passing primitive types by const reference

In the `group_idx_modulo` function, the parameters `gs` and `i` are passed as `const&`. Since `std::size_t` is a primitive type, it's more efficient to pass them by value:

```cpp
std::size_t group_idx_modulo(std::size_t gs, std::size_t i) { return i / gs; }
```

This change avoids unnecessary indirection and aligns with best practices.
361-365: Avoid passing `std::size_t` by const reference

In the `host_group_mutex` method, `group_size` is passed as `std::size_t const&`. Since `std::size_t` is a primitive type, it's more efficient to pass it by value:

```cpp
This& host_group_mutex(std::size_t group_size, Fn&& fn) {
  // ...
}
```

This change eliminates unnecessary indirection and follows C++ best practices for passing primitive types.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
- inc/mkn/gpu/multi_launch.hpp (8 hunks)
- inc/mkn/gpu/rocm.hpp (2 hunks)
- test/any/async_streaming.cpp (1 hunks)
🧰 Additional context used
🔇 Additional comments (4)
inc/mkn/gpu/rocm.hpp (2)

48-51: LGTM: Improved macro formatting

The reformatting of the `MKN_GPU_ASSERT` macro enhances code readability without altering its functionality. This change aligns with best practices for macro definitions.

Line range hint 1-458: Overall assessment: Minor improvements to code quality

The changes in this file, while minor, contribute positively to the overall code quality and readability. They align well with the broader objectives of the pull request to enhance the GPU multi-launch framework. No functional changes or potential issues were identified in this file.
test/any/async_streaming.cpp (1)

196-201: Refactor `main` to better identify test failures

As previously mentioned, returning the sum of test results can make it difficult to identify which specific test failed. Consider using bitwise operations or handling each test result individually to improve failure detection.
inc/mkn/gpu/multi_launch.hpp (1)

Line range hint 413-437: Resolution of previous concurrency issue with `work_i`

The potential race condition with the shared variable `work_i`, as previously noted in past reviews, has been addressed. Access to `work_i` is now properly synchronized using the `work_` mutex in the `get_work` method, ensuring thread safety.
method, ensuring thread safety.Also applies to: 471-471
```cpp
std::uint32_t test_threaded_host_group_mutex(std::size_t const& nthreads = 2) {
  using T = double;
  KUL_DBG_FUNC_ENTER;

  std::size_t constexpr group_size = 3;
  std::vector<size_t> vals((C + 1) / group_size);  // 2 values;
  std::vector<ManagedVector<T>> vecs(C + 1, ManagedVector<T>(NUM, 0));
  for (std::size_t i = 0; i < vecs.size(); ++i) std::fill_n(vecs[i].data(), NUM, i);

  ManagedVector<T*> datas(C + 1);
  for (std::size_t i = 0; i < vecs.size(); ++i) datas[i] = vecs[i].data();
  auto views = datas.data();

  ThreadedStreamLauncher{vecs, nthreads}
      .dev([=] __device__(auto const& i) { views[i][mkn::gpu::idx()] += 1; })
      .host([&](auto i) mutable {
        std::this_thread::sleep_for(200ms);
        for (auto& e : vecs[i]) e += 1;
      })
      .host_group_mutex(group_size,  // lambda scope is locked per group
                        [&](auto const i) { vals[group_idx_modulo(group_size, i)] += i; })
      .dev([=] __device__(auto const& i) { views[i][mkn::gpu::idx()] += 3; })();

  if (vals != std::vector<size_t>{3, 12}) return 1;

  std::size_t val = 5;
  for (auto const& vec : vecs) {
    for (auto const& e : vec)
      if (e != val) return 1;
    ++val;
  }

  return 0;
}
```
🛠️ Refactor suggestion
Refactor duplicated code in test functions to improve maintainability

The functions `test_threaded_host_group_mutex` and `test_threaded_host_group_idx` contain similar code blocks for vector initialization and launcher setup. Refactoring these common sections into reusable helper functions can enhance code maintainability and reduce duplication.
Also applies to: 154-192
```cpp
KUL_DBG_FUNC_ENTER;

std::size_t constexpr group_size = 3;
std::vector<size_t> vals((C + 1) / group_size);  // 2 values;
```
Adjust the size calculation of `vals` to handle all group sizes

The current calculation for the size of the `vals` vector may not correctly account for cases where `(C + 1)` is not perfectly divisible by `group_size`. This could lead to incorrect indexing or out-of-bounds errors. Apply this diff to ensure the size of `vals` correctly accommodates all elements:

```diff
-  std::vector<size_t> vals((C + 1) / group_size);  // 2 values;
+  std::vector<size_t> vals((C + group_size - 1) / group_size);
```
```cpp
        for (auto& e : vecs[i]) e += 1;
      })
      .host_group_mutex(group_size,  // lambda scope is locked per group
                        [&](auto const i) { vals[group_idx_modulo(group_size, i)] += i; })
```
💡 Codebase verification
Include `multi_launch.hpp` to define `group_idx_modulo`

The function `group_idx_modulo` is defined in `inc/mkn/gpu/multi_launch.hpp`, but this header is not included in `test/any/async_streaming.cpp`. Please add the necessary include to prevent compilation errors.

- File: test/any/async_streaming.cpp

🔗 Analysis chain

Ensure `group_idx_modulo` is defined to prevent compilation errors

The function `group_idx_modulo` is used but not defined within this file or included from any header. This will result in a compilation error. Please define `group_idx_modulo` or include the appropriate header where it is declared. To verify if `group_idx_modulo` is defined elsewhere, run the following script:
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

```bash
#!/bin/bash
# Description: Search for the definition of 'group_idx_modulo' in the codebase.
# Test: Expect to find the definition of 'group_idx_modulo'
rg -A 5 "group_idx_modulo"
```

Length of output: 1176

Script:

```bash
#!/bin/bash
# Description: Check if 'multi_launch.hpp' is included in 'test/any/async_streaming.cpp'
rg '#include\s+"inc/mkn/gpu/multi_launch\.hpp"' test/any/async_streaming.cpp
```

Length of output: 78
```cpp
std::unique_lock<std::mutex> lock(mutices[Super::group_idx(i)], std::defer_lock);

if (lock.try_lock()) {
  fn(i);
  strat.status[i] = SFS::WAIT;  // done
} else {
  strat.status[i] = SFS::FIRST;  // retry
}
```
Potential data race on `strat.status[i]`

In the `run` method of `StreamHostGroupMutexFunction`, `strat.status[i]` is modified without synchronization. Since `strat.status[i]` may be accessed concurrently by multiple threads in methods like `get_work` and `is_fn_finished`, this can lead to data races and undefined behavior. Consider one of the following solutions:

- Protect access to `strat.status[i]` with a mutex or lock.
- Use `std::atomic<SFS>` for `strat.status[i]` to ensure thread-safe access.

Apply this diff to change `status` to an atomic vector:

```diff
-  std::vector<SFS> status;
+  std::vector<std::atomic<SFS>> status;
```

Ensure all reads and writes to `status[i]` use atomic operations.
Committable suggestion was skipped due to low confidence.