flock failure in /autofs on ORNL Crusher #510

Open
adammoody opened this issue Oct 26, 2022 · 0 comments

When SCR is directed to write to /autofs, the run reports an error from flock() when accessing the halt file:

SCR v3.0.0: rank 0 on crusher180: NPROCS=8
SCR v3.0.0: rank 0 on crusher180: NNODES=1
SCR v3.0.0: rank 0 on crusher180: Stopping all async flush operations
SCR v3.0.0 ERROR: rank 0 on crusher180: Failed to acquire file lock on /autofs/proj/user/job/.scr/halt.scr: flock(23, 1) errno=524 Unknown error 524 @ /path/to/scr/src/scr_io.c:173
SCR v3.0.0: rank 0 on crusher180: scr_fetch_latest: return code 1, 0.000708 secs

This flock() error does not occur when SCR writes to the parallel file system.

Running SCR on an NFS file system is known to be problematic. The library and the commands share some common files and they use file locking to control concurrent access. File locking on NFS is not always supported.
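
To check whether a given directory supports flock() at all, a small standalone probe may help. The sketch below is a generic POSIX/Linux example, not SCR code; the helper name probe_flock is hypothetical. It requests a shared lock (LOCK_SH, which is the value 1 seen in the flock(23, 1) log line on Linux) and prints the failure in a similar format. On Linux, errno 524 appears to be the kernel-internal ENOTSUPP value, which strerror() does not know, hence "Unknown error 524".

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper (not part of SCR): try to take and release a
 * shared flock() on the given path.
 * Returns 0 on success, otherwise the errno value of the failing call. */
int probe_flock(const char* path)
{
    /* Create the file if it does not exist yet. */
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0) {
        return errno;
    }

    int rc = 0;
    if (flock(fd, LOCK_SH) != 0) {
        /* Save errno before any other call can overwrite it. */
        rc = errno;
        fprintf(stderr, "flock(%d, %d) errno=%d %s\n",
                fd, LOCK_SH, rc, strerror(rc));
    } else {
        /* Lock acquired, so release it again. */
        flock(fd, LOCK_UN);
    }

    close(fd);
    return rc;
}
```

Calling probe_flock() on a scratch file under /autofs and again on the parallel file system should show whether the failure is specific to the mount rather than to SCR.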

If the underlying file system supports locking with fcntl, it may help to switch lock methods with -DSCR_FILE_LOCK=fcntl:

https://scr.readthedocs.io/en/v3.0/users/build.html#cmake
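
For reference, whole-file locking with fcntl() looks roughly like the sketch below. This is a generic POSIX example under stated assumptions, not the code SCR uses when built with -DSCR_FILE_LOCK=fcntl; the helper name lock_with_fcntl is hypothetical. Running something like it against a file in /autofs is one way to see whether the file system accepts fcntl locks before rebuilding.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not part of SCR): apply a blocking whole-file
 * lock with fcntl().
 * type is F_RDLCK (shared), F_WRLCK (exclusive), or F_UNLCK (release).
 * F_RDLCK needs fd open for reading, F_WRLCK needs fd open for writing.
 * Returns 0 on success, otherwise the errno value from fcntl(). */
int lock_with_fcntl(int fd, short type)
{
    struct flock lck;
    memset(&lck, 0, sizeof(lck));
    lck.l_type   = type;
    lck.l_whence = SEEK_SET;
    lck.l_start  = 0;
    lck.l_len    = 0; /* 0 means "through end of file" */

    int rc;
    do {
        /* F_SETLKW waits until the lock can be granted;
         * retry if a signal interrupts the wait. */
        rc = fcntl(fd, F_SETLKW, &lck);
    } while (rc == -1 && errno == EINTR);

    return (rc == 0) ? 0 : errno;
}
```

Note that fcntl() locks are owned by the process and are dropped when the process closes any descriptor for the file, which differs from flock() semantics.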
