feature: HPCToolkit data reduction + reading parallel profiles from single application run #141

Open
wants to merge 5 commits into develop from hpctoolkit-data-reduction-parallel-profiles

Conversation

draganaurosgrbic

No description provided.

@ilumsden ilumsden added the area-readers, priority-normal, status-ready-for-review, and type-feature labels on Jul 17, 2024
@pearce8 pearce8 requested a review from dyokelson August 26, 2024 17:05
Collaborator

@dyokelson dyokelson left a comment


Hi @draganaurosgrbic, I've left a couple of specific comments already, but there are some high-level things that should be addressed before I do a more thorough re-review.

  • Please add comments to significant functions, both at the declaration level and sporadically throughout the body (even for internal functions), so another developer can make changes in the correct place. This is especially important for functions with a lot of functionality, such as _parse_context(), _read_profiles_metadata(), _read_summary_profile(), and _read_cct(). For example, it looks like _parse_context() is also where some of the data reduction happens; that would be important to comment and note for others or your future self.
  • Please add more tests that turn the data reduction flags on and off, e.g., _parallel_profiles_mode and the flags that filter out MPI, OpenMP, and CUDA. Ideally there would be at least one test for every flag; I realize many of them share similar logic, but it would be best to be as thorough as possible here. At a minimum we should have one test that filters out one of the library calls, and probably one for the parallel profiles mode (a rough sketch follows this list).
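
To make the second bullet concrete, here is a rough, illustrative sketch of such tests. It assumes pytest, reuses the hpctoolkit-gamess database referenced later in this thread, and uses placeholder assertions; real tests would compare the reduced output against an unreduced read of the same database.

import hatchet as ht


def test_exclude_mpi_function_details():
    # Flag name taken from the from_hpctoolkit_latest signature quoted later
    # in this conversation.
    gf = ht.GraphFrame.from_hpctoolkit_latest(
        dirname="hatchet/tests/data/hpctoolkit-gamess",
        exclude_mpi_function_details=True,
    )
    # Placeholder check: a real test would assert that MPI implementation
    # details no longer appear among the node names.
    assert len(gf.dataframe) > 0


def test_parallel_profiles_mode():
    # Note: a later comment in this thread reports a KeyError for this
    # database in parallel profiles mode, so a database that actually
    # contains parallel profiles may be needed here.
    gf = ht.GraphFrame.from_hpctoolkit_latest(
        dirname="hatchet/tests/data/hpctoolkit-gamess",
        parallel_profiles_mode=True,
    )
    assert len(gf.dataframe) > 0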

hatchet/graphframe.py (review comment resolved)
hatchet/readers/hpctoolkit_reader_latest.py (review comment resolved)
Collaborator

@michaelmckinsey1 michaelmckinsey1 left a comment


As Dewi said, it would be nice to get as many unit tests as possible; they would also give us a baseline for what to expect from these arguments.

hatchet/graphframe.py (review comment resolved)
min_percentage_of_application_time (int): minimum percentage of application time that nodes in the CCT must have to be imported into Hatchet
min_percentage_of_parent_time (int): minimum percentage of parent time that nodes in the CCT must have to be imported into Hatchet
directory_mapping (dict): Python dictionary that maps a file system location to a name
parallel_profiles_mode (bool): flag indicating whether the reader should extract parallel profiles from the database (true) or the summary profile (false)
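
(For illustration only: a sketch of how these parameters might be passed through GraphFrame.from_hpctoolkit_latest. The parameter names come from the signature quoted in the traceback below; the directory_mapping path and the percentage threshold are made-up values.)

import hatchet as ht

gf = ht.GraphFrame.from_hpctoolkit_latest(
    dirname="hatchet/tests/data/hpctoolkit-gamess",
    # Made-up example: map a file system location to a short display name.
    directory_mapping={"/usr/lib": "system-libraries"},
    # Keep only CCT nodes with at least 5% of total application time.
    min_percentage_of_application_time=5,
    # False reads the summary profile; True reads the per-rank parallel profiles.
    parallel_profiles_mode=False,
)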
Collaborator


Running with parallel_profiles_mode:

gf = ht.GraphFrame.from_hpctoolkit_latest(
        dirname=f"hatchet/tests/data/hpctoolkit-gamess",
        parallel_profiles_mode=True,
    )

I get an error:

{
	"name": "KeyError",
	"message": "1195",
	"stack": "---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[9], line 1
----> 1 gf = ht.GraphFrame.from_hpctoolkit_latest(
      2         dirname=f\"hatchet/tests/data/hpctoolkit-gamess\",
      3         #max_depth=10,
      4         parallel_profiles_mode=True,
      5     )

File ~/Documents/repos/hatchet/hatchet/graphframe.py:181, in GraphFrame.from_hpctoolkit_latest(dirname, directory_mapping, parallel_profiles_mode, max_depth, min_percentage_of_application_time, exclude_mpi_function_details, exclude_openmp_function_details, exclude_cuda_function_details, exclude_system_libraries_source_code, exclude_function_call_lines, exclude_no_source_code_instructions, exclude_instructions, exclude_non_function_nodes, label_function_nodes, metric_names, metric_scopes, summary_metrics, profile_ranks)
    159 # import this lazily to avoid circular dependencies
    160 from .readers.hpctoolkit_reader_latest import HPCToolkitReaderLatest
    162 return HPCToolkitReaderLatest(
    163     dirname,
    164     directory_mapping=directory_mapping,
    165     parallel_profiles_mode=parallel_profiles_mode,
    166     max_depth=max_depth,
    167     min_application_percentage_time=min_percentage_of_application_time,
    168     exclude_mpi_function_details=exclude_mpi_function_details,
    169     exclude_openmp_function_details=exclude_openmp_function_details,
    170     exclude_cuda_function_details=exclude_cuda_function_details,
    171     exclude_system_libraries_source_code=exclude_system_libraries_source_code,
    172     exclude_function_call_lines=exclude_function_call_lines,
    173     exclude_no_source_code_instructions=exclude_no_source_code_instructions,
    174     exclude_instructions=exclude_instructions,
    175     exclude_non_function_nodes=exclude_non_function_nodes,
    176     label_function_nodes=label_function_nodes,
    177     metric_names=metric_names,
    178     metric_scopes=metric_scopes,
    179     summary_metrics=summary_metrics,
    180     profile_ranks=profile_ranks,
--> 181 ).read()

File ~/Documents/repos/hatchet/hatchet/readers/hpctoolkit_reader_latest.py:781, in HPCToolkitReaderLatest.read(self)
    778 if self._parallel_profiles_mode:
    779     self._read_profiles_metadata()
--> 781 return self._read_cct()

File ~/Documents/repos/hatchet/hatchet/readers/hpctoolkit_reader_latest.py:669, in HPCToolkitReaderLatest._read_cct(self)
    660 (szChildren, pChildren, ctxId, entryPoint) = safe_unpack(
    661     \"<QQLH\",
    662     meta_db,
   (...)
    665     szEntryPoint,
    666 )
    668 if entryPoint == 1:
--> 669     self._total_execution_time = self._summary_profile[ctxId][
    670         self._time_metric
    671     ]
    673     node = self._store_cct_node(
    674         ctxId,
    675         {\"type\": \"entry\", \"name\": \"entry\"},
    676     )
    677     self._parse_context(
    678         pChildren,
    679         szChildren,
    680         node,
    681         meta_db,
    682     )

KeyError: 1195"
}

If this is expected, can we add an error message for the case where parallel profiles are not available? Something like: "Parallel profiles are not available for this database."
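
A minimal sketch of the kind of guard that could produce such a message; only the looked-up values come from the traceback above, while the helper name, exception type, and wording are suggestions:

def lookup_total_execution_time(summary_profile, ctx_id, time_metric):
    # Wrap the dictionary access that currently raises a bare KeyError.
    try:
        return summary_profile[ctx_id][time_metric]
    except KeyError as err:
        raise RuntimeError(
            "Parallel profiles are not available for this database "
            f"(no summary entry for context {ctx_id})."
        ) from err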

hatchet/graphframe.py (outdated; review comment resolved)
@draganaurosgrbic draganaurosgrbic force-pushed the hpctoolkit-data-reduction-parallel-profiles branch from 77abf3a to f11929d on September 12, 2024 at 18:37