Add more Pandas-based Checkpointing and Save/Load Functions #16

Draft
wants to merge 10 commits into
base: develop

@ilumsden ilumsden commented Feb 9, 2022

Follow up to hatchet/hatchet#272

This PR adds the following new functions for checkpointing GraphFrames (i.e., saving to/reading from files):

  1. to_pickle and from_pickle (Pickle Format)
  2. to_csv and from_csv
  3. to_excel and from_excel

These functions wrap the corresponding read/write functions from Pandas. Many of those Pandas functions require additional dependencies, which will not become requirements of Hatchet. If the dependency for a particular function is not installed, Pandas will raise an ImportError.

This PR also adds new save and load functions to the GraphFrame class to simplify checkpointing. Both functions require only one argument: the filename. If the filename has a recognized extension, the corresponding format will be used. Otherwise, the optional fileformat parameter can be provided to specify the desired format. If the dependencies needed for that format are not installed, the ImportError raised by Pandas will be caught and all remaining formats will be attempted. If no supported format succeeds, an IOError will be raised.
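The dispatch described above could look roughly like the following. This is a minimal sketch, not the code in this PR: the `EXTENSION_TO_FORMAT` table, the `resolve_format` helper, and the `writers` argument are all hypothetical names chosen for illustration, and the order in which the remaining formats are tried is an assumption.

```python
import os

# Hypothetical extension-to-format table; the PR does not list
# the extensions it actually recognizes.
EXTENSION_TO_FORMAT = {
    ".pkl": "pickle",
    ".csv": "csv",
    ".xlsx": "excel",
}


def resolve_format(filename, fileformat=None):
    """Prefer a recognized extension, else fall back to the explicit fileformat."""
    ext = os.path.splitext(filename)[1].lower()
    return EXTENSION_TO_FORMAT.get(ext, fileformat)


def save(obj, filename, writers, fileformat=None, **kwargs):
    """Write obj with the resolved format first, then try the remaining formats.

    writers maps a format name to a callable(obj, filename, **kwargs).
    Returns the name of the format that succeeded.
    """
    first = resolve_format(filename, fileformat)
    order = [first] if first in writers else []
    order += [fmt for fmt in writers if fmt != first]
    for fmt in order:
        try:
            writers[fmt](obj, filename, **kwargs)
            return fmt
        except ImportError:
            # The optional dependency for this format is missing; try the next one.
            continue
    raise IOError("no supported format could write {}".format(filename))
```

Passing the writers in as a dictionary keeps the sketch independent of Pandas; in practice each entry would be one of the `to_*` functions listed above.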

All the new functions added in this PR accept keyword arguments (i.e., **kwargs). These arguments are passed to the Pandas function that is eventually invoked to read/write the file. Docstrings will be added that link to the documentation of the associated Pandas functions.
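As an illustration of the keyword-argument forwarding, here is a small sketch built on `DataFrame.to_csv` and `pandas.read_csv`. The `FrameHolder` class is a hypothetical stand-in for an object that owns a DataFrame; it is not Hatchet's GraphFrame, and the real functions may do additional work around the Pandas call.

```python
import io

import pandas as pd


class FrameHolder:
    """Hypothetical stand-in for an object that owns a DataFrame."""

    def __init__(self, dataframe):
        self.dataframe = dataframe

    def to_csv(self, target, **kwargs):
        # Keyword arguments are forwarded unchanged to DataFrame.to_csv.
        self.dataframe.to_csv(target, **kwargs)

    @classmethod
    def from_csv(cls, source, **kwargs):
        # Keyword arguments are forwarded unchanged to pandas.read_csv.
        return cls(pd.read_csv(source, **kwargs))


holder = FrameHolder(pd.DataFrame({"name": ["main", "foo"], "time": [1.5, 0.5]}))

# Write to an in-memory buffer with a custom separator and no index column.
buf = io.StringIO()
holder.to_csv(buf, sep=";", index=False)

# Read it back, passing the same separator through to pandas.read_csv.
buf.seek(0)
restored = FrameHolder.from_csv(buf, sep=";")
```

Because the wrapper does not inspect the keyword arguments, any option documented for the underlying Pandas function is available to the caller without further changes to the wrapper.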

Other file formats (e.g., Parquet and Feather) will be added in future PRs.

@ilumsden ilumsden added the following labels on Feb 9, 2022: area-readers (Issues and PRs involving Hatchet's data readers), area-writers (Issues and PRs involving Hatchet's data writers), priority-normal (Normal priority issues and PRs), status-work-in-progress (PR is currently being worked on), type-feature (Requests for new features or PRs which implement new features)
@ilumsden ilumsden self-assigned this Feb 9, 2022

ilumsden commented Feb 9, 2022

Originally from hatchet/hatchet on May 18, 2021

I might wait until hatchet/hatchet#377 is merged before marking this PR ready-for-review. This PR adds some global-configuration-style data that the save and load functions use to determine the file format from the file extension. If this data were placed in the global configuration system, users would be able to add "rules" telling those functions to save/load files with non-standard extensions using a certain file format.
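To make the idea of user-defined "rules" concrete, here is a minimal sketch of an extension registry. The `FormatRules` class, its method names, and the default table are hypothetical; they are not part of this PR or of the configuration system in hatchet/hatchet#377.

```python
import os

# Hypothetical defaults; the PR does not list its recognized extensions.
_DEFAULT_RULES = {".pkl": "pickle", ".csv": "csv", ".xlsx": "excel"}


class FormatRules:
    """Maps file extensions to format names and lets users add their own rules."""

    def __init__(self, defaults=None):
        self._rules = dict(_DEFAULT_RULES if defaults is None else defaults)

    def add_rule(self, extension, fileformat):
        # Accept "dat" or ".dat" and store the extension in lowercase.
        ext = extension if extension.startswith(".") else "." + extension
        self._rules[ext.lower()] = fileformat

    def format_for(self, filename):
        # Returns None when the extension has no rule.
        ext = os.path.splitext(filename)[1].lower()
        return self._rules.get(ext)


rules = FormatRules()
rules.add_rule("dat", "csv")  # treat non-standard ".dat" files as CSV
```

With a rule like this registered, a call such as `rules.format_for("profile.dat")` resolves to the CSV format instead of falling through to the fileformat parameter.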

ilumsden commented Feb 9, 2022

Originally from May 22, 2021:

Implementation and testing are now complete. This PR depends on hatchet/hatchet#272, so it definitely shouldn't be reviewed or merged until hatchet/hatchet#272 is merged. I also want to integrate hatchet/hatchet#377, but I might do that in a separate PR.

@slabasan slabasan force-pushed the develop branch 12 times, most recently from 74d7f3e to 837e5e3 Compare August 9, 2022 04:48
@slabasan slabasan force-pushed the develop branch 5 times, most recently from b461833 to 48d44ce Compare August 9, 2022 05:03