Compare Multiple Image Pairs (Like Millions) #33

Open
RoberAlcaraz opened this issue Oct 1, 2024 · 4 comments
Comments

@RoberAlcaraz

Hi!
First of all, the work is amazing.
I am currently working on a project that involves comparing millions of image pairs (around 10M). Even though the demo is very useful, running it pair by pair takes far too long for this many pairs.
Is it possible to compare multiple pairs at once, for example through batch processing or parallelization? If you have any recommendations on how to approach this, I would greatly appreciate your input.
Thank you in advance :)
Rob

@iago-suarez
Collaborator

iago-suarez commented Oct 2, 2024 via email

@RoberAlcaraz
Author

Hi Iago,

Thank you for your quick reply!

Indeed, I am trying to compare multiple images to detect whether they show the same individual, based on the points and lines of the pattern present in each image. This is why I have so many pairs to compare. Specifically, I have between 4,000 and 5,000 images, which results in a total of $\frac{n(n-1)}{2} \approx 10{,}000{,}000$ image pairs to evaluate.
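(For instance, with $n = 4{,}500$ images that is $\frac{4500 \cdot 4499}{2} \approx 10.1$ million pairs.)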

I appreciate your suggestion of pre-computing the wireframe for each image individually. Could you possibly provide an example or some guidance on how I could implement this pre-computation, or on how to use batching effectively to speed up the comparison process?

Any detailed example or reference would be immensely helpful.

Thank you again for your assistance! :)

Best,
Roberto

@iago-suarez
Collaborator

iago-suarez commented Oct 3, 2024 via email

@RoberAlcaraz
Author

Hi Iago,

I checked the model files and made some modifications to achieve the functionality I wanted:

Modifications

  • wireframe.py: In the _forward method, I introduced the h5py library to enable saving/loading of precomputed results (keypoints, scores, descriptors) to/from an HDF5 file. I added the save_path and image_id arguments to check if precomputed data exists for an image, preventing redundant computation.
import h5py  # Added for HDF5 support

def _forward(self, data, save_path=None, image_id=None):  # Added save_path and image_id
    # Check if precomputed data is available
    if save_path and image_id:  # New block to load precomputed data
        with h5py.File(save_path, "a") as hdf5_file:
            if image_id in hdf5_file:
                grp = hdf5_file[image_id]
                if "lines" in grp and "line_scores" in grp:
                    # Read each dataset back into memory ([()]) and move it to the
                    # same device as the input image
                    device = data["image"].device
                    return {
                        "image": data["image"],
                        "keypoints": torch.from_numpy(grp["keypoints"][()]).to(device),
                        "keypoint_scores": torch.from_numpy(grp["keypoint_scores"][()]).to(device),
                        "descriptors": torch.from_numpy(grp["descriptors"][()]).to(device),
                        "lines": torch.from_numpy(grp["lines"][()]).to(device),
                        "line_scores": torch.from_numpy(grp["line_scores"][()]).to(device),
                        "pl_associativity": torch.from_numpy(grp["pl_associativity"][()]).to(device),
                        "lines_junc_idx": torch.from_numpy(grp["lines_junc_idx"][()]).to(device),
                    }

    # Original processing code here...

    # Save the computed lines and wireframe if `save_path` and `image_id` are provided
    if save_path and image_id:  # New block to save computed data to HDF5
        with h5py.File(save_path, "a") as hdf5_file:
            grp = hdf5_file.require_group(image_id)
            grp.create_dataset("image", data=data["image"].cpu().detach().numpy(), compression="gzip")
            grp.create_dataset("keypoints", data=all_points.cpu().detach().numpy(), compression="gzip")
            grp.create_dataset("keypoint_scores", data=all_scores.cpu().detach().numpy(), compression="gzip")
            grp.create_dataset("descriptors", data=all_descs.cpu().detach().numpy(), compression="gzip")
            grp.create_dataset("lines", data=lines.cpu().detach().numpy(), compression="gzip")
            grp.create_dataset("line_scores", data=line_scores.cpu().detach().numpy(), compression="gzip")
            grp.create_dataset("pl_associativity", data=pl_associativity.cpu().detach().numpy(), compression="gzip")
            grp.create_dataset("lines_junc_idx", data=lines_junc_idx.cpu().detach().numpy(), compression="gzip")
  • two_view_pipeline.py: I updated the _forward method so that it receives the precomputed features through the pred argument and only runs the matcher, filter, and solver components, instead of re-running the feature extractor for every pair.
import h5py  # Added for HDF5 support

def _forward(self, data, pred):  # pred already carries the precomputed features, so the extractor is skipped
    # Run the matcher if it exists in the configuration
    if self.conf.matcher.name:
        pred = {**pred, **self.matcher({**data, **pred})}

    # Run filter and solver if they are part of the pipeline configuration
    if self.conf.filter.name:
        pred = {**pred, **self.filter({**data, **pred})}

    if self.conf.solver.name:
        pred = {**pred, **self.solver({**data, **pred})}

    return pred

Usage

  1. To compute wireframes: This example demonstrates how to compute and save wireframes, skipping images that have already been processed:
for img_path in img_paths:
    image_id = f"{os.path.basename(os.path.dirname(img_path))}/{os.path.basename(img_path)}"
    # Open in append mode ("a") so results from previous runs are kept, and close the
    # handle again before _forward re-opens the file to write the new group
    with h5py.File(wireframe_results_path, "a") as hdf5_file:
        if image_id in hdf5_file:
            print(f"Skipping {image_id}, already processed.")
            continue
    data = {"image": numpy_image_to_torch(cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)).to(device)[None]}
    wireframe_result = wireframe._forward(data, save_path=wireframe_results_path, image_id=image_id)
  2. To compute point and line matches between pairs: This example shows how the precomputed features are passed to the TwoViewPipeline for matching (loading them from the HDF5 file into the precomputed_features dictionary is sketched after this example):
def compute_point_and_line_matches(pipeline, precomputed_features, results, img_id0, img_id1):
    # The pipeline only needs the two images plus their precomputed features
    data = {
        "image0": precomputed_features[img_id0]["image"],
        "image1": precomputed_features[img_id1]["image"],
    }
    # Copy the cached features and rename their keys with the "0"/"1" suffixes
    # expected by the two-view pipeline
    pred0, pred1 = precomputed_features[img_id0].copy(), precomputed_features[img_id1].copy()
    del pred0["image"], pred1["image"]
    pred = {**{k + "0": v for k, v in pred0.items()}, **{k + "1": v for k, v in pred1.items()}}
    # Run only the matcher/filter/solver stages on the precomputed features
    match_result = pipeline._forward(data, pred)
    results.append((img_id0, img_id1, match_result["match_scores0"], match_result["line_match_scores0"]))
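
For reference, here is a rough sketch of how I fill the precomputed_features dictionary from the HDF5 file and then loop over all pairs. The load_precomputed_features helper and the image_ids list are just illustrative names (image_ids being the same IDs used when saving), and pipeline is the TwoViewPipeline instance:

import itertools

import h5py
import torch

def load_precomputed_features(save_path, image_ids, device="cpu"):
    # Illustrative helper: read back the datasets written by the modified
    # wireframe._forward and convert them to torch tensors.
    # Loading everything into memory is fine for a few thousand images;
    # for much larger sets the features should be loaded lazily instead.
    features = {}
    with h5py.File(save_path, "r") as hdf5_file:
        for image_id in image_ids:
            grp = hdf5_file[image_id]
            features[image_id] = {
                key: torch.from_numpy(grp[key][()]).to(device)
                for key in ("image", "keypoints", "keypoint_scores", "descriptors",
                            "lines", "line_scores", "pl_associativity", "lines_junc_idx")
            }
    return features

precomputed_features = load_precomputed_features(wireframe_results_path, image_ids, device)

# n(n-1)/2 unordered pairs; with millions of pairs this is the loop worth
# parallelizing, e.g. by splitting the pair list across processes or GPUs
results = []
for img_id0, img_id1 in itertools.combinations(image_ids, 2):
    compute_point_and_line_matches(pipeline, precomputed_features, results, img_id0, img_id1)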

Thank you once again!
Roberto
