Discussions

Change graph threshold after run was completed

Hi, first of all, thanks for such a great tool; I've enjoyed using it so far. I've looked through the docs but couldn't find whether it is possible to adjust the threshold used for graph generation after the run has completed, without having to compute all embeddings again.
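
One possible approach, sketched under assumptions: if the pairwise similarity scores from the completed run are still available as `(image_a, image_b, score)` rows, the connected components can be regrouped at a new threshold with a union-find pass, and no embeddings need to be recomputed. The `regroup` helper, the row layout, and the file names below are hypothetical and are not fastdup API; whether fastdup offers a built-in option for this is not established here.

```python
from collections import defaultdict


def regroup(pairs, threshold):
    """Group items into connected components, linking only pairs
    whose score is at or above `threshold`."""
    parent = {}

    def find(x):
        # Register unseen items as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b, score in pairs:
        root_a, root_b = find(a), find(b)
        if score >= threshold and root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(list)
    for item in parent:
        groups[find(item)].append(item)
    return sorted(sorted(group) for group in groups.values())


# Made-up similarity rows, for illustration only.
pairs = [
    ("a.jpg", "b.jpg", 0.97),
    ("b.jpg", "c.jpg", 0.85),
    ("d.jpg", "e.jpg", 0.92),
]
loose = regroup(pairs, 0.80)
strict = regroup(pairs, 0.90)
```

If the run stored only pairs above some cut-off, thresholds below that cut-off cannot be recovered this way; only the stored pairs can be regrouped.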

Does FastDup upload any image data or metadata to the cloud to process its results?

Hi, as the title says. I have image data that must be kept local and cannot be uploaded anywhere offsite, for data security and confidentiality reasons.

Running forever fd.run(model_path='dinov2s', cc_threshold=0.8)

I'm using the Google Colab notebook located at <https://colab.research.google.com/github/visual-layer/fastdup/blob/main/examples/dinov2_notebook.ipynb>. When I run `fd.run(model_path='dinov2s', cc_threshold=0.8)`, the console prints the output below and seems to run forever (it has been about an hour). I'm using a CPU runtime in Google Colab with the Oxford-IIIT Pet dataset. How long does this usually take?

    Trying to download dinov2s model from https://vl-company-website.s3.us-east-2.amazonaws.com/model_artifacts/dinov2/dinov2_vits14.onnx to /root/dinov2_vits14.onnx
    Downloading: 10783/? [00:01<00:00, 6307.98KB/s]
    FastDup Software, (C) copyright 2022 Dr. Amir Alush and Dr. Danny Bickson.

Error when using Minio storage

Please help me with this issue.

Environment:

- Ubuntu 22.04
- Fastdup ver 1.64
- Python 3.10
- Minio client already installed, as described in the guide

When running this code:

    import fastdup
    import pandas as pd

    coco_csv = 'coco_minitrain_25k/coco_minitrain_25k/annotations/coco_minitrain2017_15images.csv'
    coco_annotations = pd.read_csv(coco_csv, header=None, names=['filename', 'col_x', 'row_y', 'width', 'height', 'label', 'ext'])
    work_dir = 'fastdup_coco_25k'
    fd = fastdup.create(work_dir, 'minio://myminio/fastapi-minio/Fastdup_Input')
    fd.run(annotations=coco_annotations)

the following error is shown:

    Traceback (most recent call last):
      File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/fastdup/sentry.py", line 132, in inner_function
        ret = func(_args, **kwargs)
      File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/fastdup/fastdup_controller.py", line 581, in run
        if fastdup.run(self._set_fastdup_input(), work_dir=str(self._work_dir), **fastdup_kwargs) != 0:
      File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/fastdup/fastdup_controller.py", line 853, in _set_fastdup_input
        df_annot['filename'] = df_annot[FD.ANNOT_FILENAME].apply(lambda fname: self._input_dir / fname)
      File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/pandas/core/series.py", line 4760, in apply
        ).apply()
      File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/pandas/core/apply.py", line 1207, in apply
        return self.apply_standard()
      File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/pandas/core/apply.py", line 1287, in apply_standard
        mapped = obj._map_values(
      File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/pandas/core/base.py", line 921, in _map_values
        return algorithms.map_array(arr, mapper, na_action=na_action, convert=convert)
      File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/pandas/core/algorithms.py", line 1814, in map_array
        return lib.map_infer(values, mapper, convert=convert)
      File "lib.pyx", line 2917, in pandas._libs.lib.map_infer
      File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/fastdup/fastdup_controller.py", line 853, in <lambda>
        df_annot['filename'] = df_annot[FD.ANNOT_FILENAME].apply(lambda fname: self._input_dir / fname)
    TypeError: unsupported operand type(s) for /: 'str' and 'str'
    Traceback (most recent call last):
      File "/home/anntt/Documents/AI_Testing/Fastdup_Mislabel.py", line 19, in <module>
        fd.run(annotations=coco_annotations)
      File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/fastdup/engine.py", line 157, in run
        return super().run(annotations=annotations, input_dir=input_dir, subset=subset, data_type=data_type,
      File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/fastdup/sentry.py", line 138, in inner_function
        raise ex
      File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/fastdup/sentry.py", line 132, in inner_function
        ret = func(_args, **kwargs)
      File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/fastdup/fastdup_controller.py", line 581, in run
        if fastdup.run(self._set_fastdup_input(), work_dir=str(self._work_dir), **fastdup_kwargs) != 0:
      File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/fastdup/fastdup_controller.py", line 853, in _set_fastdup_input
        df_annot['filename'] = df_annot[FD.ANNOT_FILENAME].apply(lambda fname: self._input_dir / fname)
      File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/pandas/core/series.py", line 4760, in apply
        ).apply()
      File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/pandas/core/apply.py", line 1207, in apply
        return self.apply_standard()
      File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/pandas/core/apply.py", line 1287, in apply_standard
        mapped = obj._map_values(
      File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/pandas/core/base.py", line 921, in _map_values
        return algorithms.map_array(arr, mapper, na_action=na_action, convert=convert)
      File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/pandas/core/algorithms.py", line 1814, in map_array
        return lib.map_infer(values, mapper, convert=convert)
      File "lib.pyx", line 2917, in pandas._libs.lib.map_infer
      File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/fastdup/fastdup_controller.py", line 853, in <lambda>
        df_annot['filename'] = df_annot[FD.ANNOT_FILENAME].apply(lambda fname: self._input_dir / fname)
    TypeError: unsupported operand type(s) for /: 'str' and 'str'

Please help me check this issue. Thanks and best regards.

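The `TypeError` in the Minio report above comes from applying `/` to two plain strings: that operator joins `pathlib` path objects, not `str` values. The following is a minimal sketch of the failure and of a join that tolerates a string base such as a `minio://` URI. `join_input` is a hypothetical helper written for illustration, not fastdup code, and it is not a confirmed fix for the library.

```python
from pathlib import PurePosixPath


def join_input(input_dir, fname):
    """Join a base location and a file name.

    String bases (for example remote URIs) are joined as text;
    path objects use the `/` operator.
    """
    if isinstance(input_dir, str):
        return input_dir.rstrip("/") + "/" + fname
    return input_dir / fname


# Reproduce the error from the traceback: `/` is undefined for str / str.
base = "minio://myminio/fastapi-minio/Fastdup_Input"
try:
    base / "image.jpg"
    message = None
except TypeError as error:
    message = str(error)

remote = join_input(base, "image.jpg")
local = join_input(PurePosixPath("/data/images"), "image.jpg")
```

Here `remote` is a string and `local` is a path object, so each keeps the type of its base location.
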
Adding AWS Lambda tutorial to Documentation.

Hello. I've been using fastdup for months, and thank you for this project. I saw that fastdup supports S3 and wondered whether I could run it on AWS Lambda. I tried it out, and after some struggling I was able to do it. I'm now going to write a Medium article on how to deploy a container from ECR to AWS Lambda using an amazonlinux base image (a Lambda base image is usually used for containers, but fastdup, for example, is not available on the Lambda image, so I had to use the amazonlinux base image). I used fastdup in this project. If it would be helpful to other people like me, I'm ready to share it with you once it's done, as a contribution to the community.

Fastdup version check: warning vs. RuntimeError

The RuntimeError raised by the version check in `__init__.py` forces users to upgrade to the newest version of Fastdup, which may not always be compatible with other versions of libraries that the user has intentionally selected for other reasons. Because you release new versions of Fastdup so frequently, this error is encountered again very quickly even after the user resolves his or her dependency issues. I propose switching this version check to a warning rather than a RuntimeError. My temporary fix right now is to catch the RuntimeError during imports, but I think a warning is more appropriate. <https://github.com/visual-layer/fastdup/blob/156a8521741fc9bfce8cceab34d56306ebb0d958/fastdup/__init__.py#L52>
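
To make the proposal concrete, here is a minimal sketch of a version check that warns by default and raises only on request. The function name `check_version`, its parameters, and the message text are assumptions for illustration; this is not the code in fastdup's `__init__.py`.

```python
import warnings


def check_version(installed, latest, strict=False):
    """Return True when the versions match.

    On a mismatch, emit a UserWarning and return False,
    or raise RuntimeError when `strict` is True.
    """
    if installed == latest:
        return True
    message = (
        f"installed version {installed} differs from latest release {latest}; "
        "consider upgrading"
    )
    if strict:
        raise RuntimeError(message)
    warnings.warn(message, UserWarning, stacklevel=2)
    return False
```

With a warning, users who pin an older release for compatibility can keep importing the package, and can silence the message through the standard `warnings` filters if they choose.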

Ignore directory name patterns.

I'm running `fastdup` on my Synology DS220+ NAS. I use Synology Photos to upload my photos to the NAS, but it creates thumbnails of videos and photos inside an `@eaDir` directory. For example:

1. `/Photos/MobileBackup/iPhone/2021/12/IMG_9214.JPG`
2. `/Photos/MobileBackup/iPhone/2021/12/@eaDir/IMG_9214.JPG/SYNOPHOTO_THUMB_SM.jpg`

Is there a way to specify a list of directories not to look into? Or maybe even a regular expression, for example to skip files whose name contains "SYNOPHOTO_THUMB_"?
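
One workaround that does not depend on any fastdup option is to build the file list yourself and leave out the unwanted directories. The sketch below is an assumption-laden illustration: `list_images`, its parameters, and the extension set are hypothetical, and whether the resulting list can be passed to fastdup as input should be checked against its documentation.

```python
import os
import re

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def list_images(root, skip_dirs=("@eaDir",), skip_name=r"SYNOPHOTO_THUMB_"):
    """Collect image paths under `root`, skipping excluded directories
    and files whose name matches the `skip_name` regular expression."""
    pattern = re.compile(skip_name)
    kept = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into excluded directories.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            extension = os.path.splitext(name)[1].lower()
            if extension in IMAGE_EXTENSIONS and not pattern.search(name):
                kept.append(os.path.join(dirpath, name))
    return sorted(kept)
```

Pruning `dirnames` in place means the thumbnail tree is never traversed, which also saves time on a NAS with many photos.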

How to define duplicate images, or how to select a threshold?

Thank you for your project. I recently worked on a batch of data extracted from video captured by fixed cameras (for object detection tasks, sampled at random dates and times with large intervals). Visualization reveals that this batch has a highly consistent background, with only a few differing targets (people, motor vehicles, non-motor vehicles). In fastdup, the similarity score for such images is usually above 0.9. The background will affect the model: it may learn background-specific distribution information, and generalization may suffer because of this shortcut (sorry, I haven't had time to run a comparison experiment; I may be able to test it within a week). Have you ever done a similar experiment? Perhaps the definition of duplicate images needs to be extended a bit for object detection tasks?

I also used imagedups for a comparative test (<https://github.com/chinalu/imagedups>, with default parameters). Of the 20,000 images, 515 were determined to be duplicates and removed. The top 515 in fastdup scored 0.95 or more. How should I choose the threshold? In other experiments, such as OCR and license plate detection data, I found that 0.95 is not a good threshold. Looking forward to your reply, thank you.
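
One data-driven way to approach the threshold question, offered as a sketch rather than a recommendation: count how many pairs would be flagged at several candidate thresholds, then inspect a few pairs just above and just below each candidate. The helper `pairs_above` and the scores below are made up for illustration and do not come from fastdup output.

```python
def pairs_above(scores, thresholds):
    """Map each candidate threshold to the number of pair scores at or above it."""
    return {t: sum(1 for s in scores if s >= t) for t in thresholds}


# Made-up similarity scores for six image pairs.
scores = [0.99, 0.97, 0.96, 0.93, 0.91, 0.88]
counts = pairs_above(scores, [0.90, 0.95, 0.98])
```

A sharp change in the count between two neighbouring thresholds marks a range worth inspecting by eye, and the right cut-off may differ between datasets such as fixed-camera footage and OCR crops.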

Welcome to fastdup V1.0!

Welcome! We are Visual-Layer, the creators of Fastdup. We'd be happy to answer any questions and to discuss support topics and feature requests. We are also highly available on the [Fastdup Slack channel](https://join.slack.com/t/visualdatabase/shared_invite/zt-19jaydbjn-lNDEDkgvSI1QwbTXSY6dlA).