Discussions

Analyzing Object Detection Dataset raises an error

First, the DataFrame presented in the example does not have the correct column names: `img_filename` vs. `filename`, `bbox_x` vs. `col_x`, and more. Second, when loading the dataset using `fd.run(annotations=coco_annotations, overwrite=True)`, a RuntimeError ("Fastdup execution failed") is raised:

    ---------------------------------------------------------------------------
    RuntimeError                              Traceback (most recent call last)
    Cell In[8], line 3
          1 fd = fastdup.create(work_dir="/home/ohad/dataManager/",
          2                     input_dir="images/train2017/")
    ----> 3 fd.run(annotations=coco_annotations, overwrite=True)

    File ~/miniconda3/envs/dm_test_3_10/lib/python3.10/site-packages/fastdup/engine.py:157, in Fastdup.run(self, input_dir, annotations, embeddings, subset, data_type, overwrite, model_path, distance, nearest_neighbors_k, threshold, outlier_percentile, num_threads, num_images, verbose, license, high_accuracy, cc_threshold, **kwargs)
        154 fastdup_func_params['model_path'] = model_path
        155 fastdup_func_params.update(kwargs)
    --> 157 return super().run(annotations=annotations, input_dir=input_dir, subset=subset, data_type=data_type,
        158                    overwrite=overwrite, embeddings=embeddings, **fastdup_func_params)

    File ~/miniconda3/envs/dm_test_3_10/lib/python3.10/site-packages/fastdup/sentry.py:144, in v1_sentry_handler.<locals>.inner_function(*args, **kwargs)
        142 else:
        143 fastdup_capture_exception(f"V1:{func.__name__}", ex)
    --> 144 raise ex
        146 except Exception as ex:
        147 fastdup_capture_exception(f"V1:{func.__name__}", ex)

    File ~/miniconda3/envs/dm_test_3_10/lib/python3.10/site-packages/fastdup/sentry.py:135, in v1_sentry_handler.<locals>.inner_function(*args, **kwargs)
        133 try:
        134 start_time = time.time()
    --> 135 ret = func(*args, **kwargs)
        136 fastdup_performance_capture(f"V1:{func.__name__}", start_time)
        137 return ret

    File ~/miniconda3/envs/dm_test_3_10/lib/python3.10/site-packages/fastdup/fastdup_controller.py:593, in FastdupController.run(self, input_dir, annotations, subset, embeddings, data_type, overwrite, print_summary, print_vl_datasets_ref, **fastdup_kwargs)
        591 # run fastdup - create embeddings
        592 if fastdup.run(self._set_fastdup_input(), work_dir=str(self._work_dir), **fastdup_kwargs) != 0:
    --> 593 raise RuntimeError('Fastdup execution failed')
        595 #fastdup_convert_to_relpath(self._work_dir, self._filename_prefix)
        596
        597 # post process - map fastdup-id to image (for bbox this is done in self._set_fastdup_input)
        598 if self._dtype == FD.IMG or self._run_mode == FD.MODE_CROP:

    RuntimeError: Fastdup execution failed
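
A quick way to spot this kind of column-name mismatch before calling `fd.run` is to compare the DataFrame's columns against the names the run is expected to need. The sketch below is only an illustration: the expected names are an assumption, copied from the `pd.read_csv(..., names=[...])` call quoted in the MinIO post on this page, and `missing_columns` is a hypothetical helper, not part of fastdup.

```python
# Assumed column names, taken from the pd.read_csv(..., names=[...]) call
# quoted in the MinIO post on this page. Not an official fastdup spec.
EXPECTED_COLUMNS = ["filename", "col_x", "row_y", "width", "height", "label"]


def missing_columns(columns, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from `columns`."""
    present = set(columns)
    return [name for name in expected if name not in present]


# Works on any iterable of names, for example `coco_annotations.columns`.
example = ["img_filename", "bbox_x", "row_y", "width", "height", "label"]
print(missing_columns(example))  # ['filename', 'col_x']
```

If the list is non-empty, renaming the columns (for example with pandas' `DataFrame.rename`) before the run is one possible next step.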

Does fastdup support quicktime video (.mov)?

In the folder to scan, together with some photos coming from different iPhones, I have a bunch of .mov files (again, videos coming from iPhones). When I run() fastdup, I get a warning about zip files being present in the folders to examine. I don't have any zip or tar or any other form of compressed files in my folders. Is this error being triggered by the movs? Does fastdup support .mov videos?

All my HEIC images are marked as invalid?

I'm trying to declutter my photograph collection, following this tutorial (<https://dicksonneoh.com/blog/clean_up_your_digital_life/>) but I'm already failing in the first step. I've installed fastdup on a Raspberry Pi, and it tags all my HEIC files as invalid images (ERROR_LOADING_HEIC_IMAGE). I can't find any reference to that error, but the fact that it can't read any of them seems to point to an issue reading HEIC images, I would say. The images are, of course, valid (I can open them) and were taken with either an iPhone 8 or an iPhone 13. How can I make fastdup read HEIC files? Nothing in the documentation seems to indicate that something needs to be done to do it.

Change graph threshold after run was completed

Hi, first of all, thanks for such a great tool; I've enjoyed using it so far. I've looked through the docs but couldn't find whether it is possible to adjust the threshold used for graph generation after the run has completed, without having to compute all the embeddings again.

Does FastDup upload any image data or metadata to the cloud to process its results?

Hi, as the title says, really. I have image data that needs to be kept local and cannot be uploaded anywhere offsite, for data security and confidentiality reasons.

Running forever fd.run(model_path='dinov2s', cc_threshold=0.8)

I'm using the Google Colab notebook located at: <https://colab.research.google.com/github/visual-layer/fastdup/blob/main/examples/dinov2_notebook.ipynb>

When I run `fd.run(model_path='dinov2s', cc_threshold=0.8)`, the console prints the output below and seems to just run forever (it's been about an hour). I'm using a CPU runtime in Google Colab. How long does this usually take? (I'm using the Oxford-IIIT Pet dataset.)

    Trying to download dinov2s model from https://vl-company-website.s3.us-east-2.amazonaws.com/model_artifacts/dinov2/dinov2_vits14.onnx to /root/dinov2_vits14.onnx
    Downloading: 10783/? [00:01<00:00, 6307.98KB/s]
    FastDup Software, (C) copyright 2022 Dr. Amir Alush and Dr. Danny Bickson.

Error when using MinIO storage

Please help me with this issue.

Environment:

- Ubuntu 22.04
- fastdup 1.64
- Python 3.10
- MinIO client already installed, as described in the guide

When running this code:

    import pandas as pd
    import fastdup

    coco_csv = 'coco_minitrain_25k/coco_minitrain_25k/annotations/coco_minitrain2017_15images.csv'
    coco_annotations = pd.read_csv(coco_csv, header=None, names=['filename', 'col_x', 'row_y', 'width', 'height', 'label', 'ext'])
    work_dir = 'fastdup_coco_25k'
    fd = fastdup.create(work_dir, 'minio://myminio/fastapi-minio/Fastdup_Input')
    fd.run(annotations=coco_annotations)

the following error is shown:

    Traceback (most recent call last):
      File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/fastdup/sentry.py", line 132, in inner_function
        ret = func(*args, **kwargs)
      File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/fastdup/fastdup_controller.py", line 581, in run
        if fastdup.run(self._set_fastdup_input(), work_dir=str(self._work_dir), **fastdup_kwargs) != 0:
      File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/fastdup/fastdup_controller.py", line 853, in _set_fastdup_input
        df_annot['filename'] = df_annot[FD.ANNOT_FILENAME].apply(lambda fname: self._input_dir / fname)
      File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/pandas/core/series.py", line 4760, in apply
        ).apply()
      File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/pandas/core/apply.py", line 1207, in apply
        return self.apply_standard()
      File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/pandas/core/apply.py", line 1287, in apply_standard
        mapped = obj._map_values(
      File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/pandas/core/base.py", line 921, in _map_values
        return algorithms.map_array(arr, mapper, na_action=na_action, convert=convert)
      File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/pandas/core/algorithms.py", line 1814, in map_array
        return lib.map_infer(values, mapper, convert=convert)
      File "lib.pyx", line 2917, in pandas._libs.lib.map_infer
      File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/fastdup/fastdup_controller.py", line 853, in <lambda>
        df_annot['filename'] = df_annot[FD.ANNOT_FILENAME].apply(lambda fname: self._input_dir / fname)
    TypeError: unsupported operand type(s) for /: 'str' and 'str'
    Traceback (most recent call last):
      File "/home/anntt/Documents/AI_Testing/Fastdup_Mislabel.py", line 19, in <module>
        fd.run(annotations=coco_annotations)
      File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/fastdup/engine.py", line 157, in run
        return super().run(annotations=annotations, input_dir=input_dir, subset=subset, data_type=data_type,
      File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/fastdup/sentry.py", line 138, in inner_function
        raise ex
      File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/fastdup/sentry.py", line 132, in inner_function
        ret = func(*args, **kwargs)
      File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/fastdup/fastdup_controller.py", line 581, in run
        if fastdup.run(self._set_fastdup_input(), work_dir=str(self._work_dir), **fastdup_kwargs) != 0:
      File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/fastdup/fastdup_controller.py", line 853, in _set_fastdup_input
        df_annot['filename'] = df_annot[FD.ANNOT_FILENAME].apply(lambda fname: self._input_dir / fname)
      File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/pandas/core/series.py", line 4760, in apply
        ).apply()
      File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/pandas/core/apply.py", line 1207, in apply
        return self.apply_standard()
      File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/pandas/core/apply.py", line 1287, in apply_standard
        mapped = obj._map_values(
      File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/pandas/core/base.py", line 921, in _map_values
        return algorithms.map_array(arr, mapper, na_action=na_action, convert=convert)
      File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/pandas/core/algorithms.py", line 1814, in map_array
        return lib.map_infer(values, mapper, convert=convert)
      File "lib.pyx", line 2917, in pandas._libs.lib.map_infer
      File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/fastdup/fastdup_controller.py", line 853, in <lambda>
        df_annot['filename'] = df_annot[FD.ANNOT_FILENAME].apply(lambda fname: self._input_dir / fname)
    TypeError: unsupported operand type(s) for /: 'str' and 'str'

Please help me check this issue. Thanks and best regards.

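The last line of the traceback above can be reproduced without fastdup: the `/` operator joins paths only when the left operand is a path object, and here `self._input_dir` appears to be a plain string. The sketch below is a standalone illustration of that Python behavior, not a description of how fastdup handles remote inputs; the file name is made up.

```python
from pathlib import PurePosixPath

input_dir = "minio://myminio/fastapi-minio/Fastdup_Input"
fname = "example.jpg"  # hypothetical file name, for illustration only

# str / str is not defined, which is the TypeError seen in the traceback.
try:
    input_dir / fname
except TypeError as err:
    message = str(err)
    print(message)

# With a path object on the left, "/" joins the components.
local = PurePosixPath("images/train2017") / fname
print(local)

# Wrapping a URL-style prefix in a path object collapses the "//",
# so plain string joining is safer for remote prefixes.
collapsed = str(PurePosixPath(input_dir))
remote = input_dir.rstrip("/") + "/" + fname
print(collapsed)
print(remote)
```

This suggests why the failure shows up only with a `minio://` input: a local directory can be turned into a path object safely, while a URL-style prefix cannot.
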
Adding AWS Lambda tutorial to Documentation.

Hello. I've been using fastdup for months, and thank you for this project. I saw that fastdup supports S3 and wondered whether I could run it on AWS Lambda. I tried it out, and after some struggling I was able to do it. I'm now going to write a Medium article about how to deploy a container image from ECR to AWS Lambda using an amazonlinux base image (a Lambda base image is usually used for containers, but fastdup is not available on the Lambda image, so I had to use the amazonlinux base image). I used fastdup in this project. If it would be helpful to other people like me, I'm ready to share it with you once it's done, just to contribute to the community.

Fastdup version check: warning vs. RuntimeError

The RuntimeError raised by the version check in `__init__.py` forces users to upgrade to the newest version of fastdup, which may not always be compatible with versions of other libraries that the user has intentionally selected for other reasons. Because you release new versions of fastdup so frequently, this error is encountered again very quickly, even after the user resolves his or her dependency issues. I propose switching this version check to a warning rather than a RuntimeError. My temporary fix right now is to catch the RuntimeError during imports, but I think a warning is more appropriate. <https://github.com/visual-layer/fastdup/blob/156a8521741fc9bfce8cceab34d56306ebb0d958/fastdup/__init__.py#L52>
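
To make the proposal concrete, here is a minimal sketch of a version check that warns by default and raises only on request. It is not fastdup's actual code: `check_version`, the `strict` flag, and the version tuples are all hypothetical names and values chosen for illustration.

```python
import warnings


def check_version(installed, latest, strict=False):
    """Compare version tuples; raise when strict, otherwise only warn.

    Hypothetical helper, not taken from fastdup.
    """
    if installed >= latest:
        return True
    message = f"installed version {installed} is older than latest {latest}"
    if strict:
        raise RuntimeError(message)
    # A warning leaves the import usable with the pinned version.
    warnings.warn(message, UserWarning)
    return False


# Made-up version numbers: the outdated install only triggers a warning.
check_version((1, 64), (1, 70))
```

With this shape, users who pin an older version for compatibility can keep working, and anyone who wants the hard failure can opt in with `strict=True`.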

Ignore directory name patterns.

I'm running `fastdup` on my Synology DS220+ NAS. I use Synology Photos to upload my photos to the NAS, but it creates thumbnails of videos and photos inside an `@eaDir` directory. For example:

1. `/Photos/MobileBackup/iPhone/2021/12/IMG_9214.JPG`
2. `/Photos/MobileBackup/iPhone/2021/12/@eaDir/IMG_9214.JPG/SYNOPHOTO_THUMB_SM.jpg`

Is there a way to specify a list of directories not to look into? Or maybe even a regular expression, for example to skip any file whose name contains "SYNOPHOTO_THUMB_"?
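
One possible workaround, independent of any built-in fastdup option, is to build the file list yourself and leave out the thumbnail directories. The sketch below uses only the standard library; `list_images`, its parameters, and the extension list are hypothetical choices. Whether fastdup accepts such a list as input should be checked in its documentation.

```python
import os


def list_images(root, skip_dirs=("@eaDir",), skip_substrings=("SYNOPHOTO_THUMB_",),
                extensions=(".jpg", ".jpeg", ".png", ".heic")):
    """Walk `root` and return image paths, skipping unwanted directories and files."""
    kept = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into skipped directories.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            if any(part in name for part in skip_substrings):
                continue
            if name.lower().endswith(extensions):
                kept.append(os.path.join(dirpath, name))
    return sorted(kept)
```

Calling `list_images("/Photos/MobileBackup")` would then return the original photos while leaving out everything under `@eaDir`.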