Discussions
Error when using MinIO storage
Please help me with this issue
Environment
- Ubuntu 22.04
- Fastdup ver 1.64
- Python 3.10
- MinIO client already installed, following the guide
When running this code:
import pandas as pd
import fastdup
# Load the COCO annotations (the CSV has no header row)
coco_csv = 'coco_minitrain_25k/coco_minitrain_25k/annotations/coco_minitrain2017_15images.csv'
coco_annotations = pd.read_csv(coco_csv, header=None, names=['filename', 'col_x', 'row_y',
                                                             'width', 'height', 'label', 'ext'])
work_dir = 'fastdup_coco_25k'
# The input directory is a remote MinIO path
fd = fastdup.create(work_dir, 'minio://myminio/fastapi-minio/Fastdup_Input')
fd.run(annotations=coco_annotations)
The error is shown below:
"Traceback (most recent call last):
File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/fastdup/sentry.py", line 132, in inner_function
ret = func(*args, **kwargs)
File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/fastdup/fastdup_controller.py", line 581, in run
if fastdup.run(self._set_fastdup_input(), work_dir=str(self._work_dir), **fastdup_kwargs) != 0:
File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/fastdup/fastdup_controller.py", line 853, in _set_fastdup_input
df_annot['filename'] = df_annot[FD.ANNOT_FILENAME].apply(lambda fname: self._input_dir / fname)
File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/pandas/core/series.py", line 4760, in apply
).apply()
File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/pandas/core/apply.py", line 1207, in apply
return self.apply_standard()
File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/pandas/core/apply.py", line 1287, in apply_standard
mapped = obj._map_values(
File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/pandas/core/base.py", line 921, in _map_values
return algorithms.map_array(arr, mapper, na_action=na_action, convert=convert)
File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/pandas/core/algorithms.py", line 1814, in map_array
return lib.map_infer(values, mapper, convert=convert)
File "lib.pyx", line 2917, in pandas._libs.lib.map_infer
File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/fastdup/fastdup_controller.py", line 853, in <lambda>
df_annot['filename'] = df_annot[FD.ANNOT_FILENAME].apply(lambda fname: self._input_dir / fname)
TypeError: unsupported operand type(s) for /: 'str' and 'str'
Traceback (most recent call last):
File "/home/anntt/Documents/AI_Testing/Fastdup_Mislabel.py", line 19, in <module>
fd.run(annotations=coco_annotations)
File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/fastdup/engine.py", line 157, in run
return super().run(annotations=annotations, input_dir=input_dir, subset=subset, data_type=data_type,
File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/fastdup/sentry.py", line 138, in inner_function
raise ex
File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/fastdup/sentry.py", line 132, in inner_function
ret = func(*args, **kwargs)
File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/fastdup/fastdup_controller.py", line 581, in run
if fastdup.run(self._set_fastdup_input(), work_dir=str(self._work_dir), **fastdup_kwargs) != 0:
File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/fastdup/fastdup_controller.py", line 853, in _set_fastdup_input
df_annot['filename'] = df_annot[FD.ANNOT_FILENAME].apply(lambda fname: self._input_dir / fname)
File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/pandas/core/series.py", line 4760, in apply
).apply()
File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/pandas/core/apply.py", line 1207, in apply
return self.apply_standard()
File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/pandas/core/apply.py", line 1287, in apply_standard
mapped = obj._map_values(
File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/pandas/core/base.py", line 921, in _map_values
return algorithms.map_array(arr, mapper, na_action=na_action, convert=convert)
File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/pandas/core/algorithms.py", line 1814, in map_array
return lib.map_infer(values, mapper, convert=convert)
File "lib.pyx", line 2917, in pandas._libs.lib.map_infer
File "/home/anntt/Documents/AI_Testing/dockerExample/FastAPIExample/mainenv/lib/python3.10/site-packages/fastdup/fastdup_controller.py", line 853, in <lambda>
df_annot['filename'] = df_annot[FD.ANNOT_FILENAME].apply(lambda fname: self._input_dir / fname)
TypeError: unsupported operand type(s) for /: 'str' and 'str'"
Please help me check this issue.
Thanks and Best regards
Posted by Nguyen Thuy An 28 days ago
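A note on the MinIO traceback above: the final `TypeError` says that `/` was applied to two plain strings, which suggests the input directory was kept as a `str` for the remote `minio://` path instead of a path object. The sketch below only reproduces that Python behavior in isolation; it does not call fastdup, and `join_remote` and the sample filename are hypothetical names used for illustration.

```python
from pathlib import PurePosixPath

input_dir = "minio://myminio/fastapi-minio/Fastdup_Input"
fname = "images/000000000009.jpg"  # hypothetical annotation filename

# Reproduce the failure: "/" is not defined between two plain strings.
try:
    input_dir / fname
except TypeError as err:
    error_message = str(err)

# Wrapping the prefix in a path object is not a safe fix for URL-style
# prefixes, because pathlib collapses the "//" after the scheme.
collapsed = str(PurePosixPath(input_dir) / fname)


def join_remote(prefix: str, name: str) -> str:
    """Join a URL-style prefix and a relative name with a single '/'."""
    return prefix.rstrip("/") + "/" + name.lstrip("/")


joined = join_remote(input_dir, fname)

print(error_message)
print(collapsed)
print(joined)
```

Plain string joining keeps the `minio://` scheme intact, which is why it is used here. Whether fastdup accepts annotation filenames that already carry the full remote prefix is not confirmed by this thread.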
Adding AWS Lambda tutorial to Documentation.
Hello. I've been using fastdup for months, and thank you for this project. I saw that fastdup supports S3 and wondered whether I could run it on AWS Lambda. I tried it out and, after some struggling, got it working.
I'm now going to write a Medium article about how to deploy a container image from ECR to AWS Lambda using an amazonlinux base image (a Lambda base image is usually used for containers, but fastdup, for example, is not available in the Lambda image, so I had to use the amazonlinux base image). I used fastdup in this project.
If it would be helpful to other people like me, I'm ready to share it with you once it's done, just to contribute to the community.
Posted by atahan bulus 3 months ago
Fastdup version check: warning vs. RuntimeError
The RuntimeError raised by the version check in `__init__.py` forces users to upgrade to the newest version of Fastdup, which may not always be compatible with versions of other libraries that the user has intentionally pinned for other reasons. Because you release new versions of Fastdup so frequently, this error comes back very quickly, even after users resolve their dependency issues.
I propose switching this version check to a warning rather than a RuntimeError. My temporary fix right now is to catch the RuntimeError during imports, but I think a warning is more appropriate.
<https://github.com/visual-layer/fastdup/blob/156a8521741fc9bfce8cceab34d56306ebb0d958/fastdup/__init__.py#L52>
Posted by Michael Chifala 5 months ago
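The proposal in the version-check post above can be sketched roughly as follows. This is a minimal illustration, not fastdup's actual code: `check_version`, `parse_version`, and the `strict` flag are hypothetical names, and the version strings are assumed to be plain dotted integers.

```python
import warnings


def parse_version(text: str) -> tuple:
    """Turn a dotted version such as '1.64' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def check_version(installed: str, latest: str, strict: bool = False) -> bool:
    """Return True if up to date; otherwise warn, or raise when strict."""
    if parse_version(installed) >= parse_version(latest):
        return True
    message = f"fastdup {installed} is older than the latest release {latest}"
    if strict:
        # Behavior described in the post: importing fails outright.
        raise RuntimeError(message)
    # Proposed behavior: tell the user, but let the import continue.
    warnings.warn(message, UserWarning, stacklevel=2)
    return False
```

With `strict=False`, users who have pinned an older version for compatibility reasons would still see the notice without having to wrap the import in a `try`/`except RuntimeError`.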
Ignore directory name patterns.
I'm running `fastdup` inside my Synology DS220+ NAS.
I'm using Synology Photos to upload my photos to my NAS, but it creates thumbnails of videos and photos inside `/@eaDir` directories.
For example:
1. `/Photos/MobileBackup/iPhone/2021/12/IMG_9214.JPG`
2. `/Photos/MobileBackup/iPhone/2021/12/@eaDir/IMG_9214.JPG/SYNOPHOTO_THUMB_SM.jpg`
Is there a way I can specify a list of directories not to look into? Or maybe even a regular expression, for example to skip files whose name contains "SYNOPHOTO_THUMB_"?
Posted by Pablo Garcia 5 months ago
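One possible workaround for the question above, independent of any fastdup option, is to build the file list yourself and leave out the unwanted directories before running the analysis. The sketch below uses only the standard library; `list_images` and its parameters are hypothetical names, and the default patterns are taken from the example paths in the post.

```python
import os
import re


def list_images(root, exclude_dirs=("@eaDir",),
                exclude_name=r"SYNOPHOTO_THUMB_",
                extensions=(".jpg", ".jpeg", ".png")):
    """Collect image paths under root, skipping excluded dirs and names."""
    name_re = re.compile(exclude_name)
    kept = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into excluded directories.
        dirnames[:] = [d for d in dirnames if d not in exclude_dirs]
        for name in filenames:
            if name_re.search(name):
                continue  # file name matches the exclusion pattern
            if name.lower().endswith(extensions):
                kept.append(os.path.join(dirpath, name))
    return sorted(kept)
```

The resulting list could then be given to fastdup in place of the directory, assuming the installed version accepts a list of file paths as input; that is worth checking against the documentation for your version.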
How to define the repeated images or how to select threshold?
Thank you for your project. I recently worked on a batch of data extracted from video captured by fixed cameras (for object detection tasks, sampled at random dates and times with large intervals). Visualization reveals that this batch has a highly consistent background, with only a few different targets (people, motor vehicles, non-motor vehicles). In fastdup, the resulting score is usually above 0.9. I suspect the background will affect the model: it may learn background-specific distribution information, and generalization will be reduced (sorry, I haven't had time to run a comparison experiment; I may be able to test it within a week). Have you ever done a similar experiment? Perhaps the definition of duplicate images needs to be extended a bit for object detection tasks?
I also used imagedups for a comparative test (<https://github.com/chinalu/imagedups>, with default parameters). Of the 20,000 images, 515 were determined to be duplicates and removed. The top 515 in fastdup scored 0.95 or more. How should I choose the threshold? In other experiments, such as OCR and license plate detection data, I found that 0.95 is not a good threshold.
Looking forward to your reply, thank you.
Posted by Yadong Wang 6 months ago
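One simple way to explore the threshold question above is to count how many pairs survive at several candidate thresholds, or to read off the score at a chosen rank (as in the comparison with the 515 images flagged by imagedups). The sketch below uses made-up scores and hypothetical helper names; in practice the scores would come from the similarity output of your own run.

```python
def count_above(scores, thresholds):
    """Map each threshold to the number of scores at or above it."""
    return {t: sum(1 for s in scores if s >= t) for t in thresholds}


def threshold_for_top_k(scores, k):
    """Return the score of the k-th most similar pair."""
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and the number of scores")
    ranked = sorted(scores, reverse=True)
    return ranked[k - 1]


# Made-up similarity scores, for illustration only.
scores = [0.99, 0.97, 0.96, 0.93, 0.91, 0.88]

print(count_above(scores, [0.90, 0.95]))  # {0.9: 5, 0.95: 3}
print(threshold_for_top_k(scores, 3))     # 0.96
```

Inspecting a sample of pairs just above and just below each candidate value is usually more informative than a single fixed cutoff, since the post notes that 0.95 did not transfer well to OCR and license plate data.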
Welcome to fastdup V1.0!
Welcome, we are Visual-Layer, creators of Fastdup. We'd be happy to answer questions and discuss support topics and feature requests. We are also highly available on the [Fastdup Slack channel](https://join.slack.com/t/visualdatabase/shared_invite/zt-19jaydbjn-lNDEDkgvSI1QwbTXSY6dlA).
Posted by Amir Markovitz 9 months ago