Welcome to the fastdup docs page

Extract insights from your image & video datasets. Find and remove duplicates, outliers, mislabels, and non-useful images from your datasets - fast and at scale!

fastdup is a powerful free tool designed to rapidly extract valuable insights from your image & video datasets. It helps you increase your dataset quality and reduce your data operations costs at an unparalleled scale.


Quality

Achieve high-quality datasets effortlessly with fastdup. Detect and eliminate anomalies and outliers, identify duplicate and near-duplicate images and videos, and recognize similarity clusters at scale.

Cost

fastdup can also help you reduce your data operations costs by facilitating the intelligent sampling of high-quality or novel datasets prior to labeling, as well as supporting the quality assessment of labeled data.

Scale

fastdup supports image and video use cases at different stages of the data pipeline: before labeling, after labeling, and during inference.

Big Datasets Are a Mess

We believe we can help you achieve better results. It only takes three lines of code to get started.
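As a rough illustration of those three lines, here is a minimal sketch. It assumes fastdup is installed with pip and that the V1.0 entry points `fastdup.create` and `run` behave as we understand them. The `run_fastdup` wrapper, the directory check, and the default `fastdup_work_dir` path are hypothetical additions for this example, not part of fastdup.

```python
from pathlib import Path


def run_fastdup(input_dir, work_dir="fastdup_work_dir"):
    """Run a fastdup analysis over a folder of images (hypothetical helper)."""
    # Illustrative guard, not part of fastdup: fail early on a bad path.
    if not Path(input_dir).is_dir():
        raise FileNotFoundError(f"image folder not found: {input_dir}")

    # Imported lazily so the guard above works even without fastdup installed.
    import fastdup  # assumes `pip install fastdup`

    # Together with the import, these two calls are the "three lines",
    # as we understand the V1.0 API.
    fd = fastdup.create(work_dir=work_dir, input_dir=input_dir)
    fd.run()
    return fd
```

Calling `run_fastdup("images/")` on a folder of your own should return the fastdup object once the analysis finishes. In V1.0 the galleries are expected to be reachable through `fd.vis`, for example `fd.vis.duplicates_gallery()`.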

Read our blog post, where we expose quality issues present in many widely used academic datasets.

As a preview, the left clip showcases our analysis of the LAION-400M image dataset, which was done on a single CPU instance in just a few hours.

VL Profiler

Introducing VL Profiler - A faster and easier way to diagnose and visualize dataset issues

The team behind fastdup also recently launched VL Profiler, a no-code cloud-based platform that lets you leverage fastdup in the browser.


VL Profiler lets you find duplicates/near-duplicates, outliers, mislabels and non-useful images.

Use VL Profiler for free to analyze issues in datasets of up to 1,000,000 images.

Interactive Exploration

Not convinced yet? Interact with a collection of datasets such as ImageNet-21K, COCO, and DeepFashion here. No sign-ups needed.

[New] Introducing fastdup V1.0 🎉

  1. Clean & simple API: The new API is simpler to use and fully backward compatible with the older API
  2. Native Windows support: Windows now has first-class, full-feature support in fastdup
  3. Amazing documentation: New and improved fastdup documentation
  4. Sleek galleries: New and improved galleries to get a better view of your data
  5. Extensive labels support: Improved support for handling image and bounding box labels
  6. Support for additional image formats: Apple’s HEIC+HEIF, 16-bit grayscale TIFF
  7. Support for Python 3.10
  8. Fully backward compatible with the previous API

Register now to gain early access to fastdup enterprise:

Gain access to our hosted cloud-based visual data store, which offers advanced visualization and quality metrics for labels and metadata, and enables you to explore, slice, share, and export your data effortlessly.
Find insights quickly, send data for annotation, and assess the quality of the results. Export reports as PDF or HTML to share on Slack or with stakeholders.