A crazy fast scanner for media image files, useful to pre-sort media from digital estates.
You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

826 B

crazy-fast-media-image-scan

A crazy fast scanner for media image files, useful to pre-sort media from digital estates.

The project is a little research project to evaluate if a random sample based media type scanner with details on file level is possible.

The ideas are following:

  • random sampling to improve scanning (we need very fast, not very accurate results)
  • category check (what kind of data could be there in general?)
  • filetype identification using bigram based estimation, learned by decision tree over files (using format-corpus https://github.com/openpreserve/format-corpus and Mime::Types)

TODO ideas

  • plot typebased output (color?) to see distribution over media
  • improved autotune to scan only "few seconds"
  • scan EWF images, too

This will be the base for an upcoming standalone application.