A crazy fast scanner for media image files, useful to pre-sort media from digital estates.
This is a small research project to evaluate whether a media type scanner based on random sampling, with file-level detail, is feasible.
The ideas are as follows:
- random sampling to speed up scanning (we need very fast results, not very accurate ones)
- category check (what kind of data could be there in general?)
- file type identification using bigram-based estimation, learned by a decision tree over files (using the format-corpus https://github.com/openpreserve/format-corpus and Mime::Types)
- plot type-based output (color?) to see the distribution over the medium
- improved autotuning so that a scan takes only a "few seconds"
- scan EWF images, too
This will be the base for an upcoming standalone application.
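
The random sampling and bigram ideas listed above can be sketched roughly as follows. This is only an illustration in Python, not the project's Perl code: the function names (`sample_offsets`, `bigram_counts`, `scan`), the 512-byte block size, and the sample count of 64 are all assumptions made for the example. The decision-tree step that would turn bigram counts into a file type guess is not shown.

```python
import os
import random
from collections import Counter


def sample_offsets(size, block_size, n_samples, rng):
    """Pick up to n_samples distinct block-aligned offsets in a file of `size` bytes."""
    n_blocks = max(1, -(-size // block_size))  # ceiling division, at least one block
    k = min(n_samples, n_blocks)
    return sorted(b * block_size for b in rng.sample(range(n_blocks), k))


def bigram_counts(data):
    """Count adjacent byte pairs (bigrams) in a bytes object."""
    return Counter(zip(data, data[1:]))


def scan(path, block_size=512, n_samples=64, seed=None):
    """Read randomly chosen blocks of an image file and sum their bigram counts.

    Hypothetical helper: block_size and n_samples are illustrative defaults.
    """
    rng = random.Random(seed)
    size = os.path.getsize(path)
    totals = Counter()
    with open(path, "rb") as fh:
        for offset in sample_offsets(size, block_size, n_samples, rng):
            fh.seek(offset)
            totals.update(bigram_counts(fh.read(block_size)))
    return totals
```

Because only a fixed number of blocks is read, the cost of a scan under this sketch depends on `n_samples` rather than on the size of the image, which is the trade of accuracy for speed described in the list.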