
Released 0.071

master v0.071
Andreas Romeyke 1 month ago
commit db75e0e9fd
2 changed files with 338 additions and 60 deletions
  1. Changes (+53, -3)
  2. README.mkdn (+285, -57)

Changes (+53, -3)

@ -1,11 +1,61 @@
==================================================
Changes from 2020-01-04 00:00:00 +0000 to present.
Changes from 2020-01-15 00:00:00 +0000 to present.
==================================================
------------------------------------------
version 0.070 at 2021-01-03 13:19:59 +0000
version 0.071 at 2021-01-14 14:09:17 +0000
------------------------------------------
Change: 9bbadc5fecc04ff09dc0de084b5ce71a589f737a
Author: Andreas Romeyke <art1@andreas-romeyke.de>
Date : 2021-01-14 15:04:42 +0000
- refactoring, extracted __handle_nonportable_local_entry() from
__file_find()
Change: ef518a167a1b70558ec7b2765f23d2f0456f243d
Author: Andreas Romeyke <art1@andreas-romeyke.de>
Date : 2021-01-14 12:16:18 +0000
- refactored, using Archive::BagIt instead Archive::BagIt::Base
Change: 102d47a9baff0a411f499d975904ecbae3b79780
Author: Andreas Romeyke <art1@andreas-romeyke.de>
Date : 2021-01-14 12:05:04 +0000
- fixed broken test bag
Change: 702efc1c1bf15b8513662f02995e5593456230e0
Author: Andreas Romeyke <art1@andreas-romeyke.de>
Date : 2021-01-14 12:04:47 +0000
- fixed tc access to private routine
Change: f2827d9a70466656412dcd920d14e1406a756aeb
Author: Andreas Romeyke <art1@andreas-romeyke.de>
Date : 2021-01-14 12:04:11 +0000
- removed, because Archive::BagIt::DotBagIt removed
Change: be36b96ac8c01ff746ed2a2ba06f9de7c8556a80
Author: Andreas Romeyke <art1@andreas-romeyke.de>
Date : 2021-01-14 11:50:18 +0000
- refactoring, replaced Archive::BagIt by Archive::BagIt::Base -
added dummy for Archive::BagIt::Base for backwards compatibility
Change: e1dcd82f379b62835d51973e548aef25de412cd7
Author: Andreas Romeyke <art1@andreas-romeyke.de>
Date : 2021-01-14 11:49:27 +0000
- removed DotBagIt
Change: 90733792184aa5187e8a966aa3f52c7c65cea181
Author: Andreas Romeyke <art1@andreas-romeyke.de>
Date : 2021-01-03 14:20:20 +0000
Released 0.070
Change: 331ea58f0c7cb18009d53800f8f791f9ba550521
Author: Andreas Romeyke <art1@andreas-romeyke.de>
Date : 2021-01-03 14:18:12 +0000
@ -1648,5 +1698,5 @@ version 0.055 at 2020-04-15 09:44:51 +0000
Adding a plugin structure
================================================
Plus 9 releases after 2020-01-04 00:00:00 +0000.
Plus 9 releases after 2020-01-15 00:00:00 +0000.
================================================

README.mkdn (+285, -57)

@ -1,103 +1,142 @@
# NAME
Archive::BagIt - An interface to make and verify bags according to the BagIt standard
Archive::BagIt - The main module to handle bags.
# VERSION
version 0.070
version 0.071
# WARNING
# NAME
This is experimental software for the moment and under active development.
Archive::BagIt - The main module to handle Bags
Under the hood, the module Archive::BagIt::Base was adapted and extended to
support BagIt 1.0 according to RFC 8493 (\[https://tools.ietf.org/html/rfc8493\](https://tools.ietf.org/html/rfc8493)).
# SOURCE
Also: Check out Archive::BagIt::Fast if you are willing to add some extra dependencies to get
better speed by mmap-ing files.
The original development version was on github at [http://github.com/rjeschmi/Archive-BagIt](http://github.com/rjeschmi/Archive-BagIt)
and may be cloned from there.
# SUBROUTINES
The actual development version is available at [https://art1pirat.spdns.org/art1/Archive-BagIt](https://art1pirat.spdns.org/art1/Archive-BagIt)
## new
# Conformance to RFC8493
An Object Oriented Interface to a bag. Opens an existing bag.
The module should fulfill the RFC requirements, with following limitations:
my $bag = Archive::BagIt->new('/path/to/bag');
- only encoding UTF-8 is supported
- version 0.97 or 1.0 allowed
- version 0.97 requires tag-/manifest-files with md5-fixity
- version 1.0 requires tag-/manifest-files with sha512-fixity
- BOM is not supported
- Carriage Return in bagit-files are not allowed
- fetch.txt is unsupported
## make\_bag
At the moment only filepaths in linux-style are supported.
A constructor that will make and return a bag from a directory
To get a more detailed overview, see the test suite under `t/verify_bag.t` and the corresponding test bags from the Library of Congress BagIt conformance test suite under `bagit_conformance_suite/`.
If a data directory exists, assume it is already a bag (no checking for invalid files in root)
See [https://datatracker.ietf.org/doc/rfc8493/?include\_text=1](https://datatracker.ietf.org/doc/rfc8493/?include_text=1) for details.
## verify\_bag
# TODO
An interface to verify a bag.
- enhanced testsuite
- reduce complexity
- use modern perl code
- add flag to enable very strict verify
You might also want to check [Archive::BagIt::Fast](https://metacpan.org/pod/Archive%3A%3ABagIt%3A%3AFast) to see a more direct way of
accessing files (and thus faster).
# FAQ
## get\_checksum
## How to access the manifest-entries directly?
This is the checksum for the bag, md5 of the manifest-md5.txt
Try this:
## version
foreach my $algorithm ( keys %{ $self->manifests }) {
my $entries_ref = $self->manifests->{$algorithm}->manifest_entries();
# $entries_ref returns a hashref of form:
# $entries_ref->{$algorithm}->{$file} = $digest;
}
Returns the bagit version according to the bagit.txt file.
Similar for tagmanifests
## payload\_files
## How fast is `Archive::BagIt::Fast`?
Returns an array with all of the payload files (those files that are below the data directory)
It depends. On my system with SSD and a 38MB bag with 48 payload files the results for `verify_bag()` are:
## non\_payload\_files
Rate Base Fast
Base 102% -- -10%
Fast 125% 11% --
Returns an array with files that are in the root of the bag, non-manifest files
On network filesystem (CIFS, 1Gb) with same Bag:
## manifest\_files
Rate Fast Base
Fast 2.20/s -- -11%
Base 2.48/s 13% --
Return an array with the list of manifest files that exist in the bag
But you should measure which variant is best for you. In general the default `Archive::BagIt` is fast enough.
## tagmanifest\_files
## How to update an old bag of version v0.97 to v1.0?
Return an array with the list of tagmanifest files
You could try this:
# SOURCE
use Archive::BagIt;
my $bag=Archive::BagIt->new( $my_old_bag_filepath );
$bag->load();
$bag->store();
The original development version is on github at [http://github.com/rjeschmi/Archive-BagIt](http://github.com/rjeschmi/Archive-BagIt)
and may be cloned from there.
## How to create UTF-8 based paths under MS Windows?
The actual development version is available at [https://art1pirat.spdns.org/art1/Archive-BagIt](https://art1pirat.spdns.org/art1/Archive-BagIt)
For versions < Windows10: I have no idea and suggestions for a portable solution are very welcome!
For Windows 10: Thanks to [https://superuser.com/questions/1033088/is-it-possible-to-set-locale-of-a-windows-application-to-utf-8/1451686#1451686](https://superuser.com/questions/1033088/is-it-possible-to-set-locale-of-a-windows-application-to-utf-8/1451686#1451686)
you have to enable UTF-8 support via 'System Administration' -> 'Region' -> 'Administrative'
\-> 'Region Settings' -> Flag 'Use Unicode UTF-8 for worldwide language support'
# SUPPORT
Hint: The better way is to use only portable filenames. See [perlport](https://metacpan.org/pod/perlport) for details.
You can find documentation for this module with the perldoc command.
# SYNOPSIS
perldoc Archive::BagIt
This modules will hopefully help with the basic commands needed to create
and verify a bag. This part supports BagIt 1.0 according to RFC 8493 (\[https://tools.ietf.org/html/rfc8493\](https://tools.ietf.org/html/rfc8493)).
You can also look for information at:
You only need to know the following methods first:
- RT: CPAN's request tracker (report bugs here)
## read a BagIt
[http://rt.cpan.org/NoAuth/Bugs.html?Dist=Archive-BagIt](http://rt.cpan.org/NoAuth/Bugs.html?Dist=Archive-BagIt)
use Archive::BagIt;
- AnnoCPAN: Annotated CPAN documentation
#read in an existing bag:
my $bag_dir = "/path/to/bag";
my $bag = Archive::BagIt->new($bag_dir);
[http://annocpan.org/dist/Archive-BagIt](http://annocpan.org/dist/Archive-BagIt)
## construct a BagIt around a payload
- CPAN Ratings
use Archive::BagIt;
my $bag2 = Archive::BagIt->make_bag($bag_dir);
[http://cpanratings.perl.org/d/Archive-BagIt](http://cpanratings.perl.org/d/Archive-BagIt)
## verify a BagIt-dir
- Search CPAN
use Archive::BagIt;
[http://search.cpan.org/dist/Archive-BagIt/](http://search.cpan.org/dist/Archive-BagIt/)
# Validate a BagIt archive against its manifest
my $bag3 = Archive::BagIt->new($bag_dir);
my $is_valid1 = $bag3->verify_bag();
# SYNOPSIS
# Validate a BagIt archive against its manifest, report all errors
my $bag4 = Archive::BagIt->new($bag_dir);
my $is_valid2 = $bag4->verify_bag( {report_all_errors => 1} );
This modules will hopefully help with the basic commands needed to create
and verify a bag. My intention is not to be strict and enforce all of the
specification. The reference implementation is the java version
and I will endeavour to maintain compatibility with it.
## read a BagIt-dir, change something, store
Because all methods operate lazily, you should make sure to parse the relevant parts of the bag \*BEFORE\* you modify it.
Otherwise it will be overwritten!
use Archive::BagIt;
my $bag5 = Archive::BagIt->new($bag_dir); # lazy, nothing happened
$bag5->load(); # this updates the object representation by parsing the given $bag_dir
$bag5->store(); # this writes the bag new
# METHODS
## Constructor
The constructor sub will create a bag with a single argument,
use Archive::BagIt;
@ -105,13 +144,202 @@ and I will endeavour to maintain compatibility with it.
my $bag_dir = "/path/to/bag";
my $bag = Archive::BagIt->new($bag_dir);
or use hashreferences
#construct bag in an existing directory
my $bag2 = Archive::BagIt->make_bag($bag_dir);
use Archive::BagIt;
# Validate a BagIt archive against its manifest
my $bag3 = Archive::BagIt->new($bag_dir);
my $is_valid = $bag3->verify_bag();
#read in an existing bag:
my $bag_dir = "/path/to/bag";
my $bag = Archive::BagIt->new(
bag_path => $bag_dir,
);
The arguments are:
- `bag_path` - path to bag-directory
- `force_utf8` - if set the warnings about non portable filenames are disabled (default: enabled)
The bag object will use $bag\_dir, BUT an existing $bag\_dir is not read. If you use `store()` an existing bag will be overwritten!
See `load()` if you want to parse/modify an existing bag.
## has\_force\_utf8()
to check if force\_utf8() was set.
If set it ignores warnings about potential filepath problems.
## bag\_path(\[$new\_value\])
Getter/setter for bag path
## metadata\_path()
Getter for metadata path
## payload\_path()
Getter for payload path
## checksum\_algos()
Getter for registered Checksums
## bag\_version()
Getter for bag version
## bag\_encoding()
Getter for bag encoding.
HINT: the current version of Archive::BagIt only supports UTF-8, but the method could return other values depending on given Bags.
## bag\_info(\[$new\_value\])
Getter/Setter for bag info. Expects/returns an array of HashRefs implementing simple key-value pairs.
HINT: RFC8493 does not allow \*reordering\* of entries!
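The structure described above can be pictured with a minimal sketch in plain Perl. It assumes that each entry is a hash reference holding exactly one key-value pair, so that the order of the entries is kept; the helper `values_for_key` and the sample values are hypothetical and not part of Archive::BagIt.

```perl
use strict;
use warnings;

# Assumed shape: an array reference of single-pair hash references,
# so the order of the bag-info entries is preserved.
my $bag_info = [
    { 'Source-Organization' => 'Example Library' },
    { 'Contact-Name'        => 'A. Person' },
    { 'Contact-Name'        => 'B. Person' },
];

# Hypothetical helper (not the module's API): collect all values
# stored under a given key, in their original order.
sub values_for_key {
    my ($info, $searchkey) = @_;
    return map { $_->{$searchkey} }
           grep { exists $_->{$searchkey} } @{$info};
}

my @contacts = values_for_key($bag_info, 'Contact-Name');
print join(', ', @contacts), "\n";
```

Keeping the entries in an array rather than in one flat hash allows repeated keys and leaves the order untouched.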
## has\_bag\_info()
returns true if bag info exists.
## errors()
Getter to return collected errors after a `verify_bag()` call with Option `report_all_errors`
## digest\_callback()
This method could be reimplemented by derived classes to handle fixity checks in own way. The
getter returns an anonymous function with following interface:
my $digest = $self->digest_callback;
&$digest( $digestobject, $filename);
This anonymous function MUST use the `get_hash_string()` function of the `Archive::BagIt::Role::Algorithm` role,
which is implemented by each `Archive::BagIt::Plugin::Algorithm::XXXX` module.
See `Archive::BagIt::Fast` for details.
## get\_baginfo\_values\_by\_key($searchkey)
Returns all values which match $searchkey, undef otherwise
## is\_baginfo\_key\_reserved\_as\_uniq($searchkey)
returns true if key is reserved and should be unique
## is\_baginfo\_key\_reserved( $searchkey )
returns true if key is reserved
## verify\_baginfo()
checks baginfo-keys, returns true if all fine, otherwise returns undef and the message is pushed to `errors()`.
## delete\_baginfo\_by\_key( $searchkey )
deletes an entry of given $searchkey if exists
## exists\_baginfo\_key( $searchkey )
returns true if a given $searchkey exists
## append\_baginfo\_by\_key($searchkey, $newvalue)
Appends a key value pair to bag\_info.
HINT: check the return code to see whether the append was successful, because some keys need to be unique.
## add\_or\_replace\_baginfo\_by\_key($searchkey, $newvalue)
It replaces the first entry with $newvalue if $searchkey exists, otherwise it appends.
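The replace-or-append behaviour described above can be sketched in plain Perl. This is only an illustration under the assumption that bag info is an array reference of single-pair hash references; the sub `add_or_replace` is a hypothetical stand-in, not the module's implementation.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the described behaviour:
# replace the first entry with a matching key, otherwise append.
sub add_or_replace {
    my ($info, $searchkey, $newvalue) = @_;
    for my $entry (@{$info}) {
        if (exists $entry->{$searchkey}) {
            $entry->{$searchkey} = $newvalue;   # only the first match changes
            return $info;
        }
    }
    push @{$info}, { $searchkey => $newvalue }; # no match: append at the end
    return $info;
}

my $info = [
    { 'Contact-Name' => 'A. Person' },
    { 'Contact-Name' => 'B. Person' },
];
add_or_replace($info, 'Contact-Name', 'C. Person');
add_or_replace($info, 'External-Identifier', 'example-0001');

print join('; ', map { my ($k) = keys %{$_}; "$k: $_->{$k}" } @{$info}), "\n";
```

In this sketch a later entry with the same key is left unchanged, which matches the "first entry" wording above.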
## forced\_fixity\_algorithm()
Getter to return the forced fixity algorithm depending on BagIt version
## manifest\_files()
Getter to find all manifest-files
## tagmanifest\_files()
Getter to find all tagmanifest-files
## payload\_files()
Getter to find all payload-files
## non\_payload\_files()
Getter to find all non payload-files
## plugins()
Getter/setter to algorithm plugins
## manifests()
Getter/Setter to all manifests (objects)
## algos()
Getter/Setter to all registered Algorithms
## load\_plugins
As default SHA512 and MD5 will be loaded and therefore used. If you want to create a bag only with one or a specific
checksum-algorithm, you could use this method to (re-)register it. It expects list of strings with namespace of type:
Archive::BagIt::Plugin::Algorithm::XXX where XXX is your chosen fixity algorithm.
## load()
Triggers loading of an existing bag
## verify\_bag($opts)
A method to verify a bag deeply. If `$opts` is set with `{report_all_errors => 1}`, all fixity errors are reported.
The default is to croak with an error message if any error is detected.
HINT: You might also want to check Archive::BagIt::Fast to see a more direct way of accessing files (and thus faster).
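To illustrate what a fixity check against a manifest involves, here is a minimal sketch that uses only core modules. It is not how `verify_bag()` is implemented; the sub `check_manifest`, the temporary bag layout and the file names are assumptions made for this example. It collects all mismatches instead of stopping at the first one, similar in spirit to the `report_all_errors` option.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha512_hex);
use File::Path qw(make_path);
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: compare every manifest entry with the file on disk
# and return a list of error strings (empty if everything matches).
sub check_manifest {
    my ($bag_dir, $manifest_name) = @_;
    my @errors;
    my $manifest = File::Spec->catfile($bag_dir, $manifest_name);
    open my $fh, '<', $manifest or die "cannot open $manifest: $!";
    while (my $line = <$fh>) {
        chomp $line;
        next if $line eq '';
        # each line: hex digest, whitespace, path relative to the bag root
        my ($expected, $relpath) = split /\s+/, $line, 2;
        my $file = File::Spec->catfile($bag_dir, split m{/}, $relpath);
        if (!-f $file) {
            push @errors, "missing file: $relpath";
            next;
        }
        my $actual = Digest::SHA->new(512)->addfile($file, 'b')->hexdigest;
        push @errors, "digest mismatch: $relpath" if lc($expected) ne $actual;
    }
    close $fh;
    return @errors;
}

# Build a tiny throwaway bag: one correct entry, one deliberately wrong one.
my $bag_dir = tempdir(CLEANUP => 1);
make_path(File::Spec->catdir($bag_dir, 'data'));
for my $spec (['a.txt', 'hello'], ['b.txt', 'world']) {
    my $path = File::Spec->catfile($bag_dir, 'data', $spec->[0]);
    open my $out, '>:raw', $path or die "cannot write $path: $!";
    print {$out} $spec->[1];
    close $out or die "cannot close $path: $!";
}
my $manifest_path = File::Spec->catfile($bag_dir, 'manifest-sha512.txt');
open my $mf, '>:raw', $manifest_path or die "cannot write $manifest_path: $!";
print {$mf} sha512_hex('hello'), "  data/a.txt\n";
print {$mf} ('0' x 128), "  data/b.txt\n";
close $mf or die "cannot close $manifest_path: $!";

my @errors = check_manifest($bag_dir, 'manifest-sha512.txt');
print "$_\n" for @errors;
```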
## calc\_payload\_oxum()
returns an array with octets and streamcount of payload-dir
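As a rough illustration of the two numbers involved, the following sketch counts octets and streams (files) below a directory using only core modules. RFC 8493 writes the Payload-Oxum value as `OctetCount.StreamCount`. The sub `payload_oxum` and the temporary directory are hypothetical and do not reflect the module's internals.

```perl
use strict;
use warnings;
use File::Find qw(find);
use File::Path qw(make_path);
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: total size in octets and number of regular files
# below a payload directory.
sub payload_oxum {
    my ($payload_dir) = @_;
    my ($octets, $streams) = (0, 0);
    find(
        {
            no_chdir => 1,
            wanted   => sub {
                return unless -f $File::Find::name;
                $octets += -s $File::Find::name;
                $streams++;
            },
        },
        $payload_dir,
    );
    return ($octets, $streams);
}

# Throwaway payload: "hello" (5 octets) and "bagit\n" (6 octets).
my $payload_dir = File::Spec->catdir(tempdir(CLEANUP => 1), 'data');
make_path($payload_dir);
for my $spec (['one.txt', 'hello'], ['two.txt', "bagit\n"]) {
    my $path = File::Spec->catfile($payload_dir, $spec->[0]);
    open my $out, '>:raw', $path or die "cannot write $path: $!";
    print {$out} $spec->[1];
    close $out or die "cannot close $path: $!";
}

my ($octets, $streams) = payload_oxum($payload_dir);
print "Payload-Oxum: $octets.$streams\n";
```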
## calc\_bagsize()
returns a string with human readable size of payload
## create\_bagit()
creates a bagit.txt file
## create\_baginfo()
creates a bag-info.txt file
Hint: the entries 'Bagging-Date', 'Bag-Software-Agent', 'Payload-Oxum' and 'Bag-Size' will be automagically set,
existing values in internal bag-info representation will be overwritten!
## store()
Stores a bagit object, provided the bagit directory structure has already been constructed.
## init\_metadata()
A constructor that will just create the metadata directory
This won't make a bag, but it will create the conditions to do that eventually
## make\_bag( $bag\_path )
A constructor that will make and return a bag from a directory.
It expects that a preliminary bagit-dir exists.
If a data directory exists, it is assumed to already be a bag (no checking for invalid files in root)
# AVAILABILITY

