Ingest dataset metadata into the MDS and upload the datasets to S3.
Precondition: There is a collection of HDF5 files in a POSIX file system that Hyrax can access.
Postcondition: Those files are moved to S3 and users can access the data in those files using Hyrax in exactly the same way as when those files were in the POSIX file system.
The software must:
Move data to S3
Put DMR++ responses in the MDS using pathnames that match the POSIX file system.
Put all the metadata responses in the MDS - this should be done in the 'Streamline MDS population' story
Implement this by adding functionality to the scripts or utilities that are developed to 'populate the MDS.'
Use the site map command to find the files to operate on
For each file, populate the MDS and move the file to S3
Write the path of each successfully processed file to an output file
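The three steps above can be sketched roughly as follows. This is a minimal sketch, not the actual utility: `process_files`, `populate_mds`, `move_to_s3`, and `done_log` are hypothetical names, and the two callables stand in for whatever the MDS-population scripts and the S3 upload end up being. The list of paths is assumed to come from the site map command. The MDS step runs first because it needs the local file to build the DMR++ response.

```python
from typing import Callable, Iterable, List


def process_files(
    paths: Iterable[str],
    populate_mds: Callable[[str], None],  # hypothetical: builds and stores the MDS responses
    move_to_s3: Callable[[str], None],    # hypothetical: uploads the file to S3
    done_log: str,
) -> List[str]:
    """Populate the MDS and move each file to S3.

    Appends the path of every file that completed both steps to done_log
    and returns the paths that failed.
    """
    failed: List[str] = []
    with open(done_log, "a", encoding="utf-8") as log:
        for path in paths:
            try:
                populate_mds(path)  # needs the file on the local file system
                move_to_s3(path)
            except Exception:
                # Record the failure and keep going with the remaining files.
                failed.append(path)
                continue
            log.write(path + "\n")
            log.flush()  # keep the record current if the run is interrupted
    return failed
```

Because the output file lists only files that completed both steps, an interrupted run could be resumed by skipping any path already recorded there.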
Given that some dataset foo.h5 has DDS, ..., DMR++ responses in the MDS, and is now only stored in S3, how does a user know about foo.h5?
A design for this involves three things:
We need access to the files to build the DMR++ responses.
Users need to reference these datasets using some path, and we have to be able to show that path to the users, even though the data files are no longer on the POSIX file system where the server initially read them to build the DMR++ responses.
When users make requests, they will provide the pathnames they were shown, and the MDS will use those paths (really just strings, which can be arbitrary) to look up the different responses.
The structure we show users for the data must mirror the structure used in the MDS.
The software that does this should also be responsible for moving the data files to S3. This means that it will use the S3 access URL when it writes the DMR++ response that is added to the MDS.
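As an illustration of the path handling described above, the sketch below derives the user-visible path (the string the MDS is keyed on) and an S3 access URL from a file's original POSIX location. Everything here is an assumption made for illustration: the function names, the idea that the user-visible path is the file path relative to the server's data root, the reuse of that path as the S3 object key, and the virtual-hosted-style URL form.

```python
from pathlib import PurePosixPath
from urllib.parse import quote


def mds_key(data_root: str, file_path: str) -> str:
    """Return the path shown to users: file_path relative to data_root, with a leading '/'.

    Raises ValueError if file_path is not under data_root.
    """
    rel = PurePosixPath(file_path).relative_to(PurePosixPath(data_root))
    return "/" + rel.as_posix()


def s3_url(bucket: str, key: str) -> str:
    """Return a virtual-hosted-style S3 URL for the object (assumed URL form)."""
    # quote() percent-encodes unsafe characters but leaves '/' separators intact.
    return f"https://{bucket}.s3.amazonaws.com/{quote(key.lstrip('/'))}"


# Example with a hypothetical data root and bucket name.
key = mds_key("/usr/share/hyrax", "/usr/share/hyrax/data/foo.h5")
print(key)                            # /data/foo.h5
print(s3_url("example-bucket", key))  # https://example-bucket.s3.amazonaws.com/data/foo.h5
```

Using the same relative path for both the MDS key and the S3 object key is one way to keep the structure shown to users aligned with the structure in the MDS.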