Metadata#
EDL metadata and data compliance is what makes a regular directory hierarchy into an structure
readable by tools supporting EDL.
In order for a directory to be recognized as part of an EDL hierarchy, a manifest.toml
file
must be present.
Directory/Unit Names#
A directory name is also the name of the EDL unit, be it a collection, group or dataset. In order to be somewhat sane and also portable to non-UNIX systems, some constraints apply to directory naming:
Only printable characters may be used.
No special characters and punctuation is permitted, except for dots, hyphens, underscores and plus-signs (
.-_+
).Names must not start or end with a dot.
Unicode characters are permitted, but users are encouraged to stick to ASCII characters if possible. EDL implementations must convert from the system’s native encoding to UTF-8 when reading names.
The maximum length of a name is capped at 255 characters.
MS-DOS device names (such as AUX) are forbidden, as Windows will not permit directories of that name.
EDL names are handled in a case-sensitive way, yet two units in the same directory must not have the same name when both are converted to lowercase letters.
It is recommended to not start unit names with a number, and encouraged to only use lowercase names
Common Metadata#
The manifest.toml
file contains basic information about an organizational unit in EDL.
It must be a valid TOML 1.0 file and pass TOML
compliance tests.
A minimal file may look like this:
format_version = "1"
type = "group"
collection_id = "49db9875-c0a2-4f70-8ba4-ec00a4e6be9c"
time_created = 2020-05-08T17:23:06+02:00
The following toplevel keys can or must be present in every manifest.toml
file:
format_version
#
[required, string] A string containing the version of the EDL specification used to create this unit and metadata. May be used in future in case changes are made to EDL.
type
#
[required, string] A string denoting the type of this organizational unit. May be one of collection
, group
or dataset
, depending
on the respective unit type. Some unit types may require the presence of additional metadata.
collection_id
#
[required, string] A Universally unique identifier (UUID) version 4 which is unique for the collection that the respective unit is part of. This ID is purposefully not human-readable. It is intended as a unique identifier for the given experiment, to be used by data processing pipelines and other programs handling the data.
Within EDL, the last characters of this UUID may also be used in names for data, to make it possible to infer where the data originated from in case a user copied it out of the EDL layout and sent it somewhere else without its metadata.
In case no collection UUID exists (yet), the ID string may be an invalid UUID consisting only of zeros.
time_created
#
[required, datetime] The creation time of this EDL unit, as an RFC 3339 formatted date-time with offset. The offset must be present in case the data is shifted between group members in different timezones. For this field, TOMLs native date-time type support is used.
generator
#
[optional, string] Name and version of the tool that generated this metadata/data/EDL unit.
Collections#
Collections are the root of each EDL directory tree. Their type
is collection
.
In addition to the common keys in their manifest.toml
, they may also contain the following entries:
generator
#
[recommended, string] Name and version of the tool that generated this data collection, if there was one. This is usually used by tools like Syntalos.
Manifest Example:#
collection_id = "49db9875-c0a2-4f70-8ba4-ec00a4e6be9c"
format_version = "1"
generator = "Syntalos 1.0"
time_created = 2020-05-08T17:23:06.000662+02:00
type = "collection"
[[authors]]
email = "rick@c137.local"
name = "Rick Sanchez"
[[authors]]
email = "morty@c137.local"
name = "Morty Smith"
Groups#
Groups are named containers for more groups or datasets. Their type
is group
.
They may contain any of the common keys in their manifest.toml
metadata.
Datasets#
Datasets are EDL units which contain the actual experiment data. they are leafs in the directory
hierarchy. Their type
is dataset
.
In addition to the common keys in their manifest.toml
, they may also contain the following entries:
data
#
[required, table] This block briefly describes the data of the dataset. It may have the following keys:
media_type
#
[maybe-optional, string] The MIME/MediaType of the contained data, if one is associated
with the given media. In case no media type can be determined, the file_type
key becomes a required key. Either the media_type
or file_type
key or both must be present.
file_type
#
[maybe-optional, string] The file-type of the contained data. This is usually the file extension of the contained data without the dot,
but may be any agreed-on string to indicate a specific type of data. In case a media_type
could be determined, this key becomes optional,
otherwise it is required. Either the media_type
or file_type
key or both must be present.
summary
#
[optional, string] This optional field can contain a human-readable description string that can provide some information about what the files are about. Values could be for example “Videos recorded from the overview camera” or “Electrophysiology data from silicon probes”.
parts
#
[required, array of tables] Array of tables with one entry for data part. Since the data is potentially very big, DAQ tools may decide to chunk
it into smaller bits to make the impact of data corruption while writing less severe and to permit data processing in smaller chunks.
This is especially common with video files.
In case data is not chunked, this array is still present, but contains only one entry. Each table entry must have a fname
key with the filename
of the respective chunk as string value. The filename must be a path relative to the dataset directory (which almost always means it is the file
base name, without any path segment). An entry may have an optional start-at-zero index
key with an integer value attached to it as well, to make
the ordering of the individual chunks explicit. In case the index is not explicitly defined, data will be read in list order. Explicit indexing is
occasionally useful when whole chunks may be transparently taken out of the data analysis.
data_aux
#
[optional, table] This block describes auxiliary data to the primary data of this dataset. This may for example be a frame-number to timestamp mapping
file for a video file, or time-sync information files. Auxiliary data is usually so tightly coupled to its primary data that you will never want to have it
separate from the primary data in its own dataset.
A data_aux
table follows the same semantics as a data
table, with the same key names and permitted values.
Manifest Example:#
collection_id = "49db9875-c0a2-4f70-8ba4-ec00a4e6be9c"
format_version = "1"
time_created = 2020-05-08T17:23:06+02:00
type = "dataset"
[data]
media_type = "video/x-matroska"
[[data.parts]]
fname = "video_1.mkv"
index = 0
[[data.parts]]
fname = "video_2.mkv"
index = 1
[data_aux]
media_type = "text/csv"
[[data_aux.parts]]
fname = "video_1_timestamps.csv"
index = 0
[[data_aux.parts]]
fname = "video_2_timestamps.csv"
index = 1
Custom Metadata#
Custom metadata follows no defined specification. Users and programs may add it arbitrarily as TOML 1.0 data to attributes.toml
files which are
shipped alongside the well-defined manifest.toml
files in the same directory. Usually attributes files contain additional metadata describing an
actual dataset (such as explanations for an array dataset, or additional information for an experiment run).
Syntalos Metadata#
The Syntalos DAQ system uses the attributes.toml
file of the collection root node it creates to add a
bunch of additional metadata.
This behavior is restricted to the attributes.toml
file of the main collection, all other attributes files are exclusively in the domain of Syntalos
modules without interference from the main engine.
The additional metadata includes the following fields:
machine_node
#
[required, string] A string consisting of the recording machine’s hostname followed by the operating system name and version in square brackets.
recording_length_msec
#
[required, number] Full length of the recording run in milliseconds.
subject_id
#
[optional, string] Name of the test subject, as entered in the “Subject” form in Syntalos.
subject_group
#
[optional, string] Group of the test subject.
subject_comment
#
[optional, string] Experimenter comment for the test subject.
success
#
[required, boolean] Boolean, indicating whether the run was successful or failed.
failure_reason
#
[optional, string] In case success
was false, this field contains a string with the last error message received by the system,
and which module emitted it.
modules
#
[required, array of tables] List of all Syntalos modules that were active during the run. The id
key contains the machine-readable
string ID of the respective module, while the name
key contain the user-defined name that was given to the module during the run.
Example Attributes File#
machine_node = "glados [Debian 10]"
recording_length_msec = 1078556.0
subject_id = "TAX-010"
success = true
[[modules]]
id = "camera-tis"
name = "TIS Camera"
[[modules]]
id = "miniscope"
name = "Miniscope"
[[modules]]
id = "canvas"
name = "MS Canvas"
[[modules]]
id = "videorecorder"
name = "Overview Recorder"
[[modules]]
id = "videorecorder"
name = "Scope Recorder"
[[modules]]
id = "canvas"
name = "OV Canvas"