Avram Specification

Avram is a schema language for MARC and related formats such as PICA and MAB.

Introduction

MARC and related formats have been used for decades as the basis for library automation. Several variants, dialects, and profiles exist for different applications. The Avram schema language allows individual formats based on MARC, PICA, and similar standards to be specified for documentation, validation, and requirements engineering.

The schema format is named after Henriette D. Avram (1919-2006) who devised MARC as the first automated cataloging system in the 1960s.

This document consists of a specification of the schema format and of validation rules.

Conformance requirements

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Data types

A timestamp is a date or datetime as defined by the XML Schema datatypes dateTime (-?YYYY-MM-DDThh:mm:ss(\.s+)?(Z|[+-]hh:mm)?), date (-?YYYY-MM-DD(Z|[+-]hh:mm)?), gYearMonth (-?YYYY-MM), or gYear (-?YYYY).
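The following is a minimal Python sketch that checks the lexical form of a timestamp against the four patterns above. It uses [0-9] for digits and reads `\.s+` as one or more fractional-second digits. The name `is_timestamp` is hypothetical. The sketch checks only the shape of the string, not calendar validity such as month or day ranges.

```python
import re

# Patterns transcribed from the timestamp definition; each must match the whole string.
TIMESTAMP_PATTERNS = [
    r"-?[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]+)?(Z|[+-][0-9]{2}:[0-9]{2})?",  # dateTime
    r"-?[0-9]{4}-[0-9]{2}-[0-9]{2}(Z|[+-][0-9]{2}:[0-9]{2})?",  # date
    r"-?[0-9]{4}-[0-9]{2}",  # gYearMonth
    r"-?[0-9]{4}",  # gYear
]


def is_timestamp(value: str) -> bool:
    """Return True if value has the lexical form of dateTime, date, gYearMonth, or gYear."""
    return any(re.fullmatch(pattern, value) for pattern in TIMESTAMP_PATTERNS)
```

For instance, "2018-03-09" and "2018-03" are accepted, while "09.03.2018" is rejected.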

A regular expression is a string that conforms to the ECMA 262 (2015) regular expression grammar. The expression is interpreted as a Unicode pattern.

Schema format

An Avram Schema is a JSON object given as a JSON document or in any other format that encodes a JSON document. RFC 7159 only recommends that object names be unique; in an Avram Schema, all object keys MUST be unique. The schema MUST contain:

The schema SHOULD contain keys documenting the format defined by the schema:

The schema MAY contain:

Example:

{
  "$schema": "https://format.gbv.de/schema/avram/schema.json",
  "title": "MARC 21 Format for Classification Data",
  "description": "MARC format for classification numbers and captions associated with them",
  "url": "https://www.loc.gov/marc/classification/",
  "fields": { }
}

String values, such as the values of keys title and description, SHOULD NOT be empty if the corresponding key is given.
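The unique-key requirement and the non-empty string recommendation can be checked while loading a schema. The following Python sketch uses the standard `json` module's `object_pairs_hook` to reject repeated keys at any nesting level. The helper names `load_schema` and `empty_string_keys` are hypothetical, and only the keys title and description are checked here.

```python
import json


def reject_duplicates(pairs):
    """object_pairs_hook for json.loads that fails on a repeated key in any object."""
    obj = {}
    for key, value in pairs:
        if key in obj:
            raise ValueError(f"duplicate key: {key!r}")
        obj[key] = value
    return obj


def load_schema(text: str) -> dict:
    """Parse a schema document, requiring unique keys and a top-level JSON object."""
    schema = json.loads(text, object_pairs_hook=reject_duplicates)
    if not isinstance(schema, dict):
        raise ValueError("an Avram schema must be a JSON object")
    return schema


def empty_string_keys(schema: dict, keys=("title", "description")) -> list:
    """Return the given keys that are present but have an empty string value."""
    return [key for key in keys if key in schema and schema[key] == ""]
```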

Field schedule

A field schedule is a JSON object that maps field identifiers to field definitions.

Example:

{
  "010": { "label": "Library of Congress Control Number" },
  "084": { "label": "Classification Scheme and Edition" }
}

Field identifier

A field identifier can be any non-empty string that uniquely identifies a field. The identifier consists of a field tag, optionally followed by / and a field occurrence. Applications SHOULD add further restrictions on field identifier syntax.
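The following Python sketch splits a field identifier into tag and occurrence. It assumes the split happens at the first /. It also rejects an empty tag and an empty occurrence after /. The function name and the identifier "021A/01" used below are hypothetical illustrations, not part of the specification.

```python
def parse_field_identifier(identifier: str) -> tuple:
    """Split a field identifier into (tag, occurrence).

    The occurrence is None when the identifier contains no "/".
    """
    tag, sep, occurrence = identifier.partition("/")
    # An empty identifier, an empty tag, or a trailing "/" is not a valid identifier.
    if not tag or (sep and not occurrence):
        raise ValueError(f"invalid field identifier: {identifier!r}")
    return tag, (occurrence if sep else None)
```

For instance, "010" yields ("010", None) and "021A/01" yields ("021A", "01"). Format-specific syntax restrictions would be added on top of this.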

For formats based on MARC, field identifiers MUST be field tags being

For PICA-based formats

Examples:

Field definition

A field definition is a JSON object that SHOULD contain:

The field definition MAY further contain:

If a field definition is given in a field schedule, the tag and occurrence MUST either match the corresponding field identifier or both be missing.

A field definition MUST NOT mix keys for fixed fields (positions), variable fields (subfields and deprecated-subfields), and alternatives (types).
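The two field definition rules above can be expressed as a single check. In the following Python sketch, `check_field_definition` and `KEY_GROUPS` are hypothetical names. The identifier is assumed to split at its first /. If either tag or occurrence is given, both are compared with the identifier.

```python
# Key groups that one field definition must not mix.
KEY_GROUPS = {
    "fixed": {"positions"},
    "variable": {"subfields", "deprecated-subfields"},
    "alternatives": {"types"},
}


def check_field_definition(identifier: str, definition: dict) -> list:
    """Return a list of rule violations; an empty list means none were found."""
    errors = []
    # Split the identifier at the first "/" into tag and optional occurrence.
    id_tag, sep, id_occ = identifier.partition("/")
    id_occ = id_occ if sep else None
    # tag and occurrence must match the identifier unless both are missing.
    if "tag" in definition or "occurrence" in definition:
        if (definition.get("tag"), definition.get("occurrence")) != (id_tag, id_occ):
            errors.append(f"tag/occurrence do not match {identifier!r}")
    # Collect the key groups used by this definition; at most one is allowed.
    used = [name for name, keys in KEY_GROUPS.items() if not keys.isdisjoint(definition)]
    if len(used) > 1:
        errors.append("mixes keys for " + " and ".join(used))
    return errors
```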

Example:

In the following example, byte position 00 of MARC field 007 has the fixed value c for electronic resources:

{
  "tag": "007",
  "label": "Physical Description",
  "types": {
    "Electronic resource": {
      "positions": {
        "00": {
          "label": "Category of material",
          "url": "https://www.loc.gov/marc/bibliographic/bd007c.html",
          "codes": {
            "c": {
              "label": "Electronic resource"
            }
          }
        }
      }
    },
    ...

Field types

Field types are alternative sets of positions or subfield schedules within a field definition. A specification of field types is a JSON object that maps type names to JSON objects, which either all have field positions or all have field subfields.
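The following Python sketch checks the consistency rule for field types. It assumes each type object carries exactly one of the keys positions and subfields. That reading is stricter than the text, which only requires that all types agree. The function name `check_types` and the type name "Map" used below are hypothetical.

```python
def check_types(types: dict) -> list:
    """Return rule violations for a field types object (empty list if none)."""
    errors = []
    kinds = set()
    for name, type_def in types.items():
        has_positions = "positions" in type_def
        has_subfields = "subfields" in type_def
        # Assumption: each type uses exactly one of the two keys.
        if has_positions == has_subfields:
            errors.append(f"type {name!r} needs exactly one of positions/subfields")
        else:
            kinds.add("positions" if has_positions else "subfields")
    # All types of one field must agree on positions versus subfields.
    if len(kinds) > 1:
        errors.append("types mix positions and subfields")
    return errors
```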

Positions

Fixed fields can be specified with a JSON object that maps character positions to data element definitions. A character position is a sequence of digits (e.g. 09) or two sequences separated by - (e.g. 12-16). A sequence of digits MUST NOT consist of zeroes only. It is RECOMMENDED to use sequences of two digits. If two sequences are given, the second, interpreted as a number, MUST NOT be smaller than the first. A data element definition is a JSON object that SHOULD contain:

The data element definition MAY further contain:

A data element definition MUST NOT contain more than one of the keys codes and pattern.
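The following Python sketch parses a character position into a numeric range and checks the codes/pattern exclusion. It covers only the digit syntax, the range order, and that exclusion. The two-digit recommendation and the other rules are not checked. The names `parse_position` and `check_data_element` are hypothetical.

```python
import re

# One digit sequence, optionally followed by "-" and a second digit sequence.
POSITION = re.compile(r"([0-9]+)(?:-([0-9]+))?")


def parse_position(position: str) -> tuple:
    """Return (start, end) as integers for positions such as "09" or "12-16"."""
    match = POSITION.fullmatch(position)
    if match is None:
        raise ValueError(f"invalid character position: {position!r}")
    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) is not None else start
    # The second number must not be smaller than the first.
    if end < start:
        raise ValueError(f"range end before start: {position!r}")
    return start, end


def check_data_element(definition: dict) -> list:
    """A data element definition may contain at most one of codes and pattern."""
    if "codes" in definition and "pattern" in definition:
        return ["data element contains both codes and pattern"]
    return []
```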

Subfield schedule

A subfield schedule is a JSON object that maps subfield codes to subfield definitions. A subfield code is a single character. A subfield definition is a JSON object that SHOULD contain:

The subfield definition MAY further contain:
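The rule that a subfield code is a single character can be checked on the keys of a subfield schedule. The following Python sketch assumes that "single character" means a single Unicode code point, which is what `len` counts for Python strings. The function name is hypothetical.

```python
def invalid_subfield_codes(schedule: dict) -> list:
    """Return the keys of a subfield schedule that are not exactly one character long."""
    # len() counts Unicode code points, which is the reading assumed here.
    return [code for code in schedule if len(code) != 1]
```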

Indicators

An indicator is either the value null or a JSON object that SHOULD contain:

The indicator MAY further contain:

Codelist

A codelist is a JSON object that maps values to descriptions. Each description is a JSON object with an optional key label.

Example:

{
  " ": {
    "label": "No specified type"
  },
  "a": {
    "label": "Archival"
  },
  "x": { }
}
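The following Python sketch looks up a value in the example codelist above. It distinguishes a listed value without a label (returning an empty string) from a value that is not listed at all (returning None). The function name `describe` and this return convention are assumptions for illustration.

```python
# The example codelist from above as a Python dict.
CODELIST = {
    " ": {"label": "No specified type"},
    "a": {"label": "Archival"},
    "x": {},
}


def describe(value: str, codelist: dict):
    """Return the label of value, "" if it has no label, or None if it is not listed."""
    if value not in codelist:
        return None
    return codelist[value].get("label", "")
```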

Metaschema

A JSON Schema to validate Avram Schemas is available at https://format.gbv.de/schema/avram/schema.json.

Applications MAY extend the metaschema for particular formats, for instance to further restrict the allowed set of field identifiers.

Validation rules

Rules for how to validate records against Avram Schemas have not yet been specified explicitly.

An Avram schema can be used to check:

References

Normative references

Informative references

Changes

0.2.0 (2018-03-09)

0.1.0 (2018-02-20)