Avram Specification

Avram is a schema language for MARC and related formats such as PICA and MAB.


MARC and related formats have been used for decades as the basis for library automation. Several variants, dialects and profiles exist for different applications. The Avram schema language allows specifying individual formats based on MARC, PICA and similar standards for documentation, validation, and requirements engineering. The schema language is named after Henriette D. Avram (1919-2006), who devised MARC as the first automated cataloging system in the 1960s.

This document consists of a specification of the schema format and of validation rules. It is managed in a git repository at https://github.com/gbv/avram together with test files for implementations.

Conformance requirements

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Data types

A timestamp is a date or datetime as defined with XML Schema datatype dateTime (-?YYYY-MM-DDThh:mm:ss(\.s+)?(Z|[+-]hh:mm)?), date (-?YYYY-MM-DD(Z|[+-]hh:mm)?), gYearMonth (-?YYYY-MM), or gYear (-?YYYY).
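The four timestamp forms can be checked syntactically with a regular expression. The following Python sketch is an illustration only: the function name is hypothetical, years are assumed to have exactly four digits as in the patterns above, and value ranges (such as months above 12) are not checked.

```python
import re

# Optional timezone: "Z" or an offset such as "+02:00"
TZ = r"(?:Z|[+-][0-9]{2}:[0-9]{2})?"

DATETIME = r"-?[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}(?:\.[0-9]+)?" + TZ
DATE = r"-?[0-9]{4}-[0-9]{2}-[0-9]{2}" + TZ
YEAR_MONTH = r"-?[0-9]{4}-[0-9]{2}"
YEAR = r"-?[0-9]{4}"

TIMESTAMP = re.compile("|".join([DATETIME, DATE, YEAR_MONTH, YEAR]))


def is_timestamp(value: str) -> bool:
    """Return True if value has the shape of one of the four timestamp forms."""
    return TIMESTAMP.fullmatch(value) is not None
```

For example, `is_timestamp("2020-09-15")` and `is_timestamp("2020-09")` hold, while a value such as `15.09.2020` is rejected.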

A regular expression is a string that conforms to the ECMA 262 (2015) regular expression grammar. The expression is interpreted as a Unicode pattern.

A language is a natural language identifier as defined with XML Schema datatype language.

Schema format

An Avram Schema is a JSON object given as a JSON document or in any other format that encodes a JSON document. In contrast to RFC 7159, all object keys MUST be unique. String values SHOULD NOT be the empty string.
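Many JSON parsers silently keep only the last of several identical keys, so the uniqueness requirement has to be enforced while parsing. A minimal Python sketch, assuming the schema is available as JSON text (the names `reject_duplicate_keys` and `load_schema` are hypothetical):

```python
import json


def reject_duplicate_keys(pairs):
    """Build a dict from (key, value) pairs, failing on a repeated key."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate object key: {key!r}")
        result[key] = value
    return result


def load_schema(text: str) -> dict:
    """Parse JSON text, rejecting objects with duplicate keys at any depth."""
    return json.loads(text, object_pairs_hook=reject_duplicate_keys)
```

The hook is called by `json.loads` for every JSON object in the document, so nested objects such as field definitions are covered as well.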

The schema MUST contain:

The schema SHOULD contain keys documenting the format defined by the schema:

The schema MAY contain:

Multiple schemas with the same title, description, url and/or profile MAY exist, but all schemas with the same profile URI MUST include the same field definition for fields with the same field identifier.
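The consistency requirement for schemas sharing a profile can be sketched in Python as follows. This is an illustration under assumptions: schemas are given as parsed JSON objects, "same field definition" is read as equality of the JSON values, and the function name is hypothetical.

```python
def conflicting_fields(schemas):
    """Return sorted (profile, field identifier) pairs that are defined
    differently by schemas sharing the same profile URI."""
    first_seen = {}   # (profile, field identifier) -> first definition found
    conflicts = set()
    for schema in schemas:
        profile = schema.get("profile")
        if profile is None:
            continue  # the rule only applies to schemas with a profile
        for field_id, definition in schema.get("fields", {}).items():
            key = (profile, field_id)
            if key in first_seen and first_seen[key] != definition:
                conflicts.add(key)
            first_seen.setdefault(key, definition)
    return sorted(conflicts)
```

An empty result means that no conflict was found among the given schemas.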

  "fields": { },
  "title": "MARC 21 Format for Classification Data",
  "description": "MARC format for classification numbers and captions associated with them",
  "url": "https://www.loc.gov/marc/classification/",
  "profile": "http://format.gbv.de/marc/classification",
  "language": "en",
  "$schema": "https://format.gbv.de/schema/avram/schema.json",

Field schedule

A field schedule is a JSON object that maps field identifiers to field definitions.

{
  "010": { "label": "Library of Congress Control Number" },
  "084": { "label": "Classification Scheme and Edition" }
}

Field identifier

A field identifier can be any non-empty string that uniquely identifies a field. The identifier consists of a field tag, optionally followed by

Applications SHOULD add further restrictions on field identifier syntax.


Field definition

A field definition is a JSON object that SHOULD contain:

The field definition MAY further contain:

A field definition MUST NOT combine keys for fixed fields (positions), keys for variable fields (subfields and/or deprecated-subfields), and keys for alternatives (types).
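Reading this rule as a ban on mixing the three groups of keys within one field definition, a check could look like the following Python sketch. The function and group names are hypothetical, and the key for fixed fields is assumed to be `positions`, as spelled in the field type examples.

```python
KEY_GROUPS = {
    "fixed": {"positions"},
    "variable": {"subfields", "deprecated-subfields"},
    "alternatives": {"types"},
}


def key_groups_used(field_definition):
    """Return the sorted names of the key groups present in a field definition."""
    return sorted(
        name
        for name, keys in KEY_GROUPS.items()
        if not keys.isdisjoint(field_definition)
    )


def mixes_key_groups(field_definition):
    """True if keys from more than one group occur together."""
    return len(key_groups_used(field_definition)) > 1
```

A definition with both `subfields` and `deprecated-subfields` uses a single group and passes, while one with `positions` and `subfields` does not.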

If a field definition is given in a field schedule, values of tag, occurrence and counter MUST either be missing or be used to automatically derive the corresponding field identifier.

Additional rules for MARC-based formats

Additional rules for PICA-based formats


Field types

Field types are alternative sets of positions or subfield schedules as part of a field definition. A specification of field types is a JSON object that maps type names to JSON objects, either all having the key positions or all having the key subfields.
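The requirement that all types use the same kind of key can be checked with a few lines of Python. This is a sketch with a hypothetical function name; it takes the parsed value of the types key and does not look inside the individual positions or subfields.

```python
def types_are_consistent(types):
    """True if every type has "positions" or every type has "subfields"."""
    definitions = list(types.values())
    all_positions = all("positions" in d for d in definitions)
    all_subfields = all("subfields" in d for d in definitions)
    return all_positions or all_subfields


# Example with two types that both define positions
example = {
    "Map": {"positions": {"00": {"codes": {"a": {}}}}},
    "Globe": {"positions": {"00": {"codes": {"d": {}}}}},
}
consistent = types_are_consistent(example)
```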

Note: field types make Avram schemas more complicated. An alternative is to provide multiple schemas, one for each type.

{
  "Map": {
    "positions": { "00": { "codes": { "a": { } } } }
  },
  "Electronic resource": {
    "positions": { "00": { "codes": { "c": { } } } }
  },
  "Globe": {
    "positions": { "00": { "codes": { "d": { } } } }
  }
}


Positions

Fixed fields can be specified with a JSON object that maps character positions to data element definitions. A character position is a sequence of digits (e.g. 09) or two sequences separated by - (e.g. 12-16). A sequence of digits MUST NOT consist of zeroes only. It is RECOMMENDED to use sequences of two digits. If two sequences are given, the second interpreted as a number MUST NOT be smaller than the first interpreted as a number.

A data element definition is a JSON object that SHOULD contain:

The data element definition MAY further contain:

A data element definition MUST NOT contain more than one of the keys codes and pattern.
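The character position syntax and the restriction on codes and pattern can be sketched in Python as follows. The sketch implements the rules as worded in this section; the function names are hypothetical and not part of the specification.

```python
import re

# One digit sequence, optionally followed by "-" and a second sequence
POSITION = re.compile(r"([0-9]+)(?:-([0-9]+))?")


def is_character_position(key: str) -> bool:
    """Check a key of a positions object against the rules of this section."""
    match = POSITION.fullmatch(key)
    if match is None:
        return False
    first, second = match.groups()
    sequences = [first] if second is None else [first, second]
    # A sequence of digits must not consist of zeroes only
    if any(set(sequence) == {"0"} for sequence in sequences):
        return False
    # The end of a range must not be smaller than its start
    return second is None or int(second) >= int(first)


def has_codes_and_pattern(data_element: dict) -> bool:
    """True if a data element definition uses both codes and pattern."""
    return "codes" in data_element and "pattern" in data_element
```

With these rules, `09` and `12-16` are accepted, while `16-12` is rejected because the range runs backwards.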


Subfield schedule

A subfield schedule is a JSON object that maps subfield codes to subfield definitions. A subfield code is a single character. A subfield definition is a JSON object that SHOULD contain:

The subfield definition MAY further contain:



Indicator definition

An indicator is either the value null or a JSON object that SHOULD contain:

The indicator MAY further contain:

Indicator codelist values MUST consist of a single character not being #.
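A check of this constraint might look like the following Python sketch. It assumes that "codelist values" refers to the keys of the codes object of an indicator, and the function name is hypothetical.

```python
def invalid_indicator_codes(indicator):
    """Return the codes of an indicator that are not a single character
    or that equal "#". An indicator may also be None (JSON null)."""
    if indicator is None:
        return []
    return [
        code
        for code in indicator.get("codes", {})
        if len(code) != 1 or code == "#"
    ]
```

An empty list means that all codes of the indicator satisfy the constraint.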

  "label": "Type",
  "codes": {
    " ": "Abbreviated key title",
    "0": "Other abbreviated title"


Codelist

A codelist is a JSON object that maps values to descriptions. Each description is a JSON object with optional keys label and/or description.

{
  " ": {
    "label": "No specified type"
  },
  "a": {
    "label": "Archival"
  },
  "x": { }
}


Metaschema

A JSON Schema to validate Avram Schemas is available at https://format.gbv.de/schema/avram/schema.json.

Applications MAY extend the metaschema for particular formats, for instance to further restrict the allowed set of field identifiers.

Validation rules

Rules for validating records against Avram Schemas have not yet been specified explicitly.

An Avram schema can be used to check:

The value of the schema key count, if used for validation, MUST match the number of records that have been analyzed.


Normative references

Informative references


Changes

0.6.0 (2020-09-15)

0.5.0 (2020-08-04)

0.4.0 (2019-05-09)

0.3.0 (2018-03-16)

0.2.0 (2018-03-09)

0.1.0 (2018-02-20)