Introduction
MD-Models is a markdown-based specification language for research data management.
It is designed to be easy to read and write, and to be converted to various programming languages and schema languages.
# Hello MD-Models
This is a simple markdown file that defines a model.
### Object
Enrich your objects with documentation and communicate intent to domain experts.
This is a simple object definition:
- string_attribute
  - type: string
  - description: A string attribute
- integer_attribute
  - type: integer
  - description: An integer attribute
Core Philosophy
The primary motivation behind MD-Models is to reduce cognitive overhead and maintenance burden by unifying documentation and structural definition into a single source of truth. Traditional approaches often require maintaining separate artifacts:
- Technical schemas (JSON Schema, XSD, ShEx, SHACL)
- Programming language implementations
- Documentation for domain experts
- API documentation
This separation frequently leads to documentation drift and increases the cognitive load on both developers and domain experts.
A Little Anecdote
When I began my journey in research data management, I was frequently overwhelmed by the intricate tools and standards in use. As a researcher suddenly thrown into a blend of software engineering, format creation, and data management, it felt like I was plunged into deep water without a safety net.
Data management, by its very nature, spans multiple disciplines and demands a thorough understanding of the domain, the data itself, and the available tools. Yet, even the most impressive tools lose their value if they don’t cater to the needs of domain experts. I came to realize that those experts are best positioned to define the structure and purpose of the data, but the overwhelming complexity of existing tools and standards often prevents their active participation.
MD-Models is my response to this challenge. It makes building structured data models easier by enabling domain experts to document the data’s intent and structure in a clear and manageable way. Markdown is an ideal choice for this task. It is simple to read and write, and it effectively communicates the necessary intent. Moreover, its semi-structured format allows for effortless conversion into various schema languages and programming languages, eliminating the need for excessive boilerplate code.
Quickstart
In order to get started with MD-Models, you can follow the steps below.
Installation
In order to install the command line tool, you can use the following command:
cargo install mdmodels
Writing your first MD-Models file
MD-Models files can be written in any editor that supports markdown. The following is a list of recommended editors:
We also provide a web editor at mdmodels.vercel.app that can be used to write and validate MD-Models files. In addition to a syntax-highlighted editor, it offers:
- Live preview of the rendered MD-Models file
- Graph editor to visualize the relationships between objects
- Automatic validation of the MD-Models file
- Export to various schema languages and programming languages
Packages
The main Rust crate is compiled to Python and WebAssembly, allowing use beyond the command line tool. These are the main packages:
- Core Python Package: Install via pip:
  # Mainly used to access the core functionality of the library
  pip install mdmodels-core
- Python Package: Install via pip:
  # Provides in-memory data models, database support, LLM support, etc.
  pip install mdmodels
- NPM Package: Install via npm:
  # Mainly used to access the core functionality of the library
  npm install mdmodels-core
Examples
The following projects are examples of how to use MD-Models in practice:
Syntax
This section describes the syntax of MD-Models. It is intended to be used as a reference for the syntax and semantics of MD-Models.
Objects
Objects are the building blocks of your data structure. Think of them as containers for related information, similar to how a form organizes different fields of information about a single topic.
What is an Object?
An object is simply a named collection of properties. For example, a Person object might have properties like name, age, and address. In our system, objects are defined using a straightforward format that's easy to read and write, even if you're not a programmer.
How to Define an Object
You start an object by declaring its name using a level 3 heading (###) followed by the name of the object. In the example below, we define an object called Person.
### Person
This is an object definition.
Great! Now we have a named object. But what's next?
Object Properties
Objects can have properties, which define the specific data fields that belong to the object. Properties are defined using a structured list format with the following components:
- The property name - starts with a dash (-) followed by the name
- The property type - indicates what kind of data the property holds
- Optional metadata - additional specifications like descriptions, constraints, or validation rules
Here's the basic structure:
### Person (schema:object)
- name
  - type: string
  - description: The name of the person
Let's break this down:
- name - The name of the property
- type: string - The type of the property, because we expect a name to be a string (e.g. "John Doe")
- description: The name of the person - A description of the property
The name of the property and its type are required. The description is optional, but it is good practice to add one. Later on we will see that a thorough description can be used to guide a large language model to extract the information from a text.
By default, properties are optional. If you want to make a property required, you need to bold the property name using either __name__ or **name**. Replace name with the name of the property.
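To make the required/optional rule concrete, here is a minimal Python sketch that reads a top-level property line and reports whether the name is bolded. The helper `parse_property_name` and its regular expression are hypothetical and written for illustration only; they are not part of the MD-Models parser.

```python
from __future__ import annotations

import re

# Hypothetical pattern: matches "- __name__", "- **name**", or "- name".
_PROPERTY_LINE = re.compile(
    r"^-\s+(?:(__|\*\*)(?P<bold>\w+)\1|(?P<plain>\w+))\s*$"
)


def parse_property_name(line: str) -> tuple[str, bool]:
    """Return (name, required) for a top-level property line."""
    match = _PROPERTY_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a property line: {line!r}")
    if match.group("bold") is not None:
        # A bolded name marks the property as required.
        return match.group("bold"), True
    # A plain name keeps the default: optional.
    return match.group("plain"), False


required_example = parse_property_name("- __name__")
optional_example = parse_property_name("- age")
```

In this sketch both bold styles yield the same result, so `- __name__` and `- **name**` are treated as equivalent.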
Property Types
The data type of a property is very important and generally communicates what kind of data the property holds. Here is a list of the supported base types:
- string - A string of characters
- integer - A whole number
- float - A floating point number
- number - A numeric value (integer or float)
- boolean - A true or false value
Arrays
While these types are the building blocks, they fail to capture the full range of data types that can be used in a data model. For example, we need to be able to express that a property is an array/list of strings, or an array/list of numbers. This is where the array notation comes in.
We define an array of a given type by placing empty square brackets after the type. For example, an array of strings would be written as string[] (inspired by TypeScript).
### Person (schema:object)
- an_array_of_strings
  - type: string[]
  - description: An array of strings
- an_array_of_numbers
  - type: number[]
  - description: An array of numbers
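The array notation can be read mechanically: strip a trailing `[]` and remember that it was there. The following Python sketch illustrates this; `parse_type` is a hypothetical helper for illustration and does not reflect how MD-Models parses types internally.

```python
from __future__ import annotations


def parse_type(type_str: str) -> tuple[str, bool]:
    """Split a type such as "string[]" into (base_type, is_array)."""
    type_str = type_str.strip()
    if type_str.endswith("[]"):
        # Drop the brackets and flag the property as an array.
        return type_str[:-2].strip(), True
    return type_str, False


array_of_strings = parse_type("string[]")
single_number = parse_type("number")
```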
Connecting Objects
Now we know how to define singular and array properties, but we often need to create relationships between objects in our data models. For example, a Person object might have an address property that references an Address object. This relationship is easily established by using another object's name as a property's type.
### Person
- name
  - type: string
- address
  - type: Address
### Address
- street
  - type: string
- city
  - type: string
- zip
  - type: string
This approach allows you to build complex, interconnected data models that accurately represent real-world relationships between entities. You can create both one-to-one relationships (like a person having one address) and one-to-many relationships (by using array notation).
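As an illustration of what such a relationship looks like in a programming language, the sketch below mirrors the Person and Address objects with plain Python dataclasses. This is hand-written for illustration and is an assumption about the general shape, not the actual output of any MD-Models template. The `previous_addresses` field is hypothetical and was added only to show the one-to-many case (`- type: Address[]`).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Address:
    # All properties are optional by default, hence Optional with None.
    street: Optional[str] = None
    city: Optional[str] = None
    zip: Optional[str] = None


@dataclass
class Person:
    name: Optional[str] = None
    # One-to-one: the property type is another object.
    address: Optional[Address] = None
    # One-to-many (hypothetical extra property using array notation).
    previous_addresses: List[Address] = field(default_factory=list)


person = Person(
    name="John Doe",
    address=Address(street="Main Street 1", city="Berlin", zip="10115"),
)
person.previous_addresses.append(Address(city="Hamburg"))
```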
Property Options
When defining properties in your data model, you can apply various options to control their behavior, validation, and representation. These options are defined using the - option: value syntax. In the following sections, we will look at the different options that are available.
General Options
Option | Description | Example |
---|---|---|
description | Provides a description for the property | - description: The name of the person |
example | Provides an example value for the property | - example: "John Doe" |
JSON Schema Validation Options
These options map to standard JSON Schema validation constraints, allowing you to enforce data integrity and validation rules in your models. When you use these options, they will be translated into corresponding JSON Schema properties during schema generation, ensuring that your data adheres to the specified constraints. This provides a standardized way to validate data across different systems and implementations that support JSON Schema.
Option | Description | Example |
---|---|---|
minimum | Specifies the minimum value for a numeric property | - minimum: 0 |
maximum | Specifies the maximum value for a numeric property | - maximum: 100 |
minitems | Specifies the minimum number of items for an array property | - minitems: 1 |
maxitems | Specifies the maximum number of items for an array property | - maxitems: 10 |
minlength | Specifies the minimum length for a string property | - minlength: 3 |
maxlength | Specifies the maximum length for a string property | - maxlength: 50 |
pattern or regex | Specifies a regular expression pattern that a string property must match | - pattern: "^[a-zA-Z0-9]+$" |
unique | Specifies whether array items must be unique | - unique: true |
multipleof | Specifies that a numeric value must be a multiple of this number | - multipleof: 5 |
exclusiveminimum | Specifies an exclusive minimum value for a numeric property | - exclusiveminimum: 0 |
exclusivemaximum | Specifies an exclusive maximum value for a numeric property | - exclusivemaximum: 100 |
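The translation described above can be pictured as a simple lookup from option name to JSON Schema keyword. The Python sketch below pairs each option in the table with the standard JSON Schema keyword of the same meaning. The pairing and the helper `to_json_schema` are assumptions made for illustration; the real generator may differ in detail.

```python
# Assumed mapping from MD-Models option names to JSON Schema keywords.
JSON_SCHEMA_KEYWORDS = {
    "minimum": "minimum",
    "maximum": "maximum",
    "minitems": "minItems",
    "maxitems": "maxItems",
    "minlength": "minLength",
    "maxlength": "maxLength",
    "pattern": "pattern",
    "regex": "pattern",
    "unique": "uniqueItems",
    "multipleof": "multipleOf",
    "exclusiveminimum": "exclusiveMinimum",
    "exclusivemaximum": "exclusiveMaximum",
}


def to_json_schema(json_type: str, options: dict) -> dict:
    """Build a JSON Schema fragment from a type and a dict of options."""
    schema = {"type": json_type}
    for key, value in options.items():
        keyword = JSON_SCHEMA_KEYWORDS.get(key.lower())
        if keyword is not None:
            # Options without a JSON Schema counterpart are skipped here.
            schema[keyword] = value
    return schema


age_schema = to_json_schema("integer", {"minimum": 0, "maximum": 100})
tags_schema = to_json_schema("array", {"minitems": 1, "unique": True})
```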
Format Options
The following options are used to define how the property should be represented in different formats.
Option | Description | Example |
---|---|---|
xml | Specifies that the property should be represented in XML format | - xml: someName |
A note on the xml option
The xml option has multiple effects:
- Element will be set as an element in the XML Schema.
- @Name will be set as an attribute in the XML Schema.
- someWrapper/Element will wrap the element in a parent element called someWrapper.
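The three forms of the xml option can be told apart by their first character and by the presence of a slash. The Python sketch below shows one way to read them; `interpret_xml_option` and the shape of its return value are hypothetical and serve only to illustrate the rules listed above.

```python
def interpret_xml_option(value: str) -> dict:
    """Classify an xml option value as attribute, element, or wrapped element."""
    value = value.strip()
    if value.startswith("@"):
        # "@Name" denotes an XML attribute.
        return {"kind": "attribute", "name": value[1:], "wrapper": None}
    if "/" in value:
        # "someWrapper/Element" denotes an element inside a parent element.
        wrapper, _, name = value.rpartition("/")
        return {"kind": "element", "name": name, "wrapper": wrapper}
    # A bare name denotes a plain element.
    return {"kind": "element", "name": value, "wrapper": None}


attribute = interpret_xml_option("@Name")
wrapped = interpret_xml_option("someWrapper/Element")
```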
Semantic Options
The following options are used to define semantic annotations. Read more about semantic annotations in the Semantics section.
Option | Description | Example |
---|---|---|
term | Specifies the term for the property in the ontology | - term: schema:name |
SQL Database Options
Database options allow you to specify how properties should be represented in relational database systems. MD-Models supports the following options:
Option | Description | Example |
---|---|---|
pk | Indicates whether the property is a primary key in a database | - primary key: true |
LinkML Specific Options
Options specific to the LinkML specification:
Option | Description | Example |
---|---|---|
readonly | Indicates whether the property is read-only | - readonly: true |
recommended | Indicates whether the property is recommended | - recommended: true |
Custom Options
You can also define custom options that aren't covered by the predefined ones:
- name
  - MyKey: my value
Example Usage
Here's how you might use these options in a data model:
### Person (schema:object)
- id
  - type: string
  - primary key: true
  - description: The unique identifier for the person
- name
  - type: string
  - description: The name of the person
  - example: "John Doe"
- age
  - type: integer
  - description: The age of the person
  - minimum: 0
These options help to define constraints, provide validation rules, and give hints to code generators about how properties should be treated in the resulting applications and schemas.
Enumerations
Sometimes you want to restrict the values that can be assigned to a property. For example, you might want to restrict the categories of a product to a set of predefined values. A product might be of category book, movie, music, or other. This is where enumerations come in.
Defining an enumeration
To define an enumeration, we start the same as we do for any other type, by using a level 3 heading (###) and then the name of the type.
### ProductCategory
BOOK = "book"
MOVIE = "movie"
MUSIC = "music"
OTHER = "other"
We are defining a key and a value here, where the value is the actual value of the enumeration and the key is an identifier. This is required because, when we want to re-use the enumeration in a programming language, we need to be able to refer to it by a key. For instance, in Python we can pass an enumeration via the following code:
from model import ProductCategory, Product
product = Product(
    name="Inception",
    category=ProductCategory.MOVIE
)
print(product)
{
  "name": "Inception",
  "category": "movie"
}
Similar to how we can use an object as a type for a property, we can also use an enumeration as a type for a property:
### Product
- name
  - type: string
- category
  - type: ProductCategory
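For readers who want to try the key/value idea without generating code first, here is a self-contained Python sketch using the standard library. The classes are written by hand to mirror the ProductCategory and Product definitions above; they are an assumption about the general shape and not the output of an MD-Models template.

```python
from dataclasses import dataclass
from enum import Enum


class ProductCategory(Enum):
    # The key (left) is the identifier, the value (right) is what gets stored.
    BOOK = "book"
    MOVIE = "movie"
    MUSIC = "music"
    OTHER = "other"


@dataclass
class Product:
    name: str
    category: ProductCategory


product = Product(name="Inception", category=ProductCategory.MOVIE)

# Serialize by hand, writing the enumeration's value rather than its key.
serialized = {"name": product.name, "category": product.category.value}

# Looking up a member by value rejects anything outside the enumeration.
parsed_category = ProductCategory("book")
```

Passing a value that is not part of the enumeration, such as `ProductCategory("game")`, raises a `ValueError`, which is the restriction the enumeration is meant to enforce.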
Descriptions
This section further highlights the usage of descriptions in MD-Models. Since we are using markdown, we can enrich our data model with any additional information that we want to add. This not only includes text, but also links and images.
Text
To add a text description to an object, we can use the following syntax:
### Product
A product is a physical or digital item that can be bought or sold.
- name
  - type: string
  - description: The name of the product
Links
To add a link to an object, we can use the following syntax:
### Product
[Additional information](https://www.google.com)
- name
  - type: string
  - description: The name of the product
Images
To add an image to an object, we can use the following syntax:
### Product
![Image description](image-url)
- name
  - type: string
  - description: The name of the product
Please note that tables can be used within object definitions, but under some circumstances they can lead to parsing errors. It is therefore recommended to only use tables in sections.
Sections
Since objects and enumerations can get quite complex, we can use sections to group related information together. The level 2 heading (##) can be used to create a new section:
## Store-related information
This section contains information about the store.
### Product
[...]
### Customer
[...]
## Sales-related information
This section contains information about the sales.
### Order
[...]
### Invoice
[...]
Within these sections, you can add any of the previously mentioned elements, including tables. This is very useful to breathe life into your data model and communicate intent and additional information. Treat this as the non-technical part you would usually add in an additional document. Note that the parsers ignore these sections, so they will not be included in the generated code.
Best Practices
- Use sections to group related information together.
- Use links to reference external sources.
- Use images to visually represent complex concepts.
- Use tables to represent concepts that are better understood in a table format.
Semantics
MD-Models supports a variety of semantic annotations to help you add meaning to your data model. Most commonly, you want to annotate objects and properties with a semantic type to allow for better interoperability and discoverability. For this, ontologies are used:
Ontologies
Ontologies are a way to add semantic meaning to your data model. They are a collection of concepts and relationships between them and are specific to the domain of your data model. For instance, the schema.org ontology is a collection of concepts and relationships that span across many domains. This is very useful when you want to connect to other data models that employ similar concepts, but use different names for them.
Typically these relations are defined as triples, consisting of a subject, predicate and object. For instance, the statement "John is a person" can be represented as the triple (John, is a, person). The first element of the triple is the subject, the second is the predicate and the third is the object.
With MD-Models, you can define the is a predicate as an object annotation for an object definition. On the other hand, you can define the predicate as a property annotation for a property definition.
How to annotate objects
Objects are annotated at the level 3 heading of the object definition. The annotation follows the object name, separated by whitespace and enclosed in parentheses. Typically, these annotations are expressed in the form of a URI, which points to a definition of the concept in the ontology. This is verbose, however, and can be shortened by using a prefix. We will be using the schema prefix in the following examples. More on how to use prefixes can be found in the Preamble section.
We want to express - "A Product is a schema:Product".
### Product (schema:Product)
- name
  - type: string
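The heading format, an object name optionally followed by an annotation in parentheses, is simple enough to read with a regular expression. The Python sketch below shows the idea; `parse_object_heading` and its pattern are hypothetical and not taken from the MD-Models parser.

```python
from __future__ import annotations

import re

# Hypothetical pattern: "### Name" with an optional "(annotation)".
_OBJECT_HEADING = re.compile(
    r"^###\s+(?P<name>\w+)(?:\s+\((?P<term>[^)]+)\))?\s*$"
)


def parse_object_heading(line: str) -> tuple[str, str | None] | None:
    """Return (object_name, annotation) or None if the line is no object heading."""
    match = _OBJECT_HEADING.match(line.strip())
    if match is None:
        return None
    return match.group("name"), match.group("term")


annotated = parse_object_heading("### Product (schema:Product)")
plain = parse_object_heading("### Person")
```

Level 2 headings (sections) do not match the pattern, so the helper returns `None` for them.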
How to annotate properties
Properties are annotated using an option, as defined in the Property Options section. We use the keyword term to add a semantic type to the property. Properties can function in one of two ways:
- If the type of the property is a primitive type, the term option describes an is a relationship and thus the object in the sense of the triple.
- If the type of the property is an object or an array of objects, the term option describes the relationship (predicate) between the subject (object) and the object (type).
Object-valued properties
We want to express - "A Product is ordered by a Person".
### Product
- orders
  - type: Person[]
  - term: schema:orderedBy
The annotation effectively describes the relationship between the orders property and the Person type. Given that a Person is also annotated with a term, one can then build a Knowledge Graph that connects the orders property to the Person type in a semantically rich way, which can be used for a variety of purposes, such as semantic search and discovery.
Primitive-valued properties
We want to express - "The name of a Product is a schema:name".
### Product
- name
  - type: string
  - term: schema:name
Naturally, since the name property is part of the Product object, it builds the relationship "A Product has a name". In terms of triples, this is represented as (Product, has, name).
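The two roles of the term option can be summarized in a few lines of Python. The sketch below builds a triple for a property depending on whether its type is primitive or an object. The function `property_triple`, the `PRIMITIVES` set, and the tuple representation are assumptions for illustration only; MD-Models itself emits these annotations through its templates (for example as JSON-LD), not through this function.

```python
from __future__ import annotations

# Base types listed in the Property Types section.
PRIMITIVES = {"string", "integer", "float", "number", "boolean"}


def property_triple(
    object_name: str, prop_name: str, prop_type: str, term: str
) -> tuple[str, str, str]:
    """Return a (subject, predicate, object) triple for an annotated property."""
    base = prop_type[:-2] if prop_type.endswith("[]") else prop_type
    if base in PRIMITIVES:
        # Primitive type: the term is the object of an "is a" statement.
        return (prop_name, "is a", term)
    # Object type: the term is the predicate linking the two objects.
    return (object_name, term, base)


orders_triple = property_triple("Product", "orders", "Person[]", "schema:orderedBy")
name_triple = property_triple("Product", "name", "string", "schema:name")
```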
Once these annotations are defined, they are automatically added to the generated code and schemas, if supported. Semantic annotations are currently supported in the following language templates:
- python-dataclass (JSON-LD)
- python-pydantic (JSON-LD)
- typescript (JSON-LD)
- shacl (Shapes Constraint Language)
- shex (Shape Expressions)
Preamble
The preamble is the first section of your data model. It is used to provide metadata about the data model, such as the name, version, and author.
---
id: my-data-model
prefix: md
repo: http://mdmodel.net/
prefixes:
  schema: http://schema.org/
nsmap:
  tst: http://example.com/test/
imports:
  common.md: common.md
---
Frontmatter Keys
The frontmatter section of your MD-Models document supports several configuration keys that control how your data model is processed and interpreted. Here's a detailed explanation of each available key:
id
- Type: String (Optional)
- Description: A unique identifier for your data model. This can be used to reference your model from other models or systems.
- Example:
id: my-data-model
prefixes
- Type: Map of String to String (Optional)
- Description: Defines namespace prefixes that can be used throughout your model to reference external vocabularies or schemas. This is particularly useful for semantic annotations.
- Example:
  prefixes:
    schema: http://schema.org/
    foaf: http://xmlns.com/foaf/0.1/
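To see what a prefix buys you, the following Python sketch expands a prefixed term such as schema:name into a full URI using a prefix map like the one in the example. The helper `expand_term` is hypothetical and only illustrates the idea of prefix expansion; it is not an MD-Models API.

```python
def expand_term(term: str, prefixes: dict) -> str:
    """Expand "prefix:local" to a full URI if the prefix is known."""
    prefix, separator, local = term.partition(":")
    if separator and prefix in prefixes:
        return prefixes[prefix] + local
    # Unknown prefix or already a full URI: return the term unchanged.
    return term


prefixes = {
    "schema": "http://schema.org/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

name_uri = expand_term("schema:name", prefixes)
person_uri = expand_term("foaf:Person", prefixes)
```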
nsmap
- Type: Map of String to String (Optional)
- Description: Similar to prefixes, defines namespace mappings that can be used in your model. This is often used for XML-based formats or when integrating with systems that use namespaces.
- Example:
  nsmap:
    tst: http://example.com/test/
    ex: http://example.org/
repo
- Type: String
- Default: http://mdmodel.net/
- Description: Specifies the base repository URL for your model. This can be used to generate absolute URIs for your model elements.
- Example:
repo: https://github.com/myorg/myrepo/
prefix
- Type: String
- Default: md
- Description: Defines the default prefix to use for your model elements when generating URIs or qualified names.
- Example:
prefix: mymodel
imports
- Type: Map of String to String
- Default: Empty map
- Description: Specifies other models to import into your current model. The key is the alias or name to use for the import, and the value is the location of the model to import. The location can be either a local file path or a remote URL.
- Example:
  imports:
    common: common.md
    external: https://example.com/models/external.md
Import Types
The imports key supports two types of imports:
- Local Imports: References to local files on your filesystem
  imports:
    common: ./common/base.md
- Remote Imports: References to models hosted on remote servers (URLs)
  imports:
    external: https://example.com/models/external.md
When importing models, the definitions from the imported models become available in your current model, allowing you to reference and extend them. This is useful for creating modular and reusable data models.
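The distinction between local and remote imports comes down to whether the location is a URL. A minimal Python sketch of that check is shown below; `classify_import` is a hypothetical helper, and treating only http and https as remote is an assumption made for illustration.

```python
from urllib.parse import urlparse


def classify_import(location: str) -> str:
    """Return "remote" for http(s) URLs and "local" for everything else."""
    scheme = urlparse(location).scheme.lower()
    return "remote" if scheme in ("http", "https") else "local"


imports = {
    "common": "./common/base.md",
    "external": "https://example.com/models/external.md",
}

# Map each import alias to its kind.
kinds = {alias: classify_import(location) for alias, location in imports.items()}
```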
Full example
The following is a full example of an MD-Models file that defines a data model for a research publication.
---
id: research-publication
prefix: rpub
prefixes:
  schema: https://schema.org/
---
### ResearchPublication (schema:Publication)
This model represents a scientific publication with its core metadata, authors,
and citations.
- __doi__
  - Type: Identifier
  - Term: schema:identifier
  - Description: Digital Object Identifier for the publication
  - XML: @doi
- title
  - Type: string
  - Term: schema:name
  - Description: The main title of the publication
- authors
  - Type: [Author](#author)[]
  - Term: schema:authored
  - Description: List of authors who contributed to the publication
- publication_year
  - Type: integer
  - Term: schema:datePublished
  - Description: Year when the publication was published
  - Minimum: 1900
  - Maximum: 2100
- citations
  - Type: integer
  - Term: schema:citation
  - Description: Number of times this publication has been cited
  - Default: 0
### Author (schema:Person)
The `Author` object is a simple object that has a name and an email address.
- __name__
  - Type: string
  - Term: schema:name
  - Description: The name of the author
- __email__
  - Type: string
  - Term: schema:email
  - Description: The email address of the author
Best practices
- Use Descriptive Names
  - Object names should be PascalCase (e.g., ResearchPublication)
  - Attribute names should be in snake_case (e.g., publication_year)
  - Use clear, domain-specific terminology
- Identifiers
  - Mark primary keys with double underscores (e.g., __doi__)
  - Choose meaningful identifier fields
- Documentation
  - Always include object descriptions
  - Document complex attributes
  - Explain any constraints or business rules
- Semantic Mapping
  - Use standard vocabularies when possible
  - Define custom terms in your prefix map
  - Maintain consistent terminology
- Validation Rules
  - Include range constraints for numbers
  - Specify default values when appropriate
  - Document any special validation requirements
Common Patterns
Array Types
- tags
  - Type: string[]
  - Description: List of keywords describing the publication
Object References
- main_author
  - Type: Author
  - Description: The primary author of the publication
Required Fields
- __id__
  - Type: Identifier
  - Description: Unique identifier for the object
Remember that MD-Models aims to balance human readability with technical precision. Your object definitions should be clear enough for domain experts to understand while maintaining the structure needed for technical implementation.
Command Line Interface
To be added
Code generation
To be added
Pipelines
To be added
Schema validation
To be added
Large Language Models
To be added
Exporters
To be added
Programming languages
To be added
Schema languages
To be added
API specifications
To be added
Documentation
To be added
Examples
To be added
Hello MD-Models
To be added
Union types
To be added
Database models
To be added
FAQ
To be added