Olmsted Dataset

Description: Olmsted dataset input file.

Required: ['dataset_id', 'clones']

Type: object

Properties:

paper

{Paper info}

ident

Description: UUID specific to the given object

Type: string

build

{Build info}

samples

Description: Information about each of the samples

Type: array

Array Items:

{Sample}

seeds

Description: Information about each of the seed sequences

Type: array

Array Items:

{Seed}

clones

Description: Information about each of the clonal families

Type: array

Array Items:

{Clone}

dataset_id

Description: Unique identifier for a collection of data

Type: string

subjects

Description: Information about each of the subjects

Type: array

Array Items:

{Subject}
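
As a rough illustration of the top-level layout, the following Python sketch builds a small dataset object and checks it against the required fields listed above. It assumes the input file is JSON. The helper `missing_required` and every field value are hypothetical; only the field names and the required list come from this document.

```python
import json

# Required top-level fields, as listed for the Olmsted Dataset object.
DATASET_REQUIRED = ["dataset_id", "clones"]


def missing_required(obj, required):
    """Return the required keys that are absent from obj (hypothetical helper)."""
    return [key for key in required if key not in obj]


# Illustrative values only; a real file would carry full clone records.
dataset = {
    "dataset_id": "example-dataset",
    "clones": [],
    "subjects": [{"subject_id": "subject-1"}],
    "samples": [{"locus": "igh", "sample_id": "sample-1", "timepoint_id": "merged"}],
    "paper": {"authorstring": "Doe et al."},
    "build": {"commit": "0000000"},
}

# Round-trip through JSON to confirm the structure serializes cleanly.
round_tripped = json.loads(json.dumps(dataset))
print(missing_required(round_tripped, DATASET_REQUIRED))
print(missing_required({"clones": []}, DATASET_REQUIRED))
```

The same helper could be reused for the nested objects by passing the required list given in each section below.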

Paper info

Description: Information about a paper corresponding to this dataset

Required: ['authorstring']

Type: object

Properties:

url

Description: Link to online version of the paper.

Type: string

authorstring

Description: String to be displayed citing authors, e.g. "Doe et al.".

Type: string

Build info

Description: Information about how a dataset was built.

Required: ['commit']

Type: object

Properties:

commit

Description: Commit SHA of the build system used to process the data

Type: string

time

Description: Time at which build was initiated

Type: string

Sample

Description: A sample is a collection of sequences.

Required: ['locus']

Type: object

Properties:

locus

Description: B-cell locus.

Type: string

ident

Description: UUID specific to the given object

Type: string

timepoint_id

Description: Timepoint associated with this sample (may choose "merged" if data has been combined from multiple timepoints)

Type: string

sample_id

Description: Sample id

Type: string

Seed

Description: A sequence of interest among other clonal family members.

Required: ['seed_id']

Type: ['object', 'null']

Properties:

ident

Description: UUID specific to the given object

Type: string

seed_id

Description: Seed id

Type: string

Clone

Description: Clonal family of sequences deriving from a particular V(D)J rearrangement event

Required: ['unique_seqs_count', 'mean_mut_freq', 'v_alignment_start', 'v_alignment_end', 'j_alignment_start', 'j_alignment_end']

Type: object

Properties:

j_call

Description: AIRR: J gene with allele of the inferred ancestor of the clone. For example, IGHJ4*02.

Type: string

clone_id

Description: AIRR: Identifier for the clone.

Type: string

seed_id

Description: Seed sequence id if any.

Type: ['string', 'null']

sample_id

Description: Id of the sample associated with this clonal family.

Type: string

v_call

Description: AIRR: V gene with allele of the inferred ancestor of the clone. For example, IGHV4-59*01.

Type: string

d_call

Description: AIRR: D gene with allele of the inferred ancestor of the clone. For example, IGHD3-10*01.

Type: string

subject_id

Description: Id of subject from which the clonal family was sampled.

Type: string

junction_length

Description: AIRR: Number of nucleotides in the junction. (see AIRR 'junction': Nucleotide sequence for the junction region of the inferred ancestor of the clone, where the junction is defined as the CDR3 plus the two flanking conserved codons.)

Type: integer

d_alignment_start

Description: AIRR: Start position of the D segment in both the sequence_alignment and germline_alignment fields (1-based closed interval).

Type: integer

unique_seqs_count

Description: Number of unique sequences in the clone

Type: integer

d_alignment_end

Description: AIRR: End position of the D segment in both the sequence_alignment and germline_alignment fields (1-based closed interval).

Type: integer

total_read_count

Description: Number of total reads represented by sequences in the clone.

Type: integer

v_alignment_start

Description: AIRR: Start position of the V segment in both the sequence_alignment and germline_alignment fields (1-based closed interval).

Type: integer

trees

Description: Phylogenetic trees, and possibly ancestral sequence reconstructions.

Type: array

Array Items:

{Tree}

j_alignment_end

Description: AIRR: End position of the J segment in both the sequence_alignment and germline_alignment fields (1-based closed interval).

Type: integer

j_alignment_start

Description: AIRR: Start position of the J segment in both the sequence_alignment and germline_alignment fields (1-based closed interval).

Type: integer

has_seed

Description: Whether this clone contains a seed sequence (see Seed schema).

Type: boolean

ident

Description: UUID specific to the given object

Type: string

mean_mut_freq

Description: Mean mutation frequency across sequences in the clone.

Type: number

germline_alignment

Description: AIRR: Assembled, aligned, full-length inferred ancestor of the clone spanning the same region as the sequence_alignment field of nodes (typically the V(D)J region) and including the same set of corrections and spacers (if any).

Type: string

v_alignment_end

Description: AIRR: End position of the V segment in both the sequence_alignment and germline_alignment fields (1-based closed interval).

Type: integer

junction_start

Description: AIRR: Junction region start position in the alignment (1-based closed interval).

Type: integer
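
The alignment positions above are 1-based closed intervals, whereas Python slices are 0-based and half-open. A minimal sketch of the conversion follows; the function `segment`, the sequence, and the coordinates are all made up for illustration.

```python
def segment(alignment, start, end):
    """Slice a 1-based closed interval [start, end] out of an aligned string."""
    # Python slices are 0-based and half-open, so only the start is shifted.
    return alignment[start - 1:end]


# Hypothetical clone fragment; the sequence and coordinates are invented.
clone = {
    "germline_alignment": "CAGGTGCAGCTGGTGACTACTGGGGC",
    "v_alignment_start": 1,
    "v_alignment_end": 15,
    "j_alignment_start": 16,
    "j_alignment_end": 26,
}

v_part = segment(clone["germline_alignment"], clone["v_alignment_start"], clone["v_alignment_end"])
j_part = segment(clone["germline_alignment"], clone["j_alignment_start"], clone["j_alignment_end"])
print(v_part, len(v_part))
print(j_part, len(j_part))
```

With this convention the length of a segment is `end - start + 1`, which is a quick sanity check when reading coordinates from a file.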

Tree

Description: Phylogenetic tree and possibly ancestral state reconstruction of sequences in a clonal family.

Required: ['newick', 'nodes']

Type: object

Properties:

ident

Description: UUID specific to the given object

Type: string

newick

Description: AIRR: Newick string of the tree edges.

Type: string

clone_id

Description: AIRR: Identifier for the clone.

Type: string

downsampling_strategy

Description: If applicable, the downsampling method applied to the set of clonal sequences before passing them to a phylogenetic inference tool.

Type: string

tree_id

Description: AIRR: Identifier for the tree.

Type: string

nodes

Description: AIRR: Dictionary of nodes in the tree, keyed by sequence_id string.

Type: object

Object with values of type:

{Node}

downsampled_count

Description: If applicable, the maximum number of sequences kept in the downsampling process.

Type: integer
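
Because `nodes` is keyed by sequence_id and those ids are expected to match the labels in the `newick` string, a consistency check along the following lines may be useful. This is a sketch only: `newick_labels` is a hypothetical helper that handles simple unquoted labels, not a full Newick parser, and the tree itself is invented.

```python
import re


def newick_labels(newick):
    """Collect node labels from a Newick string (simplified: unquoted labels only)."""
    # A label directly follows '(', ',' or ')' and contains no Newick punctuation.
    return set(re.findall(r"(?<=[(,)])[^(),:;\s]+", newick))


# Hypothetical tree with three observed sequences and two internal nodes.
ids = ["seq1", "seq2", "seq3", "inferred1", "naive"]
tree = {
    "newick": "((seq1:0.1,seq2:0.2)inferred1:0.3,seq3:0.4)naive;",
    "nodes": {
        i: {"sequence_id": i, "sequence_alignment": "ACG", "sequence_alignment_aa": "T"}
        for i in ids
    },
}

# Each dictionary key should equal the sequence_id of the node stored under it.
mismatched = [key for key, node in tree["nodes"].items() if node["sequence_id"] != key]
# Every node should also appear as a label in the Newick string.
unlabelled = set(tree["nodes"]) - newick_labels(tree["newick"])
print(mismatched, unlabelled)
```

For real data, a dedicated phylogenetics library would be a safer way to read the Newick string than a regular expression.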

Node

Description: Information about the phylogenetic tree nodes and the sequences they represent

Required: ['sequence_id', 'sequence_alignment', 'sequence_alignment_aa']

Type: object

Properties:

sequence_alignment_aa

Description: Amino acid sequence of the node, aligned to the germline_alignment for this clone, including any indel corrections or spacers.

Type: string

cluster_multiplicity

Description: If clonal family sequences were downsampled by clustering, the cumulative number of times sequences in the cluster were observed.

Type: ['integer', 'null']

timepoint_id

Description: Timepoint associated with sequence, if any.

Type: ['string', 'null']

sequence_id

Description: AIRR: Identifier for this node that matches the id in the newick string and, where possible, the sequence_id in the source repertoire.

Type: string

lbr

Description: Local branching rate (derivative of lbi; see https://arxiv.org/abs/2004.11868).

Type: ['number', 'null']

lbi

Description: Local branching index (see https://arxiv.org/abs/2004.11868).

Type: ['number', 'null']

cluster_timepoint_multiplicities

Description: Sequence multiplicity, broken down by timepoint, including sequences falling in the same cluster if clustering-based downsampling was performed.

Type: array

Array Items:

{Timepoint multiplicity}

timepoint_multiplicities

Description: Sequence multiplicity, broken down by timepoint.

Type: array

Array Items:

{Timepoint multiplicity}

multiplicity

Description: Number of times the sequence was observed in the sample. A single sequence in a clonal family may represent many identical sequences in the original sample.

Type: ['integer', 'null']

sequence_alignment

Description: AIRR: Nucleotide sequence of the node, aligned to the germline_alignment for this clone, including any indel corrections or spacers.

Type: string

affinity

Description: Affinity of the antibody for some antigen. Typically the inverse dissociation constant (1/K_d) in simulation, and the inverse IC50 in data.

Type: ['number', 'null']

Timepoint multiplicity

Description: Multiplicity at a specific time.

Type: object

Properties:

multiplicity

Description: Number of times sequence was observed at the given timepoint

Type: ['integer', 'null']

timepoint_id

Description: Id associated with the timepoint in question

Type: string
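
A short sketch of how the per-timepoint entries might be totalled for a node. Two assumptions are made here that the schema does not state: that null multiplicities can be treated as zero, and that the per-timepoint counts are expected to add up to the node's overall `multiplicity`. The function name and all values are hypothetical.

```python
def total_multiplicity(timepoint_multiplicities):
    """Sum per-timepoint counts, treating null (None) entries as zero (assumption)."""
    return sum(entry["multiplicity"] or 0 for entry in timepoint_multiplicities)


# Hypothetical node with counts at three timepoints, one of them null.
node = {
    "sequence_id": "seq1",
    "multiplicity": 5,
    "timepoint_multiplicities": [
        {"timepoint_id": "week-1", "multiplicity": 2},
        {"timepoint_id": "week-4", "multiplicity": 3},
        {"timepoint_id": "week-8", "multiplicity": None},
    ],
}

total = total_multiplicity(node["timepoint_multiplicities"])
print(total == node["multiplicity"])
```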

Subject

Description: Subject from which the clonal family was sampled.

Required: ['subject_id']

Type: object

Properties:

subject_id

Description: Subject id

Type: string

ident

Description: UUID specific to the given object

Type: string