repgenHMM uses a dynamic programming approach to efficiently learn the distribution of rearrangement scenarios from large numbers of non-productive sequences. The approach is based on a Hidden Markov Model (HMM) formulation of the problem, and learns the model parameters with a modified Baum-Welch (BW) algorithm that avoids enumerating all scenarios. The authors tested the software on sequence data for both the alpha and beta chains of the T cell receptor. To validate the algorithm, they also generated synthetic sequences from a known model and confirmed that its parameters could be accurately recovered from those sequences. The inferred model can be used to generate synthetic sequences, to calculate the generation probability of any receptor sequence, and to estimate the theoretical diversity of the repertoire. This diversity is estimated at ≈ 10^23 for human T cells. The model provides a baseline for investigating the selection and dynamics of immune repertoires.
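To illustrate why an HMM formulation avoids full enumeration, here is a minimal sketch of the standard forward recursion, which sums over all hidden paths in time linear in the sequence length. It is a generic toy example, not repgenHMM's actual model or code: the two hidden states, the values in `INIT`, `TRANS` and `EMIT`, and the function names are all assumptions made for illustration. A brute-force sum over every hidden path is included only to check that both methods agree.

```python
from itertools import product

# Toy two-state HMM over nucleotides (all numbers are illustrative assumptions).
INIT = [0.6, 0.4]                      # P(first hidden state)
TRANS = [[0.7, 0.3], [0.4, 0.6]]       # TRANS[r][s] = P(next state s | state r)
EMIT = [
    {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},  # emissions from state 0
    {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},  # emissions from state 1
]


def forward_probability(obs, init=INIT, trans=TRANS, emit=EMIT):
    """Return P(obs) by the forward recursion (dynamic programming)."""
    if not obs:
        return 1.0
    n_states = len(init)
    # alpha[s] = P(symbols seen so far, current hidden state = s)
    alpha = [init[s] * emit[s][obs[0]] for s in range(n_states)]
    for symbol in obs[1:]:
        alpha = [
            emit[s][symbol] * sum(alpha[r] * trans[r][s] for r in range(n_states))
            for s in range(n_states)
        ]
    return sum(alpha)


def brute_force_probability(obs, init=INIT, trans=TRANS, emit=EMIT):
    """Return P(obs) by explicitly enumerating every hidden path (exponential cost)."""
    if not obs:
        return 1.0
    n_states = len(init)
    total = 0.0
    for path in product(range(n_states), repeat=len(obs)):
        p = init[path[0]] * emit[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= trans[path[t - 1]][path[t]] * emit[path[t]][obs[t]]
        total += p
    return total


if __name__ == "__main__":
    seq = "ACGTTGCA"
    print(forward_probability(seq), brute_force_probability(seq))
```

The forward pass costs on the order of (sequence length) × (number of states)², whereas the enumeration grows exponentially with sequence length. The Baum-Welch algorithm mentioned above builds on the same kind of recursion, combined with a backward pass, to re-estimate parameters.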

repgenHMM versioning

No versioning.

repgenHMM specifications

Software type: Standalone
Restrictions to use: None
Programming languages: C++
Stability: Stable
Interface: Command line interface
Operating system: Unix/Linux
Computer skills: Advanced

repgenHMM support

Credits

Institution(s)

Laboratoire de physique théorique, UMR8549, CNRS and École normale supérieure, Paris, France; Laboratoire de physique statistique, UMR8550, CNRS and École normale supérieure, Paris, France

Funding source(s)

This work was supported by European Research Council Starting Grant n. 306312.
