[…] For sequence analysis, we developed a new bioinformatics framework that uses MUSCLE for the initial fast and accurate multiple nucleotide sequence alignment. Sequence distances were subsequently calculated with ClustalW. For nucleotide alignments of IGHV and IGHJ, the MUSCLE default values were used. To improve the biological significance of the assignment of germline and sample IGHDs, we tested three different procedures that varied the gap opening and gap extension parameters and included a scoring matrix for matches, transversions, and transitions. To determine the germline origin with the new framework, only the FRHs were aligned, avoiding interference from the highly diversified CDRH. To analyze possible gene conversion events, FR1-3Hs and CDR1-3Hs were extracted and aligned separately to the corresponding regions of the IGHV reference sequences to find the most similar germline segment.

To test our bioinformatics tool, we established a sample sequence set for a detailed analysis of the transcribed bovine immunoglobulin repertoire. Blood samples were taken from 10 animals per cattle breed: Aubrac (A), German Simmental (GS), German Black Pied (GBP), and Holstein Friesian (HF).

In total, 160 IgG heavy chain sequences per breed (n = 640 sequences) were investigated as described above. The variable regions were identified and extracted at the 5’ end (N-terminal end) using the nucleotide motif GCCTCCACC, which codes for AlaSerThr and marks the start of the first constant region of all bovine IgGs. Because of premature stop codons or incompletely amplified variable regions, 131 sequences were excluded from further analyses. Consequently, 509 sequences remained: 137 in A, 116 in GS, 111 in GBP, and 145 in HF. The analyzed sequences are published under accession numbers KT761498-KT762006. […]
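The motif-based extraction and the exclusion of sequences with premature stop codons can be illustrated with a minimal Python sketch. It is not the framework described here. Only the motif GCCTCCACC and its meaning come from the text; the function names, the use of the first motif occurrence, and the choice to anchor the reading frame on the motif (so that the variable region ends on a codon boundary) are assumptions made for illustration. Detecting incompletely amplified variable regions beyond a missing motif is not covered.

```python
# Illustrative sketch only; helper names are hypothetical, not from the paper.
# The motif GCCTCCACC (Ala-Ser-Thr) marks the start of the first constant region.
CH1_MOTIF = "GCCTCCACC"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def extract_variable_region(seq, motif=CH1_MOTIF):
    """Return the sequence upstream of the first motif occurrence, or None.

    Assumption: the first occurrence of the motif is the true start of the
    constant region.
    """
    seq = seq.upper()
    pos = seq.find(motif)
    if pos == -1:
        return None
    return seq[:pos]


def has_premature_stop(region):
    """Return True if the region contains an in-frame stop codon.

    Assumption: the motif is in frame, so codons are read such that the
    last codon ends exactly at the end of the region.
    """
    region = region.upper()
    start = len(region) % 3  # skip leading bases that do not fill a codon
    for i in range(start, len(region) - 2, 3):
        if region[i:i + 3] in STOP_CODONS:
            return True
    return False


def filter_sequences(records):
    """Split {name: sequence} into kept variable regions and excluded names."""
    kept = {}
    excluded = []
    for name, seq in records.items():
        region = extract_variable_region(seq)
        if region is None or has_premature_stop(region):
            excluded.append(name)
        else:
            kept[name] = region
    return kept, excluded


if __name__ == "__main__":
    # Toy reads, invented for demonstration; not real bovine sequences.
    toy = {
        "ok": "ATGGCTGCC" + CH1_MOTIF + "AAA",
        "stop": "ATGTAAGCC" + CH1_MOTIF,
        "nomotif": "ATGGCTGCC",
    }
    kept, excluded = filter_sequences(toy)
    print(kept)
    print(excluded)
```

Anchoring the frame on the motif means no separate start-codon search is needed, but it relies on the upstream sequence being free of frame-shifting indels. A production pipeline would likely verify this against the germline alignment.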

Pipeline specifications

Software tools: MUSCLE, ClustalW
Application: Nucleotide sequence alignment
Organisms: Bos taurus
Chemicals: Amino Acids