1 School of Biomedical Engineering Science and Health Systems, Drexel University, Philadelphia, PA, United States.
2 Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States.
3 Department of Microbiology and Immunology, College of Medicine, Drexel University, Philadelphia, PA, United States.
4 Department of Human Biology, Faculty of Sciences, University of Haifa, Haifa, Israel.

ImmuneDB is a system for storing and analyzing high-throughput immune receptor sequencing data. Unlike most existing tools, which utilize flat files, ImmuneDB stores data in a well-structured MySQL database, enabling efficient data queries. It can take raw sequencing data as input and annotate receptor gene usage, infer clonotypes, aggregate results, and run common downstream analyses such as calculating selection pressure and constructing clonal lineages. Alternatively, pre-annotated data can be imported, and analyzed data can be exported, in a variety of common Adaptive Immune Receptor Repertoire (AIRR) file formats. To validate ImmuneDB, we compare its results to those of another pipeline, MiXCR. We show that the biological conclusions drawn would be similar with either tool, while ImmuneDB provides the additional benefits of integrating other common tools and storing data in a database. ImmuneDB is freely available on GitHub and on PyPI, and a Docker container is provided.

The study of immune cell populations has been revolutionized by next-generation sequencing. It is now commonplace to have hundreds of thousands or even millions of sequences from a single sample or individual (1, 2). With this increase in experimental data output, many tools have been created for pre-processing sequences (3), germline association and clonal inference (4–7), and post-processing analysis (8, 9). Lacking from this space, however, is a system to store fully annotated sequences, their inferred germline sequences, clonal associations, and study-specific metadata.

This paper describes ImmuneDB (10) and introduces new features added since its original publication, including additional import and export formats, a more flexible metadata system, extra clonal assignment methods, integration of a novel allele detection tool (11), and the ability to analyze other species and light chains. ImmuneDB provides an easy-to-use immune-receptor sequence database, which has been optimized for and tested with datasets of up to hundreds of millions of sequences (1). It can take raw FASTA/FASTQ sequence files as input, or import pre-annotated sequences from an array of formats, including the Change-O data standard (5) and the AIRR data standard currently being implemented and further refined (2). With either method, it can infer clonal associations, calculate selection pressure, generate lineages, and make all resulting information available both from the command line and through a web interface. For interoperability with other systems, ImmuneDB can output data in AIRR, Change-O, VDJtools, and GenBank formats. ImmuneDB's use of MySQL also allows for rapid querying and data sharing using a variety of existing tools.

The methods below describe the ImmuneDB pipeline in the context of human B-cell heavy chain rearrangements. We then extend the methods to T cells, light chains, and other species (Figure 1).

Figure 1. A general overview of the ImmuneDB pipeline. To start, sequences are optionally pre-processed with pRESTO to remove poor-quality sequences and mask bases below a user-defined threshold. Next, using a conserved region anchoring method, sequences are either assigned V- and J-genes or labeled as “unidentifiable”, which can optionally be corrected by local alignment. After gene assignment, sequences are collapsed across samples and grouped into clones based on one of three methods (see text). Lastly, downstream analyses such as selection pressure calculation and lineage construction are performed.
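Two steps named in the pipeline overview, masking bases below a user-defined quality threshold and collapsing sequences across samples, can be illustrated with a minimal Python sketch. This is not pRESTO's or ImmuneDB's implementation: the function names `mask_low_quality` and `collapse_identical`, the default threshold of 20, and the exact-match collapsing rule are all assumptions chosen for illustration.

```python
def mask_low_quality(seq, quals, min_quality=20):
    """Replace each base whose Phred score is below min_quality with 'N'.

    The default threshold of 20 is an illustrative assumption; the text only
    says the threshold is user-defined.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lists must have equal length")
    return "".join(
        base if qual >= min_quality else "N"
        for base, qual in zip(seq, quals)
    )


def collapse_identical(records):
    """Collapse exactly identical sequences across samples.

    records: iterable of (sample_name, sequence, copy_number) tuples.
    Returns a dict mapping each unique sequence to its summed copy number
    and the sorted list of samples it was seen in. Exact string matching is
    a simplification assumed here, not the rule used by ImmuneDB.
    """
    collapsed = {}
    for sample, seq, copies in records:
        entry = collapsed.setdefault(seq, {"copies": 0, "samples": set()})
        entry["copies"] += copies
        entry["samples"].add(sample)
    return {
        seq: {"copies": e["copies"], "samples": sorted(e["samples"])}
        for seq, e in collapsed.items()
    }


if __name__ == "__main__":
    # Toy reads with made-up quality scores.
    print(mask_low_quality("ACGT", [30, 10, 25, 5]))
    print(collapse_identical([
        ("sample_1", "ACGT", 2),
        ("sample_2", "ACGT", 3),
        ("sample_1", "TTTT", 1),
    ]))
```

Keeping the per-sample provenance alongside the summed copy number reflects the idea that collapsed sequences should still be traceable to the samples they came from, which is the kind of relationship a relational database stores naturally.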