Asian Journal of Microbiology, Biotechnology & Environmental Sciences Paper

Vol. 25, Issue 4, 2023; Page No. (818-826)

ESTABLISHMENT OF A BIOINFORMATICS PIPELINE FOR THE DETECTION OF PATHOGENIC BACTERIA

RAJAPAKSHA R.W.P.M., VIVEHANANTHAN K. AND ATTANAYAKA D.P.S.T.G.

Abstract

Next-generation sequencing (NGS) methods based on partial 16S rRNA gene amplicons are now widely applied in studies of plant metagenomes. NGS generates very large sets of raw data, which makes analysis a challenging task. A lack of computational and bioinformatics knowledge, and of tools for analyzing high-throughput data so that biological variation is interpreted correctly, is a major problem. In addition, downstream analysis of NGS data with the available bioinformatics platforms creates various challenges in inferring microbial composition. The available commercial software is expensive, and individual open-source tools usually operate standalone because they are not combined into a user-friendly workflow. Beginners in bioinformatics may therefore find the analysis procedures complicated, expensive, and time-consuming to learn. In the present study, a bioinformatics pipeline was developed to analyze 16S rDNA amplicons from a plant metagenome. Microbial DNA was extracted from imported seed potato tubers. For the detection of bacterial pathogens, the 400 bp V1-V2 region of the 16S rRNA gene was amplified and sequenced using Ion Torrent next-generation sequencing technology. The pipeline was built by stringing together several command-line tools: quality checking of raw FASTQ data with FastQC, trimming of low-quality data with Trimmomatic, alignment of trimmed data using the BWA-MEM algorithm, removal of duplicate reads with the Picard MarkDuplicates tool, and finally generation of the phylogenetic tree and taxonomic profile with MEGAN 5. The developed pipeline was used to analyze the 16S rRNA next-generation sequences, and the reliability of the results was checked using mock communities for validation. The pipeline can often be executed on laptop-sized machines and produces output in a couple of hours, making it easily accessible to researchers.
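
The chain of command-line tools described in the abstract can be illustrated with a short sketch. This is not the authors' implementation. It only assembles the command lines in order, without running them. Several things here are assumptions: the file names, the trimming thresholds, a pre-indexed 16S reference FASTA, and the `trimmomatic` and `picard` wrapper commands (as provided by common package managers; a plain install would use `java -jar`). The coordinate-sorting step is also an assumption: the abstract does not name it, but duplicate marking normally expects sorted input. MEGAN 5 is left out because it is usually run interactively.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional
import shlex


@dataclass
class Step:
    name: str
    argv: List[str]
    stdout: Optional[str] = None  # file that should receive stdout, if any


def build_pipeline(fastq: str, reference: str, outdir: str) -> List[Step]:
    """Assemble the command lines for one single-end sample (nothing is executed)."""
    out = Path(outdir)
    stem = Path(fastq).name.split(".")[0]
    trimmed = str(out / f"{stem}.trimmed.fastq")
    sam = str(out / f"{stem}.sam")
    sorted_bam = str(out / f"{stem}.sorted.bam")
    dedup_bam = str(out / f"{stem}.dedup.bam")
    metrics = str(out / f"{stem}.dup_metrics.txt")

    return [
        # Quality report for the raw reads; the output directory must already exist.
        Step("quality check", ["fastqc", fastq, "-o", str(out)]),
        # Single-end trimming; the window and length thresholds are illustrative only.
        Step("trimming", ["trimmomatic", "SE", "-phred33", fastq, trimmed,
                          "SLIDINGWINDOW:4:20", "MINLEN:50"]),
        # bwa mem writes SAM to stdout; the reference is assumed to be indexed already.
        Step("alignment", ["bwa", "mem", reference, trimmed], stdout=sam),
        # Assumed extra step: coordinate-sort before duplicate removal.
        Step("sorting", ["picard", "SortSam", f"I={sam}", f"O={sorted_bam}",
                         "SORT_ORDER=coordinate"]),
        Step("duplicate removal", ["picard", "MarkDuplicates", f"I={sorted_bam}",
                                   f"O={dedup_bam}", f"M={metrics}",
                                   "REMOVE_DUPLICATES=true"]),
    ]


def render(step: Step) -> str:
    """Format a step as a shell-style command line for inspection."""
    cmd = " ".join(shlex.quote(arg) for arg in step.argv)
    return f"{cmd} > {shlex.quote(step.stdout)}" if step.stdout else cmd


if __name__ == "__main__":
    for step in build_pipeline("sample.fastq", "16S_reference.fasta", "results"):
        print(f"# {step.name}")
        print(render(step))
```

Printing the commands before running them makes it easy to check each stage and to swap in the parameters actually used in a given study.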