De novo assembled mitogenome analysis of Trichuris trichiura from Korean individuals using nanopore-based long-read sequencing technology

  • James Owen Delaluna ,

    Contributed equally to this work with: James Owen Delaluna, Heekyoung Kang

    Roles Data curation, Formal analysis, Investigation, Methodology, Software, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Department of Tropical Medicine and Parasitology and Institute of Endemic Diseases, Seoul National University College of Medicine, Seoul, Republic of Korea

  • Heekyoung Kang ,

    Contributed equally to this work with: James Owen Delaluna, Heekyoung Kang

    Roles Data curation, Investigation, Resources, Validation, Writing – original draft, Writing – review & editing

    Affiliation Department of Tropical Medicine and Parasitology and Institute of Endemic Diseases, Seoul National University College of Medicine, Seoul, Republic of Korea

  • Yuan Yi Chang,

    Roles Investigation, Resources, Writing – review & editing

    Affiliations Department of Tropical Medicine and Parasitology and Institute of Endemic Diseases, Seoul National University College of Medicine, Seoul, Republic of Korea, Department of Biomedical Sciences, Seoul National University College of Medicine, Seoul, Republic of Korea

  • MinJi Kim,

    Roles Investigation, Resources, Writing – review & editing

    Affiliations Department of Tropical Medicine and Parasitology and Institute of Endemic Diseases, Seoul National University College of Medicine, Seoul, Republic of Korea, Department of Biomedical Sciences, Seoul National University College of Medicine, Seoul, Republic of Korea

  • Min-Ho Choi,

    Roles Conceptualization, Resources

    Affiliation Department of Tropical Medicine and Parasitology and Institute of Endemic Diseases, Seoul National University College of Medicine, Seoul, Republic of Korea

  • Jun Kim ,

    Roles Methodology, Software, Supervision, Validation, Writing – review & editing

    junkim@cnu.ac.kr (JK); hbsong@snu.ac.kr (HBS)

    Affiliation Department of Convergent Bioscience and Informatics, College of Bioscience and Biotechnology, Chungnam National University, Daejeon, Republic of Korea

  • Hyun Beom Song

    Roles Conceptualization, Funding acquisition, Resources, Supervision, Writing – review & editing

    junkim@cnu.ac.kr (JK); hbsong@snu.ac.kr (HBS)

    Affiliations Department of Tropical Medicine and Parasitology and Institute of Endemic Diseases, Seoul National University College of Medicine, Seoul, Republic of Korea, Department of Biomedical Sciences, Seoul National University College of Medicine, Seoul, Republic of Korea

Abstract

Knowledge of mitogenomes has proven essential in human parasite diagnostics and in understanding parasite diversity. However, the lack of substantial data for comparative analysis remains a challenge in Trichuris trichiura research. To provide high-quality mitogenomes, we used the long-read sequencing technology of Oxford Nanopore Technologies (ONT) to better resolve repetitive regions and to construct de novo mitogenome assemblies that minimize reference bias. In this study, we obtained three de novo assembled mitogenomes of T. trichiura isolated from Korean individuals. These complete circular mitogenomes are 14,508 bp, 14,441 bp, and 14,440 bp in length. A total of 37 predicted genes were identified, consisting of 13 protein-coding genes (PCGs), 22 transfer RNA (tRNA) genes, and two ribosomal RNA (rRNA) genes (rrnS and rrnL), together with two non-coding regions. Interestingly, the assembled mitogenomes have AT-rich regions up to six times longer than those of previous reference sequences, demonstrating the advantage of long-read sequencing in resolving previously unreported non-coding regions. Furthermore, variant detection and phylogenetic analysis using concatenated protein-coding genes and the cox1, rrnL, and nd1 genes confirmed the distinct molecular identity of the newly assembled mitogenomes while showing a close genetic relationship with sequences from China or Tanzania. Our study provides a new set of reference mitogenomes with better contiguity and resolved repetitive regions that could be used for meaningful phylogenetic analysis to further understand disease transmission and parasite biology.

Author summary

Human trichuriasis, a neglected tropical disease caused by the human whipworm Trichuris trichiura, remains persistent in South Korea. Despite its medical importance, genomic data on its mitochondrial DNA are scarce. In this study, we used the long-read sequencing technology of Oxford Nanopore Technologies to provide high-quality complete mitogenomes of three T. trichiura isolated from Korean individuals. Interestingly, our assembled mitogenomes contained AT-rich regions up to six times longer than those reported in previous reference mitogenome sequences, demonstrating the advantage of long-read sequencing over short-read sequencing technologies. In addition, comparative analysis through variant detection and phylogenetics confirmed that our newly assembled mitogenomes are distinct from the existing references in the database. Providing this mitogenome information is fundamental to identifying genetic markers that can lead to more reliable and precise helminth diagnostics.

Introduction

Human whipworm infection, caused by Trichuris trichiura, is a common parasitic health problem that is categorized as a Neglected Tropical Disease. Together with roundworm (Ascaris lumbricoides) and hookworm (Ancylostoma and Necator spp.), whipworm forms the triad of soil-transmitted helminth (STH) infections that affect less privileged communities with poor sanitation and hygiene practices [1]. In 2015, World Health Organization data showed that more than 1.5 billion people were affected by STH infections [2], while whipworm infection alone affects 477 million people worldwide [3,4]. In South Korea, the prevalence of soil-transmitted helminthiasis was higher than 60% in the 1960s but has been brought down to less than 1% since 1992 [5], and these infections are considered close to elimination. However, unlike ascariasis and hookworm infection, trichuriasis has remained relatively persistent, albeit at a prevalence as low as 0.4% [6].

Like most helminths, whipworms show high host specificity in parasitism [7,8]. Trichuris trichiura infects humans, while T. suis and T. vulpis infect pigs and dogs, respectively [9–11]. Humans acquire whipworm infection by ingesting soil or food contaminated with embryonated eggs that were released from adult T. trichiura worms in the human intestine [12]. However, there is evidence of whipworm infection across host species [13,14]. Therefore, it is worthwhile to perform genetic analysis on whipworms isolated from humans in areas where human-to-human transmission is less likely [15].

More importantly, although human whipworms have been studied for a long time, information about their mitochondrial genome (mitogenome) is still scarce. Currently, the only reference T. trichiura complete mitogenomes available in the NCBI database are from China (GU385218) [10] and Uganda (KT449826.1) [16], plus an unpublished mitogenome from a group of Japanese researchers (AP017704.1). There are no records of a T. trichiura mitogenome of Korean origin. The availability of complete mitogenome information for a specific human parasite is fundamental to parasitology research and provides further insights into its diagnostics, identification of drug-resistant strains, disease transmission, and phylogeographic and phylogenetic relationships [17].

In this study, we applied the long-read sequencing technology of Oxford Nanopore Technologies (ONT), a third-generation sequencing (TGS) technology, to sequence complete mitogenomes of T. trichiura. Currently available T. trichiura reference mitogenomes were sequenced using next-generation short-read sequencing (NGS) [10]. While NGS uses reads of 25–250 nucleotides [18], TGS can produce reads longer than 100 kb [19] that better span previously unresolved regions of the genome, so that an entire mitogenome can be covered by a single read [20]. Among the TGS platforms, ONT offers a very cost-effective approach to sequencing non-model organisms [21,22] through its single-use adapter (Flongle), which can be mounted in a portable sequencing platform (MinION) while maintaining the same sequencing integrity [23,24]. Using ONT long-read sequencing, we provide de novo assembled complete mitogenomes of three T. trichiura isolated from Korean individuals.

Materials and methods

Parasite collection and DNA extraction

The parasite samples were collected from hospitals in South Korea between 2020 and 2021 and referred for morphological diagnosis to the Department of Tropical Medicine and Parasitology, Seoul National University College of Medicine. As anonymized, residual materials were used, formal consent was not obtained, and the study was considered exempt from research ethics approval by the Institutional Review Board of Seoul National University Hospital.

The samples were washed in DEPC water to remove ethanol, frozen in liquid nitrogen, and then homogenized with stainless steel beads in a TissueLyser LT (QIAGEN) at 50 Hz for 3 cycles of 2 minutes each. Genomic DNA was then extracted using the DNeasy Blood & Tissue Kit (QIAGEN) according to the manufacturer’s instructions.

Oxford Nanopore sequencing

MinION library preparation.

Extracted DNA of adult T. trichiura was prepared using the ONT MinION sequencing Kit (SQK-LSK109) with slight modifications to the manufacturer’s protocol. Briefly, DNA repair and tailing were performed using NEBNext FFPE DNA Repair Mix (cat. no. M6630S) and NEBNext Ultra II End Repair / dA-Tailing Module reagents (cat. no. E7546S) (New England Biolabs) by combining 24 μL of DNA sample with the reagents to make a 30 μL reaction, which was incubated at 20°C for 5 min and 65°C for 5 min. The repaired, end-prepped DNA was then cleaned up using AMPure XP beads (cat. no. A63880, Beckman Coulter Inc.) following the manufacturer’s protocol. Samples were kept on the magnetic rack and washed twice with 200 μL of freshly prepared 70% ethanol in nuclease-free water. Then, 30 μL of DNA sample was mixed with adapter reagents to produce a 50 μL reaction and incubated at room temperature for 10 min, followed by a final clean-up with AMPure XP beads. Beads were washed using the Fragment buffer in the ONT kit before resuspension in 7 μL of Elution buffer at room temperature for 10 min, followed by additional incubation at 37°C to optimize recovery. The eluted sample was quantified using a Qubit fluorometer and adjusted so that the sample concentration was within the recommended range (3–20 fmol, purity 1.8) before loading onto the flow cell. Lastly, flush buffer (117 μL) and flush tether (3 μL) were loaded into a Flongle flow cell in preparation for sequencing. The final genomic library was mixed with sequencing buffer and loading beads to make a total reaction volume of 30 μL, ready for sequencing.

MinION sequencing.

Long-read sequencing was performed using Flongle flow cells inserted in a MinION portable sequencer (Oxford Nanopore Technologies) connected to a computer. MinKNOW software (v22.03.5) was used to assess flow cell quality and monitor pore activity. The prepared genomic library was then loaded onto the flow cell, and sequencing was run for 24 h following the manufacturer’s recommendation. Only the FASTQ output files (raw reads) were used in the assembly steps.

Genome assembly, polishing, and circularization of mitogenomes

Preparation of mitochondrial ONT reads.

Only raw reads that map to the mitochondrial genome of the reference whole genome were used for the assembly, in order to remove non-mitochondrial genomic sequence from our long-read sequencing data. To obtain these mitochondrial reads, first, the mitochondrial region of the reference whole genome (GCA_000613005.1_TTRE2.1) was annotated using a previously published mitogenome (GU385218) downloaded from the NCBI genome database. Annotation was performed by aligning GU385218 to GCA_000613005.1_TTRE2.1 using nucmer and show-tiling from the MUMmer package [25,26]. This annotated mitogenome in the reference genome served as a template to identify raw mitochondrial ONT reads. Second, our raw ONT whole-genome reads were mapped to the annotated reference mitogenome using minimap2 [27], and alignments were sorted and indexed using SAMtools [28]. Third, the raw ONT reads mapped to the reference mitogenome were extracted and categorized as mitochondrial reads in our samples using SAMtools and seqtk.
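The extraction step described above can be illustrated with a minimal pure-Python sketch. It is not the authors' script (those are in S1 File) and it does not call minimap2, SAMtools, or seqtk; it only mimics the logic of keeping reads whose SAM record is flagged as mapped. The function names and the toy records are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def mapped_read_names(sam_lines: Iterable[str]) -> Set[str]:
    """Collect names of reads with at least one mapped alignment.

    A read counts as mapped when bit 0x4 (segment unmapped) of the
    SAM FLAG field is not set.
    """
    names: Set[str] = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        name, flag = fields[0], int(fields[1])
        if flag & 0x4 == 0:
            names.add(name)
    return names


def read_fastq(fastq_lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Parse simple 4-line FASTQ records into (name, sequence, quality)."""
    lines = [ln.rstrip("\n") for ln in fastq_lines if ln.strip()]
    records = []
    for i in range(0, len(lines) - 3, 4):
        name = lines[i][1:].split()[0]  # drop '@' and any description
        records.append((name, lines[i + 1], lines[i + 3]))
    return records


def extract_reads(fastq_lines: Iterable[str],
                  keep: Set[str]) -> List[Tuple[str, str, str]]:
    """Keep only FASTQ records whose read name is in `keep`."""
    return [rec for rec in read_fastq(fastq_lines) if rec[0] in keep]


# Hypothetical toy input: read2 is unmapped (FLAG 4) and is dropped.
toy_sam = [
    "@HD\tVN:1.6",
    "read1\t0\tmito\t1\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tGGGG\tIIII",
    "read3\t16\tmito\t5\t60\t4M\t*\t0\t0\tTTAA\tIIII",
]
toy_fastq = [
    "@read1", "ACGT", "+", "IIII",
    "@read2", "GGGG", "+", "IIII",
    "@read3 extra description", "TTAA", "+", "IIII",
]
mito_reads = extract_reads(toy_fastq, mapped_read_names(toy_sam))
```

In practice the name list would come from the sorted, indexed alignment, and the subsetting would be done by the tools named in the text.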

De novo mitogenome assembly.

We performed reference-guided mitochondrial read preparation and de novo assembly of the reads based on the method introduced by Schneeberger et al. [29]: raw reads were initially mapped to a reference genome, the spanning reads were extracted as mitochondrial reads, and these mitochondrial reads were used to create a complete mitochondrial contig. Briefly, extracted ONT reads having >20× coverage were assembled using Canu [30] with the parameter “14k” as the estimated Trichuris trichiura mitogenome size. Each of our three ONT-based mitogenome assemblies was polished using the alignment of its corresponding raw reads with minimap2 and Racon [31]. This polishing step was then repeated once. The polished mitogenomes were circularized after removing overhang regions using Geneious Prime (version 2022.2.2). The circularized mitogenomes were rearranged to match the starting nucleotide of the reference mitogenomes using nucmer and show-coords in the MUMmer package. The circularized mitogenome and its features were visualized with the CGView tool (cgview.ca). All scripts for each step are compiled in S1 File.
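To make the circularization and rearrangement steps concrete, the following sketch shows the underlying idea on a toy sequence. It assumes an exact-match overhang, which real polished contigs may not have, and it is not how Geneious Prime or MUMmer work internally; the function names, the `min_overlap` parameter, and the toy sequence are all hypothetical.

```python
def trim_overhang(contig: str, min_overlap: int = 20) -> str:
    """Remove a terminal overhang from a linear contig of a circular genome.

    If the last k bases repeat the first k bases exactly (largest such k,
    with k >= min_overlap), the trailing copy is dropped.
    """
    n = len(contig)
    for k in range(n // 2, min_overlap - 1, -1):
        if contig[:k] == contig[-k:]:
            return contig[:-k]
    return contig  # no exact overhang found; return unchanged


def rotate_to_start(circular: str, start: int) -> str:
    """Rotate a circular sequence so that 0-based position `start` comes first."""
    start %= len(circular)
    return circular[start:] + circular[:start]


# Hypothetical toy genome; the contig carries a 6-base overhang.
toy_genome = "ATGCGTACCTGAAGTCCA"
toy_contig = toy_genome + toy_genome[:6]
trimmed = trim_overhang(toy_contig, min_overlap=5)
rotated = rotate_to_start(trimmed, 6)
```

The `start` offset would, in the workflow described above, be read from the alignment coordinates of the assembly against the reference mitogenome.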

Genome annotation and analysis.

Variants and genes in the assembled mitogenomes were annotated using Geneious Prime (version 2022.2.2). Variants in the mitogenomes were detected on the Galaxy platform (usegalaxy.org). First, bcftools mpileup and bcftools call were used to produce a VCF file, and then variants were annotated using SnpEff [32]. The called variants were tabulated and used to create a Venn diagram (Venny.com). Copy number variation (CNV) of the AT-rich region was detected using Tandem Repeats Finder (tandem.bu.edu). Pairwise comparison was performed in Geneious Prime: concatenated protein-coding genes (PCGs), cox1, and rrnL regions of each mitogenome were extracted and aligned with the corresponding reference sequences to report variability and create a heatmap using RStudio (R version 4.1.2). In addition, all mitogenomes were aligned to each other and compared using dnadiff in the MUMmer package [25,26]. Principal component analysis (PCA) was then performed using the average nucleotide identity (ANI) of each pair of mitogenomes. Results were tabulated to produce a PCA graph in RStudio (R version 4.1.2). All scripts for each step are compiled in S1 File.

Phylogenetic analysis.

Phylogenetic analysis was performed using the Maximum Likelihood (ML) method. Our assembled mitogenomes were compared with cox1, rrnL, and nd1 sequences of Trichuris sp. from non-human hosts and of T. trichiura from other countries. The highly conserved cox1 region was chosen to confirm the clustering between our assembled mitogenomes and Trichuris spp., with Trichinella spiralis used as an outgroup. The sequences used in this analysis are listed in S2 File. Datasets were aligned using ClustalW with default parameters. Aligned sequences were then trimmed manually in MEGA X to remove unaligned codons and nucleotides. The ML tree was generated using IQ-TREE [33]. The best-fit substitution model was determined using jModelTest [34] on CIPRES [35]. For the cox1 and nd1 datasets, the mtREV + G model was chosen [36], while the HKY + G + I model was chosen for the rrnL dataset. Bootstrap support for the topology was assessed with 1000 replications (mini-heuristic option). Nucleotide substitution rates are also shown in the consensus phylogenetic tree. All trees were saved in Newick format and visualized in FigTree v1.4.4 (http://tree.bio.ed.ac.uk/software/figtree/).

Results

Characteristics of de novo assembled mitochondrial genomes using a long-read sequencing approach

We sequenced genomic DNA of three Trichuris trichiura samples from Korean individuals and assembled the sequencing reads into three mitogenomes (GenBank accession numbers: TTK1, ON646012; TTK2, ON711246; TTK3, ON682760). Initially, the analysis was intended for whole-genome assembly, but because of the low sequencing depth of the whole-genome data, we shifted our focus to the mitochondrial genome, which has a much higher copy number than the nuclear genome. Nearly 15,128 high-quality long reads were de novo assembled into a 14.5-kb sequence with an N50 of up to 4234 bp. The assembled complete mitogenomes have raw-read sequencing depths of 74×, 30×, and 52×, respectively (S1 Table) (raw data are available in the Sequence Read Archive under the accession number PRJNA823754). Among the three mitogenomes, TTK1 had the longest mitogenome (14,508 bp) and the best-resolved AT-rich region (see below) (Fig 1). All of our mitogenomes were composed of ~69.0% A + T bases, which is higher than in the previously published reference mitogenomes (~60.8%). This difference in AT content can be explained by an unresolved AT-repeat cluster in the previously published mitogenomes, as they were assembled using short-read sequencing technologies. Indeed, the AT-repeat cluster in our mitogenome (255 copies of AT sequences) was six times longer than that of any previous reference mitogenome (Table 1). The raw-read depth of the AT-repeat cluster of our mitogenome was comparable to that of the other mitogenome sequences (ratio 1.09; 3.09–3.51%, whereas the references have 0.18–0.62%), which suggests that the AT-repeat cluster was assembled with high reliability. This emphasizes the superior capability of long-read sequencing, compared with short-read sequencing, in resolving long spans of AT repeats in the genome.
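The two quantities discussed above, overall A + T content and the copy number of the AT-repeat cluster, can be sketched in a few lines. This is a simplified stand-in that counts only perfect, uninterrupted AT dinucleotide runs; the study itself used Tandem Repeats Finder, which also tolerates imperfect repeats. Function names and the toy sequence are hypothetical.

```python
import re


def at_content(seq: str) -> float:
    """Return the percentage of A and T bases in a sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("A") + seq.count("T")) / len(seq)


def longest_repeat_run(seq: str, unit: str = "AT") -> int:
    """Return the copy number of the longest perfect tandem run of `unit`."""
    pattern = re.compile("(?:%s)+" % re.escape(unit.upper()))
    best = 0
    for match in pattern.finditer(seq.upper()):
        best = max(best, len(match.group(0)) // len(unit))
    return best


# Hypothetical toy sequence with runs of 3 and 2 AT copies.
toy_seq = "GGATATATCCATAT"
toy_at_percent = at_content(toy_seq)
toy_copies = longest_repeat_run(toy_seq)
```

Applied to a full mitogenome, the first function gives the kind of A + T percentage reported above, while the second gives only a lower bound on the repeat copy number whenever the cluster contains interruptions.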

Fig 1. Trichuris trichiura complete mitogenome circular map showing annotated features.

Schematic representation of the mitogenome, including 13 protein-coding genes, 2 rRNAs, and 22 tRNAs. The AT-rich region, which spans 6× the length of that in the reference mitogenomes, lies between nd1 and nd2. The innermost ring shows the GC content and GC skew of each region.

https://doi.org/10.1371/journal.pntd.0011586.g001

Table 1. Features of de novo assembled Trichuris trichiura mitogenomes from Korean patients and published reference mitogenomes.

https://doi.org/10.1371/journal.pntd.0011586.t001

Sequence annotation and variant identification compared to T. trichiura reference sequences

We annotated genes in our mitogenomes based on nucleotide similarity with the gene sequences of the reference. Consistent with all previously published mitogenomes, 37 predicted genes and two non-coding regions were identified; the genes consist of 13 PCGs (cox1–3, nad1–6, nad4L, cytb, atp6, and atp8), 22 tRNA genes, and two rRNA genes (rrnS and rrnL). The gene order and the lengths of the PCGs are the same as those of the reference mitogenomes, but the nucleotide positions vary because of the difference in the length of the AT-repeat cluster (S2 Table).

Genetic variants were called between the assembled and reference mitogenomes. With TTCN, the most recently published T. trichiura mitogenome, set as the reference, TTK1 had a higher number of variants in all genes than TTK2 and TTK3, with variants ranging from 4.7% to 9.2% in each PCG. In particular, the nd2, nd3, nd4, and nd4L regions showed hypervariability (5.26%–9.15%), while cox1 was the least variable in TTK1 (Table 2).

Table 2. Variants called in each gene region between our mitogenomes and the reference sequence.

*gene length of each PCG in the TTCN mitogenome | ** variant ratio = gene variant count / gene length (bp).

https://doi.org/10.1371/journal.pntd.0011586.t002

We also assessed the possible impacts of these genetic variants using SnpEff. The numbers of synonymous and non-synonymous variants differed among our assembled mitogenomes. The overall transition/transversion ratio ranged from 14.46 to 18.80, while the missense/silent mutation ratio was 0.42–0.67 (S3 Table).
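As a worked illustration of the two summary statistics used here, the sketch below computes the variant ratio defined under Table 2 (variant count divided by gene length) and a transition/transversion ratio from a list of single nucleotide variants. The study obtained these values from SnpEff output; the function names and all numbers in the example are hypothetical and are not taken from the tables.

```python
from typing import Iterable, Tuple

# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def variant_ratio(variant_count: int, gene_length_bp: int) -> float:
    """Variant ratio = gene variant count / gene length (bp)."""
    if gene_length_bp <= 0:
        raise ValueError("gene length must be positive")
    return variant_count / gene_length_bp


def ts_tv_ratio(snvs: Iterable[Tuple[str, str]]) -> float:
    """Transition/transversion ratio for (ref, alt) single-base changes."""
    ts = tv = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        if ref == alt or ref not in "ACGT" or alt not in "ACGT":
            continue  # ignore non-variants and ambiguous bases
        if (ref, alt) in TRANSITIONS:
            ts += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions; ratio undefined")
    return ts / tv


# Hypothetical example values, not data from this study.
example_ratio = variant_ratio(15, 300)
example_ts_tv = ts_tv_ratio([("A", "G"), ("C", "T"), ("T", "C"), ("A", "C")])
```

A ratio well above 1, as reported above, means that transitions greatly outnumber transversions among the called variants.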

Using the three mitogenomes we obtained and three mitogenomes from previous studies, we examined the relationships among all six mitogenomes through pairwise variation analysis of concatenated PCGs, cox1, and rrnL sequences (Fig 2). Concatenated PCGs, cox1, and rrnL sequences revealed similar patterns of sequence differences among the six mitogenomes, although the patterns did not overlap perfectly (S3 Table).

Fig 2. Heatmap of mitogenome variability.

Pairwise results of concatenated PCGs, cox1 region, and rrnL region of our assembled mitogenomes (TTK1–TTK3) and of the Japan (TTJP), China (TTCN), and Uganda (TTUG) reference mitogenomes.

https://doi.org/10.1371/journal.pntd.0011586.g002
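The pairwise values behind a heatmap of this kind can be sketched as the percentage of differing positions between two aligned sequences. The comparison in the study was performed in Geneious Prime, whose treatment of gaps is not described here, so skipping gapped columns is an assumption of this sketch; the function names and toy alignment are hypothetical.

```python
from typing import Dict, Tuple


def percent_difference(a: str, b: str) -> float:
    """Percentage of differing positions between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped
    (an assumption of this sketch).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            diffs += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * diffs / compared


def pairwise_matrix(aligned: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Percent difference for every ordered pair of named sequences."""
    names = list(aligned)
    return {(p, q): percent_difference(aligned[p], aligned[q])
            for p in names for q in names}


# Hypothetical toy alignment; labels are placeholders only.
toy_alignment = {"seqA": "ACGT-A", "seqB": "ACCTTA", "seqC": "ACGTTA"}
toy_matrix = pairwise_matrix(toy_alignment)
```

The resulting dictionary is symmetric with zeros on the diagonal, which is the form a plotting library would need to draw the heatmap.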

Notably, we found that T. trichiura mitogenomes are extremely divergent, as some mitogenomes exhibited ~18% sequence difference even in the most conserved gene, cox1 (S4 Table). Specifically, the Ugandan mitogenome differed the most from all other mitogenomes, and this feature did not depend on the sequencing technology, as pairs of short-read-based mitogenomes also exhibited 16.8%–20.7% sequence difference. TTK1 is similar to the mitogenome from Japan, while both TTK2 and TTK3 are more similar to the mitogenome from China (Fig 3).

Fig 3. Genetic similarities between our assembled mitogenomes and the reference mitogenomes.

(A) Venn diagrams showing the unique and shared single nucleotide variations (SNVs) of TTK1 mitogenome in relation to each reference mitogenome. (B) Principal component analysis (PCA) using average nucleotide identity of each mitogenome.

https://doi.org/10.1371/journal.pntd.0011586.g003

Phylogenetic analysis based on the mitogenome

To confirm the molecular identity of our assembled sequences and their relationship to previously reported sequences, phylogenetic analyses were performed using three mitochondrial markers: cox1, rrnL, and nd1. Consensus phylogenetic trees constructed using the ML method exhibited similar topologies, which remained consistent across the different mitochondrial markers used.

Two major clades corresponding to T. trichiura and T. suis were observed in the tree produced using cox1 genes of whipworms collected from pigs, non-human primates, and humans from different geographic regions (Fig 4). The T. suis population, consisting of worms from China, Uganda, Denmark, and Spain, formed its own separate subclade within the same major clade as whipworms derived from Old World monkeys: C. g. kikuyensis (mantled guereza), P. ursinus (chacma baboon), and C. sabaeus (green monkey). Whipworms from leaf monkeys (T. francoisi) formed two separate groups within the T. trichiura clade. The first group clustered with most of the human-derived Trichuris, while the second group was separate from the rest of the Trichuris sp. Similarly, whipworms from Japanese macaques (M. fuscata) formed their own subclade within the T. trichiura clade, in a sister relationship with a subclade composed of T. trichiura sequences from humans in Uganda together with most of the non-human primate-derived worms from hamadryas baboons (P. hamadryas), Barbary ape (M. sylvanus), and Papio species.

Fig 4. Maximum likelihood tree showing the amount of genetic variability that has occurred between Trichuris species extracted from pigs, human and non-human primates.

Analysis is based on cox1 genes of our assembled sequences together with sequences reported and used by Cavallero et al., 2019 [37] and Doyle et al., 2022 [38]. A total of 106 sequences were used, including the outgroup. Bootstrap values below 85 are not shown. Triangles represent collapsed clusters of taxa or clades within the tree.

https://doi.org/10.1371/journal.pntd.0011586.g004

In addition, a separate subclade of whipworms from humans showed three monophyletic groups. Group 1 consists of sequences from Chinese patients together with our assembled sequences TTK2 and TTK3. Group 2 consists of our assembled sequence TTK1, a sequence from a human in Tanzania, and a non-human primate (NHP)-derived sequence from a Guinea baboon (P. papio). Group 3 is composed of the rest of the T. trichiura sequences from other geographic locations, namely Japan, Ecuador, Honduras, Cameroon, and Tanzania (Fig 4).

To better resolve the relationship between human-derived and NHP-derived Trichuris, we took advantage of the variable nature of rrnL sequences to detect intraspecies relationships (Fig 5). The rrnL phylogenetic analysis confirmed the relationships observed using the cox1 marker. However, in the rrnL tree, the Guinea baboon (P. papio) sequence did not form the sister relationship observed in the cox1 tree; rather, it was positioned between the monophyletic group of China, TTK2, and TTK3 sequences and the group of TTK1 and other human-derived sequences, excluding those from Uganda and China. Consistently, human whipworms from Uganda clustered with NHP-derived worms from P. hamadryas, M. sylvanus, and Papio sp. monkeys. Meanwhile, some of the Trichuris sequences from T. francoisi and M. fuscata each formed their own separate cluster.

Fig 5. Maximum likelihood tree showing the amount of genetic variability that has occurred between Trichuris species extracted from human and non-human primates.

Analysis is based on rrnL genes of our assembled sequences together with sequences reported and used by Cavallero et al., 2019 [37] and Doyle et al., 2022 [38]. A total of 98 sequences were used, including the outgroup. Bootstrap values below 60 are not shown. Triangles represent collapsed clusters of taxa or clades within the tree.

https://doi.org/10.1371/journal.pntd.0011586.g005

Next, the hypervariable nd1 sequence was used to produce a detailed tree of whipworms from humans, excluding the sequences from humans in Uganda that consistently clustered with NHP-derived sequences. The nd1 tree confirmed the phylogenetic relationships of human- and NHP-derived whipworm sequences observed with the cox1 and rrnL markers (Fig 6). The P. papio sequence formed a sister relationship with our assembled sequence TTK1 and the sequence from a human in Tanzania, similar to Group 2 in the cox1 results. The results also emphasized the geographical distribution of human whipworms: the China and Korea (TTK2 and TTK3) sequences formed their own cluster, like Group 1 in the cox1 results, separate from Ecuador, Honduras, Cameroon, and Japan (Group 3). In addition, Group 3 of the cox1 tree could be divided into two subgroups, one holding the Japan sequence and the other the Tanzania sequence, while the Ecuador, Honduras, and Cameroon sequences were distributed across both subgroups rather than segregated.

Fig 6. Maximum likelihood tree showing the amount of genetic variability between Trichuris species extracted from human and non-human primates.

Analysis is based on nd1 genes of our assembled sequences together with sequences reported and used by Cavallero et al., 2019 [37] and Doyle et al., 2022 [38]. A total of 39 sequences were used, including the outgroup. Bootstrap values below 60 are not shown. Triangles represent collapsed clusters of taxa or clades within the tree.

https://doi.org/10.1371/journal.pntd.0011586.g006

Discussion

Knowledge of mitogenomes has proven essential in human parasite diagnostics; however, the lack of substantial data for comparison remains a challenge in T. trichiura research. Here we generated three complete mitogenomes of T. trichiura isolated from Korean patients using ONT long-read sequencing technology. To our knowledge, this is the first report of a complete mitogenome of Trichuris trichiura from Korea. The main feature of our assembled mitogenomes is the longer span of the AT-repeat cluster, which had not been fully resolved previously because of the limitations of short-read sequencing.

Our findings emphasize the advantage of TGS long-read sequencing (>10 kb read length) over second-generation, or NGS, short-read sequencing technologies (<500 bp) in resolving repeats. Illumina sequencing generates short reads by synthesis, while ONT sequencing passes DNA strands directly through a nanopore and detects the changes in ionic current that correspond to specific nucleotides [23]. Currently, the major issue with TGS is its higher error rate compared with NGS, given its single-molecule sequencing chemistry. However, this is compensated for by increasing the sequencing depth and by using updated platforms with reliable quality control measures [17]. Illumina deep sequencing has the advantage of better variant detection but is limited to incomplete reconstruction of genomic information, while ONT offers portability and cost-effective construction of complete genomic information despite its higher error rate [39]. Between the two TGS platforms, PacBio sequencing can produce reads of higher base-level quality with a lower error rate than ONT or even NGS [40,41]. However, ONT is gaining popularity through its reliability in producing slightly longer mappable reads at a lower cost (1,000–2,000 USD) [42]. The availability of cheaper alternatives such as ONT sequencing will open more opportunities for non-model species to be sequenced, thereby lessening the scarcity of genomic information in the database. We therefore opted to use ONT in this study to report the first complete T. trichiura mitogenome from Korea.

Like many other eukaryotic organisms, nematodes follow strict maternal inheritance of their mitochondrial genomes [43]. The interaction of the mitochondrial and nuclear genomes is fundamental to ensuring efficient ATP synthesis by coordinating protein function and RNA production [44]. Interestingly, mitonuclear interaction has been observed to affect the physiology and the direction of evolution of the nuclear genome [45]. In fact, similar tree topologies have been observed in Nematoda phylogenetic analyses using mitochondrial and nuclear genomes [46,47]. Thus, mitochondrial genomes may serve as a complementary solution in instances where nuclear genomes fail to provide sufficient species resolution or when morphological and genetic data are contradictory [48].

Compared with the available reference complete mitogenome sequences of T. trichiura from humans, our assembled T. trichiura mitogenomes are substantially longer, mostly attributable to the previously unresolved AT repeats in the non-coding region. Mitogenomes are known to have high mutation rates because of their susceptibility to deletions, particularly in non-coding regions with many repeats [49]. Such highly clustered repeats in non-coding sections of mitogenomes have been suggested to be negatively associated with mammalian longevity [50]. Variations in the copy number of mitochondrial AT repeats have been used to measure intraspecific genetic variability [51]. In the setting of infectious diseases such as schistosomiasis [52] and tuberculosis [53], repeats serve as genetic markers for resolving intraspecific differences, super-infection, and mixed infections. They may also play an important role in DNA replication and transcription as initiation sites for polymerase binding and termination [54]. Thus, it is important to correctly assemble and resolve AT repeats in the non-coding region, which was accomplished in this study by sequencing with ONT.

Pairwise comparison of our mitogenomes with reference sequences revealed relatively conserved PCGs and highly variable regions. Consistent with previous studies [46,55], we report cox1 as the most conserved PCG. cox1 is widely used in mitochondrial gene analyses to further understand species diversity, diagnostics, and population variation [36]. On the other hand, the NADH dehydrogenase subunit (nad) genes showed the highest variability among PCGs. Between cox1 and the highly variable regions, the latter are recommended for species prospecting [55]. These variable regions can even be used to differentiate between isolates of the same species collected from one host. That is, if multiple whipworms were extracted from one patient, these regions could be amplified from each worm to potentially detect whether they share the same maternal origin.
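The ranking of genes from most conserved to most variable can be sketched in Python as a per-gene percent identity over pre-aligned sequence pairs. This is an assumption-laden illustration rather than the comparison pipeline used here: the function names are hypothetical, the sequences are invented toy strings, and gapped columns are simply skipped.

```python
def percent_identity(a, b):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def rank_genes_by_identity(alignments):
    """Map {gene: (seq_a, seq_b)} to a list of (gene, identity), most conserved first."""
    scores = {gene: percent_identity(a, b) for gene, (a, b) in alignments.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy pairwise alignments (invented; gene labels are used only as examples).
toy_alignments = {
    "cox1": ("ATGGCTTTAGGA", "ATGGCTTTAGGA"),
    "nad1": ("ATGTTAC-TGGA", "ATGCTACATAGA"),
}

ranked = rank_genes_by_identity(toy_alignments)
for gene, identity in ranked:
    print(f"{gene}\t{identity:.1f}")
```

Applied to several worms from one host, the same per-region identity could flag isolates whose variable regions are identical and are therefore candidates for a shared maternal origin.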

Phylogenetic analysis provided evidence that our assembled mitogenomes are indeed T. trichiura sequences, yet distinct from the reference sequences in the database. Using the highly conserved cox1 gene as a mitogenome marker, we report separate clades of T. trichiura and T. suis, with all our sequences belonging to T. trichiura. This distinction is consistent with previous reports [14,16,46]. Within the T. trichiura clades, the distinction between human- and NHP-derived whipworms was clearly demonstrated by phylogenetic analysis using the cox1 and rrnL genes, consistent with several studies [9,37,46,56].

However, some sequences support the transmission of Trichuris species between humans and non-human primates [16]. T. trichiura from Uganda clustered with a few Trichuris from non-human primates across all mitogenome markers used in this analysis, and one Trichuris from a Guinea baboon (P. papio) clustered to some extent with our TTK1 and T. trichiura from Tanzania in the cox1 and nd1 gene analyses. A similar finding was described in a previous study [37], which used cox1, cob, and concatenated protein-coding genes to report the complex relationship among Trichuris spp. that infect primates.

Within T. trichiura from humans, we identified three groups: TTK2 and TTK3, which clustered with sequences from China; TTK1, which clustered with some sequences from Tanzania and a sequence from a Guinea baboon; and the rest of the T. trichiura sequences from other geographic locations. Using the hypervariable nd1 gene, the last group could be divided into two subgroups that are not generally segregated by location. Additional sequences from different locations are needed to clearly demonstrate intraspecies clustering. The close phylogenetic relationship between T. trichiura sequences from Korea and China was first reported by Hong et al. [57], who amplified ancient DNA retrieved from the coprolites of mummies in Korea. Interestingly, this is the first report of a close phylogenetic relationship between T. trichiura sequences from Korea and Tanzania.

Since this topology is consistent even when a highly variable genomic region is used for the same analysis, there is a high chance that our samples represent local infections, given their clear distinction from other reference sequences from neighboring countries. Although Hawash et al. [16] suggested dispersal of Trichuris infection from Africa to Asia, and TTK1 clustered with a sequence from Tanzania, the other two mitogenomes are distinct from previously reported T. trichiura from Africa and closer to sequences from a neighboring Asian country, China.

In this study, we provide information about complete mitogenomes of T. trichiura isolated from Korean individuals. The cost-effectiveness and portability of ONT for producing complete mitogenome sequences, rather than the more laborious whole-genome sequences, are very promising for resource-limited countries with rampant parasitic infections. Our de novo assembled mitogenomes were longer than any other reference, a benefit of long-read sequencing. Furthermore, comparative analysis revealed that they did not cluster with Trichuris spp. isolated from non-human hosts but clustered with mitogenomes from either China or Tanzania. Our study provides a new set of reference mitogenomes with better contiguity and resolved repetitive regions that could be used for meaningful phylogenetic analysis to further understand disease transmission and parasite biology.

Supporting information

S2 Table. Features of annotated protein-coding genes of our mitogenomes and the reference sequences.

https://doi.org/10.1371/journal.pntd.0011586.s002

(DOCX)

S3 Table. SnpEff variant calling results.

T. trichiura China (GU385218) was used as the reference genome for variant calling since it is the closest published reference mitogenome.

https://doi.org/10.1371/journal.pntd.0011586.s003

(DOCX)

S4 Table. Variability of mitogenomes.

Pairwise comparison results between assembled mitogenomes and reference sequences.

https://doi.org/10.1371/journal.pntd.0011586.s004

(DOCX)

S2 File. List of sequences included in the analysis.

https://doi.org/10.1371/journal.pntd.0011586.s006

(DOCX)

References

  1. Hotez PJ, Fenwick A, Savioli L, Molyneux DH. Rescuing the bottom billion through control of neglected tropical diseases. Lancet. 2009;373(9674):1570–5. pmid:19410718
  2. World Health Organization. Investing to overcome the global impact of neglected tropical diseases: third WHO report on neglected tropical diseases 2015: World Health Organization; 2015.
  3. James SL, Abate D, Abate KH, Abay SM, Abbafati C, Abbasi N, et al. Global, regional, and national incidence, prevalence, and years lived with disability for 354 diseases and injuries for 195 countries and territories, 1990–2017: a systematic analysis for the Global Burden of Disease Study 2017. The Lancet. 2018;392(10159):1789–858. pmid:30496104
  4. Degenhardt L, Dicker D, Duan L, Erskine H, Feigin VL, Ferrari AJ, et al. Global, regional, and national incidence, prevalence, and years lived with disability for 301 acute and chronic diseases and injuries in 188. Lancet. 2015;386(9995):743–800.
  5. Hong ST, Yong TS. Review of Successful Control of Parasitic Infections in Korea. Infect Chemother. 2020;52(3):427–40. pmid:32869557
  6. Cho S, Jeong B, Lee S. National survey of intestinal parasitic infections in Korea, 8th report (2013)[Internet]. Cheongju: Korea Centers for Disease Control and Prevention [cited 2019 Jul 25].
  7. Di Filippo MM, Berrilli F, De Liberato C, Di Giovanni V, D’Amelio S, Friedrich KG, et al. Molecular characterization of Trichuris spp. from captive animals based on mitochondrial markers. Parasitology International. 2020;75:102043. pmid:31881362
  8. Cavallero S, Montalbano Di Filippo M, Rondón S, Liberato CD, D’Amelio S, Friedrich KG, et al. Nuclear and mitochondrial data on Trichuris from Macaca fuscata support evidence of host specificity. Life. 2020;11(1):18. pmid:33396199
  9. Ravasi DF, O’Riain MJ, Davids F, Illing N. Phylogenetic evidence that two distinct Trichuris genotypes infect both humans and non-human primates. PLoS One. 2012;7(8):e44187. pmid:22952922
  10. Liu GH, Gasser RB, Su A, Nejsum P, Peng L, Lin RQ, et al. Clear genetic distinctiveness between human- and pig-derived Trichuris based on analyses of mitochondrial datasets. PLoS Negl Trop Dis. 2012;6(2):e1539. pmid:22363831
  11. Cutillas C, de Rojas M, Ariza C, Ubeda JM, Guevara D. Molecular identification of Trichuris vulpis and Trichuris suis isolated from different hosts. Parasitology Research. 2007;100(2):383–9. pmid:17004099
  12. Else KJ, Keiser J, Holland CV, Grencis RK, Sattelle DB, Fujiwara RT, et al. Whipworm and roundworm infections. Nature Reviews Disease Primers. 2020;6(1):1–23.
  13. Areekul P, Putaporntip C, Pattanawong U, Sitthicharoenchai P, Jongwutiwes S. Trichuris vulpis and T. trichiura infections among schoolchildren of a rural community in northwestern Thailand: the possible role of dogs in disease transmission. Asian Biomedicine. 2010;4(1):49–60.
  14. Ghai RR, Simons ND, Chapman CA, Omeja PA, Davies TJ, Ting N, et al. Hidden population structure and cross-species transmission of whipworms (Trichuris sp.) in humans and non-human primates in Uganda. PLoS Negl Trop Dis. 2014;8(10):e3256. pmid:25340752
  15. Ishizaki Y, Kawashima K, Gunji N, Onizawa M, Hikichi T, Hasegawa M, et al. Trichuris trichiura Incidentally Detected by Colonoscopy and Identified by a Genetic Analysis. Internal Medicine. 2022;61(6):821–5. pmid:34471029
  16. Hawash MB, Andersen LO, Gasser RB, Stensvold CR, Nejsum P. Mitochondrial Genome Analyses Suggest Multiple Trichuris Species in Humans, Baboons, and Pigs from Different Geographical Regions. PLoS Negl Trop Dis. 2015;9(9):e0004059. pmid:26367282
  17. Palevich N, Maclean PH. Sequencing and Reconstructing Helminth Mitochondrial Genomes Directly from Genomic Next-Generation Sequencing Data. Methods Mol Biol. 2021;2369:27–40. pmid:34313982
  18. Cock PJ, Fields CJ, Goto N, Heuer ML, Rice PM. The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic acids research. 2010;38(6):1767–71. pmid:20015970
  19. Catasti P, Chen X, Mariappan S, Bradbury EM, Gupta G. DNA repeats in the human genome. Structural Biology and Functional Genomics: Springer; 1999. p. 19–51.
  20. Loman NJ, Quick J, Simpson JT. A complete bacterial genome assembled de novo using only nanopore sequencing data. Nat Methods. 2015;12(8):733–5. pmid:26076426
  21. Ekblom R, Galindo J. Applications of next generation sequencing in molecular ecology of non-model organisms. Heredity. 2011;107(1):1–15. pmid:21139633
  22. da Fonseca RR, Albrechtsen A, Themudo GE, Ramos-Madrigal J, Sibbesen JA, Maretty L, et al. Next-generation biology: sequencing and data analysis approaches for non-model organisms. Marine genomics. 2016;30:3–13. pmid:27184710
  23. Jain M, Olsen HE, Paten B, Akeson M. The Oxford Nanopore MinION: delivery of nanopore sequencing to the genomics community. Genome Biol. 2016;17(1):239. pmid:27887629
  24. Vereecke N, Bokma J, Haesebrouck F, Nauwynck H, Boyen F, Pardon B, et al. High quality genome assemblies of Mycoplasma bovis using a taxon-specific Bonito basecaller for MinION and Flongle long-read nanopore sequencing. BMC Bioinformatics. 2020;21(1):517. pmid:33176691
  25. Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C, et al. Versatile and open software for comparing large genomes. Genome Biol. 2004;5(2):R12. pmid:14759262
  26. Marcais G, Delcher AL, Phillippy AM, Coston R, Salzberg SL, Zimin A. MUMmer4: A fast and versatile genome alignment system. PLoS Comput Biol. 2018;14(1):e1005944. pmid:29373581
  27. Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34(18):3094–100. pmid:29750242
  28. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25(16):2078–9. pmid:19505943
  29. Schneeberger K, Ossowski S, Ott F, Klein JD, Wang X, Lanz C, et al. Reference-guided assembly of four diverse Arabidopsis thaliana genomes. Proceedings of the National Academy of Sciences. 2011;108(25):10249–54. pmid:21646520
  30. Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH, Phillippy AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 2017;27(5):722–36. pmid:28298431
  31. Vaser R, Sovic I, Nagarajan N, Sikic M. Fast and accurate de novo genome assembly from long uncorrected reads. Genome Res. 2017;27(5):737–46. pmid:28100585
  32. Cingolani P, Platts A, Wang le L, Coon M, Nguyen T, Wang L, et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin). 2012;6(2):80–92. pmid:22728672
  33. Nguyen LT, Schmidt HA, von Haeseler A, Minh BQ. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol. 2015;32(1):268–74. pmid:25371430
  34. Posada D. jModelTest: phylogenetic model averaging. Mol Biol Evol. 2008;25(7):1253–6. pmid:18397919
  35. Miller MA, Schwartz T, Pickett BE, He S, Klem EB, Scheuermann RH, et al. A RESTful API for Access to Phylogenetic Tools via the CIPRES Science Gateway. Evol Bioinform Online. 2015;11:43–8. pmid:25861210
  36. Blasco-Costa I, Cutmore SC, Miller TL, Nolan MJ. Molecular approaches to trematode systematics: ’best practice’ and implications for future study. Syst Parasitol. 2016;93(3):295–306. pmid:26898592
  37. Cavallero S, Nejsum P, Cutillas C, Callejon R, Dolezalova J, Modry D, et al. Insights into the molecular systematics of Trichuris infecting captive primates based on mitochondrial DNA analysis. Vet Parasitol. 2019;272:23–30. pmid:31395201
  38. Doyle SR, Søe MJ, Nejsum P, Betson M, Cooper PJ, Peng L, et al. Population genomics of ancient and modern Trichuris trichiura. Nature communications. 2022;13(1):3888. pmid:35794092
  39. Stefan CP, Hall AT, Graham AS, Minogue TD. Comparison of Illumina and Oxford Nanopore Sequencing Technologies for Pathogen Detection from Clinical Matrices Using Molecular Inversion Probes. J Mol Diagn. 2022. pmid:35085783
  40. Wenger AM, Peluso P, Rowell WJ, Chang PC, Hall RJ, Concepcion GT, et al. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat Biotechnol. 2019;37(10):1155–62. pmid:31406327
  41. Baid G, Cook DE, Shafin K, Yun T, Llinares-Lopez F, Berthet Q, et al. DeepConsensus improves the accuracy of sequences with a gap-aware sequence transformer. Nat Biotechnol. 2023;41(2):232–8. pmid:36050551
  42. Weirather JL, de Cesare M, Wang Y, Piazza P, Sebastiano V, Wang XJ, et al. Comprehensive comparison of Pacific Biosciences and Oxford Nanopore Technologies and their applications to transcriptome analysis. F1000Res. 2017;6:100. pmid:28868132
  43. Avise JC, Bowen BW. Investigating sea turtle migration using DNA markers. Curr Opin Genet Dev. 1994;4(6):882–6. pmid:7888759
  44. Poyton RO, McEwen JE. Crosstalk between nuclear and mitochondrial genomes. Annu Rev Biochem. 1996;65:563–607. pmid:8811190
  45. Wei W, Schon KR, Elgar G, Orioli A, Tanguy M, Giess A, et al. Nuclear-embedded mitochondrial DNA sequences in 66,083 human genomes. Nature. 2022;611(7934):105–14. pmid:36198798
  46. Callejon R, Nadler S, De Rojas M, Zurita A, Petrasova J, Cutillas C. Molecular characterization and phylogeny of whipworm nematodes inferred from DNA sequences of cox1 mtDNA and 18S rDNA. Parasitol Res. 2013;112(11):3933–49. pmid:24018707
  47. Xie Y, Zhao B, Hoberg EP, Li M, Zhou X, Gu X, et al. Genetic characterisation and phylogenetic status of whipworms (Trichuris spp.) from captive non-human primates in China, determined by nuclear and mitochondrial sequencing. Parasit Vectors. 2018;11(1):516. pmid:30236150
  48. Kern EM, Kim T, Park J-K. The mitochondrial genome in nematode phylogenetics. Frontiers in Ecology and Evolution. 2020;8:250.
  49. Brown WM, George M Jr., Wilson AC. Rapid evolution of animal mitochondrial DNA. Proc Natl Acad Sci U S A. 1979;76(4):1967–71.
  50. Khaidakov M, Siegel ER, Shmookler Reis RJ. Direct repeats in mitochondrial DNA and mammalian lifespan. Mech Ageing Dev. 2006;127(10):808–12. pmid:16956646
  51. Lunt DH, Whipple LE, Hyman BC. Mitochondrial DNA variable number tandem repeats (VNTRs): utility and problems in molecular ecology. Mol Ecol. 1998;7(11):1441–55. pmid:9819900
  52. Curtis JA. Genetic diversity Schistosoma mansoni: Evidence and implications of population structure: Purdue University; 2001.
  53. Lee J, Tupasi TE, Park YK. Use of the VNTR typing technique to determine the origin of Mycobacterium tuberculosis strains isolated from Filipino patients in Korea. World J Microbiol Biotechnol. 2014;30(5):1625–31. pmid:24415462
  54. Clayton DA. Transcription and replication of animal mitochondrial DNAs. Int Rev Cytol. 1992;141:217–32. pmid:1452432
  55. Blouin MS. Molecular prospecting for cryptic species of nematodes: mitochondrial DNA versus internal transcribed spacer. Int J Parasitol. 2002;32(5):527–31. pmid:11943225
  56. Rivero J, Cutillas C, Callejon R. Trichuris trichiura (Linnaeus, 1771) From Human and Non-human Primates: Morphology, Biometry, Host Specificity, Molecular Characterization, and Phylogeny. Front Vet Sci. 2020;7:626120. pmid:33681315
  57. Hong JH, Seo M, Oh CS, Shin DH. Genetic analysis of small-subunit ribosomal RNA, internal transcribed spacer 2, and ATP synthase subunit 8 of Trichuris trichiura ancient DNA retrieved from the 15th to 18th century Joseon Dynasty mummies’ coprolites from Korea. Journal of Parasitology. 2019;105(4):539–45. pmid:31310584