Rapid infectious disease identification by next-generation DNA sequencing

Jeremy E. Ellis, Dara S. Missan, Matthew Shabilla, Delyn Martinez, Stephen E. Fry ⁎
Fry Laboratories, L.L.C., 15720 N. Greenway-Hayden Loop STE 3, Scottsdale, AZ 85260, United States

Article info

Article history:
Received 30 March 2016
Received in revised form 26 July 2016
Accepted 16 September 2016
Available online xxxx

Keywords:
Next generation DNA sequencing
Community profiling
Clinical NGS
NGS validation
Rapid infectious disease identification

Abstract

Currently, there is a critical need to rapidly identify infectious organisms in clinical samples. Next-Generation Sequencing (NGS) could surmount the deficiencies of culture-based methods; however, there are no standardized, automated programs to process NGS data. To address this deficiency, we developed the Rapid Infectious Disease Identification (RIDI™) system. The system requires minimal guidance, which reduces operator errors. The system is compatible with the three major NGS platforms. It automatically interfaces with the sequencing system, detects its data format, configures the analysis type, applies appropriate quality control, and analyzes the results. Sequence information is characterized using both the NCBI database and RIDI™-specific databases. RIDI™ was designed to identify high-probability sequence matches and more divergent matches that could represent different or novel species. We challenged the system using defined American Type Culture Collection (ATCC) reference standards of 27 species, both individually and in varying combinations. The system was able to rapidly detect known organisms in <12 h with multi-sample throughput. The system accurately identifies 99.5% of the DNA sequence reads at the genus level and 75.3% at the species level in reference standards. It has a limit of detection of 146 cells/ml in simulated clinical samples, and is also able to identify the components of polymicrobial samples with 16.9% discrepancy at the genus level and 31.2% at the species level. Thus, the system’s effectiveness may exceed current methods, especially in situations where culture methods could produce false negatives or where rapid results would influence patient outcomes.
© 2016 Elsevier B.V. All rights reserved

  1. Introduction
    Rapid identification of pathogenic organisms is a critical need in clinical settings. For patients with infections such as septicemia, survival decreases hourly. Thus, the current standard of care is to administer broad-spectrum antibiotics until the infection is identified and then switch to more specific treatments (Faria et al., 2015; Perez et al., 2013). While this is an effective course of action for antibiotic-susceptible bacteria, it is inadequate for antibiotic-resistant infections. In addition, this treatment regimen prolongs hospital stays and fosters antibiotic resistance. One study estimated that septicemia patients generated greater than US $40,000 per patient in direct hospital costs due to prolonged length of stay, and that, on average, early directed interventions reduced hospital costs to US $19,547 (Perez et al., 2013). Therefore, rapid pathogen identification would decrease both patient mortality and healthcare costs.
    Currently, the gold standard for identifying bacteria is culture-based identification. However, this approach requires several days for positive identification of rapid-growing bacteria and even longer for fastidious or slow-growing organisms (Didelot et al., 2012). The positive predictive value of blood cultures can range from 30% to >95% even when performed correctly (Afshari et al., 2012). Given the time-consuming nature and degree of variability inherent to culture methods, developing molecular methods for pathogen identification would be highly beneficial.
    One of the most consistent molecular methods for identifying bacterial species is next-generation sequencing (NGS) of the 16S rDNA gene. The organization of the 16S gene sequence is highly conserved in bacteria, which allows investigators to use universal primers to amplify the gene for downstream studies. The 16S gene, which contains nine variable regions, may be effectively used to identify a bacterium via sequencing. Specifically targeting the variable regions yields more meaningful and enriched sequencing results per sample, thus decreasing the sequencing effort required for significant results and increasing the number of samples that may be analyzed per instrument run (Claesson et al., 2010a). Previous efforts using NGS to target and identify organisms by variable region sequencing have not yielded reliable species-level identification due to target selection, technical limitations, and analysis methods (Junemann et al., 2012).
    As technology has advanced, NGS has become more accessible in clinical settings. Manufacturers have produced several relatively inexpensive benchtop sequencing models that have a low cost per sample, making NGS a feasible option for many hospitals. However, current methods require highly experienced personnel to run the instrumentation and bioinformaticians to properly process the data, since there is currently no industry-standard method for handling this information (Gullapalli et al., 2012).
    Successful implementation of NGS in a clinical setting requires a standardized data analysis pipeline that integrates across numerous NGS platforms (Table S1), automatically processes and interprets the data, and presents the results in an easy-to-read format. Given the industry requirements for a reproducible, automated data analysis pipeline for clinical NGS data, we developed the Rapid Infectious Disease Identification (RIDI™) system. The informatics pipeline was designed to become part of an NGS-based method for rapid clinical identification of bacteria. We strategically designed the software to meet a number of criteria for clinical implementation and experimentally challenged the system to determine its suitability. Overall, we developed an analysis pipeline that may be integrated into a clinical laboratory setting, is capable of highly accurate automated identification at the genus level and acceptable species-level identification, and yields easy-to-read, actionable reports for use by clinicians.
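    To make the distinction between genus-level and species-level identification concrete, the following is a minimal Python sketch of how a per-read accuracy could be computed for a single-organism reference standard. It is an illustration only and is not the RIDI™ implementation: the `ReadCall` record, the `rank_accuracy` function, and the four example reads are hypothetical and do not come from this study. The sketch assumes that each read has already been assigned a genus and species by some classifier and that the expected organism is known.

```python
from collections import namedtuple

# Hypothetical record: the taxon a classifier assigned to one sequence read.
ReadCall = namedtuple("ReadCall", ["genus", "species"])


def rank_accuracy(calls, expected_genus, expected_species):
    """Return (genus_accuracy, species_accuracy) as fractions of all reads."""
    if not calls:
        raise ValueError("no reads to score")
    # A genus-level hit only requires the genus to match.
    genus_hits = sum(1 for c in calls if c.genus == expected_genus)
    # A species-level hit requires both genus and species to match.
    species_hits = sum(
        1
        for c in calls
        if c.genus == expected_genus and c.species == expected_species
    )
    total = len(calls)
    return genus_hits / total, species_hits / total


# Toy data (assumed for illustration, not measured results).
calls = [
    ReadCall("Escherichia", "coli"),
    ReadCall("Escherichia", "coli"),
    ReadCall("Escherichia", "fergusonii"),
    ReadCall("Shigella", "flexneri"),
]
genus_acc, species_acc = rank_accuracy(calls, "Escherichia", "coli")
print(f"genus-level: {genus_acc:.1%}, species-level: {species_acc:.1%}")
```

    Because every species-level hit is also a genus-level hit under this definition, species-level accuracy can never exceed genus-level accuracy, which is consistent with the ordering of the two figures reported in the abstract.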