Documentation
Overview
ViroNEXT is an AI-assisted, fully automated bioinformatics pipeline for unbiased viral detection in next-generation sequencing data. It combines reference-based classification for known viruses with machine-learning-based prediction to detect highly divergent or previously uncharacterized viral candidates.
Publication Information
- Manuscript title: ViroNEXT: AI-assisted viral detection in metagenomic data with stringent control of false positive assignments
- Short title: to be added
- Authors: to be added
- Correspondence: to be added
- Keywords: to be added
- DOI: to be added
Submitting Data
Upload Illumina short-read sequencing data in compressed FASTQ format (.fastq.gz). Single-end and paired-end reads are accepted.
File names must follow the Illumina-style naming pattern:
<sample_name>_<R1/R2>_001.fastq.gz
For paired-end data, upload both files for each sample, for example sampleA_R1_001.fastq.gz and sampleA_R2_001.fastq.gz.
A run can contain either single-end reads or paired-end reads, but not a mixture of both. Submit mixed data types as separate jobs.
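The naming and layout rules above can be checked locally before uploading. The following Python sketch is not part of ViroNEXT: the function name `check_submission` is hypothetical, and it assumes that a single-end sample is supplied as an R1 file only, which the documentation does not state explicitly.

```python
import re
from collections import defaultdict

# Pattern from the documentation: <sample_name>_<R1/R2>_001.fastq.gz
FASTQ_NAME = re.compile(r"^(?P<sample>.+)_(?P<read>R[12])_001\.fastq\.gz$")


def check_submission(filenames):
    """Return "single-end" or "paired-end" for a list of file names.

    Raises ValueError if a name does not match the pattern, a sample has an
    R2 file without its R1 file, or single-end and paired-end samples are mixed.
    """
    if not filenames:
        raise ValueError("no files given")
    reads_per_sample = defaultdict(set)
    for name in filenames:
        match = FASTQ_NAME.match(name)
        if match is None:
            raise ValueError(f"{name}: does not match <sample_name>_<R1/R2>_001.fastq.gz")
        reads_per_sample[match["sample"]].add(match["read"])

    layouts = set()
    for sample, reads in reads_per_sample.items():
        if reads == {"R1", "R2"}:
            layouts.add("paired-end")
        elif reads == {"R1"}:
            # Assumption: a single-end sample is uploaded as an R1 file only.
            layouts.add("single-end")
        else:
            raise ValueError(f"{sample}: R2 file without a matching R1 file")

    if len(layouts) > 1:
        raise ValueError("mixed single-end and paired-end samples; submit them as separate jobs")
    return layouts.pop()
```

For example, passing sampleA_R1_001.fastq.gz and sampleA_R2_001.fastq.gz returns "paired-end", while adding a third file for a sample that has only an R1 file raises an error, mirroring the rule that mixed data types go in separate jobs.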
- Open the home page.
- Drag files into the upload area, or click the upload area to choose files.
- Enter an email address.
- Click Upload data.
Keep the generated job URL. You need it to follow progress and access results.
There are currently no explicit limits on upload size, the number of uploaded files, or read length.
How the Pipeline Works
ViroNEXT uses two complementary detection branches:
- Reference-based classification: detects known viruses through alignment, taxonomic re-classification, coverage filtering, consensus generation, and assignment.
- Machine-learning-based classification: evaluates assembled contigs for divergent or novel viral candidates.
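The coverage criteria and thresholds used in the reference-based branch are not specified here. As an illustration only, two widely used coverage measures are breadth (the fraction of reference positions covered by at least one read) and mean depth. The sketch below computes both from a list of per-position read depths and applies a hypothetical breadth cutoff; `summarize_coverage`, `passes_filter`, and the 0.1 threshold are assumptions, not ViroNEXT parameters.

```python
from dataclasses import dataclass


@dataclass
class CoverageSummary:
    breadth: float     # fraction of reference positions with depth >= min_depth
    mean_depth: float  # average read depth over all reference positions


def summarize_coverage(depths, min_depth=1):
    """Summarize per-position read depths for one reference genome."""
    if not depths:
        raise ValueError("empty depth profile")
    covered = sum(1 for d in depths if d >= min_depth)
    return CoverageSummary(
        breadth=covered / len(depths),
        mean_depth=sum(depths) / len(depths),
    )


def passes_filter(summary, min_breadth=0.1):
    # Hypothetical cutoff: keep a reference only if enough of its genome is covered.
    return summary.breadth >= min_breadth
```

Filtering on breadth rather than on read count alone is one common way to discount references whose reads pile up in a single short region.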
Job Status
Jobs move through these states:
- Queued: the job is waiting for the worker.
- Running: the analysis pipeline is active.
- Archiving: downloadable results are being prepared.
- Finished: the analysis completed successfully.
- Failed: the analysis stopped before completion.
- Expired: the retention period ended and stored files were removed.
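The job states can be modeled as a small state machine. This sketch assumes the transitions implied by the order and descriptions above: a job normally moves from Queued to Running to Archiving to Finished, only running jobs fail, and only finished jobs expire after the retention period. The actual server-side transitions are not documented and may differ; `JobState`, `advance`, and `is_terminal` are hypothetical names.

```python
from enum import Enum


class JobState(Enum):
    QUEUED = "Queued"
    RUNNING = "Running"
    ARCHIVING = "Archiving"
    FINISHED = "Finished"
    FAILED = "Failed"
    EXPIRED = "Expired"


# Assumed transitions, inferred from the state descriptions; not taken from ViroNEXT's code.
TRANSITIONS = {
    JobState.QUEUED: {JobState.RUNNING},
    JobState.RUNNING: {JobState.ARCHIVING, JobState.FAILED},
    JobState.ARCHIVING: {JobState.FINISHED},
    JobState.FINISHED: {JobState.EXPIRED},
    JobState.FAILED: set(),
    JobState.EXPIRED: set(),
}


def advance(current, new):
    """Return the new state if the transition is allowed, else raise ValueError."""
    if new not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {new.value}")
    return new


def is_terminal(state):
    """A state is terminal if no further transitions leave it."""
    return not TRANSITIONS[state]
```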
Results and Reports
Results are summarized in a single interactive HTML report. The report includes detected viruses, classification and filtering results, genome coverage statistics, high-quality read assignment proportions, abundance visualizations, coverage profiles, and quality-control summaries.
Highly divergent viral candidates are reported separately with contig length, VirHunter statistics, mapped read counts, and, when available, DIAMOND-based taxonomic classification. Candidate contig sequences can be copied for downstream analysis such as manual BLAST searches, phylogeny, or wet-lab validation.
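Copied candidate contig sequences can be saved as FASTA for BLAST searches or phylogenetic tools. A minimal sketch follows; `to_fasta` is a hypothetical helper, and the 60-character line width is a common FASTA convention rather than a ViroNEXT requirement.

```python
def to_fasta(contigs, width=60):
    """Format (name, sequence) pairs as FASTA text, wrapping sequences at `width` characters."""
    lines = []
    for name, sequence in contigs:
        # Remove spaces and line breaks left over from copying, and normalize case.
        sequence = "".join(sequence.split()).upper()
        if not sequence:
            raise ValueError(f"{name}: empty sequence")
        lines.append(f">{name}")
        lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"
```

The returned text can be written to a file (for example a placeholder name such as candidates.fasta) and passed to BLAST or an alignment tool.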
When a job has finished, the status page lists downloadable result files. If a ZIP archive is ready, all files can be downloaded together. Otherwise, individual result files can be downloaded from the same page.
Retention
Results are available for 7 days after the job finishes. After that, job data is deleted and the job URL returns an expired response.
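The expiry time follows directly from the 7-day retention period. The sketch below assumes the period is measured from the moment the job finishes, in exact 24-hour days, using timezone-aware timestamps; the documentation does not state how the deletion time is rounded or scheduled, so treat the result as an approximate deadline. `expiry_time` and `is_expired` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

# Retention period stated in the documentation.
RETENTION = timedelta(days=7)


def expiry_time(finished_at):
    """Time at which the retention period ends (finish time + 7 days)."""
    return finished_at + RETENTION


def is_expired(finished_at, now=None):
    """True once the retention period has elapsed; `finished_at` must be timezone-aware."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= expiry_time(finished_at)
```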
Citation
Please cite Braun et al., Nature Methods, xxxx, in any publication that makes use of analyses inspired by ViroNEXT.
Final citation and BibTeX entry: to be added.
Downloads
The pipeline uses reference databases and machine-learning models for viral detection. Both are available for download for local execution or further analysis.
Reference Databases
Machine Learning Models
Get Help
For questions, contact the Paul-Ehrlich-Institut at bioinformatics [at] pei [dot] de.
Source Code
The project is available on GitHub: link to be added.
Local execution is possible using the GitHub repository. The local workflow runs the same pipeline outside the web interface once the required environments are installed.
License, release version, container or conda environment, and reproducible example dataset: to be added.
Limitations
ViroNEXT is designed for research use and unbiased viral detection. Results should be interpreted in the context of sequencing quality, sample preparation, reference database content, and independent validation where appropriate.
Detailed limitations, benchmark dataset links, expected false-positive and false-negative behavior, and unsupported input types: to be added.
Research Use Only
This software is intended only for research purposes and should not be used for clinical, diagnostic, or treatment decisions.