Documentation

Overview

ViroNEXT is an AI-assisted, fully automated bioinformatics pipeline for unbiased viral detection in next-generation sequencing data. It combines reference-based classification for known viruses with machine-learning-based prediction to detect highly divergent or previously uncharacterized viral candidates.

Publication Information

Manuscript title
ViroNEXT: AI-assisted viral detection in metagenomic data with stringent control of false positive assignments
Short title
to be added
Authors
to be added
Correspondence
to be added
Keywords
to be added
DOI
to be added

Submitting Data

Upload Illumina short-read sequencing data in compressed FASTQ format (.fastq.gz). Single-end and paired-end reads are accepted.

File names must follow the Illumina-style naming pattern: <sample_name>_<R1/R2>_001.fastq.gz

For paired-end data, upload both files for each sample, for example sampleA_R1_001.fastq.gz and sampleA_R2_001.fastq.gz.

A run can contain either single-end reads or paired-end reads, but not a mixture of both. Submit mixed data types as separate jobs.

  1. Open the home page.
  2. Drag files into the upload area, or click the upload area to choose files.
  3. Enter an email address.
  4. Click Upload data.

Keep the generated job URL. You need it to follow progress and access results.

There are currently no explicit limits on upload size, number of uploaded files, or read length.
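The naming and pairing rules in this section can be checked locally before uploading. The following Python sketch is a hypothetical pre-upload helper, not part of ViroNEXT. It assumes that the sample name is everything before the final `_R1_001`/`_R2_001` suffix and that single-end samples are uploaded as R1 files. The documentation does not state either rule.

```python
import re
from collections import defaultdict

# Illumina-style name: <sample_name>_<R1/R2>_001.fastq.gz
# Assumption: the sample name is everything before the final "_R1_001"/"_R2_001".
FASTQ_PATTERN = re.compile(r"^(?P<sample>.+)_(?P<read>R[12])_001\.fastq\.gz$")


def check_upload(filenames):
    """Hypothetical helper: validate file names and return the run layout.

    Returns "paired-end" or "single-end". Raises ValueError for names that do
    not match the pattern, samples with only an R2 file, or mixed layouts.
    """
    if not filenames:
        raise ValueError("no files given")

    # Collect which read files (R1, R2) were found for each sample.
    reads = defaultdict(set)
    for name in filenames:
        match = FASTQ_PATTERN.match(name)
        if match is None:
            raise ValueError(f"name does not follow the pattern: {name}")
        reads[match.group("sample")].add(match.group("read"))

    layouts = set()
    for sample, found in reads.items():
        if found == {"R1", "R2"}:
            layouts.add("paired-end")
        elif found == {"R1"}:
            # Assumption: single-end samples are uploaded as R1 files.
            layouts.add("single-end")
        else:
            raise ValueError(f"sample {sample} has only an R2 file")

    # A run may contain only one layout; mixed data must go into separate jobs.
    if len(layouts) != 1:
        raise ValueError("mixed single-end and paired-end data; submit separate jobs")
    return layouts.pop()
```

For example, `check_upload(["sampleA_R1_001.fastq.gz", "sampleA_R2_001.fastq.gz"])` returns `"paired-end"`. Adding a single-end sample such as `sampleB_R1_001.fastq.gz` to the same list raises an error, because that run would mix both layouts.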

How the Pipeline Works

ViroNEXT uses two complementary detection branches:

  • Reference-based classification: detects known viruses through alignment, taxonomic re-classification, coverage filtering, consensus generation, and assignment.
  • Machine-learning-based classification: evaluates assembled contigs for divergent or novel viral candidates.

Job Status

Jobs move through these states:

  • Queued: the job is waiting for the worker.
  • Running: the analysis pipeline is active.
  • Archiving: downloadable results are being prepared.
  • Finished: the analysis completed successfully.
  • Failed: the analysis stopped before completion.
  • Expired: the retention period ended and stored files were removed.
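The job states above can be modeled as a small state machine. In the following Python sketch, the allowed transitions are assumptions inferred from the order of the list. It assumes that a job can fail while running or archiving and that only finished jobs expire. The documentation does not specify these transitions.

```python
from enum import Enum


class JobState(Enum):
    QUEUED = "Queued"
    RUNNING = "Running"
    ARCHIVING = "Archiving"
    FINISHED = "Finished"
    FAILED = "Failed"
    EXPIRED = "Expired"


# Assumed transitions, inferred from the order of the documented states.
ALLOWED = {
    JobState.QUEUED: {JobState.RUNNING},
    JobState.RUNNING: {JobState.ARCHIVING, JobState.FAILED},
    JobState.ARCHIVING: {JobState.FINISHED, JobState.FAILED},
    JobState.FINISHED: {JobState.EXPIRED},
    JobState.FAILED: set(),
    JobState.EXPIRED: set(),
}


def advance(current, new):
    """Return `new` if moving from `current` is allowed, otherwise raise ValueError."""
    if new not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {new.value}")
    return new


def is_terminal(state):
    """A state with no outgoing transitions is terminal."""
    return not ALLOWED[state]
```

Under these assumptions, a successful job walks Queued, Running, Archiving, Finished, and finally Expired. Failed and Expired are the only terminal states.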

Results and Reports

Results are summarized in a single interactive HTML report. The report includes detected viruses, classification and filtering results, genome coverage statistics, high-quality read assignment proportions, abundance visualizations, coverage profiles, and quality-control summaries.

Highly divergent viral candidates are reported separately with contig length, VirHunter statistics, mapped read counts, and, when available, DIAMOND-based taxonomic classification. Candidate contig sequences can be copied for downstream analysis such as manual BLAST searches, phylogeny, or wet-lab validation.
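For downstream tools such as a local BLAST search, copied contig sequences are usually saved in FASTA format. This Python sketch assumes that the sequences have been pasted into a dictionary keyed by a contig identifier. The identifiers and the default line width of 60 characters are illustrative choices, not ViroNEXT output conventions.

```python
def to_fasta(contigs, width=60):
    """Format {contig_id: sequence} as FASTA text, wrapping sequence lines at `width`."""
    lines = []
    for contig_id, sequence in contigs.items():
        # Remove whitespace and line breaks picked up when copying from the report.
        seq = "".join(sequence.split()).upper()
        lines.append(f">{contig_id}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

To save the result for a local search, write the returned text to a file, for example `candidates.fasta` (a hypothetical name), with `open(...).write(...)`.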

When a job has finished, the status page lists downloadable result files. If a ZIP archive is ready, all files can be downloaded together. Otherwise, individual result files can be downloaded from the same page.

Retention

Results are available for 7 days after the job finishes. After that, job data is deleted and the job URL returns an expired response.
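The retention rule amounts to a simple date calculation. This Python sketch assumes that the 7-day period is counted exactly from the moment the job finishes. Actual deletion may run on a server-side schedule and happen somewhat later.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=7)


def expires_at(finished_at):
    """Return the expected expiry time for a job that finished at `finished_at`."""
    return finished_at + RETENTION


def is_expired(finished_at, now=None):
    """Return True once the retention period has elapsed (timezone-aware datetimes)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= expires_at(finished_at)
```

For example, a job that finished on 1 January at 12:00 UTC would be expected to expire on 8 January at 12:00 UTC.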

Citation

Please cite Braun et al., Nature Methods, xxxx, in any publication that makes use of ViroNEXT or of analyses inspired by it.

Final citation and BibTeX entry: to be added.

Downloads

The pipeline uses reference databases and machine-learning models for viral detection. These databases are available for download for local execution or further analysis.

Reference Databases
Machine Learning Models

Get Help

For questions, contact the Paul-Ehrlich-Institut at bioinformatics [at] pei [dot] de.

Source Code

The project is available on GitHub: link to be added.

ViroNEXT can also be run locally from the GitHub repository. The local workflow uses the same pipeline as the web service and runs outside the web interface once the required environments are installed.

License, release version, container or conda environment, and reproducible example dataset: to be added.

Limitations

ViroNEXT is designed for research use and unbiased viral detection. Results should be interpreted in the context of sequencing quality, sample preparation, reference database content, and independent validation where appropriate.

Detailed limitations, benchmark dataset links, expected false-positive and false-negative behavior, and unsupported input types: to be added.