BMC Bioinformatics

Table 1 Details of the benchmark datasets used for evaluation

From: Clustering biological sequences with dynamic sequence similarity threshold

Dataset	No. of sequences	Mean sequence length (± standard deviation)	Min sequence length	Max sequence length
AMR genes	4027	939.93 (± 381.98)	162	4359
AMR proteins	3891	312.53 (± 127.90)	53	1452
Plasmid nucleotides	5005	1010.38 (± 1008.45)	77	9511
Viral nucleotides	478,652	717.09 (± 837.21)	13	9993
Long viral nucleotides	676	14,803.87 (± 12,048.56)	10,002	262,388
Viral amino acids	469,835	242.64 (± 313.29)	9	13,556
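Each row above summarizes a dataset by its sequence count and by the mean (± standard deviation), minimum and maximum sequence length. The sketch below shows one way such per-dataset statistics could be computed from a FASTA file. It is an illustration under stated assumptions, not the authors' procedure: the input is assumed to be FASTA, the helper names (read_fasta_lengths, length_summary, format_row) are hypothetical, and the standard deviation is assumed to be the sample standard deviation (n − 1 denominator), since the table does not say which estimator was used.

```python
import statistics
from typing import Iterable, Iterator


def read_fasta_lengths(lines: Iterable[str]) -> Iterator[int]:
    """Yield the length of each record in FASTA text (hypothetical helper).

    Sequence lines belonging to one record are concatenated, so wrapped
    FASTA is handled. Blank lines are ignored.
    """
    length = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if length is not None:
                yield length
            length = 0
        elif length is not None:
            length += len(line)
    if length is not None:
        yield length


def length_summary(lengths: list[int]) -> dict:
    """Count, mean, SD, min and max of sequence lengths."""
    return {
        "n": len(lengths),
        "mean": statistics.mean(lengths),
        # Assumption: sample standard deviation (n - 1); the table does not specify.
        "sd": statistics.stdev(lengths),
        "min": min(lengths),
        "max": max(lengths),
    }


def format_row(name: str, s: dict) -> str:
    """Render one tab-separated row in the layout of the table above."""
    return (
        f"{name}\t{s['n']:,}\t{s['mean']:,.2f} (± {s['sd']:,.2f})"
        f"\t{s['min']:,}\t{s['max']:,}"
    )


if __name__ == "__main__":
    # Toy input: three records of length 6, 3 and 9 (the first is wrapped).
    fasta = ">a\nACGT\nAC\n>b\nACG\n>c\nACGTACGTA\n"
    summary = length_summary(list(read_fasta_lengths(fasta.splitlines())))
    print(format_row("toy", summary))  # toy	3	6.00 (± 3.00)	3	9
```

Using `statistics.pstdev` instead of `statistics.stdev` would give the population standard deviation; for datasets as large as those in the table the two differ only negligibly.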


ISSN: 1471-2105
