From: Clustering biological sequences with dynamic sequence similarity threshold
Dataset | No. of sequences | Sequence length: mean (± standard deviation) | Sequence length: min | Sequence length: max |
---|---|---|---|---|
AMR genes | 4027 | 939.93 (± 381.98) | 162 | 4359 |
AMR proteins | 3891 | 312.53 (± 127.90) | 53 | 1452 |
Plasmid nucleotides | 5005 | 1010.38 (± 1008.45) | 77 | 9511 |
Viral nucleotides | 478,652 | 717.09 (± 837.21) | 13 | 9993 |
Long viral nucleotides | 676 | 14,803.87 (± 12,048.56) | 10,002 | 262,388 |
Viral amino acids | 469,835 | 242.64 (± 313.29) | 9 | 13,556 |