Significance
Assigning DNA sequences to their correct origin is an important task. However, only a fraction of all species are represented in today’s databases and are thus easily assignable. We therefore present a method that is particularly good at classifying sequences that have no closely related species in databases. To this end, we use a deep learning approach that first learns the “language” of DNA and then distinguishes the “language” structure of different groups of organisms, for example, bacteria and viruses. With this approach, we achieve quality comparable to previous methods for sequences with close relatives in the database and superior quality for new species.
               