Annotation
Bakta
All assembled ATB sample genomes were annotated using Bakta
v1.9.4 using
its full database
v5.1. Bakta was run in a
Conda environment using its public
Bioconda
package.
Result files are available on OSF in the Bakta
component. However, due to the huge amount of
raw annotation data (>35 TB), a couple of measures have been taken to
handle this. First, only JSON files are provided since all Bakta output
files can be restored from those (see below). Second, sample result
files are packed in taxonomic batches, just like assembled FASTA files
to achieve better compression ratios. Third, since there are OSF data
size limits of 50 GB and 5 GB for compartments and files, respectively,
all taxonomy batches were further distributed to several compartments.
Furthermore, some taxonomy batches had to be split into separate files
to meet the 5 GB file limit. By doing so, all annotation data could be
reduced to a total of ~1.3 TB and ~0.3 TB for r0.2 and
incr_release.202408, respectively.
For each release, there is a status file in a File_Lists folder, e.g.
atb.bakta.r0.2.status.tsv.gz providing
the following information:
sample= the INSDC sample accessionstatus= the status of the Bakta run (PASS,FAIL)file_name= the name of the Bakta JSON result file, e.g.SAMN38372697.bakta.jsonfile_md5= MD5 sum offile_nametar_xz= the name of the tar.xz file where this sample’s JSON lives following a fix schema: atb.analysis.release.batch.tar.xz, e.g.atb.bakta.r0.2.batch.1.tar.xztar_xz_md5= MD5 sum oftar_xztar_xz_size_MB= size of thetar_xzfile in MB
Example `SAMN38372697`:
sample SAMN38372697
status PASS
file_name SAMN38372697.bakta.json
file_md5 008d86ad046e0d152b8cc22d7452be24
tar_xz atb.bakta.incr_release.202408.batch.29.tar.xz
tar_xz_md5 7da90ac7650de2c2e0b821569ae2a602
tar_xz_size_MB 1201.0
To restore all output files for a given sample from its JSON file, use the following command:
bakta_io --output <output-path> --prefix <file-prefix> sample.json
For any questions regarding the Bakta genome annotation, please contact Oliver Schwengers.