Part 2: Antigen Selection Pipeline
2.2.5. Essential proteins and virulence factors
Algorithm Card
Input: localized.fasta
Output: virulence.fasta - Proteins from localized.fasta that have adequate matches in both VFDB (Virulence Factor Database) and DEG (Database of Essential Genes)
Brief Summary: Filter remaining proteins based on expected virulence and essentiality.
The last step in the pipeline, intended to return a small number of candidates for manual review, keeps only the remaining proteins that are likely to be both essential and virulence factors. This can be done with methods similar to those used in [Homologous Protein Removal], searching the proteins against the Database of Essential Genes (DEG) and the Virulence Factor Database (VFDB).
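The selection logic described above can be sketched in Python. This is a minimal illustration, not the pipeline's actual script: it assumes each search was saved as BLAST-style tabular output with the default 12 columns (qseqid, sseqid, pident, ..., evalue, bitscore), and the cutoffs MAX_EVALUE and MIN_IDENTITY, the function names, and the sample data are all hypothetical.

```python
# Hypothetical cutoffs for illustration only; the pipeline's real values are not given here.
MAX_EVALUE = 1e-5
MIN_IDENTITY = 30.0


def passing_queries(tabular_text, max_evalue=MAX_EVALUE, min_identity=MIN_IDENTITY):
    """Return the query IDs that have at least one hit passing both cutoffs.

    Assumes 12 tab-separated columns per line:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    passed = set()
    for line in tabular_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        pident, evalue = float(fields[2]), float(fields[10])
        if pident >= min_identity and evalue <= max_evalue:
            passed.add(fields[0])
    return passed


def select_records(fasta_text, keep_ids):
    """Keep only FASTA records whose ID (first word of the header) is in keep_ids."""
    out, keep = [], False
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            parts = line[1:].split()
            keep = bool(parts) and parts[0] in keep_ids
        if keep:
            out.append(line)
    return "\n".join(out) + "\n" if out else ""


def row(*fields):
    """Build one tab-separated line of made-up sample output."""
    return "\t".join(fields)


# Made-up hit tables standing in for the VFDB and DEG search results.
vfdb_hits = "\n".join([
    row("protA", "vf_hit_1", "52.3", "210", "95", "3", "1", "208", "5", "212", "2e-40", "160"),
    row("protB", "vf_hit_2", "24.0", "80", "60", "1", "1", "80", "1", "80", "1e-2", "30"),
])
deg_hits = "\n".join([
    row("protA", "deg_hit_1", "41.0", "190", "110", "2", "3", "192", "1", "190", "1e-30", "120"),
    row("protC", "deg_hit_2", "60.0", "150", "60", "0", "1", "150", "1", "150", "1e-50", "200"),
])

# A protein is kept only if it has an adequate match in BOTH databases.
candidates = passing_queries(vfdb_hits) & passing_queries(deg_hits)

fasta = ">protA example protein\nMKT\n>protB\nMAA\n>protC\nMGG\n"
print(select_records(fasta, candidates), end="")
```

Taking the set intersection mirrors the requirement in the algorithm card that a protein match both VFDB and DEG; replacing `&` with `|` would instead keep proteins that match either database.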
Note that DEG may contain entries whose protein sequence reads “Not available” instead of a valid amino acid sequence. To address this, all DEG entries are processed and those without a full sequence are removed, ensuring DIAMOND can read the final file.
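One way to perform this cleanup is sketched below in plain Python. It is an illustration under stated assumptions rather than the pipeline's own script: the function names, the sample records, and the set of residue letters treated as valid are all hypothetical choices.

```python
# Assumed alphabet: the 20 standard residues plus common ambiguity/extended codes.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYBXZJUO")


def parse_fasta(fasta_text):
    """Yield (header, list_of_sequence_lines) for each record in FASTA text."""
    header, seq_lines = None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, seq_lines
            header, seq_lines = line, []
        elif header is not None:
            seq_lines.append(line)
    if header is not None:
        yield header, seq_lines


def has_full_sequence(seq_lines):
    """True if the record holds a non-empty sequence made only of residue letters."""
    sequence = "".join(seq_lines)
    if sequence.strip().lower() == "not available":
        return False  # placeholder text used in place of a sequence
    # Internal spaces are not removed, so any free-text placeholder fails this check.
    return bool(sequence) and set(sequence.upper()) <= VALID_RESIDUES


def drop_invalid_entries(fasta_text):
    """Return FASTA text containing only the records with a full sequence."""
    kept = []
    for header, seq_lines in parse_fasta(fasta_text):
        if has_full_sequence(seq_lines):
            kept.append(header)
            kept.extend(seq_lines)
    return "\n".join(kept) + "\n" if kept else ""


# Made-up records standing in for the DEG protein file.
sample = (
    ">entry1 geneA\nMKTAYIAK\nQRQISFVK\n"
    ">entry2 geneB\nNot available\n"
    ">entry3 geneC\nMSTNPKPQ\n"
)
print(drop_invalid_entries(sample), end="")
```

The placeholder is tested explicitly because every letter in “Not available” is itself a valid residue code, so a check based on letters alone would only reject it through the space character.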
The match parameters are once again taken from similar pipelines in the literature. The resulting ‘virulence.fasta’ file contains 21 candidate proteins, which need to be analyzed manually.