The Computer Scientist's Guide to Designing mRNA Vaccines

Part 2: Antigen Selection Pipeline

2.2.5. Essential proteins and virulence factors

Algorithm Card

Input: localized.fasta

Output: virulence.fasta - Proteins from localized.fasta that have adequate matches in both VFDB and DEG

Brief Summary: Filter remaining proteins based on expected virulence and essentiality.


The last step in the pipeline, which is intended to return a small number of candidates for manual review, keeps only the remaining proteins that are likely to be both essential and virulence factors. This can be done with methods similar to those used in [Homologous Protein Removal], this time searching against the Database of Essential Genes (DEG) and the Virulence Factor Database (VFDB).

Note that DEG may contain entries where the protein sequence reads “Not available” instead of a valid amino acid sequence. To handle this, the script below goes through all entries and removes the ones that don’t contain a full sequence, so that DIAMOND can read the final file.
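The cleaning step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the script from the recording: the function names `read_fasta` and `clean_deg` are hypothetical, and an entry is treated as invalid when its sequence is empty or contains anything other than letters and `*` (which is enough to catch “Not available”, because of the space).

```python
import re
from typing import Iterable, Iterator, TextIO, Tuple

# Assumption: a usable sequence consists only of letters and '*' (stop symbol).
# "Not available" fails this check because it contains a space.
VALID_SEQUENCE = re.compile(r"^[A-Za-z*]+$")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def clean_deg(src: Iterable[str], dst: TextIO) -> Tuple[int, int]:
    """Copy entries with a valid sequence to dst; return (kept, dropped)."""
    kept = dropped = 0
    for header, seq in read_fasta(src):
        if VALID_SEQUENCE.match(seq):
            dst.write(f">{header}\n{seq}\n")
            kept += 1
        else:
            dropped += 1
    return kept, dropped
```

To use it, open the downloaded DEG protein file for reading and a new file for writing, then pass both handles to `clean_deg`. The returned counts give a quick sanity check on how many entries were discarded before building the DIAMOND database.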

[Terminal recording]

The parameters for the matches are once again taken from similar pipelines in the literature. The resulting ‘virulence.fasta’ file contains 21 candidate proteins, which then need to be analyzed manually.
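For readers who prefer to see the selection logic as code, here is a minimal sketch of the "adequate match in both databases" rule. It assumes the two DIAMOND searches were written in the default 12-column tabular format (query ID in column 1, percent identity in column 3, e-value in column 11). The function names are hypothetical, and the thresholds `MIN_IDENTITY` and `MAX_EVALUE` are placeholder values chosen for illustration only, not the parameters used in this pipeline.

```python
import csv
from typing import Iterable, Set, TextIO

# Placeholder thresholds (assumptions); substitute the values from the literature.
MIN_IDENTITY = 30.0  # percent identity
MAX_EVALUE = 1e-5


def passing_queries(tsv: Iterable[str],
                    min_identity: float = MIN_IDENTITY,
                    max_evalue: float = MAX_EVALUE) -> Set[str]:
    """Return IDs of query proteins with at least one hit passing both thresholds."""
    hits = set()
    for row in csv.reader(tsv, delimiter="\t"):
        if len(row) < 12:
            continue  # not a complete 12-column result line
        qseqid, pident, evalue = row[0], float(row[2]), float(row[10])
        if pident >= min_identity and evalue <= max_evalue:
            hits.add(qseqid)
    return hits


def select_candidates(fasta: Iterable[str], vfdb_hits: Set[str],
                      deg_hits: Set[str], out: TextIO) -> int:
    """Write proteins matched in BOTH databases to out; return how many were written."""
    keep = vfdb_hits & deg_hits  # set intersection: must appear in both
    written = 0
    writing = False
    for line in fasta:
        if line.startswith(">"):
            # Assumption: the query ID is the header text up to the first whitespace.
            parts = line[1:].split()
            writing = bool(parts) and parts[0] in keep
            if writing:
                written += 1
        if writing:
            out.write(line if line.endswith("\n") else line + "\n")
    return written
```

In practice, `passing_queries` would be called once on the VFDB result file and once on the DEG result file, and `select_candidates` would then read ‘localized.fasta’ and write ‘virulence.fasta’. Using a set intersection keeps the rule explicit: a protein that matches only one of the two databases is dropped.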