Analyze Data with Biopython

The Essential Tool for Biological Sequence Analysis

What is Biopython?

Biopython is a collection of free Python tools and libraries for computational biology and bioinformatics. The project is led by an international community of developers and volunteers. It supports working with biological sequences (DNA, RNA, proteins), reading and writing files in various formats (e.g., FASTA, GenBank), and integrating with external databases and tools such as NCBI and BLAST.

Within this project, Biopython was used for a comprehensive analysis of the physicochemical properties of antimicrobial peptides (AMPs). For each sequence, a set of key parameters was calculated to support a deeper understanding of its structure, function, and potential therapeutic applications.

Analyzed Physicochemical Properties

For each AMP sequence, the following properties were determined using the Biopython library and custom algorithms.

Parameters Calculated Using Biopython

Parameter | Description and Significance in the Context of AMPs
Mass | The molecular weight of the peptide (Da). It is directly related to its length and amino acid composition.
Isoelectric Point (pI) | The pH value at which the net charge of the peptide is zero. In the context of AMPs, a high (basic) pI is desirable as it facilitates interaction with negatively charged bacterial membranes through electrostatic attraction.
Charge | The total electrical charge of the molecule at physiological pH (~7.4). This is a key parameter for AMPs. A high positive charge (typically from +2 to +9) determines selectivity towards negatively charged pathogen surfaces, which is the first step in their mechanism of action.
Hydrophobicity (GRAVY) | The Grand Average of Hydropathicity. An appropriate level of hydrophobicity is essential for penetrating and destabilizing the hydrophobic core of the lipid membrane. However, excessively high hydrophobicity can lead to non-selective toxicity and peptide aggregation.
Secondary Structures (helix/sheet) | The percentage of amino acids forming α-helix or β-sheet structures. The amphipathic α-helix is the most common structural motif among AMPs. β-sheet structures, often stabilized by disulfide bridges, are also crucial for the function of many AMP families.
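
The source does not show the project's code, so the following is only a minimal sketch of how the values in the table above could be obtained, assuming the ProteinAnalysis class from Bio.SeqUtils.ProtParam was the entry point. The example peptide (magainin 2) and the variable names are illustrative choices, not taken from the project. Note that secondary_structure_fraction() reports the fraction of residues with a compositional tendency towards helix, turn, or sheet; it does not predict an actual fold.

```python
from Bio.SeqUtils.ProtParam import ProteinAnalysis

# Illustrative AMP sequence (magainin 2); any plain one-letter protein string works.
peptide = "GIGKFLHSAKKFGKAFVGEIMNS"
analysis = ProteinAnalysis(peptide)

mass = analysis.molecular_weight()    # average molecular weight in Da
pi = analysis.isoelectric_point()     # pH at which the net charge is zero
charge = analysis.charge_at_pH(7.4)   # net charge at physiological pH
gravy = analysis.gravy()              # mean Kyte-Doolittle hydropathy
# Fractions of residues that tend to occur in helix, turn, and sheet.
helix, turn, sheet = analysis.secondary_structure_fraction()

print(f"Mass:   {mass:.1f} Da")
print(f"pI:     {pi:.2f}")
print(f"Charge: {charge:+.2f} at pH 7.4")
print(f"GRAVY:  {gravy:.3f}")
print(f"Helix:  {helix:.0%}  Sheet: {sheet:.0%}")
```

The pH of 7.4 follows the table's definition of charge; a different reference pH would simply be passed to charge_at_pH().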

Parameters Calculated Using Custom Algorithms

The following parameters were calculated using proprietary, simplified models. Their main purpose is to enable rapid filtering, sorting, and grouping of peptides in the database based on their key, predicted characteristics.

Parameter | Description and Significance in the Context of AMPs | Calculation Method
Length | The number of amino acids in the sequence. Most AMPs are short peptides, typically 10 to 50 residues, which affects their synthesis cost and pharmacokinetic properties. | Counting the total number of amino acids in the sequence.
Hydrophobic Moment (hmoment) | A measure of amphipathicity, i.e., the spatial separation of hydrophobic and hydrophilic residues. A high hydrophobic moment is a fundamental feature of many AMPs, as it allows their hydrophobic part to interact with the lipid membrane, leading to its destabilization or pore formation. | Modeling the amphipathic nature of a helix by analyzing the spatial distribution of hydrophobicity. The algorithm identifies the region in the sequence with the greatest separation of hydrophobic and hydrophilic features.
Disulfide Bridges | Covalent bonds between cysteine residues. In peptides, these bridges significantly increase conformational stability and resistance to degradation. Many naturally occurring AMPs, like defensins, contain disulfide bridges crucial for their activity. | Counting all cysteine residues in the sequence and calculating how many pairs (and thus, bridges) they can potentially form.
Proteolytic Stability | The predicted resistance to digestion by enzymes (proteases). This is a critical pharmacokinetic parameter. Low stability means a short half-life of the peptide in the body, limiting its therapeutic potential. | Estimating resistance to digestive enzymes. Formula: 100 - 5 * (nR + nK), where nR and nK are the number of arginine and lysine residues. The model assumes that lysine (K) and arginine (R) are the main sites susceptible to cleavage, so their presence reduces the predicted peptide stability.
CPP Potential | The likelihood that a peptide is a Cell-Penetrating Peptide (CPP). Some AMPs have CPP properties, allowing them to interfere with intracellular targets (e.g., DNA synthase), expanding their mechanism of action beyond simple membrane destabilization. | Assessing the ability to penetrate cell membranes. Formula: ((nR + nK) / N) * 200 - 10 * GRAVY, where N is the total length of the sequence. The score is based on a high content of cationic amino acids (K, R), which facilitate membrane interaction, while also considering overall hydrophobicity.
Toxicity | The predicted toxicity to eukaryotic cells, most often measured as hemolytic activity (destruction of red blood cells). This is the most important safety parameter: an ideal AMP must be highly selective and non-toxic to host cells. | Predicting potential toxicity to host cells. Formula: ((nL + nF) / N) * 200, where nL and nF are the number of leucine and phenylalanine residues, and N is the sequence length. The model assumes that a high content of hydrophobic residues increases the risk of non-selective membrane damage.
Amino Acid Composition | The percentage content of each type of amino acid. This composition directly determines the properties of an AMP: an increased content of cationic amino acids (Lysine and Arginine) is responsible for the peptide's positive charge, while the presence of hydrophobic residues (Leucine and Valine) promotes its interaction with the cell membrane. | Calculating the percentage of each amino acid type in the entire sequence, allowing for a quick assessment of its fundamental chemical composition.
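
The formula-based rows above translate almost directly into plain Python, sketched below. All function names are hypothetical, and GRAVY is passed in by the caller rather than recomputed. The hydrophobic moment row gives no formula, so max_hydrophobic_moment() is a stand-in rather than the project's algorithm: it uses the standard Eisenberg-style helical moment (100 degrees per residue) over a sliding window, with the Kyte-Doolittle scale and a window of 11 residues as assumptions.

```python
import math

# Kyte-Doolittle hydropathy values (the scale behind GRAVY); using this scale
# for the hydrophobic moment is an assumption of this sketch.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def disulfide_bridges(seq):
    """Maximum number of cysteine pairs the sequence could form."""
    return seq.count("C") // 2


def proteolytic_stability(seq):
    """100 - 5 * (nR + nK), returned without clamping."""
    return 100 - 5 * (seq.count("R") + seq.count("K"))


def cpp_potential(seq, gravy):
    """((nR + nK) / N) * 200 - 10 * GRAVY, with GRAVY supplied by the caller."""
    cationic = seq.count("R") + seq.count("K")
    return cationic / len(seq) * 200 - 10 * gravy


def toxicity(seq):
    """((nL + nF) / N) * 200."""
    return (seq.count("L") + seq.count("F")) / len(seq) * 200


def composition(seq):
    """Percentage of each amino acid type present in the sequence."""
    return {aa: 100 * seq.count(aa) / len(seq) for aa in sorted(set(seq))}


def max_hydrophobic_moment(seq, window=11, angle_deg=100.0):
    """Largest Eisenberg-style helical moment over all windows (stand-in method)."""
    angle = math.radians(angle_deg)
    window = min(window, len(seq))
    best = 0.0
    for start in range(len(seq) - window + 1):
        sin_sum = cos_sum = 0.0
        for i, aa in enumerate(seq[start:start + window]):
            h = KYTE_DOOLITTLE[aa]  # raises KeyError for non-standard residues
            sin_sum += h * math.sin(i * angle)
            cos_sum += h * math.cos(i * angle)
        best = max(best, math.hypot(sin_sum, cos_sum) / window)
    return best


peptide = "GIGKFLHSAKKFGKAFVGEIMNS"  # magainin 2, used only as an example
print(len(peptide), disulfide_bridges(peptide), proteolytic_stability(peptide))
print(round(toxicity(peptide), 1), round(max_hydrophobic_moment(peptide), 2))
```

The scores are returned exactly as the formulas give them, so a peptide with more than 20 lysine and arginine residues gets a negative stability value. Whether the project bounds the scores to a fixed range is not stated in the source. The functions expect a non-empty sequence of standard one-letter residues.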

Why Biopython?

Key advantages that make Biopython a standard in bioinformatic analyses.

Versatility

Supports dozens of file formats, from simple FASTA to complex GenBank annotations, facilitating data integration.
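
As a small illustration of the format support, the sketch below reads FASTA records from an in-memory string with Bio.SeqIO. The record identifiers and sequences are made up for the example; in practice a file path or open file handle would replace the StringIO object.

```python
from io import StringIO

from Bio import SeqIO

# Hypothetical FASTA content; identifiers and sequences are examples only.
fasta_text = """>peptide_1 example record
GIGKFLHSAKKFGKAFVGEIMNS
>peptide_2 example record
KWKLFKKIGAVLKVL
"""

# SeqIO.parse yields one SeqRecord per entry; the second argument names the format.
records = list(SeqIO.parse(StringIO(fasta_text), "fasta"))
for record in records:
    print(record.id, len(record.seq), str(record.seq))
```

Changing the format string (for example to "genbank") lets the same loop read other supported formats.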

Community & Documentation

Actively developed by a global community. It has extensive documentation, tutorials, and user support.

Integration

Makes it easy to run popular tools such as BLAST and ClustalW and to parse their results.
