Academic Year: 2022/2023

In the second semester, I'll teach the machine (deep) learning module of the course Models and Methods for Computational Biophysics at the Physics Department of Cagliari University (Italy). The remaining modules will be taught by Prof. Attilio V. Vargiu. A tentative syllabus for the whole course is reported below:

- Module I: structure and function of proteins -

• Introduction to structure and dynamics of proteins and to their mutual interactions;
• Introduction to protein-protein and protein-ligand binding;
Practical tutorial: play with the VMD software for protein structure and dynamics visualization.

- Module II: machine (deep) learning -

• Introduction to machine and deep learning;
• Unsupervised vs supervised learning: similarities and differences;
• Clustering algorithms (hierarchical and partitioning methods);
• The K-means algorithm: theory and examples;
• Defining the optimal number of clusters with the "Elbow" method;
• Introduction to Principal Component Analysis (PCA);
• Supervised learning: classification vs. regression tasks;
• Building the model: training, validation and test sets;
• The concept of "cost function". K-fold cross validation and overfitting;
• Performance evaluation: the confusion matrix. Sensitivity, specificity, precision and accuracy metrics;
• Support-Vector Machine (SVM) and Random Forest (RF) models;
• Introduction to Neural Networks and Deep Learning;
Practical tutorial (unsupervised learning): clustering protein conformations with the K-means algorithm;
Practical tutorial (supervised learning): SVM and RF models to predict ligand-protein complex stability.
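
To give a flavor of the performance-evaluation topic in Module II, here is a minimal sketch of how sensitivity, specificity, precision and accuracy follow from the four entries of a binary confusion matrix. It is not taken from the course material: the function names and the toy labels are hypothetical, and the code assumes a binary task in which every denominator is non-zero.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, TN, FN) for a binary classification task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if truth == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, fp, tn, fn


def metrics(tp, fp, tn, fn):
    """Standard metrics derived from the confusion matrix.

    Assumes all denominators are non-zero (illustrative sketch only).
    """
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate (recall)
        "specificity": tn / (tn + fp),  # true negative rate
        "precision": tp / (tp + fp),    # fraction of positive calls that are correct
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Toy labels, invented for illustration (1 = positive class, 0 = negative class).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

counts = confusion_counts(y_true, y_pred)
scores = metrics(*counts)
print(counts)
print(scores)
```

With these toy labels the counts are 3 true positives, 1 false positive, 5 true negatives and 1 false negative, so sensitivity and precision are both 0.75 and accuracy is 0.8.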

- Module III: protein-ligand docking -

• Introduction to molecular recognition and molecular docking;
Practical tutorial: run docking calculations with the GNINA software;
Practical tutorial: analyze and process docking results.


During the first semester, I taught a module named Machine Learning Methods in Computational Biophysics as part of the course Molecular Modeling of Biological Systems (6 CFU), taught at Cagliari University (Italy) by Prof. Attilio V. Vargiu.
The course (here the link to the syllabus) is designed to give a broad overview of different techniques routinely used in computational biophysics, such as protein sequence alignment, molecular dynamics simulations and molecular docking. In this context, the machine learning module was intended to give students an insight into the capabilities of machine learning algorithms when applied to the study of the biophysical world.

The syllabus of the Machine Learning module is reported below.

• Introduction to big data, artificial intelligence, machine and deep learning;
• Unsupervised vs supervised learning: similarities and differences;
• The concept of "feature";
• Unsupervised learning: clustering algorithms (hierarchical and partitioning methods);
• The K-means algorithm: theory and examples. Defining the optimal number of clusters with the "Elbow" and "Silhouette" methods. Initialization techniques (random, Forgy, K-means++);
• Introduction to Natural Language Processing (NLP);
• The Bag-of-Words (BoW) model. K-mers and N-grams;
• Supervised learning: classification vs. regression tasks;
• Building the model: training, validation and test sets;
• The concept of "cost function". K-fold cross validation and overfitting;
• Performance evaluation: the confusion matrix. Sensitivity, specificity, precision and accuracy metrics;
• K-Nearest Neighbors (KNN) and Support-Vector Machines (SVM) models (theory and examples);
• Introduction to Python libraries for ML applications (with worked examples);
Practical tutorial (unsupervised learning): clustering protein conformations with the K-means algorithm;
Practical tutorial (supervised learning): predicting protein families from DNA sequences of different organisms.
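
As a small illustration of the Bag-of-Words and k-mer topics listed above, the sketch below turns DNA sequences into k-mer count vectors, the kind of feature representation a classifier could then be trained on. It is not the code used in the tutorial: the function names and the two toy sequences are hypothetical, and only the Python standard library is used.

```python
from collections import Counter


def kmers(sequence, k=3):
    """Split a sequence into its overlapping k-mers (the 'words')."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def bag_of_words(sequences, k=3):
    """Build a k-mer vocabulary and one count vector per sequence."""
    counts = [Counter(kmers(seq, k)) for seq in sequences]
    # The vocabulary is the sorted set of all k-mers seen in any sequence.
    vocabulary = sorted(set().union(*counts))
    # Each row counts how often every vocabulary k-mer occurs in one sequence.
    matrix = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, matrix


# Toy sequences, invented for illustration.
sequences = ["ATGATG", "GATTGA"]
vocabulary, matrix = bag_of_words(sequences, k=3)
print(vocabulary)
for row in matrix:
    print(row)
```

Word order is discarded in this representation: each sequence is described only by how many times each k-mer occurs, which is what makes it a "bag" of words.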