
Nikeeta Nurddin Surani
nns25@njit.edu
Machine learning protein sequence alignment
This project uses protein alignment data obtained from the HOMSTRAD database.
The following steps will then be applied to the data:
1) Extract the HOMSTRAD alignments.
2) Make pairwise alignments from the extracted data.
3) Align each pair with ClustalW, MUSCLE, MAFFT, ProbCons and Probalign, and
compute the true alignment score (Q score) of each resulting alignment.
4) Split the data into two equal halves: a training set and a validation set.
5) Predict the Q score: convert each alignment into a feature vector and train
a support vector regression model with SVM-light.
6) Compute the correlation coefficient between the predicted and true Q scores.
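
The pairwise-alignment, scoring, splitting and SVM-light steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual code: the Q score is assumed to be the fraction of residue pairs in the reference (HOMSTRAD) alignment that a program's alignment reproduces, the helper names (pairwise_from_msa, q_score, split_halves, svmlight_line) are hypothetical, and the contents of the feature vector are not specified in this description, so the features here are placeholders.

```python
import random
from itertools import combinations

GAP = "-"

def pairwise_from_msa(msa):
    """Project every pair of rows out of a multiple alignment.

    msa maps a sequence name to its gapped row (all rows the same length).
    Columns in which both rows have a gap are dropped.
    """
    pairs = {}
    for a, b in combinations(sorted(msa), 2):
        cols = [(x, y) for x, y in zip(msa[a], msa[b])
                if not (x == GAP and y == GAP)]
        pairs[(a, b)] = ("".join(x for x, _ in cols),
                         "".join(y for _, y in cols))
    return pairs

def aligned_residue_pairs(row_a, row_b):
    """Set of (i, j) residue indices that share a column."""
    pairs = set()
    i = j = 0
    for x, y in zip(row_a, row_b):
        if x != GAP and y != GAP:
            pairs.add((i, j))
        if x != GAP:
            i += 1
        if y != GAP:
            j += 1
    return pairs

def q_score(test, reference):
    """Assumed definition: share of reference residue pairs found in test."""
    ref_pairs = aligned_residue_pairs(*reference)
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & aligned_residue_pairs(*test)) / len(ref_pairs)

def split_halves(items, seed=0):
    """Shuffle reproducibly, then cut into training and validation halves."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def svmlight_line(target, features):
    """One example in SVM-light's '<target> <index>:<value> ...' format.

    Feature indices start at 1 and increase, as SVM-light expects.
    """
    feats = " ".join(f"{i}:{v:.6f}" for i, v in enumerate(features, start=1))
    return f"{target:.6f} {feats}"
```

Lines produced by svmlight_line can be written to a training file and passed to SVM-light's svm_learn in regression mode (the -z r option); svm_classify then writes one predicted value per validation example.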
Output:
The output will rank the alignment programs used according to their accuracy.
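
A minimal sketch of the correlation and ranking calculations is given below. It assumes the correlation coefficient is Pearson's r and that "accuracy" means the mean Q score of each program over the validation set; neither choice is stated in this description, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)

def rank_programs(scores):
    """Order program names by mean Q score, best first (assumed criterion).

    scores maps a program name to its list of Q scores.
    """
    return sorted(scores, key=lambda name: mean(scores[name]), reverse=True)
```

For example, pearson(predicted, true) would be called once per program on the validation half, and rank_programs would receive a dictionary keyed by program name.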