Domains. Pfam.

Task 1

Domain choice

In the Pfam database I chose the domain family Class II histocompatibility antigen, beta domain (MHC_II_beta). Figure 1 shows the 3D structure of a human MHC class II molecule (beta chain) in complex with staphylococcal enterotoxin I (SEI).

Figure 1. Structure of Staphylococcal Enterotoxin I (SEI) in Complex with a Human MHC class II Molecule.

Jalview project with all sequences aligned: MHC_II_beta_align_all.jar.

Architecture choice

There are 492 sequences with the architecture MHC_II_beta, C1-set.
There are 87 sequences with the architecture MHC_II_beta.

Using a script, I obtained information about all sequences belonging to the Pfam family:

python swisspfam-to-xls.py -i /srv/databases/pfam/swisspfam.gz -z -p PF00969 -o PF00969.xls

Summary table

Using the VLOOKUP function (LibreOffice), I built a summary table in which rows are ACs and columns are Pfam domains. It is available here: PF00969+taxonomy.ods.

Taxon choice

First I saved the list of ACs from PF00969.xls to ac.txt. Then I retrieved the full sequences of these proteins with the UniProt Retrieve search and used a script to obtain the taxonomy of the saved MHCIIb sequences. The data were then added to the previous table.

python uniprot_to_taxonomy.py -i uniprot-yourlist.txt taxonomy_only.xls

Analyzing the data, I found that all sequences belong to the taxon Euteleostomi (Eukaryota, Metazoa, Chordata, Craniata, Vertebrata). This taxon includes two sufficiently large subtaxa: Archelosauria (56) and Mammalia (387).

Representative choice

I chose 17 representatives of the 'due' architecture from each subtaxon and 14 representatives of the 'uno' architecture per subtaxon. Their ACs were saved as ac_chosen.txt.

Final table

The resulting table is here: PF00969+taxonomy.ods. Only the selected sequences were kept from the full alignment (MHC_II_beta_align_all.jar) using a script (the command is given below).
python filter-alignment.py -i MHC_II_beta_align_all.fasta -m ac_chosen.txt -o chosen_align.fasta -a "_"

Sequences in the resulting filtered alignment were renamed according to the scheme architecture_subtaxon_AC. Reference designations:
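The filtering and renaming steps can be illustrated with a short Python sketch. It is not the code of filter-alignment.py, whose internals are not shown here. It only assumes that the alignment is in FASTA format and that each header contains the accession as a separate token. The function names, the toy sequences, and the accessions AAAAA1, BBBBB2 and CCCCC3 are invented for illustration.

```python
import re


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def find_accession(header, wanted):
    """Return the first alphanumeric token of the header found in `wanted`."""
    for token in re.split(r"[^A-Za-z0-9]+", header):
        if token in wanted:
            return token
    return None


def filter_and_rename(records, labels):
    """Keep records whose accession is a key of `labels` and rename them.

    `labels` maps accession -> (architecture, subtaxon); the new name
    follows the scheme architecture_subtaxon_AC.
    """
    kept = []
    for header, seq in records:
        ac = find_accession(header, labels)
        if ac is None:
            continue  # sequence was not selected
        architecture, subtaxon = labels[ac]
        kept.append((f"{architecture}_{subtaxon}_{ac}", seq))
    return kept


def write_fasta(records):
    """Serialize (name, sequence) pairs back to FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


# Hypothetical toy alignment and selection table (not real data).
toy_alignment = """>AAAAA1_TOY/1-8
MK-LVTAA
>BBBBB2_TOY/1-8
MKQLV-AA
>CCCCC3_TOY/1-8
M--LVTAG
"""
toy_labels = {
    "AAAAA1": ("due", "Mammalia"),
    "CCCCC3": ("uno", "Archelosauria"),
}

filtered = filter_and_rename(read_fasta(toy_alignment), toy_labels)
print(write_fasta(filtered), end="")
```

Gap characters are left untouched, so the kept sequences stay aligned to each other; columns that become empty after filtering still have to be removed afterwards.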
Then they were sorted by ID. Empty columns and the N-terminal and C-terminal sections were deleted, as were poorly aligned sequences. Sequences with different architectures were grouped separately and colored with the ClustalX scheme by conservation (threshold 50). The final project is available as chosen_align.jar and is shown in figure 4.

Figure 4. Final alignment of chosen representatives.

Task 2

Phylogenetic tree of the domain

To build the trees I used the Neighbor-Joining method with 100 bootstrap replicates (MEGA). The phylogenetic trees of the MHC_II_beta domain are shown below. Trees for 'due' (fig. 5) and 'uno' (fig. 6) were built separately.

Bracket tree formulae:
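As a reminder of how a bracket formula encodes a tree, here is a minimal Python sketch of a parser. It assumes the formulae are in the Newick format with unquoted labels, which is what MEGA is expected to export. The example string and the leaf names seqA, seqB and seqC are invented for illustration and are not taken from the trees in figures 5 and 6.

```python
def parse_newick(text):
    """Parse a Newick string into nested (label, length, children) tuples.

    Only the basic syntax is handled: parentheses, commas, unquoted labels
    and optional ':length' suffixes. Quoted labels and comments are not
    supported.
    """
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("a Newick formula must end with ';'")
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # skip '('
            children.append(parse_node())
            while text[pos] == ",":
                pos += 1  # skip ','
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # skip ')'
        # Optional label (leaf name, or an internal label such as a support value).
        start = pos
        while text[pos] not in ",():;":
            pos += 1
        label = text[start:pos]
        # Optional branch length after ':'.
        length = None
        if text[pos] == ":":
            pos += 1
            start = pos
            while text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return (label, length, children)

    root = parse_node()
    if text[pos] != ";":
        raise ValueError(f"unexpected character at position {pos}")
    return root


def leaves(node):
    """Return the leaf labels of a parsed tree in left-to-right order."""
    label, _length, children = node
    if not children:
        return [label]
    result = []
    for child in children:
        result.extend(leaves(child))
    return result


# Hypothetical example: seqA and seqB form a clade labeled 87; seqC is its sister.
tree = parse_newick("((seqA:0.1,seqB:0.2)87:0.05,seqC:0.3);")
print(leaves(tree))
```

Each pair of brackets corresponds to one internal node, so the nesting depth of a leaf equals the number of internal nodes on its path to the root.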
Conclusion:
© Darya Potanina, 2017