Jalview and EMBOSS. Pairwise sequence alignment

Entry name	Length	Protein name	Organism	Domain
DNAK_HALWD	641	Chaperone protein DnaK	Haloquadratum walsbyi (strain DSM 16790 / HBSQ001)	Archaea
HSP74_DROME	641	Major heat shock 70 kDa protein Bbb	Drosophila melanogaster (Fruit fly)	Eukaryota

Command	Aligned Length	Gaps	Indels	% (Indels)	Identity	% (Identity)	Similarity	% (Similarity)
needle	680	10/7	39/30	5.7/4.4	641/336	94.2/49.4	0/109	0/16.0
water	638	10/6	39/11	6.1/1.7	599/332	93.9/52.0	0/108	0/16.9
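
The percentage columns appear to be each count divided by the aligned length. A minimal Python sketch that checks this against the values after the slash in the needle and water rows; the helper name `percent` and the dictionary keys are my own, not EMBOSS terms:

```python
def percent(count: int, aligned_length: int) -> float:
    """Return count as a percentage of the aligned length, rounded to 0.1."""
    return round(100.0 * count / aligned_length, 1)


# Counts after the slash in each row; "indel_positions" is the count
# that reproduces the value shown in the % (Indels) column.
rows = {
    "needle": {"length": 680, "indel_positions": 30, "identity": 336, "similarity": 109},
    "water": {"length": 638, "indel_positions": 11, "identity": 332, "similarity": 108},
}

percentages = {
    command: {
        name: percent(count, row["length"])
        for name, count in row.items()
        if name != "length"
    }
    for command, row in rows.items()
}

for command, values in percentages.items():
    print(command, values)
```

The results match the table (for example 49.4% identity for needle and 52.0% for water), so the same helper can be used to check the rows of any other needle or water report.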

Table 3. Default parameters of the EMBOSS needle and water commands

	Needle	Water
Matrix (for protein sequences)	EBLOSUM62	EBLOSUM62
Gap open penalty	10.0 (gapopen)	10.0 (gapopen)
Gap extension penalty	0.5 (gapextend)	0.5 (gapextend)
End gap penalty	not applied by default (endweight); if enabled: 10.0 (endopen), 0.5 (endextend)	-
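
Both programs accept the `-gapopen` and `-gapextend` qualifiers, so the defaults above can also be written out explicitly. A small sketch that only builds the command line; the helper `emboss_command` and the file names are hypothetical, and actually running the command assumes EMBOSS is installed:

```python
import shlex


def emboss_command(program, a_file, b_file, out_file, gapopen=10.0, gapextend=0.5):
    """Build the argument list for an EMBOSS needle or water run."""
    if program not in ("needle", "water"):
        raise ValueError(f"unsupported program: {program}")
    return [
        program,
        "-asequence", a_file,
        "-bsequence", b_file,
        "-gapopen", str(gapopen),      # default gap open penalty
        "-gapextend", str(gapextend),  # default gap extension penalty
        "-outfile", out_file,
    ]


# Hypothetical file names for the two proteins compared above.
cmd = emboss_command("needle", "DNAK_HALWD.fasta", "HSP74_DROME.fasta", "needle.txt")
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)`. The matrix is not given because EBLOSUM62 is already the default for protein input.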

Differences^[2]

Global Sequence Alignment (Needle)	Local Sequence Alignment (Water)
Attempts to align the two sequences over their entire length, which is why the needle alignment contains more gaps.	Finds the local region with the highest level of similarity between the two sequences, which is why the start of the sequences is cut off in the water alignment.
Contains every residue of both the query and the target sequence.	Aligns a substring of the query sequence to a substring of the target sequence.
Suitable for two sequences of approximately the same length that are quite similar, for example two homologous sequences.	Can be applied to any two sequences, because it finds stretches with a high level of matches without aligning the rest of the sequences; suitable for more divergent or distantly related sequences.
Implemented by the Needleman–Wunsch algorithm.	Implemented by the Smith–Waterman algorithm.
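
To make the difference concrete, here is a minimal sketch of both scoring schemes in Python. It is a simplification: it uses a single linear gap penalty and a plain match/mismatch score (both chosen by me for illustration), not the affine gap penalties and EBLOSUM62 matrix that needle and water use, and the two toy sequences are invented.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch score: every residue of a and b must be aligned."""
    prev = [j * gap for j in range(len(b) + 1)]  # leading gaps are penalised
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            cur.append(max(prev[j - 1] + pair, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-2):
    """Smith-Waterman score: best-scoring pair of substrings of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)  # an alignment may start anywhere at no cost
    for i in range(1, len(a) + 1):
        cur = [0]
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(0, prev[j - 1] + pair, prev[j] + gap, cur[j - 1] + gap)
            cur.append(cell)
            best = max(best, cell)
        prev = cur
    return best


query, target = "MKTAYIAK", "TAYI"  # toy sequences, not the proteins above
print(global_score(query, target))  # -4: four matches minus four gap columns
print(local_score(query, target))   # 4: only the matching TAYI stretch is scored
```

The global score pays for the unaligned ends, while the local score simply leaves them out, which is the same effect as the missing start in the water alignment.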

Task 2

Used programs: EMBOSS, Jalview
EMBOSS commands: needle, water
Steps in Jalview: File - Input Alignment - From File - file.fasta, then Colour - ClustalX and Colour - Above Identity Threshold (100%)

Table 4. Water alignments of non-homologous sequences (proteins with different functions were chosen). Each of the 5 sequences was aligned with my protein, BAC73082.1.

Protein	Aligned Length	Gaps	Indels	% (Indels)	Identity	% (Identity)	Similarity	% (Similarity)
AKC28878.1	32	0/0	0/0	0/0	32/8	100.0/25.0	0/8	0/25.0
AMM47000.1	39	2/0	5/0	12.8/0	34/16	87.2/41.0	0/7	0/17.9
AFH91095.1	140	4/3	26/30	18.6/21.4	114/53	81.4/37.9	0/18	0/12.9
AMD46139.1	53	2/1	6/1	11.3/1.9	47/19	88.7/35.8	0/10	0/18.9
AJG99379.1	67	1/3	2/21	3.0/31.3	65/20	97.0/29.9	0/7	0/10.4

Differences between homologous and non-homologous protein alignments

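The main difference I was able to spot is the alignment length. The water alignments of the non-homologous proteins are 32 to 140 columns long, far shorter than the 638-column water alignment of the homologous pair. This seems logical to me, since unrelated proteins share only short similar regions.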

Task 3

Figure 3. Screenshot of part of the alignment (from top to bottom: TCoffee Multiple - Needle Pairwise - Water Pairwise)

Jalview alignment (link)

Download link

Mismatches

Length: the Needle and Water alignments are much shorter, and the Water alignment lacks the first part (it starts from the M residue);
Gaps: the multiple alignment has more gaps than the others, presumably because TCoffee aligned more sequences;
Gly: position 37 in TCoffee, position 8 in Needle, position 2 in Water.
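
The shift in position comes from the gaps (or the cut-off start) that precede a residue in each alignment. A minimal sketch of how a residue number can be mapped to an alignment column; the function name `alignment_column` and the toy strings are hypothetical and are not taken from the actual alignments:

```python
def alignment_column(aligned_seq, residue_index, gap_chars="-."):
    """Return the 1-based alignment column of the residue_index-th residue.

    residue_index counts residues only (1-based); gap characters are skipped.
    """
    seen = 0
    for column, char in enumerate(aligned_seq, start=1):
        if char not in gap_chars:
            seen += 1
            if seen == residue_index:
                return column
    raise ValueError("sequence has fewer residues than residue_index")


# Toy rows: the same second residue (G) with and without leading gaps.
multiple_row = "---MG-K"
pairwise_row = "MGK"
print(alignment_column(multiple_row, 2))
print(alignment_column(pairwise_row, 2))
```

With these toy rows the G lands in column 5 of the gapped row and column 2 of the ungapped one, the same kind of offset as in the list above.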

I think that the TCoffee multiple alignment service is the best choice. It aligned the two sequences more accurately because it compared more sequences, so it must have taken all indels into account.

References
