UniProt, UniRef, RefSeq databanks

About

This work was about learning how databases are structured and how to find and extract all the necessary information from any database we work with. The protein from the 1st term is used as the example: a putative globin-like protein. It belongs to the globins, which are haem-containing proteins involved in binding and/or transporting oxygen. To learn more about it, visit my term 1 protein page.

Task 1. IDs and general info

There is only one result in UniProtKB for my protein, and it is unreviewed (TrEMBL). The table below shows the molecular mass and length of my protein and its IDs in different services (no PDB ID was found in my file). The protein existence level is "predicted". RecName is unavailable, which is why I've mentioned only SubName (Submitted Name). From the DR section we can get information about heme, oxygen binding and oxygen transport. As the status is unreviewed, more detailed information is unavailable.

Table 1. Main information

RefSeq ID: WP_010986773.1
RecName: none
PDB ID: none
SubName: Putative globin-like protein
UniProt ID: Q82CH9_STRAW
UniProt AC: Q82CH9
Length (aa): 134
Molecular mass (Da): 15655
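
The values in Table 1 sit on a few fixed line types of the UniProtKB flat-file (text) entry: ID, AC, DE, DR and SQ. Below is a minimal Python sketch of how they could be pulled out. The sample record was assembled by hand from Table 1, the CRC64 checksum and the second RefSeq field are placeholders, and parse_entry is a hypothetical helper name, not part of any UniProt tool.

```python
import re

# Hand-made excerpt in UniProtKB flat-file layout; values are taken from Table 1.
# The CRC64 checksum and the "-" in the RefSeq line are placeholders.
SAMPLE = """\
ID   Q82CH9_STRAW            Unreviewed;       134 AA.
AC   Q82CH9;
DE   SubName: Full=Putative globin-like protein;
DR   RefSeq; WP_010986773.1; -.
SQ   SEQUENCE   134 AA;  15655 MW;  0000000000000000 CRC64;
"""


def parse_entry(text):
    """Collect entry name, status, accession, SubName, RefSeq ID, length and mass."""
    info = {}
    for line in text.splitlines():
        # Flat-file lines start with a two-letter code followed by three spaces.
        code, rest = line[:2], line[5:]
        if code == "ID":
            fields = rest.split()
            info["entry_name"] = fields[0]
            info["status"] = fields[1].rstrip(";")
        elif code == "AC" and "accession" not in info:
            # The first accession on the first AC line is the primary one.
            info["accession"] = rest.split(";")[0].strip()
        elif code == "DE" and rest.startswith("SubName: Full="):
            name = rest[len("SubName: Full="):]
            # Cut off the trailing ';' and any '{ECO:...}' evidence tag.
            info["subname"] = re.split(r"\s*[{;]", name)[0]
        elif code == "DR" and rest.startswith("RefSeq;"):
            info["refseq"] = rest.split(";")[1].strip()
        elif code == "SQ":
            m = re.search(r"(\d+) AA;\s+(\d+) MW;", rest)
            info["length"], info["mass"] = int(m.group(1)), int(m.group(2))
    return info


for key, value in parse_entry(SAMPLE).items():
    print(f"{key}: {value}")
```

A real entry has many more lines (and several DR lines per database), so this sketch only covers the fields used in Table 1.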

Task 2. UniRef protein clusters

Table 2. UniRef50, 90, 100 clusters

Cluster ID: UniRef50_Q82CH9
Size: 163
Groups of organisms:
  • Streptomyces
  • Nocardiopsis
  • Microterricola
  • Leifsonia
  • Saccharomonospora
  • Microbacterium
  • Allosalinactinospora
  • Actinobacteria
  • Williamsia
  • Arthrobacter
  • Mycobacterium
  • Jiangella
  • Gordonia
  • Frankia
  • Pseudarthrobacter
  • Nocardia
Identity: 50%
Seed: Nocardiopsis (id: 2013)

Cluster ID: UniRef90_Q82CH9
Size: 57
Groups of organisms:
  • Streptomyces
  • Actinobacteria
Identity: 90%
Seed: Streptomyces ambofaciens ATCC 23877

Cluster ID: UniRef100_Q82CH9
Size, groups of organisms: 16 strains of Streptomyces avermitilis
Identity: 100%
Seed: Streptomyces avermitilis
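
The cluster IDs in Table 2 share one pattern: the database name with its identity level, an underscore, and the accession of one cluster member (Q82CH9 here). A small Python sketch of splitting such an ID follows; split_cluster_id is a hypothetical helper written for this page.

```python
def split_cluster_id(cluster_id):
    """Split e.g. 'UniRef50_Q82CH9' into (identity percent, accession)."""
    prefix, _, accession = cluster_id.partition("_")
    if not prefix.startswith("UniRef") or not accession:
        raise ValueError(f"not a UniRef cluster ID: {cluster_id!r}")
    # int() raises ValueError as well if the identity level is missing.
    return int(prefix[len("UniRef"):]), accession


# The three cluster IDs from Table 2.
for cid in ("UniRef50_Q82CH9", "UniRef90_Q82CH9", "UniRef100_Q82CH9"):
    identity, accession = split_cluster_id(cid)
    print(f"{cid}: identity level {identity}%, accession {accession}")
```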

Task 3. UniProt search sessions

Table 3. SubName searches

SubName search
Entry: name:putative "globin like" protein
Number of proteins: 475
Reviewed (SwissProt): 4

SubName search in S. avermitilis
Entry: name:putative "globin like" protein organism:streptomyces avermitilis
Number of proteins: 1
Reviewed (SwissProt): 0

SubName search in family
Entry: name: putative "globin like" protein taxonomy:streptomycetaceae
Number of proteins: 11
Reviewed (SwissProt): 0

SubName search in phylum
Entry: name:putative "globin like" protein taxonomy:actinobacteria
Number of proteins: 24
Reviewed (SwissProt): 0
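
The searches in Table 3 differ only in the organism or taxonomy filter added to the same name query. The Python sketch below assembles such strings in the syntax used in the table. compose_query is a hypothetical helper that only builds text; it does not contact UniProt, and the field names accepted by the current UniProt website may differ from the ones used here.

```python
def compose_query(name_query, organism=None, taxonomy=None):
    """Append optional organism/taxonomy filters to a name: query string."""
    parts = [f"name:{name_query}"]
    if organism:
        parts.append(f"organism:{organism}")
    if taxonomy:
        parts.append(f"taxonomy:{taxonomy}")
    return " ".join(parts)


BASE = 'putative "globin like" protein'

# Narrowing the same search from all organisms down to one species.
print(compose_query(BASE))
print(compose_query(BASE, taxonomy="actinobacteria"))
print(compose_query(BASE, taxonomy="streptomycetaceae"))
print(compose_query(BASE, organism="streptomyces avermitilis"))
```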

Table 4. Hemoglobin searches

Hemoglobin search
Entry: name:hemoglobin
Number of proteins: 10,701
Reviewed (SwissProt): 944

Hemoglobin search in Arthropoda
Entry: name:hemoglobin taxonomy:arthropoda
Number of proteins: 757
Reviewed (SwissProt): 0

Hemoglobin search in Metazoa
Entry: hemoglobin taxonomy:metazoa
Number of proteins: 3,521
Reviewed (SwissProt): 838

Table 5. Trypsin searches

Trypsin search
Entry: name:trypsin
Number of proteins: 14,034
Reviewed (SwissProt): 310

Trypsin inhibitors search
Entry: name:trypsin name:inhibitor
Number of proteins: 2,984
Reviewed (SwissProt): 209
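
To compare the searches in Tables 3-5, it helps to look at the share of reviewed (SwissProt) entries in each result set rather than at the raw counts. A short Python sketch of that arithmetic, using only the numbers from the tables (reviewed_share is a hypothetical helper name):

```python
# (search label, total hits, reviewed hits) copied from Tables 3-5
SEARCHES = [
    ("SubName", 475, 4),
    ("SubName, S. avermitilis", 1, 0),
    ("SubName, Streptomycetaceae", 11, 0),
    ("SubName, Actinobacteria", 24, 0),
    ("Hemoglobin", 10701, 944),
    ("Hemoglobin, Arthropoda", 757, 0),
    ("Hemoglobin, Metazoa", 3521, 838),
    ("Trypsin", 14034, 310),
    ("Trypsin inhibitors", 2984, 209),
]


def reviewed_share(total, reviewed):
    """Percentage of hits that are reviewed; 0.0 for an empty result set."""
    return 100.0 * reviewed / total if total else 0.0


for label, total, reviewed in SEARCHES:
    share = reviewed_share(total, reviewed)
    print(f"{label:<28}{total:>7}{reviewed:>6}{share:>7.1f}%")
```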

Task 4. UniProt and RefSeq comparison

Table 6. UniProt vs RefSeq

Common:
  • Length (aa)
  • List of articles
  • Taxonomy
  • Sequence
  • Binding

Different:
  • UniProt: Contains IDs of many other databases
  • RefSeq: Has filtered re-annotated content for prokaryotic organisms

Task 5. Protein history

The history table gave me the information presented below:

Table 7. Main history information

Date of the first release: June 01 2003
Number of versions: 92
Database: TrEMBL
Entry name: Changed in 2005 from Q82CH9 to Q82CH9_STRAW

There is also a comparison column, where we can look at the differences between two versions of the entry. Let's compare the first and the last releases. These versions clearly differ: many items have been given more detail (IDs from other databases added, taxonomy updated).
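
The entry name in Table 7 follows the usual TrEMBL pattern: the accession, an underscore, and a species code (STRAW). Reviewed entries normally carry a protein mnemonic instead of the accession, for example HBA_HUMAN for P69905. A minimal Python sketch of checking this is below; both function names are hypothetical.

```python
def split_entry_name(entry_name):
    """Split 'Q82CH9_STRAW' into the part before the underscore and the species code."""
    prefix, _, species = entry_name.partition("_")
    return prefix, species


def uses_accession_prefix(entry_name, accession):
    """True when the entry name starts with the accession (TrEMBL-style name)."""
    return split_entry_name(entry_name)[0] == accession


print(split_entry_name("Q82CH9_STRAW"))
print(uses_accession_prefix("Q82CH9_STRAW", "Q82CH9"))
print(uses_accession_prefix("HBA_HUMAN", "P69905"))
```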

Task 6. Presentation of unusual elements

Using UniProt Help articles/vocabularies[1][2][3] and page search, I found nothing particularly interesting to write about (my file is pretty short and it has no FT section, as mentioned above).

Links

Seed - the longest sequence of the cluster. Source: The European Bioinformatics Institute


© Sophia Veselova, 2017.