HOMCOS:download data (PDB:20200701)
- You can download the datasets of the current HOMCOS database (PDB:20200701).
- Each protein chain in HOMCOS has three cluster ID numbers: "cluster95", "cluster40", and "clusterE4". These cluster IDs are used for statistical analyses on non-redundant data and for fast database searches.
- The cluster IDs are assigned by single-linkage clustering of the results of the sequence search program "blastp". Two chains are linked when the following three conditions are met: (1) The "blastp" E-value is less than 0.001. (2) The number of aligned residues is more than 80 % of the length of the longer sequence. (3) The sequence identity is greater than or equal to 95 % for "cluster95", 40 % for "cluster40", and 0 % for "clusterE4".
- One representative chain is chosen from each cluster using the following four criteria, in order of priority: (1) Experimental method, with the priority "X-RAY DIFFRACTION" > "SOLUTION NMR" > "ELECTRON MICROSCOPY" > others. (2) Better resolution. (3) Earlier [pdb_id] in alphabetical order. (4) Earlier [asym_id] in alphabetical order.
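
The three linking conditions and the single-linkage step can be illustrated with a minimal Python sketch. This is not the HOMCOS implementation: the record layout (`query`, `subject`, `evalue`, `aligned`, `identity`), the function names, and the chain names are assumptions made for illustration, and the toy hits are invented. Single linkage is realized here with a union-find structure, so two chains end up in the same cluster whenever a chain of qualifying hits connects them.

```python
from collections import defaultdict

def is_linked(hit, lengths, min_identity):
    """Check the three linking conditions for one pairwise hit (hypothetical record layout)."""
    longer = max(lengths[hit["query"]], lengths[hit["subject"]])
    return (
        hit["evalue"] < 0.001                 # (1) E-value below 0.001
        and hit["aligned"] > 0.8 * longer     # (2) aligned residues > 80 % of the longer sequence
        and hit["identity"] >= min_identity   # (3) identity threshold: 95, 40, or 0
    )

def single_linkage(chains, hits, lengths, min_identity):
    """Group chains into clusters by merging every pair joined by a qualifying hit."""
    parent = {c: c for c in chains}

    def find(c):
        # Follow parent pointers to the cluster root, shortening the path on the way.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for hit in hits:
        if is_linked(hit, lengths, min_identity):
            a, b = find(hit["query"]), find(hit["subject"])
            if a != b:
                parent[b] = a

    clusters = defaultdict(list)
    for c in chains:
        clusters[find(c)].append(c)
    return sorted(sorted(members) for members in clusters.values())

# Invented toy data: five chains and four pairwise hits.
chains = ["A", "B", "C", "D", "E"]
lengths = {"A": 100, "B": 100, "C": 100, "D": 100, "E": 200}
hits = [
    {"query": "A", "subject": "B", "evalue": 1e-50, "aligned": 95, "identity": 97.0},
    {"query": "B", "subject": "C", "evalue": 1e-20, "aligned": 90, "identity": 50.0},
    {"query": "C", "subject": "D", "evalue": 1e-5, "aligned": 85, "identity": 30.0},
    {"query": "D", "subject": "E", "evalue": 1e-30, "aligned": 100, "identity": 99.0},
]

cluster95 = single_linkage(chains, hits, lengths, 95)  # [['A', 'B'], ['C'], ['D'], ['E']]
cluster40 = single_linkage(chains, hits, lengths, 40)  # [['A', 'B', 'C'], ['D'], ['E']]
clusterE4 = single_linkage(chains, hits, lengths, 0)   # [['A', 'B', 'C', 'D'], ['E']]
```

In this toy example, chain E is never linked to D despite 99 % identity, because the 100 aligned residues cover only half of the longer sequence. Chains A and C share a "cluster40" cluster only through B, which is the characteristic behavior of single linkage.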
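
The choice of a representative chain can be read as a lexicographic sort over the four criteria. The sketch below is one possible reading, not the HOMCOS code: the dictionary keys, the function names, and the example entries are hypothetical. It further assumes that "better resolution" means a smaller value in angstroms and that a chain without a resolution value (for example an NMR structure) ranks after any chain that has one within the same method.

```python
import math

# Priority from the list above; any method not named here ranks last ("others").
METHOD_PRIORITY = {
    "X-RAY DIFFRACTION": 0,
    "SOLUTION NMR": 1,
    "ELECTRON MICROSCOPY": 2,
}

def representative_key(chain):
    """Build a sort key in which a smaller tuple means a more preferred chain."""
    method_rank = METHOD_PRIORITY.get(chain["method"], len(METHOD_PRIORITY))
    # Assumption: a missing resolution is treated as the worst possible value.
    resolution = chain["resolution"] if chain["resolution"] is not None else math.inf
    return (method_rank, resolution, chain["pdb_id"], chain["asym_id"])

def choose_representative(cluster):
    """Return the most preferred chain of a cluster under the four criteria."""
    return min(cluster, key=representative_key)

# Invented example cluster; the IDs are placeholders, not real HOMCOS data.
cluster = [
    {"pdb_id": "2xyz", "asym_id": "A", "method": "ELECTRON MICROSCOPY", "resolution": 1.8},
    {"pdb_id": "1abc", "asym_id": "B", "method": "X-RAY DIFFRACTION", "resolution": 2.0},
    {"pdb_id": "1abc", "asym_id": "A", "method": "X-RAY DIFFRACTION", "resolution": 2.0},
    {"pdb_id": "3nmr", "asym_id": "A", "method": "SOLUTION NMR", "resolution": None},
    {"pdb_id": "4def", "asym_id": "A", "method": "X-RAY DIFFRACTION", "resolution": 2.5},
]

representative = choose_representative(cluster)
print(representative["pdb_id"], representative["asym_id"])  # prints: 1abc A
```

The electron microscopy entry has the best resolution in this example but is not chosen, because the experimental method is compared before the resolution. The two "1abc" chains tie on the first three criteria, so the [asym_id] decides.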