
The Difference Between Identity, Positives, and Similarity
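What is the difference between identity and similarity? I realized I did not understand these concepts very well myself, so I did some homework, summarized below.
My first reaction was to look up the definitions: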
Identity - The extent to which two (nucleotide or amino acid) sequences are invariant.
Similarity - The extent to which nucleotide or protein sequences are related. The extent of similarity between two sequences can be based on percent sequence identity and/or conservation. In BLAST, similarity refers to a positive matrix score.
Oddly, though, the BLAST output has no "similarity" field:
    >sp|P05120|PAI2_HUMAN PLASMINOGEN ACTIVATOR INHIBITOR-2, PLACENTAL (PAI-2)
                (MONOCYTE ARG- SERPIN).
                Length = 415

     Score = 176 (80.2 bits), Expect = 1.8e-65, Sum P(4) = 1.8e-65
     Identities = 38/89 (42%), Positives = 50/89 (56%)

    Query:     1 QIKDLLVSSSTDLDTTLVLVNAIYFKGMWKTAFNAEDTREMPFHVTKQ
                 +I +LL   S D DT +VLVNA+YFKG WKT F  +     PF V
    Sbjct:   180 KIPNLLPEGSVDGDTRMVLVNAVYFKGKWKTPFEKKLNGLYPFRVNSA
Then I found the following sentence:
Identities correspond to exact matches, and positives are similarities based on the scoring matrix used.
So positives are simply similarities as defined by the scoring matrix. Putting the two together makes it clear:
identities -> exact matches
positives -> similarities based on the matrix
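To make the mapping concrete, here is a minimal Python sketch that counts identities and positives from the midline of the alignment quoted above. It assumes the classic BLAST text layout for protein hits: a letter for an identical pair, `+` for a non-identical pair with a positive matrix score, and a space otherwise. The function name `count_from_midline` is made up for this illustration.

```python
from typing import Tuple


def count_from_midline(midline: str) -> Tuple[int, int]:
    """Return (identities, positives) for one protein alignment midline.

    Assumed format: letter = identical pair, '+' = positive-scoring
    substitution, space = everything else.
    """
    identities = sum(1 for ch in midline if ch.isalpha())
    conservative = midline.count("+")
    # Positives include identities: an identical pair also scores above zero.
    return identities, identities + conservative


# First row of the PAI2_HUMAN hit (48 of the 89 aligned columns).
midline = "+I +LL   S D DT +VLVNA+YFKG WKT F  +     PF V"
identities, positives = count_from_midline(midline)
print(identities, positives)
```

Only the first row of the alignment is shown in the post, so these counts cover 48 columns rather than the full 89 reported in the header.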
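When comparing nucleotide sequences, the four bases A, T, C, and G are assumed to occur with equal probability; any identical pair scores one point and every substitution scores zero or below, which makes for a very simple substitution matrix. In this case identities and similarities (positives, in BLAST) are the same, because under such a matrix only identical pairs have a positive score, so the two counts coincide. When comparing protein sequences, the substitution matrix is typically BLOSUM: identical amino acids score high, similar amino acids get a lower but still positive score, and dissimilar pairs score zero or negative. Here identities and positives are counted differently, so the two numbers differ.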
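The same idea can be sketched from the scoring side. The Python example below counts identities as equal pairs and positives as pairs whose substitution score is greater than zero. The nucleotide scheme and the tiny protein table are toy values chosen for illustration; they are not real BLAST defaults or real BLOSUM entries, and all function names are hypothetical.

```python
from typing import Callable, Tuple


def identities_and_positives(query: str, subject: str,
                             score: Callable[[str, str], int]) -> Tuple[int, int]:
    """Count identical pairs and positive-scoring pairs in an ungapped alignment."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    pairs = list(zip(query, subject))
    identities = sum(1 for a, b in pairs if a == b)
    positives = sum(1 for a, b in pairs if score(a, b) > 0)
    return identities, positives


def nucleotide_score(a: str, b: str) -> int:
    # Simple match/mismatch scheme (illustrative values).
    return 1 if a == b else -1


# Toy set of "similar" amino-acid pairs; NOT taken from a real BLOSUM matrix.
TOY_SIMILAR = {frozenset("IV"), frozenset("LM"), frozenset("KR")}


def toy_protein_score(a: str, b: str) -> int:
    if a == b:
        return 4            # identical: high score
    if frozenset((a, b)) in TOY_SIMILAR:
        return 2            # similar: lower but still positive
    return -1               # dissimilar: not positive


print(identities_and_positives("ATCGA", "ATGGA", nucleotide_score))
print(identities_and_positives("ILKA", "VMKG", toy_protein_score))
```

With the match/mismatch scheme the two counts are always equal; with the protein table the positives count can exceed the identities count.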
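Statistical similarity is, in turn, different from biological homology. At this point I Googled homology and similarity and found, in very large type, "Similarity is NOT equal to Homology"; someone made a whole web page just to stress that the two are not the same thing, which is well worth keeping in mind.
Later I saw someone comment on this, checked for myself, and found that the link to the "Similarity is NOT equal to Homology" page had gone dead. I retrieved it through the Wayback Machine and pasted it below.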
Similarity is NOT equal to Homology
IDENTITY - The extent to which two sequences are invariant.
SIMILARITY - The extent to which sequences are related. Similarity makes no statement about descent from a common ancestor. (Convergent versus Divergent evolution.)
HOMOLOGY - Sequence similarity that can be attributed to descent from a common ancestor.
There are Two Types of Homology
ORTHOLOGOUS - Homologous sequences in different species. These sequences usually retain the same function in the two species.
PARALOGOUS - Homologous sequences in the same species that arose by means of gene duplication. Divergence of function is more common between paralogues.
Why is this important? Homology is a matter of opinion, not directly measurable or observable. Similarity is a direct measurement and can be discussed in terms of percentages.
Cell 50(5): 667 (1987)
In addition, the difference between Score and bit score:
BLAST scores rely on extensive theory. We start by making the following assumptions: The BLAST score is scoring local ungapped alignments. The theory of scoring here is well understood. The database sequences are assumed to be evolutionarily unrelated, i.e. independent of one another. The alignment starts at specific positions along the query and the database record. The score matrix must give, on average, a negative score for a pair (a,b). Were this not the case, long alignments would tend to have high scores regardless of whether the aligned segments were related, and the statistical theory would break down.
Figure 5.10: Random walk: The score for a match is +2 and the penalty for a mismatch is -1. As shown, the expectation for the whole walk is negative. The probability that the top score will be larger than x decreases exponentially with x.
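The random-walk picture can be checked numerically. The Python sketch below uses the +2/-1 scores from the figure caption and assumes a match probability of 0.25 (four equally likely bases, as in the nucleotide discussion earlier); the number of steps, number of walks, and the seed are arbitrary choices for illustration.

```python
import random


def expected_step(p_match: float, match: int = 2, mismatch: int = -1) -> float:
    """Expected score of a single step of the walk."""
    return p_match * match + (1.0 - p_match) * mismatch


def top_score(steps: int, p_match: float, rng: random.Random,
              match: int = 2, mismatch: int = -1) -> int:
    """Highest running total reached by a walk that starts at 0."""
    total = best = 0
    for _ in range(steps):
        total += match if rng.random() < p_match else mismatch
        best = max(best, total)
    return best


rng = random.Random(0)           # fixed seed so the run is repeatable
print(expected_step(0.25))       # negative drift per step
tops = [top_score(200, 0.25, rng) for _ in range(1000)]
for x in (2, 4, 6, 8):
    # Empirical fraction of walks whose top score reaches at least x.
    print(x, sum(t >= x for t in tops) / len(tops))
```

The printed fractions should fall off quickly as x grows, which is the behaviour the caption describes.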
When searching a query of length m in a database of total length n, one performs m*n random walk experiments, each with an exponentially decreasing probability of achieving a score S. Thus, the E-value for score S is: $E = K m n e^{-\lambda S}$.
$\lambda$ and $K$ are constants:
- $\lambda$ - scaling factor
- $K$ - correction for dependency and bias of the scoring scheme.
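As a small worked example, the E-value formula $E = K m n e^{-\lambda S}$ can be written in Python as follows. The values of `lam` and `K` below are placeholders picked for illustration; real values depend on the scoring matrix and gap settings and are reported by BLAST itself.

```python
import math


def e_value(raw_score: float, m: int, n: int, lam: float, K: float) -> float:
    """Expected number of chance hits with score >= raw_score.

    m: query length, n: total database length,
    lam, K: statistical parameters of the scoring scheme.
    """
    return K * m * n * math.exp(-lam * raw_score)


# Hypothetical parameters, for illustration only.
lam, K = 0.3, 0.1
for s in (20, 40, 60):
    print(s, e_value(s, m=100, n=1_000_000, lam=lam, K=K))
```

Doubling either length doubles E, while each extra point of raw score multiplies E by $e^{-\lambda}$.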
Indeed, the E-score depends on the lengths of the query and the database: the same alignment would have a different E-score if these lengths were different. The E-score is also exponential in the score, so it is instructive to normalize it onto a logarithmic scale, called the bit score.
The bit score B is computed from the E-score E by $E = m n 2^{-B}$. The bit score is therefore linear in the raw score S: $B = (\lambda S - \ln K) / \ln 2$. In contrast to raw scores, which have little meaning without $K$ and $\lambda$, the bit score is measured in standard units. Naturally, the significance of a given bit score still depends on the sizes of the query and the database.
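The relation between raw score, bit score, and E-value can be checked with a few lines of Python. This is a sketch under the assumption that $E = K m n e^{-\lambda S}$ and $E = m n 2^{-B}$ both hold, which gives $B = (\lambda S - \ln K)/\ln 2$; the parameter values are placeholders, not those of any particular matrix.

```python
import math


def bit_score(raw_score: float, lam: float, K: float) -> float:
    """Convert a raw score to bits: B = (lam * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(K)) / math.log(2)


def e_from_bits(bits: float, m: int, n: int) -> float:
    """E-value from a bit score: E = m * n * 2**(-B)."""
    return m * n * 2.0 ** (-bits)


# Hypothetical parameters, for illustration only.
lam, K, m, n = 0.3, 0.1, 100, 1_000_000
S = 50
B = bit_score(S, lam, K)
print(B, e_from_bits(B, m, n))
# The same E-value computed directly from the raw score:
print(K * m * n * math.exp(-lam * S))
```

Both routes give the same E-value, but only the raw-score route needs $\lambda$ and $K$, which is why bit scores are comparable across scoring schemes.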
Again, as mentioned before, one can ask for the P-value (the probability of the observed number of records with a known E-value or lower). Define the random variable Y to be the observed number of pairs achieving E-value E or better (smaller). Y is Poisson distributed with mean E. The probability that Y equals r is $e^{-E} E^r / r!$, and the probability that Y is 0, which is equivalent to the probability that the best E-score is greater than E, is $e^{-E}$. Specifically, the chance of finding zero alignments with score >= S is $e^{-E}$, so the probability of finding at least one such alignment is $1 - e^{-E}$. This is the P-value associated with the score S. Note that this model assumes an i.i.d. trial for each database position.
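A minimal Python sketch of the Poisson model described above: the probability of seeing exactly r chance hits, and the P-value $1 - e^{-E}$. The function names are made up for this example.

```python
import math


def poisson_pmf(r: int, e: float) -> float:
    """Probability of exactly r chance hits when the expected number is e."""
    return math.exp(-e) * e ** r / math.factorial(r)


def p_value(e: float) -> float:
    """Probability of at least one chance hit: 1 - exp(-e)."""
    # expm1 keeps precision when e is very small.
    return -math.expm1(-e)


for e in (1e-5, 0.1, 1.0, 10.0):
    print(e, p_value(e))
```

For small E the P-value is almost equal to E, while for large E it approaches 1.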