The student is also very keen to learn. He already knew to search our existing 13,000 tutorials, and he found the workflow for re-annotating microarray probe sequences. But as I said yesterday in "You no longer need to annotate microarray probe sequences yourself", that work is no longer necessary. The student evidently had not kept up with the latest tutorials on our WeChat official account, but he can't be blamed for that. It is a drawback of the official account format: too much redundant information distracts everyone, which runs counter to our original goal of sharing real knowledge.

So just use our package. Installation is covered in that same post, and usage is very simple:

    library(AnnoProbe)

With just one statement we obtain the probe annotation for this platform.

Still, let's explore a little. This annotation comes from the downloaded GPL soft file, so some probes turn out to correspond to multiple genes. The reason is that the genomic coordinates of those genes overlap, which makes the exploration code slightly more complicated.

    # keep only probes whose annotation in column 2 is longer than one character
    ids=ids[nchar(ids[,2])>1,]

As you can see, of the 50,000-plus probes, only about 40,000 are probes for genuine protein-coding genes; the remaining 10,000-plus are all open to exploration.

This is not the best option, though, because we did not align the probe sequences of this GPL platform to the reference genome and re-annotate them ourselves. We are still relying on the information in the GPL soft file.

Let's look at how other microarray papers annotate GPL570 probe IDs. One example is the article published 12 March 2019, "Identification of Key Long Non-Coding RNAs in the Pathology of Alzheimer's Disease and their Functions Based on Genome-Wide Associations Study, Microarray, and RNA-seq Data":

"Briefly, we first downloaded the reference sequences of these potentially AD-related lncRNAs in FASTA format from NONCODE database."

Or:

"Briefly, probe sets of HG-U133_Plus_2.0 array were aligned to the human genome (GRCh38) and lncRNA gene sequence from GENCODE (release 23) using SeqMap tool with no mismatch [49]."

Or:

"we obtained 3215 probes (probe sets) covering 2330 lncRNAs for Affymetrix HG-U133_Plus_2.0 array and 855 probes (probe sets) covering 663 lncRNAs for Affymetrix HG-U133A array, respectively. The expression data of multiple probes (probe sets) mapping to the same lncRNA were integrated by using the arithmetic mean to represent the expression level of single lncRNA."

Or:

"Briefly, the probe sets of Affymetrix HG-U133 Plus 2.0 were retrieved from the Affymetrix website (http://www.affymetrix.com). We then re-mapped those probes to the chromosomal positions of the ncRNAs derived from GENCODE (release 24, GRCh38) with no mismatch [14]. A total of 2380 probes and 2118 corresponding ncRNA genes were obtained. When multiple probes mapped to the same ncRNA, we used the arithmetic mean of the probe intensities."
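Several of the quoted methods share one step: when multiple probes map to the same lncRNA, their expression values are combined by the arithmetic mean. The following is a minimal sketch of that step in base R, not the code used in any of those papers. The expression matrix `expr`, the mapping table `probe2gene`, the helper `collapse_by_mean`, and all values are made up for illustration. It also assumes that multi-gene annotations are joined with " /// ", as is typical of Affymetrix-style GPL soft tables, and it simply sets those ambiguous probes aside.

```r
# Toy expression matrix: rows are probes, columns are samples (made-up values).
expr <- matrix(
  c(2, 4,
    4, 8,
    1, 3,
    5, 5,
    7, 9),
  nrow = 5, byrow = TRUE,
  dimnames = list(c("p1", "p2", "p3", "p4", "p5"), c("s1", "s2"))
)

# Hypothetical probe-to-gene table: p1 and p2 hit the same lncRNA,
# p4 is annotated to two genes, p5 has no annotation.
probe2gene <- data.frame(
  probe_id = c("p1", "p2", "p3", "p4", "p5"),
  symbol   = c("lncA", "lncA", "lncB", "lncC /// lncD", ""),
  stringsAsFactors = FALSE
)

# Keep probes that have a symbol and map to exactly one gene.
# Assumption: " /// " separates multiple genes in the annotation.
has_symbol <- nchar(probe2gene$symbol) > 0
multi_gene <- grepl("///", probe2gene$symbol, fixed = TRUE)
probe2gene <- probe2gene[has_symbol & !multi_gene, ]

# Collapse probes to genes with the arithmetic mean, sample by sample.
collapse_by_mean <- function(expr, probe2gene) {
  sub    <- expr[probe2gene$probe_id, , drop = FALSE]
  sums   <- rowsum(sub, group = probe2gene$symbol)   # per-gene column sums
  counts <- as.vector(table(probe2gene$symbol)[rownames(sums)])
  sums / counts                                      # one row per gene
}

gene_expr <- collapse_by_mean(expr, probe2gene)
print(gene_expr)
```

Averaging is only one possible rule. Another common choice is to keep a single representative probe per gene, for example the one with the highest mean expression; which rule fits best depends on the analysis.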
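Since every paper does this differently, and most people are not in a position to do the annotation themselves, in principle there should be a platform that re-annotates the probe sequences of every microarray on everyone's behalf. The AnnoProbe package from the post mentioned above, "You no longer need to annotate microarray probe sequences yourself", is already annotating them for you one by one. Stay tuned for the re-annotation of all GPL platforms.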
Source: 健明