通過(guò)查閱文章找到marker gene后手動(dòng)注釋分群,如何檢驗(yàn)分群正確與否? 這個(gè)圖是通過(guò)Seurat標(biāo)準(zhǔn)流程降維后得到的muscle組織分群圖 然后查閱文章得到muscle組織不同細(xì)胞的markergene genes_VEC = c("Fabp4", "Cdh5", "Cav1") #vascular endothelial cells genes_FC = c("Ddr2","Tcf21", "Col3a1", "Col1a2", "Col1a1") #fibroblasts genes_AC = c("Nppa", "Myl7", "Sln") #"Nppa", "Myl7" not found #atrial cardiomyocytes genes_EC = c("Npr3", "Pecam1") #endocardial cells genes_IC = c("C1qa", "H2-Eb1") #immune cells genes_SMSC= c("Myf5", "Myod1","Sox9","Acta2","Chodl") #skeletal muscle satellite cell genes_MSC= c("Pdgfra","Chad") #mesenchymal stem cell genes_endothelialC=c("Atxn1") genes_TmB = c('Ptprc') genes_macrophage=c("Itgam") genes_T=c("Cd3g","Cd4") genes_B=c("Cd19") genes_others=c("Acan") genes_all = c(genes_VEC, genes_FC, genes_AC,genes_EC,genes_IC,genes_SMSC,genes_MSC,genes_endothelialC,genes_TmB,genes_macrophage,genes_T,genes_B,genes_others) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 以下三幅圖片都是想表達(dá)不同的markergene的表達(dá)情況: FeaturePlot(tiss, genes_all, pt.size = 1,ncol = 4) 1 DotPlot(tiss, features = genes_all) 1 VlnPlot(tiss, genes_all, ncol = 4) 1 有了這三幅圖,我們可以對(duì)這些muscle細(xì)胞進(jìn)行一個(gè)手動(dòng)的注釋?zhuān)玫较旅娴募?xì)胞分群圖: #annotaion tiss@meta.data$free_annotation <- plyr::mapvalues(from = c(1,2,5,7,8,10,11,3,6,9,0,4), to = c("vascular endothelial cells", rep("fibroblasts",6), rep("immune cells", 3),rep("skeletal muscle satellite cells", 2)), x = tiss@meta.data$seurat_clusters) 1 2 3 4 TSNEPlot(object = tiss, group.by = "free_annotation") 1 但是這個(gè)分群圖正確與否應(yīng)該怎么判斷呢? 檢驗(yàn)分群正確與否方法 Findmarkers 函數(shù) singleR Seruat4.0 一、Findmarkers 函數(shù) 首先需要加載所需的R包和數(shù)據(jù)集 library(Seurat) tiss<- #數(shù)據(jù)集加載 1 2 以下為自己數(shù)據(jù)示例 levels(tiss) markers_df <- FindMarkers(object = tiss, ident.1 = 1, min.pct = 0.25)#一號(hào)群 markers_df #Fabp4", "Cdh5", "Cav1"文章中提到的 print(x = head(markers_df)) markers_genes = rownames(head(x = markers_df, n = 5)) VlnPlot(object = tiss, features =markers_genes,log =T ) FeaturePlot(object = tiss, features=markers_genes ) 1 2 3 4 5 6 7 這樣我們可以得到每一個(gè)分群的markergene,可以大概看一下這些markergene與文獻(xiàn)中提到的有沒(méi)有重合的。這個(gè)只能當(dāng)作一種檢驗(yàn)方法吧。 二、SingleR singleR自帶7個(gè)數(shù)據(jù)庫(kù)文件,需要聯(lián)網(wǎng)才能下載,其中5個(gè)是人類(lèi)數(shù)據(jù),2個(gè)是小鼠的數(shù)據(jù): BlueprintEncodeData Labels HumanPrimaryCellAtlasData Labels DatabaseImmuneCellExpressionData Labels NovershternHematopoieticData Labels MonacoImmuneData Labels ImmGenData Labels MouseRNAseqData Labels 本文以MouseRNAseqData 數(shù)據(jù)集為例: 首先加載參考數(shù)據(jù)集 # load ref library("SingleR") cg <- MouseRNAseqData() cg 1 2 3 4 以下為加載測(cè)試數(shù)據(jù)集的過(guò)程: #load test library("openxlsx") library("ggplot2") library("Matrix") mypwd <- "/mnt/raid64/Mouse_iso_atlas/analysis/SingleCell/NGS_ONT_analysis/01.Seurat/" library("Seurat", lib.loc = "/home/zhangdan/R/x86_64-pc-linux-gnu-library/4.0") source(paste0(mypwd, "00_data_ingest/02_tissue_analysis_rmd/boilerplate.R")) tissue_of_interest = 'muscle' 1 2 3 4 5 6 7 8 process_tissue = function(tiss, scale){ tiss <- NormalizeData(object = tiss, scale.factor = scale) tiss <- FindVariableFeatures(object = tiss, do.plot = TRUE, x.high.cutoff = Inf, y.cutoff = 0.5) tiss <- ScaleData(object = tiss) tiss <- RunPCA(object = tiss, do.print = FALSE) } load_tissue_singleron = function(tissue_of_interest){ singleron_metadata_filename = paste0(mypwd, "00_data_ingest/00_singleron_raw_data/metadata_singleron.csv") singleron_metadata <- read.csv(singleron_metadata_filename, sep=",", header = TRUE) tissue_metadata <- singleron_metadata[singleron_metadata$Tissue == tissue_of_interest, ] subfolder = tissue_metadata$Sample raw.data <- Read10X(data.dir = paste0(mypwd, "00_data_ingest/00_singleron_raw_data/singleron/", subfolder[1])) # Create the Seurat object with all the data tiss <- CreateSeuratObject(counts = raw.data, project = tissue_of_interest) tiss[["percent.mt"]] <- PercentageFeatureSet(tiss, pattern = "^mt-") tiss[["percent.ribo"]] <- PercentageFeatureSet(tiss, pattern = "^Rp[sl][[:digit:]]") tiss@meta.data[,'free_annotation'] <- NA tiss[["Tissue"]] <- tissue_metadata[1,]$Tissue tiss[["Sex"]] <- tissue_metadata[1,]$Sex tiss[["Strain"]] <- tissue_metadata[1,]$Strain tiss[["mouse.id"]] <- tissue_metadata[1,]$mouse.id VlnPlot(tiss, features = c("nFeature_RNA", "nCount_RNA", "percent.mt","percent.ribo"), ncol = 2) #tiss <- subset(tiss, subset = nFeature_RNA > 200 & nFeature_RNA < 6000 & percent.mt < 5) tiss <- process_tissue(tiss, 1e4) return(tiss) } 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 tiss = load_tissue_singleron(tissue_of_interest) 1 測(cè)試數(shù)據(jù)集通過(guò)SingleR比對(duì)到參考數(shù)據(jù)集上: data <- tiss@assays$RNA@data pred.hesc <- SingleR(test = data, ref = cg, assay.type.test=1, labels = cg$label.main) pred.hesc 1 2 3 4 #比對(duì)結(jié)果,會(huì)顯示有哪些細(xì)胞及其細(xì)胞數(shù)量 # Summarizing the distribution: table(pred.hesc$labels) 1 2 plotScoreHeatmap(pred.hesc) 1 tiss@meta.data$lables <- pred.hesc$labels DimPlot(tiss, group.by="lables", label = T) 1 2 三、Seurat4.0 還在學(xué)習(xí)中 后續(xù)補(bǔ)上 ———————————————— 版權(quán)聲明:本文為CSDN博主「微光**」的原創(chuàng)文章,遵循CC 4.0 BY-SA版權(quán)協(xié)議,轉(zhuǎn)載請(qǐng)附上原文出處鏈接及本聲明。 原文鏈接:https://blog.csdn.net/zengwanqin/article/details/114655570 |
|
來(lái)自: 生活就是流水賬 > 《單細(xì)胞測(cè)序》