I. Data types and vectors
1. Data types
1.1 Check a data type with class()
1.2 Press Tab for auto-completion
1.3 Testing and converting data types
(1) The is.* family tests a type and returns TRUE or FALSE
is.numeric("123")
is.character("a")
is.logical(TRUE)
(2) The as.* family converts between data types
as.matrix()
as.numeric()
as.character()
as.logical()
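A minimal sketch of how the two families work together. The variable names `x`, `y`, and `z` are made up for illustration and are not part of the original notes.

```r
x <- "123"               # a character string that only looks like a number
is.numeric(x)            # FALSE: the quotes make it character
class(x)                 # "character"

y <- as.numeric(x)       # convert character to numeric
is.numeric(y)            # TRUE
y + 1                    # 124

# A value that cannot be converted becomes NA (R also emits a warning)
z <- suppressWarnings(as.numeric("abc"))
is.na(z)                 # TRUE
```

Checking with an is.* function before and after a conversion is a quick way to confirm that the conversion did what was intended.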
2. Vectors
(1) Use rep() for repeated values, seq() for regular sequences, and rnorm() for random numbers
rep("sample",6)
[1] "sample" "sample" "sample" "sample" "sample" "sample"
seq(4,30,3)
[1] 4 7 10 13 16 19 22 25 28
rnorm(3)
[1] 0.1511196 1.1105814 -0.8626667
(2) Combining
paste0(rep("x",3),1:3) # or paste0("x",1:3)
[1] "x1" "x2" "x3"
paste0("sample",seq(1,5,2))
[1] "sample1" "sample3" "sample5"
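Difference between paste() and paste0(): paste() joins two or more vectors element by element and places the sep= string between the pieces (the default is a single space)
paste(v1,v1,sep = " ")
paste0() differs from paste() in that sep cannot be set; the pieces are always joined with "" (nothing in between).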
paste("x",1:3,sep = "~")
[1] "x~1" "x~2" "x~3"
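A small side-by-side comparison of the two functions. `v1` here is a hypothetical example vector, since the notes do not define it.

```r
v1 <- c("a", "b", "c")            # hypothetical example vector

with_space <- paste(v1, 1:3)              # default sep = " "
with_under <- paste(v1, 1:3, sep = "_")   # custom separator
no_sep     <- paste0(v1, 1:3)             # no separator at all
one_string <- paste(v1, collapse = "+")   # collapse= merges the result into one string

with_space   # "a 1" "b 2" "c 3"
with_under   # "a_1" "b_2" "c_3"
no_sep       # "a1" "b2" "c3"
one_string   # "a+b+c"
```

sep works between the arguments, while collapse works along the resulting vector.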
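(3) Operations on two vectors
Key points:
x %in% y # is each element of x present in y?
x[x %in% y] # mind the order of x and y
x == y # does x equal y at the corresponding position?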
x = c(1,5,3,4)
y = c(5,12,24,3)
intersect(x,y)
[1] 5 3
union(x,y)
[1] 1 5 3 4 12 24
setdiff(x,y)
[1] 1 4
setdiff(y,x)
[1] 12 24
- When the two vectors differ in length
The shorter vector is recycled to the length of the longer one
![](http://image109.360doc.com/DownloadImg/2022/04/1800/243316230_1_20220418121535663_wm.png)
- match(x,y)
x[match(y,x)]
match: the vector outside the brackets goes second inside match(); with y as the template, x is reordered to follow y
x = c("A","B","C","D","E")
y = c("E","C","B","A")
match(y,x)
x[match(y,x)]
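The same example, stepped through with the intermediate index stored in a variable. `idx` is a name introduced here for illustration.

```r
x <- c("A", "B", "C", "D", "E")
y <- c("E", "C", "B", "A")

idx <- match(y, x)     # for each element of y, its position in x
idx                    # 5 3 2 1
x[idx]                 # "E" "C" "B" "A": x reordered to follow y
identical(x[idx], y)   # TRUE

match("Z", x)          # NA: elements that are not found give NA
```

Because unmatched elements give NA, it is worth checking the index with anyNA() before using it to subset.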
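II. Data frames, matrices, and lists
1. Differences
(1) Vector: one-dimensional; matrix: two-dimensional, only one data type allowed; data.frame: two-dimensional, only one data type allowed per column
2. Exercises
(1) # Find the median of the first column of c1 # Select the rows of c1 whose last column is a or c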
c1 <- read.csv("./exercise.csv")
median(c1$Petal.Length) # median of the first column of c1
# or median(c1[,1])
c1[c1$Species %in% c("c","a"),] # select the rows of c1 whose last column is a or c
# or c1[c1$Species == "a"| c1$Species == "c",]
Incorrect form:
c1[c1$Species == c("c","a"),] # the lengths differ, so the shorter vector is recycled and compared position by position, which is not a membership test
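To see why the == form goes wrong, here is a sketch on a small made-up vector. `species` stands in for c1$Species; the real contents of exercise.csv are not known.

```r
species <- c("a", "b", "c", "a", "c", "b")   # hypothetical stand-in for c1$Species

# == recycles c("c","a") to c,a,c,a,c,a and compares position by position
wrong <- species == c("c", "a")
wrong          # FALSE FALSE TRUE TRUE TRUE FALSE

# %in% asks, for every element, whether it is one of the listed values
right <- species %in% c("c", "a")
right          # TRUE FALSE TRUE TRUE TRUE FALSE

sum(wrong)     # 3: the first "a" is missed
sum(right)     # 4
```

The == version raises no error here, which makes the mistake easy to overlook; it simply returns fewer rows than expected.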
(2) Changing row and column names
# change row names and column names
rownames(df) <- c("r1","r2","r3","r4")
# change the name of a single row/column only
colnames(df)[2]="CHANGE"
(3) Joining two data frames
merge(test1,test2,by="name")
merge(test1,test3,by.x = "name",by.y = "NAME")
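A runnable sketch of the by.x / by.y case. The contents of `test1` and `test3` are made up here, since the notes do not show the original tables.

```r
# Hypothetical example tables; the key column is named differently in each
test1 <- data.frame(name = c("jimmy", "nicker", "tony"),
                    score = c(150, 140, 152),
                    stringsAsFactors = FALSE)
test3 <- data.frame(NAME = c("tony", "jimmy", "doodle"),
                    city = c("X", "Y", "Z"),
                    stringsAsFactors = FALSE)

# By default merge() keeps only the keys present in both tables
m <- merge(test1, test3, by.x = "name", by.y = "NAME")
m                # rows for jimmy and tony, columns name, score, city

# all = TRUE keeps every key from either table and fills the gaps with NA
m_all <- merge(test1, test3, by.x = "name", by.y = "NAME", all = TRUE)
nrow(m_all)      # 4
```

The key column in the result takes its name from by.x.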
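(4) Practice
1. Find which values the last column of the built-in dataset iris takes and how many times each occurs
2. Extract the first 5 rows and first 4 columns of iris, convert them to a matrix, and assign the result to a.
3. Change the row names of a to flower1, flower2…flower5.
table(iris[,ncol(iris)])
a = as.matrix(iris[1:5,1:4]) # as.matrix(), because the exercise asks for a matrix
rownames(a) = paste0("flower",1:5) # or
rownames(a) = paste0("flower",1:nrow(a))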
(5) Using the match() function![](http://image109.360doc.com/DownloadImg/2022/04/1800/243316230_2_20220418121536147_wm.png)
## With y as the template, reorder x, then use x's ID column as the column names of y: the match() function
# match(colnames(y),x$file_name)
# x[match(colnames(y),x$file_name),]
# x$ID[match(colnames(y),x$file_name)]
colnames(y) = x$ID[match(colnames(y),x$file_name)]
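A toy version of this renaming step. The file names, IDs, and the contents of `x` and `y` are invented for illustration; only the column names file_name and ID come from the notes.

```r
# x: hypothetical annotation table mapping file names to sample IDs
x <- data.frame(file_name = c("f1.txt", "f2.txt", "f3.txt"),
                ID = c("S1", "S2", "S3"),
                stringsAsFactors = FALSE)

# y: hypothetical matrix whose columns are named by file, in a different order
y <- matrix(1:6, nrow = 2,
            dimnames = list(NULL, c("f3.txt", "f1.txt", "f2.txt")))

pos <- match(colnames(y), x$file_name)   # where each column name sits in x
pos                                      # 3 1 2
colnames(y) <- x$ID[pos]                 # look up the ID in the same order
colnames(y)                              # "S3" "S1" "S2"
```

The order of y's columns never changes; only the labels are looked up in x.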
III. Several ways to install and load packages
# Method 1:
install.packages("tidyr")
install.packages('BiocManager')
# Method 2:
BiocManager::install("ggplot2")
# Method 3:
devtools::install_github("jmzeng1314/idmap1") # "username/package" goes inside the parentheses
# Method 4:
if(!require(stringr))install.packages("stringr")
Recommended mirrors:
# Tsinghua mirror
# http://mirrors.tuna./CRAN/
# http://mirrors.tuna./bioconductor/
# USTC mirror
# http://mirrors.ustc.edu.cn/CRAN/
# http://mirrors.ustc.edu.cn/bioc/
Symbols in R
[ ]: subsetting for vectors, data frames, and matrices; [[ ]]: subsetting for lists
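A short sketch of the difference on a list. `lst` is a made-up example object.

```r
lst <- list(a = 1:3, b = "text")   # hypothetical example list

sub  <- lst["a"]     # single brackets keep the container: a list of length 1
elem <- lst[["a"]]   # double brackets extract the element itself

class(sub)           # "list"
class(elem)          # "integer"
sum(elem)            # 6; sum(sub) would fail because sub is still a list
identical(lst$a, elem)   # TRUE: $ is shorthand for [[ ]] with a name
```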
IV. Reading and writing data
txt and csv
read.csv(): typically reads csv files; read.table(): typically reads txt files
ex1 <- read.table("./ex1.txt",
header = TRUE)
ex2 <- read.csv("./ex2.csv",
row.names = 1) # use the first column as row names
soft <- read.table("./soft.txt",
sep = "\t", # fields are tab-separated
fill = TRUE, # pad rows that have fewer fields than the rest
header = TRUE
)
write.table(ex1,file = "./ex1.txt")
write.csv(ex2,file = "./ex2.csv")
Rdata
save() ---> save objects to a file
load() ---> load them back
save(ex1,file = "./ex1.Rdata")
load("./ex1.Rdata")
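A self-contained round trip that can be run without any input files. It writes to temporary files, and the data frame `ex` is made up for the purpose.

```r
# Hypothetical data and temporary paths, so nothing in the working directory is touched
ex <- data.frame(gene = c("g1", "g2"), value = c(1.5, 2.5),
                 row.names = c("r1", "r2"))
tmp_csv <- tempfile(fileext = ".csv")
tmp_rda <- tempfile(fileext = ".Rdata")

# csv: write.csv() stores the row names as the first column,
# so row.names = 1 restores them on the way back in
write.csv(ex, file = tmp_csv)
back <- read.csv(tmp_csv, row.names = 1)
rownames(back)        # "r1" "r2"

# Rdata: save() stores the object together with its name,
# and load() recreates it under that same name
save(ex, file = tmp_rda)
rm(ex)
load(tmp_rda)
exists("ex")          # TRUE
```

Note that load() is called for its side effect: it does not need to be assigned to anything.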
Reading data and converting IDs
Example:![](http://image109.360doc.com/DownloadImg/2022/04/1800/243316230_3_20220418121536523_wm.png)
soft <- read.csv("./soft.csv",row.names = 1)
head(soft)
exp$symbol <- soft$GeneName[match(rownames(exp),soft$ID)]
exp <- exp[!duplicated(exp$symbol),]
exp <- exp[!grepl("^ENST",exp$symbol),]
rownames(exp) <- exp$symbol
exp = exp[,-ncol(exp)]
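The same five steps on a tiny made-up example, so that each intermediate result is easy to follow. The probe IDs, gene names, and values are invented; only the column names ID and GeneName come from the notes.

```r
# Hypothetical expression table: rows are probe IDs, columns are samples
exp <- data.frame(s1 = c(1, 2, 3, 4), s2 = c(5, 6, 7, 8),
                  row.names = c("p1", "p2", "p3", "p4"))
# Hypothetical annotation table, deliberately in a different row order
soft <- data.frame(ID = c("p4", "p3", "p2", "p1"),
                   GeneName = c("ENST0001", "TP53", "TP53", "ACTB"),
                   stringsAsFactors = FALSE)

# 1. look up the gene name of every probe, in the row order of exp
exp$symbol <- soft$GeneName[match(rownames(exp), soft$ID)]
# 2. keep only the first probe for each gene name
exp <- exp[!duplicated(exp$symbol), ]
# 3. drop rows whose name starts with ENST
exp <- exp[!grepl("^ENST", exp$symbol), ]
# 4. use the gene names as row names
rownames(exp) <- exp$symbol
# 5. remove the helper column, which is the last one
exp <- exp[, -ncol(exp)]

exp   # rows ACTB and TP53, columns s1 and s2
```

Step 2 has to come before step 4, because row names must be unique.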
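V. Plotting
(1) Plotting
(1) Drawing: ggplot2, ggpubr, base
(2) Combining plots: the patchwork package, mfrow in par, grid.arrange, cowplot
(3) Exporting:
# saving and exporting figures
# 1. ggplot2 family
ggsave(p,filename = "")
# 2. General approach: three steps
# open a device, giving the format and file name
pdf("test.pdf")
# draw the plot here
dev.off() # close the device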
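A complete instance of the three-step pattern using base graphics. The output path is a temporary file chosen for this sketch, and the width and height values are arbitrary.

```r
out <- tempfile(fileext = ".pdf")            # hypothetical output path

pdf(out, width = 6, height = 4)              # 1. open the device (size in inches)
plot(iris$Sepal.Length, iris$Petal.Length)   # 2. draw; nothing appears on screen
invisible(dev.off())                         # 3. close the device so the file is written

file.exists(out)                             # TRUE
```

Swapping pdf() for png() or another device function follows the same pattern. A file that cannot be opened afterwards usually means the final dev.off() was skipped.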
(2) ggplot2 syntax
- Setting attributes
Mapping: assign colors according to the contents of a column in the data
Manual setting: set the plot to one or N fixed colors, independent of the data
![](http://image109.360doc.com/DownloadImg/2022/04/1800/243316230_4_20220418121536835_wm.png)
![](http://image109.360doc.com/DownloadImg/2022/04/1800/243316230_5_20220418121537413_wm.png)
#1. Basic plotting template: the data and the x and y coordinates
ggplot(data = iris)+
geom_point(mapping = aes(x = Sepal.Length,
y = Petal.Length))
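#2. Setting attributes (color, size, transparency, point shape, line type, etc.)
ggplot(data = iris) +
geom_point(mapping = aes(x = Sepal.Length, y = Petal.Length),
size = 5, # point size 5 mm
alpha = 0.5, # 50% transparency
shape = 8) # point shape
## Specifying the exact colors used by a mapping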
ggplot(data = iris)+
geom_point(mapping = aes(x = Sepal.Length,
y = Petal.Length,
color = Species))+
scale_color_manual(values = c("blue","grey","red"))
## Distinguishing the color and fill attributes
### 1 Hollow and solid shapes are both colored with color
ggplot(data = iris)+
geom_point(mapping = aes(x = Sepal.Length,
y = Petal.Length,
color = Species),
shape = 17) # shape 17, a solid example
ggplot(data = iris)+
geom_point(mapping = aes(x = Sepal.Length,
y = Petal.Length,
color = Species),
shape = 2) # shape 2, a hollow example
### 2 Only shapes with both a border and an interior need both color and fill
ggplot(data = iris)+
geom_point(mapping = aes(x = Sepal.Length,
y = Petal.Length,
color = Species),
shape = 24,
fill = "black") # shape 24, a two-color example
#3. Faceting
ggplot(data = iris) +
geom_point(mapping = aes(x = Sepal.Length, y = Petal.Length)) +
facet_wrap(~ Species)
# Faceting by two variables
dat = iris
dat$Group = sample(letters[1:5],150,replace = T)
ggplot(data = dat) +
geom_point(mapping = aes(x = Sepal.Length, y = Petal.Length)) +
facet_grid(Group ~ Species)
#4. Geometric objects
# Local versus global settings
ggplot(data = iris) +
geom_smooth(mapping = aes(x = Sepal.Length,
y = Petal.Length))+
geom_point(mapping = aes(x = Sepal.Length,
y = Petal.Length))
ggplot(data = iris,mapping = aes(x = Sepal.Length, y = Petal.Length))+
geom_smooth()+
geom_point()
#5. When statistical transformations are used
#5.1 No statistics: plot the data as given
fre = as.data.frame(table(diamonds$cut))
fre
ggplot(data = fre) +
geom_bar(mapping = aes(x = Var1, y = Freq), stat = "identity")
#5.2 Change count to prop
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, y = ..prop.., group = 1))
#6. Position adjustments
# 6.1 Jittered points
ggplot(data = iris,mapping = aes(x = Species,
y = Sepal.Width,
fill = Species)) +
geom_boxplot()+
geom_point()
ggplot(data = iris,mapping = aes(x = Species,
y = Sepal.Width,
fill = Species)) +
geom_boxplot()+
geom_jitter()
# 6.2 Stacked bar chart
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut,fill=clarity))
# 6.3 Side-by-side (dodged) bar chart
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = clarity), position = "dodge")
#7. Coordinate systems
# Flipping with coord_flip()
ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
geom_boxplot() +
coord_flip()
# Polar coordinates with coord_polar()
bar <- ggplot(data = diamonds) +
geom_bar(
mapping = aes(x = cut, fill = cut),
width = 1
) +
theme(aspect.ratio = 1) +
labs(x = NULL, y = NULL)
bar
bar + coord_flip()
bar + coord_polar()
# Exercise: violin plot + box plot
ggplot(iris,mapping = aes(x = Sepal.Width,y = Species)) +
geom_violin(aes(fill = Species)) +
geom_boxplot()+
geom_jitter(aes(shape = Species))
Single facet![](http://image109.360doc.com/DownloadImg/2022/04/1800/243316230_6_20220418121537882_wm.png)
Double facet![](http://image109.360doc.com/DownloadImg/2022/04/1800/243316230_7_202204181215387_wm.png)
Statistical transformation![](http://image109.360doc.com/DownloadImg/2022/04/1800/243316230_8_20220418121538304_wm.png)
Stacked bar chart![](http://image109.360doc.com/DownloadImg/2022/04/1800/243316230_9_20220418121538491_wm.png)
Side-by-side bar chart
Violin + box plot
(3) ggpubr syntax
# STHDA hosts many example plots made with ggpubr
library(ggpubr)
ggscatter(iris,x="Sepal.Length",
y="Petal.Length",
color="Species")
p <- ggboxplot(iris, x = "Species",
y = "Sepal.Length",
color = "Species",
shape = "Species",
add = "jitter")
p
my_comparisons <- list( c("setosa", "versicolor"),
c("setosa", "virginica"),
c("versicolor", "virginica") )
p + stat_compare_means(comparisons = my_comparisons)+ # Add pairwise comparisons p-value
stat_compare_means(label.y = 9)
(4) Saving figures
# saving and exporting figures
#1. ggplot2 family
ggsave(p,filename = "")
#2. General approach: three steps
# open a device, giving the format and file name
pdf("test.pdf")
…
…
dev.off() # finish by closing the device
(5) Combining plots
# the patchwork package
p1.1 <- violin_plot(dat = dat,gene = dat$CCL5)
p1.2 <- violin_plot(dat = dat,gene = dat$MMP9)
p1.4 <- violin_plot(dat = dat,gene = dat$RAC2)
p1.5 <- violin_plot(dat = dat,gene = dat$CORO1A)
p1.6 <- violin_plot(dat = dat,gene = dat$CCL2)
library(patchwork)
p1 <- (p1.1 | p1.2 ) / # split into two rows
(p1.4 | p1.5 | p1.6)
library(ggplot2)
ggsave("./vertify/GSE100927_vertify.pdf", plot = p1, width = 15, height = 18)
VI. Special topics
1. Sorting a data frame
- order(), or the arrange() function from the tidyverse
# order() can sort a vector and can also sort a data frame
sort(test$Sepal.Length)
test$Sepal.Length[order(test$Sepal.Length)]
test[order(test$Sepal.Length),]
test[order(test$Sepal.Length,decreasing = T),]
# arrange(): more flexible sorting
library(tidyverse) # this package must be loaded first
arrange(test, Sepal.Length)
arrange(test, desc(Sepal.Length))
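arrange(test, desc(Sepal.Width),Sepal.Length) # sort by Sepal.Width (descending) first; rows with equal Sepal.Width are then sorted by Sepal.Length
mutate, select, filter, and rename from the dplyr package
mutate(): add new columns; rename(): rename columns
select(): pick columns; filter(): pick rows
Pipe operator %>%: Ctrl + Shift + M in RStudio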
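A sketch that chains the four verbs with the pipe, using the built-in iris data. The new column name `ratio` and the new name `len` are arbitrary choices for this example.

```r
library(dplyr)

res <- iris %>%
  filter(Species == "setosa") %>%                  # keep only the setosa rows
  select(Sepal.Length, Sepal.Width) %>%            # keep two columns
  mutate(ratio = Sepal.Length / Sepal.Width) %>%   # add a computed column
  rename(len = Sepal.Length) %>%                   # new name = old name
  arrange(desc(len))                               # largest len first

dim(res)        # 50 rows, 3 columns
colnames(res)   # "len" "Sepal.Width" "ratio"
```

Each step receives the result of the previous one as its first argument, which is why the data frame is named only once.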
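2. Box plots from an expression matrix
As the figure below shows, drawing this plot from an expression matrix like this one does not work unless the table is reshaped first
Turn the long table into a short one; the steps are as follows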
library(tidyr)
library(tibble)
library(dplyr)
dat = t(exp) %>% as.data.frame() %>% rownames_to_column() %>%
mutate(group = group_list)
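After the transpose, ggplot2 usually wants one row per sample and gene pair. A sketch of that extra step on a made-up 2-gene, 4-sample matrix, assuming tidyr's pivot_longer() is the intended tool (the notes do not show this step). The names `sample`, `gene`, and `expression` are chosen here for illustration.

```r
library(tidyr)
library(tibble)
library(dplyr)

# Hypothetical expression matrix: genes in rows, samples in columns
exp <- matrix(1:8, nrow = 2,
              dimnames = list(c("gene1", "gene2"), paste0("s", 1:4)))
group_list <- rep(c("control", "treat"), each = 2)   # hypothetical grouping

# Same transpose as in the notes: samples become rows, genes become columns
dat <- t(exp) %>% as.data.frame() %>% rownames_to_column("sample") %>%
  mutate(group = group_list)

# Stack the gene columns so that every row is one sample-gene measurement
long <- pivot_longer(dat, cols = c(gene1, gene2),
                     names_to = "gene", values_to = "expression")
dim(long)        # 8 rows, 4 columns
colnames(long)   # "sample" "group" "gene" "expression"

# A box plot could then be drawn with, for example:
# ggplot(long, aes(x = gene, y = expression, fill = group)) + geom_boxplot()
```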
3. Joins
library(dplyr)
inner_join(test1,test2,by="name")
inner_join(test1,test2,by=c("name" = "Name"))
right_join(test1,test2,by="name")
full_join(test1,test2,by="name")
semi_join(test1,test2,by="name")
anti_join(test1,test2,by="name")
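A sketch showing how many rows each join keeps. The contents of `test1` and `test2` are made up, and left_join() is added to the comparison even though the list above does not include it.

```r
library(dplyr)

# Hypothetical tables sharing the key column "name"
test1 <- data.frame(name = c("jimmy", "nicker", "tony"),
                    score = c(150, 140, 152))
test2 <- data.frame(name = c("tony", "jimmy", "doodle"),
                    city = c("X", "Y", "Z"))

n_inner <- nrow(inner_join(test1, test2, by = "name"))  # keys in both tables
n_left  <- nrow(left_join(test1, test2, by = "name"))   # every row of test1
n_right <- nrow(right_join(test1, test2, by = "name"))  # every row of test2
n_full  <- nrow(full_join(test1, test2, by = "name"))   # keys from either table

c(n_inner, n_left, n_right, n_full)                     # 2 3 3 4

# Filtering joins return only test1's columns
semi <- semi_join(test1, test2, by = "name")   # test1 rows that have a match
anti <- anti_join(test1, test2, by = "name")   # test1 rows that have no match
anti$name                                      # "nicker"
```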
The merge() function
4. String functions: load the stringr package
library(stringr)
x <- "The birch canoe slid on the smooth planks."
x
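###1. Get the string length
str_length(x) # number of characters in the string
length(x) # number of elements in the vector, here 1
###2. Split a string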
str_split(x," ")
x2 = str_split(x," ")[[1]];x2
y = c("jimmy 150","nicker 140","tony 152")
str_split(y," ")
str_split(y," ",simplify = T)
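###3. Extract a substring by position
str_sub(x,5,9)
###4. Detect a pattern
str_detect(x2,"h")
###5. Replace within strings
str_replace(x2,"o","A") # first match in each string only
str_replace_all(x2,"o","A") # every match
###6. Remove characters
str_remove(x," ") # first match only
str_remove_all(x," ") # every match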