Reproducing a Nature Communications figure: scatter violin plot + beeswarm plot

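We compute a few summary statistics on the data and build a ggplot2 figure that combines violins, the individual data points, a mean line, and error bars.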

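Assume the data has two main columns, "Treatment" and "Integrated.density". "Treatment" holds a categorical variable and "Integrated.density" holds numeric values. A third column, "Biological.replicate", identifies the biological replicate. We simulate random data with this structure.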

The R code below generates the simulated data:

# Generate simulated data
set.seed(123)

Treatment <- rep(c("Treatment1", "Treatment2", "Treatment3"), each = 100)
Integrated.density <- rnorm(length(Treatment), mean = rep(c(100, 200, 150), each = 100), sd = 20)
Biological.replicate <- rep(1:3, times = 100)

A <- data.frame(Treatment, Integrated.density, Biological.replicate)

# Print the first few rows
head(A)

After running this code you should see output like the following:

   Treatment Integrated.density Biological.replicate
1 Treatment1           88.79049                    1
2 Treatment1           95.39645                    2
3 Treatment1          131.17417                    3
4 Treatment1          101.41017                    1
5 Treatment1          102.58575                    2
6 Treatment1          134.30130                    3
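
As a quick sanity check on the simulated design, one could cross-tabulate treatments against replicates. This is only a sketch that rebuilds the same data frame `A`; the names `counts` and `group_means` are introduced here for illustration. Because `rep(1:3, times = 100)` cycles through all 300 rows without regard to treatment boundaries, each treatment ends up with 33 or 34 observations per replicate rather than a perfectly even split.

```r
# Rebuild the simulated data exactly as in the snippet above
set.seed(123)
Treatment <- rep(c("Treatment1", "Treatment2", "Treatment3"), each = 100)
Integrated.density <- rnorm(length(Treatment), mean = rep(c(100, 200, 150), each = 100), sd = 20)
Biological.replicate <- rep(1:3, times = 100)
A <- data.frame(Treatment, Integrated.density, Biological.replicate)

# Count observations per treatment (rows) and biological replicate (columns)
counts <- table(A$Treatment, A$Biological.replicate)
print(counts)

# Per-treatment means; these should land near the simulated means of 100, 200 and 150
group_means <- tapply(A$Integrated.density, A$Treatment, mean)
print(round(group_means, 1))
```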

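Complete code

# Set the working directory: fill in your own path before uncommenting,
# because setwd("") with an empty string raises an error
# setwd("")

# Generate simulated data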
set.seed(123)

Treatment <- rep(c("Treatment1", "Treatment2", "Treatment3"), each = 100)
Integrated.density <- rnorm(length(Treatment), mean = rep(c(100, 200, 150), each = 100), sd = 20)
Biological.replicate <- rep(1:3, times = 100)

A <- data.frame(Treatment, Integrated.density, Biological.replicate)

# Load the required libraries
library(ggplot2)
library(ggbeeswarm)
library(ggpubr)
library(dplyr)

# Convert the 'Biological.replicate' column to a factor
A$Biological.replicate <- as.factor(A$Biological.replicate)

# Build data frame B with per-treatment summary statistics
B <- A %>%
  group_by(Treatment) %>%
  summarise(upper = quantile(Integrated.density, 0.75),
            lower = quantile(Integrated.density, 0.25),
            mean = mean(Integrated.density),
            median = median(Integrated.density),
            sd = sd(Integrated.density))

# Build the plot
p <- ggplot(A, aes(x=Treatment, y=Integrated.density)) +
  geom_violin(width = 0.8, fill = '#EDEDED', color = '#EDEDED') +
  geom_quasirandom(aes(color = Biological.replicate), width = 0.4, size = 2.5, alpha = 0.7) +
  scale_color_manual(name = 'Rep.', values = c('#FFD7A8','#F2A9A9','#BAB099'), labels = c('1','2','3')) +
  theme_bw() +
  labs(x=" ", y='Integrated density', title = "Integrated Density by Treatment") +
  theme(plot.title = element_text(hjust = 0.5),
        axis.title.y = element_text(colour = 'black', size = 16),
        axis.text = element_text(colour = 'black', size = 14),
        axis.line = element_line(linewidth = 1),
        legend.title = element_text(size = 14),
        legend.text = element_text(size = 14)) +
  guides(color=guide_legend(override.aes = list(size=4))) +
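  # Error bars span the interquartile range (25th to 75th percentile) stored in B
  geom_errorbar(data = B, aes(x = Treatment, ymin = lower, ymax = upper), inherit.aes = FALSE, width = 0.2, linewidth = 0.5) +
  # Horizontal line at the overall mean of each treatment
  stat_summary(fun = mean, fun.min = mean, fun.max = mean, geom = "crossbar", width = 0.4, linewidth = 0.3) +
  # Large filled points: mean of each biological replicate within a treatment
  stat_summary(aes(fill = Biological.replicate), geom = "point", fun = mean, shape = 21, size = 6, stroke = 1.3) +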
  scale_fill_manual(values = c('#FFAF51','#E65454','#756233')) +
  guides(fill = guide_legend(title = 'Mean'))

print(p)   # Display the plot

# Save the plot as a PDF
ggsave(filename = "plot.pdf", plot = p, width = 11, height = 8)

# Save the data as CSV files
write.csv(A, "A.csv", row.names = FALSE)
write.csv(B, "B.csv", row.names = FALSE)

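The code above produces a ggplot2 figure with violins, the individual data points, a mean line, and error bars spanning the 25th to 75th percentiles, and saves it as a PDF. It also writes the raw data A and the summary statistics B to CSV files.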
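
The large filled points in the figure come from `stat_summary()` grouped by `Biological.replicate`, that is, one mean per treatment and replicate. A minimal sketch of the same quantity computed explicitly is shown here; it uses base R `aggregate()` instead of dplyr to stay dependency-free, and the name `rep_means` is introduced for illustration only.

```r
# Rebuild the simulated data used in the full script
set.seed(123)
Treatment <- rep(c("Treatment1", "Treatment2", "Treatment3"), each = 100)
Integrated.density <- rnorm(length(Treatment), mean = rep(c(100, 200, 150), each = 100), sd = 20)
Biological.replicate <- rep(1:3, times = 100)
A <- data.frame(Treatment, Integrated.density, Biological.replicate)
A$Biological.replicate <- as.factor(A$Biological.replicate)

# One mean per treatment and biological replicate (3 x 3 = 9 rows)
rep_means <- aggregate(Integrated.density ~ Treatment + Biological.replicate,
                       data = A, FUN = mean)
print(rep_means)
```

Printing `rep_means` next to the figure is a simple way to confirm that each large point sits where expected.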

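To run this code on your own data, provide a CSV file with the columns 'Treatment', 'Integrated.density', and 'Biological.replicate'. 'Treatment' holds the treatment category, 'Integrated.density' holds the corresponding numeric value, and 'Biological.replicate' holds the biological replicate number.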
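
A minimal sketch of loading such a file and checking its columns before plotting follows. The file contents and the names `csv_path`, `example`, and `required_cols` are hypothetical; the example writes a tiny CSV to a temporary location only so that the snippet runs on its own.

```r
# Hypothetical example file, written to a temporary location so the sketch is runnable
csv_path <- tempfile(fileext = ".csv")
example <- data.frame(
  Treatment = rep(c("Control", "Drug"), each = 3),
  Integrated.density = c(101.2, 98.7, 105.4, 152.3, 149.8, 160.1),
  Biological.replicate = rep(1:3, times = 2)
)
write.csv(example, csv_path, row.names = FALSE)

# Read the data; replace csv_path with the path to your own file
A <- read.csv(csv_path, stringsAsFactors = FALSE)

# Stop early if a required column is missing
required_cols <- c("Treatment", "Integrated.density", "Biological.replicate")
missing_cols <- setdiff(required_cols, names(A))
if (length(missing_cols) > 0) {
  stop("Missing columns: ", paste(missing_cols, collapse = ", "))
}

# The plotting code expects a numeric response and a factor for the replicate
stopifnot(is.numeric(A$Integrated.density))
A$Biological.replicate <- as.factor(A$Biological.replicate)
str(A)
```

Note that the manual colour and fill scales in the plotting code each list three values, so data with more than three replicates would need additional colours.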
