[Epidemiology] Melodi-Presto: a causal-association tool
- 2025-09-11 12:30:02

title: "[Epidemiology] Melodi Presto causal-association tool"
date: 2022-12-08
lastmod: 2022-12-08
draft: false
tags: ["epidemiology", "causal-association tool"]
toc: true
autoCollapseToc: true

Introduction
Melodi-Presto: A fast and agile tool to explore semantic triples derived from biomedical literature [1]
Triple: a subject–predicate–object statement.
SemMedDB: a large, open knowledge base.
Access points 🚩
Online tool: Web Application
API
Jupyter Notebooks
git

Download the result as JSON, then extract what you need from it:

    curl -X POST 'melodi-presto.mrcieu.ac.uk/api/overlap/' -H 'accept: application/json' -H 'Content-Type: application/json' -d '{ "x": [ "diabetes" ], "y": [ "coronary heart disease" ]}' > 1.json

Usage example
X: KRAS
Y: lung cancer

Open question: should the input terms first be checked against MeSH?
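The same request can be issued from Python instead of curl. The following is a minimal sketch, not the tool's official client: the function names `build_overlap_request` and `fetch_overlap` are hypothetical, and the `https://` scheme is an assumption, since the source gives only the host and path. Because the layout of the returned JSON is not described here, the sketch stops at saving the response to a file.

```python
import json
import urllib.request

# Assumption: the endpoint is reachable over HTTPS (the source omits the scheme).
OVERLAP_URL = "https://melodi-presto.mrcieu.ac.uk/api/overlap/"


def build_overlap_request(x_terms, y_terms, url=OVERLAP_URL):
    """Build the POST request that mirrors the curl command (no network access)."""
    # Strip stray whitespace so that "diabetes " and "diabetes" send the same term.
    payload = {
        "x": [term.strip() for term in x_terms],
        "y": [term.strip() for term in y_terms],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


def fetch_overlap(x_terms, y_terms, out_path="1.json", timeout=60):
    """Send the request and save the raw JSON response, like `> 1.json` in curl."""
    request = build_overlap_request(x_terms, y_terms)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        result = json.load(response)
    with open(out_path, "w", encoding="utf-8") as handle:
        json.dump(result, handle, ensure_ascii=False, indent=2)
    return result
```

Calling `fetch_overlap(["diabetes"], ["coronary heart disease"])` would write the response to `1.json`. Keeping request construction separate from the network call makes it possible to inspect the payload before anything is sent.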
Article reproduction
doi: 10.1093/ije/dyab203 [2]
{{< note >}}
1. Some of the content has changed.
2. Object selection is narrowed to "chronic".
3. Predicate selection is unrestricted at first.
4. Subject selection drops CRP, although the paper includes it.
5. The OR calculation has been removed?
6. The gtf gene list and the [UniProt protein name library](.uniprot.org/uniprotkb?facets=model_organism%3A9606&query=reviewed%3Atrue) were dropped.
7. Add a drug library?
{{< /note >}}

    library(openxlsx)

    # Read the exported result table
    df <- read.xlsx("chronic kidney disease.xlsx", sheet = 1,
                    colNames = TRUE, check.names = FALSE)
    str(df$Pval)
    df$Pval <- as.numeric(df$Pval)

    # Keep triples with P value < 0.005
    df <- subset(df, Pval < 0.005)

    # Remove triples where the subject is a gene or protein
    # [warning: this also deletes CRP, which the paper keeps]
    # A logical index is used so that zero matches removes zero rows
    # (df[-integer(0), ] would silently drop every row).
    df$Subject <- tolower(df$Subject)
    a <- grepl("gene|protein|receptor", df$Subject)
    df$Subject[a]
    df <- df[!a, ]

    # Keep three predicates: "CAUSES" implies causality,
    # "ASSOCIATED_WITH" implies association,
    # and "COEXISTS_WITH" implies co-existence.
    table(df$Predicate)
    df <- subset(df, Predicate %in% c("CAUSES", "ASSOCIATED_WITH", "COEXISTS_WITH"))

    # Restrict to triples where the object contains "kidney" or "renal"
    table(df$Object)
    dplyr::count(df, forcats::fct_lump_n(Object, n = 10))
    # df$Object <- tolower(df$Object)
    b <- grepl("kidney|renal", df$Object)
    df$Object[b]
    df <- df[b, ]

    # Second removal pass: subjects containing "|", "factor" or "peptide"
    drop <- grepl("\\||factor|peptide", df$Subject)
    df$Subject[drop]
    df <- df[!drop, ]

    # Retain only unique risk factors (subjects) to avoid duplicates:
    # for each subject keep the row with the highest Count, then the lowest Pval
    df <- dplyr::arrange(df, dplyr::desc(Count), Pval)
    df <- df[!duplicated(df$Subject), ]
    table(df$Count)
    # df <- subset(df, Count > 2)
    write.xlsx(df, file = "筛选4.xlsx", colNames = TRUE)

    # Enrichment odds ratio, from four counts:
    # (a) the number of these triples in the query results
    # (b) the total number of triples matched to the query
    # (c) the total number of these triples in the database
    # (d) the total number of triples in the database
    # In Python (scipy): stats.fisher_exact([[a, b-a], [c, d-c]])
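The enrichment odds ratio described in the closing comments above can be sketched without scipy. This is an illustrative stand-in under stated assumptions, not Melodi-Presto's own implementation: the function name `enrichment_test` is hypothetical, the table layout `[[a, b-a], [c, d-c]]` is taken from the note, and the p-value is two-sided because that is the default of scipy's `fisher_exact`.

```python
from math import exp, lgamma


def _log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def enrichment_test(a, b, c, d):
    """Odds ratio and two-sided Fisher exact p-value for [[a, b-a], [c, d-c]]."""
    x11, x12, x21, x22 = a, b - a, c, d - c
    if min(x11, x12, x21, x22) < 0:
        raise ValueError("need 0 <= a <= b and 0 <= c <= d")
    row1 = x11 + x12
    col1 = x11 + x21
    total = x11 + x12 + x21 + x22

    # Sample odds ratio; reported as infinite when an off-diagonal cell is zero.
    odds = (x11 * x22) / (x12 * x21) if x12 * x21 else float("inf")

    def log_pmf(k):
        # Hypergeometric log-probability of k in the top-left cell,
        # with all row and column totals held fixed.
        return (_log_comb(col1, k) + _log_comb(total - col1, row1 - k)
                - _log_comb(total, row1))

    observed = log_pmf(x11)
    lo, hi = max(0, row1 - (total - col1)), min(row1, col1)
    # Two-sided p-value: sum every feasible table that is no more likely
    # than the observed one (small tolerance for floating-point ties).
    p_value = sum(exp(lp) for lp in map(log_pmf, range(lo, hi + 1))
                  if lp <= observed + 1e-7)
    return odds, min(1.0, p_value)
```

For example, `enrichment_test(8, 10, 1, 6)` evaluates the table `[[8, 2], [1, 5]]`. The loop visits every feasible table, so for database-scale counts a library routine such as scipy's is likely the better choice.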
[1] doi: 10.1093/bioinformatics/btaa726
[2] Trans-ethnic Mendelian-randomization study reveals causal relationships between cardiometabolic factors and chronic kidney disease