[Epidemiology] Melodi-Presto: a causal association tool

title: "[Epidemiology] Melodi Presto causal association tool"
date: 2022-12-08
lastmod: 2022-12-08
draft: false
tags: ["epidemiology", "causal association tool"]
toc: true
autoCollapseToc: true

Introduction

Melodi-Presto: A fast and agile tool to explore semantic triples derived from biomedical literature [1]

Triples: each triple has the form subject–predicate–object.

SemMedDB: a large open knowledge base
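A subject–predicate–object triple can be pictured as a small three-field record. The sketch below is only an illustration in Python: the `Triple` class and its `as_text` helper are hypothetical names, and the values are made up for the example rather than taken from SemMedDB.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One subject-predicate-object statement."""
    subject: str
    predicate: str
    object: str

    def as_text(self) -> str:
        # Join the three parts into a readable sentence-like string.
        return f"{self.subject} {self.predicate} {self.object}"


# Illustrative values only, not a record taken from SemMedDB.
example = Triple(
    subject="diabetes",
    predicate="ASSOCIATED_WITH",
    object="coronary heart disease",
)
print(example.as_text())
```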

Access points

🚩 Online tool: Web Application

API

Jupyter Notebooks

Download the API response as JSON, then extract what you need from it

    curl -X POST 'melodi-presto.mrcieu.ac.uk/api/overlap/' \
      -H 'accept: application/json' \
      -H 'Content-Type: application/json' \
      -d '{ "x": [ "diabetes" ], "y": [ "coronary heart disease" ]}' > 1.json

Usage example

X: KRAS
Y: lung cancer
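The same overlap query can be prepared from Python, for example in a notebook. This is a minimal sketch using only the standard library. It assumes the endpoint is served over HTTPS (the scheme is not shown in the curl call), and `build_overlap_request` is a hypothetical helper name. The request is built but not sent, so the snippet runs offline.

```python
import json
import urllib.request

# Assumption: the host and path from the curl example, reached over HTTPS.
OVERLAP_URL = "https://melodi-presto.mrcieu.ac.uk/api/overlap/"


def build_overlap_request(x_terms, y_terms, url=OVERLAP_URL):
    """Build (but do not send) a POST request mirroring the curl example."""
    payload = {
        # Strip stray whitespace so "diabetes " and "diabetes" are the same term.
        "x": [term.strip() for term in x_terms],
        "y": [term.strip() for term in y_terms],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


request = build_overlap_request(["KRAS"], ["lung cancer"])

# To query the service and save the response (needs network access):
# with urllib.request.urlopen(request) as response:
#     with open("1.json", "wb") as out:
#         out.write(response.read())
```

Keeping the request construction separate from the network call makes it easy to check the payload before anything is sent.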

Open question: should the input terms first be checked against MeSH?

Reproducing the paper

doi: 10.1093/ije/dyab203 [2]

{{< note >}}
1. Some of the content has changed relative to the paper.
2. Object selection is narrowed to "chronic".
3. Predicate selection is unrestricted at first.
4. Subject selection drops CRP, although the paper includes it.
5. Has the OR calculation been dropped?
6. gtf genes and the [UniProt protein name library]( .uniprot.org/uniprotkb?facets=model_organism%3A9606&query=reviewed%3Atrue) removed.
7. Add a drug database?
{{< /note >}}

    library(openxlsx)

    # Drop rows whose `column` matches `pattern`, printing the matches first.
    # Guards against df[-integer(0), ], which would drop every row.
    drop_matching <- function(data, column, pattern) {
      hits <- stringr::str_which(data[[column]], pattern)
      print(data[[column]][hits])
      if (length(hits) > 0) data <- data[-hits, , drop = FALSE]
      data
    }

    # Read the exported Melodi-Presto results
    df <- read.xlsx("chronic kidney disease.xlsx", sheet = 1,
                    colNames = TRUE, check.names = FALSE)
    str(df$Pval)
    df$Pval <- as.numeric(df$Pval)

    # Keep triples with P value < 0.005
    df <- subset(df, Pval < 0.005)

    # Remove triples where the subject is a gene or protein
    # [warning: this also deletes CRP, which the paper includes]
    df$Subject <- tolower(df$Subject)
    df <- drop_matching(df, "Subject", "gene|protein|receptor")

    # "CAUSES" implies causality, "ASSOCIATED_WITH" implies association,
    # and "COEXISTS_WITH" implies co-existence.
    table(df$Predicate)
    df <- subset(df, Predicate %in% c("CAUSES", "ASSOCIATED_WITH", "COEXISTS_WITH"))

    # Restrict to triples whose object contains "kidney" or "renal"
    table(df$Object)
    dplyr::count(df, forcats::fct_lump_n(Object, n = 10))
    # df$Object <- tolower(df$Object)
    keep <- stringr::str_which(df$Object, "kidney|renal")
    df$Object[keep]
    df <- df[keep, ]

    # Second removal pass: compound subjects ("|"), factors and peptides
    for (pattern in c("\\|", "factor", "peptide")) {
      df <- drop_matching(df, "Subject", pattern)
    }

    # Retain only unique risk factors (subjects) to avoid duplicates,
    # keeping the row with the highest Count, then the lowest Pval
    df <- dplyr::arrange(df, dplyr::desc(Count), Pval)
    df <- df[!duplicated(df$Subject), ]
    table(df$Count)
    # df <- subset(df, Count > 2)
    write.xlsx(df, file = "筛选4.xlsx", colNames = TRUE)

    # Enrichment odds ratio, built from four counts:
    # (a) the number of these triples,
    # (b) the total number of triples matched to the query,
    # (c) the total number of these triples in the database,
    # (d) the total number of triples in the database.
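The four counts (a) to (d) listed at the end of the script can be arranged into the 2×2 table [[a, b-a], [c, d-c]]. As a rough sketch of that calculation using only the Python standard library, the hypothetical helper `enrichment` below returns the sample odds ratio together with a one-sided Fisher exact p-value. The one-sided alternative is an assumption made here for simplicity (SciPy's `fisher_exact` is two-sided by default), and the counts are invented to keep the arithmetic easy to follow.

```python
from math import comb


def enrichment(a: int, b: int, c: int, d: int):
    """Odds ratio and one-sided Fisher p-value for [[a, b - a], [c, d - c]]."""
    top_left, top_right = a, b - a
    bottom_left, bottom_right = c, d - c

    # Sample odds ratio (cross-product ratio); infinite if the denominator is 0.
    denominator = top_right * bottom_left
    odds_ratio = (top_left * bottom_right) / denominator if denominator else float("inf")

    # One-sided Fisher exact test: probability of a top-left cell at least as
    # large as observed, with the row and column totals held fixed.
    row_total = top_left + top_right
    column_total = top_left + bottom_left
    grand_total = row_total + bottom_left + bottom_right
    tail = sum(
        comb(column_total, k) * comb(grand_total - column_total, row_total - k)
        for k in range(top_left, min(row_total, column_total) + 1)
    )
    p_value = tail / comb(grand_total, row_total)
    return odds_ratio, p_value


# Hypothetical counts, not values from the Melodi-Presto database.
odds_ratio, p_value = enrichment(a=3, b=5, c=2, d=10)
print(odds_ratio, round(p_value, 4))
```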
In Python (SciPy), the enrichment test on the counts a, b, c and d corresponds to stats.fisher_exact([[a, b-a], [c, d-c]]).

NHANES

Notes: see the paper reproduction above.


[1] doi: 10.1093/bioinformatics/btaa726

[2] Trans-ethnic Mendelian-randomization study reveals causal relationships between cardiometabolic factors and chronic kidney disease
