A collaborative list of resources for making Electronic Health Records AI-friendly
This list is a collaborative effort initiated by the Medical Intelligence Society’s Graph Working Group. In the course of its work, the group’s members came across useful resources that they wished to share with the community, hence this curated list. Some items may appear under more than one category.
If you have something related to the application of graphs and/or artificial intelligence to electronic health records, please send us a pull request.
WISER (Weak and Indirect Supervision for Entity Recognition) is a system for training sequence tagging models, particularly neural networks for named entity recognition (NER) and related tasks. WISER uses weak supervision in the form of rules to train these models, as opposed to hand-labeled training data.
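A minimal sketch of the rule-based weak supervision idea WISER builds on: a dictionary rule votes on token tags instead of relying on hand labels. The lexicon, rule, and helper names below are illustrative and are not WISER's actual API.

```python
# Illustrative only: a dictionary-based tagging rule that emits noisy BIO-style
# votes, the kind of weak signal a label model would aggregate into training
# labels for a neural NER tagger. Not WISER's actual API.
DRUG_LEXICON = {"aspirin", "metformin", "warfarin"}  # hypothetical seed list

def drug_lexicon_rule(tokens):
    """Vote 'I-Drug' for tokens found in the lexicon, abstain ('ABS') otherwise."""
    return ["I-Drug" if tok.lower() in DRUG_LEXICON else "ABS" for tok in tokens]

tokens = ["Patient", "was", "started", "on", "metformin", "today", "."]
print(drug_lexicon_rule(tokens))
# ['ABS', 'ABS', 'ABS', 'ABS', 'I-Drug', 'ABS', 'ABS']
```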
We describe a supervised learning system that predicts whether a treatment relation exists between any two medical concepts mentioned in clinical notes. Our approach to identifying treatment relations in clinical text is based on (a) exploring the contextual information in which medical concepts are described and (b) using predefined medication-indication pairs.
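A hedged sketch of those two ideas: simple contextual features around a concept pair plus a lookup in a medication-indication list, fed to a standard classifier. The feature names, toy data, and the `KNOWN_PAIRS` list are illustrative, not the paper's actual pipeline.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

KNOWN_PAIRS = {("metformin", "type 2 diabetes")}  # hypothetical medication-indication pairs

def featurize(concept_a, concept_b, context_words):
    # Bag-of-words context features plus a membership flag for the known-pair list.
    feats = {f"ctx={w.lower()}": 1 for w in context_words}
    feats["in_known_pairs"] = int((concept_a.lower(), concept_b.lower()) in KNOWN_PAIRS)
    return feats

X_train = [
    featurize("metformin", "type 2 diabetes", ["prescribed", "for"]),
    featurize("aspirin", "rash", ["developed", "after", "taking"]),
]
y_train = [1, 0]  # 1 = treatment relation present, 0 = absent

vec = DictVectorizer()
clf = LogisticRegression().fit(vec.fit_transform(X_train), y_train)

x_new = featurize("insulin", "hyperglycemia", ["started", "on", "for"])
print(clf.predict(vec.transform([x_new])))  # predicted label for the new concept pair
```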
BERT-CRel is a transformer model for fine-tuning biomedical word embeddings that are jointly learned along with concept embeddings, using a pre-training phase with fastText and a fine-tuning phase with a transformer setup. The goal is to provide high-quality pre-trained biomedical embeddings that the research community can use in any downstream task. This repository contains the code used to implement the BERT-CRel methods and generate the embeddings. The corpus used for BERT-CRel contains biomedical citations from PubMed, and the concepts come from the Medical Subject Headings (MeSH) terminology used to index citations.
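A hedged sketch of how such released word/concept embeddings could be queried, assuming they are distributed in word2vec text format; the file name and distribution format are assumptions to check against the repository's release notes.

```python
from gensim.models import KeyedVectors

# Assumed distribution format (word2vec text); the file name is a placeholder.
vectors = KeyedVectors.load_word2vec_format("bert_crel_embeddings.txt", binary=False)

# Words and MeSH concept identifiers share one vector space, so nearest-neighbour
# queries can mix the two. D003924 is the MeSH descriptor for "Diabetes Mellitus, Type 2".
print(vectors.most_similar("D003924", topn=5))
```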
Contextual word embedding models, such as BioBERT and Bio_ClinicalBERT, have achieved state-of-the-art results in biomedical natural language processing tasks by focusing their pre-training process on domain-specific corpora. However, such models do not take into consideration expert domain knowledge. In this work, we introduced UmlsBERT, a contextual embedding model that integrates domain knowledge during the pre-training process via a novel knowledge augmentation strategy.
We introduce BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining), which is a domain-specific language representation model pre-trained on large-scale biomedical corpora. With almost the same architecture across tasks, BioBERT largely outperforms BERT and previous state-of-the-art models in a variety of biomedical text mining tasks when pre-trained on biomedical corpora. While BERT obtains performance comparable to that of previous state-of-the-art models, BioBERT significantly outperforms them on the following three representative biomedical text mining tasks: biomedical named entity recognition (0.62% F1 score improvement), biomedical relation extraction (2.80% F1 score improvement) and biomedical question answering (12.24% MRR improvement). Our analysis results show that pre-training BERT on biomedical corpora helps it to understand complex biomedical texts.
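A minimal sketch of loading BioBERT through the Hugging Face `transformers` library and encoding a clinical sentence; the model identifier below is the commonly used community checkpoint and should be verified against the official BioBERT release.

```python
import torch
from transformers import AutoTokenizer, AutoModel

model_name = "dmis-lab/biobert-base-cased-v1.1"  # assumed Hub ID; check the BioBERT repository
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

text = "The patient was treated with tamoxifen for breast cancer."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Contextual token embeddings that can feed downstream NER / RE / QA heads.
print(outputs.last_hidden_state.shape)  # (1, num_tokens, 768) for the base model
```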
Domain knowledge is important for building Natural Language Processing (NLP) systems for low-resource settings, such as in the clinical domain. In this paper, a novel joint training method is introduced for adding knowledge base information from the Unified Medical Language System (UMLS) into language model pre-training for some clinical domain corpus. We show that in three different downstream clinical NLP tasks, our pre-trained language model outperforms the corresponding model with no knowledge base information and other state-of-the-art models.
In this report, we introduce SciFive, a domain-specific T5 model that has been pre-trained on large biomedical corpora. Our model outperforms the current SOTA methods (i.e., BERT, BioBERT, Base T5) on tasks in named entity recognition, relation extraction, natural language inference, and question answering. We show that text-generation methods have significant potential in a broad array of biomedical NLP tasks, particularly those requiring longer, more complex outputs. Our results support the exploration of more difficult text generation tasks and the development of new methods in this area.
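A hedged sketch of SciFive's text-to-text usage via `transformers`; the checkpoint name and the task prefix are assumptions to verify against the SciFive repository.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "razent/SciFive-base-Pubmed_PMC"  # assumed Hugging Face Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Tasks are cast as text generation; the "mednli" prefix below is an assumed example.
prompt = ("mednli: sentence1: The patient denies chest pain. "
          "sentence2: The patient has chest pain.")
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```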
Transformer-based pretrained language models (PLMs) have started a new era in modern natural language processing (NLP). These models combine the power of transformers, transfer learning, and self-supervised learning (SSL). Following the success of these models in the general domain, the biomedical research community has developed various in-domain PLMs, starting from BioBERT up to the latest BioMegatron and CoderBERT models. We believe there is a need for a paper that provides a comprehensive survey of these transformer-based biomedical pretrained language models (BPLMs).
Objective: This work aimed to demonstrate the effectiveness of a hybrid approach based on the Sentence BERT model and a retrofitting algorithm to compute relatedness between any two biomedical concepts. Materials and Methods: We generated concept vectors by encoding concept preferred terms using ELMo, BERT, and Sentence BERT models. We used BioELMo and Clinical ELMo. We used Ontology Knowledge Free (OKF) models such as PubMedBERT, BioBERT, and BioClinicalBERT, and Ontology Knowledge Injected (OKI) models such as SapBERT, CoderBERT, KbBERT, and UmlsBERT. We trained all the BERT models using a Siamese network on the SNLI and STSb datasets to allow the models to learn more semantic information at the phrase or sentence level, so that they can represent multi-word concepts better. Finally, to inject ontology relationship knowledge into the concept vectors, we used the retrofitting algorithm with concepts from various UMLS relationships.
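A minimal sketch of the relatedness step: encode two concept preferred terms with a Sentence-BERT style encoder and compare them with cosine similarity. The checkpoint below is a generic sentence-transformers model, not one of the biomedical encoders evaluated in the paper, and the UMLS retrofitting step is omitted.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder encoder, not the paper's models

# Encode two concept preferred terms and score their relatedness.
vec_a = model.encode("myocardial infarction", convert_to_tensor=True)
vec_b = model.encode("heart attack", convert_to_tensor=True)
print(float(util.cos_sim(vec_a, vec_b)))  # higher score = more related concepts
```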