The Wayback Machine - https://web.archive.org/web/20230508212027/https://github.com/ncbi/BioCPT

ncbi/BioCPT

BioCPT: Contrastive Pre-trained Transformers with Large-scale PubMed Search Logs for Zero-shot Biomedical Information Retrieval

Overview


BioCPT is a first-of-its-kind Contrastive Pre-trained Transformer model, trained on PubMed search logs at an unprecedented scale for zero-shot biomedical information retrieval. BioCPT consists of:

  • A first-stage dense retriever (BioCPT retriever)
    • Contains a query encoder (QEnc) and an article encoder (DEnc), both initialized from PubMedBERT.
    • Trained on 255M query-article pairs from PubMed search logs, using in-batch negatives.
  • A second-stage re-ranker (BioCPT re-ranker)
    • A transformer cross-encoder (CrossEnc) initialized from PubMedBERT.
    • Trained on 18M semantic query-article pairs, using localized negatives from the pre-trained BioCPT retriever.
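
To illustrate what training with in-batch negatives means for the retriever, here is a minimal sketch in plain Python. It is not the code from this repository. It assumes dot-product similarity and a softmax cross-entropy objective with no temperature, neither of which is stated above. The function name `in_batch_negative_loss` and the toy vectors are hypothetical stand-ins for QEnc and DEnc outputs.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def in_batch_negative_loss(query_vecs, article_vecs):
    """Mean cross-entropy over a batch of (query, article) pairs.

    Article i is the positive for query i. Every other article in the
    same batch serves as a negative for query i.
    Assumption: dot-product scores, no temperature scaling.
    """
    assert len(query_vecs) == len(article_vecs)
    losses = []
    for i, q in enumerate(query_vecs):
        # Score the query against every article in the batch.
        scores = [dot(q, d) for d in article_vecs]
        # Numerically stable log-sum-exp over the batch.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        # Negative log-probability of the paired (positive) article.
        losses.append(log_z - scores[i])
    return sum(losses) / len(losses)


# Toy 2-dimensional embeddings (hypothetical, for illustration only).
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned_articles = [[1.0, 0.0], [0.0, 1.0]]   # each query matches its own article
swapped_articles = [[0.0, 1.0], [1.0, 0.0]]   # each query matches the other article

loss_aligned = in_batch_negative_loss(queries, aligned_articles)
loss_swapped = in_batch_negative_loss(queries, swapped_articles)
```

In this toy batch, `loss_aligned` is lower than `loss_swapped`, because the loss falls as each query scores its own article above the other articles in the batch.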

Content

This directory contains:

Data availability

Due to privacy concerns, we are not able to release the PubMed user logs. As a surrogate, we provide the question-article pair data from BioASQ in this repo as example training datasets. You can convert your data to the example data formats and train the BioCPT model.

Acknowledgments

This work was supported by the Intramural Research Programs of the National Institutes of Health, National Library of Medicine.

Disclaimer

This tool shows the results of research conducted in the Computational Biology Branch, NCBI/NLM. The information produced on this website is not intended for direct diagnostic use or medical decision-making without review and oversight by a clinical professional. Individuals should not change their health behavior solely on the basis of information produced on this website. NIH does not independently verify the validity or utility of the information produced by this tool. If you have questions about the information produced on this website, please see a health care professional. More information about NCBI's disclaimer policy is available.
