🆕 Check out our JAMIA paper, which analyzes cross-entropy as an audit log metric in depth. Updated code is available here.
This repo contains code for training and evaluating transformer-based tabular language models for Epic EHR audit logs. You can use several of our pretrained models for entropy estimation or tabular generation, or train your own model from scratch.
Want to see audit log generation and cross-entropy calculation in action? Try out our audit-icu-gpt2-25.3M model on Hugging Face!
Use `pip install -r requirements.txt` to install the required packages. If the dependencies change, run `pipreqs . --savepath requirements.txt --ignore Sophia` to regenerate the requirements file. Use `git submodule update --init --recursive` to pull in Sophia for training.
This project uses pre-commit hooks for `black` if you would like to contribute. To install them, run `pre-commit install`.
Our pretrained models are available on Hugging Face and are mostly compatible with the `transformers` library. Here's a full list of the available models:
| Architecture | # Params | Repository Name |
|---|---|---|
| GPT2 | 25.3M | audit-icu-gpt2-25_3M |
| GPT2 | 46.5M | audit-icu-gpt2-46_5M |
| GPT2 | 89.0M | audit-icu-gpt2-89_0M |
| GPT2 | 131.6M | audit-icu-gpt2-131_6M |
| RWKV | 65.7M | audit-icu-rwkv-65_7M |
| RWKV | 127.2M | audit-icu-rwkv-127_2M |
| LLaMA | 58.1M | audit-icu-llama-58_1M |
| LLaMA | 112.0M | audit-icu-llama-112_0M |
| LLaMA | 219.8M | audit-icu-llama-219_8M |
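As a quick orientation, loading one of these checkpoints follows the usual `transformers` pattern. The sketch below uses a placeholder Hugging Face repository path; substitute the actual namespace under which the model is published. Since these are tabular models, encoding audit-log fields into token ids is handled by the code in this repo rather than a generic text tokenizer (see `entropy.py`).

```python
# Minimal sketch of loading a pretrained checkpoint with transformers.
# "your-hf-namespace/audit-icu-gpt2-25_3M" is a placeholder path -- substitute
# the actual Hugging Face repository for the model you want to use.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("your-hf-namespace/audit-icu-gpt2-25_3M")
model.eval()
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.1f}M parameters")
```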
To use our models for cross-entropy loss, see `entropy.py` for a broad overview of the setup needed. Since they're built with `transformers`, you can also use these models for generative tasks in nearly the same way as any other language model; see `gen.py` for an example of how to do this. A hedged sketch of the general pattern follows below.
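The snippet below is a minimal sketch of how a causal language model in `transformers` is typically scored and sampled; it is not the repo's actual pipeline. The repository path is a placeholder and `input_ids` stands in for an encoded audit-log sequence, so refer to `entropy.py` and `gen.py` for the real setup.

```python
import torch
from transformers import AutoModelForCausalLM

# Placeholder repository path -- substitute the actual Hugging Face repo name.
model = AutoModelForCausalLM.from_pretrained("your-hf-namespace/audit-icu-gpt2-25_3M")
model.eval()

# Placeholder input: the real tabular encoding of audit-log rows is handled
# by this repo's code (see entropy.py), not a generic text tokenizer.
input_ids = torch.tensor([[10, 42, 7, 3, 88]])

with torch.no_grad():
    # Passing the inputs as labels makes the model return the mean
    # next-token cross-entropy (in nats) over the shifted sequence.
    outputs = model(input_ids=input_ids, labels=input_ids)
print(f"cross-entropy: {outputs.loss.item():.4f} nats/token")

# Generation works like any other causal LM (see gen.py for the repo's version).
generated = model.generate(input_ids, max_new_tokens=16, do_sample=True)
print(generated[0].tolist())
```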
Please cite our paper if you use this code in your own work:
    @misc{warner2023autoregressive,
      title={Autoregressive Language Models For Estimating the Entropy of Epic EHR Audit Logs},
      author={Benjamin C. Warner and Thomas Kannampallil and Seunghwan Kim},
      year={2023},
      eprint={2311.06401},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
    }