Reach

What is it?

Reach stands for Reading and Assembling Contextual and Holistic Mechanisms from Text. In plain English, Reach is an information extraction system for the biomedical domain, which aims to read scientific literature and extract cancer signaling pathways. Reach implements a fairly complete extraction pipeline, including: recognition of biochemical entities (proteins, chemicals, etc.); grounding of those entities to known knowledge bases such as UniProt; extraction of BioPAX-like interactions (e.g., phosphorylation, complex assembly, positive/negative regulations); and coreference resolution for both entities and interactions.

Reach is developed using Odin, our open-domain information extraction framework, which is released within our processors repository.

Please scroll down to the bottom of this page for additional resources, including a Reach output visualizer, REST API, and datasets created with Reach.

Licensing

This project is, and will always be, free for research purposes. However, starting with version 1.2, we are using a license that restricts its use for commercial purposes. Please contact us for details.

Changes

1.6.2 - Update bioresources to 1.1.36 and processors to 8.2.4.
1.6.2 - Added the assembly subproject back. The arizona and cmu formats are supported again.
much more...

Authors

Reach was created by the following members of the CLU lab at the University of Arizona:

Citations

If you use Reach, please cite one of the following papers:

@Article{Escarcega:2018,
  author={Valenzuela-Esc{\'a}rcega, Marco A and Babur, {\"O}zg{\"u}n and Hahn-Powell, Gus and Bell, Dane and Hicks, Thomas and Noriega-Atala, Enrique and Wang, Xia and Surdeanu, Mihai and Demir, Emek and Morrison, Clayton T},
  title={Large-scale Automated Machine Reading Discovers New Cancer Driving Mechanisms},
  journal={Database: The Journal of Biological Databases and Curation},
  url={http://clulab.cs.arizona.edu/papers/escarcega2018.pdf},
  doi={10.1093/database/bay098},
  year={2018},
  publisher={Oxford University Press}
}

@inproceedings{Valenzuela+:2015aa,
  author    = {Valenzuela-Esc\'{a}rcega, Marco A. and Gustave Hahn-Powell and Thomas Hicks and Mihai Surdeanu},
  title     = {A Domain-independent Rule-based Framework for Event Extraction},
  organization = {ACL-IJCNLP 2015},
  booktitle = {Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing: Software Demonstrations (ACL-IJCNLP)},
  url = {http://www.aclweb.org/anthology/P/P15/P15-4022.pdf},
  year      = {2015},
  pages = {127--132},
  Note = {Paper available at \url{http://www.aclweb.org/anthology/P/P15/P15-4022.pdf}},
}

More publications from the Reach project are available here.

Installation

This software requires Java 8 or 11 and Scala 2.11 or 2.12.

The jar is available on Maven Central. To use it, add the following dependency to your pom.xml:

<dependency>
   <groupId>org.clulab</groupId>
   <artifactId>reach-main_2.12</artifactId>
   <version>1.6.2</version>
</dependency>

The equivalent sbt dependency is:

libraryDependencies ++= Seq(
    "org.clulab" %% "reach-main" % "1.6.2"
)

How to compile the source code

This is a standard sbt project, so use the usual commands (e.g., sbt compile, sbt assembly) to compile. Add the generated jar files under target/ to your $CLASSPATH, along with the other necessary dependency jars. Take a look at build.sbt to see which dependencies are necessary at runtime.
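As a minimal sketch of the classpath step, the shell function below joins every jar found under a directory into one colon-separated string. The name `collect_jars` is hypothetical (it is not part of Reach), and where sbt places jars under target/ depends on your sbt and Scala versions, so check the paths it finds before relying on them.

```shell
#!/bin/sh
# collect_jars DIR: print every .jar file found under DIR as a single
# colon-separated string that can be appended to $CLASSPATH.
# Hypothetical helper, not part of Reach. Paths containing newlines
# are not handled.
collect_jars() {
    find "$1" -type f -name '*.jar' | LC_ALL=C sort | paste -s -d ':' -
}

# Possible usage after a build (run from the project root):
#   CLASSPATH="${CLASSPATH:+$CLASSPATH:}$(collect_jars target)"
#   export CLASSPATH
```

The `${CLASSPATH:+$CLASSPATH:}` form avoids a leading colon when the variable starts out empty. Dependency jars that live outside target/ still have to be added separately.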

Running Reach

Processing a directory of `.nxml` papers

The most common use of Reach is to process a directory containing one or more papers in the proper formats. A Wiki page documents the supported input formats.

Some configuration is necessary before running Reach. Please refer to the Running Reach Wiki page for detailed information on configuring and running Reach.

Pre-processing a directory of `.nxml` papers

Reach supports pre-processing documents and storing the intermediate results (lemmatization, POS tags, NER tags, and dependency parses) as serialized files. This supports use cases where new rules are developed iteratively, without spending computational resources more than once on parsing and tagging the nxml files. See pre-processing input formats for the details.

The Interactive Shell

An interactive shell can be run from the command line to process small fragments of entered text. The shell is useful for reviewing and understanding the operation of Reach, including NER, entity, and event processing and rule debugging. To start a Reach shell, run the runReachShell.sh script:

runReachShell.sh

At the shell prompt enter :help to get a list of available commands.

The sieve-based assembly system

Reach now provides a sieve-based system for assembly of event mentions. While still under development, the system currently has support for (1) exact deduplication for both entity and event mentions, (2) unification of mentions through coreference resolution, (3) the reporting of intra- and inter-sentence causal precedence relations (e.g., A causally precedes B) using linguistic features, and (4) a feature-based classifier for causal precedence. Future versions will include additional sieves for causal precedence and improved approximate deduplication.

For more details on the sieve-based assembly system, please refer to the following paper:

@inproceedings{GHP+:2016aa,
  author       = {Gus Hahn-Powell and Dane Bell and Marco A. Valenzuela-Esc\'{a}rcega and Mihai Surdeanu},
  title        = {This before That: Causal Precedence in the Biomedical Domain},
  booktitle    = {Proceedings of the 2016 Workshop on Biomedical Natural Language Processing},
  organization = {Association for Computational Linguistics},
  year         = {2016},
  Note         = {Paper available at \url{https://arxiv.org/abs/1606.08089}}
}

The sieve-based assembly system can be run over a directory of .nxml and/or .csv files:

sbt "runMain org.clulab.reach.RunReachCLI"

In src/main/resources/application.conf, you will need to...

set outputTypes to ["assembly-tsv"]
set your input directory of papers via papersDir
set your output directory via outDir
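The three settings above can be sketched as a small shell helper that prints them in HOCON syntax, ready to copy into application.conf. The function name `assembly_conf` and the example directories are hypothetical, and the sketch assumes the keys sit at the top level of the file; check your copy of application.conf for where they actually appear.

```shell
#!/bin/sh
# assembly_conf PAPERS_DIR OUT_DIR: print the three settings listed
# above as HOCON lines. Hypothetical helper, not part of Reach; it
# only prints text and does not edit application.conf.
assembly_conf() {
    printf 'outputTypes = ["assembly-tsv"]\n'
    printf 'papersDir = "%s"\n' "$1"
    printf 'outDir = "%s"\n' "$2"
}

# Possible usage (placeholder paths):
#   assembly_conf "$HOME/reach/papers" "$HOME/reach/output"
```

Printing the lines rather than rewriting the file in place keeps the existing configuration intact, so each value can be merged by hand.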

Currently, two .tsv files are produced for assembly results within each paper:

results meeting MITRE's (March 2016) requirements
results without MITRE's constraints

Two additional output files are produced for assembly results across all papers:

results meeting MITRE's (March 2016) requirements
results without MITRE's constraints

The interactive Assembly shell

You can interactively explore assembly output for various snippets of text using the assembly shell:

sbt "runMain org.clulab.assembly.AssemblyShell"

Modifying the code

Reach builds upon our Odin event extraction framework. If you want to modify event and entity grammars, please refer to Odin's Wiki page for details. Please read the included Odin manual for details on the rule language and the Odin API.

Reach web services

We have developed a series of web services on top of the Reach library. All are freely available here.

Reach datasets

We have generated multiple datasets by reading publications from the open-access PubMed subset using Reach. All datasets are freely available here.

Funding

The development of Reach was funded by the DARPA Big Mechanism program under ARO contract W911NF-14-1-0395.
