This is the readme file of the user simulator (jig) package for the TREC 2016-2017 Dynamic Domain (DD) Track.
If you have any technical questions, please post them to the Google group.
- Python 3.5 environment
- Updates on CubeTest and new metrics: sDCG and Expected Utility
- Additional supporting file for evaluation
- the jig (jig/jig.py)
- evaluation metric scripts
- sample files
- Operating Systems: Mac OS, Windows, Linux.
- Python 3.5 environment.
- Download the Topics (with ground truth) from NIST.
- Obtain the TREC DD datasets following the instructions here.
- You will need to obtain the New York Times dataset from LDC.
> git clone https://github.com/trec-dd/trec-dd-jig
> cd trec-dd-jig
> pip3 install -r requirements.txt
- Create your own directory to hold your TREC DD search system.
> mkdir your_dd_directory
- Put the jig package under your_dd_directory, that is,
> mv trec-dd-jig your_dd_directory/.
- Unzip the topic (with ground truth) file that you downloaded from NIST and put it under ~/your_dd_directory/trec-dd-jig/jig/topics/
> gunzip dd_topic_file.xml.gz
> mv dd_topic_file.xml your_dd_directory/trec-dd-jig/topics/
- Uncompress and preprocess the datasets. In this jig package, we release sample files from both the Ebola and the NYT datasets. The following commands are for the NYT dataset. They can be run on both the sample files and the actual dataset. Remember to replace the input .tgz file with the actual dataset. The resulting corpus will be in TRECTEXT format.
> cd trec-dd-jig
> ./config/process_nyt.sh sample_doc/nyt_sample.tgz sample_doc/nyt_sample
The first parameter is the .tgz file that you obtained from LDC; the second parameter is the output directory. The output directory will contain the following subdirectories:
- nyt_corpus: a directory that contains the uncompressed New York Times dataset
- nyt_trectext: a directory that contains the processed text in TRECTEXT format
- Set up the database and prepare the parameter files.
> python3 config/setup.py --topics sample_run/topic.xml --trecdirec sample_doc/ebola_sample sample_doc/nyt_sample/nyt_trectext --params sample_run/params
where:
- topics: the topic XML file that you downloaded from NIST
- trecdirec: the directories that hold the Ebola and New York Times datasets in TRECTEXT format
- params: the parameter file needed in evaluation
This script will set up a SQLite database at ./trec-dd-jig/jig/truth.db. It will also generate a parameter file that is used later in the metric calculation. The parameter file for the sample is located at sample_run/params. Remember to replace the file paths with those of the actual datasets.
Congratulations on a successful installation of the jig!
- Your system should call python3 jig/jig.py to get feedback for each search iteration. The jig outputs a JSON-dumped string that gives feedback on the documents you returned. Use the following command to call the jig:
> python3 jig/jig.py -runid my_runid -topic topic_id -docs docno1:rankingscore docno2:rankingscore docno3:rankingscore docno4:rankingscore docno5:rankingscore
where:
- my_runid: an identifier used to declare the run
- topic_id: the id of the topic you are working on
- docno1, docno2, ...: the five document ids that your system returned. They must be document ids from the TREC DD datasets.
- rankingscore: the ranking score of each document, as generated by your system
- Suppose your run id is 'testrun'. Call the jig with the topic id, the run id, and the ids of the 5 documents that your system retrieved, together with their ranking scores:
> python3 jig/jig.py -runid testrun -topic DD16-1 -docs ebola-45b78e7ce50b94276a8d46cfe23e0abbcbed606a2841d1ac6e44e263eaf94a93:833.00 ebola-0000e6cdb20573a838a34023268fe9e2c883b6bcf7538ebf68decd41b95ae747:500.00 ebola-012d04f7dc6af9d1d712df833abc67cd1395c8fe53f3fcfa7082ac4e5614eac6:123.00 ebola-0002c69c8c89c82fea43da8322333d4f78d48367cc8d8672dd8a919e8359e150:34.00 ebola-9e501dddd03039fff5c2465896d39fd6913fd8476f23416373a88bc0f32e793c:5.00
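If your search system is written in Python, the call above can be assembled programmatically. The following is only a minimal sketch: the helper names `build_jig_command` and `call_jig` are hypothetical (they are not part of the jig), and formatting the scores with two decimals is an assumption taken from the example command, not a documented requirement.

```python
import subprocess


def build_jig_command(run_id, topic_id, ranked_docs, jig_path="jig/jig.py"):
    """Return the argument list for one jig call.

    ranked_docs is a sequence of (docno, ranking_score) pairs.
    Two-decimal score formatting is an assumption based on the example above.
    """
    docs = ["{}:{:.2f}".format(docno, score) for docno, score in ranked_docs]
    return ["python3", jig_path, "-runid", run_id, "-topic", topic_id, "-docs"] + docs


def call_jig(run_id, topic_id, ranked_docs):
    """Run the jig once and return the raw text it prints (hypothetical helper)."""
    cmd = build_jig_command(run_id, topic_id, ranked_docs)
    result = subprocess.run(cmd, stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return result.stdout


# Only the command is built here; nothing is executed.
example_cmd = build_jig_command(
    "testrun", "DD16-1",
    [("ebola-45b78e7ce50b94276a8d46cfe23e0abbcbed606a2841d1ac6e44e263eaf94a93", 833),
     ("ebola-9e501dddd03039fff5c2465896d39fd6913fd8476f23416373a88bc0f32e793c", 5.0)])
print(" ".join(example_cmd))
```

Passing the arguments as a list avoids shell quoting issues, and `str.format` is used instead of f-strings so the sketch stays compatible with Python 3.5.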
- The jig will print the feedback on the screen. Each piece of feedback is a JSON-dumped string.
{
  "ranking_score": "833.00",
  "subtopics": [
    {
      "subtopic_id": "DD16-1.1",
      "passage_text": "Marine Lt. Col Doug Woodhams U.S. Army Africa Sgt. Bromley and Liberian Armed Forces Capt. Abraham Karmara discuss construction details with a Liberian contractor at the future location of an Ebola treatment unit near Barclayville Liberia",
      "rating": 2
    },
    { ... },
  ],
  "doc_id": "ebola-45b78e7ce50b94276a8d46cfe23e0abbcbed606a2841d1ac6e44e263eaf94a93",
  "topic_id": "DD16-1",
  "on_topic": "1"
}
{ ... }
where:
- doc_id: the id of a document
- subtopic_id: the id of a relevant subtopic that the document covers
- passage_text: the content of a relevant passage that the document covers
- rating: the graded relevance judgment provided by NIST assessors. Less than 2: marginally relevant, 2: relevant, 3: highly relevant, 4: key results. The relevance grades refer to the relevance level of the passage to a subtopic.
- ranking_score: the ranking score of a document provided by your system
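A search system usually needs only a few of these fields per document. The sketch below shows one possible way to reduce a single feedback object to the best rating per subtopic. The function name `summarize_feedback` is hypothetical, the passage texts are placeholders, and treating a missing "subtopics" key as "no relevant passages" is an assumption about off-topic documents.

```python
import json


def summarize_feedback(feedback):
    """Collapse one feedback object into (doc_id, on_topic, best rating per subtopic)."""
    best = {}
    # Assumption: off-topic documents may carry no "subtopics" entry at all.
    for passage in feedback.get("subtopics", []):
        sid = passage["subtopic_id"]
        rating = passage["rating"]
        best[sid] = max(best.get(sid, rating), rating)
    on_topic = str(feedback["on_topic"]) == "1"
    return feedback["doc_id"], on_topic, best


# A feedback object shaped like the example above (passage texts shortened).
sample = json.loads("""
{
  "ranking_score": "833.00",
  "subtopics": [
    {"subtopic_id": "DD16-1.1", "passage_text": "(passage omitted)", "rating": 2},
    {"subtopic_id": "DD16-1.2", "passage_text": "(passage omitted)", "rating": 3},
    {"subtopic_id": "DD16-1.2", "passage_text": "(passage omitted)", "rating": 2}
  ],
  "doc_id": "ebola-45b78e7ce50b94276a8d46cfe23e0abbcbed606a2841d1ac6e44e263eaf94a93",
  "topic_id": "DD16-1",
  "on_topic": "1"
}
""")

summary = summarize_feedback(sample)
print(summary)
```

How the jig separates the feedback objects for the five documents in its output is not covered here; the sketch assumes you already have one object decoded.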
- A run file will be automatically generated in the current directory with the run id as its name. This is the run file that you will later submit to NIST.
> cat ./testrun.txt
DD16-1 0 ebola-45b78e7ce50b94276a8d46cfe23e0abbcbed606a2841d1ac6e44e263eaf94a93 833.00 1 DD16-1.1:2|DD16-1.1:2|DD16-1.2:3|DD16-1.2:3|DD16-1.2:2
DD16-1 0 ebola-0000e6cdb20573a838a34023268fe9e2c883b6bcf7538ebf68decd41b95ae747 500.00 0
DD16-1 0 ebola-012d04f7dc6af9d1d712df833abc67cd1395c8fe53f3fcfa7082ac4e5614eac6 123.00 1 DD16-1.1:3
DD16-1 0 ebola-0002c69c8c89c82fea43da8322333d4f78d48367cc8d8672dd8a919e8359e150 34.00 0
DD16-1 0 ebola-9e501dddd03039fff5c2465896d39fd6913fd8476f23416373a88bc0f32e793c 5.00 1 DD16-1.1:2
where:
- topic id: the id of the topic you are working on
- iteration_number: the ordinal number of the iteration for the current topic
- docid: the id of a document
- ranking score: the ranking score of a document provided by your system
- on topic: 0 means the document is off topic, 1 means on topic
- subtopic_id: the id of a relevant subtopic that your document covers
- rating: the relevance grade provided by NIST assessors. -1/0/1: marginally relevant (note that ratings of -1, 0, and 1 all mean marginally relevant), 2: relevant, 3: highly relevant, 4: key results. The relevance grades refer to the relevance level of your document to the whole topic.
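To inspect a run file before submission, each line can be split back into the fields listed above. This is a sketch only: `parse_run_line` is a hypothetical helper, and it assumes the columns are whitespace-separated and that the subtopic column is absent for off-topic documents, as in the sample output.

```python
def parse_run_line(line):
    """Split one run-file line into its fields (assumes whitespace-separated columns)."""
    fields = line.split()
    topic_id, iteration, doc_id, score, on_topic = fields[:5]
    subtopics = []
    if len(fields) > 5:
        # The last column holds subtopic_id:rating pairs joined by "|".
        for item in fields[5].split("|"):
            subtopic_id, rating = item.rsplit(":", 1)
            subtopics.append((subtopic_id, int(rating)))
    return {
        "topic_id": topic_id,
        "iteration": int(iteration),
        "doc_id": doc_id,
        "ranking_score": float(score),
        "on_topic": on_topic == "1",
        "subtopics": subtopics,
    }


on_topic_row = parse_run_line(
    "DD16-1 0 ebola-012d04f7dc6af9d1d712df833abc67cd1395c8fe53f3fcfa7082ac4e5614eac6 123.00 1 DD16-1.1:3")
off_topic_row = parse_run_line(
    "DD16-1 0 ebola-0000e6cdb20573a838a34023268fe9e2c883b6bcf7538ebf68decd41b95ae747 500.00 0")
print(on_topic_row["subtopics"], off_topic_row["on_topic"])
```

Splitting each pair on the last colon keeps the subtopic id intact and leaves negative ratings such as -1 parseable.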
- Every time you run jig/jig.py for the same topic id, the new results are automatically appended to the run file and the iteration number increases by one.
- In 2017, the Track mainly uses three metrics: Cube Test, Session DCG (sDCG) and Expected Utility. We will provide both the raw scores and the normalized scores for each of them. We also include Precision as an additional metric. The scripts for these metrics can be found in the ./scorer directory.
- You will need the actual topic XML file from NIST and the parameter file generated during installation to evaluate your runs. Remember to replace the sample files in the example commands with your own.
- The following input arguments are shared by the scoring scripts:
- runfile: the run file generated by the jig
- topics: the ground truth file that is downloaded from NIST
- params: the parameter file generated during installation
- cutoff: the number of iterations to be evaluated, e.g. --cutoff 5 means the scores are calculated based on the first 5 iterations.
- To run the scorers:
- Cube Test
Syntax
>$ python3 scorer/cubetest.py --runfile your_runfile --topics your_topic.xml --params your_params_file --cutoff your_cut_off_value
To run the example
>$ python3 scorer/cubetest.py --runfile testrun.txt --topics sample_run/topic.xml --params sample_run/params --cutoff 5
- Session DCG (sDCG)
Syntax
>$ python3 scorer/sDCG.py --runfile your_runfile --topics your_topic.xml --params your_params_file --cutoff your_cut_off_value
To run the example
>$ python3 scorer/sDCG.py --runfile testrun.txt --topics sample_run/topic.xml --params sample_run/params --cutoff 5
- Expected Utility
Syntax
>$ python3 scorer/expected_utility.py --runfile your_runfile --topics your_topic.xml --params your_params_file --cutoff your_cut_off_value
To run the example
>$ python3 scorer/expected_utility.py --runfile testrun.txt --topics sample_run/topic.xml --params sample_run/params --cutoff 5
- Precision (up to the current iteration)
Syntax
>$ python3 scorer/precision.py -run your_run_file -qrel your_qrel_file
To run the example
>$ python3 scorer/precision.py -run testrun.txt -qrel sample_run/qrels.txt
You can find the parameter files for TREC DD 2015 and TREC DD 2016 here.
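Because Cube Test, sDCG and Expected Utility take the same four arguments, they can be run from one loop. The sketch below is one possible wrapper, not part of the package: `scorer_commands` and `run_scorers` are hypothetical names, and the script paths are the ones used in the commands above. Precision is left out because it takes different arguments.

```python
import subprocess

# Scorer scripts that share --runfile, --topics, --params and --cutoff.
SCORERS = ["scorer/cubetest.py", "scorer/sDCG.py", "scorer/expected_utility.py"]


def scorer_commands(runfile, topics, params, cutoff):
    """Build one argument list per scorer script."""
    return [
        ["python3", script,
         "--runfile", runfile,
         "--topics", topics,
         "--params", params,
         "--cutoff", str(cutoff)]
        for script in SCORERS
    ]


def run_scorers(runfile, topics, params, cutoff):
    """Run each scorer in turn; output goes to the screen (hypothetical helper)."""
    for cmd in scorer_commands(runfile, topics, params, cutoff):
        subprocess.run(cmd, check=True)


# Build (but do not execute) the commands for the sample run.
commands = scorer_commands("testrun.txt", "sample_run/topic.xml", "sample_run/params", 5)
for cmd in commands:
    print(" ".join(cmd))
```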