Reading Comprehension with Commonsense Reasoning Dataset (ReCoRD) is a large-scale reading comprehension dataset which requires commonsense reasoning. ReCoRD consists of queries automatically generated from CNN/Daily Mail news articles; the answer to each query is a text span from a summarizing passage of the corresponding news article. The goal of ReCoRD is to evaluate a machine's ability to perform commonsense reasoning in reading comprehension. ReCoRD is pronounced as [ˈrɛkərd].
ReCoRD contains 120,000+ queries from 70,000+ news articles. Each query has been validated by crowdworkers. Unlike existing reading comprehension datasets, ReCoRD contains a large portion of queries requiring commonsense reasoning, making it a challenging benchmark for bridging the gap between human and machine commonsense reading comprehension.
ReCoRD paper (Zhang et al. '18)

Browse examples in ReCoRD in a friendly way: Browse ReCoRD

We've built a few resources to help you get started with the dataset.
Download a copy of the dataset in JSON format:
Read the following Readme to get familiar with the data structure.
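If you want to poke at the data directly, here is a minimal Python loading sketch. The file name, field names, and inclusive entity spans are assumptions based on the structure described in the Readme, so treat it as illustrative rather than authoritative.

```python
import json

# Minimal loading sketch (assumed layout, per the Readme): each entry carries a
# passage (text plus entity character spans) and a list of cloze-style queries
# whose "@placeholder" must be filled by one of the passage entities.
with open("dev.json") as f:          # path is illustrative
    dataset = json.load(f)

for example in dataset["data"]:
    passage_text = example["passage"]["text"]
    # Candidate answers: entity mentions marked by character offsets in the
    # passage (spans assumed inclusive here).
    candidates = {passage_text[e["start"]:e["end"] + 1]
                  for e in example["passage"]["entities"]}
    for qa in example["qas"]:
        query = qa["query"]                       # contains "@placeholder"
        gold = [a["text"] for a in qa["answers"]]
        # A model's job: pick the candidate that best fills the placeholder.
```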
To evaluate your models, we have also made available the evaluation script we will use for official evaluation, along with a sample prediction file that the script will take as input. To run the evaluation, use `python evaluate.py <path_to_dev> <path_to_predictions>`.
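For reference, here is a hedged sketch of producing a prediction file. The id-to-answer-string JSON format is an assumption modeled on SQuAD-style evaluation scripts; defer to the sample prediction file above for the exact schema.

```python
import json

# Assumed prediction format: a JSON object mapping each query id to a single
# predicted answer string. The id and answer below are purely illustrative.
predictions = {"example-query-id": "Joe Biden"}

with open("predictions.json", "w") as f:
    json.dump(predictions, f)

# Then score against the dev set:
#   python evaluate.py dev.json predictions.json
```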
Once you have built a model that works to your expectations on the dev set, you can submit it to get official scores on the dev set and a hidden test set. To preserve the integrity of test results, we do not release the test set to the public. Instead, we require you to submit your model so that we can run it on the test set for you. Here's a tutorial walking you through official evaluation of your model: Submission Tutorial
ReCoRD contains passages from two news domains, CNN and Daily Mail. We make them public under the following licenses:
Ask us questions at our Google group or at zsheng2@jhu.edu.
We thank the SQuAD team for allowing us to use their code and templates for generating this website.
Rank | Date | Model | Institution | EM | F1
---|---|---|---|---|---
– | – | Human Performance (Zhang et al. '18) | Johns Hopkins University | 91.31 | 91.69
1 | Mar 26, 2020 | LUKE (single model) | Studio Ousia & NAIST & RIKEN AIP | 90.64 | 91.21
2 | Jul 20, 2019 | XLNet + MTL + Verifier (ensemble) | PingAn Smart Health & SJTU | 83.09 | 83.74
3 | Jul 20, 2019 | XLNet + MTL + Verifier (single model) | PingAn Smart Health & SJTU | 81.46 | 82.66
3 | Jul 09, 2019 | CSRLM (single model) | Anonymous | 81.78 | 82.58
4 | Jul 24, 2019 | SKG-NET (single model) | Anonymous | 79.48 | 80.04
5 | Jan 11, 2019 | KT-NET (single model) | Baidu NLP | 71.60 | 73.62
5 | May 16, 2019 | SKG-BERT (single model) | Anonymous | 72.24 | 72.78
6 | Nov 29, 2018 | DCReader+BERT (single model) | Anonymous | 69.49 | 71.14
7 | Oct 08, 2020 | GraphBert (single model) | Anonymous | 60.80 | 62.99
8 | Oct 07, 2020 | GraphBert-WordNet (single model) | Anonymous | 59.86 | 61.89
9 | Oct 08, 2020 | GraphBert-NELL (single model) | Anonymous | 59.41 | 61.51
10 | Nov 16, 2018 | BERT-Base (single model; modification of the Google AI implementation, https://arxiv.org/pdf/1810.04805.pdf) | JHU | 54.04 | 56.07
11 | Oct 25, 2018 | DocumentQA w/ ELMo (single model; modification of the AllenNLP implementation, https://arxiv.org/pdf/1710.10723.pdf) | JHU | 45.44 | 46.65
12 | Oct 25, 2018 | SAN (single model; https://arxiv.org/pdf/1712.03556.pdf) | Microsoft Business Applications Research Group | 39.77 | 40.72
13 | Oct 25, 2018 | DocumentQA (single model; modification of the AllenNLP implementation, https://arxiv.org/pdf/1710.10723.pdf) | JHU | 38.52 | 39.76
14 | Oct 25, 2018 | ASReader (single model; modification of the IBM Watson implementation, https://arxiv.org/pdf/1603.01547.pdf) | JHU | 29.80 | 30.35
15 | Oct 25, 2018 | Random Guess | JHU | 18.55 | 19.12
16 | Oct 25, 2018 | Language Models (single model; modification of the Google Brain implementation, https://arxiv.org/pdf/1806.02847.pdf) | JHU | 17.57 | 18.15