ReCoRD

Reading Comprehension with Commonsense Reasoning Dataset

News

  • 03/17/2019 ReCoRD is now a shared task in the COIN workshop at EMNLP 2019.
  • 07/15/2019 SuperGLUE added ReCoRD to its evaluation suite.

What is ReCoRD?

Reading Comprehension with Commonsense Reasoning Dataset (ReCoRD) is a large-scale reading comprehension dataset which requires commonsense reasoning. ReCoRD consists of queries automatically generated from CNN/Daily Mail news articles; the answer to each query is a text span from a passage summarizing the corresponding news article. The goal of ReCoRD is to evaluate a machine's ability to apply commonsense reasoning in reading comprehension. ReCoRD is pronounced [ˈrɛkərd].


ReCoRD contains 120,000+ queries from 70,000+ news articles. Each query has been validated by crowdworkers. Unlike existing reading comprehension datasets, ReCoRD contains a large portion of queries requiring commonsense reasoning, thus presenting a good challenge for future research to bridge the gap between human and machine commonsense reading comprehension.

ReCoRD paper (Zhang et al. '18)

Browse examples from ReCoRD in a friendly interface:

Browse ReCoRD

Getting Started

We've built a few resources to help you get started with the dataset.

Download a copy of the dataset in JSON format:

Read the following README to get familiar with the data structure.
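
If you want a quick look at the data before reading the README in detail, here is a minimal sketch of walking the dev-set JSON. It assumes the structure documented in the README: a top-level "data" list of news passages, each with entity mentions given as character offsets plus a list of cloze-style queries ("qas") whose "query" text contains an @placeholder. The field names and the file name dev.json are illustrative; defer to the README if they differ.

    import json

    # Illustrative walk over the ReCoRD dev set, assuming the JSON layout
    # described in the README (field names may differ; check the README).
    with open("dev.json") as f:
        dataset = json.load(f)

    for article in dataset["data"]:
        passage = article["passage"]["text"]
        # Candidate answers are entity mentions, given as character offsets
        # into the passage (offsets treated as inclusive here).
        candidates = {passage[e["start"]:e["end"] + 1]
                      for e in article["passage"]["entities"]}
        for qa in article["qas"]:
            query = qa["query"]                      # contains "@placeholder"
            gold = [a["text"] for a in qa["answers"]]
            print(query, "->", gold, "| candidates:", candidates)
        break  # peek at the first passage only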

To evaluate your models, we have also made available the evaluation script we will use for official evaluation, along with a sample prediction file that the script will take as input. To run the evaluation, use python evaluate.py <path_to_dev> <path_to_predictions>.
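
As a concrete end-to-end example, the sketch below writes a trivial baseline prediction file and then invokes the script above. It assumes the prediction file is a single JSON object mapping each query id to an answer string, mirroring the sample prediction file; the most-frequent-entity "model" is only a stand-in for your own system.

    import json
    import subprocess

    with open("dev.json") as f:
        dataset = json.load(f)

    # Placeholder "model": always predict the passage's most frequent entity mention.
    predictions = {}
    for article in dataset["data"]:
        passage = article["passage"]["text"]
        mentions = [passage[e["start"]:e["end"] + 1]
                    for e in article["passage"]["entities"]]
        guess = max(set(mentions), key=mentions.count) if mentions else ""
        for qa in article["qas"]:
            predictions[qa["id"]] = guess

    # Assumed format: {query_id: answer_string, ...}, as in the sample prediction file.
    with open("predictions.json", "w") as f:
        json.dump(predictions, f)

    # Official metrics (EM and F1) on the dev set:
    subprocess.run(["python", "evaluate.py", "dev.json", "predictions.json"], check=True)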

Once you have built a model that works to your expectations on the dev set, you can submit it to get official scores on the dev set and a hidden test set. To preserve the integrity of test results, we do not release the test set to the public. Instead, we require you to submit your model so that we can run it on the test set for you. Here's a tutorial walking you through the official evaluation of your model: Submission Tutorial

License

ReCoRD contains passages from two news domains, CNN and Daily Mail. We make them public under the following licenses:

Have Questions?

Ask us questions at our Google group or at zsheng2@jhu.edu.

Acknowledgements

We thank the SQuAD team for allowing us to use their code and templates for generating this website.

Leaderboard

Rank | Date         | Model                                                                                                                       | EM    | F1
--   | --           | Human Performance, Johns Hopkins University (Zhang et al. '18)                                                              | 91.31 | 91.69
1    | Jan 11, 2019 | KT-NET (single model), Baidu NLP                                                                                            | 73.01 | 74.76
2    | May 16, 2019 | SKG-BERT (single model), Anonymous                                                                                          | 72.24 | 72.78
3    | Nov 29, 2018 | DCReader+BERT (single model), Anonymous                                                                                     | 70.49 | 71.98
4    | Nov 16, 2018 | BERT-Base (single model), JHU [modification of the Google AI implementation] (https://arxiv.org/pdf/1810.04805.pdf)         | 55.99 | 57.99
5    | Oct 25, 2018 | DocumentQA w/ ELMo (single model), JHU [modification of the AllenNLP implementation] (https://arxiv.org/pdf/1710.10723.pdf) | 45.44 | 46.65
6    | Oct 25, 2018 | SAN (single model), Microsoft Business Applications Research Group (https://arxiv.org/pdf/1712.03556.pdf)                   | 39.77 | 40.72
7    | Oct 25, 2018 | DocumentQA (single model), JHU [modification of the AllenNLP implementation] (https://arxiv.org/pdf/1710.10723.pdf)         | 38.52 | 39.76
8    | Oct 25, 2018 | ASReader (single model), JHU [modification of the IBM Watson implementation] (https://arxiv.org/pdf/1603.01547.pdf)         | 29.80 | 30.35
9    | Oct 25, 2018 | Random Guess, JHU                                                                                                           | 18.55 | 19.12
10   | Oct 25, 2018 | Language Models (single model), JHU [modification of the Google Brain implementation] (https://arxiv.org/pdf/1806.02847.pdf) | 17.57 | 18.15