ReCoRD

Reading Comprehension with Commonsense Reasoning Dataset

News

  • 07/15/2019 SuperGLUE added ReCoRD to its evaluation suite.
  • 03/17/2019 ReCoRD is now a shared task in the COIN workshop at EMNLP 2019.

What is ReCoRD?

Reading Comprehension with Commonsense Reasoning Dataset (ReCoRD) is a large-scale reading comprehension dataset that requires commonsense reasoning. ReCoRD consists of queries automatically generated from CNN/Daily Mail news articles; the answer to each query is a text span from a summarizing passage of the corresponding news article. The goal of ReCoRD is to evaluate a machine's ability to apply commonsense reasoning in reading comprehension. ReCoRD is pronounced [ˈrɛkərd].


ReCoRD contains 120,000+ queries from 70,000+ news articles. Each query has been validated by crowdworkers. Unlike existing reading comprehension datasets, ReCoRD contains a large portion of queries requiring commonsense reasoning, thus presenting a good challenge for future research to bridge the gap between human and machine commonsense reading comprehension.
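
To make the task concrete, here is a constructed illustration (our own, not an entry from the dataset): the query is a cloze-style statement whose missing entity, marked @placeholder, must be filled with an entity mentioned in the passage.

    Passage: Hurricane Fay weakened to a tropical storm on Tuesday, officials
             said, sparing coastal towns the worst of the damage.
    Query:   Forecasters cautioned that @placeholder could still regain
             strength over warm waters.
    Answer:  Hurricane Fay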

ReCoRD paper (Zhang et al. '18)

Browse examples from ReCoRD in a friendly interface:

Browse ReCoRD

Getting Started

We've built a few resources to help you get started with the dataset.

Download a copy of the dataset in JSON format:

Read the README to get familiar with the data structure.
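
As a quick orientation, the sketch below loads the dev split and walks one entry. The file path and field names are assumptions based on common releases of the data; the README is authoritative if your copy differs.

    import json

    # Load a local copy of the dev split (path is an assumption; adjust to yours).
    with open("dev.json") as f:
        dataset = json.load(f)

    # Each entry pairs one summarizing passage with its queries. Field names are
    # assumed from common releases; check the README for the authoritative layout.
    article = dataset["data"][0]
    text = article["passage"]["text"]

    # "entities" lists candidate answer spans as character offsets into the
    # passage (offsets assumed inclusive on both ends here).
    candidates = [text[e["start"]:e["end"] + 1]
                  for e in article["passage"]["entities"]]
    print("Candidate entities:", candidates[:5])

    for qa in article["qas"]:
        print("Query:  ", qa["query"])                         # contains @placeholder
        print("Answers:", [a["text"] for a in qa["answers"]])  # gold entity span(s)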

To evaluate your models, we have also made available the evaluation script we will use for official evaluation, along with a sample prediction file that the script will take as input. To run the evaluation, use python evaluate.py <path_to_dev> <path_to_predictions>.
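
For reference, here is a minimal sketch of producing a prediction file, assuming the format is a single JSON object mapping each query id to an answer string (the SQuAD-style convention); consult the sample prediction file if yours differs. The script reports EM and F1, the two metrics on the leaderboard below.

    import json

    with open("dev.json") as f:
        dataset = json.load(f)

    # Trivial baseline for illustration only: always predict the passage's
    # first marked entity for every query.
    predictions = {}
    for article in dataset["data"]:
        text = article["passage"]["text"]
        first = article["passage"]["entities"][0]
        answer = text[first["start"]:first["end"] + 1]
        for qa in article["qas"]:
            predictions[qa["id"]] = answer

    with open("predictions.json", "w") as f:
        json.dump(predictions, f)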

Once you have built a model that works to your expectations on the dev set, you can submit it to get official scores on the dev set and a hidden test set. To preserve the integrity of test results, we do not release the test set to the public. Instead, we require you to submit your model so that we can run it on the test set for you. Here's a tutorial walking you through the official evaluation of your model: Submission Tutorial

License

ReCoRD contains passages from two domains (CNN and Daily Mail news articles). We make them public under the following licenses:

Have Questions?

Ask us questions at our Google group or at zsheng2@jhu.edu.

Acknowledgements

We thank the SQuAD team for allowing us to use their code and templates for generating this website.

Leaderboard

Rank | Date         | Model                                 | Institution                                                                                  | EM    | F1
-----|--------------|---------------------------------------|----------------------------------------------------------------------------------------------|-------|------
  -  | -            | Human Performance (Zhang et al. '18) | Johns Hopkins University                                                                     | 91.31 | 91.69
  1  | Mar 26, 2020 | LUKE (single model)                   | Studio Ousia & NAIST & RIKEN AIP                                                             | 90.64 | 91.21
  2  | Jul 20, 2019 | XLNet + MTL + Verifier (ensemble)     | PingAn Smart Health & SJTU                                                                   | 83.09 | 83.74
  3  | Jul 20, 2019 | XLNet + MTL + Verifier (single model) | PingAn Smart Health & SJTU                                                                   | 81.46 | 82.66
  3  | Jul 09, 2019 | CSRLM (single model)                  | Anonymous                                                                                    | 81.78 | 82.58
  4  | Jul 24, 2019 | SKG-NET (single model)                | Anonymous                                                                                    | 79.48 | 80.04
  5  | Jan 11, 2019 | KT-NET (single model)                 | Baidu NLP                                                                                    | 71.60 | 73.62
  5  | May 16, 2019 | SKG-BERT (single model)               | Anonymous                                                                                    | 72.24 | 72.78
  6  | Nov 29, 2018 | DCReader+BERT (single model)          | Anonymous                                                                                    | 69.49 | 71.14
  7  | Oct 08, 2020 | GraphBert (single)                    | Anonymous                                                                                    | 60.80 | 62.99
  8  | Oct 07, 2020 | GraphBert-WordNet (single)            | Anonymous                                                                                    | 59.86 | 61.89
  9  | Oct 08, 2020 | GraphBert-NELL (single)               | Anonymous                                                                                    | 59.41 | 61.51
 10  | Nov 16, 2018 | BERT-Base (single model)              | JHU [modification of the Google AI implementation] (https://arxiv.org/pdf/1810.04805.pdf)    | 54.04 | 56.07
 11  | Oct 25, 2018 | DocumentQA w/ ELMo (single model)     | JHU [modification of the AllenNLP implementation] (https://arxiv.org/pdf/1710.10723.pdf)     | 45.44 | 46.65
 12  | Oct 25, 2018 | SAN (single model)                    | Microsoft Business Applications Research Group (https://arxiv.org/pdf/1712.03556.pdf)        | 39.77 | 40.72
 13  | Oct 25, 2018 | DocumentQA (single model)             | JHU [modification of the AllenNLP implementation] (https://arxiv.org/pdf/1710.10723.pdf)     | 38.52 | 39.76
 14  | Oct 25, 2018 | ASReader (single model)               | JHU [modification of the IBM Watson implementation] (https://arxiv.org/pdf/1603.01547.pdf)   | 29.80 | 30.35
 15  | Oct 25, 2018 | Random Guess                          | JHU                                                                                          | 18.55 | 19.12
 16  | Oct 25, 2018 | Language Models (single model)        | JHU [modification of the Google Brain implementation] (https://arxiv.org/pdf/1806.02847.pdf) | 17.57 | 18.15