An Effective Platform for Tabular Perturbation

About

To truly gauge its reasoning ability, a Natural Language Inference (NLI) model should be evaluated on counterfactual data. TabPert facilitates this by assisting in the generation of such counterfactual data for probing a model's tabular reasoning. TabPert allows a user to update a table, change its associated hypotheses, change their labels, and highlight rows that are important for hypothesis classification. TabPert also captures information about the techniques used to automatically produce the table, as well as the strategies employed to generate the challenging hypotheses. These counterfactual tables and hypotheses, together with the metadata, can then be used to explore an existing model's shortcomings methodically and quantitatively.

tl;dr: TabPert is a tool to augment existing tabular datasets to effectively and efficiently create counterfactual datasets.

Right Inference ≠ Right Reasoning

Existing NLI models tend to exploit annotation artefacts, pre-trained knowledge, and hypothesis biases in the data to predict the label for a premise-hypothesis pair. Therefore, to test their performance effectively, we must evaluate them on adversarial data, on which their performance drops significantly. However, perturbing existing tabular datasets to create counterfactual data is difficult and inefficient without specialised tools. This is where TabPert comes in!

The Utility of TabPert

TabPert is built specifically to allow annotators to effectively and efficiently perturb tabular data and hypotheses.

Example Table

For a case study of TabPert, we use the InfoTabS dataset. Below is an inference example from InfoTabS: on the right is a premise (a semi-structured table) and on the left are some of its hypotheses. The colors green, gray, and red mark true (i.e., entailment), maybe true (i.e., neutral), and false (i.e., contradiction) statements, respectively.
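For readers without the rendered example, here is a minimal sketch of the same structure in Python; the premise keys, values, and hypotheses below are invented for illustration and are not taken from InfoTabS:

    # An illustrative InfoTabS-style example (all values invented): the premise
    # is a semi-structured table, and each hypothesis carries an NLI label.
    premise = {"Born": "7 June 1985", "Occupation": "singer"}
    hypotheses = [
        ("The person was born in the 1980s.", "E"),  # green: entailment
        ("The person has won a Grammy.", "N"),       # gray: neutral
        ("The person is an architect.", "C"),        # red: contradiction
    ]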

How does it Work?

Annotation proceeds in two stages: the automatic stage followed by the manual stage. Metadata is collected in both stages: metadata for changes to the table is logged automatically, while metadata for hypothesis changes is input by the annotator.

In the automatic perturbation stage, values are shuffled across the tables in the dataset: a value of one 'type' can be replaced by another value of the same type, where the types must be specified beforehand. TabPert automatically logs the 'source' of each shuffled value, and this provenance can later be used to uncover model shortcomings such as overfitting.
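As a rough illustration of this stage, here is a minimal Python sketch of type-constrained shuffling with source logging; the type map, table format, and function names are our own assumptions, not TabPert's actual implementation:

    # A minimal sketch of type-constrained value shuffling (not TabPert's code).
    import random

    TYPE_OF_KEY = {"Born": "date", "Occupation": "profession"}  # hypothetical type map

    def shuffle_values(tables):
        """Replace each typed cell with a same-type value from the dataset,
        logging which table the replacement value came from."""
        # Pool all values by type, remembering the source table of each.
        pools = {}
        for tid, table in tables.items():
            for key, value in table.items():
                t = TYPE_OF_KEY.get(key)
                if t is not None:
                    pools.setdefault(t, []).append((value, tid))
        perturbed, provenance = {}, {}
        for tid, table in tables.items():
            new_table = {}
            for key, value in table.items():
                t = TYPE_OF_KEY.get(key)
                if t is None:
                    new_table[key] = value  # untyped keys are left untouched
                    continue
                new_value, src = random.choice(pools[t])
                new_table[key] = new_value
                provenance[(tid, key)] = src  # metadata: where the value came from
            perturbed[tid] = new_table
        return perturbed, provenance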

In the manual perturbation stage, annotators correct any logical inconsistencies introduced into the table during the automatic stage, and perturb the hypotheses. TabPert automatically logs the kinds of changes made to the table. For the hypotheses, the annotator must manually specify the relevant rows (the sections of the table necessary to answer a hypothesis) as well as the strategy used to change each hypothesis. For more information about this stage, click here.
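The metadata gathered for each hypothesis in this stage might be modelled roughly as follows; the class and field names are hypothetical, not TabPert's storage schema:

    # A hypothetical record of per-hypothesis annotation metadata.
    from dataclasses import dataclass
    from typing import List

    @dataclass
    class HypothesisAnnotation:
        text: str                # the (possibly rewritten) hypothesis
        label: str               # "E" (entail), "N" (neutral), or "C" (contradict)
        relevant_rows: List[str] # table keys needed to decide the label
        strategy: str            # perturbation strategy chosen by the annotator

    example = HypothesisAnnotation(
        text="Bob was born before 1990.",        # invented example
        label="C",
        relevant_rows=["Born"],
        strategy="label flip",                   # illustrative strategy name
    )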

Results

Our team used TabPert to create a counterfactual dataset from the InfoTabS dataset, consisting of 47 tables with 423 hypotheses. We sorted the hypotheses according to the perturbation strategy marked by the annotators and ran the InfoTabS RoBERTaLarge model on them. The results are shown in the figures below. Note especially the significant performance drop on the counterfactual data for the first three strategies: these were the methods where annotators flipped the label of the hypothesis. So, collecting all that metadata has helped us detect significant hypothesis bias in the model!
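A per-strategy breakdown like the one described here could be computed along these lines, reusing the hypothetical HypothesisAnnotation record sketched earlier (predict stands in for any NLI model, and is an assumption of ours):

    # A minimal sketch of accuracy grouped by perturbation strategy.
    from collections import defaultdict

    def accuracy_by_strategy(examples, predict):
        """examples: iterable of (HypothesisAnnotation, gold_label) pairs;
        predict: callable mapping a hypothesis string to a predicted label."""
        correct, total = defaultdict(int), defaultdict(int)
        for meta, gold in examples:
            total[meta.strategy] += 1
            if predict(meta.text) == gold:
                correct[meta.strategy] += 1
        return {s: 100.0 * correct[s] / total[s] for s in total}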

Now, as another example, let's look at the table below, which shows how the model performs when given only partial premises. First, note that when shown just the hypotheses without any premise, the model's performance is closer to majority-label baselines on the counterfactual dataset than on the original dataset. This confirms a reduction in hypothesis bias in the new dataset. Next, when shown only the relevant rows while answering a hypothesis, the model's performance falls on the original dataset, indicating that it utilises irrelevant rows as artefacts. However, in the same situation, the performance improves on the counterfactual dataset!

Model Type         Original   Counterfactual
Majority           33.33      33.33
Hypothesis Only    64.32      44.85
All Rows           78.91      61.26
Relevant Rows      74.11      65.85
Human              84.8       85.8
Performance (accuracy %) of the InfoTabS RoBERTaLarge model on original and counterfactual annotated data.
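To make the premise-ablation settings in the table above concrete, here is a small sketch of how such inputs could be built; the linearisation format and names are our own assumptions, not necessarily what the InfoTabS pipeline uses:

    # A sketch of building full, relevant-rows-only, and hypothesis-only premises.
    def linearise(table, keys=None):
        """Render selected rows of a table as a flat premise string."""
        keys = keys if keys is not None else list(table)
        return " ; ".join(f"{k} : {table[k]}" for k in keys)

    table = {"Born": "1985", "Occupation": "singer"}  # toy premise
    full_premise = linearise(table)                   # "All Rows" setting
    relevant_premise = linearise(table, ["Born"])     # "Relevant Rows" setting
    empty_premise = ""                                # "Hypothesis Only" setting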

So, TabPert has both helped create an effective adversarial dataset and helped identify model weaknesses!

People

TabPert has been prepared by the following people at IIT Kanpur and the School of Computing at the University of Utah:

From left to right, Nupur Jain, Vivek Gupta, Anshul Rai and Gaurav Kumar.

Citation

Please cite our paper as below if you use TabPert.

@inproceedings{jain-etal-2021-tabpert,
    title = "{T}ab{P}ert : An Effective Platform for Tabular Perturbation",
    author = "Jain, Nupur  and
      Gupta, Vivek  and
      Rai, Anshul  and
      Kumar, Gaurav",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
    month = nov,
    year = "2021",
    address = "Online and Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.emnlp-demo.39",
    pages = "350--360",
    abstract = "To grasp the true reasoning ability, the Natural Language Inference model should be evaluated on counterfactual data. TabPert facilitates this by generation of such counterfactual data for assessing model tabular reasoning issues. TabPert allows the user to update a table, change the hypothesis, change the labels, and highlight rows that are important for hypothesis classification. TabPert also details the technique used to automatically produce the table, as well as the strategies employed to generate the challenging hypothesis. These counterfactual tables and hypotheses, as well as the metadata, is then used to explore the existing model{'}s shortcomings methodically and quantitatively.",
}

Acknowledgement

The authors appreciate the Utah NLP group members' valuable suggestions at various phases of the project, as well as the reviewers' helpful remarks. We also acknowledge NSF grants #1801446 (SATC) and #1822877 (Cyberlearning), as well as a kind donation from Verisk Inc. We would also like to thank Vibinex (Alokit Innovations Pvt Ltd) for providing a mentoring platform.

Annotation Instructions

These are the instructions for using TabPert. You should watch the demonstration video to see the platform in action.

When you launch TabPert in your browser and open a particular table for which you wish to generate counterfactual data, you will be presented with three tables (Table A, Table B, Table C), whose entries were created during the automatic stage by shuffling data from similar tables.

These automatically perturbed tables can then be edited manually as needed. There are two major ways to do so:

Changing the Table: We can change the table in several ways, as described below:

  • Changing the value in a particular entry: Place the cursor in the entry and edit its value directly.
  • Changing the key: Press the small pencil button next to the key you wish to change and enter the key's new value in the popup.
  • Adding a new entry or deleting an existing entry: Drag and drop the entry using the “add” and “delete” areas at the top of the tables.
  • Adding a new section: Click the 'Add section' button below a table.
  • Deleting a section: Press the pencil icon beside the key and click 'Delete'.
Changing the Hypothesis: We can also make changes to the hypotheses and capture the strategies used to make these changes in TabPert. The various ways are as follows:

  • Adding the Table Changing strategy: This is done by pressing the “+” button next to the hypothesis, which opens a pop-up. There, the user can select the relevant rows used to arrive at the correct label for the hypothesis, and also select the strategy used to change the hypothesis label.