Using the MPQA Annotation Checker

First, a great big THANK YOU to David Pierce for writing this GATE module!

If this is your first time running the MPQA Annotation Checker, you will need to load this processing resource into GATE and set up a few additional GATE resources. As long as you do not delete these resources from your GATE session, they should be loaded automatically every time you run GATE from then on.

Start GATE on your machine, making sure that it is pointing to the MPQA creole.xml resource. See startgate.html for instructions.

Loading the MPQA Annotation Checker

  1. Right click on Processing Resources
  2. New -> MPQA New Annotation Checker
  3. Click OK in the 'Parameters for the new MPQA Annotation Checker' window that will open

The MPQA Annotation Checker resource should be added under Processing Resources in your GATE window.

Creating a New GATE Corpus

  1. Right click on Language Resources
  2. New -> GATE corpus
  3. Click OK in the 'Parameters for the new GATE corpus' window that will open

A GATE corpus with an automatically assigned name should be added under Language Resources in your GATE window.

Creating a New Corpus Pipeline

  1. Right click on Applications
  2. New -> Corpus Pipeline
  3. Click OK in the 'Parameters for the new Corpus Pipeline' window that will open
  4. The new Corpus Pipeline will be added under Applications in your GATE window. Double click on the new Corpus Pipeline. It will open up the Corpus Pipeline for configuration in your main GATE window.
  5. Select MPQA New Annotation Checker under Loaded Processing resources and click the -> button to move the resource to Selected Processing resources.
  6. If your new GATE corpus is not already listed for the Corpus parameter, select your newly created GATE corpus for this parameter using the down arrow.
  7. Double click on the MPQA New Annotation Checker under Selected Processing resources. This will bring the parameters for the MPQA New Annotation Checker into the middle frame.
  8. Click under the Value heading and change the annotationStyle from broad to deep.

Your new corpus pipeline setup should look like this:

Checking an Annotated Document

  1. Add the Document to your GATE Corpus

    If you have not already done so, you will now need to load the annotated document that you want to check into GATE:

    1. Double click on your GATE corpus, under Language Resources. This will open up the corpus into the main GATE window.
    2. In the upper-left corner of the Corpus window, there is a green '+' (add) button and a red 'x' (delete) button. The green button is for adding documents to your corpus; the red button is for removing documents from your corpus. Click on the green + button to open the Add documents to corpus window.
    3. Select the document that you want to check, and click OK to add it to the corpus.
  2. Run the New Annotation Checker by opening the Corpus Pipeline window and clicking on the Run button at the bottom right.

  3. The tab for the Messages window should have lit up. Go to the Messages window to see what errors the Checker found in your annotated document. The list of errors will look something like this:

    To briefly explain some of these errors:

    This is by no means an exhaustive list of the types of things the MPQA annotation checker will warn you about, but it should give you an idea of what the various warnings mean.

  4. Open the article that you just checked so that you can see the annotation types and the list of annotations. You should notice that a new annotation type has been added: warning. It is listed under the Check annotation set.

  5. Click on the checkbox to the left of the warning annotation type to show these annotations in the annotation list at the bottom middle of your GATE window. Any annotated span for which the checker found an error now has an additional warning annotation for that same span.

    For example, if you sort the annotations by starting byte, the first warning annotation should be:

    Type     Set    Start  End  Features
    warning  Check  0      0    {es.intensity=strength missing, es.polarity=polarity missing, es.nested-source=nested-source missing, originally=expressive-subjectivity}

    This is a warning that you will get every time, because of the zero-span expressive-subjectivity annotation that was added to the beginning of the article during preprocessing. DELETE this first warning annotation.

  6. One by one, find each warning annotation and figure out what caused the error. After you fix the problem, delete the warning annotation. When you have fixed and deleted all warnings, the warning annotation type will disappear from under Check annotations. If you had a lot of errors in a particular document, you may want to run the MPQA Annotation Checker again, just to make sure you fixed everything.
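
When a document has many warnings, it can help to break each warning's feature list into separate lines before working through them. The following is a minimal sketch in Python, not part of the MPQA Annotation Checker or of GATE. It assumes the features are pasted as text in the `{name=message, name=message}` form shown in step 5, and that no name or message contains a comma or an equals sign. The function names `parse_warning_features` and `is_preprocessing_warning` are hypothetical, and the start and end offsets of 0 for the preprocessing warning are inferred from the example in step 5.

```python
def parse_warning_features(text):
    """Parse a feature list such as '{a=b, c=d}' into a dict.

    Assumption: pairs are separated by ', ' and neither names nor
    messages contain ',' or '='.
    """
    body = text.strip()
    if body.startswith("{") and body.endswith("}"):
        body = body[1:-1]
    features = {}
    for pair in body.split(", "):
        if not pair:
            continue  # an empty feature list yields an empty dict
        name, _, message = pair.partition("=")
        features[name.strip()] = message.strip()
    return features


def is_preprocessing_warning(start, end, features):
    """Return True for the zero-span warning at the start of the article.

    Assumption: that warning spans offsets 0 to 0 and records
    'originally=expressive-subjectivity', as in the example from step 5.
    """
    return (
        start == 0
        and end == 0
        and features.get("originally") == "expressive-subjectivity"
    )


# The feature list of the first warning annotation, copied from step 5.
example = (
    "{es.intensity=strength missing, es.polarity=polarity missing, "
    "es.nested-source=nested-source missing, "
    "originally=expressive-subjectivity}"
)

features = parse_warning_features(example)
for name, message in features.items():
    print(f"{name}: {message}")
```

Each printed line pairs one feature name with the checker's message about it, and the `originally` entry names the annotation type that triggered the warning. A warning for which `is_preprocessing_warning` returns True is the one that step 5 says to delete without further investigation.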

The above will hopefully get you started debugging your own warning messages. If you truly get stuck and don't know why the MPQA Annotation Checker is flagging something, please contact us at mpqa-annotation@cs.pitt.edu.