Predicate Argument Structures for Information Extraction from Dependency Representations: Null Elements are Missing

Distributed Systems and Applications of Information Filtering and Retrieval

Part of the book series: Studies in Computational Intelligence (SCI, volume 515)

Abstract

State-of-the-art parsers are currently trained on versions of the Penn Treebank converted into dependency representations, which, however, do not include null elements. This is done to facilitate structural learning and to prevent the probabilistic engine from postulating the existence of deprecated null elements everywhere (see [15]). As a result, the semantics of the representation used and produced at runtime is inconsistent, which dramatically reduces its usefulness in real-life applications such as Information Extraction, Q/A and other semantically driven fields, because it hampers the mapping of a complete logical form. What systems have come up with instead are “Quasi”-logical forms, or partial logical forms, mapped directly from the surface representation in dependency structure. We show the most common problems deriving from the conversion and then describe an algorithm that we implemented and applied to our converted Italian Treebank. It can be used on any CoNLL-style treebank or representation to produce an “almost complete”, semantically consistent dependency treebank.
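To make the idea of restoring a null element in a CoNLL-style representation concrete, here is a minimal sketch. It is not the chapter's algorithm, which the abstract does not detail. The five-column row layout, the tags `V`, `SUBJ` and `NULL`, the placeholder form `*pro*`, and the function names are all assumptions chosen for illustration; real CoNLL files carry more columns and treebank-specific labels. The heuristic simply appends an empty subject to any verb that has no overt subject dependent, as happens in Italian pro-drop clauses.

```python
from dataclasses import dataclass

@dataclass
class Token:
    idx: int      # 1-based position in the sentence
    form: str     # word form, or a placeholder for a null element
    pos: str      # part-of-speech tag (hypothetical tagset)
    head: int     # index of the governing token, 0 for the root
    deprel: str   # dependency label (hypothetical label set)

def parse_conll(block: str) -> list[Token]:
    """Read simplified tab-separated rows: idx, form, pos, head, deprel."""
    tokens = []
    for line in block.strip().splitlines():
        idx, form, pos, head, deprel = line.split("\t")
        tokens.append(Token(int(idx), form, pos, int(head), deprel))
    return tokens

def add_null_subjects(tokens: list[Token]) -> list[Token]:
    """Append a '*pro*' subject for every verb lacking one (toy heuristic)."""
    verbs_with_subject = {t.head for t in tokens if t.deprel == "SUBJ"}
    enriched = list(tokens)
    next_idx = len(tokens) + 1
    for t in tokens:
        if t.pos == "V" and t.idx not in verbs_with_subject:
            # Appending at the end keeps the existing indices valid.
            enriched.append(Token(next_idx, "*pro*", "NULL", t.idx, "SUBJ"))
            next_idx += 1
    return enriched

# "Mangia la mela" ("(He/she) eats the apple"): the subject is not expressed.
SAMPLE = "1\tMangia\tV\t0\tROOT\n2\tla\tD\t3\tDET\n3\tmela\tN\t1\tOBJ"

if __name__ == "__main__":
    for tok in add_null_subjects(parse_conll(SAMPLE)):
        print(tok.idx, tok.form, tok.pos, tok.head, tok.deprel, sep="\t")
```

With the added row, a downstream component can read a complete predicate-argument frame for the verb (a subject slot and an object slot) instead of a partial one. Resolving what the empty subject refers to is a separate step that this sketch does not attempt.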

References

  1. Bies, A., Ferguson, M., Katz, K., MacIntyre, R., Tredinnick, V., Kim, G., Marcinkiewicz, M.A., Schasberger, B.: Bracketing guidelines for Treebank II style Penn Treebank. uni-tuebingen.de/~dm/07/autumn/795.10/ptb-annotation-guide/root.html (1995)

  2. Cahill, A., McCarthy, M., van Genabith, J., Way, A.: Automatic annotation of the Penn-Treebank with LFG f-structure information. In: LREC: Workshop on Linguistic Knowledge Acquisition and Representation: Bootstrapping Annotated Language Data. Las Palmas (2002)

  3. Cahill, A., McCarthy, M., van Genabith, J., Way, A.: Quasi-logical forms for the Penn Treebank. In: Bunt H., van der Sluis I., Morante R. (eds.) Proceedings of the Fifth International Workshop on Computational Semantics, IWCS-05, pp. 55–71. Tilburg (2003)

  4. Cai, S., Chiang, D., Goldberg, Y.: Language-independent parsing with empty elements. In: Proceedings of the 49th Annual Meeting of the ACL, pp. 212–216 (2011)

  5. Campbell, R.: Using linguistic principles to recover empty categories. In: Proceedings of ACL (2004)

  6. Chung, T., Gildea, D.: Effects of empty categories on machine translation. In: Proceedings of EMNLP (2010)

  7. Choi, J.D., Palmer, M.: Robust constituent-to-dependency conversion for English. In: Proceedings of the 9th International Workshop on Treebanks and Linguistic Theories (TLT’9), pp. 55–66. Tartu (2010)

  8. Clark, S., Curran, J.R.: Comparing the accuracy of CCG and Penn Treebank parsers. In: Proceedings of the ACL-IJCNLP 2009 Conference Short Papers, pp. 53–56. Suntec, Singapore (2009)

  9. De Marneffe, M.C., MacCartney, B., Manning, C.D.: Generating typed dependency parses from phrase structure parses. In: Proceedings of LREC, pp. 449–454 (2006/5)

  10. Delmonte, R., Bristot, A., Tonelli, S.: VIT - Venice Italian Treebank: Syntactic and Quantitative Features. In: De Smedt, K., Hajic, J., Kübler, S. (eds.) Proceedings of Sixth International Workshop on TLT, vol. 1, pp. 43–54. NEALT Proceedings Series (2007)

  11. Delmonte, R., Bianchi, D.: Semantic web, RDFs and NLP for QA. In: Calzolari N., Magnini B. (eds.) Proceedings of the Workshop on “Topics and Perspectives of NLP in Italy”, Università di Pisa, AI*IA, pp. 67–75 (2003)

  12. Dienes, P., Dubey, A.: Antecedent recovery: experiments with a trace tagger. In: Proceedings of EMNLP (2003a)

  13. Dienes, P., Dubey, A.: Deep processing by combining shallow methods. In: Proceedings of ACL (2003b)

  14. Gabbard, R., Marcus, M., Kulick, S.: Fully parsing the Penn Treebank. In: Proceedings of the HLT Conference of the North American Chapter of the ACL, pp. 184–191 (2006)

  15. Gaizauskas, R.: Investigations into the Grammar Underlying the Penn Treebank II, Technical Report CS-95-25. University of Sheffield, Department of Computer Science (1995)

  16. Guo, Y., van Genabith, J., Wang, H.: Treebank-based acquisition of LFG resources for Chinese. In: Lexical Functional Grammar, pp. 28–30. California (2007)

  17. Johnson, M.: A simple pattern-matching algorithm for recovering empty nodes and their antecedents. In: Proceedings of the 39th Annual Meeting of the ACL, pp. 136–143. Toulouse, France (2001)

  18. Johansson, R., Nugues, P.: Extended constituent-to-dependency conversion for English. In: Proceedings of NODALIDA 2007, Tartu (2007)

  19. Katz, B.: Annotating the World Wide Web using natural language. In: RIAO ’97 (1997)

  20. Liakata, M., Pulman, S.: From Trees to Predicate-Argument Structures. In: Proceedings of the 19th International Conference on Computational Linguistics (COLING 2002), pp. 563–569. Taipei (2002)

  21. Litkowski, K.C.: Syntactic clues and Lexical resources in question-answering. In: Voorhees E.M., Harman D.K. (eds.) The Ninth Text Retrieval Conference (TREC-9). NIST Special Publication 500–249, Gaithersburg, pp. 157–166 (2001)

  22. Marcus, M., Kim, G., Marcinkiewicz, M.A., MacIntyre, R., Bies, A., Ferguson, M., Katz, K., Schasberger, B.: The Penn Treebank: annotating predicate argument structure. In: ARPA Human Language Technology Workshop, pp. 114–119 (1994)

  23. Sagae, K., Tsujii, J.: Shift-reduce dependency DAG parsing. In: Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), Manchester (2008)

  24. Schmid, H.: Trace prediction and recovery with unlexicalized PCFGs and slash features. In: Proceedings of COLING-ACL (2006)

  25. Tonelli, S., Delmonte, R., Bristot, A.: Enriching the Venice Italian Treebank with dependency and grammatical relations, LREC 2008 (2008)

  26. Xue, N., Xia, F., Chiou, F.-D., Palmer, M.: The Penn Chinese TreeBank: phrase structure annotation of a large corpus. Nat. Lang. Eng. 11(2), 207–238 (2005)

  27. Yang, Y., Xue, N.: Chasing the ghost: recovering empty categories in the Chinese Treebank. In: Proceedings of COLING (2010)

  28. http://nlp.stanford.edu:8080/parser/

  29. http://www.connexor.com/nlplib/?q=demo/syntax

Acknowledgments

This work has been partially funded by the PARLI Project (Portale per l’Accesso alle Risorse Linguistiche per l’Italiano—MIUR—PRIN 2008).

Author information

Corresponding author

Correspondence to Rodolfo Delmonte.

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Delmonte, R. (2014). Predicate Argument Structures for Information Extraction from Dependency Representations: Null Elements are Missing. In: Lai, C., Giuliani, A., Semeraro, G. (eds) Distributed Systems and Applications of Information Filtering and Retrieval. Studies in Computational Intelligence, vol 515. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40621-8_2

  • DOI: https://doi.org/10.1007/978-3-642-40621-8_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40620-1

  • Online ISBN: 978-3-642-40621-8

  • eBook Packages: Engineering, Engineering (R0)
