Abstract
State of the art parsers are currently trained on converted versions of Penn Treebank into dependency representations which however don’t include null elements. This is done to facilitate structural learning and prevent the probabilistic engine to postulate the existence of deprecated null elements everywhere (see [15]). However it is a fact that in this way, the semantics of the representation used and produced on runtime is inconsistent and will reduce dramatically its usefulness in real life applications like Information Extraction, Q/A and other semantically driven fields by hampering the mapping of a complete logical form. What systems have come up with are “Quasi”-logical forms or partial logical forms mapped directly from the surface representation in dependency structure. We show the most common problems derived from the conversion and then describe an algorithm that we have implemented to apply to our converted Italian Treebank, that can be used on any CONLL-style treebank or representation to produce an “almost complete” semantically consistent dependency treebank.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Bies, A., Ferguson, M., Katz, K., MacIntyre, R., Tredinnick, V., Kim, G., Ann Marcinkiewicz, M., Schasberger, B.: Bracketing guidelines for Treebank II style Penn treebank.uni-tuebingen.de/\(\sim \)dm/07/autumn/795.10/ptb-annotation-guide/root. html (1995)
Cahill, A., McCarthy, M., van Genabith, J., Way, A.: Automatic annotation of the Penn-Treebank with LFG f-structure information. In: LREC: Workshop on Linguistic Knowledge Acquisition and Representation: Bootstrapping Annotated Language Data. Las Palmas (2002)
Cahill, A., McCarthy, M., van Genabith, J., Way, A.: Quasi-logical forms for the Penn Treebank. In: Bunt H., van der Sluis I., Morante R. (eds.) Proceedings of the Fifth International Workshop on Computational Semantics, IWCS-05, pp. 55–71. Tilburg (2003)
Cai, S., Chiang, D., Goldberg, Y.: Language-independent parsing with empty elements. In: Proceedings of the 49th Annual Meeting of the ACL, pp. 212–216 (2011)
Campbell, R.: Using linguistic principles to recover empty categories. In Proceedings of ACL (2004)
Chung, T., Gildea, D.: Effects of empty categories on machine translation. In Proceedings EMNLP (2010)
Choi, J.D., Palmer, M.: Robust constituent-to-dependency conversion for english. In: Proceedings of the 9th International Workshop on Treebanks and Linguistic Theories (TLT’9), pp. 55–66. Tartu (2010)
Clark, S., Curran, J.R.: Comparing the accuracy of CCG and Penn Treebank parsers. In: Proceedings of the ACL-IJCNLP 2009 Conference Short Papers, pp. 53–56. Suntec, Singapore (2009)
De Marneffe, M.C., MacCartney, B., Manning, C.D.: Generating typed dependency parses from phrase structure parses. In: Proceedings of LREC, pp. 449–454 (2006/5)
Delmonte, R., Bristot, A., Tonelli, S.: VIT —Venice Italian Treebank: Syntactic and Quantitative Features. In: De Smedt, K., Hajic, J., Kübler, S. (eds.), Proceedings of Sixth International Workshop on TLT, vol. 1, pp. 43–54. Nealt Proceeding Series (2007)
Delmonte R., Bianchi, D.: Semantic web, RDFs and NLP for QA. In: Calzolari N., Magnini B. (eds.) Proceedings of the Workshop on “Topics and Perspectives of NLP in Italy”, Università di Pisa, AI*IA, pp. 67–75 (2003)
Dienes P., Dubey, A.: Antecedent recovery: experiments with a trace tagger. In: Proceedingsof EMNLP (2003a)
Dienes P., Dubey, A.: Deep processing by combining shallow methods. In: Proceedings of ACL (2003b)
Gabbard, R., Marcus M., Kulick, S.: Fully parsing the Penn Treebank. In: Proceedings of the HLT Conference of the North American Chapter of the ACL, pp. 184–191 (2006)
Gaizauskas, R.: Investigations into the Grammar Underlying the Penn Treebank II, Technical Report CS-95-25. Univeristy of Sheffield, Department of Computer Science (1995)
Guo, Y., van Genabith, J., Wang, H.: Treebank-based acquisition of LFG resources for Chinese. In: Lexical Functional Grammar, pp. 28–30. California (2007)
Johnson, M.: A simple patter-matching algorithm for recovering empty nodes and their antecedents. In: Proceedings of the 39th Annual Meeting of the ACL, 136–143, Toulouse, France (2001)
Johansson, R., Nugues, P.: Extended constituent-to-dependency conversion for english. In: Proceedings of NODALIDA 2007, Tartu (2007)
Katz, B.: Annotating the World Wide Web using natural language. In: RIAO ’97 (1997)
Liakata, M., Pulman, S.: From Trees to Predicate-Argument Structures. In: Proceedings of the 19th International Conference on Computational Linguistics (COLING 2002), pp. 563–569. Taipei (2002)
Litkowski, K.C.: Syntactic clues and Lexical resources in question-answering. In: Voorhees E.M., Harman D.K. (eds.) The Ninth Text Retrieval Conference (TREC-9). NIST Special Publication 500–249, Gaithersburg, pp. 157–166 (2001)
Marcus, M., Kim, G., Ann Marcinkiewicz, M., Macintyre, R., Bies, A., Ferguson, M., Katz, K., Schasberger, B.: The Penn Treebank: annotating predicate argument structure. In: ARPA Human Language Technology Workshop, pp. 114–119 (1994)
Sagae, K., Tsujii, J.: Shift-reduce dependency DAG parsing. In: Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), Manchester (2008)
Schmid, H.: Trace prediction and recovery with unlexicalized PCFGs and slash features. In: Proceedings COLING-ACL (2006)
Tonelli, S., Delmonte, R., Bristot, A.: Enriching the Venice Italian Treebank with dependency and grammatical relations, LREC 2008 (2008)
Xue, N., Xia, F., Chiou, F.-D., Palmer, M.: The Penn Chinese TreeBank: phrase structure annotation of a large corpus. Nat. Lang. Eng. 11(2), 207–238 (2005)
Yang, Y., Xue, N.: Chasing the ghost: recovering empty categories in the Chinese Treebank. In: Proceedings COLING (2010)
Acknowledgments
This work has been partially funded by the PARLI Project (Portale per l’Accesso alle Risorse Linguistiche per l’Italiano—MIUR—PRIN 2008).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Delmonte, R. (2014). Predicate Argument Structures for Information Extraction from Dependency Representations: Null Elements are Missing. In: Lai, C., Giuliani, A., Semeraro, G. (eds) Distributed Systems and Applications of Information Filtering and Retrieval. Studies in Computational Intelligence, vol 515. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40621-8_2
Download citation
DOI: https://doi.org/10.1007/978-3-642-40621-8_2
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40620-1
Online ISBN: 978-3-642-40621-8
eBook Packages: EngineeringEngineering (R0)