A4 Article in conference proceedings
Connecting lexical entries to distributed real-world knowledge (2002)
Legrand, S., & Tyrväinen, P. (2002). Connecting lexical entries to distributed real-world knowledge. In T. Cameron, C. Shank, & K. Holley (Eds.), Proceedings of the Fifth Annual High Desert Linguistics Society Conference : November 1-2, 2002, University of New Mexico (pp. 109-120). University of New Mexico Press. https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.452.4945&rep=rep1&type=pdf
Publication details
All authors or editors: Legrand, Steve; Tyrväinen, Pasi
Parent publication: Proceedings of the Fifth Annual High Desert Linguistics Society Conference : November 1-2, 2002, University of New Mexico
Parent publication editors: Cameron, Terry; Shank, Christopher; Holley, Keri
Place and date of conference: Albuquerque, NM, USA, 1.-2.11.2002
Publication year: 2002
Pages range: 109-120
Publisher: University of New Mexico Press
Place of Publication: New Mexico
Publication country: United States
Publication language: English
Persistent website address: https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.452.4945&rep=rep1&type=pdf
Publication open access: Openly available
Publication channel open access: Open Access channel
Abstract
Various attempts to create real-world databases have been made and are underway to address this and other problems. Lenat’s (1995) CYC, a massive database of real-world knowledge under painstaking construction for the last two decades, has been criticised by Yuret (1997), among others, as being too explicit in its representation of knowledge in a single uniform framework, with deduction as its main inference engine. Locke (1990) warns about the dangers of creating systems that are accessible only to experts. To avoid the threat of ‘ontological imperialism’, the Semantic Web, with its distributed ontologies and technologies, seems a better alternative. XML-coded, RDF Schema-based knowledge representation languages such as OIL and DAML+OIL, and their future extensions with increasing inference capabilities, are suitable for domain-specific, distributed ontology representation of real-world knowledge. If a larger centralized database is eventually needed, SUMO (Suggested Upper Merged Ontology) could be used to unify disparate Semantic Web ontologies, as discussed by Pease et al. (2002). Eventually, though, interfacing or even merging between CYC and SUMO might take place.
This paper will investigate the possibility of connecting distributed real-world ontologies in the Semantic Web to linguistic knowledge (syntactic, pragmatic and semantic) without corrupting the underlying aims of the HPSG (Head-driven Phrase Structure Grammar) of Pollard and Sag (1994). RDF-based real-world knowledge in the Semantic Web and elsewhere is divided into many distinct domain ontologies. After knowledge about the domain is acquired by statistical free-text parsing or by other means, the CONTEXT|BACKGROUND attribute of the HPSG lexical entry will be made to point to the domain ontology in the Semantic Web and to structure-share it with HPSG’s other components. The referenced domain ontology will be retrieved and aligned with a semantic upper ontology integrated in the lexical entry, based on Pustejovsky’s (1995) qualia structure as exemplified by Verspoor (1997) for nominals and on Jackendoff’s (1983, 1990) lexical conceptual structures as used by Davis (1995) for verbs. As a result, real-world knowledge, together with semantics, syntax and pragmatics, can be integrated to constrain the structure-shared lexical entries. It should be kept in mind that although the knowledge about the applicable domain is determined by pragmatics in this case, the knowledge contained in the domain ontology itself is real-world knowledge.
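The linking step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the class `LexicalEntry`, the function `resolve`, the in-memory `ONTOLOGIES` table, the `example.org` URIs and the word "bank" are all hypothetical, chosen only for illustration. A real system would retrieve and parse RDF from the Semantic Web rather than read a dictionary. Only the qualia role names (FORMAL, TELIC, CONSTITUTIVE) come from Pustejovsky's qualia structure.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical stand-in for domain ontologies published on the Semantic Web,
# keyed by URI. Each ontology maps a concept to qualia-style roles.
ONTOLOGIES = {
    "http://example.org/ontologies/finance": {
        "bank": {"FORMAL": "institution", "TELIC": "store_money"},
    },
    "http://example.org/ontologies/geography": {
        "bank": {"FORMAL": "landform", "CONSTITUTIVE": "soil"},
    },
}


@dataclass
class LexicalEntry:
    phon: str
    # Underspecified qualia roles, standing in for the upper ontology
    # integrated in the lexical entry.
    qualia: dict = field(default_factory=dict)
    # Stand-in for CONTEXT|BACKGROUND: a pointer (URI) to a domain ontology.
    background: Optional[str] = None


def resolve(entry: LexicalEntry) -> dict:
    """Retrieve the pointed-to domain ontology and align it with the qualia roles."""
    domain = ONTOLOGIES[entry.background]
    concept = domain.get(entry.phon, {})
    # Domain knowledge fills in or overrides the underspecified roles.
    return {**entry.qualia, **concept}


entry = LexicalEntry(phon="bank", qualia={"FORMAL": "entity"})
entry.background = "http://example.org/ontologies/finance"
finance_reading = resolve(entry)

entry.background = "http://example.org/ontologies/geography"
geography_reading = resolve(entry)
```

In this sketch the same lexical entry yields different constraints depending on which domain ontology its background pointer selects, which is the disambiguating effect the abstract aims for.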
The main motivation behind this research is to improve the accuracy of linguistic parsers, to the benefit of linguistic applications in translation, language learning and other tasks that use parsers for disambiguation. Current parsing applications might seem adequate for these purposes, having reached accuracies close to 100 per cent. However, a word-based disambiguation error rate as small as 4% is high enough to completely change the meaning of an average-length sentence, translating into a 56% per-sentence error rate (Abney 1996). Deploying real-world knowledge together with linguistic knowledge in disambiguation will help to bridge this gap.
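The 56% figure is consistent with compounding the per-word error over a whole sentence. The sketch below is illustrative only and rests on two assumptions not stated in the abstract: errors are independent across words, and an average sentence has 20 words. The function name `sentence_error_rate` is hypothetical.

```python
def sentence_error_rate(word_error_rate: float, sentence_length: int) -> float:
    """Probability that at least one word in a sentence is disambiguated wrongly,
    assuming independent per-word errors."""
    return 1.0 - (1.0 - word_error_rate) ** sentence_length


# Assumed values: 4% per-word error rate, 20-word sentence.
rate = sentence_error_rate(0.04, 20)
print(f"{rate:.0%}")  # prints "56%"
```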
Keywords: language technology; ontologies (information management); semantic web
Free keywords: HPSG
Ministry reporting: Yes
Preliminary JUFO rating: Not rated