A4 Article in conference proceedings
Connecting lexical entries to distributed real-world knowledge (2002)


Legrand, S., & Tyrväinen, P. (2002). Connecting lexical entries to distributed real-world knowledge. In T. Cameron, C. Shank, & K. Holley (Eds.), Proceedings of the Fifth Annual High Desert Linguistics Society Conference : November 1-2, 2002, University of New Mexico (pp. 109-120). University of New Mexico press. https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.452.4945&rep=rep1&type=pdf


JYU authors or editors


Publication details

All authors or editors: Legrand, Steve; Tyrväinen, Pasi

Parent publication: Proceedings of the Fifth Annual High Desert Linguistics Society Conference : November 1-2, 2002, University of New Mexico

Parent publication editors: Cameron, Terry; Shank, Christopher; Holley, Keri

Place and date of conference: Albuquerque, NM, USA, 1.-2.11.2002

Publication year: 2002

Pages range: 109-120

Publisher: University of New Mexico press

Place of Publication: New Mexico

Publication country: United States

Publication language: English

Persistent website address: https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.452.4945&rep=rep1&type=pdf

Publication open access: Openly available

Publication channel open access: Open Access channel


Abstract

Real-world knowledge is made up of myriad everyday facts and their relationships. Without this knowledge it is practically impossible to make sense of the world when we communicate with each other. However, unlike syntax, pragmatics, and semantics, real-world knowledge is not linguistic knowledge. For this reason, it has proved problematic for linguists attempting to incorporate the knowledge required for language use into lexical entries.

Various attempts to create real-world databases have been made, and others are underway, to address this and other problems. Lenat’s (1995) CYC, a massive database of real-world knowledge under painstaking construction for the last two decades, has been criticised by Yuret (1997), among others, for being too explicit in representing knowledge in a single uniform framework with deduction as its main inference engine. Locke (1990) warns about the dangers of creating systems that are accessible only to experts. To avoid the threat of ‘ontological imperialism’, the Semantic Web with its distributed ontologies and technologies seems a better alternative. XML-coded, RDF Schema-based knowledge representation languages such as OIL and DAML+OIL, and their future extensions with increasing inference capabilities, are suitable for domain-specific, distributed ontology representation of real-world knowledge. If a larger centralized database is eventually needed, SUMO (Suggested Upper Merged Ontology) could be used to unify disparate Semantic Web ontologies, as discussed by Pease et al. (2002). Eventually, though, interfacing or even merging between CYC and SUMO might take place.

This paper will investigate the possibility of connecting distributed real-world ontologies in the Semantic Web to linguistic knowledge (syntactic, pragmatic, and semantic) without corrupting the underlying aims of the HPSG (Head-driven Phrase Structure Grammar) of Pollard and Sag (1994). RDF-based real-world knowledge in the Semantic Web and elsewhere is divided into many distinct domain ontologies. After knowledge about the domain is acquired by statistical free-text parsing or by other means, the HPSG lexical entry’s CONTEXT|BACKGROUND attribute will be made to point to the domain ontology in the Semantic Web and to structure-share it with HPSG’s other components. The referenced domain ontology will be retrieved and aligned with a semantic upper ontology integrated in the lexical entry. This upper ontology is based on Pustejovsky’s (1995) qualia structure, as exemplified by Verspoor (1997), for nominals and on Jackendoff’s (1983, 1990) lexical conceptual structures, as used by Davis (1995), for verbs. As a result, real-world knowledge, together with semantics, syntax, and pragmatics, can be integrated to constrain the structure-shared lexical entries. It should be kept in mind that although the knowledge about the applicable domain is determined by pragmatics in this case, the knowledge contained in the domain ontology itself is real-world knowledge.
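The linking step described above can be pictured with a small sketch. It is only an illustration under stated assumptions: the class names, the `align` helper, the example words, and the `example.org` address are hypothetical and do not come from the paper, and a plain Python object stands in for an RDF ontology retrieved from the Semantic Web. Structure sharing is modelled by letting several lexical entries reference the same ontology object.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DomainOntology:
    """Local stand-in for a domain ontology retrieved from the Semantic Web."""
    uri: str  # hypothetical address, for illustration only
    facts: dict = field(default_factory=dict)  # concept -> set of related concepts


@dataclass
class LexicalEntry:
    phon: str
    qualia: dict  # toy upper ontology: qualia role -> concept
    background: Optional[DomainOntology] = None  # stands in for CONTEXT|BACKGROUND


def align(entry: LexicalEntry) -> dict:
    """Return the qualia roles whose concept the domain ontology also knows."""
    if entry.background is None:
        return {}
    known = entry.background.facts
    return {role: concept for role, concept in entry.qualia.items() if concept in known}


banking = DomainOntology(
    uri="http://example.org/ontologies/banking",  # assumed, not a real ontology
    facts={"financial_institution": {"account", "loan"}},
)

bank = LexicalEntry("bank", {"formal": "financial_institution", "telic": "store_money"})
teller = LexicalEntry("teller", {"formal": "person"})

# Both entries point to the same object, so they share one token of the ontology.
bank.background = banking
teller.background = banking

# An update to the shared ontology is visible through every entry that points to it.
banking.facts["person"] = {"employee"}

print(align(bank))
print(align(teller))
```

In this toy version, alignment is reduced to checking which qualia concepts also appear in the domain ontology; the alignment procedure in the paper itself may differ.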

The main motivation behind this research is to improve the accuracy of linguistic parsers, to the benefit of linguistic applications in translation, language learning, and other tasks that use parsers for disambiguation. Current parsing applications might seem adequate for these purposes, having reached accuracies close to 100 per cent. However, a word-based disambiguation error rate as small as 4% is high enough to completely change the meaning of an average-length sentence, and it translates into a 56% per-sentence error rate (Abney 1996). Deploying real-world knowledge together with linguistic knowledge in disambiguation will help to bridge this gap.
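A figure of this size can be reproduced with a short calculation. The sketch below assumes that word-level errors are independent and that an average sentence has 20 words; both are illustrative assumptions, not values stated in this abstract.

```python
def sentence_error_rate(word_error_rate: float, sentence_length: int) -> float:
    """Probability that at least one word in a sentence is disambiguated wrongly.

    Assumes per-word errors are independent (an assumption made for this sketch).
    """
    return 1.0 - (1.0 - word_error_rate) ** sentence_length


# 4% per-word error rate, assumed sentence length of 20 words
rate = sentence_error_rate(0.04, 20)
print(f"{rate:.0%}")
```

Under these assumptions the per-sentence error rate comes out at roughly 56%, which shows how quickly a small per-word error compounds over a sentence.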


Keywords: language technology; ontologies (information management); semantic web

Free keywords: HPGL


Contributing organizations


Ministry reporting: Yes

Preliminary JUFO rating: Not rated


Last updated on 2023-07-02 at 09:20