Project Abstract
The Sub-Lang Analysis System
Jean-Paul R. Deshaies
Advisor: Dr. Rebecca F. Bruce
Department of Computer Science
The University of North Carolina at Asheville
One University Heights
Asheville, NC 28804
System Details:
The Sub-Lang Analysis System (short for Subset of Language Analysis System) is written in the
Java and C++ programming languages. It consists of two component applications: an 'annotator'
and an 'analyzer.' The 'annotator' is a Java application that supports characterizing text
through a process known as 'tagging.' Tagging means assigning properties to lexical items
(words) in order to represent, in discrete form, what each word means within the context of a
selection of language. Once a selection of language has been 'tagged,' the 'annotator' can
generate an 'annotated data' output file, which serves as the input file for the 'analyzer'
component of the system. The 'analyzer' is a C++ application that analyzes annotated data. It
takes as input lexical data that has been categorized by more than one human judge. It
analyzes the agreement among the judges in categorizing the text, and also provides the
ability to compare and search the annotated data. Results are returned as output text files
whose content relates to the categorizations of the annotated data.
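
The abstract does not describe the annotated data format or the agreement measure the
'analyzer' uses. As a rough illustration only, the C++ sketch below assumes each annotated
item holds a word and one tag per judge, computes the fraction of items on which all judges
chose the same tag, and searches for the items a given judge labeled with a given tag. The
names AnnotatedItem, observedAgreement, and findByTag are hypothetical and are not taken from
the system.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record (not the system's actual format): one lexical item
// plus the tag that each judge assigned to it.
struct AnnotatedItem {
    std::string word;
    std::vector<std::string> tags;  // tags[j] is the tag chosen by judge j
};

// Fraction of items on which every judge chose the same tag.
// Returns 0.0 for an empty data set.
double observedAgreement(const std::vector<AnnotatedItem>& items) {
    if (items.empty()) return 0.0;
    std::size_t unanimous = 0;
    for (const AnnotatedItem& item : items) {
        assert(!item.tags.empty());  // every item needs at least one judgment
        bool same = true;
        for (const std::string& tag : item.tags) {
            if (tag != item.tags.front()) { same = false; break; }
        }
        if (same) ++unanimous;
    }
    return static_cast<double>(unanimous) / static_cast<double>(items.size());
}

// Search: indices of the items that a given judge labeled with a given tag.
std::vector<std::size_t> findByTag(const std::vector<AnnotatedItem>& items,
                                   std::size_t judge, const std::string& tag) {
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (judge < items[i].tags.size() && items[i].tags[judge] == tag) {
            hits.push_back(i);
        }
    }
    return hits;
}
```

Under these assumptions, a data set in which two judges agree on two of four words gives an
observed agreement of 0.5, and findByTag(items, 0, "noun") lists the words the first judge
tagged as nouns.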
Purpose:
Artificial Intelligence researchers in the Natural Language Processing field (AI-NLP) can use
this system to draw conclusions concerning the meaning and usage of a subset of language.
The meanings and usage of lexical items (usually words) are typically represented by categories
whose definitions are partially subjective. For example, "noun" is a category describing
grammatical usage. This type of software is useful in evaluating and testing the reliability
with which humans can assign a categorization to text. It enables AI-NLP researchers to
develop and define categorizations, a task that is necessary in most AI-NLP systems.
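
One common way to quantify how reliably two judges categorize the same text is Cohen's kappa,
which corrects raw agreement for the agreement expected by chance: kappa = (Po - Pe) / (1 - Pe).
The abstract does not say whether the system uses this statistic, so the C++ sketch below is
offered only as an illustration. The function name cohensKappa is hypothetical, and the sketch
assumes two judges who tagged the same items in the same order.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Cohen's kappa for two judges (illustrative; not taken from the system).
// judgeA[i] and judgeB[i] are the tags the two judges gave to item i.
double cohensKappa(const std::vector<std::string>& judgeA,
                   const std::vector<std::string>& judgeB) {
    assert(judgeA.size() == judgeB.size() && !judgeA.empty());
    const double n = static_cast<double>(judgeA.size());

    std::map<std::string, double> countA, countB;  // per-judge tag counts
    double agree = 0.0;                            // items with matching tags
    for (std::size_t i = 0; i < judgeA.size(); ++i) {
        countA[judgeA[i]] += 1.0;
        countB[judgeB[i]] += 1.0;
        if (judgeA[i] == judgeB[i]) agree += 1.0;
    }

    const double po = agree / n;  // observed agreement
    double pe = 0.0;              // agreement expected by chance
    for (const auto& entry : countA) {
        auto it = countB.find(entry.first);
        if (it != countB.end()) pe += (entry.second / n) * (it->second / n);
    }

    // Degenerate case: both judges used one identical tag for every item.
    if (std::fabs(1.0 - pe) < 1e-12) return 1.0;
    return (po - pe) / (1.0 - pe);
}
```

A kappa near 1 suggests the category definitions can be applied consistently, while a value
near 0 suggests the judges agree no more often than chance would predict, which may indicate
that the definitions need refinement.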
Hardware / Software:
Sun Workstation, GNU Project Utilities (g++), IBM PC (MS-Windows), Java 2 SDK