Deshaies, Jean-Paul
 
                                        Project Abstract

                                  The Sub-Lang Analysis System

                                      Jean-Paul R. Deshaies

                                  Advisor:  Dr. Rebecca F. Bruce

                                  Department of Computer Science

                           The University of North Carolina at Asheville
                                     One University Heights
                                      Asheville, NC 28804


System Details:



The Sub-Lang Analysis System (short for Subset of Language Analysis System) is written in the
Java and C++ programming languages.  The system consists of two component applications: an
'annotator' and an 'analyzer.'  The 'annotator' is a Java application that supports the
characterization of text through a process known as 'tagging.'  Tagging is the assignment of
properties to lexical items (words) in order to represent discretely the meaning of those
words within the context of a selection of language.  Once a selection of language has been
tagged, the 'annotator' can generate an 'annotated data' output file, which serves as the
input file to the 'analyzer' component of the system.  The 'analyzer' is a C++ application
that analyzes annotated data.  It takes as input lexical data that has been categorized by
more than one human judge.  The application analyzes the agreement among the judges in
categorizing the text, and also provides the ability to compare and search the annotated
data.  It returns its results as output text files whose content relates to the
categorizations of the annotated data.
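
The agreement analysis described above can be illustrated with a minimal sketch.  The
abstract does not name the agreement measure or the layout of the annotated data file, so
everything here is an assumption made for illustration: each judge's tags are held as a list
of strings (one tag per lexical item, in the same order), and agreement is measured with
observed agreement and Cohen's kappa, one common statistic for two judges.  The names Tags,
observedAgreement, and cohensKappa are hypothetical and are not taken from the system itself.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical representation: one tag per lexical item, per judge.
using Tags = std::vector<std::string>;

// Fraction of items on which the two judges assigned the same tag.
double observedAgreement(const Tags& a, const Tags& b) {
    if (a.size() != b.size() || a.empty()) {
        throw std::invalid_argument("judges must tag the same non-empty item list");
    }
    std::size_t same = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] == b[i]) ++same;
    }
    return static_cast<double>(same) / static_cast<double>(a.size());
}

// Cohen's kappa: (p_o - p_e) / (1 - p_e), where p_e is the agreement
// expected by chance given each judge's own tag frequencies.
double cohensKappa(const Tags& a, const Tags& b) {
    const double po = observedAgreement(a, b);  // also validates the input
    std::map<std::string, std::size_t> countA, countB;
    for (const auto& t : a) ++countA[t];
    for (const auto& t : b) ++countB[t];

    const double n = static_cast<double>(a.size());
    double pe = 0.0;
    for (const auto& entry : countA) {
        const auto it = countB.find(entry.first);
        if (it != countB.end()) {
            pe += (entry.second / n) * (it->second / n);
        }
    }
    // Kappa is undefined when both judges used one identical tag throughout;
    // returning 1.0 in that case is a convention chosen for this sketch.
    if (pe == 1.0) return 1.0;
    return (po - pe) / (1.0 - pe);
}
```

For two judges whose tags for four words are {noun, verb, noun, adj} and {noun, verb, verb,
adj}, observed agreement is 0.75 and kappa is 7/11 (about 0.64), because part of the raw
agreement is attributed to chance.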

Purpose:



Artificial Intelligence researchers in the Natural Language Processing field (AI-NLP) can use
this system to draw conclusions concerning the meaning and usage of a subset of language.
The meanings and usage of lexical items (usually words) are typically represented by
categories whose definitions are partially subjective.  For example, "noun" is a category
describing grammatical usage.  This type of software is useful in evaluating and testing the
reliability with which humans can assign a categorization to text.  It enables AI-NLP
researchers to develop and define categorizations, a task that is necessary in most AI-NLP
systems.

Hardware / Software:



Sun Workstation, GNU Project Utilities (g++), IBM PC (MS-Windows), Java 2 SDK