Date of Graduation
5-2017
Document Type
Thesis
Degree Name
Bachelor of Science
Degree Level
Undergraduate
Department
Computer Science and Computer Engineering
Advisor/Mentor
Li, Wing
Committee Member/Reader
Patitz, Matthew
Committee Member/Third Reader
Beavers, Gordon
Abstract
Data file layout inference refers to building the structure and determining the metadata of a text file. The text files dealt within this research are personal information records that have a consistent structure. Traditionally, if the layout structure of a text file is unknown, the human user must undergo manual labor of identifying the metadata. This is inefficient and prone to error. Content-based oracles are the current state-of-the-art automation technology that attempts to solve the layout inference problem by using databases of known metadata. This paper builds upon the information and documentation of the content-based oracles, and improves the databases of the oracles through experimentation.
Citation
Dai, K. T. (2017). Improving Automatic Content Type Identification from a Data Set. Computer Science and Computer Engineering Undergraduate Honors Theses Retrieved from https://scholarworks.uark.edu/csceuht/45