Bachelor of Science
Computer Science and Computer Engineering
Data file layout inference is the task of determining the structure and metadata of a text file. The text files dealt with in this research are personal information records that have a consistent structure. Traditionally, when the layout of a text file is unknown, a human user must identify the metadata manually, which is inefficient and error-prone. Content-based oracles are the current state-of-the-art technology for automating layout inference; they attempt to solve the problem by using databases of known metadata. This paper builds on the existing information and documentation for the content-based oracles and improves the oracles' databases through experimentation.
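The abstract does not describe how an oracle scores a field. The following is only a minimal sketch, assuming a simple lookup-based oracle: each content type has a set of known values, and a column is labelled with the type whose set matches the largest share of its values. The type names, the toy value sets, the comma delimiter, and the 0.5 threshold are all hypothetical and are not taken from the thesis.

```python
from typing import Dict, List, Optional, Set

# Hypothetical "databases" of known values per content type (toy examples only).
KNOWN_VALUES: Dict[str, Set[str]] = {
    "first_name": {"kathy", "john", "maria", "wei"},
    "state": {"ar", "tx", "ca", "ny"},
    "city": {"fayetteville", "austin", "boston"},
}


def identify_content_type(values: List[str], threshold: float = 0.5) -> Optional[str]:
    """Return the content type whose database matches the largest share of
    the non-empty values, or None if no type reaches the threshold."""
    cleaned = [v.strip().lower() for v in values if v.strip()]
    if not cleaned:
        return None
    # Fraction of values found in each type's database.
    scores = {
        ctype: sum(v in known for v in cleaned) / len(cleaned)
        for ctype, known in KNOWN_VALUES.items()
    }
    best_type, best_score = max(scores.items(), key=lambda kv: kv[1])
    return best_type if best_score >= threshold else None


def infer_layout(lines: List[str], delimiter: str = ",") -> List[Optional[str]]:
    """Split consistently structured records into columns and label each column.
    Rows whose field count differs from the first row are skipped."""
    rows = [line.split(delimiter) for line in lines]
    width = len(rows[0])
    columns = [[row[i] for row in rows if len(row) == width] for i in range(width)]
    return [identify_content_type(col) for col in columns]


if __name__ == "__main__":
    records = ["Kathy,Fayetteville,AR", "John,Austin,TX", "Maria,Boston,ZZ"]
    print(infer_layout(records))  # ['first_name', 'city', 'state']
```

In this framing, improving an oracle's database corresponds to changing the contents of the hypothetical `KNOWN_VALUES` sets: the third column above is still labelled `state` because two of its three values are known, but a sparser database would push it below the threshold.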
Dai, Kathy T., "Improving Automatic Content Type Identification from a Data Set" (2017). Computer Science and Computer Engineering Undergraduate Honors Theses. 45.