Date of Graduation

5-2017

Document Type

Thesis

Degree Name

Bachelor of Science

Degree Level

Undergraduate

Department

Computer Science and Computer Engineering

Advisor/Mentor

Li, Wing

Committee Member

Patitz, Matthew

Third Committee Member

Beavers, Gordon

Abstract

Data file layout inference is the task of building the structure and determining the metadata of a text file. The text files dealt with in this research are personal information records that have a consistent structure. Traditionally, if the layout of a text file is unknown, a human user must identify the metadata manually, which is inefficient and prone to error. Content-based oracles are the current state-of-the-art automation technology; they attempt to solve the layout inference problem by using databases of known metadata. This paper builds upon the existing documentation of content-based oracles and improves the oracles' databases through experimentation.

Citation

Dai, K. T. (2017). Improving Automatic Content Type Identification from a Data Set. Computer Science and Computer Engineering Undergraduate Honors Theses. Retrieved from https://scholarworks.uark.edu/csceuht/45

Included in

Computer Engineering Commons

Computer Science and Computer Engineering Undergraduate Honors Theses

Improving Automatic Content Type Identification from a Data Set
