Degree Name

Bachelor of Science in Computer Engineering

Computer Science and Computer Engineering


The goal of this project is a service-based solution that uses parallel and distributed processing algorithms to solve the transitive closure problem for a large dataset. A dataset may be viewed conceptually as a table in a database, with a physical structure representing a file containing a sequence of records and fields. Two records are said to be transitively related if and only if they are directly related because they share one or more specific fields, or a sequence can be formed from one record to the other in which every intermediate record is directly related to the records immediately before and after it. The transitive closure problem is to cluster the records in a dataset into groups such that all transitively related records are in one group. One approach to solving this problem is to divide the task into two subproblems. The first is to process the dataset and generate a set of pairs. Each pair contains two record identifiers, and a pair exists if and only if the two records are directly related. The second is to use the record pairs to cluster the records into transitive closures. The current software solution solves this second subproblem by reading record pairs produced by a different software solution and writing the completed results of the transitive closure problem to a file. This thesis studies how to enhance the current software solution so that it becomes a "service". The study includes designing, implementing, testing, and evaluating the enhanced solution. The service model identifies an aspect that would potentially benefit from restructuring or added functionality. A current issue is that the solution offers no way to fetch a transitive closure from within it once a job completes, and it is thus limited in its direct use with other processes or applications.
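The two subproblems described above can be illustrated with a small sequential sketch. It is not the parallel, distributed implementation the thesis studies. It assumes records are held in memory as dictionaries, and it uses a union-find (disjoint-set) structure as a stand-in clustering technique. The function names `direct_pairs` and `transitive_closures`, the field names, and the sample data are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def direct_pairs(records, key_fields):
    """Subproblem 1 (sketch): pair up records that share a value in any key field.

    `records` maps a record identifier to a dict of field values (assumed layout).
    """
    buckets = defaultdict(list)
    for rec_id, rec in records.items():
        for field in key_fields:
            value = rec.get(field)
            if value is not None:
                buckets[(field, value)].append(rec_id)

    pairs = set()
    for ids in buckets.values():
        # Every two records in the same bucket are directly related.
        for a, b in combinations(sorted(ids), 2):
            pairs.add((a, b))
    return pairs


def transitive_closures(record_ids, pairs):
    """Subproblem 2 (sketch): cluster records with union-find.

    Assumes every identifier that appears in `pairs` is also in `record_ids`.
    """
    parent = {r: r for r in record_ids}

    def find(x):
        # Follow parent links to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for r in record_ids:
        groups[find(r)].add(r)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical sample data: records 1 and 2 share a phone number,
# records 2 and 3 share an email address, record 4 shares nothing.
records = {
    1: {"phone": "555-0100", "email": "a@example.com"},
    2: {"phone": "555-0100", "email": "b@example.com"},
    3: {"phone": "555-0199", "email": "b@example.com"},
    4: {"phone": "555-0142", "email": "d@example.com"},
}
pairs = direct_pairs(records, ["phone", "email"])
closures = transitive_closures(records.keys(), pairs)
```

In this sample, records 1 and 3 share no field directly, yet they land in the same group because both are directly related to record 2. Record 4 forms a group of its own.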