Accuracy and Interpretability Testing of Text Mining Methods

Ashton, Triss A.

Creation Information

Ashton, Triss A. August 2013.

Context

This dissertation is part of the collection entitled UNT Theses and Dissertations and was provided by the UNT Libraries to the UNT Digital Library, a digital repository hosted by the UNT Libraries. It has been viewed 671 times.

Author

Ashton, Triss A.

Chair

Evangelopoulos, Nicholas Committee Chair

Publisher

University of North Texas
Publisher Info: www.unt.edu

Place of Publication: Denton, Texas

Rights Holder

Ashton, Triss A.

Provided By

UNT Libraries

The UNT Libraries serve the university and community by providing access to physical and online collections, fostering information literacy, supporting academic research, and much, much more.

Degree Information

Department: Department of Information Technology and Decision Sciences
Discipline: Management Science
Level: Doctoral
Name: Doctor of Philosophy
Grantor: University of North Texas
Publication Type: Doctoral Dissertation

Description

Extracting meaningful information from large collections of text data is problematic because of the sheer size of the database. However, automated analytic methods capable of processing such data have emerged. These methods, collectively called text mining, began to appear in 1988. A number of additional text mining methods quickly developed in independent research silos, each based on unique mathematical algorithms. How good each of these methods is at analyzing text is unclear. Method development typically evolves from some silo-centric research requirement, with the success of the method measured by a custom requirement-based metric. Results of the new method are then compared to another method that was similarly developed. The proposed research introduces an experimentally designed testing method to text mining that eliminates research silo bias and simultaneously evaluates methods from all of the major context-region text mining method families. The proposed research method follows a random block factorial design with two treatments consisting of three and five levels (RBF-35) with repeated measures. The contribution of the research is threefold. First, the users perceived a difference in the effectiveness of the various methods. Second, while still not clear, there are characteristics within the text collection that affect the algorithms' ability to extract meaningful results. Third, this research develops an experimental design process for testing the algorithms that is adaptable to other areas of software development and algorithm testing. This design eliminates the bias-based practices historically employed by algorithm developers.
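
As a rough illustration of the design named in the description, the sketch below enumerates the 3 x 5 = 15 treatment combinations of an RBF-35 layout and randomizes their presentation order within each block. It is only a sketch under stated assumptions, not the dissertation's actual procedure: the factor level labels, the block identifiers, and the function name `build_rbf_schedule` are hypothetical, and the description does not say what the two treatments or the blocks are.

```python
import itertools
import random

# Hypothetical level labels: the description states only that the two
# treatments have three and five levels, not what those levels are.
FACTOR_A_LEVELS = ["a1", "a2", "a3"]
FACTOR_B_LEVELS = ["b1", "b2", "b3", "b4", "b5"]


def build_rbf_schedule(block_ids, seed=0):
    """Map each block to all 15 (a, b) cells in a randomized order.

    Every block receives every treatment combination (the repeated-measures
    part); only the presentation order differs from block to block.
    """
    rng = random.Random(seed)  # seeded so the schedule is reproducible
    cells = list(itertools.product(FACTOR_A_LEVELS, FACTOR_B_LEVELS))
    schedule = {}
    for block in block_ids:
        order = cells[:]       # copy so each block gets its own ordering
        rng.shuffle(order)
        schedule[block] = order
    return schedule


# Hypothetical block identifiers, used only to exercise the function.
schedule = build_rbf_schedule(["block_1", "block_2", "block_3"])
for block, order in schedule.items():
    print(block, len(order), "cells; first:", order[0])
```

Crossing the two factors fully inside every block is what lets differences between blocks be separated from differences between treatment combinations when the responses are analyzed.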

Language

English

Item Type

Thesis or Dissertation

Identifier

Unique identifying numbers for this dissertation in the Digital Library or other systems.

Archival Resource Key: ark:/67531/metadc283791

Collections

This dissertation is part of the following collection of related materials.

UNT Theses and Dissertations

Theses and dissertations represent a wealth of scholarly and artistic content created by masters and doctoral students in the degree-seeking process. Some ETDs in this collection are restricted to use by the UNT community.

Creation Date

August 2013

Added to The UNT Digital Library

April 23, 2014, 8:20 p.m.

Description Last Updated

Dec. 14, 2023, 10:56 a.m.

Usage Statistics

Yesterday: 0

Past 30 days: 0

Total Uses: 671

Ashton, Triss A. Accuracy and Interpretability Testing of Text Mining Methods, dissertation, August 2013; Denton, Texas. (https://digital.library.unt.edu/ark:/67531/metadc283791/: accessed May 23, 2024), University of North Texas Libraries, UNT Digital Library, https://digital.library.unt.edu.
