This paper discusses the Classification of the End-of-Term Archive project.
Physical Description
4 p.
Notes
Reprinted with permission of IS&T: The Society for Imaging Science and Technology, sole copyright owners of IS&T's Archiving 2012 proceedings.
Abstract: For users, selecting relevant content from Web archives is often a daunting endeavor. This research project, Classification of the End-of-Term Archive, funded by the Institute of Museum and Library Services (IMLS), investigated whether link analysis and cluster analysis were effective techniques for classifying the materials in the EOT Archive to improve discovery. Classification of the resulting clusters by subject matter experts in government information indicated that the structural analysis was effective at creating clusters of related websites when authored by four or fewer federal government parent agencies. The results also suggested that cluster analysis might be effective at identifying topically related websites across agency authors, which would be highly desirable to both system developers and users. To investigate this, subject matter experts applied subject tags to the websites in two sets of machine-generated clusters. The findings indicate that the cluster analysis successfully identified strongly related content in 61% of clusters.
This paper is part of the following collection of related materials.
UNT Scholarly Works
Materials from the UNT community's research, creative, and scholarly activities and UNT's Open Access Repository. Access to some items in this collection may be restricted.
Poster presented at the 2012 IS&T Archiving Conference. This poster discusses a research project on classifying the End-of-Term Archive. The project, funded by the Institute of Museum and Library Services (IMLS), investigated whether link analysis and cluster analysis were effective techniques for classifying the materials in the End-of-Term Archive to improve discovery.