Search Results

open access

Programmatic Extraction of ‘Documents’ from Web Archives: Identifying Document Characteristics from Content Selector Interviews

Description: White paper documenting the results of interviews with professionals who manage institutional repositories and collections of state or federal documents. The interviews gathered information about collection policies and the characteristics of born-digital publications incorporated into these collections, to inform future machine learning algorithms.
Date: 2020
Creator: Fox, Nathaniel T.; Phillips, Mark Edward & Tarver, Hannah
Partner: UNT Libraries
open access

Dynamic Classification in Web Archiving Collections

Description: Article explores dynamic fusion models that find, on the fly, the model or combination of models that performs best on a variety of document types. Experimental results show that fusing different models outperforms individual models and other ensemble methods on three datasets.
Date: May 2020
Creator: Phillips, Mark Edward; Patel, Krutarth & Caragea, Cornelia
Partner: UNT Libraries