Search Results

Portal to Texas History Newspaper OCR Text Dataset: Temple

Description: Dataset of OCR text from The Portal to Texas History and the Texas Digital Newspaper Program. This dataset includes titles from Temple, Texas, from the years 1907 to 1922. Titles in this dataset include: Temple Daily Telegram. In all there are 4,627 issues comprising 44,633 pages of text.
Date: November 12, 2015
Creator: Phillips, Mark Edward

Quality Assurance Practices in Web Archiving [Dataset]

Description: This dataset contains the results of a survey of quality assurance practices within the field of web archiving and its practitioners. To understand current QA practices, the authors surveyed institutions engaged in web archiving, which included national libraries, colleges and universities, and museums and art libraries. The survey was administered online. It includes the completed responses of 54 participants. The data has been anonymized for privacy reasons. This dataset was used in the "Curr…
Date: December 2014
Creator: Reyes Ayala, Brenda; Phillips, Mark Edward & Ko, Lauren

[Response Data: Survey of Benchmarks in Metadata Quality]

Description: Complete, anonymized dataset of responses to the Survey of Benchmarks in Metadata Quality. Date, time, IP addresses, and geographic data have been omitted. Responses that included project, organization, and/or repository names were removed from this data, as well as potentially identifying names, acronyms, and/or links.
Date: July 2019
Creator: Digital Library Federation. Assessment Interest Group. Metadata Working Group. Benchmarks Sub-Group.

Restricted University of North Texas Electronic Theses and Dissertations

Description: This dataset contains responses to a survey questionnaire distributed by the University of North Texas (UNT) Libraries asking 125 authors of electronic theses and dissertations (ETDs) whether they agree to change the existing restricted permission status on their ETDs.
Date: February 24, 2014
Creator: Alemneh, Daniel Gelaw

Ruth Bader Ginsburg Remembrance Twitter Dataset

Description: This dataset contains Twitter JSON data for Tweets related to the passing of Ruth Bader Ginsburg on September 18, 2020. This dataset was created using the twarc (https://github.com/edsu/twarc) package that makes use of Twitter's search API. A total of 4,195,270 Tweets make up the combined dataset.
Date: 2020-09-10/2020-10-04
Creator: Phillips, Mark Edward

"Stand With Wendy" Twitter Dataset

Description: This dataset contains Twitter JSON data for several Twitter search queries collected the week following the filibuster by Wendy Davis in the Texas Senate related to Senate Bill 5, using the twarc (https://github.com/edsu/twarc) package that makes use of Twitter's search API. A total of 560,954 Tweets make up the combined dataset.
Date: 2013-06-25/2013-07-03
Creator: Phillips, Mark Edward

Texas Digital Newspaper Program Issue Dataset for IFLA/Rootstech Analysis

Description: This dataset contains the descriptive metadata harvested from the Texas Digital Newspaper Program collection on The Portal to Texas History and is accompanied by a dataset derived from the harvested metadata. This dataset was used for an IFLA Newspaper Section and Rootstech presentation.
Date: January 16, 2014
Creator: Phillips, Mark Edward & Krahmer, Ana

Texas Newspapers Natural Language Processing

Description: This dataset includes data on natural language processing from the Texas Newspapers Project. The dataset includes word counts, named entity recognition results, and topic models.
Date: April 7, 2013
Creator: Torget, Andrew J., 1978-

Tropical Storm Imelda Twitter Dataset

Description: This dataset contains Twitter JSON data for Tweets related to Tropical Storm Imelda and the subsequent flooding in the south Texas region. This dataset was created using the twarc (https://github.com/DocNow/twarc) package that makes use of Twitter's search API. A total of 76,420 Tweets and 4,429 media files make up the combined dataset.
Date: 2019-09-10/2019-09-21
Creator: Phillips, Mark Edward

[U.S. Patent OCR Files: Disk USP001]

Description: This dataset contains the compiled Optical Character Recognition (OCR) text files for the content of patent grants issued by the United States Patent Office from 1 to 469,664 (inclusive).
Date: 2013
Creator: Phillips, Mark Edward

[U.S. Patent OCR Files: Disk USP002]

Description: This dataset contains the compiled Optical Character Recognition (OCR) text files for the content of patent grants issued by the United States Patent Office from ## to ## (non-inclusive).
Date: 2013
Creator: Phillips, Mark Edward

[U.S. Patent OCR Files: Disk USP003]

Description: This dataset contains the compiled Optical Character Recognition (OCR) text files for the content of patent grants issued by the United States Patent Office from ## to ## (non-inclusive).
Date: 2013
Creator: Phillips, Mark Edward

[U.S. Patent OCR Files: Disk USP004]

Description: This dataset contains the compiled Optical Character Recognition (OCR) text files for the content of patent grants issued by the United States Patent Office from ## to ## (non-inclusive).
Date: 2013
Creator: Phillips, Mark Edward

[U.S. Patent OCR Files: Disk USP005]

Description: This dataset contains the compiled Optical Character Recognition (OCR) text files for the content of patent grants issued by the United States Patent Office from ## to ## (non-inclusive).
Date: 2013
Creator: Phillips, Mark Edward

[U.S. Patent OCR Files: Disk USP006]

Description: This dataset contains the compiled Optical Character Recognition (OCR) text files for the content of patent grants issued by the United States Patent Office from ## to ## (non-inclusive).
Date: 2013
Creator: Phillips, Mark Edward

[U.S. Patent OCR Files: Disk USP007]

Description: This dataset contains the compiled Optical Character Recognition (OCR) text files for the content of patent grants issued by the United States Patent Office from ## to ## (non-inclusive).
Date: 2013
Creator: Phillips, Mark Edward

[U.S. Patent OCR Files: Disk USP008]

Description: This dataset contains the compiled Optical Character Recognition (OCR) text files for the content of patent grants issued by the United States Patent Office from ## to ## (non-inclusive).
Date: 2013
Creator: Phillips, Mark Edward

[U.S. Patent OCR Files: Disk USP009]

Description: This dataset contains the compiled Optical Character Recognition (OCR) text files for the content of patent grants issued by the United States Patent Office from ## to ## (non-inclusive).
Date: 2013
Creator: Phillips, Mark Edward

[U.S. Patent OCR Files: Disk USP010]

Description: This dataset contains the compiled Optical Character Recognition (OCR) text files for the content of patent grants issued by the United States Patent Office from ## to ## (non-inclusive).
Date: 2013
Creator: Phillips, Mark Edward