This repository has been archived by the owner on Dec 25, 2020. It is now read-only.

Data Sources

Peter Rowland edited this page Feb 12, 2020 · 3 revisions

Privacy Impact Assessments (PIAs) and System of Records Notices (SORNs) are the primary sources of data we process and analyze. For Phase 3, we are working with GSA's PIAs and SORNs.

Privacy Impact Assessments (PIAs)

Sources

GSA publishes all of its system PIAs on its website (see the GSA PIAs page). Every agency is required to maintain a page on its own website where it publishes its PIAs.

Formats

PDF only

Structure

GSA uses a template for PIAs. The sections that we extract information from are:

  • 3.2: Personally Identifiable Information (PII)
  • 1.1: Purpose of collection
  • 1.5: Retention Policy
  • 4.2: Sharing
  • 1.3: SORN-ID

Note: Older PIAs are not in this format, and in some cases the relevant information is contained in other sections of the document.
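
As a minimal sketch of how the template sections listed above might be pulled out, the Python below assumes the PDF has already been converted to plain text by some other tool, and that each template section starts on its own line with its number (for example "1.1"). The function name, the field names, and the heading pattern are hypothetical and are not taken from the project's code. Older PIAs that do not follow the template would need separate handling.

```python
import re

# Section numbers from the list above, mapped to hypothetical field names.
PIA_SECTIONS = {
    "3.2": "pii",
    "1.1": "purpose",
    "1.5": "retention",
    "4.2": "sharing",
    "1.3": "sorn_id",
}

# Assumption: a section heading is a line that begins with a number like "1.1".
HEADING = re.compile(r"^[ \t]*(\d+\.\d+)\b.*$", re.MULTILINE)


def extract_pia_sections(text):
    """Return {field_name: section_body} for the template sections of interest."""
    matches = list(HEADING.finditer(text))
    result = {}
    for i, match in enumerate(matches):
        number = match.group(1)
        if number not in PIA_SECTIONS:
            continue
        # The body runs from the end of the heading line to the next heading.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        result[PIA_SECTIONS[number]] = text[match.end():end].strip()
    return result
```

Sections missing from the text are simply absent from the returned dictionary, which makes it easy to flag documents that do not follow the template.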

System of Records Notices (SORNs)

Sources

Formats

SORNs are available in both PDF and machine-readable (though not structured) XML.

Structure

The sections that we extract information from are:

  • SYSTEM NAME
  • CATEGORIES OF RECORDS IN THE SYSTEM
  • PURPOSE
  • RETENTION AND DISPOSAL
  • ROUTINE USES OF RECORDS MAINTAINED IN THE SYSTEM INCLUDING CATEGORIES OF USERS AND THE PURPOSES OF SUCH USES

We also get the official document title, which is usually the first element in the <PRIACT> section.
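
The SORN extraction described above can be sketched roughly as follows, using only Python's standard library. This is an illustration under stated assumptions, not the project's implementation: it assumes that the headings appear as <HD> child elements of <PRIACT> and that the text for each heading sits in the sibling elements that follow it. The function and helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Headings from the list above. The routine-uses heading is matched by its
# prefix because the full heading text is long.
SORN_HEADINGS = (
    "SYSTEM NAME",
    "CATEGORIES OF RECORDS IN THE SYSTEM",
    "PURPOSE",
    "RETENTION AND DISPOSAL",
    "ROUTINE USES OF RECORDS MAINTAINED IN THE SYSTEM",
)


def _text(element):
    """Flatten an element's text, including nested tags, and collapse whitespace."""
    return " ".join("".join(element.itertext()).split())


def extract_sorn_sections(xml_string):
    """Return the document title plus the text under each heading of interest."""
    root = ET.fromstring(xml_string)
    priact = root if root.tag == "PRIACT" else root.find(".//PRIACT")
    if priact is None:
        return {}
    children = list(priact)
    sections = {}
    current = None
    for child in children:
        if child.tag == "HD":  # assumption: headings are <HD> elements
            label = _text(child).rstrip(":").upper()
            current = next((h for h in SORN_HEADINGS if label.startswith(h)), None)
            if current is not None:
                sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(_text(child))
    result = {name: " ".join(parts) for name, parts in sections.items()}
    if children:
        # The title is usually, but not always, the first element.
        result["title"] = _text(children[0])
    return result
```

Because the XML is not structured by section, text is attributed to whichever heading most recently preceded it, and content under headings outside the list is skipped.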