Structure and effectiveness of The Citation Identifier, an operational computer program for automatic identification of case citations in legal literature

Bibliographic details
Published in: Journal of the American Society for Information Science, Vol. 21 (1), January 1970, pp. 8-15
Main authors: Borkowski, Casimir; Cepanec, Louis; Martin, J. Sperling; Salko, Virginia; Treu, Siegfried
Format: Article
Language: English
Description
Abstract: A computer program for the automatic identification of “full-form” case citations in legal literature (e.g., Rutherford v. Geddes, 4 Wall. 220, 18 L. Ed. 343; Southland Industries, Inc. v. Federal Communications Commission, 1938, 69 App. D.C. 82, 99 F.2d 117) has been developed by this group and is now operational. The performance of this program, known as “The Citation Identifier,” is high. In a recent computer run, The Citation Identifier scanned the full texts of 191 randomly selected decisions of the U.S. Courts of Appeals (some 400,000 words of running text) and correctly located 2,220 of the 2,227 full-form citations present (better than 99% of the total). Only seven misses and three false drops occurred. Of the 2,220 correctly located citations, 1,944 (87%) were identified perfectly. The remaining 276 were partial identifications, which showed two types of error: (1) “short hits,” in which the program mistakenly lopped off some citation terms; and (2) “long hits,” in which words were improperly included in the citation. Both types of error are, for the most part, easily correctable and can largely be eliminated by suitable changes to the program. The Citation Identifier runs fairly rapidly: in a recent test run, processing some 400,000 running words of text took approximately 15½ minutes, and this speed could be increased further by suitable changes to the program. An extension of The Citation Identifier to reduced-form citations (e.g., “the Geddes decision,” “the Southland Industries case”) is now in preparation.
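The abstract does not say how The Citation Identifier recognizes a citation, so what follows is only a minimal Python sketch of the task, not a reconstruction of the 1970 program. It assumes a full-form citation has the shape of the two examples in the abstract: a case name of the form "Party v. Party," an optional year, and one or more references made of a volume number, a reporter abbreviation, and a first page. All names here (`FULL_FORM`, `find_full_form_citations`, and the pattern pieces) are hypothetical.

```python
import re

# Hypothetical pattern pieces; these are NOT the rules of the original program.
WORD = r"[A-Z][A-Za-z.&']*"                          # capitalized word, e.g. "Inc."
PARTY = rf"{WORD}(?:,?\s+(?:{WORD}|of|the|and))*"    # e.g. "Southland Industries, Inc."
TOKEN = r"[A-Z][A-Za-z.]*(?:\d[a-z]{1,2})?"          # e.g. "Wall.", "D.C.", "F.2d"
REPORTER = rf"{TOKEN}(?:\s+{TOKEN})*"                # e.g. "L. Ed.", "App. D.C."
REFERENCE = rf"\d{{1,4}}\s+{REPORTER}\s+\d{{1,5}}"   # volume, reporter, first page

FULL_FORM = re.compile(
    rf"\b(?P<case>{PARTY}\s+v\.\s+{PARTY}),\s*"      # case name
    rf"(?:(?P<year>\d{{4}}),\s*)?"                   # optional year
    rf"(?P<refs>{REFERENCE}(?:,\s*{REFERENCE})*)"    # one or more reporter references
)


def find_full_form_citations(text: str) -> list[dict]:
    """Return case name, optional year, reporter references and span for each match."""
    results = []
    for m in FULL_FORM.finditer(text):
        results.append({
            "case": m.group("case"),
            "year": m.group("year"),                  # None if no year is given
            "references": re.split(r",\s*", m.group("refs")),
            "span": m.span(),
        })
    return results


if __name__ == "__main__":
    sample = (
        "The court relied on Rutherford v. Geddes, 4 Wall. 220, 18 L. Ed. 343; "
        "see also Southland Industries, Inc. v. Federal Communications Commission, "
        "1938, 69 App. D.C. 82, 99 F.2d 117."
    )
    for cite in find_full_form_citations(sample):
        print(cite["case"], "|", cite["year"], "|", cite["references"])
```

On the sample sentence the sketch reports both example citations. Its limits show up quickly, and they map onto the paper's error types: in "Smith ex rel. Jones v. Brown, 1 U.S. 2" the lowercase "ex rel." stops the first party name, so only "Jones v. Brown" is returned (a short hit), while "See Rutherford v. Geddes, 4 Wall. 220" comes back with "See" attached to the case name (a long hit).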
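The headline figures in the abstract can be re-derived from its raw counts. The sketch below assumes the usual information-retrieval reading: the 2,227 citations present are the ground truth, every correctly located citation (perfect or partial) is a true hit, and the three false drops are spurious hits. The abstract itself reports only the counts and the "better than 99%" figure, so the recall/precision framing and the names `REPORTED` and `summarize` are assumptions for illustration; the word count and run time are the abstract's approximate values.

```python
# Counts as reported in the abstract; the recall/precision framing is an assumption.
REPORTED = {
    "present": 2227,      # full-form citations actually in the 191 decisions
    "located": 2220,      # located correctly (perfect + partial identifications)
    "perfect": 1944,      # identified with exactly the right extent
    "false_drops": 3,     # strings wrongly reported as citations
    "words": 400_000,     # approximate ("some 400,000")
    "minutes": 15.5,      # approximate processing time
}


def summarize(r: dict) -> dict:
    """Derive misses, partial hits, recall, precision and throughput from raw counts."""
    located = r["located"]
    return {
        "misses": r["present"] - located,
        "partial": located - r["perfect"],            # short hits + long hits
        "recall": located / r["present"],
        "precision": located / (located + r["false_drops"]),
        "perfect_share": r["perfect"] / located,
        "words_per_second": r["words"] / (r["minutes"] * 60),
    }


if __name__ == "__main__":
    for name, value in summarize(REPORTED).items():
        print(f"{name}: {value:.4f}" if isinstance(value, float) else f"{name}: {value}")
```

Under these assumptions the counts give 7 misses and 276 partial identifications, matching the abstract, a recall of about 99.7%, a precision of about 99.9%, a perfect-identification share of about 87.6% (reported as 87%), and a throughput of roughly 430 words per second.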
ISSN: 0002-8231 (print); 1097-4571 (online)
DOI:10.1002/asi.4630210104