The two other studies, one by David Biek, head librarian at the Tacoma Public Library's main branch, and one by Cory Finnell of Certus Consulting Group, of Seattle, Washington, chose actual logs of Web pages visited by library patrons during specific time periods as the universe of Web pages to analyze. This method, while surely not as accurate as a truly random sample of the indexed Web would be (assuming it were possible to take such a sample), has the virtue of using the actual Web sites that library patrons visited during a specific period. Because library patrons, rather than the study authors, selected the universe of Web sites that the Biek and Finnell studies analyzed, this method removes the possibility of bias resulting from a study author's selection of the universe of sites to be reviewed. We find that the Lemmons and Hunter studies are of little probative value because of the methodology used to select the sample universe of Web sites to be tested. We will therefore focus on the studies conducted by Finnell and Biek in trying to ascertain estimates of the rates of over- and underblocking that take place when filters are used in public libraries.

The government hired expert witness Cory Finnell to study the Internet logs compiled by the public library systems in Tacoma, Washington; Westerville, Ohio; and Greenville, South Carolina. Each of these libraries uses filtering software that keeps a log of information about individual Web site requests made by library patrons. Finnell, whose consulting firm specializes in data analysis, has substantial experience evaluating Internet access logs generated on networked systems. He spent more than a year developing a reporting tool for N2H2, and, in the course of that work, acquired a familiarity with the design and operation of Internet filtering products.

The Tacoma library uses Cyber Patrol filtering software, and logs information only on sites that were blocked. Finnell worked from a list of all sites that were blocked in the Tacoma public library in the month of August 2001. The Westerville library uses the Websense filtering product, and logs information on both blocked sites and non-blocked sites. When the logs reach a certain size, they are overwritten by new usage logs. Because of this overwriting feature, logs were available to Finnell only for the relatively short period from October 1, 2001 to October 3, 2001. The Greenville library uses N2H2's filtering product and logs both blocked sites and sites that patrons accessed. The logs contain more than 500,000 records per day. Because of the volume of the records, Finnell restricted his analysis to the period from August 2, 2001 to August 15, 2001.

Finnell calculated an overblocking rate for each of the three libraries by examining the host Web site containing each of the blocked pages. He did not employ a sampling technique, but instead examined each blocked Web site. If the contents of a host Web site or the pages within the Web site were consistent with the filtering product's definition of the category under which the site was blocked, Finnell considered it to be an accurate block. Finnell and three others, two of whom were temporary employees, examined the Web sites to determine whether they were consistent with the filtering companies' category definitions. Their review was, of course, necessarily limited by: (1) the clarity of the filtering companies' category definitions; (2) Finnell's and his employees' interpretations of the definitions; and (3) human error. The study's reliability is also undercut by the fact that Finnell failed to archive the blocked Web pages as they existed either at the point that a patron in one of the three libraries was denied access or when Finnell and his team reviewed the pages. It is therefore impossible for anyone to check the accuracy and consistency of Finnell's review team, or to know whether the pages contained the same content when the block occurred as they did when Finnell's team reviewed them. This is a key flaw, because the results of the study depend on individual determinations as to overblocking and underblocking, in which Finnell and his team were required to compare what they saw on the Web pages that they reviewed with standard definitions provided by the filtering company.

Tacoma library's Cyber Patrol software blocked 836 unique Web sites during the month of August. Finnell determined that 783 of those blocks were accurate and that 53 were inaccurate. The error rate for Cyber Patrol was therefore estimated to be 6.34%, and the true error rate was estimated with 95% confidence to lie within the range of 4.69% to 7.99%. Finnell and his team reviewed 185 unique Web sites that were blocked by Westerville Library's Websense filter during the logged period and determined that 158 of them were accurate and that 27 of them were inaccurate. He therefore estimated the Websense filter's overblocking rate at 14.59% with a 95% confidence interval of 9.51% to 19.68%. Additionally, Finnell examined 1,674 unique Web sites that were blocked by the Greenville Library's N2H2 filter during the relevant period and determined that 1,520 were accurate and that 87 were inaccurate. This yields an estimated overblocking rate of 5.41% and a 95% confidence interval of 4.33% to 6.55%.

Finnell's methodology was materially flawed in that it understates the rate of overblocking, for the following reasons. First, patrons from the three libraries knew that the filters were operating, and may have been deterred from attempting to access Web sites that they perceived to be "borderline" sites, i.e., those that may or may not have been appropriately filtered according to the filtering companies' category definitions. Second, in their cross-examination of Finnell, the plaintiffs offered screen shots of a number of Web sites that, according to Finnell, had been appropriately blocked, but that Finnell admitted contained only benign materials. Finnell's explanation was that the Web sites must have changed between the time when he conducted the study and the time of the trial, but because he did not archive the images as they existed when his team reviewed them for the study, there is no way to verify this.
Third, because of the way in which Finnell counted blocked Web sites (if separate patrons attempted to reach the same Web site, or one or more patrons attempted to access more than one page on a single Web site, Finnell counted these attempts as a single block, see supra note 10), his results necessarily understate the number of times that patrons were erroneously denied access to information.
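The error rates and 95% confidence intervals reported above are consistent with the standard normal (Wald) approximation for a binomial proportion. The record does not state which interval method Finnell actually used, so the following Python sketch is only an illustration of that assumed method, using the Tacoma figures; the function name is ours, not Finnell's:

```python
import math

def overblock_interval(inaccurate, total, z=1.96):
    """Overblocking rate with a 95% Wald (normal-approximation) interval.

    inaccurate: number of blocks judged erroneous by the reviewers
    total: number of unique blocked sites reviewed
    """
    p = inaccurate / total
    margin = z * math.sqrt(p * (1 - p) / total)
    return p, p - margin, p + margin

# Tacoma / Cyber Patrol: 53 of 836 unique blocks judged inaccurate
rate, low, high = overblock_interval(53, 836)
print(f"{rate:.2%}  ({low:.2%} to {high:.2%})")  # 6.34%  (4.69% to 7.99%)
```

Running the same function on the Westerville figures (27 inaccurate blocks out of 185) likewise reproduces the reported 14.59% rate and 9.51% to 19.68% interval.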

At all events, there is no doubt that Finnell's estimated rates of overblocking, which are based on the filtering companies' own category definitions, significantly understate the rate of overblocking with respect to CIPA's category definitions for filtering for adults. The filters used in the Tacoma, Westerville, and Greenville libraries were configured to block, among other things, images of full nudity and sexually explicit materials. There is no dispute, however, that these categories are far broader than CIPA's categories of visual depictions that are obscene or child pornography, the two categories of material that libraries subject to CIPA must certify that they filter during adults' use of the Internet.

Finnell's study also calculated underblocking rates with respect to the Westerville and Greenville Libraries (both of which logged not only their blocked sites, but all sites visited by their patrons) by taking random samples of URLs from the list of sites that were not blocked. The study used a sample of 159 sites that were accessed by Westerville patrons and determined that only one of them should have been blocked under the software's category definitions, yielding an underblocking rate of 0.6%. Given the size of the sample, the 95% confidence interval is 0% to 1.86%. The study examined a sample of 254 Web sites accessed by patrons in Greenville and found that three of them should have been blocked under the filtering software's category definitions. This results in an estimated underblocking rate of 1.2% with a 95% confidence interval ranging from 0% to 2.51%.
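The underblocking estimates follow the same binomial pattern, with the lower bound truncated at zero. Again assuming a simple Wald interval (the record does not specify the method), a short Python sketch with the two samples described above:

```python
import math

def underblock_interval(missed, sample_size, z=1.96):
    """Rate of sampled non-blocked sites that should have been blocked,
    with a 95% Wald interval truncated at zero (a rate cannot be negative)."""
    p = missed / sample_size
    margin = z * math.sqrt(p * (1 - p) / sample_size)
    return p, max(0.0, p - margin), p + margin

for name, missed, n in [("Westerville", 1, 159), ("Greenville", 3, 254)]:
    p, low, high = underblock_interval(missed, n)
    print(f"{name}: {p:.1%}  (0% to {high:.2%})")
# Westerville: 0.6%  (0% to 1.86%)
# Greenville: 1.2%  (0% to 2.51%)
```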

We do not credit Finnell's estimates of the rates of underblocking in the Westerville and Greenville public libraries for several reasons. First, Finnell's estimates likely understate the actual rate of underblocking because patrons, who knew that filtering programs were operating in the Greenville and Westerville Libraries, may have refrained from attempting to access sites with sexually explicit materials or other content that they knew would probably fall within a filtering program's blocked categories. Second, and most importantly, we think that the formula that Finnell used to calculate the rate of underblocking in these two libraries is not as meaningful as the formula that information scientists typically use to calculate a rate of recall, which we describe above in Subsection II.E.3. As Dr. Nunberg explained, the standard method that information scientists use to calculate a rate of recall is to sort a set of items into two groups: those that fall into a particular category (e.g., those that should have been blocked by a filter) and those that do not. The rate of recall is then calculated by dividing the number of items that the system correctly identified as belonging to the category by the total number of items in the category.

In the example above, we discussed a database that contained 1000 photographs. Assume that 200 of these photographs were pictures of dogs. If, for example, a classification system designed to identify pictures of dogs identified 80 of the dog pictures and failed to identify 120, it would have performed with a recall rate of 40%. This would be analogous to a filter that underblocked at a rate of 60%. To calculate the recall rate of the filters in the Westerville and Greenville public libraries in accordance with the standard method described above, Finnell should have taken a sample of sites from the libraries' Internet use logs (including both sites that were blocked and sites that were not), and divided the number of sites in the sample that the filter incorrectly failed to block by the total number of sites in the sample that should have been blocked. What Finnell did instead was to take a sample consisting only of sites that were not blocked, and divide the number of sites in the sample that should have been blocked by the total number of sites in the sample. This made the denominator that Finnell used much larger than it would have been had he used the standard method for calculating recall, consequently making the underblocking rate that he calculated much lower than it would have been under the standard method.
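The difference between the two formulas can be made concrete with the dog-photo example and the Westerville sample. The function names below are ours, used only to illustrate the contrast in denominators:

```python
def standard_miss_rate(missed, correctly_blocked):
    """Standard information-science measure: items the filter failed to
    block, divided by all items that belong in the blocked category.
    (Recall is the complement: correctly_blocked over the category total.)"""
    return missed / (missed + correctly_blocked)

def finnell_miss_rate(missed, sample_size):
    """Finnell's measure: items that should have been blocked, divided
    by the entire sample of non-blocked sites."""
    return missed / sample_size

# Dog-photo example: 200 dog pictures, 80 identified, 120 missed
print(standard_miss_rate(120, 80))          # 0.6 -> 60% underblocking (40% recall)

# Westerville sample: 1 of 159 non-blocked sites should have been blocked
print(round(finnell_miss_rate(1, 159), 4))  # 0.0063 -> the reported ~0.6%
```

Because Finnell's denominator is the whole sample rather than only the sites that belong in the blocked category, it is necessarily much larger, and the resulting rate correspondingly smaller.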

Moreover, despite the relatively low rates of underblocking that Finnell's study found, librarians from several of the libraries proffered by defendants that use blocking products, including Greenville, Tacoma, and Westerville, testified that there are instances of underblocking in their libraries. No quantitative evidence was presented comparing the effectiveness of filters and other alternative methods used by libraries to prevent patrons from accessing visual depictions that are obscene, child pornography, or, in the case of minors, harmful to minors.

Biek undertook a similar study of the overblocking rates that result from the Tacoma Library's use of the Cyber Patrol software. He began with the 3,733 individual blocks that occurred in the Tacoma Library in October 2000 and drew from this data set a random sample of 786 URLs. He calculated two rates of overblocking: one with respect to the Tacoma Library's policy on Internet use, which provides that the pictorial content of a site may not include "graphic materials depicting full nudity and sexual acts which are portrayed obviously and exclusively for sensational or pornographic purposes," and the other with respect to Cyber Patrol's own category definitions. He estimated that Cyber Patrol overblocked 4% of all Web pages in October 2000 with respect to the definitions of the Tacoma Library's Internet policy and 2% of all pages with respect to Cyber Patrol's own category definitions.

It is difficult to determine how reliable Biek's conclusions are, because he did not keep records of the raw data that he used in his study; nor did he archive images of the Web pages as they looked when he determined whether they were properly classified by the Cyber Patrol program. Without this information, it is impossible to verify his conclusions (or to undermine them). And Biek's study certainly understates Cyber Patrol's overblocking rate for some of the same reasons that Finnell's study likely understates the true rates of overblocking in the libraries that he studied. We also note that Finnell's study, which analyzed a set of Internet logs from the Tacoma Library during which the same filtering program was operating with the same set of blocking categories enabled, found a significantly higher rate of overblocking than the Biek study did: Biek found a rate of overblocking of approximately 2%, while the Finnell study estimated a 6.34% rate of overblocking. At all events, the category definitions employed by CIPA, at least with respect to adult use (visual depictions that are obscene or child pornography), are narrower than the materials prohibited by the Tacoma Library policy, and therefore Biek's study understates the rate of overblocking with respect to CIPA's definitions for adults.

In sum, we think that Finnell's study, while we do not credit its estimates of underblocking, is useful because it states lower bounds with respect to the rates of overblocking that occurred when the Cyber Patrol, Websense, and N2H2 filters were operating in public libraries. While these rates are substantial, between nearly 6% and 15%, we think, for the reasons stated above, that they greatly understate the actual rates of overblocking that occur, and therefore they cannot be considered as anything more than minimum estimates of the rates of overblocking that happen in all filtering programs.

5. Methods of Obtaining Examples of Erroneously Blocked Web Sites

The plaintiffs assembled a list of several thousand Web sites that they contend were, at the time of the study, likely to have been erroneously blocked by one or more of four major commercial filtering programs: SurfControl Cyber Patrol 6.0.1.47, N2H2 Internet Filtering 2.0, Secure Computing SmartFilter 3.0.0.01, and Websense Enterprise 4.3.0. They compiled this list using a two-step process. First, Benjamin Edelman, an expert witness who testified before us, compiled a list of more than 500,000 URLs and devised a program to feed them through all four filtering programs in order to compile a list of URLs that might have been erroneously blocked by one or more of the programs. Second, Edelman forwarded subsets of the list that he compiled to librarians and professors of library science whom the plaintiffs had hired to review the blocked sites for suitability in the public library context. Edelman assembled the list of URLs by compiling Web pages that were blocked by the following categories in the four programs: Cyber Patrol: Adult/Sexually Explicit; N2H2: Adults Only, Nudity, Pornography, and Sex, with "exceptions" engaged in the categories of Education, For Kids, History, Medical, Moderated, and Text/Spoken Only; SmartFilter: Sex, Nudity, Mature, and Extreme; Websense: Adult Content, Nudity, and Sex.