population = a + b (ln area)
or
population = a + b (ln fishing miles)
where a and b are constants to be determined and ln is the logarithm to the base e.
Of course we would not expect these relationships to be precise. The lack of exactness might be due to the crudeness of the various measurements involved or perhaps to the fact that population depends on more than one such factor. To account in some way for the uncertainty, we might make a further assumption and propose the following relationships:
population = a + b (ln area) + X
population = a + b (ln fishing miles) + X
where X has a normal probability distribution with mean = 0 and some unknown variance = σ². X is then, roughly speaking, the error involved in each observation. That the error would be distributed normally is quite reasonable under the circumstances. In situations where the uncertainty of the observation is due to measurement error or to a multiplicity of factors, the distribution obtained often assumes a normal form or a form sufficiently normal so that the normal distribution can be used as an approximation.
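As an illustration of how a, b, and σ² might be determined from a sample, the following is a minimal sketch that assumes the constants are fitted by ordinary least squares, which under the normal-error assumption coincides with the maximum-likelihood fit. The function name fit_log_linear and the numbers used are hypothetical and are not taken from the ethnographic data; the sample values are generated without error from a = 100 and b = 200 purely so that the result can be checked.

```python
import math


def fit_log_linear(xs, ys):
    """Least-squares fit of y = a + b * ln(x).

    Returns (a, b, sigma2), where sigma2 is the residual variance
    estimated with n - 2 degrees of freedom.
    """
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least three paired observations")

    # Work with u = ln(x), so the model is an ordinary straight line in u.
    u = [math.log(x) for x in xs]
    u_mean = sum(u) / n
    y_mean = sum(ys) / n

    sxx = sum((ui - u_mean) ** 2 for ui in u)
    sxy = sum((ui - u_mean) * (yi - y_mean) for ui, yi in zip(u, ys))

    b = sxy / sxx
    a = y_mean - b * u_mean

    # Residuals play the role of the observed values of X in the text.
    residuals = [yi - (a + b * ui) for ui, yi in zip(u, ys)]
    sigma2 = sum(r * r for r in residuals) / (n - 2)
    return a, b, sigma2


# Hypothetical, error-free sample (NOT real data): areas in arbitrary units,
# populations generated from a = 100, b = 200 so the fit can be verified.
areas = [50.0, 120.0, 300.0, 700.0]
populations = [100.0 + 200.0 * math.log(x) for x in areas]

a, b, sigma2 = fit_log_linear(areas, populations)
print(a, b, sigma2)
```

With real observations the residuals would not vanish, and sigma2 would give an estimate of the unknown variance of X. The same function applies unchanged if fishing miles are substituted for area.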
One additional assumption is necessary. We must assume that the sample used is taken in a random fashion from the population to be studied. In the present investigation, the sample is definitely not taken at random, since we are using all groups for which we have population estimates based on ethnographic information. The question is, then, whether this selection of groups would result in some bias. For instance, the groups for which we have ethnographic data might be the most numerous in the first place and might thus cause us to overestimate the population of the remaining groups. On the whole, it would seem to me that there is no such bias and that the assumption of a random sample is therefore not misleading, at least in the direction of overestimation. If we now consider each group for which we have no ethnographic data, we can see whether the lack of such data is due to an initially small population or to mere luck.
Kato: The reason the Kato population is being estimated in gross rather than from ethnographic data is that Goddard (1909, p. 67) obtained a list of more than 50 villages which are not available for calculation.
Bear River: Here the lack of information is due simply to the fact that it was not collected. There have been several informants living until recently (see Nomland, 1938).
Lassik: There was at least one good informant living until recently (Essene, 1942), but Merriam worked with her only briefly. Goddard evidently recorded a number of villages from this group, but his notes are lost.
Nongatl: Goddard seems to have worked with at least two informants from this group, but he spent a very brief time in the area and some of his notes may have been lost.
Shelter Cove Sinkyone: Several informants from this group have been alive until recently (see Nomland, 1935). No one saw fit to collect the appropriate data.
It is obvious from this summary that the main reason for our lack of information on these groups is the loss of Goddard's notes. If those were at hand, we would probably have complete information on the Kato, the Lassik, and probably the Nongatl. The absence of data on the Bear River and Shelter Cove Sinkyone is due to the ethnographers' oversight. None of these groups, therefore, seems to have been selected because of its small aboriginal population. If the following estimates are in error because the sample is not a random one, then the error is probably one of underestimate rather than overestimate.