GROSS ESTIMATE
From the preceding data we have obtained population estimates for certain of the California Athabascan groups. If these estimates are judged reliable, it would be desirable to use them as a basis for estimating the population of the remaining groups. When a detailed analysis of the ecological or demographical factors involved is lacking, it is sometimes necessary to fall back on rather simplistic assumptions to attain the desired end. Cook goes rather far in this direction, using simply the average population density per square mile of the known groups to estimate the population of the unknown groups.
It appears to this writer that a somewhat more satisfactory method of estimation would be based on simple linear regression theory. It is a fact that pertinent relationships in population studies can often be expressed in terms of simple exponential functions or in linear combinations of logarithms. Thus we might propose a relationship such as the following:
population = a + b (ln area)
or
population = a + b (ln fishing miles)
where a and b are constants to be determined and ln is the logarithm to the base e.
Of course we would not expect these relationships to be precise. The lack of exactness might be due to the crudeness of the various measurements involved or perhaps to the fact that population depends on more than one such factor. To account in some way for the uncertainty, we might make a further assumption and propose the following relationships:
population = a + b (ln area) + X
population = a + b (ln fishing miles) + X
where X has a normal probability distribution with mean = 0 and some unknown variance = σ². X is then, roughly speaking, the error involved in each observation. That the error would be distributed normally is quite reasonable under the circumstances. In situations where the uncertainty of the observation is due to measurement error or to a multiplicity of factors, the distribution obtained often assumes a normal form or a form sufficiently normal so that the normal distribution can be used as an approximation.
One additional assumption is necessary. We must assume that the sample used is taken in a random fashion from the population to be studied. In the present investigation, the sample is definitely not taken at random, since we are using all groups for which we have population estimates based on ethnographic information. The question is, then, whether this selection of groups would result in some bias. For instance, the groups for which we have ethnographic data might be the most numerous in the first place and might thus cause us to overestimate the population of the remaining groups. On the whole, it would seem to me that there is no such bias and that the assumption of a random sample is therefore not misleading, at least in the direction of overestimation. If we now consider each group for which we have no ethnographic data, we can see whether the lack of such data is due to an initially small population or to mere luck.
Kato: The reason Kato population is being estimated in gross rather than from ethnographic data is that Goddard (1909, p. 67) obtained a list of more than 50 villages which are not available for calculation.
Bear River: Here the lack of information is due simply to the fact that it was not collected. There have been several informants living until recently (see Nomland, 1938).
Lassik: There was at least one good informant living until recently (Essene, 1942), but Merriam worked with her only briefly. Goddard evidently recorded a number of villages from this group, but his notes are lost.
Nongatl: Goddard seems to have worked with at least two informants from this group, but he spent a very brief time in the area and some of his notes may have been lost.
Shelter Cove Sinkyone: Several informants from this group have been alive until recently (see Nomland, 1935). No one saw fit to collect the appropriate data.
It is obvious from this summary that the main reason for our lack of information on these groups is the loss of Goddard's notes. If those were at hand, we would probably have complete information on the Kato, the Lassik, and probably the Nongatl. The absence of data on the Bear River and Shelter Cove Sinkyone is due to the ethnographers' oversight. None of these groups, therefore, seem to have been selected because of their small aboriginal population. If the following estimates are in error because the sample is not a random one, then the error is probably one of underestimate rather than overestimate.
Given the foregoing assumptions, the least squares estimate of the normal regression line may be obtained with the following formula.
P: population. A: area. F: fishing miles.
The equations of the lines are:
P = a + b (ln A)
P = a' + b' (ln F)
The estimate of b is (Bennett and Franklin, 1954, p. 224)
\[ \hat{b} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} \]
and of a is
\[ \hat{a} = \bar{Y} - \hat{b}\bar{X} \]
where Xi = ln A for each group with known population and Yi = P for each known group.
Similarly the estimate of b' is
\[ \hat{b}' = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} \]
and of a' is
\[ \hat{a}' = \bar{Y} - \hat{b}'\bar{X} \]
where Xi = ln F for each known group and Yi = P for each known group. These calculations are shown in table 4.
TABLE 4
Calculation of Regression Lines Shown in Figure 2
| Fishing Miles | | | | |
| | (Xi − X̅) | (Yi − Y̅) | (Xi − X̅)·(Yi − Y̅) | (Xi − X̅)² |
| | -.452 | -.027 | .012 | .204 |
| | -.882 | -.579 | .511 | .778 |
| | .058 | -.483 | -.028 | .003 |
| | .548 | .393 | .215 | .300 |
| | .068 | -.208 | -.014 | .005 |
| | .658 | .905 | .595 | .433 |
| Total | ... | ... | 1.291 | 1.723 |
| Area | | | | |
| | (Xi − X̅) | (Yi − Y̅) | (Xi − X̅)·(Yi − Y̅) | (Xi − X̅)² |
| | .041 | -.027 | -.001 | .002 |
| | -.445 | -.579 | .258 | .198 |
| | -.514 | -.483 | .248 | .264 |
| | .034 | .393 | .013 | .001 |
| | .400 | -.208 | -.083 | .160 |
| | .484 | .905 | .438 | .234 |
| Total | ... | ... | .873 | .859 |
The results are the following equations, which are shown, together with the points from which they were calculated, on figure 2.
P = 1.02 (ln A) − 4.06
P = .75 (ln F) − 1.00
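The least-squares fit above can be checked with a short sketch. It assumes that the six groups not marked by footnote 4 in table 8 are the known groups, and that population enters the regression in thousands of persons, as the magnitudes in table 4 suggest. The name `fit_line` is illustrative only.

```python
import math

# Six groups with ethnographic population estimates, copied from table 8:
# (area in sq. mi., fishing miles, population).
KNOWN = {
    "Wailaki": (296, 23, 1656),
    "Pitch Wailaki": (182, 15, 1104),
    "Lolangkok": (294, 63, 2076),
    "Sinkyone Mattole": (170, 38.5, 1200),
    "Whilkut": (461, 70, 2588),
    "Hupa": (424, 39, 1475),
}


def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


# Assumption: population is expressed in thousands of persons.
pop = [p / 1000 for _, _, p in KNOWN.values()]
ln_area = [math.log(a) for a, _, _ in KNOWN.values()]
ln_fish = [math.log(f) for _, f, _ in KNOWN.values()]

a_area, b_area = fit_line(ln_area, pop)
a_fish, b_fish = fit_line(ln_fish, pop)

print(f"P = {b_area:.2f} (ln A) + ({a_area:.2f})")
print(f"P = {b_fish:.2f} (ln F) + ({a_fish:.2f})")
```

Under these assumptions the coefficients agree with the two equations above to within rounding.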
Thus, given either the area of a group or the fishing miles of a group habitat, we may estimate its population. From the diagram in figure 2 it appears that the estimates based on area have greater dispersion than those based on fishing miles and are therefore less reliable. This fact can best be made precise by using the above assumptions to obtain the confidence intervals for each of the estimates. The confidence intervals for the area estimates are given by the following formula (Bennett and Franklin, 1954, p. 229).
\[ 1.02 X_o - 4.06 \pm t_{\alpha} S_a \sqrt{\frac{1}{6} + \frac{(X_o - \bar{X})^2}{\sum (X_i - \bar{X})^2}} \]
where the symbols have the following values and meanings:
Xo: the log of the area of the group for which the population is being estimated.
Xi: the log of the area of each of the groups for which the population is already known.
X̅: the average of the Xi.
tα: the upper α-point of the t-distribution (Bennett and Franklin, 1954, p. 696) where 1 − α is the confidence coefficient.
\[ S_a = \sqrt{\frac{1}{4} \sum (Y_i + 4.06 - 1.02 X_i)^2} \]
where Yi is the population of each of the groups for which population is known. This is the estimated standard deviation of population where the estimate is made from area.
Fig. 2. Simple linear regression of population. a. Regression of population on ln area. b. Regression of population on ln fishing miles.
The confidence intervals for the fishing-mile estimates may be obtained in similar fashion—simply substituting the words fishing mile for area and Sf for Sa.
For calculating the confidence intervals for area we have the following quantities:
X̅ = 5.65
t.2 = 1.533
Σ(Xi − X̅)2 = .859
Sa = .3594
The calculations are shown in table 5.
The comparable quantities in calculating the confidence intervals for fishing-mile estimates are:
X̅ = 3.59
t.2 = 1.533
Σ(Xi − X̅)2 = 1.723
Sf = .394
The calculations are shown in table 6.
TABLE 5
Calculation of Confidence Intervals for Area
| Tribe | Xo | (Xo − X̅) | (Xo − X̅)² / Σ(Xi − X̅)² | √[1/6 + (Xo − X̅)² / Σ(Xi − X̅)²] | t.2 Sa × √[1/6 + (Xo − X̅)² / Σ(Xi − X̅)²] |
| Kato | 5.42 | -.23 | .0616 | .4778 | .263 |
| Bear River | 4.80 | -.85 | .8510 | 1.0088 | .556 |
| Lassik | 5.96 | .31 | .1119 | .5278 | .291 |
| Nongatl | 6.75 | 1.10 | 1.4086 | 1.2551 | .692 |
| Shelter Cove Sinkyone | 5.86 | .21 | .0513 | .4669 | .257 |
TABLE 6
Calculation of Fishing-Mile Estimates
| Tribe | Xo | (Xo − X̅) | (Xo − X̅)² / Σ(Xi − X̅)² | √[1/6 + (Xo − X̅)² / Σ(Xi − X̅)²] | t.2 Sf × √[1/6 + (Xo − X̅)² / Σ(Xi − X̅)²] |
| Kato | 3.37 | -.22 | .0281 | .4414 | .267 |
| Bear River | 3.04 | -.55 | .1756 | .5851 | .353 |
| Lassik | 3.22 | -.37 | .0795 | .4962 | .300 |
| Nongatl | 4.44 | .85 | .4193 | .7655 | .462 |
| Shelter Cove Sinkyone | 4.20 | .61 | .2160 | .6186 | .374 |
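The last column of tables 5 and 6 can be reproduced with the following sketch. The function name `half_width` is illustrative. The inputs are the deviations listed in the tables, the sums of squares from table 4 (.859 for area, 1.723 for fishing miles), and the quoted values of Sa, Sf, and t. The results are taken to be in thousands of persons, which is an inference from table 7.

```python
import math


def half_width(dev, sxx, s, t, n=6):
    """Half-width of the interval: t * s * sqrt(1/n + dev^2 / sxx).

    dev is Xo minus the mean of the Xi, sxx is the sum of squared
    deviations of the Xi, and s is the estimated standard deviation.
    """
    return t * s * math.sqrt(1.0 / n + dev ** 2 / sxx)


T_80 = 1.533  # t value quoted in the text for the 80 per cent intervals

# Area: Sa = .3594, sum of squares = .859; deviations from table 5.
kato_area = half_width(-0.23, 0.859, 0.3594, T_80)
nongatl_area = half_width(1.10, 0.859, 0.3594, T_80)

# Fishing miles: Sf = .394, sum of squares = 1.723; deviation from table 6.
kato_fish = half_width(-0.22, 1.723, 0.394, T_80)

print(round(kato_area, 3), round(nongatl_area, 3), round(kato_fish, 3))
```

The interval widens as Xo moves away from X̅, which is why the Nongatl and Bear River intervals are the longest in each table.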
The results of the calculations are given in table 7. The figures are point estimates with 80 per cent confidence intervals. This means that under the assumptions given earlier we expect that the tabled intervals will contain the true population 8 times out of 10. I have accepted the estimates derived from fishing miles because their confidence intervals are a bit shorter on the average.
TABLE 7
Population Estimates and Confidence Intervals
| Tribe | Fishing-mile Estimate | Area Estimate |
| Kato | 1,523 ± 267 | 1,470 ± 263 |
| Bear River | 1,276 ± 353 | 840 ± 556 |
| Lassik | 1,411 ± 300 | 2,020 ± 291 |
| Nongatl | 2,325 ± 462 | 2,830 ± 692 |
| Shelter Cove Sinkyone | 2,145 ± 374 | 1,920 ± 257 |
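As a rough check on table 7, the sketch below applies the rounded regression equations to the Kato figures from table 8 and averages the tabled half-widths for each method. The function names are illustrative, and the reading of P as thousands of persons is an assumption. Because the equations are rounded to two decimals, the point estimates match the table only approximately.

```python
import math


# Rounded regression equations from the text; P is assumed to be in thousands.
def pop_from_area(area_sq_mi):
    return 1.02 * math.log(area_sq_mi) - 4.06


def pop_from_fishing(fishing_miles):
    return 0.75 * math.log(fishing_miles) - 1.00


# Half-widths of the 80 per cent intervals, in persons, copied from table 7:
# (fishing-mile estimate, area estimate).
HALF_WIDTHS = {
    "Kato": (267, 263),
    "Bear River": (353, 556),
    "Lassik": (300, 291),
    "Nongatl": (462, 692),
    "Shelter Cove Sinkyone": (374, 257),
}

mean_fish = sum(f for f, _ in HALF_WIDTHS.values()) / len(HALF_WIDTHS)
mean_area = sum(a for _, a in HALF_WIDTHS.values()) / len(HALF_WIDTHS)

# Kato: 225 sq. mi. and 29 fishing miles (table 8).
kato_fish = pop_from_fishing(29)
kato_area = pop_from_area(225)

print(f"mean half-width: fishing {mean_fish:.1f}, area {mean_area:.1f}")
print(f"Kato: fishing {kato_fish:.3f}, area {kato_area:.3f} (thousands)")
```

The fishing-mile intervals are shorter on average, although for Kato, Lassik, and Shelter Cove Sinkyone the area interval is slightly the shorter of the two.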
The question of whether the fishing-mile estimates yield shorter confidence intervals than the area estimates brings up an entire range of problems pertaining to economy, settlement pattern, and the like. The obvious interpretation of the shorter confidence intervals would be that the economy of the people in question depended more on fish and fishing than on the general produce over the whole range of their territory. The question then becomes one of quantitative expression—we would like to have some index of the extent of dependence on various factors in the economy. This might best be approached from the standpoint of analysis of covariance, where we would obtain the "components of variance." This technique is a combination of the methods of regression used in this paper and those of the analysis of variance. It would evidently yield sound indices of economic components, but it involves, for myself at least, certain problems of calculation and interpretation which will have to be resolved in the future.
Another problem of this kind turns on the question of which factors are important in which area. Considering the State of California, for instance, we might want to know about such factors as deer population, water supply, the quantity of oak trees, etc. Any one of these factors or any combination of them might be important in a particular area; the problem of gathering the pertinent information then becomes crucial. Moreover, because the situation has changed since aboriginal times, we must combine modern information with available historic sources. S. F. Cook has shown that energetic and imaginative use of these sources yields very good results (e.g., Cook, 1955).
Finally, there is the problem of the assumptions we were required to make in order to obtain our population estimates. Although many of the assumptions in the present paper are difficult to assess, the two which I would like to discuss here were particularly unyielding—the assumptions of the number of persons per house and the assumptions of the number of houses per village.
The question of how many persons there were per house has been dealt with extensively by both Kroeber and Cook. There is also a great deal of random information in the ethnographic and historical literature. I believe there are enough data now at hand to provide realistic limits within which we could work, at least for the State of California. This information should be assembled and put into concise and systematic form so that it would be available for use in each area. It would also be of interest in itself from the standpoint of social anthropology.
For the number of houses per village we have also a considerable body of information, but here we are faced with a slightly different problem. It often happens that we know, from ethnographic information or from archaeological reconnaissance, how many house pits there are in a village site but do not know how many of the houses which these pits represent were occupied simultaneously. In the present paper it has been assumed that four-fifths of the house pits represents the number of houses in the village occupied at any one time. This, however, is simply a guess, and one has no way of knowing how accurate a guess. The solution to this problem is simple but laborious. From each area of the State a random sample of villages with recorded house counts should be taken. Each of these village sites should then be visited and the house pits counted. A comparison of the two sets of figures would give us a perfectly adequate estimate, which could then be used subsequently over the entire area.
TABLE 8
Population Estimates
| Tribe | Area (sq. mi.) | Fishing Miles | Pop. Estimate | Area Density | Fishing-mile Density | Kroeber[5] Estimate | Cook[6] Estimate |
| Kato[4] | 225 | 29 | 1,523 | 6.77 | 52.5 | 500 | 1,100 |
| Wailaki | 296 | 23 | 1,656 | 5.59 | 72.0 | 600 | 2,315 |
| Pitch Wailaki | 182 | 15 | 1,104 | 6.07 | 73.6 | 400 | 1,032 |
| Lassik[4] | 389 | 25 | 1,411 | 3.63 | 56.4 | 500 | 1,500 |
| Shelter Cove Sinkyone[4] | 350 | 67 | 2,145 | 6.13 | 32.0 | 375 | 1,450 |
| Lolangkok | 294 | 63 | 2,076 | 7.06 | 33.0 | 375 | 1,450 |
| Sinkyone Mattole | 170 | 38.5 | 1,200 | 7.06 | 31.2 | 350 | 840 |
| Bear River[4] | 121 | 21 | 1,276 | 10.55 | 60.8 | 150 | 360 |
| Nongatl[4] | 855 | 85 | 2,325 | 2.72 | 27.4 | 750 | 3,300 |
| Whilkut | 461 | 70 | 2,588 | 5.61 | 37.0 | 1,000 | 2,100 |
| Hupa | 424 | 39 | 1,475 | 3.48 | 37.8 | 1,000 | 2,000 |
| Total | 3,767 | 475.5 | 18,779 | 4.99 | 39.5 | 6,000 | 17,447 |
[4] The population figures for these groups are estimated in the gross by the method indicated in the text.
[5] Kroeber, 1925a, p. 883. The breakdown has been changed somewhat to accommodate boundary changes; the total remains the same. The population density, according to Kroeber's figures, is 1.6 persons per sq. mi.
[6] Cook, 1956. The breakdown has been changed somewhat to accommodate boundary changes; the total remains the same. The population density, according to Cook's figures, is 4.6 persons per sq. mi.
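The overall densities in table 8 and in footnotes 5 and 6 follow from simple division of the totals, as the short sketch below illustrates. The dictionary keys and the function name `density` are illustrative only.

```python
# Totals copied from the last row of table 8.
TOTAL_AREA_SQ_MI = 3767
TOTAL_FISHING_MILES = 475.5
TOTAL_POPULATION = {"present": 18779, "Kroeber": 6000, "Cook": 17447}


def density(population, extent):
    """Persons per unit of extent (square miles or fishing miles)."""
    return population / extent


# Persons per square mile under each of the three total estimates.
area_density = {
    source: round(density(total, TOTAL_AREA_SQ_MI), 2)
    for source, total in TOTAL_POPULATION.items()
}

# Persons per fishing mile under the present estimate.
fishing_density = round(
    density(TOTAL_POPULATION["present"], TOTAL_FISHING_MILES), 1
)

print(area_density)
print(fishing_density)
```

To one decimal place the Kroeber and Cook densities come to 1.6 and 4.6 persons per square mile, in agreement with the footnotes.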
The corpus of information provided by the methods outlined above would be useful in two ways. First, it would clarify our definitions of the economic factors in the lives of hunter-gatherers. Functional hypotheses which postulate dependence of social factors on economy would be subject to objective, quantitative tests of their validity.
Second, the corpus of information would afford a suitable basis for inference from archaeological data. If we can determine what were the major economic factors in the lives of a prehistoric people, then we can make assertions about population, settlement pattern, and the like. Conversely, information about population and settlement pattern would imply certain facts about the economy. This technique has already been developed to some extent. For instance, Cook and Heizer, depending on assumptions derived from ethnographic data (Cook and Treganza, 1950; Heizer, 1953; Heizer and Baumhoff, 1956), have made inferences concerning village populations. These methods have such great possibilities for the conjunctive approach in archaeology that their use should be extended as much as possible.