Analytic Approach

If 32.6 percent of male users and 9.2 percent of female users report their build as "athletic," and users were contacting each other randomly but in heterosexual pairs, we would expect 0.326 * 0.092, or 3.0 percent, of contacts to involve two users of athletic build. However, if users of athletic build sought other such users more often, the percentage of contacts involving two of these users would exceed 3.0 percent; if these users avoided each other, the percentage would be lower.

Figure 2.6. At top, distribution of men and women by marital status. At bottom, communication via private messaging is overlaid. Marital status was the most strongly bounding characteristic, with most communication occurring between two users who share the same status. These images were created with the custom visualization tool. See Appendix B, Figure B.13 for another example.

[Figure 2.6 marital status categories, shown for both women and men: Widowed, Separated, Divorced, Married, In a relationship, Never married, (Invalid), Prefer not to answer.]

Characteristic      Expected       Actual percent same    Actual percent same       t (a2 vs. x)
                    percent        (all contacts, a1)     (recip. con. only, a2)
                    same (x)

Marital status      31.6           51.7 (1.64x)           56.0 (1.77x)              76.001†
Wants children      25.1           38.7 (1.54x)           40.5 (1.61x)              48.553†
Num. of children    27.8           38.7 (1.39x)           38.6 (1.39x)              34.352†
Physical build      19.2           24.5 (1.28x)           25.6 (1.33x)              22.435†
Smoking             40.5           50.6 (1.25x)           54.0 (1.33x)              41.979†
Phys. appearance    37.6           46.1 (1.23x)           49.2 (1.31x)              35.886†
Educational level   23.6           28.0 (1.19x)           29.3 (1.24x)              19.360†
Religion            42.4           49.7 (1.17x)           52.6 (1.24x)              31.589†
Race                71.1           81.2 (1.14x)           85.9 (1.21x)              65.808†
Drinking habits     61.2           68.7 (1.12x)           73.4 (1.20x)              42.692†
Pet preferences     34.7           38.5 (1.11x)           39.9 (1.15x)              16.425*
Pets owned          21.8           23.6 (1.08x)           24.0 (1.10x)               8.038*

† d.f. = 23,940; p < 0.001
* d.f. = 23,855; p < 0.001

Table 2.2. Bounding strength of categorical characteristics. Expected percent same indicates the statistically expected percentage of dyadic pairs who share the same value for the listed characteristic. The expected probability is based on random selection from the male and female population distributions for the characteristic. Actual percent same indicates the empirical percentage of dyadic pairs who shared the same value for the listed characteristic, across all contacts and just the reciprocated subset, in which the initial recipient replied.


By summing the probability of sameness across all possible values of a characteristic, we find an overall probability that a random pair of one male and one female user will share the same value for that characteristic. These overall probabilities are listed in Table 2.2 as Expected percent same. The expected sameness for a characteristic varies with the number of values possible for that characteristic and with how evenly users are distributed among the values. Expected sameness is higher when the number of values is low, as with Physical appearance ("Very attractive," "Attractive," "Average," "Prefer not to answer"), and when many users have picked the same value for a characteristic, as with Race (83.7 percent reported "Caucasian").
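The summation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in the study: the function name expected_same and the two-value distributions are hypothetical, and only the 32.6 and 9.2 percent "athletic" shares come from the text. Lumping every other build into a single "other" value inflates the overall figure, so it is not comparable to the 19.2 percent listed for Physical build.

```python
def expected_same(male_dist, female_dist):
    """Probability that a randomly drawn male-female pair shares a value.

    Each argument maps a characteristic value to the share of users
    (0..1) of that sex who report it.
    """
    shared_values = set(male_dist) & set(female_dist)
    return sum(male_dist[v] * female_dist[v] for v in shared_values)


# Hypothetical distributions: the "athletic" shares are from the text,
# while "other" simply lumps together all remaining builds.
male_build = {"athletic": 0.326, "other": 0.674}
female_build = {"athletic": 0.092, "other": 0.908}

# One term of the sum: both users report "athletic".
athletic_term = male_build["athletic"] * female_build["athletic"]
print(round(athletic_term * 100, 1))  # 3.0

# Overall expected sameness for this toy two-value characteristic.
overall = expected_same(male_build, female_build)
```

With the full per-value distributions for each sex in place of the toy ones, the same function would give the Expected percent same column after multiplying by 100.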

Having calculated the expected sameness, I computed the actual percentage of dyads with the same value for each categorical characteristic, both for all pairwise exchanges and separately for the subset that were reciprocated. The absolute value of the difference between the actual and the expected percentage of sameness indicates how much users were deliberately seeking someone with the same value as themselves.
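As a rough sketch of this counting step, the Python below computes the percentage of dyads that share a value, first over all contacts and then over the reciprocated subset. The record layout (sender value, recipient value, reciprocated flag), the function name, and the four toy dyads are all assumptions made for illustration; none of them come from the study's data.

```python
def actual_percent_same(dyads, reciprocated_only=False):
    """Percentage of dyads whose two users report the same value.

    dyads: list of (sender_value, recipient_value, was_reciprocated).
    """
    selected = [d for d in dyads if d[2] or not reciprocated_only]
    if not selected:
        raise ValueError("no dyads to evaluate")
    same = sum(1 for sender, recipient, _ in selected if sender == recipient)
    return 100.0 * same / len(selected)


# Toy data, invented for illustration only.
toy_dyads = [
    ("Divorced", "Divorced", True),
    ("Divorced", "Never married", False),
    ("Never married", "Never married", True),
    ("Married", "Divorced", False),
]

all_contacts = actual_percent_same(toy_dyads)                         # 50.0
reciprocated = actual_percent_same(toy_dyads, reciprocated_only=True)  # 100.0
```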

An actual sameness percentage close to its expected sameness percentage indicates that users who share a value for that characteristic did not communicate more often than we would expect by chance if users were contacting each other randomly. On the other hand, a large difference between actual and expected sameness percentages would indicate that users who share a value for a characteristic communicated more often than we would expect by chance.

Because the expected likelihood of sameness varies from one characteristic to another, the absolute difference between expected and actual percentages does not support comparisons between characteristics. Instead, I calculated the ratio of the actual to the expected percentage of sameness for each characteristic. Table 2.2 shows these ratios in parentheses following the actual percentages for all contacts and for reciprocated contacts. The characteristics are listed in descending order of this ratio, which shows the relative bounding strength of each.
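The ratio can be checked against a few rows of Table 2.2. The sketch below, with a hypothetical helper named bounding_ratio, divides the actual by the expected percentage and sorts the rows by the all-contacts ratio; the percentages are copied from the table, and everything else is illustrative.

```python
def bounding_ratio(expected_pct, actual_pct):
    """Ratio of actual to expected percent same."""
    return actual_pct / expected_pct


# (expected, actual for all contacts, actual for reciprocated contacts),
# copied from four rows of Table 2.2.
rows = {
    "Race": (71.1, 81.2, 85.9),
    "Pets owned": (21.8, 23.6, 24.0),
    "Marital status": (31.6, 51.7, 56.0),
    "Smoking": (40.5, 50.6, 54.0),
}

# Rank characteristics by the all-contacts ratio, strongest first.
ranked = sorted(
    rows.items(),
    key=lambda item: bounding_ratio(item[1][0], item[1][1]),
    reverse=True,
)

for name, (x, a1, a2) in ranked:
    print(f"{name:15s} {bounding_ratio(x, a1):.2f}x  {bounding_ratio(x, a2):.2f}x")
```

Race shows why the ratio is the better yardstick: its 10.1-point gap between actual and expected is nearly the same as Smoking's, but because 71.1 percent sameness is already expected by chance, its ratio is lower.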
