Indicator IB1: Lifestyle
| Definition | Measures of healthy lifestyles |
| Dimension | Intervening factors |
| Sector | Behaviours and environments (individual) |
| Components |
|
| Source | Various - see component details |
Component IB1_1: Smoking prevalence
| Definition | Modelled estimate of the proportion of cigarette smokers |
| Source | 2001: Health Survey for England, 2001 (Joint Survey Unit of the National Centre for Social Research and the Department of Epidemiology and Public Health, University College London/Department of Health), General Household Survey 2000-2001 (Office for National Statistics) and the Omnibus Survey, Jan, Mar, April, June, July, Sep, Oct and Nov 2001 (Office for National Statistics) (See: Health Survey for England) |
| 2001 Ethnic: Health Survey for England, 1998-2001 (Joint Survey Unit of the National Centre for Social Research and the Department of Epidemiology and Public Health, University College London/Department of Health) (See: Health Survey for England) | |
| 2003: Health Survey for England, 2001-2003 (Joint Survey Unit of the National Centre for Social Research and the Department of Epidemiology and Public Health, University College London/Department of Health) (See: Health Survey for England) |
Additional details
In the absence of any suitable administrative or census data, survey data was the only source of information available to construct an indicator of smoking. However there are a number of problems associated with using survey data to produce Local Authority District (LAD) estimates, including small or non-existent samples in some areas leading to large variances and unstable estimates and biases introduced by particular sampling strategies.
A great deal of work, particularly in the last twenty years, has gone into addressing these issues. Although a number of different approaches have been used, all the methods tend to fall somewhere on a continuum between using direct estimates, suitably weighted for sample design, and a modelling approach using local area covariates to estimate the indicator of interest. Some are based on only one or other of the methods. However the two methods each have their own particular problems. Direct estimates, weighted as necessary, are unbiased but may have large variances; on the other hand the modelled estimates will have small variances but will be biased. Hence many estimates attempt to combine information from both in order to solve the common problem of minimising the Mean Square Error of the final estimate.
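The trade-off described above can be illustrated with a small sketch. It assumes, purely for illustration, that the direct estimate is unbiased, that the errors of the two estimates are independent, and that the variance and bias figures are made-up values; none of these numbers come from the HPI. The function names are hypothetical.

```python
def mse(bias: float, variance: float) -> float:
    """Mean Square Error of an estimator: squared bias plus variance."""
    return bias ** 2 + variance


def optimal_weight(var_direct: float, bias_model: float, var_model: float) -> float:
    """Weight on the direct estimate that minimises the MSE of
    w * direct + (1 - w) * modelled, assuming an unbiased direct
    estimate and independent errors (illustrative assumptions)."""
    mse_model = mse(bias_model, var_model)
    return mse_model / (mse_model + var_direct)


def composite_mse(w: float, var_direct: float, bias_model: float, var_model: float) -> float:
    """MSE of the weighted combination under the same assumptions."""
    return w ** 2 * var_direct + (1 - w) ** 2 * mse(bias_model, var_model)


# Made-up figures: a noisy but unbiased direct estimate and a
# stable but biased modelled estimate (proportions on a 0-1 scale).
var_direct = 0.0040
bias_model = 0.03
var_model = 0.0004

mse_direct = mse(0.0, var_direct)
mse_model = mse(bias_model, var_model)
w = optimal_weight(var_direct, bias_model, var_model)
mse_combined = composite_mse(w, var_direct, bias_model, var_model)

print(f"direct MSE    = {mse_direct:.5f}")
print(f"modelled MSE  = {mse_model:.5f}")
print(f"weight on direct estimate = {w:.3f}")
print(f"composite MSE = {mse_combined:.5f}")
```

Under these assumptions the combined estimate has a smaller Mean Square Error than either estimate alone, which is the motivation for methods that sit between the two ends of the continuum.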
The method used in the HPI required that a well-fitted micro level model could be identified. It also assumed that the important ways in which a group may have been over-sampled in a survey sample can be captured by covariates available in the survey and at a small area level. It involved combining all surveys available for the required year with the necessary dependent and independent variables (e.g. socio-economic status, age, gender and ethnicity).
Data were gathered from the Health Survey for England (HSE, 1998-2003) for estimates for both the whole population and for the ethnic groups. In addition for the 2001 whole population estimate, the General Household Survey 2000-2001 along with all eight phases of the Omnibus Survey were used to create the dataset of smokers (see below).
| Survey point | Section | |
| Jan 2001 | M210_1 | 210: Consumption of tobacco |
| Mar | M210_1 | 210: Consumption of tobacco |
| April | M210_1 | 210: Consumption of tobacco |
| June | M210_1 | 210: Consumption of tobacco |
| July | M210_1 | 210: Consumption of tobacco |
| Sep | M210_1 | 210: Consumption of tobacco |
| Oct | M130_2 | 130: smoking |
| Nov | M130_2 | 130: smoking |
The questions used from the surveys were:
- Omnibus “Do you smoke at all nowadays?”
- GHS “Do you smoke cigarettes at all nowadays?”
- HSE “Do you smoke cigarettes at all nowadays?”
Less than 3% of the smoking population smoke only pipes, so the bias introduced by not having an ‘all smoking’ question on the omnibus survey was not believed to be great.
In 1999, the focus of the HSE was the health of minority ethnic groups, with the aim of increasing understanding through the monitoring of trends and of enabling predictions to be made. For this purpose a boost sample was designed to yield interviews with members of the six most populous minority ethnic groups: Black Caribbean, Indian, Pakistani, Bangladeshi, Chinese and Irish. For the purpose of this estimate, Irish is included under White, and the Black African group has been added (although this sample was not boosted, hence the low numbers). The table below shows the number of respondents in each ethnic group, for each year, used in the modelling of estimates for ethnic groups:
| Ethnic group | 1998 | 1999 | 2000 | 2001 | Total |
|---|---|---|---|---|---|
| White | 18019 | 10437 | 8851 | 17322 | 54629 |
| Black Caribbean | 183 | 2029 | 143 | 296 | 2651 |
| Black African | 143 | 73 | 98 | 172 | 486 |
| Indian | 321 | 1909 | 203 | 287 | 2720 |
| Pakistani | 198 | 2148 | 91 | 225 | 2662 |
| Bangladeshi | 73 | 1905 | 64 | 83 | 2125 |
| Chinese | 39 | 961 | 17 | 37 | 1054 |
| Total | 18976 | 19462 | 9467 | 18422 | 66327 |
Only the main adult sample, and not the oversampled ‘special populations’, was included in the modelling process for the whole population. For the ethnic population estimates, the adult sample and the 1999 ethnic minority boost were used.
Step 1
Using combined survey data, with LAD geocoding, a multi-level, variable intercepts,
logistic model was run, with level one being the individual i, level two
the primary sampling unit j and level three the LAD k. Covariates from within
the survey, shown in lower case, and LAD level data, shown in upper case,
were used to predict the individual level behaviour.
$$\operatorname{logit}(P_{ijk}) = X_{ijk} B + U_{jk} + V_k + E_{ijk}$$
where P is a vector of probabilities associated with individual i in Primary Sampling Unit (PSU) j within LAD k, B is a vector of regression coefficients, X is a matrix of covariates associated with the individual measured within the survey, U is a random vector of area effects associated with the PSU, V is a random vector of area effects associated with the LAD, and E is a vector of independent random ‘noise’ elements. The matrix of covariates included PSU area measures, based on aggregated individual level survey counts within the PSU. The covariates for both the whole population and the ethnic groups are given in the tables below:
2001 Total Population - Smoking
| | Covariate | Coefficient |
|---|---|---|
| | Constant | -0.814 |
| Individual effects | 20-24 years | 0.562 |
| | 25-29 years | 0.628 |
| | 30-34 years | 0.467 |
| | 35-39 years | 0.334 |
| | 40-44 years | 0.219 |
| | 45-49 years | 0.246 |
| | 50-54 years | 0.126 |
| | 55-59 years | -0.072 |
| | 60-64 years | -0.214 |
| | 65-69 years | -0.497 |
| | 70-74 years | -0.538 |
| | 75+ years | -1.208 |
| | Male | 0.087 |
| | Income Support recipient | 0.674 |
| PSU area effects | Proportion Asian | -0.650 |
| | Proportion higher social class | -0.609 |
| | Proportion Income Support recipient | 0.299 |
| LAD area effects | Proportion Income Support recipient | 0.847 |
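As a worked illustration of how the fixed part of the Step 1 model turns covariates into a probability, the sketch below applies the 2001 Total Population coefficients above to one hypothetical individual. Several things here are assumptions rather than statements from the source: that the omitted youngest age band is the reference category, that the area "proportions" are on a 0 to 1 scale, and the area values themselves, which are invented. Random PSU and LAD effects are ignored.

```python
import math


def inv_logit(eta: float) -> float:
    """Anti-logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# Coefficients copied from the 2001 Total Population smoking table.
COEF = {
    "constant": -0.814,
    "age_25_29": 0.628,
    "male": 0.087,
    "income_support": 0.674,
    "psu_prop_asian": -0.650,
    "psu_prop_higher_class": -0.609,
    "psu_prop_income_support": 0.299,
    "lad_prop_income_support": 0.847,
}

# Hypothetical individual and area values (not from the source):
# a man aged 25-29, not receiving Income Support.
covariates = {
    "constant": 1.0,
    "age_25_29": 1.0,
    "male": 1.0,
    "income_support": 0.0,
    "psu_prop_asian": 0.05,
    "psu_prop_higher_class": 0.30,
    "psu_prop_income_support": 0.10,
    "lad_prop_income_support": 0.08,
}

# Fixed part of the model: sum of coefficient * covariate value.
eta = sum(COEF[name] * value for name, value in covariates.items())
p_smoker = inv_logit(eta)

print(f"linear predictor = {eta:.5f}")
print(f"modelled probability of smoking = {p_smoker:.3f}")
```

Changing a single covariate, for example setting `income_support` to 1.0, shows how each coefficient shifts the modelled probability on the logit scale.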
2001 Ethnic Groups - Smoking
| | Covariate | Coefficient |
|---|---|---|
| | Constant | -1.152 |
| Individual effects | Bangladeshi | -3.921 |
| | Black African | -0.824 |
| | Black Caribbean | -0.211 |
| | Chinese | -1.185 |
| | Indian | -1.57 |
| | Pakistani | -2.256 |
| | 20-24 years | 0.468 |
| | 25-29 years | 0.307 |
| | 30-34 years | 0.19 |
| | 35-39 years | 0.09 |
| | 40-44 years | -0.009 |
| | 45-49 years | -0.046 |
| | 50-54 years | -0.143 |
| | 55-59 years | -0.319 |
| | 60-64 years | -0.467 |
| | 65-69 years | -0.596 |
| | 70-74 years | -0.839 |
| | 75+ years | -1.419 |
| Male | Bangladeshi | 4.297 |
| | Black African | 0.678 |
| | Black Caribbean | 0.596 |
| | Chinese | 0.94 |
| | Indian | 1.399 |
| | Pakistani | 2.107 |
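The sketch below shows one way to read the 2001 Ethnic Groups table. It assumes, and this is an interpretation rather than something the source states, that the rows under "Male" are male-by-ethnic-group terms added on top of the ethnic group effect, and that the individual is in the omitted (reference) age band. Random area effects are ignored.

```python
import math

# Coefficients copied from the 2001 Ethnic Groups smoking table.
CONSTANT = -1.152
ETHNIC_EFFECT = {
    "Bangladeshi": -3.921,
    "Black African": -0.824,
    "Black Caribbean": -0.211,
    "Chinese": -1.185,
    "Indian": -1.57,
    "Pakistani": -2.256,
}
# Assumed to be male-by-ethnic-group terms (an interpretation).
MALE_BY_ETHNIC = {
    "Bangladeshi": 4.297,
    "Black African": 0.678,
    "Black Caribbean": 0.596,
    "Chinese": 0.94,
    "Indian": 1.399,
    "Pakistani": 2.107,
}


def inv_logit(eta: float) -> float:
    """Anti-logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def fixed_part(ethnic_group: str, male: bool) -> float:
    """Linear predictor for a person in the reference age band."""
    eta = CONSTANT + ETHNIC_EFFECT[ethnic_group]
    if male:
        eta += MALE_BY_ETHNIC[ethnic_group]
    return eta


eta_woman = fixed_part("Bangladeshi", male=False)
eta_man = fixed_part("Bangladeshi", male=True)
p_woman = inv_logit(eta_woman)
p_man = inv_logit(eta_man)

print(f"Bangladeshi woman: eta = {eta_woman:.3f}, p = {p_woman:.4f}")
print(f"Bangladeshi man:   eta = {eta_man:.3f}, p = {p_man:.4f}")
```

Under this reading the large negative ethnic group term applies mainly to women, while the male term offsets most of it for men.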
2003 Total Population - Smoking
| | Covariate | Coefficient |
|---|---|---|
| | Constant | -0.724 |
| Individual effects | 20-24 years | 0.506 |
| | 25-29 years | 0.463 |
| | 30-34 years | 0.261 |
| | 35-39 years | 0.223 |
| | 40-44 years | 0.180 |
| | 45-49 years | 0.104 |
| | 50-54 years | -0.092 |
| | 55-59 years | -1.158 |
| | 60-64 years | -0.377 |
| | 65-69 years | -0.664 |
| | 70-74 years | -0.731 |
| | 75+ years | -1.422 |
| | Male | 0.062 |
| | Income Support recipient | 0.820 |
| | Higher social class | -0.372 |
| PSU area effects | Proportion Asian | -0.815 |
| | Proportion higher social class | -0.594 |
| | Proportion Income Support recipient | 0.642 |
| LAD area effects | Proportion Income Support recipient | 0.304 |
2005 Total Population - Smoking
| | Covariate | Coefficient |
|---|---|---|
| | Constant | -0.435 |
| Individual effects | 20-24 years | 0 |
| | 25-29 years | 0.096 |
| | 30-34 years | 0.096 |
| | 35-39 years | -0.066 |
| | 40-44 years | -0.066 |
| | 45-49 years | -0.238 |
| | 50-54 years | -0.238 |
| | 55-59 years | -0.557 |
| | 60-64 years | -0.557 |
| | 65-69 years | -1.106 |
| | 70-74 years | -1.106 |
| | 75+ years | -1.684 |
| | Male | 0.066 |
| | Higher social class | -0.443 |
| | Income Support recipient | 0.675 |
| PSU area effects | Proportion Asian | -1.219 |
| | Proportion on Income Support | 1.07 |
| | Proportion higher social class | -0.443 |
Step 2
The fixed effects part of each model was then taken and applied to the matrix of small area covariates X held by SDRC for 100% of individuals and LADs across England, the random LAD area effect was added (where it was available for an LAD), and the anti-logit was applied. The probabilities were then summed and averaged over the N_k individuals in the LAD to produce a vector of synthetic LAD level estimates:
$$Y_k = \frac{1}{N_k} \sum_{i,j} \operatorname{logit}^{-1}\left(X_{ijk} B + V_k\right)$$
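A minimal sketch of this averaging step follows. The function name, the linear predictor values and the LAD effect are all hypothetical; in practice each linear predictor would be the fixed part of the model evaluated for one individual in the LAD.

```python
import math
from typing import Sequence


def inv_logit(eta: float) -> float:
    """Anti-logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def synthetic_lad_estimate(linear_predictors: Sequence[float],
                           lad_effect: float = 0.0) -> float:
    """Average of anti-logit(X B + V_k) over the individuals in one LAD.

    linear_predictors holds the fixed part X B for each individual;
    lad_effect is the random LAD effect V_k, left at 0.0 when no
    effect is available for the LAD.
    """
    probabilities = [inv_logit(eta + lad_effect) for eta in linear_predictors]
    return sum(probabilities) / len(probabilities)


# Hypothetical fixed-part values for four individuals in one LAD.
example_predictors = [-1.2, -0.4, 0.1, -0.9]

without_effect = synthetic_lad_estimate(example_predictors)
with_effect = synthetic_lad_estimate(example_predictors, lad_effect=0.05)

print(f"estimate without LAD effect = {without_effect:.4f}")
print(f"estimate with LAD effect    = {with_effect:.4f}")
```

Note that the probabilities are averaged after the anti-logit is applied; averaging the linear predictors first and transforming once would generally give a different answer.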
This method does not use weighting to remove bias in the parameter estimators introduced by unequal selection probabilities in the survey sampling schemes. Instead, important characteristics of the sample are included in the model as covariates. The sample indicator variable S will therefore be unrelated to Y conditional on these covariates. In this case the sample can be viewed as uninformative and ignorable. There is little conflict in including these covariates because they are, by definition, predictors of Y and so should be included in the models. If they were not, the sample design would not bias the standard estimators of the parameters.
Included in our models are measures of non-manual social classes and a ‘level’ for the primary sampling unit. Together these will capture, to a great extent, the unequal selection probabilities associated with the sample design. Other variables such as age will ensure that, where a question or measure was taken of only a particular age group in a specific survey year, the estimates will not be biased.
Component IB1_2: Fresh fruit intake
| Definition | Modelled estimate of the adult population consuming less than 5 portions of fruit and vegetables a day |
| Source | 2001, 2001 Ethnic, 2003: Health Survey for England, 2001, Joint Survey Unit of the National Centre for Social Research and the Department of Epidemiology and Public Health, University College London/Department of Health |
Additional details
In the absence of any suitable administrative or census data, survey data was the only source of information available to construct an indicator of fresh fruit intake. However there are a number of problems associated with using survey data to produce Local Authority District (LAD) estimates, including small or non-existent samples in some areas leading to large variances and unstable estimates and biases introduced by particular sampling strategies.
A great deal of work, particularly in the last twenty years, has gone into addressing these issues. Although a number of different approaches have been used, all the methods tend to fall somewhere on a continuum between using direct estimates, suitably weighted for sample design, and a modelling approach using local area covariates to estimate the indicator of interest. Some are based on only one or other of the methods. However the two methods each have their own particular problems. Direct estimates, weighted as necessary, are unbiased but may have large variances; on the other hand the modelled estimates will have small variances but will be biased. Hence many estimates attempt to combine information from both in order to solve the common problem of minimising the Mean Square Error of the final estimate.
The method used in the HPI required that a well-fitted micro level model could be identified. It also assumed that the important ways in which a group may have been over-sampled in a survey sample can be captured by covariates available in the survey and at a small area level. It involved combining all surveys available for the required year with the necessary dependent and independent variables (e.g. socio-economic status, age, gender and ethnicity). Data were gathered from the Health Survey for England (HSE) (1998 –2001). Due to changes in the question asked in the HSE, resulting in inconsistency in the definition used over time, the data has been frozen, and one model used.
In 1999, the focus of the HSE was the health of minority ethnic groups, with the aim of increasing understanding through the monitoring of trends and of enabling predictions to be made. For this purpose a boost sample was designed to yield interviews with members of the six most populous minority ethnic groups: Black Caribbean, Indian, Pakistani, Bangladeshi, Chinese and Irish. For the purpose of this estimate, Irish is included under White, and the Black African group has been added (although this sample was not boosted, hence the low numbers). The table below shows the number of respondents in each ethnic group, for each year, used in the modelling of estimates for ethnic groups:
| Ethnic group | 1998 | 1999 | 2000 | 2001 | Total |
|---|---|---|---|---|---|
| White | 18019 | 10437 | 8851 | 17322 | 54629 |
| Black Caribbean | 183 | 2029 | 143 | 296 | 2651 |
| Black African | 143 | 73 | 98 | 172 | 486 |
| Indian | 321 | 1909 | 203 | 287 | 2720 |
| Pakistani | 198 | 2148 | 91 | 225 | 2662 |
| Bangladeshi | 73 | 1905 | 64 | 83 | 2125 |
| Chinese | 39 | 961 | 17 | 37 | 1054 |
| Total | 18976 | 19462 | 9467 | 18422 | 66327 |
For the ethnic population estimates, the adult sample and the 1999 ethnic minority boost were used.
Step 1
Using combined survey data, with LAD geocoding, a multi-level, variable intercepts,
logistic model was run, with level one being the individual i, level two
the primary sampling unit j and level three the LAD k. Covariates from within
the survey, shown in lower case, and LAD level data, shown in upper case,
were used to predict the individual level behaviour.
$$\operatorname{logit}(P_{ijk}) = X_{ijk} B + U_{jk} + V_k + E_{ijk}$$
where P is a vector of probabilities associated with individual i in Primary Sampling Unit (PSU) j within LAD k, B is a vector of regression coefficients, X is a matrix of covariates associated with the individual measured within the survey, U is a random vector of area effects associated with the PSU, V is a random vector of area effects associated with the LAD, and E is a vector of independent random 'noise' elements. The matrix of covariates included PSU area measures, based on aggregated individual level survey counts within the PSU. These covariates are given in the table below:
2001 and 2003 Total Population and 2001 Ethnic Groups - Fresh Fruit Intake
| | Covariate | Coefficient |
|---|---|---|
| | Constant | 0.414 |
| Individual effects (x) | Bangladeshi | -0.187 |
| | Black African | 0.82 |
| | Black Caribbean | 0.144 |
| | Chinese | -1.304 |
| | Indian | 0.196 |
| | Pakistani | 0.325 |
| | 20-24 years | 0.039 |
| | 25-29 years | -0.194 |
| | 30-34 years | -0.403 |
| | 35-39 years | -0.481 |
| | 40-44 years | -0.642 |
| | 45-49 years | -0.762 |
| | 50-54 years | -0.925 |
| | 55-59 years | -1.255 |
| | 60-64 years | -1.095 |
| | 65-69 years | -0.984 |
| | 70-74 years | -1.012 |
| | 75+ years | -1.085 |
| | Male | 0.551 |
Step 2
The fixed effects part of the model was then taken and applied to the matrix X of small area covariates held by SDRC for 100% of individuals and LADs across England, the random LAD area effect was added (where it was available for an LAD), and the anti-logit was applied. The probabilities were then summed and averaged over the N_k individuals in the LAD to produce a vector of synthetic LAD level estimates:
$$Y_k = \frac{1}{N_k} \sum_{i,j} \operatorname{logit}^{-1}\left(X_{ijk} B + V_k\right)$$
This method does not use weighting to remove bias in the parameter estimators introduced by unequal selection probabilities in the survey sampling schemes. Instead, important characteristics of the sample are included in the model as covariates. The sample indicator variable S will therefore be unrelated to Y conditional on these covariates. In this case the sample can be viewed as uninformative and ignorable. There is little conflict in including these covariates because they are, by definition, predictors of Y and so should be included in the model. If they were not, the sample design would not bias the standard estimators of the parameters.
Included in our models are measures of non-manual social classes and a 'level' for the primary sampling unit. Together these will capture, to a great extent, the unequal selection probabilities associated with the sample design. Other variables such as age will ensure that where a question or measure was taken of only a particular age group in a specific survey year, the estimates will not be biased.
Component IB1_3: Alcohol abuse
| Definition | Directly age and gender standardised rate of admissions to hospital for alcohol related conditions |
| Source Numerator | 2001, 2001 Ethnic: All ethnically coded admissions to hospital for alcohol related conditions, Hospital Episode Statistics (HES), 1998/99, 1999/00, 2000/01, 2001/02, Department of Health |
| 2003: All admissions to hospital for alcohol related conditions, Hospital Episode Statistics (HES), 1999/00, 2000/01, 2001/02, 2002/03 Department of Health | |
| 2005: All admissions to hospital for alcohol related conditions, Hospital Episode Statistics (HES), 2002/03, 2003/04, 2004/05, 2005/06 Department of Health | |
| Source Denominator | 2001, 2001 Ethnic: Mid year population estimate 2001, ONS |
| 2003: Mid year population estimate 2003, ONS | |
| 2005: Mid year population estimate 2005, ONS |
Additional details
There are many factors which influence how much someone drinks: occupation and family background, the acceptability of drinking in a culture, and the occurrence of stressful or major life events. Statistics show that people's drinking habits have changed over the last 30 years. Recent years have seen an increase in the number of women drinking above recommended levels, and a worrying trend for teenagers to drink large quantities. There is evidence also of substantial numbers of men and women drinking heavily and in a binge drinking pattern. Alcohol is a major contributor not only to death, injury and illness, but also social damage such as crime and disorder, and social exclusion (2001, Annual Report of the Chief Medical Officer).
Alcohol abuse, captured by the rate of admissions to hospital for alcohol related conditions, is one indicator of unhealthy behaviour leading to poor health outcomes, as well as wider social problems.
The International Classification of Diseases Version 10 (ICD-10) codes used to extract data on admissions for alcohol related conditions from the HES dataset were:
- E52, F10, G312, G621, G721, I426, K292, K70, K860, O354, P043, Q860, R780, T506, T510, T519, X65, Y15, Y573, Y90, Y91, Z133, Z502, Z637, Z714, Z721, Z811, Z864.
Cases were used if one or more of these codes were found in any of the fourteen diagnosis fields. Individuals who had more than one admission for an alcohol related condition in a given year were counted once only.
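The case selection rule can be sketched as follows. The record layout (patient identifier, year, list of diagnosis codes) and the function names are hypothetical, as is the decision to treat each listed code as a prefix, so that for example F10 also matches F102; the source does not say how the codes were matched against the HES fields.

```python
from typing import Dict, Iterable, List, Set, Tuple

# ICD-10 codes listed above for alcohol related conditions.
ALCOHOL_CODES = (
    "E52", "F10", "G312", "G621", "G721", "I426", "K292", "K70", "K860",
    "O354", "P043", "Q860", "R780", "T506", "T510", "T519", "X65", "Y15",
    "Y573", "Y90", "Y91", "Z133", "Z502", "Z637", "Z714", "Z721", "Z811",
    "Z864",
)

# Hypothetical record: (patient id, year, diagnosis fields).
Admission = Tuple[str, str, List[str]]


def is_alcohol_related(diagnoses: Iterable[str]) -> bool:
    """True if any diagnosis field starts with one of the listed codes.
    Prefix matching is an assumption made for this sketch."""
    return any(
        code.replace(".", "").upper().startswith(ALCOHOL_CODES)
        for code in diagnoses
        if code
    )


def count_individuals(admissions: Iterable[Admission]) -> Dict[str, int]:
    """Count individuals with at least one qualifying admission per year,
    so repeat admissions in the same year are counted once only."""
    seen: Set[Tuple[str, str]] = set()
    for patient_id, year, diagnoses in admissions:
        if is_alcohol_related(diagnoses):
            seen.add((patient_id, year))
    counts: Dict[str, int] = {}
    for _, year in seen:
        counts[year] = counts.get(year, 0) + 1
    return counts


# Invented example records, not real HES data.
example_admissions: List[Admission] = [
    ("p1", "2001/02", ["K701", "I10"]),
    ("p1", "2001/02", ["F102"]),       # same person, same year
    ("p2", "2001/02", ["J45"]),        # no listed code
    ("p3", "2001/02", ["S060", "Y910"]),
]

example_counts = count_individuals(example_admissions)
print(example_counts)
```

The same pattern would apply to any other code list by swapping the tuple of codes.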
To control for differences in the age and gender structure across small areas, direct standardisation was used. Direct standardisation involves the application of small area age and gender structures to a standard population, which in this instance is derived from the HES data. This produces an expected number of events (admissions for alcohol abuse) in the standard population as if the risk profile of the individual areas was in place. This is contrasted with the actual number of observed events in the standard population to give a ratio. Thus a measure of higher or lower than expected occurrence of admissions for alcohol abuse is created.
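The calculation described above can be sketched as below. This follows the wording of the paragraph (area risk applied to the standard population, then compared with the events observed in the standard population); the group labels, counts and function name are invented for illustration and are not taken from HES or ONS data.

```python
from typing import Dict, Hashable


def directly_standardised_ratio(
    area_events: Dict[Hashable, float],
    area_population: Dict[Hashable, float],
    standard_events: Dict[Hashable, float],
    standard_population: Dict[Hashable, float],
) -> float:
    """Ratio of expected to observed events in the standard population.

    Expected events are obtained by applying the area's age and gender
    specific rates to the standard population, i.e. as if the area's
    risk profile were in place there.
    """
    expected = sum(
        area_events[group] / area_population[group] * standard_population[group]
        for group in standard_population
    )
    observed = sum(standard_events.values())
    return expected / observed


# Invented figures for two age and gender groups.
standard_population = {("M", "16-44"): 50000, ("F", "16-44"): 30000}
standard_events = {("M", "16-44"): 500, ("F", "16-44"): 600}

# Area A has the same group-specific rates as the standard population.
area_a_population = {("M", "16-44"): 1000, ("F", "16-44"): 1000}
area_a_events = {("M", "16-44"): 10, ("F", "16-44"): 20}

# Area B has double the rate in every group.
area_b_population = {("M", "16-44"): 1000, ("F", "16-44"): 1000}
area_b_events = {("M", "16-44"): 20, ("F", "16-44"): 40}

ratio_a = directly_standardised_ratio(
    area_a_events, area_a_population, standard_events, standard_population)
ratio_b = directly_standardised_ratio(
    area_b_events, area_b_population, standard_events, standard_population)

print(f"area A ratio = {ratio_a:.2f}")
print(f"area B ratio = {ratio_b:.2f}")
```

A ratio above 1 indicates more admissions than expected given the standard population, and a ratio below 1 indicates fewer.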
For indicators derived from the Hospital Episode Statistics (HES) the estimates are based on the relationship between all hospital stays, and those recorded for a specific condition of interest. Detail is added from census data to depict the spatial distribution of individuals in ethnic groups. All estimates are statistically smoothed to reduce noise within the distribution, enabling the underlying trend to be highlighted. For more details see the discussion paper.
Component IB1_4: Drug misuse
| Definition | Directly age and sex standardised rate of admissions to hospital for drug related conditions |
| Source Numerator | 2001, 2001 Ethnic: All ethnically coded admissions to hospital for drug related conditions, Hospital Episode Statistics (HES), 1998/99, 1999/00, 2000/01, 2001/02, Department of Health |
| 2003: All admissions to hospital for drug related conditions, Hospital Episode Statistics (HES), 1998/99, 1999/00, 2000/01, 2001/02, Department of Health | |
| 2005: All admissions to hospital for drug related conditions, Hospital Episode Statistics (HES), 2002/03, 2003/04, 2004/05, 2005/06 Department of Health | |
| Source Denominator | 2001, 2001 Ethnic: Mid year population estimate 2001, ONS |
| 2003: Mid year population estimate 2003, ONS | |
| 2005: Mid year population estimate 2005, ONS |
Additional details
Around 4 million people use at least one illicit drug each year and around 1 million use at least one of the most dangerous drugs classified as Class A. Many of these individuals will take drugs once but for thousands of problematic drug users in England and Wales, drugs cause considerable harm to themselves and others (2002, Home Office Updated Drug Strategy).
Obviously there are significant health risks associated with drugs. Drug misuse, captured by the rate of admissions to hospital for drug related conditions, is one indicator of unhealthy behaviour leading to poor health outcomes. Research suggests that there are all kinds of reasons for misuse, key factors including unemployment, low self esteem, educational failure, boredom and physical, psychological or family problems.
There are also strong links between drug misuse and crime, violence and hidden social problems - in homes and schools, on the roads and in the workplace. This indicator can also reflect such problems in society.
The International Classification of Diseases Version 10 (ICD-10) codes used to extract data on admissions for drug related conditions from the HES dataset were:
- F11, F12, F13, F14, F15, F16, F18, F19
Cases were used if one or more of these codes were found in any of the fourteen diagnosis fields. Individuals who had more than one admission for a drug related condition in a given year were counted once only.
To control for differences in the age and gender structure across small areas, direct standardisation was used. Direct standardisation involves the application of small area age and gender structures to a standard population, which in this instance is derived from the HES data. This produces an expected number of events (admissions for drug misuse) in the standard population as if the risk profile of the individual areas was in place. This is contrasted with the actual number of observed events in the standard population to give a ratio. Thus a measure of higher or lower than expected occurrence of admissions for drug misuse is created.
For indicators derived from the Hospital Episode Statistics (HES) the estimates are based on the relationship between all hospital stays, and those recorded for a specific condition of interest. Detail is added from census data to depict the spatial distribution of individuals in ethnic groups. All estimates are statistically smoothed to reduce noise within the distribution, enabling the underlying trend to be highlighted. For more details see the discussion paper.
Component IB1_5: Physical Activity in a week
| Definition | Modelled estimate of proportion doing under five hours of physical activity in a week |
| Source | 2001, 2001 Ethnic and 2003: Health Survey for England, 1998 to 2001, Joint Survey Unit of the National Centre for Social Research and the Department of Epidemiology and Public Health, University College London / Department of Health (See: Health Survey for England) |
| Note | Due to changes in the question asked in the HSE, which has resulted in inconsistency in the definition used over time, the data has been frozen and one model used. |
Additional details
In the absence of any suitable administrative or census data, survey data was the only source of information available to construct an indicator of physical activity. However there are a number of problems associated with using survey data to produce Local Authority District (LAD) estimates, including small or non-existent samples in some areas leading to large variances and unstable estimates and biases introduced by particular sampling strategies.
A great deal of work, particularly in the last twenty years, has gone into addressing these issues. Although a number of different approaches have been used, all the methods tend to fall somewhere on a continuum between using direct estimates, suitably weighted for sample design, and a modelling approach using local area covariates to estimate the indicator of interest. Some are based on only one or other of the methods. However the two methods each have their own particular problems. Direct estimates, weighted as necessary, are unbiased but may have large variances; on the other hand the modelled estimates will have small variances but will be biased. Hence many estimates attempt to combine information from both in order to solve the common problem of minimising the Mean Square Error of the final estimate.
The method used in the HPI required that a well-fitted micro level model could be identified. It also assumed that the important ways in which a group may have been over-sampled in a survey sample can be captured by covariates available in the survey and at a small area level. It involved combining all surveys available for the required year with the necessary dependent and independent variables (e.g. socio-economic status, age, gender and ethnicity). Data were gathered from the Health Survey for England (HSE) (1998 – 2001). Due to changes in the question asked in the HSE, resulting in inconsistency in the definition used over time, the data has been frozen, and one model used.
In 1999, the focus of the HSE was the health of minority ethnic groups, with the aim of increasing understanding through the monitoring of trends and of enabling predictions to be made. For this purpose a boost sample was designed to yield interviews with members of the six most populous minority ethnic groups: Black Caribbean, Indian, Pakistani, Bangladeshi, Chinese and Irish. For the purpose of this estimate, Irish is included under White, and the Black African group has been added (although this sample was not boosted, hence the low numbers). The table below shows the number of respondents in each ethnic group, for each year, used in the modelling:
| Ethnic group | 1998 | 1999 | 2000 | 2001 | Total |
|---|---|---|---|---|---|
| White | 18019 | 10437 | 8851 | 17322 | 54629 |
| Black Caribbean | 183 | 2029 | 143 | 296 | 2651 |
| Black African | 143 | 73 | 98 | 172 | 486 |
| Indian | 321 | 1909 | 203 | 287 | 2720 |
| Pakistani | 198 | 2148 | 91 | 225 | 2662 |
| Bangladeshi | 73 | 1905 | 64 | 83 | 2125 |
| Chinese | 39 | 961 | 17 | 37 | 1054 |
| Total | 18976 | 19462 | 9467 | 18422 | 66327 |
For the ethnic population estimates, the adult sample and the 1999 ethnic minority boost were used.
Step 1
Using combined survey data, with LAD geocoding, a multi-level, variable intercepts,
logistic model was run, with level one being the individual i, level two
the primary sampling unit j and level three the LAD k. Covariates from within
the survey, shown in lower case, and LAD level data, shown in upper case,
were used to predict the individual level behaviour.
$$\operatorname{logit}(P_{ijk}) = X_{ijk} B + U_{jk} + V_k + E_{ijk}$$
where P is a vector of probabilities associated with individual i in Primary Sampling Unit (PSU) j within LAD k, B is a vector of regression coefficients, X is a matrix of covariates associated with the individual measured within the survey, U is a random vector of area effects associated with the PSU, V is a random vector of area effects associated with the LAD, and E is a vector of independent random 'noise' elements. The matrix of covariates included PSU area measures, based on aggregated individual level survey counts within the PSU. These covariates are given in the table below:
| | Covariate | Coefficient |
|---|---|---|
| | Constant | -0.816 |
| Individual effects | Bangladeshi | 1.295 |
| | Black African | 0.751 |
| | Black Caribbean | 0.118 |
| | Chinese | 0.796 |
| | Indian | 0.757 |
| | Pakistani | 1.097 |
| | 20-24 years | 0.064 |
| | 25-29 years | 0.046 |
| | 30-34 years | 0.136 |
| | 35-39 years | 0.139 |
| | 40-44 years | 0.254 |
| | 45-49 years | 0.298 |
| | 50-54 years | 0.373 |
| | 55-59 years | 0.577 |
| | 60-64 years | 1.002 |
| | 65-69 years | 1.284 |
| | 70-74 years | 1.53 |
| | 75+ years | 2.481 |
| | Male | -0.434 |
| LAD area effects | Proportion higher social class | 0.287 |
Step 2
The fixed effects part of the model was then taken and applied to the matrix of small area covariates X held by SDRC for 100% of individuals and LADs across England, the random LAD area effect was added (where it was available for an LAD), and the anti-logit was applied. The probabilities were then summed and averaged over the N_k individuals in the LAD to produce a vector of synthetic LAD level estimates:
$$Y_k = \frac{1}{N_k} \sum_{i,j} \operatorname{logit}^{-1}\left(X_{ijk} B + V_k\right)$$
This method does not use weighting to remove bias in the parameter estimators introduced by unequal selection probabilities in the survey sampling schemes. Instead, important characteristics of the sample are included in the model as covariates. The sample indicator variable S will therefore be unrelated to Y conditional on these covariates. In this case the sample can be viewed as uninformative and ignorable. There is little conflict in including these covariates because they are, by definition, predictors of Y and so should be included in the model. If they were not, the sample design would not bias the standard estimators of the parameters.
Included in our models are measures of non-manual social classes and a 'level' for the primary sampling unit. Together these will capture, to a great extent, the unequal selection probabilities associated with the sample design. Other variables such as age will ensure that where a question or measure was taken of only a particular age group in a specific survey year, the estimates will not be biased.


