Blog on Data

Data collection process

The data collection process started with a broad Google search for keywords such as ‘malnutrition data’. We then opened a large number of sites from the results page and scanned them for data range and availability as well as for the sources’ credibility. We assessed credibility by reading the About section of each website and checking its claimed data sources.

The main sources

Except for a few websites which claimed to offer a great amount of otherwise unavailable data for a fee, most websites pointed towards the UN, the World Bank and the Food and Agriculture Organisation (FAO) of the UN. After downloading various datasets from these three main sources, we decided to go with a Food Security Indicators Excel document available from the FAO.

The variables used

The dataset came with a total of 43 indicators, ranging from GDP per capita to underweight population, political stability and railroad density. It contained (sometimes incomplete) information on 230 countries and on some aggregate regions such as Oceania, Latin America and developed countries. The dataset covered the period from 1990 to 2016.

For the interactive visualization map we decided to use 10 indicators. The selection criterion was our subjective judgement of how closely or directly each variable relates to child malnutrition. Of course, the inclusion of some of the indicators was subject to thorough debate between us. The indicators selected are listed below with a brief description of each:

Indicator/Variable Description
GDP per capita

GDP per capita based on purchasing power parity (PPP). PPP GDP is gross domestic product converted to international dollars using purchasing power parity rates. An international dollar has the same purchasing power over GDP as the U.S. dollar has in the United States. GDP at purchaser’s prices is the sum of gross value added by all resident producers in the economy plus any product taxes and minus any subsidies not included in the value of the products. It is calculated without making deductions for depreciation of fabricated assets or for depletion and degradation of natural resources. Data are in constant 2011 international dollars.

Depth of the food deficit

The depth of the food deficit indicates how many calories would be needed to lift the undernourished from their status, everything else being constant. The average intensity of food deprivation of the undernourished, estimated as the difference between the average dietary energy requirement and the average dietary energy consumption of the undernourished population (food-deprived), is multiplied by the number of undernourished to provide an estimate of the total food deficit in the country, which is then normalized by the total population.

Access to improved water resources

Access to an improved water source refers to the percentage of the population with reasonable access to an adequate amount of water from an improved source, such as a household connection, public standpipe, borehole, protected well or spring, and rainwater collection. Unimproved sources include vendors, tanker trucks, and unprotected wells and springs. Reasonable access is defined as the availability of at least 20 liters a person a day from a source within one kilometer of the dwelling.

Access to improved sanitation resources

Access to improved sanitation facilities refers to the percentage of the population with at least adequate access to excreta disposal facilities that can effectively prevent human, animal, and insect contact with excreta. Improved facilities range from simple but protected pit latrines to flush toilets with a sewerage connection. To be effective, facilities must be correctly constructed and properly maintained.

Children under 5 affected by wasting

Wasting prevalence is the proportion of children under five whose weight for height is more than two standard deviations below the median for the international reference population ages 0-59 months.

This indicator belongs to a set of indicators whose purpose is to measure nutritional imbalance and malnutrition resulting in undernutrition (assessed by underweight, stunting and wasting) and overweight. Child growth is the most widely used indicator of nutritional status in a community and is internationally recognized as an important public-health indicator for monitoring health in populations. In addition, children who suffer from growth retardation as a result of poor diets and/or recurrent infections tend to have a greater risk of suffering illness and death.

Children under 5 who are stunted

Percentage of stunting (height-for-age less than -2 standard deviations of the WHO Child Growth Standards median) among children aged 0-5 years.

This indicator belongs to a set of indicators whose purpose is to measure nutritional imbalance and malnutrition resulting in undernutrition (assessed by underweight, stunting and wasting) and overweight. Child growth is the most widely used indicator of nutritional status in a community and is internationally recognized as an important public-health indicator for monitoring health in populations. In addition, children who suffer from growth retardation as a result of poor diets and/or recurrent infections tend to have a greater risk of suffering illness and death.

Children under 5 who are underweight

Percentage of underweight (weight-for-age less than -2 standard deviations of the WHO Child Growth Standards median) among children aged 0-5 years.

This indicator belongs to a set of indicators whose purpose is to measure nutritional imbalance and malnutrition resulting in undernutrition (assessed by underweight, stunting and wasting) and overweight. Child growth is the most widely used indicator of nutritional status in a community and is internationally recognized as an important public-health indicator for monitoring health in populations. In addition, children who suffer from growth retardation as a result of poor diets and/or recurrent infections tend to have a greater risk of suffering illness and death.

Adults who are underweight

Percentage of adults who are underweight, as defined by a Body Mass Index (BMI) below the international reference standard of 18.5. To calculate an individual’s BMI, weight and height data are needed. The BMI is weight in kilograms divided by the square of height in metres.

This indicator belongs to a set of indicators whose purpose is to measure nutritional imbalance and malnutrition resulting in undernutrition (assessed by underweight, stunting and wasting) and overweight.

Anaemia amongst pregnant women

Prevalence of anaemia in pregnant women is the percentage of pregnant women whose haemoglobin level is lower than 110 grams per liter at sea level.

Anaemia is a condition in which the number of red blood cells (and consequently their oxygen-carrying capacity) is insufficient to meet the body’s physiologic needs. Specific physiologic needs vary with a person’s age, gender, residential elevation above sea level (altitude), smoking behaviour, and different stages of pregnancy. Iron deficiency is thought to be the most common cause of anaemia globally, but other nutritional deficiencies (including folate, vitamin B12 and vitamin A), acute and chronic inflammation, parasitic infections, and inherited or acquired disorders that affect haemoglobin synthesis, red blood cell production or red blood cell survival, can all cause anaemia. The prevalence of anaemia is an important health indicator. When used with other measurements of iron status, the haemoglobin concentration can provide information about the severity of iron deficiency. The cut-off value for public health significance is 40%. A prevalence of anaemia equal to or higher than this level signals a severe public health problem.

Number of people undernourished

Estimated number of people at risk of undernourishment. It is calculated by applying the estimated prevalence of undernourishment to total population in each period.

More details on the methodology for computing the prevalence of undernourishment are in “Annex 2” of the “State of Food Insecurity in the World 2015” Report (http://www.fao.org/publications/sofi/en/).

Source: Food and Agriculture Organisation of the United Nations Statistics Division. (n.d.). Home. Retrieved from http://faostat3.fao.org/home/E
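As a rough illustration, several of the definitions above reduce to simple arithmetic. The sketch below is written in Python, which was not part of our workflow. The function names and the sample numbers are made up for illustration and are not taken from the FAO dataset.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body Mass Index: weight in kilograms divided by squared height in metres."""
    return weight_kg / height_m ** 2


def is_underweight_adult(weight_kg: float, height_m: float, cutoff: float = 18.5) -> bool:
    """An adult counts as underweight when their BMI is below the 18.5 cut-off."""
    return bmi(weight_kg, height_m) < cutoff


def is_below_minus_two_sd(z_score: float) -> bool:
    """Wasting, stunting and child underweight all use the same rule:
    the child's z-score is more than two standard deviations below the median."""
    return z_score < -2.0


def number_undernourished(prevalence: float, population: float) -> float:
    """Estimated prevalence (as a fraction, e.g. 0.2 for 20%) applied to total population."""
    return prevalence * population


def depth_of_food_deficit(avg_requirement: float,
                          avg_consumption_undernourished: float,
                          undernourished: float,
                          population: float) -> float:
    """Average calorie shortfall of the undernourished, multiplied by their number
    and normalized by the total population."""
    shortfall = avg_requirement - avg_consumption_undernourished
    return shortfall * undernourished / population


def is_severe_anaemia_problem(prevalence_pct: float, cutoff_pct: float = 40.0) -> bool:
    """A prevalence equal to or higher than 40% signals a severe public health problem."""
    return prevalence_pct >= cutoff_pct


# Hypothetical example values, chosen only to show how the functions are called.
print(is_underweight_adult(weight_kg=50.0, height_m=1.70))
print(depth_of_food_deficit(2000.0, 1700.0, undernourished=200_000, population=1_000_000))
```

The z-score check assumes the z-score has already been computed against the reference population. That step needs the reference tables and is not shown here.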

 

The software involved

After a brief overview of the interactive visualization software programs, we decided to use Tableau. For the infographic on Pakistan we decided to use Piktochart. In order to ‘clean’ the data and manipulate it so that it would be read correctly by the selected software, we used Microsoft Excel as well as OpenRefine.

The challenges we had with it

We encountered two major obstacles with the data. First, the data came in one large Excel document containing information on more than 250 countries or regions, over a period of 24 years and across 43 separate spreadsheets. The size of the document made it difficult for us to manipulate it with more rudimentary cut-and-paste techniques. At the same time, Tableau seemed to be rather inflexible about data format, so we had to find a way around it.

After looking up (either on the lecture slides or the Internet) and trying different tools for large data manipulation, we decided to stick with OpenRefine and Kutools, mainly due to their user-friendly interface. OpenRefine was used to transfer the information onto one spreadsheet (instead of 43). Then, Kutools was used for larger cut-and-paste operations.
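For readers who prefer code, the same reshaping (many sheets into one table with one row per country, year and indicator) could be sketched in Python. This is only an illustration of the idea and not the OpenRefine steps we actually followed. The in-memory `workbook` structure, the function name `stack_sheets` and the sample values are all made up.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for the Excel workbook:
# sheet (indicator) name -> country -> year -> value (None where missing).
Sheet = Dict[str, Dict[int, Optional[float]]]
Row = Tuple[str, int, str, Optional[float]]


def stack_sheets(workbook: Dict[str, Sheet]) -> List[Row]:
    """Flatten all sheets into one list of (country, year, indicator, value) rows."""
    rows: List[Row] = []
    for indicator, sheet in workbook.items():
        for country, series in sheet.items():
            for year, value in series.items():
                rows.append((country, year, indicator, value))
    # Sort by country, year and indicator so the combined table is easy to scan.
    rows.sort(key=lambda r: (r[0], r[1], r[2]))
    return rows


# Made-up sample values, only to show the shape of the input and output.
workbook = {
    "GDP per capita": {"Pakistan": {1990: 3000.0, 1991: None}},
    "Children under 5 who are stunted": {"Pakistan": {1990: 50.0}},
}
for row in stack_sheets(workbook):
    print(row)
```

Missing values are kept as `None` at this stage, so that filling them in remains a separate step.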

The second major problem we encountered was the lack of observations across countries and time. That is, the ‘picture’ wouldn’t have been painted well across time and geographical location due to a large number of missing values.

To overcome this problem, we filled in the blank spaces with linearly interpolated values wherever the data was missing between two given observations for a region or country. If no value was available before the missing value, after it, or both, we decided that the most ‘fair’ thing to do was to place a zero (a zero indicating that we had no value or estimate of what the value could have been). Fortunately, we were able to insert both the linear imputations and the zeros (across more than 250,000 rows) easily with Kutools.
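The filling rule can be sketched in a few lines of Python. We did this step in Kutools, so the code below is only an illustration of the rule, and the function name `fill_gaps` and the sample series are made up. A gap with an observation on both sides gets a value on the straight line between its nearest neighbours, and any other gap gets a zero.

```python
from typing import List, Optional


def fill_gaps(values: List[Optional[float]]) -> List[float]:
    """Fill missing entries (None) in one country's yearly series.

    Gaps between two observations are linearly interpolated.
    Gaps with no observation before or after them are set to zero.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    filled: List[float] = []
    for i, v in enumerate(values):
        if v is not None:
            filled.append(v)
            continue
        # Nearest observed positions on each side of the gap, if any.
        prev = max((k for k in known if k < i), default=None)
        nxt = min((k for k in known if k > i), default=None)
        if prev is None or nxt is None:
            filled.append(0.0)  # zero marks "no value or estimate available"
        else:
            fraction = (i - prev) / (nxt - prev)
            filled.append(values[prev] + fraction * (values[nxt] - values[prev]))
    return filled


# Made-up series: missing at both ends and once in the middle.
print(fill_gaps([None, 10.0, None, 14.0, None]))  # → [0.0, 10.0, 12.0, 14.0, 0.0]
```

Note that a zero placed this way cannot be told apart from a genuinely measured zero, so it has to be read as ‘no estimate’ and not as a data point.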
