1. What is Data?

Data are facts and figures: raw, unprocessed information collected for a specific purpose. Data may be numerical or descriptive. In economics, data could be prices of commodities, output of industries, wages of workers, literacy rates, or population figures.

Types of Data — Quantitative and Qualitative

Type | Definition | Example
Quantitative | Data that can be expressed numerically (measured or counted) | Income ₹35,000; Temperature 38°C; 500 students enrolled
Qualitative | Data describing attributes or characteristics that cannot be directly measured in numbers | Gender (male/female); Religion; Satisfaction level (good/bad)

2. Sources of Data — Primary and Secondary

Data can be obtained from two broad sources:

Basis | Primary Data | Secondary Data
Definition | Data collected for the first time, directly from original sources, by the investigator for a specific purpose | Data that have already been collected and published by someone else for their own purpose; used in a secondary capacity
Original? | Yes: first-hand, fresh, original | No: second-hand; already processed
Collection method | Surveys, interviews, questionnaires, observation, carried out directly by the investigator | Published reports, government databases, newspapers, journals
Cost and time | Expensive and time-consuming | Cheaper and quicker, because the data are already available
Accuracy | Usually more accurate, because collection is tailored to the exact purpose | May not fit the exact purpose; must be evaluated for reliability
Precautions | Proper questionnaire design, trained investigators, adequate sample | Must check source reliability, definitions used, time period, and purpose of original collection
Indian examples | NSO/NSSO directly conducting surveys on household expenditure and employment | A researcher using RBI Annual Report data; a student using Census 2011 population figures
⚠️ Important: The same data can be primary for one user and secondary for another. Example: When NSSO collects household data by direct survey → Primary for NSSO. When a researcher uses the published NSSO report → Secondary for that researcher.

Sources of Secondary Data — Where to Find It

Source Type | Examples
Government publications | Census of India; National Sample Survey (NSS); RBI Handbook of Statistics; Economic Survey of India; NITI Aayog reports; Annual Survey of Industries
International organisations | World Development Report (World Bank); World Economic Outlook (IMF); Human Development Report (UNDP); World Health Statistics (WHO)
Semi-government / autonomous bodies | NABARD reports; SEBI Annual Reports; Stock Exchange data
Private publications | Newspapers (The Hindu, Economic Times); research journals; industry associations; NGO reports

Precautions When Using Secondary Data

Before using secondary data, a researcher must verify:

  • Reliability of source: Is it a reputable government or international agency, or an anonymous website?
  • Suitability for the current purpose: Were the definitions used the same as what you need? (e.g., poverty line definition may vary)
  • Adequacy: Does the data cover the required time period, region, and population?
  • Accuracy: Were the methods of collection described? Were there known biases?
  • Time period: Is the data sufficiently recent, or outdated?

3. Methods of Collecting Primary Data

When primary data is needed, the investigator must choose the most appropriate method of collection. Five main methods exist:

Method | How it works | Merits | Limitations
1. Direct Personal Investigation | Investigator personally visits each respondent and collects data face-to-face | Most reliable and accurate; can clarify doubts; can observe reactions | Very expensive and time-consuming; personal bias possible; limited geographic coverage
2. Indirect Oral Investigation | Investigator interviews third parties (witnesses, experts) who have knowledge of the subject rather than the concerned persons themselves | Useful when direct contact with concerned persons is not possible; wider coverage | Data may be biased or inaccurate (hearsay); witnesses may not have full information
3. Questionnaire through Enumerators | Trained enumerators visit respondents and fill in the questionnaire by asking questions | Good response rate; enumerators can explain questions; covers illiterate respondents | Training enumerators is costly; enumerator bias possible
4. Questionnaire by Mail/Post | Questionnaire is sent to respondents by post or email; they fill it in and return it | Cheap; covers wide geographic area; respondent has time to think | Low response rate; only literate respondents can answer; no one to clarify doubts; returns may be late or never arrive
5. Telephone / Online Interview | Data collected via telephone calls or online forms (increasingly common) | Quick, cheap, wide reach; digital data are easy to compile | Limited to those with phones or internet access; no in-depth probing; possible refusals

4. Qualities of a Good Questionnaire

A questionnaire is a set of questions used to collect data from respondents. Its quality determines the quality of the entire survey. Characteristics of a good questionnaire:

  • Clear and simple language: Questions should be easily understood — no ambiguous, complex or technical language.
  • Logical sequence: Questions should flow in a logical order — general to specific.
  • Limited number of questions: Short questionnaires get better response rates — avoid unnecessary questions.
  • No leading questions: Questions must not suggest the desired answer (e.g., "Don't you agree that prices are too high?" is a leading question).
  • Specific and objective: Questions must have clear, definite answers. Avoid vague questions like "How often do you buy vegetables?"; a specific version is "How many times a week do you buy vegetables?"
  • Confidentiality: Personal or sensitive questions should assure respondents that their answers are confidential.
  • Pre-tested (Pilot Survey): The questionnaire should be tested on a small sample before the main survey — called a pilot survey — to identify problems and refine questions.

What is a Pilot Survey?

A Pilot Survey (or Pre-test) is a small-scale trial run of the questionnaire on a limited number of respondents before the main survey. It helps to:

  • Identify ambiguous or confusing questions
  • Test the time required to complete the survey
  • Reveal any missing important questions
  • Check if the method of collection is feasible

5. Census vs Sample Investigation

Once the source and method are decided, the investigator must choose how many units to study. Two approaches exist:

Basis | Census (Complete Enumeration) | Sample Investigation
Definition | Every unit of the population is studied | Only a representative subset (sample) of the population is studied
Coverage | 100%: no unit is excluded | Partial: only selected units
Cost and time | Very expensive and time-consuming | Cheaper and faster
Accuracy | High: no sampling error, but non-sampling errors can occur | Subject to sampling error, but the smaller scale allows more intensive, careful study of each unit
When used | Small population; when complete coverage is essential (Census of India) | Large populations; when cost or time is a constraint; destructive testing
Indian example | Census of India (conducted every 10 years; covers every household) | NSS surveys; agricultural surveys; election exit polls

Key Terms — Population, Sample, Sampling Unit

Term | Definition | Example
Population (Universe) | The complete set of ALL units under study | All students in India; all factories in a state
Sample | A representative subset of the population selected for study | 500 students selected from all students in India
Sampling Unit | The individual unit selected for the sample | A household, a farm, a student
Sampling Frame | The complete list of all units in the population from which the sample is drawn | Electoral rolls; list of all schools in a district

6. Sampling Methods

Sampling methods fall into two broad categories:

A. Random (Probability) Sampling

Every unit in the population has a known, non-zero probability of being selected. Selection is free from personal bias. Results can be used to make statistical inferences about the whole population.

Type | How it works | Example
Simple Random Sampling | Every unit has an equal chance; selection by lottery method (chits drawn) or random number table | Names of 1000 employees written on chits; 50 drawn at random for a survey
Systematic Sampling | Every kth unit is selected from a numbered list; k = N/n (N = population size, n = sample size); the first unit is randomly chosen from 1 to k | N = 500, n = 50, so k = 10. If the first unit is 7, select 7, 17, 27, 37, ..., 497 (every 10th unit)
Stratified Sampling | Population divided into non-overlapping subgroups (strata) based on a characteristic; a random sample is drawn from each stratum | Strata: urban / rural. Random sample of 200 from urban and 300 from rural; ensures both groups are represented
Cluster Sampling | Population divided into clusters (natural groupings); entire clusters are randomly selected and all units within them are studied | India's districts (more than 700) serve as clusters; 50 districts are randomly chosen; all households in those districts are surveyed
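
The four methods in the table can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the seed, and the frame of units numbered 1 to 500 are made up for the example, and the systematic method assumes N is an exact multiple of n.

```python
import random

def simple_random_sample(frame, n, rng):
    """Lottery method: every unit in the frame has an equal chance."""
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng, start=None):
    """Pick a start between 1 and k, then take every k-th unit (k = N/n)."""
    k = len(frame) // n                  # sampling interval
    if start is None:
        start = rng.randint(1, k)        # random start, counted from 1
    return frame[start - 1::k][:n]

def stratified_sample(strata, sizes, rng):
    """Draw a separate simple random sample from each stratum."""
    return {name: rng.sample(units, sizes[name])
            for name, units in strata.items()}

def cluster_sample(clusters, m, rng):
    """Randomly pick m whole clusters and keep every unit inside them."""
    chosen = rng.sample(sorted(clusters), m)
    return [unit for name in chosen for unit in clusters[name]]

rng = random.Random(2024)                # fixed seed so the run is repeatable
frame = list(range(1, 501))              # hypothetical frame: units 1..500

# Reproduces the worked example: N = 500, n = 50, k = 10, first unit = 7.
print(systematic_sample(frame, 50, rng, start=7)[:4])   # [7, 17, 27, 37]
```

Passing `start=7` fixes the first unit so the output matches the table; leaving it out lets the function choose the start at random, which is what makes the method a probability sample.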

B. Non-Random (Non-Probability) Sampling

The probability of selecting any given unit is not known. Selection is based on the researcher's judgement, convenience, or quotas. These methods are cheaper and faster but potentially biased, so the results cannot be used to make valid statistical inferences about the full population.

Type | How it works | When used
Purposive / Judgement | Researcher deliberately selects units believed to be most representative | Selecting expert economists for a panel discussion
Quota Sampling | Field workers fill specified quotas for each group (e.g., 50 men, 50 women) | Market research surveys; opinion polls
Convenience Sampling | Whoever is easily accessible is selected | Student surveys interviewing friends or classmates
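
To see why quota sampling is not random, here is a minimal Python sketch. The respondent list, the `group` field, and the quota of 5 per group are hypothetical. The field worker simply accepts people in the order they are met until each quota is full, so who gets selected depends on who happens to come first, not on chance.

```python
def quota_sample(respondents, quotas):
    """Accept respondents in the order met until every group's quota is full."""
    remaining = dict(quotas)             # copy, so the caller's quotas stay intact
    selected = []
    for person in respondents:
        group = person["group"]
        if remaining.get(group, 0) > 0:  # quota for this group still open
            selected.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):  # all quotas filled: stop interviewing
            break
    return selected

# Hypothetical stream of passers-by: every third person is a woman.
passers_by = [{"id": i, "group": "women" if i % 3 == 0 else "men"}
              for i in range(30)]

sample = quota_sample(passers_by, {"men": 5, "women": 5})
print(len(sample))                       # 10
```

Only the first 13 passers-by are ever considered in this run; everyone who arrives later has no chance of selection, which is the source of the possible bias.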

7. Sampling and Non-Sampling Errors

Error Type | Definition | Present in Census? | Present in Sample?
Sampling Error | Difference between the sample estimate and the true value of the population parameter; arises because a sample (not the full population) is studied | ❌ No (no sampling involved) | ✅ Yes (inherent in sampling)
Non-Sampling Error | Errors arising from defects in data collection, whether census or sample; due to faulty questionnaire, enumerator bias, untruthful responses, processing mistakes | ✅ Yes | ✅ Yes

How to reduce sampling error: Increase the sample size. With random sampling, a larger sample tends to give an estimate closer to the true population value. However, increasing the sample size also increases cost.
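
A small simulation can make this concrete. The sketch below assumes a made-up population of 10,000 monthly incomes (mean about ₹30,000) and repeatedly draws simple random samples of different sizes; the function name and all figures are illustrative only. For simple random sampling the typical error shrinks roughly in proportion to 1/√n, so quadrupling the sample size only halves the error.

```python
import random
import statistics

def mean_abs_sampling_error(population, n, trials=500, seed=0):
    """Average gap between the sample mean and the true mean over many samples."""
    rng = random.Random(seed)
    true_mean = statistics.fmean(population)
    errors = [abs(statistics.fmean(rng.sample(population, n)) - true_mean)
              for _ in range(trials)]
    return statistics.fmean(errors)

# Hypothetical population: 10,000 monthly incomes in rupees.
pop_rng = random.Random(42)
population = [pop_rng.gauss(30000, 8000) for _ in range(10000)]

for n in (25, 100, 400):
    print(n, round(mean_abs_sampling_error(population, n)))
```

The printed error falls as n rises, but each step costs four times as many interviews, which is the cost trade-off noted above.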

Key insight: A large sample with non-sampling errors can be less accurate than a smaller, well-designed sample. Quality of data collection matters as much as quantity.
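
The following sketch illustrates this point with invented numbers. It compares a small random sample of 200 honest answers with a large sample of 5,000 in which every respondent under-reports income by 20% (a non-sampling error). The population, the seed, and the 20% figure are assumptions made purely for illustration.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical population: 20,000 monthly incomes in rupees.
population = [rng.gauss(30000, 8000) for _ in range(20000)]
true_mean = statistics.fmean(population)

# Small, well-designed survey: random sample of 200, accurate answers.
small = rng.sample(population, 200)
small_estimate = statistics.fmean(small)

# Large, flawed survey: 5,000 respondents, each under-reporting by 20%.
large = [0.8 * income for income in rng.sample(population, 5000)]
large_estimate = statistics.fmean(large)

print("small-sample error:", round(abs(small_estimate - true_mean)))
print("large-sample error:", round(abs(large_estimate - true_mean)))
```

The large survey's estimate stays about 20% below the true mean no matter how many respondents are added, because a bigger sample reduces only sampling error, not non-sampling error.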