Practice A4.2 Data preprocessing (HL only) with authentic IB Computer Science (First Exam 2027) exam questions for both SL and HL students. This question bank mirrors the structure of Papers 1, 2, and 3, covering key topics such as programming concepts, algorithms, and data structures. Get instant solutions and detailed explanations, and build exam confidence with questions written in the style of IB examiners.
A social media analytics platform preprocesses text data from multiple languages and platforms for sentiment analysis and trend detection.
Design a text preprocessing pipeline that handles multilingual content, emojis, hashtags, and informal language.
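As a starting point for an answer, the pipeline stages could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not a production design: the function name `preprocess`, the choice to keep each emoji as its own token, and the rule that collapses repeated characters are all assumptions made for this example.

```python
import re
import unicodedata

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#(\w+)")
REPEAT_RE = re.compile(r"(.)\1{2,}")  # three or more repeats of one character


def preprocess(text):
    """Return (tokens, hashtags) for one post. Hypothetical helper."""
    text = unicodedata.normalize("NFKC", text)   # unify equivalent Unicode forms
    text = URL_RE.sub(" ", text)                 # drop links
    hashtags = [h.lower() for h in HASHTAG_RE.findall(text)]
    text = HASHTAG_RE.sub(r"\1", text)           # keep the hashtag word, drop '#'
    text = text.casefold()                       # Unicode-aware lowercasing
    text = REPEAT_RE.sub(r"\1\1", text)          # "sooooo" -> "soo"
    # \w+ matches word characters in any script; every other non-space
    # character (punctuation, emoji) becomes its own token.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    return tokens, hashtags


tokens, hashtags = preprocess("Sooooo good!!! #Happy 😀 https://x.co/a")
```

A fuller answer would also discuss language detection, slang normalization, and whether emojis should be mapped to sentiment labels rather than kept as raw tokens.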
Compare different text vectorization techniques (Bag of Words, TF-IDF, Word Embeddings) for social media analysis, considering computational efficiency and semantic understanding.
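To make the comparison concrete, the difference between raw counts and TF-IDF weights can be sketched on a toy corpus. This is an illustrative sketch only: the three tokenized posts are invented, and the unsmoothed IDF formula log(N / df) is one common variant among several. Word embeddings are not shown because they require a trained model.

```python
import math
from collections import Counter

# Invented toy corpus of already-tokenized posts
docs = [
    ["great", "phone", "great", "camera"],
    ["bad", "phone"],
    ["great", "battery"],
]


def bag_of_words(doc):
    """Raw term counts for one document."""
    return Counter(doc)


def tf_idf(doc, corpus):
    """TF-IDF weights using tf = count / len(doc) and idf = log(N / df)."""
    counts = Counter(doc)
    n_docs = len(corpus)
    scores = {}
    for term, count in counts.items():
        tf = count / len(doc)
        df = sum(1 for d in corpus if term in d)  # documents containing the term
        scores[term] = tf * math.log(n_docs / df)
    return scores


bow = bag_of_words(docs[0])
weights = tf_idf(docs[0], docs)
```

In the first post, Bag of Words ranks "great" highest because it appears twice, while TF-IDF ranks "camera" highest because it appears in only one post. Neither representation captures word meaning or word order, which is where embeddings differ.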
A financial technology company preprocesses market data, transaction records, and economic indicators for algorithmic trading and risk assessment models.
Compare different data normalization and standardization techniques for financial time-series data, considering the impact on model performance and interpretability.
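The two most common techniques, min-max normalization and z-score standardization, can be sketched as below. This is a minimal illustration with an invented price series; the function names are assumptions for this example.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Centre values on mean 0 with population standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Invented closing prices with one extreme value
prices = [100.0, 102.0, 101.0, 105.0, 150.0]
scaled = min_max_scale(prices)
standardized = z_score(prices)
```

With the extreme value 150 in the series, min-max scaling squeezes the other four prices into roughly the bottom tenth of the range, which illustrates its sensitivity to outliers. For time-series data the scaling parameters would normally be computed on the training period only, so that information from the future does not leak into the model.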
Explain how sliding window techniques and lag features can capture temporal patterns in financial data for predictive modelling.
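A small sketch may help show what lag features and a sliding-window statistic look like in practice. The function name, the dictionary layout, and the price series are assumptions made for this example. Each row uses only values from before time t, so the features never include the target itself.

```python
def make_lag_features(series, n_lags, window):
    """Build one training row per time step t, using only earlier values."""
    rows = []
    start = max(n_lags, window)  # first t with enough history
    for t in range(start, len(series)):
        lags = [series[t - k] for k in range(1, n_lags + 1)]  # t-1, t-2, ...
        rolling_mean = sum(series[t - window:t]) / window     # mean of previous `window` values
        rows.append({"lags": lags, "rolling_mean": rolling_mean, "target": series[t]})
    return rows


# Invented daily prices
rows = make_lag_features([10, 11, 12, 13, 14], n_lags=2, window=3)
```

Here the first row predicts the price 13 from the two previous prices (12 and 11) and the mean of the three previous prices (11.0). The first few time steps are dropped because they lack enough history.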
A retail analytics team preprocesses customer transaction data from multiple sources to build predictive models for inventory management and sales forecasting.
Analyse the benefits and challenges of multi-model databases that support document, graph, and key-value data models within a single system. In your analysis, discuss the impact on data modelling flexibility, query language complexity, performance considerations, and operational management. Include specific examples of implementation considerations and use cases where multi-model databases would be advantageous or problematic.
Analyse the trade-offs between different missing value imputation strategies (deletion, mean imputation, advanced methods) for retail transaction data.
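The two simplest strategies can be contrasted in a few lines. This is a minimal sketch with invented transaction rows; the field names `sku` and `qty` and both helper functions are hypothetical.

```python
from statistics import mean


def drop_missing(rows, field):
    """Listwise deletion: discard every row where `field` is missing."""
    return [r for r in rows if r[field] is not None]


def impute_mean(rows, field):
    """Mean imputation: fill missing values with the mean of observed ones."""
    fill = mean(r[field] for r in rows if r[field] is not None)
    return [dict(r, **{field: fill if r[field] is None else r[field]}) for r in rows]


# Invented transactions; None marks a missing quantity
transactions = [
    {"sku": "A", "qty": 2},
    {"sku": "B", "qty": None},
    {"sku": "C", "qty": 4},
]
deleted = drop_missing(transactions, "qty")
imputed = impute_mean(transactions, "qty")
```

Deletion keeps only observed values but loses the row for product B entirely, while mean imputation keeps all three rows at the cost of reducing the variance of `qty` and ignoring any relationship between the missing value and other fields.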
A computer vision system preprocesses satellite imagery and drone footage for environmental monitoring and agricultural assessment.
Describe image preprocessing techniques specific to satellite imagery including geometric correction, radiometric calibration, and atmospheric correction.
Explain how data augmentation techniques can improve the robustness of computer vision models for environmental applications.
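Simple geometric augmentations can be sketched on an image stored as a nested list of pixel values. This is an illustration only: the function names and the tiny 2x2 tile are assumptions, and real pipelines would typically also vary brightness, scale, and noise.

```python
def flip_horizontal(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def flip_vertical(img):
    """Mirror the image top to bottom."""
    return [row[:] for row in img[::-1]]


def rotate_90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def augment(img):
    """Return the original image plus three transformed copies."""
    return [img, flip_horizontal(img), flip_vertical(img), rotate_90(img)]


# Invented 2x2 single-band tile
tile = [[1, 2],
        [3, 4]]
variants = augment(tile)
```

Flips and rotations suit overhead imagery because a field or forest patch has no canonical orientation when viewed from above, so each transformed copy remains a valid training example with the same label.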
A healthcare research project preprocesses electronic health records (EHR) containing patient demographics, medical history, lab results, and clinical notes for predictive modelling.
Explain the specific challenges of preprocessing healthcare data including privacy requirements, data standardization, and temporal relationships.
Describe how feature engineering can create meaningful variables from raw EHR data for disease prediction models.
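A few typical derived variables can be sketched as follows. The record layout, field names, and values are all hypothetical and chosen only for illustration; glucose readings are assumed to be sorted by date.

```python
from datetime import date


def engineer_features(record, as_of):
    """Derive model inputs from one raw patient record (hypothetical layout)."""
    birth = record["birth_date"]
    # Subtract one year if the birthday has not yet occurred in the as_of year
    age = as_of.year - birth.year - ((as_of.month, as_of.day) < (birth.month, birth.day))
    # Use only readings available at prediction time to avoid leakage
    glucose = [value for day, value in record["glucose_mg_dl"] if day <= as_of]
    return {
        "age_years": age,
        "glucose_latest": glucose[-1] if glucose else None,
        "glucose_max": max(glucose) if glucose else None,
        "n_distinct_diagnoses": len(set(record["diagnosis_codes"])),
    }


# Invented patient record
patient = {
    "birth_date": date(1980, 6, 15),
    "glucose_mg_dl": [(date(2024, 1, 1), 140), (date(2024, 5, 1), 130), (date(2024, 7, 1), 110)],
    "diagnosis_codes": ["E11", "I10", "E11"],
}
features = engineer_features(patient, as_of=date(2024, 6, 14))
```

The sketch turns a birth date into an age, a series of lab results into summary statistics, and a list of diagnosis codes into a count. The reading dated after `as_of` is excluded, which reflects the temporal constraint that a prediction may only use data known at that point.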