Math AA IA Research Question Generator

Sample Math AA IA Topic Ideas

Browse these sample topics to get inspired, or scroll up to generate your own custom ideas based on your specific interests.

Medium

How well can a discrete-time Markov chain with a fitted transition matrix model the state-to-state movement of passengers between the three ticket classes (economy, premium economy, business) on flights of Airline X during July–September 2024, as measured by predicted steady-state proportions and one-step transition error (RMSE)?

Suggested Approach

Start by stating the research question exactly as given and explaining concisely what you will model: passenger movement between economy, premium economy and business using a discrete-time Markov chain for July–September 2024. Describe the data you need (individual passenger class on consecutive flights or boarding records with class changes, timestamps, flight IDs) and how to collect or simulate it ethically and legally. Clean the data so each record becomes a time-ordered sequence of class states per passenger; remove ambiguous or incomplete sequences and document the number of transitions removed. Clearly define the state space {E, P, B} and any modelling assumptions you are making (time-homogeneity, Markov property, independence between passengers, flight-level versus passenger-level time step). Put these assumptions in the introduction and note their practical implications for interpretation of results without changing the research question itself.

In the main analysis, show step by step how you estimate the transition matrix from data: count observed transitions n_{ij} from state i to state j, compute empirical probabilities p_{ij} = n_{ij} / Σ_j n_{ij}, and present the fitted 3×3 matrix with a full worked example of at least one row. Check matrix properties (rows sum to 1) and test model conditions: examine irreducibility and aperiodicity to justify computing a unique stationary distribution. Compute the steady-state (stationary) distribution by solving πP = π (show the algebra or the eigenvector method) and interpret the steady-state proportions in the airline context. For error analysis, define the one-step prediction RMSE explicitly (compare observed next-state one-hot vectors to predicted probabilities), compute RMSE per origin state and overall, and show sample calculations and tables of results. Include visual aids: transition diagrams, a heatmap of the matrix, time-series plots of class proportions, and a small appendix showing code snippets or calculator steps used to compute matrix algebra and RMSE so the examiner can follow your process.

In the conclusion and evaluation, restate how the research question was addressed and summarise the key quantitative findings (fitted matrix, steady-state proportions, RMSE values). Critically evaluate model fit and assumptions: discuss sources of bias (sampling periods, seasonality within July–September, passenger churn, non-Markov effects), statistical uncertainty (bootstrap or confidence intervals for p_{ij}), and sensitivity (how small changes in counts affect steady states). Suggest realistic, IA-appropriate extensions that do not change the research question (for example comparing different weeks, using separate matrices per route, or adding validation by holding out days to test predictive RMSE) and explicitly state any limitations that would affect the reliability of your conclusions.
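The estimation, steady-state and RMSE steps described above can be sketched in a few lines of standard-library Python. This is only an illustrative sketch: the passenger sequences are made up, the function names are hypothetical, and the RMSE is assumed to be averaged over every transition and every component of the one-hot vector (other conventions are possible). Because the error is computed on the same transitions used for fitting, it is an in-sample figure.

```python
from collections import Counter
from math import sqrt

STATES = ["E", "P", "B"]  # economy, premium economy, business

# Made-up class sequences (one list per passenger), for illustration only.
sequences = [
    ["E", "E", "P", "E"],
    ["E", "E", "E", "E"],
    ["P", "B", "B", "P"],
    ["B", "B", "P", "P"],
    ["E", "P", "P", "B"],
]

def transition_pairs(seqs):
    """All consecutive (from_state, to_state) pairs across the sequences."""
    return [(s[k], s[k + 1]) for s in seqs for k in range(len(s) - 1)]

def fit_transition_matrix(pairs):
    """Row-normalised counts: p_ij = n_ij / sum_j n_ij.
    Assumes every state occurs at least once as an origin state."""
    counts = Counter(pairs)
    matrix = []
    for i in STATES:
        row_total = sum(counts[(i, j)] for j in STATES)
        matrix.append([counts[(i, j)] / row_total for j in STATES])
    return matrix

def stationary_distribution(P, iterations=1000):
    """Approximate the solution of pi P = pi by repeated multiplication.
    Converges when the chain is irreducible and aperiodic."""
    n = len(P)
    pi = [1 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def one_step_rmse(P, pairs):
    """RMSE between one-hot observed next states and predicted rows of P."""
    squared_error = 0.0
    for i, j in pairs:
        row = P[STATES.index(i)]
        for k, state in enumerate(STATES):
            target = 1.0 if state == j else 0.0
            squared_error += (target - row[k]) ** 2
    return sqrt(squared_error / (len(pairs) * len(STATES)))

pairs = transition_pairs(sequences)
P = fit_transition_matrix(pairs)
steady = stationary_distribution(P)
rmse = one_step_rmse(P, pairs)

for state, row in zip(STATES, P):
    print(state, [round(p, 3) for p in row])
print("steady state:", [round(p, 3) for p in steady])
print("in-sample one-step RMSE:", round(rmse, 3))
```

For an out-of-sample figure, the same `one_step_rmse` function could be applied to transitions from held-out days that were not used to fit `P`.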

Relevant Exemplars

Modeling Monopoly as a Markov Chain

Mathematics behind airline ticket pricing: modelling the prices and overbooking

Investigating winning strategies for the board game 'Ticket to Ride'.

Medium

To what extent can a truncated Fourier series (up to the 5th harmonic) accurately model and predict monthly average temperatures in Reykjavik from 2000 to 2020, measured by parameter estimates and out-of-sample RMSE for 2019–2020?

Suggested Approach

Start by framing your research question clearly on the cover page and introduction: state that you will model monthly average temperatures in Reykjavik from 2000–2020 using a truncated Fourier series up to the 5th harmonic, and that you will compare parameter estimates and calculate out-of-sample RMSE for 2019–2020. Collect the monthly temperature data from a reliable source (Icelandic Meteorological Office, NOAA, or other official datasets), record any preprocessing steps (handling missing months, converting units, detrending if needed) and justify them. Explain the mathematical background of Fourier series briefly and list the assumptions you make (periodicity of one year, stationarity of seasonal pattern across years, independence of residuals). Keep a reproducible workflow: supply the exact dataset, show code or calculator steps for fitting, and include a table of parameter estimates for each harmonic (amplitude and phase) and the constant term so the examiner can follow your calculations step by step.

In the main analysis, show how you construct the truncated Fourier model: y(t) = a_0 + Σ_{n=1}^{5} [a_n cos(2πnt/12) + b_n sin(2πnt/12)], where t indexes months. If you report each harmonic as A_n cos(2πnt/12 − φ_n), the amplitude and phase follow from A_n = √(a_n² + b_n²) and tan φ_n = b_n/a_n. Explain how you estimate the coefficients (least squares linear regression using the basis functions, matrix notation or normal equations) and present sample calculations. Produce diagnostic plots: fitted curve versus observed monthly averages across 2000–2018 (training), residual plots, and the periodogram or Fourier transform to justify the choice of five harmonics. Split the data explicitly so 2000–2018 is the training set and 2019–2020 is the test set; compute in-sample measures (R², residual standard error) and the out-of-sample RMSE for 2019–2020, showing formulas and worked examples. Discuss statistical significance or uncertainty of parameter estimates (standard errors, confidence intervals) and how they affect prediction reliability.
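A minimal sketch of the fit-and-test workflow is given below, using only the Python standard library. The temperature series is synthetic (generated from made-up coefficients plus noise), not Reykjavik data, and all function names are hypothetical. The sketch assumes the training set covers whole years: in that case the cosine and sine columns are orthogonal, so the least-squares coefficients reduce to the simple projection sums used here. With gaps or partial years, a general least-squares solver would be needed instead.

```python
import math
import random

PERIOD = 12     # months per year
HARMONICS = 5   # truncate after the 5th harmonic

def angle(n, t):
    return 2 * math.pi * n * t / PERIOD

def fit_fourier(y):
    """Least-squares coefficients via projection (valid for whole years only)."""
    n_obs = len(y)
    if n_obs % PERIOD != 0:
        raise ValueError("projection formulas assume whole years of data")
    a0 = sum(y) / n_obs
    a = [2 / n_obs * sum(v * math.cos(angle(n, t)) for t, v in enumerate(y))
         for n in range(1, HARMONICS + 1)]
    b = [2 / n_obs * sum(v * math.sin(angle(n, t)) for t, v in enumerate(y))
         for n in range(1, HARMONICS + 1)]
    return a0, a, b

def predict(t, a0, a, b):
    return a0 + sum(
        a[n - 1] * math.cos(angle(n, t)) + b[n - 1] * math.sin(angle(n, t))
        for n in range(1, HARMONICS + 1)
    )

def rmse(observed, predicted):
    return math.sqrt(
        sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    )

# Synthetic stand-in for 21 years of monthly means (made-up coefficients).
rng = random.Random(0)

def synthetic_temperature(t):
    seasonal = (5.0 - 5.5 * math.cos(angle(1, t)) - 2.0 * math.sin(angle(1, t))
                + 0.4 * math.cos(angle(2, t)))
    return seasonal + rng.gauss(0.0, 0.5)

series = [synthetic_temperature(t) for t in range(21 * PERIOD)]
split = 19 * PERIOD                      # 2000-2018 train, 2019-2020 test
train, test = series[:split], series[split:]

a0, a, b = fit_fourier(train)
test_rmse = rmse(test, [predict(t, a0, a, b) for t in range(split, 21 * PERIOD)])

print("a0:", round(a0, 3))
for n in range(HARMONICS):
    amplitude = math.hypot(a[n], b[n])
    print(f"harmonic {n + 1}: a={a[n]:.3f} b={b[n]:.3f} amplitude={amplitude:.3f}")
print("out-of-sample RMSE:", round(test_rmse, 3))
```

Printing the amplitude of each harmonic gives a quick check on whether the higher harmonics contribute anything beyond noise, which supports the justification for stopping at five.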

Conclude by evaluating to what extent the truncated Fourier series answers your research question: summarise numerical findings (parameter magnitudes, how much variance is explained, test RMSE) and interpret whether five harmonics capture the seasonal and sub-seasonal variation. Critically assess limitations (climatic trends, extreme months, nonstationarity, autocorrelation) and suggest realistic extensions that you can mention briefly in the evaluation (higher-order harmonics, trend components, AR terms, cross-validation). Finally, ensure your IA follows the required structure (introduction, detailed math in the main body with all steps shown, conclusion/evaluation, references) and include clear annotated graphs and appendices with code or full calculations so the examiner can verify your work.

Relevant Exemplars

Modeling the Global Mean Temperatures

Modelling a trigonometric function of annual temperature in Warsaw, Poland

Exploring whether the greatest footballer in history can be accurately modeled using Fourier series

Medium

How well does a logistic regression model predict the probability that a vehicle sold in Germany between 2010 and 2020 is electric based on its manufacturer, price band, and year, as evaluated by model coefficients, AUC, and classification accuracy on a holdout sample?

Suggested Approach

Start by framing your research question clearly at the top of your introduction and explain the real-world motivation for studying electric vehicle adoption in Germany from 2010–2020. Describe your data source(s) and give an itemised plan in prose: obtain a dataset of vehicle sales with manufacturer, price band, year and a binary electric indicator; perform initial cleaning (remove duplicates, handle missing values, verify units and time range); and create sensible categorical encodings (one‑hot or effect coding for manufacturers and price bands, or combine rare manufacturers into an “other” category). Justify your choices briefly with respect to preserving statistical power and avoiding perfect separation. State the assumptions of logistic regression you will rely on (independent observations, linearity of log-odds for any numeric predictors such as year or price if treated continuously) and note which assumptions you will check empirically (multicollinearity, influential observations, goodness-of-fit). Mention the page limits and structure expectations so you plan content accordingly: concise introduction, detailed main body with calculations and graphs, and a focused conclusion/evaluation section that links back to the research question.

In the main body, show every analytical step with enough detail that an examiner can follow and award partial credit. Split your modelling into clear stages: exploratory data analysis (frequency tables, bar charts of EV proportion by manufacturer and price band, trends over time), model specification (exact predictor coding and any interaction terms you choose), model fitting on a training set (e.g. 70–80% split) and final evaluation on a holdout sample. Report logistic regression coefficients with standard errors and odds ratios, and interpret them in plain language (for example, how much the odds of being electric change for different manufacturers or between price bands). Produce and discuss ROC curves, compute AUC with confidence intervals (bootstrap if possible), and present a confusion matrix for a chosen probability threshold together with overall classification accuracy, precision and recall. Explain trade-offs when choosing thresholds and consider using cross-validation to assess model stability and to avoid overfitting.
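The evaluation metrics named above can be computed by hand from a model's predicted probabilities, which makes the calculations easy to show in the write-up. The sketch below uses a tiny made-up holdout sample and hypothetical function names; it does not fit a model. AUC is computed as the proportion of (electric, non-electric) pairs in which the electric vehicle receives the higher predicted probability, with ties counted as half, and a vehicle is classified as electric when its probability is at or above the threshold.

```python
import math

def odds_ratio(coefficient):
    """Multiplicative change in the odds for a one-unit change in a predictor."""
    return math.exp(coefficient)

def auc(labels, scores):
    """Share of positive/negative pairs ranked correctly (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives for n in negatives
    )
    return wins / (len(positives) * len(negatives))

def confusion_matrix(labels, scores, threshold=0.5):
    """Counts of true/false positives/negatives at a probability threshold."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            counts["tp"] += 1
        elif predicted == 1 and y == 0:
            counts["fp"] += 1
        elif predicted == 0 and y == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts

# Made-up holdout sample: 1 = electric, 0 = not electric.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1, 0.8, 0.3]  # predicted probabilities

cm = confusion_matrix(labels, scores, threshold=0.5)
accuracy = (cm["tp"] + cm["tn"]) / len(labels)
precision = cm["tp"] / (cm["tp"] + cm["fp"])
recall = cm["tp"] / (cm["tp"] + cm["fn"])
area = auc(labels, scores)

print("confusion matrix:", cm)
print("accuracy:", accuracy, "precision:", precision, "recall:", recall)
print("AUC:", area)
print("odds ratio for a hypothetical coefficient of 0.8:", round(odds_ratio(0.8), 3))
```

Re-running `confusion_matrix` with several thresholds is one simple way to illustrate the precision and recall trade-off discussed in the paragraph above.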

Finish with a concise conclusion that restates how well the model answers the research question using the chosen metrics, and include a critical evaluation: discuss limitations (sample bias, omitted variables like range or incentives, temporal changes in technology), robustness checks you performed, and sensible extensions (different model families, finer price splits, or temporal models). Throughout, include worked example calculations, labelled figures and tables, and references for data sources and any statistical methods; keep your writing mathematical but accessible, focusing on clarity of interpretation and honest appraisal of what the model can and cannot claim about electric vehicle probabilities.

Relevant Exemplars

Investigating how to ensure retention of value by selling your car at the optimum moment

Where would be the most optimal region to travel to, based on a statistical investigation and the potential correlation between traffic casualties and oil prices?

Can Mathematics Predict The Future?

Easy

What is the cylinder radius and height that minimise the surface area (material used) for a canned product with fixed volume 500 cm³, determined using Lagrange multipliers and validated by a numerical search for the global minimum?

Suggested Approach

Begin by framing your essay around the research question exactly as written: “What is the cylinder radius and height that minimise the surface area (material used) for a canned product with fixed volume 500 cm³, determined using Lagrange multipliers and validated by a numerical search for the global minimum?” In your introduction explain why this is realistic (packaging efficiency), state the fixed volume (500 cm³) and list assumptions clearly (thin-walled cylinder, closed top and bottom, no seams, material uniform), and define the variables r and h with units. Give the necessary theoretical background concisely: formulae for volume V = πr²h and surface area S = 2πr² + 2πrh, and a short sentence about constrained optimisation and why Lagrange multipliers are appropriate. Make clear the aim: derive the analytic minimum using Lagrange multipliers and then confirm it numerically to guard against local minima or algebraic mistakes. Mention what software or tools you will use for the numerical search (e.g. GeoGebra/Desmos, a Python script with NumPy, or a spreadsheet) so the examiner can replicate your work. State expected page limits and where each piece will appear (briefly: intro, derivation, numerical validation, graphs, conclusion and evaluation, references).

In the main body show every mathematical step with clarity. Start by setting up the Lagrangian L(r,h,λ)=S(r,h)+λ(V_target−πr²h), substitute V_target = 500, take partial derivatives ∂L/∂r, ∂L/∂h, ∂L/∂λ and solve the system symbolically for r and h. Explain algebraic manipulations and include a justification that solutions are physically meaningful (r>0, h>0). Perform a second-derivative test appropriate for constrained problems — either compute the bordered Hessian or use substitution to reduce to one variable and check the second derivative sign — and explain why this indicates a minimum. Present the numerical values with units and appropriate significant figures, and include representative hand calculations to show your process.
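One way the derivation could be laid out is sketched below, using the Lagrangian and the formulae for V and S exactly as stated above. The numerical values are rounded, and the second-derivative check uses the substitution route (one of the two options mentioned), with S written as a function of r alone.

```latex
\begin{align*}
\mathcal{L}(r,h,\lambda) &= 2\pi r^2 + 2\pi r h + \lambda\,(500 - \pi r^2 h) \\
\frac{\partial \mathcal{L}}{\partial r} &= 4\pi r + 2\pi h - 2\lambda\pi r h = 0 \\
\frac{\partial \mathcal{L}}{\partial h} &= 2\pi r - \lambda\pi r^2 = 0
  \;\Rightarrow\; \lambda = \frac{2}{r} \quad (r > 0) \\
\frac{\partial \mathcal{L}}{\partial \lambda} &= 500 - \pi r^2 h = 0 \\
\text{Substituting } \lambda = 2/r:\quad
  & 4\pi r + 2\pi h - 4\pi h = 0 \;\Rightarrow\; h = 2r \\
\pi r^2 (2r) = 500 \;&\Rightarrow\; r = \sqrt[3]{250/\pi} \approx 4.30\ \text{cm},
  \quad h = 2r \approx 8.60\ \text{cm} \\
S_{\min} &= 2\pi r^2 + 2\pi r(2r) = 6\pi r^2 \approx 348.7\ \text{cm}^2 \\
\text{With } h = \tfrac{500}{\pi r^2}:\quad
  & S(r) = 2\pi r^2 + \frac{1000}{r}, \qquad
  S''(r) = 4\pi + \frac{2000}{r^3} > 0 \ \text{ for } r > 0
\end{align*}
```

Since S''(r) is positive for every r > 0, S(r) is convex on the whole physical domain, so the stationary point is the global minimum and not merely a local one.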

For validation and write-up, perform a numerical global search across a sensible domain: for example, take r in (0.1, 10) cm, obtain h = 500/(πr²) from the volume constraint, and grid-evaluate S as a function of r alone. Plot S versus r to show the minimum visually. Describe your search method (grid resolution, any optimisation routine used, and how you checked for global versus local minima) and include screenshots/plots and code snippets in an appendix. Conclude by restating whether the numerical minimum matches the analytic result, discuss limitations (idealised cylinder, manufacturing constraints), suggest small extensions or sensitivity tests (varying volume), and provide a precise reference list for all tools and sources used.
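A grid search of this kind might look like the following standard-library sketch. The domain and the step size of 0.0001 cm are illustrative choices, and the function names are hypothetical. The result is compared with r = (250/π)^(1/3), the value that follows from the Lagrange conditions together with the volume constraint.

```python
import math

VOLUME = 500.0  # cm^3, fixed by the research question

def height(r):
    """Height implied by the volume constraint V = pi r^2 h."""
    return VOLUME / (math.pi * r ** 2)

def surface_area(r):
    """Closed cylinder: S = 2 pi r^2 + 2 pi r h with h eliminated."""
    return 2 * math.pi * r ** 2 + 2 * math.pi * r * height(r)

def grid_search(r_min=0.1, r_max=10.0, steps=99_000):
    """Evaluate S on an even grid and return the radius with the smallest S."""
    step = (r_max - r_min) / steps
    grid = (r_min + k * step for k in range(steps + 1))
    return min(grid, key=surface_area)

r_best = grid_search()
h_best = height(r_best)
s_best = surface_area(r_best)
r_analytic = (250 / math.pi) ** (1 / 3)

print(f"grid search: r = {r_best:.4f} cm, h = {h_best:.4f} cm, S = {s_best:.3f} cm^2")
print(f"analytic:    r = {r_analytic:.4f} cm, h = {2 * r_analytic:.4f} cm")
```

Because the grid covers the whole domain and not just a neighbourhood of a starting guess, agreement between the two radii supports the claim that the analytic solution is the global minimum on (0.1, 10) cm.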

Relevant Exemplars

Surface area minimization for 3-dimensional goods

What is the optimal relationship between radius and height that minimises the surface area of cylinders and cones while keeping a fixed volume?

Cylinder chronicles - Unlocking the mathematical secrets of can packaging

Hard

How does the dominant eigenvalue of a Leslie matrix constructed for the coastal trout population in Lake Y (age classes 0–4, using 2015–2020 survival and fecundity estimates) determine long-term growth rate, and how sensitive is the projected population growth to ±10% changes in juvenile survival?

Suggested Approach

Begin by anchoring your work to the exact research question: "How does the dominant eigenvalue of a Leslie matrix constructed for the coastal trout population in Lake Y (age classes 0–4, using 2015–2020 survival and fecundity estimates) determine long-term growth rate, and how sensitive is the projected population growth to ±10% changes in juvenile survival?" Explain the biological context briefly and state why a Leslie matrix is appropriate. Describe your data sources (2015–2020 survival and fecundity estimates), show how you build the 5×5 Leslie matrix from those values, and state assumptions explicitly (e.g., closed population, constant rates aside from the juvenile perturbation). Include any data cleaning steps and justify choices such as averaging the multi-year estimates or using medians. Make clear which entries correspond to fecundity and which to survival, and provide at least one worked example so the examiner can follow your matrix construction step by step.

Carry out the mathematical analysis in a logical sequence and show all calculations. Compute eigenvalues and eigenvectors of the Leslie matrix (analytically when feasible, otherwise numerically using a clear method or software such as a matrix-capable calculator, Python, R, or Excel) and identify the dominant eigenvalue; interpret it as the long-run population growth factor (λ) and, if helpful, convert to a percentage or intrinsic growth rate r = ln(λ). Use the dominant eigenvector to discuss stable age distribution and relate that to ecological implications. For sensitivity analysis, vary the juvenile survival rate by ±10% and recompute the dominant eigenvalue for each scenario; present results in a small table and with a simple graph showing λ versus juvenile survival. Explain the math behind sensitivity or elasticity analysis briefly and, if possible, compute the sensitivity of λ to changes in juvenile survival using partial derivatives or matrix perturbation formulas.
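The eigenvalue and sensitivity calculations can be sketched with power iteration in standard-library Python. Every number below is made up for illustration (these are not Lake Y estimates), the function names are hypothetical, and "juvenile survival" is assumed to mean survival from age class 0 to age class 1. Power iteration is used here as one simple numerical method; it converges when the Leslie matrix is primitive, which holds for this example because consecutive age classes have positive fecundity.

```python
import math

def leslie_matrix(fecundity, survival):
    """First row holds fecundities; the sub-diagonal holds survival rates."""
    n = len(fecundity)
    matrix = [[0.0] * n for _ in range(n)]
    matrix[0] = list(fecundity)
    for i, s in enumerate(survival):
        matrix[i + 1][i] = s
    return matrix

def dominant_eigen(matrix, iterations=1000):
    """Power iteration: returns (lambda, stable age distribution summing to 1)."""
    n = len(matrix)
    v = [1.0 / n] * n
    lam = 0.0
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(w)              # growth factor, since v sums to 1
        v = [x / lam for x in w]  # renormalise to proportions
    return lam, v

# Hypothetical rates for age classes 0-4 (illustration only).
fecundity = [0.0, 0.0, 1.2, 2.0, 2.5]
survival = [0.2, 0.5, 0.6, 0.5]   # survival[0] is juvenile survival

results = {}
for factor in (0.9, 1.0, 1.1):
    perturbed = [survival[0] * factor] + survival[1:]
    lam, stable = dominant_eigen(leslie_matrix(fecundity, perturbed))
    results[factor] = lam
    print(f"juvenile survival x{factor}: lambda = {lam:.4f}, r = ln(lambda) = {math.log(lam):.4f}")

baseline_lambda, stable_ages = dominant_eigen(leslie_matrix(fecundity, survival))
print("stable age distribution:", [round(x, 3) for x in stable_ages])
```

A dominant eigenvalue below 1 indicates long-run decline and above 1 indicates growth; comparing the three values in `results` gives the ±10% sensitivity table described above.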

In writing the essay, keep the structure required by the IA: concise introduction with rationale and objectives, a detailed main body showing derivations, calculations, graphs and interpretations, and a conclusion that answers the research question directly. Discuss limitations of the Leslie model and data (time frame, environmental variability, density dependence), and suggest realistic extensions (e.g., stochastic matrices, finer age classes, or longer data spans). Reference all data and software, include appendices for raw data and full code or calculator steps, and ensure your mathematics is accurate, well-explained and connected to the ecological meaning throughout so the examiner can follow both the math and its real-world significance.

Relevant Exemplars

Which animal family can best undergo optimal muscle tissue growth constrained by their physiological parameters by applying constrained optimisation using Lagrange multipliers?

Modelling Canada Lynx and Snowshoe Hare Populations Using Lotka-Volterra Differential Equations

Investigating How the Population of a Species can be Modelled Using Differential Equations

All content on this website has been developed independently from and is not endorsed by the International Baccalaureate Organization. International Baccalaureate and IB are registered trademarks owned by the International Baccalaureate Organization.