Application of Multivariate Statistical Techniques in Bioprocess Optimization and Production Yield Improvement

Bernabe Canqui Flores; Percy Huata Panca; Juan Reinaldo Paredes Quispe; Charles Mendoza Mollocondo; Godofredo Quispe Mamani; Juan Carlos Juarez Vargas; Fred Torres-Cruz

doi:10.51847/q27bv1w51D

2026 Volume 17 Issue 2

Abstract

Bioprocess optimization depends on understanding many interacting variables that rarely change independently. Temperature, pH, dissolved oxygen, aeration, agitation, substrate feed, and base addition often move together across a batch, creating strongly correlated process trajectories rather than simple one-factor effects. Multivariate techniques such as principal component analysis (PCA) and partial least squares (PLS) regression are therefore natural tools for reducing dimensionality, monitoring deviations, and relating process conditions to production yield. Many published examples of multivariate bioprocess modeling report clean data structures, high explained variance, and visually convincing separation between normal and abnormal batches. Industrial and pilot-scale data are usually less orderly. Missing values, probe drift, irregular sampling, outlying batches, collinearity, non-normal distributions, and unbalanced operating conditions often reduce model stability and limit the reliability of optimization claims. This manuscript applies multivariate analysis to a realistically imperfect penicillin fermentation dataset derived from the industrial-scale IndPenSim benchmark. The objective is not to demonstrate an unrealistically accurate predictive model, but to evaluate how PCA and PLS behave when the dataset contains missing values, outliers, correlated variables, batch heterogeneity, and time-varying dynamics. The study focuses on identifying process variables associated with yield while reporting the uncertainty and fragility of those conclusions. The working dataset consisted of 50 fed-batch fermentation runs selected from the IndPenSim benchmark, with eight main process variables and final penicillin yield as the response. Five percent of online sensor observations were masked to represent intermittent sensor failure, and missing values were handled by PCA imputation based on the NIPALS (nonlinear iterative partial least squares) algorithm rather than by row deletion.
PCA was used for exploratory monitoring, Hotelling’s T² and squared prediction error (SPE) statistics were used for outlier diagnosis, and PLS regression was fitted using leave-one-batch-out cross-validation. PCA identified two strong outlier batches, representing 4% of the analyzed batch set, with abnormal dissolved oxygen and substrate-feed trajectories. The first two principal components explained 62% of total process variance, which was sufficient for monitoring but not enough to claim complete process compression. A three-component PLS model explained 61% of calibration yield variance and achieved a cross-validated R²_pred of 0.58, with a root mean squared error of prediction (RMSEP) still large enough to caution against narrow setpoint prescriptions. The analysis shows that multivariate statistical techniques can extract useful structure from imperfect bioprocess data, but they do not remove the practical difficulty of interpreting noisy and correlated process histories. Temperature, feed rate, and dissolved oxygen behavior emerged as influential yield-related variables, but variable importance in projection (VIP) scores varied across validation folds and were sensitive to abnormal batches. The resulting optimization recommendations should therefore be treated as operating ranges for further experimental confirmation, not as final process-control prescriptions.
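The workflow summarized in the abstract (masking 5% of observations, NIPALS-based PCA imputation, then Hotelling's T² and SPE diagnostics) can be illustrated with a minimal sketch. This is not the authors' code and does not use IndPenSim data: the data are synthetic, each row is treated as one observation rather than an unfolded batch trajectory, and all names (`nipals_pca`, `X_missing`, `filled`, `t2`, `spe`) are hypothetical. It assumes only NumPy.

```python
import numpy as np


def nipals_pca(X, n_components, tol=1e-9, max_iter=1000):
    """NIPALS PCA on autoscaled data; NaN cells are skipped, not deleted."""
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)
    M = observed.astype(float)                    # 1 where a value exists
    mean = np.nanmean(X, axis=0)
    std = np.nanstd(X, axis=0, ddof=1)
    R = np.where(observed, (X - mean) / std, 0.0)  # residuals, 0 at gaps
    n, p = R.shape
    T = np.zeros((n, n_components))               # scores
    P = np.zeros((p, n_components))               # loadings
    for a in range(n_components):
        t = R[:, np.argmax((R ** 2).sum(axis=0))].copy()
        for _ in range(max_iter):
            # Regress each column on t using only the observed rows.
            p_a = (R.T @ t) / np.maximum(M.T @ (t ** 2), 1e-12)
            p_a /= np.linalg.norm(p_a)
            # Regress each row on p_a using only the observed columns.
            t_new = (R @ p_a) / np.maximum(M @ (p_a ** 2), 1e-12)
            converged = np.linalg.norm(t_new - t) <= tol * np.linalg.norm(t_new)
            t = t_new
            if converged:
                break
        T[:, a], P[:, a] = t, p_a
        R = np.where(observed, R - np.outer(t, p_a), 0.0)  # deflate
    return mean, std, T, P


# Synthetic stand-in data: 50 rows, 8 correlated variables, 2 latent factors.
rng = np.random.default_rng(0)
loadings_true = np.array([[1, 1, 1, 1, 0.5, 0.5, -1, -1],
                          [1, -1, 0.5, -0.5, 1, -1, 1, -1]], dtype=float)
scores_true = rng.normal(size=(50, 2)) * [3.0, 2.0]
X_full = scores_true @ loadings_true + rng.normal(scale=0.1, size=(50, 8))
X_full[0] += [3, 3, -3, -3, 0, 0, 3, -3]   # one row that breaks the correlation

# Mask exactly 5% of the cells to mimic intermittent sensor failure.
X_missing = X_full.copy()
X_missing.flat[rng.choice(X_full.size, size=20, replace=False)] = np.nan
hidden = np.isnan(X_missing)

mean, std, T, P = nipals_pca(X_missing, n_components=2)
X_hat = mean + std * (T @ P.T)              # model reconstruction
filled = np.where(hidden, X_hat, X_missing)  # impute only the masked cells

# Hotelling's T2: distance inside the model plane, scaled by score variance.
t2 = ((T ** 2) / T.var(axis=0, ddof=1)).sum(axis=1)
# SPE: squared residual to the model plane, in autoscaled units.
spe = (((filled - X_hat) / std) ** 2).sum(axis=1)

print("row with largest SPE:", int(np.argmax(spe)))
print("row with largest T2:", int(np.argmax(t2)))
```

Rows with a large SPE break the correlation structure captured by the model, while rows with a large T² are extreme within that structure. In practice both statistics would be compared against control limits estimated from reference batches, which this sketch omits.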

How to cite this article

Vancouver

Flores BC, Panca PH, Quispe JRP, Mollocondo CM, Mamani GQ, Vargas JCJ, et al. Application of Multivariate Statistical Techniques in Bioprocess Optimization and Production Yield Improvement. J Biochem Technol. 2026;17(2):19-29. https://doi.org/10.51847/q27bv1w51D

APA

Flores, B. C., Panca, P. H., Quispe, J. R. P., Mollocondo, C. M., Mamani, G. Q., Vargas, J. C. J., & Torres-Cruz, F. (2026). Application of Multivariate Statistical Techniques in Bioprocess Optimization and Production Yield Improvement. Journal of Biochemical Technology, 17(2), 19-29. https://doi.org/10.51847/q27bv1w51D
