A three-stage clinical decision support console for Systemic Lupus Erythematosus.

Biomed 09 · Team
Manna Berry  ·  Lezhi Lin  ·  Udit Samant  ·  Hadi Shafat  ·  Minh Hieu Tran  ·  Jillian Zhao

DATA3888 · 2026
The University of Sydney

Background and Motivations

Disease Context

SLE is complex and unpredictable

Target disease: Systemic Lupus Erythematosus (SLE) (NIAMS, 2022; MedlinePlus, 2024).
Autoimmune, multi-organ disease (NIAMS, 2022; MedlinePlus, 2024).
Symptoms vary among patients (NIAMS, 2022; NHS, 2023; MedlinePlus Medical Encyclopedia, 2025).
No single definitive test (Aringer & Bertsias, 2025).

A three-stage machine learning pipeline

01 Diagnosis

Does this patient likely have SLE?

02 Progression

Is the disease likely to worsen or flare?

03 Treatment

How likely is the patient to respond to treatment?

Methodology: modelling workflow

01 Data collection

  • Diagnosis: GSE72509
  • Progression: GSE65391, GSE49454
  • Treatment: GSE224705

02 Pre-processing & EDA

  • Dataset cleaning
  • Quality check
  • Standardisation
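
One common reading of the standardisation step is a per-gene z-score, so that genes measured on different scales contribute comparably to the models. The sketch below is in Python for illustration only (the project's tooling is R). The function name `zscore_columns` and the toy matrix are assumptions, not the team's code. The deck does not state which scaling was used or whether it was fitted inside each training fold.

```python
from statistics import mean, pstdev

def zscore_columns(rows):
    """Scale each column (gene) to mean 0 and standard deviation 1.

    rows: list of samples, each a list of expression values (hypothetical layout).
    Columns with zero variance are mapped to 0.0 to avoid dividing by zero.
    """
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    sds = [pstdev(col) for col in columns]
    return [
        [(value - m) / sd if sd > 0 else 0.0 for value, m, sd in zip(row, means, sds)]
        for row in rows
    ]

# Toy example: two samples, two genes on very different scales.
scaled = zscore_columns([[1.0, 10.0], [3.0, 30.0]])
```

In a cross-validated setting, the means and standard deviations would normally be estimated on the training folds and then applied to the held-out fold, so that no information leaks from the test samples.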

03 Gene selection

Finalised gene panel.

04 Modelling

limma
LASSO
Elastic Net
GBM
Linear SVM
Random Forest

05 Performance evaluation

Imbalanced data
Stratified 5-fold cross-validation
AUROC, Balanced Accuracy, Macro-F1
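
To make the evaluation setup concrete, the sketch below shows how stratified folds keep the class ratio roughly constant, and how Balanced Accuracy and Macro-F1 are computed from predictions. It is written in Python with the standard library only and is illustrative. The function names, the seed, and the toy labels are assumptions, and the team's actual R implementation may differ.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's samples round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    recalls = []
    for c in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy imbalanced cohort: 10 cases (1) and 40 controls (0).
labels = [1] * 10 + [0] * 40
folds = stratified_folds(labels, k=5)

# A classifier that always predicts the majority class.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [0] * 10
majority_bal_acc = balanced_accuracy(y_true, y_pred)
majority_macro_f1 = macro_f1(y_true, y_pred)
```

In the toy example, the always-majority classifier is right on 8 of 10 samples, yet its Balanced Accuracy is only 0.5 and its Macro-F1 is below 0.5. This is why these metrics suit imbalanced data better than plain accuracy.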

Final model: Random Forest

AUROC (higher is better)

01 Diagnosis: RF 0.972 · LASSO 0.962 · limma 0.953
02 Progression: RF 0.852 · Elastic Net 0.845 · GBM 0.836
03 Treatment: RF 0.865 · Elastic Net 0.780 · LASSO 0.757

Balanced Accuracy (higher is better)

01 Diagnosis: RF 0.939 · LASSO 0.934 · limma 0.881
02 Progression: RF 0.790 · Elastic Net 0.801 · GBM 0.791
03 Treatment: RF 0.789 · Elastic Net 0.736 · LASSO 0.722
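
An AUROC value can be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The Python sketch below computes it directly from that definition by comparing every positive-negative pair. The function name and the toy scores are assumptions for illustration and do not reproduce the results reported here.

```python
def auroc(y_true, scores):
    """AUROC as the fraction of positive-negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))

# Toy example: one positive (0.4) is ranked below one negative (0.5).
toy_auroc = auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])  # 3 of 4 pairs correct -> 0.75
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a sketch. Library implementations use a rank-based computation instead.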

From a blood sample to a recommendation in one click.

Conclusion

Shared workflow

Data & features
Three-stage ML
Deploy

Primary limitation

The app currently requires a fixed input format.

Future work

Real-world integration

Connect with hospital lab or Electronic Health Record systems.

Preprocessing support

Add automatic checks for incoming clinical data.

Gene panel

Expand coverage for other lupus-related diseases.
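
As a rough illustration of what automatic checks on incoming data could look like, the Python sketch below validates one sample against an expected gene panel before it reaches a model. Everything in it is hypothetical: the function name, the message wording, and the placeholder gene identifiers (`GENE_A`, `GENE_B`, `GENE_C`) stand in for the real panel, which this deck does not list.

```python
import math

def check_expression_input(sample, expected_genes):
    """Return a list of problems found; an empty list means the sample passed.

    sample: dict mapping gene identifier -> expression value (hypothetical format).
    expected_genes: the panel the models were trained on (placeholder names here).
    """
    problems = []
    for gene in expected_genes:
        if gene not in sample:
            problems.append(f"missing gene: {gene}")
            continue
        value = sample[gene]
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"non-numeric value for {gene}: {value!r}")
        elif math.isnan(value):
            problems.append(f"missing measurement (NaN) for {gene}")
    extra = sorted(set(sample) - set(expected_genes))
    if extra:
        problems.append("unexpected columns: " + ", ".join(extra))
    return problems

# Placeholder panel and a deliberately malformed sample.
panel = ["GENE_A", "GENE_B", "GENE_C"]
report = check_expression_input({"GENE_A": 1.2, "GENE_B": "high", "EXTRA": 3}, panel)
```

Returning a list of readable problems, rather than failing on the first one, lets the app show a user everything that needs fixing in a single pass.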

Thank you.


Appendix · 01 / 05

Full model performance.

01 Diagnosis
Model         AUROC   Balanced Accuracy   Macro-F1
RF            0.972   0.939               0.950
LASSO         0.962   0.934               0.934
limma         0.953   0.881               0.849

02 Progression
Model         AUROC   Balanced Accuracy   Macro-F1
RF            0.852   0.790               0.789
Elastic Net   0.845   0.801               0.800
GBM           0.836   0.791               0.789

03 Treatment
Model         AUROC   Balanced Accuracy   Macro-F1
RF            0.865   0.789               0.791
Elastic Net   0.780   0.736               0.723
LASSO         0.757   0.722               0.698
Linear SVM    0.792   0.709               0.682
limma         0.656   0.621               0.621

Appendix · 02 / 05

Dataset sources.

  • Hung, T., Pratt, G. A., Sundararaman, B., Townsend, M. J., Chaivorapol, C., Bhangale, T., Graham, R. R., Ortmann, W., Bhangale, T. R., Behrens, T. W., Yeo, G. W., & Chaussabel, D. (2015). The Ro60 autoantigen binds endogenous retroelements and regulates inflammatory gene expression in systemic lupus erythematosus [Data set; GSE72509]. NCBI Gene Expression Omnibus. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE72509
  • Banchereau, R., Hong, S., Cantarel, B., Baldwin, N., Baisch, J., Edens, M., Cepika, A.-M., Acs, P., Turner, J., Anguiano, E., Vinod, P., Kahn, S., Obermoser, G., Blankenship, D., Wakeland, E., Nassi, L., Gotte, A., Punaro, M., Liu, Y.-J., … Pascual, V. (2016). Personalized immunomonitoring uncovers molecular networks that stratify lupus patients. Cell, 165(3), 551–565. https://doi.org/10.1016/j.cell.2016.03.008
  • Chiche, L., Jourde-Chiche, N., Whalen, E., Presnell, S., Gersuk, V., Dang, K., Anguiano, E., Quinn, C., Burtey, S., Berland, Y., Kaplanski, G., Harlé, J.-R., Pascual, V., & Chaussabel, D. (2014). Modular transcriptional repertoire analyses of adults with systemic lupus erythematosus reveal distinct type I and type II interferon signatures. Arthritis & Rheumatology, 66(6), 1583–1595. https://doi.org/10.1002/art.38628
  • NCBI Gene Expression Omnibus. (2023). Whole-blood microarray expression in lupus nephritis: Treatment response by SRI-4 [Data set; GSE224705]. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE224705

Appendix · 03 / 05

Libraries & frameworks.

  • R Core Team. (2024). R: A language and environment for statistical computing (Version 4.x) [Computer software]. R Foundation for Statistical Computing. https://www.R-project.org/
  • Liaw, A., & Wiener, M. (2002). Classification and regression by randomForest. R News, 2(3), 18–22.
  • Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785–794). Association for Computing Machinery. https://doi.org/10.1145/2939672.2939785
  • Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1–22. https://doi.org/10.18637/jss.v033.i01
  • Kursa, M. B., & Rudnicki, W. R. (2010). Feature selection with the Boruta package. Journal of Statistical Software, 36(11), 1–13. https://doi.org/10.18637/jss.v036.i11
  • Ritchie, M. E., Phipson, B., Wu, D., Hu, Y., Law, C. W., Shi, W., & Smyth, G. K. (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research, 43(7), e47. https://doi.org/10.1093/nar/gkv007
  • Siriseriwan, W. (2019). smotefamily: A collection of oversampling techniques for class imbalance problem based on SMOTE (Version 1.3.1) [R package]. https://CRAN.R-project.org/package=smotefamily
  • Schloerke, B., & Allen, J. (2024). plumber: An API generator for R [R package]. https://www.rplumber.io/

Appendix · 04 / 05

Research papers & models.

Clinical background, methods, prior art, and scoring referenced in the deck.

  • National Institute of Arthritis and Musculoskeletal and Skin Diseases. (2022). Systemic lupus erythematosus (lupus) [Last reviewed October 2022]. Retrieved April 23, 2026, from https://www.niams.nih.gov/health-topics/lupus
  • MedlinePlus. (2024). Lupus [Last updated July 1, 2024]. Retrieved April 23, 2026, from https://medlineplus.gov/lupus.html
  • National Health Service. (2023). Lupus [Page last reviewed July 19, 2023]. Retrieved April 23, 2026, from https://www.nhs.uk/conditions/lupus/
  • MedlinePlus Medical Encyclopedia. (2025). Systemic lupus erythematosus [Review date January 28, 2025]. Retrieved April 23, 2026, from https://medlineplus.gov/ency/article/000435.htm
  • Aringer, M., & Bertsias, G. (2025). Early diagnosis of systemic lupus erythematosus. Rare Disease and Orphan Drugs Journal, 4, 13. https://doi.org/10.20517/rdodj.2024.59
  • Lin, A., Wakhlu, A., & Connelly, K. (2024). Disease activity assessment in systemic lupus erythematosus. Frontiers in Lupus, 2. https://doi.org/10.3389/flupu.2024.1442013
  • Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. https://doi.org/10.1023/A:1010933404324
  • Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785–794). Association for Computing Machinery. https://doi.org/10.1145/2939672.2939785
  • Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297. https://doi.org/10.1007/BF00994018
  • Kursa, M. B., & Rudnicki, W. R. (2010). Feature selection with the Boruta package. Journal of Statistical Software, 36(11), 1–13. https://doi.org/10.18637/jss.v036.i11
  • Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321–357. https://doi.org/10.1613/jair.953
  • Lundberg, S. M., & Lee, S.-I. (2017). A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems (Vol. 30, pp. 4765–4774). Curran Associates.
  • Gladman, D. D., Ibañez, D., & Urowitz, M. B. (2002). Systemic lupus erythematosus disease activity index 2000. The Journal of Rheumatology, 29(2), 288–291.
  • Furie, R., Petri, M. A., Wallace, D. J., Ginzler, E. M., Merrill, J. T., Stohl, W., Chatham, W. W., Strand, V., Weinstein, A., & Chevrier, M. (2009). Novel evidence-based systemic lupus erythematosus responder index. Arthritis & Rheumatism, 61(9), 1143–1151. https://doi.org/10.1002/art.24698

Appendix · 05 / 05

Team & acknowledgements.


Team members · Biomed 09
Manna Berry: Development of Progression Model & Assistance with Backend
Faculty of Engineering J12, The University of Sydney, NSW 2006
Lezhi Lin: Development of App Frontend & Presentation Slides
School of Mathematics and Statistics F07, The University of Sydney, NSW 2006 Australia
Udit Samant: Development of Diagnosis Model & General App Backend
Email pending · School of Computer Science J12, The University of Sydney, NSW 2006 Australia
Hadi Shafat: Interdisciplinary Aspects Research & Assistance with Backend
School of Computer Science J12, The University of Sydney, NSW 2006 Australia
Minh Hieu Tran: Assistance with Initial Data Analysis
Email pending · School of Computer Science J12, The University of Sydney, NSW 2006 Australia
Jillian Zhao: Development of Treatment Model, Assistance with Backend & Background Research
School of Computer Science J12, The University of Sydney, NSW 2006 Australia
Supervisors

Dr. Andy Tran · Elyna Lin

Many thanks to our supervisors Andy and Elyna for the weekly guidance, thoughtful feedback, and steady support throughout the project.

Acknowledgements
  • Submitted in partial fulfilment of the assessment requirements for DATA3888 Data Science Capstone at The University of Sydney.
  • We also acknowledge the original data contributors and study participants behind the public GEO cohorts. Their shared expression and clinical metadata made the modelling, validation, and patient-level demonstrations possible.
  • We are also grateful to the open-source maintainers across R, Bioconductor, and the modelling libraries used here, and to the DATA3888 teaching team for project structure and course support.