A three-stage clinical decision support console for Systemic Lupus Erythematosus.

Biomed09 · Team
Manna Berry  ·  Kiwi Lin  ·  Udit Samant  ·  Hadi Shafat  ·  Minh Hieu Tran  ·  Jillian Zhao

DATA3888 · 2026
The University of Sydney

Background and Motivations

Disease Context

SLE is complex and unpredictable

Target disease: Systemic Lupus Erythematosus (SLE) (NIAMS, 2022; MedlinePlus, 2024).
Autoimmune, multi-organ disease (NIAMS, 2022; MedlinePlus, 2024).
Symptoms vary among patients (NIAMS, 2022; NHS, 2023; MedlinePlus Medical Encyclopedia, 2025).
No single definitive test (Aringer & Bertsias, 2025).

Current Gap

Current care is still reactive

Diagnosis is difficult (Aringer & Bertsias, 2025).
Progression is hard to anticipate (Lin et al., 2024).
Treatment varies (Lin et al., 2024).

A three-stage machine learning pipeline

01 Diagnosis

Does this patient likely have SLE?

02 Progression

Is the disease likely to worsen or flare?

03 Treatment

How likely is the patient to respond to treatment?
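
A minimal sketch of how the three stages could be chained for one patient. The gating rule (later stages run only when the diagnosis probability passes a threshold), the threshold of 0.5, and every name below are assumptions for illustration; the deck does not specify how the stages are connected.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# A stage model maps a gene-expression profile to a probability in [0, 1].
StageModel = Callable[[Dict[str, float]], float]


@dataclass
class StageReport:
    diagnosis: float                # P(patient has SLE)
    progression: Optional[float]    # P(worsening or flare); None if skipped
    treatment: Optional[float]      # P(response to treatment); None if skipped


def run_console(profile: Dict[str, float],
                diagnose: StageModel,
                progress: StageModel,
                respond: StageModel,
                threshold: float = 0.5) -> StageReport:
    """Run the stages in order under a hypothetical gating rule."""
    p_sle = diagnose(profile)
    if p_sle < threshold:
        # Assumption: progression and treatment are skipped when SLE looks unlikely.
        return StageReport(p_sle, None, None)
    return StageReport(p_sle, progress(profile), respond(profile))


# Stub models standing in for the trained classifiers (made-up probabilities).
likely = run_console({"GENE_A": 1.2}, lambda p: 0.9, lambda p: 0.4, lambda p: 0.7)
unlikely = run_console({"GENE_A": -0.3}, lambda p: 0.1, lambda p: 0.4, lambda p: 0.7)
```

Here `likely` carries all three probabilities, while `unlikely` stops after the diagnosis stage.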

Methodology: modelling workflow

01 Data collection

  • Diagnosis: GSE72509
  • Progression: GSE65391, GSE49454
  • Treatment: GSE224705

02 Pre-processing & EDA

  • Dataset cleaning
  • Quality check
  • Standardisation
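
One way the standardisation step could look, assuming per-gene z-scoring with the mean and standard deviation estimated on training samples only and then reused on held-out samples. The deck does not state which scaling was applied; function names and toy values are illustrative.

```python
from statistics import mean, pstdev
from typing import List, Tuple

Matrix = List[List[float]]  # rows = samples, columns = genes


def fit_scaler(train_rows: Matrix) -> List[Tuple[float, float]]:
    """Return one (mean, sd) pair per gene, estimated from training samples."""
    columns = list(zip(*train_rows))
    # A constant gene has sd 0; fall back to 1.0 to avoid dividing by zero.
    return [(mean(col), pstdev(col) or 1.0) for col in columns]


def transform(rows: Matrix, scaler: List[Tuple[float, float]]) -> Matrix:
    """Apply the training-set scaling to any set of samples."""
    return [[(x - m) / s for x, (m, s) in zip(row, scaler)] for row in rows]


# Toy values, not taken from any GEO cohort.
train = [[1.0, 10.0], [3.0, 10.0]]
scaler = fit_scaler(train)
held_out_scaled = transform([[3.0, 10.0]], scaler)
```

Fitting the scaler on training data alone keeps information from held-out samples out of the model inputs.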

03 Gene selection

Finalised gene panel.

04 Modelling

limma
LASSO
Elastic Net
GBM
Linear SVMs
Random Forests

05 Performance evaluation

Imbalanced data
Stratified 5-fold cross-validation
AUROC, balanced accuracy, macro-F1
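
A small sketch of the three imbalance-aware metrics, written with the standard library so each definition is visible. Labels, predictions and scores are toy values, not project results; function names are illustrative.

```python
from typing import List, Sequence


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive scores above a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of per-class recall, so each class counts equally."""
    recalls: List[float] = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 scores."""
    f1s: List[float] = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


# Toy imbalanced example: four positives, two negatives.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.6]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

On this toy set plain accuracy is 4/6, while balanced accuracy and macro-F1 are both 0.625, because the minority class is only half right.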

Table of Performance Metrics

Random Forest outperformed the other candidate models at all three clinical stages.

01 Diagnosis
AUROC 0.972
Bal-Acc 0.939
Macro F1 0.950
Accuracy 0.974

0.97 AUROC

RF beat the limma signature and LASSO on macro-F1. 5-fold stratified CV · GSE72509 · 117 samples.

02 Progression
AUC 0.823
Accuracy 0.821
Bal-Acc 0.734
Sensitivity 0.903

0.82 AUC

Best balance of AUC and balanced accuracy. Trained on GSE65391, tested on GSE49454.
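
A short worked calculation relating two of the progression figures. It assumes a binary task, balanced accuracy defined as the mean of sensitivity and specificity, and both values measured on the same test set at the same threshold; none of this is stated in the deck, and the inputs are rounded, so the result is approximate.

```python
def implied_specificity(balanced_accuracy: float, sensitivity: float) -> float:
    """Solve balanced_accuracy = (sensitivity + specificity) / 2 for specificity."""
    return 2 * balanced_accuracy - sensitivity


# Rounded values reported for the progression stage.
specificity = implied_specificity(balanced_accuracy=0.734, sensitivity=0.903)
print(f"implied specificity ~ {specificity:.3f}")
```

Under those assumptions the implied specificity is about 0.565, which would mean the model is stronger at catching patients who worsen than at ruling out those who do not.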

03 Treatment
AUROC 0.865
Accuracy 0.834
Macro F1 0.791
Bal-Acc 0.789

0.87 AUROC

RF topped balanced accuracy, macro-F1, MCC and AUROC. 5-fold CV with balanced bootstrap · GSE224705.
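
The deck does not define "balanced bootstrap". One common reading, sketched here as an assumption, draws the same number of samples from each class with replacement so that every resample is class-balanced. Function name, class labels and counts are illustrative only.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Sequence


def balanced_bootstrap(labels: Sequence[Hashable],
                       rng: random.Random,
                       per_class: Optional[int] = None) -> List[int]:
    """Return sample indices with an equal number of draws from each class."""
    by_class: Dict[Hashable, List[int]] = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    # Default: draw as many per class as the smallest class contains.
    n = per_class or min(len(ix) for ix in by_class.values())
    sample: List[int] = []
    for y in sorted(by_class):
        sample.extend(rng.choices(by_class[y], k=n))  # with replacement
    return sample


# Toy labels: 8 responders ("R") and 3 non-responders ("NR").
toy_labels = ["R"] * 8 + ["NR"] * 3
resample = balanced_bootstrap(toy_labels, random.Random(0))
```

Each resample here holds three indices per class, so the minority class is no longer outvoted during fitting.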

Selection rule: all candidates used the same stratified folds; final model choice prioritised imbalance-aware metrics.
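
A minimal sketch of building stratified folds once and reusing them for every candidate model, so that score differences reflect the models rather than the split. The round-robin assignment, the seed, and the function name are assumptions for illustration.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def stratified_folds(labels: Sequence[Hashable], k: int = 5, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds that preserve class proportions."""
    rng = random.Random(seed)
    by_class: Dict[Hashable, List[int]] = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds: List[List[int]] = [[] for _ in range(k)]
    for y in sorted(by_class):
        idx = by_class[y]
        rng.shuffle(idx)
        # Deal each class round-robin so every fold gets its share.
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


# Toy labels: 20 cases and 10 controls.
fold_labels = [1] * 20 + [0] * 10
shared_folds = stratified_folds(fold_labels, k=5, seed=0)
```

Passing `shared_folds` to every candidate, instead of reshuffling per model, is what makes the comparison like-for-like.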

Live demonstration

From a blood sample to a recommendation in one click.

In summary.

Data & features
ML prediction
Deploy

One shared workflow keeps every stage consistent and extensible.

Where we go next.

Current limitation: expects a specific input data format.
Integration need: preprocessing support for real-world data.
Next steps: prospective validation and gene-panel expansion.

Thank you.

Appendix · 03 / 06

Dataset sources.

  • Hung, T., Pratt, G. A., Sundararaman, B., Townsend, M. J., Chaivorapol, C., Bhangale, T., Graham, R. R., Ortmann, W., Behrens, T. W., Yeo, G. W., & Chaussabel, D. (2015). The Ro60 autoantigen binds endogenous retroelements and regulates inflammatory gene expression in systemic lupus erythematosus [Data set; GSE72509]. NCBI Gene Expression Omnibus. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE72509
  • Banchereau, R., Hong, S., Cantarel, B., Baldwin, N., Baisch, J., Edens, M., Cepika, A.-M., Acs, P., Turner, J., Anguiano, E., Vinod, P., Kahn, S., Obermoser, G., Blankenship, D., Wakeland, E., Nassi, L., Gotte, A., Punaro, M., Liu, Y.-J., … Pascual, V. (2016). Personalized immunomonitoring uncovers molecular networks that stratify lupus patients. Cell, 165(3), 551–565. https://doi.org/10.1016/j.cell.2016.03.008
  • Chiche, L., Jourde-Chiche, N., Whalen, E., Presnell, S., Gersuk, V., Dang, K., Anguiano, E., Quinn, C., Burtey, S., Berland, Y., Kaplanski, G., Harlé, J.-R., Pascual, V., & Chaussabel, D. (2014). Modular transcriptional repertoire analyses of adults with systemic lupus erythematosus reveal distinct type I and type II interferon signatures. Arthritis & Rheumatology, 66(6), 1583–1595. https://doi.org/10.1002/art.38628
  • NCBI Gene Expression Omnibus. (2023). Whole-blood microarray expression in lupus nephritis: Treatment response by SRI-4 [Data set; GSE224705]. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE224705

Appendix · 04 / 06

Libraries & frameworks.

  • R Core Team. (2024). R: A language and environment for statistical computing (Version 4.x) [Computer software]. R Foundation for Statistical Computing. https://www.R-project.org/
  • Liaw, A., & Wiener, M. (2002). Classification and regression by randomForest. R News, 2(3), 18–22.
  • Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785–794). Association for Computing Machinery. https://doi.org/10.1145/2939672.2939785
  • Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1–22. https://doi.org/10.18637/jss.v033.i01
  • Kursa, M. B., & Rudnicki, W. R. (2010). Feature selection with the Boruta package. Journal of Statistical Software, 36(11), 1–13. https://doi.org/10.18637/jss.v036.i11
  • Ritchie, M. E., Phipson, B., Wu, D., Hu, Y., Law, C. W., Shi, W., & Smyth, G. K. (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research, 43(7), e47. https://doi.org/10.1093/nar/gkv007
  • Siriseriwan, W. (2019). smotefamily: A collection of oversampling techniques for class imbalance problem based on SMOTE (Version 1.3.1) [R package]. https://CRAN.R-project.org/package=smotefamily
  • Schloerke, B., & Allen, J. (2024). plumber: An API generator for R [R package]. https://www.rplumber.io/

Appendix · 05 / 06

Research papers & models.

Clinical background, methods, prior art, and scoring referenced in the deck.

  • National Institute of Arthritis and Musculoskeletal and Skin Diseases. (2022). Systemic lupus erythematosus (lupus) [Last reviewed October 2022]. Retrieved April 23, 2026, from https://www.niams.nih.gov/health-topics/lupus
  • MedlinePlus. (2024). Lupus [Last updated July 1, 2024]. Retrieved April 23, 2026, from https://medlineplus.gov/lupus.html
  • National Health Service. (2023). Lupus [Page last reviewed July 19, 2023]. Retrieved April 23, 2026, from https://www.nhs.uk/conditions/lupus/
  • MedlinePlus Medical Encyclopedia. (2025). Systemic lupus erythematosus [Review date January 28, 2025]. Retrieved April 23, 2026, from https://medlineplus.gov/ency/article/000435.htm
  • Aringer, M., & Bertsias, G. (2025). Early diagnosis of systemic lupus erythematosus. Rare Disease and Orphan Drugs Journal, 4, 13. https://doi.org/10.20517/rdodj.2024.59
  • Lin, A., Wakhlu, A., & Connelly, K. (2024). Disease activity assessment in systemic lupus erythematosus. Frontiers in Lupus, 2. https://doi.org/10.3389/flupu.2024.1442013
  • Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. https://doi.org/10.1023/A:1010933404324
  • Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785–794). Association for Computing Machinery. https://doi.org/10.1145/2939672.2939785
  • Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297. https://doi.org/10.1007/BF00994018
  • Kursa, M. B., & Rudnicki, W. R. (2010). Feature selection with the Boruta package. Journal of Statistical Software, 36(11), 1–13. https://doi.org/10.18637/jss.v036.i11
  • Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321–357. https://doi.org/10.1613/jair.953
  • Lundberg, S. M., & Lee, S.-I. (2017). A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems (Vol. 30, pp. 4765–4774). Curran Associates.
  • Gladman, D. D., Ibañez, D., & Urowitz, M. B. (2002). Systemic lupus erythematosus disease activity index 2000. The Journal of Rheumatology, 29(2), 288–291.
  • Furie, R., Petri, M. A., Wallace, D. J., Ginzler, E. M., Merrill, J. T., Stohl, W., Chatham, W. W., Strand, V., Weinstein, A., & Chevrier, M. (2009). Novel evidence-based systemic lupus erythematosus responder index. Arthritis & Rheumatism, 61(9), 1143–1151. https://doi.org/10.1002/art.24698

Appendix · 06 / 06

Team & acknowledgements.

Team members · Biomed09
Manna Berry · Faculty of Engineering J12, The University of Sydney, NSW 2006 Australia
Lezhi Lin · School of Mathematics and Statistics F07, The University of Sydney, NSW 2006 Australia
Udit Samant · School of Computer Science J12, The University of Sydney, NSW 2006 Australia
Hadi Shafat · School of Computer Science J12, The University of Sydney, NSW 2006 Australia
Minh Hieu Tran · School of Computer Science J12, The University of Sydney, NSW 2006 Australia
Jillian Zhao · School of Computer Science J12, The University of Sydney, NSW 2006 Australia

Supervisors

Dr. Andy Tran · Elyna Lin

Many thanks to our supervisors Andy and Elyna for the weekly guidance, thoughtful feedback, and steady support throughout the project.

Acknowledgements
  • Submitted in partial fulfilment of the assessment requirements for DATA3888 Data Science Capstone at The University of Sydney.
  • We also acknowledge the original data contributors and study participants behind the public GEO cohorts. Their shared expression and clinical metadata made the modelling, validation, and patient-level demonstrations possible.
  • We are grateful to the open-source maintainers across R, Bioconductor, and the modelling libraries used here, and to the DATA3888 teaching team for project structure and course support.