A three-stage clinical decision support console for Systemic Lupus Erythematosus.

Biomed09 · Team
Manna Berry  ·  Kiwi Lin  ·  Udit Samant  ·  Hadi Shafat  ·  Minh Hieu Tran  ·  Jillian Zhao

DATA3888 · 2026
The University of Sydney

Background and Motivations

Disease Context

SLE is complex and unpredictable

Target disease: Systemic Lupus Erythematosus (SLE) (NIAMS, 2022; MedlinePlus, 2024).
Autoimmune, multi-organ disease (NIAMS, 2022; MedlinePlus, 2024).
Symptoms vary among patients (NIAMS, 2022; NHS, 2023; MedlinePlus Medical Encyclopedia, 2025).
No single definitive test (Aringer & Bertsias, 2025).

Current Gap

Current care is still reactive

Diagnosis is difficult (Aringer & Bertsias, 2025).
Progression is hard to anticipate (Lin et al., 2024).
Treatment varies (Lin et al., 2024).

A three-stage machine learning pipeline

01 Diagnosis

Does this patient likely have SLE?

02 Progression

Is the disease likely to worsen or flare?

03 Treatment

How likely is the patient to respond to treatment?

Methodology: modelling workflow

01 Data collection

  • Diagnosis: GSE72509
  • Progression: GSE65391, GSE49454
  • Treatment: GSE224705

02 Pre-processing & EDA

  • Dataset cleaning
  • Quality check
  • Standardisation
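The slide names standardisation without further detail. Below is a minimal sketch, assuming it means per-gene z-scoring across samples (mean 0, unit standard deviation). The helper names `fit_standardiser` and `apply_standardiser` are hypothetical. Separating "fit" from "apply" lets the mean and standard deviation be estimated on training samples only. This is the usual way to avoid leaking test-set information into cross-validation.

```python
from statistics import mean, stdev

def fit_standardiser(train_rows):
    """Estimate per-gene (mean, sample standard deviation) from training samples.

    train_rows: list of samples, each a list of expression values in a fixed gene order.
    """
    params = []
    for j in range(len(train_rows[0])):
        column = [row[j] for row in train_rows]
        sd = stdev(column) if len(column) > 1 else 0.0
        params.append((mean(column), sd))
    return params

def apply_standardiser(rows, params):
    """Z-score each gene with previously fitted (mean, sd) pairs.

    Zero-variance genes are centred but not scaled, to avoid dividing by zero.
    """
    return [
        [(x - mu) / sd if sd > 0 else x - mu for x, (mu, sd) in zip(row, params)]
        for row in rows
    ]

# Toy data: gene 0 varies across samples, gene 1 is constant.
train = [[2.0, 5.0], [4.0, 5.0], [6.0, 5.0]]
params = fit_standardiser(train)
print(apply_standardiser(train, params))
# → [[-1.0, 0.0], [0.0, 0.0], [1.0, 0.0]]
```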

03 Gene selection

Finalised gene panel.
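The deck does not say how the panel was chosen, although its references include limma and Boruta. As a simplified stand-in, this sketch ranks genes by the absolute Welch two-sample t-statistic between SLE (label 1) and control (label 0) samples and keeps the top `k`. limma's moderated t-statistic additionally shrinks gene-wise variances with empirical Bayes, so the two rankings can differ. `welch_t`, `select_top_genes`, the gene names, and `k` are all hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's two-sample t-statistic for group a versus group b."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0

def select_top_genes(rows, labels, gene_names, k):
    """Return the k gene names with the largest |t| between label 1 and label 0.

    rows: samples x genes expression values; labels: 0/1 per sample.
    """
    scored = []
    for j, name in enumerate(gene_names):
        cases = [row[j] for row, y in zip(rows, labels) if y == 1]
        controls = [row[j] for row, y in zip(rows, labels) if y == 0]
        scored.append((abs(welch_t(cases, controls)), name))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]

# Toy data: gene_a separates the groups, gene_b barely, gene_c not at all.
rows = [
    [5.0, 1.0, 3.0], [6.0, 1.2, 3.1], [5.5, 0.9, 2.9],  # label 1
    [1.0, 1.1, 3.0], [1.5, 1.0, 3.2], [0.5, 1.1, 2.8],  # label 0
]
labels = [1, 1, 1, 0, 0, 0]
print(select_top_genes(rows, labels, ["gene_a", "gene_b", "gene_c"], k=2))
# → ['gene_a', 'gene_b']
```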

04 Modelling

  • limma
  • LASSO
  • Elastic Net
  • GBM
  • Linear SVMs
  • Random Forests
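LASSO and Elastic Net are both penalised regressions, and the libraries appendix cites glmnet's coordinate-descent paper (Friedman et al., 2010). For a binary outcome y_i ∈ {0, 1} with expression vector x_i, glmnet minimises the objective below. Setting α = 1 gives LASSO and α = 0 gives ridge. The deck does not report the α or λ values that were used, so none are assumed here.

```latex
\min_{\beta_0,\,\beta}\;
-\frac{1}{N}\sum_{i=1}^{N}\Big[\,y_i\,(\beta_0 + x_i^{\top}\beta)
  - \log\!\big(1 + e^{\beta_0 + x_i^{\top}\beta}\big)\Big]
+ \lambda\Big[\tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 + \alpha\,\lVert\beta\rVert_1\Big]
```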

05 Performance evaluation

  • Handling of imbalanced classes
  • Stratified 5-fold cross-validation
  • Metrics: AUROC, balanced accuracy, Macro-F1
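Below is a minimal pure-Python sketch of stratified 5-fold assignment (the project itself was built in R). Each class is shuffled and dealt round-robin across folds, so a minority class stays represented in every test fold. `stratified_folds`, `train_test_splits`, and the seed are hypothetical choices.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k=5, seed=0):
    """Assign each sample index to one of k folds, preserving class proportions.

    Returns a list of k lists of sample indices.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    offset = 0
    for y in sorted(by_class):
        members = by_class[y]
        rng.shuffle(members)
        for pos, idx in enumerate(members):
            folds[(offset + pos) % k].append(idx)
        offset += len(members)  # continue dealing where the previous class stopped
    return folds

def train_test_splits(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) once per stratified fold."""
    folds = stratified_folds(labels, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# 10 positives and 40 negatives: every test fold gets 2 positives and 8 negatives.
labels = [1] * 10 + [0] * 40
for train, test in train_test_splits(labels):
    print(len(train), len(test), sum(labels[i] for i in test))
```

Any step that learns from the data, such as standardisation, gene selection, or oversampling, should be fitted on the training indices of each split only. Otherwise the held-out fold stops being independent.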

Random Forest performed best overall (top or tied-top AUROC in every stage)

AUROC
  • 01 Diagnosis: RF 0.972 · LASSO 0.962 · limma 0.953
  • 02 Progression: RF 0.852 · Weighted Ensemble 0.852 · Elastic Net 0.845
  • 03 Treatment: RF 0.865 · Elastic Net 0.780 · LASSO 0.757

Balanced Accuracy
  • 01 Diagnosis: RF 0.939 · LASSO 0.934 · limma 0.881
  • 02 Progression: RF 0.790 · Weighted Ensemble 0.801 · Elastic Net 0.801
  • 03 Treatment: RF 0.789 · Elastic Net 0.736 · LASSO 0.722
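AUROC and balanced accuracy (reported above) and Macro-F1 (named in the methodology) have standard definitions. Below is a minimal pure-Python sketch of all three for a binary outcome. It assumes labels are coded 1 = positive and 0 = negative and that both classes are present. The deck states neither the positive class nor the decision threshold, so the 0.5 cut-off in the usage lines is an assumption. The pairwise AUROC loop is quadratic, which is fine for cohort-sized data.

```python
def auroc(labels, scores):
    """AUROC via the Mann-Whitney formulation: the probability that a random
    positive outscores a random negative, with ties counted as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))

def confusion(labels, preds):
    """Return (tp, tn, fp, fn) for 0/1 labels and predictions."""
    pairs = list(zip(labels, preds))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    return tp, tn, fp, fn

def balanced_accuracy(labels, preds):
    """Mean of sensitivity (recall on class 1) and specificity (recall on class 0)."""
    tp, tn, fp, fn = confusion(labels, preds)
    return (tp / (tp + fn) + tn / (tn + fp)) / 2

def macro_f1(labels, preds):
    """Unweighted mean of the per-class F1 scores for class 1 and class 0."""
    tp, tn, fp, fn = confusion(labels, preds)
    def f1(hits, false_pos, false_neg):
        denom = 2 * hits + false_pos + false_neg
        return 2 * hits / denom if denom else 0.0
    # For class 0, "hits" are true negatives; its false positives are our fn.
    return (f1(tp, fp, fn) + f1(tn, fn, fp)) / 2

labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.4]
preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
print(auroc(labels, scores))              # ≈ 0.9
print(balanced_accuracy(labels, preds))   # ≈ 0.733
print(macro_f1(labels, preds))            # ≈ 0.733
```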

Live demonstration

From a blood sample to a recommendation in one click.

In summary.

Data & features
ML prediction
Deploy

One shared workflow keeps every stage consistent and extensible.
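Below is a hypothetical sketch of one way to express a shared workflow in code. Each stage bundles its own gene panel and scoring function behind a common interface, and every stage runs through identical steps. Adding a stage then means adding a `Stage` object rather than a new code path. `Stage`, `run_stages`, the gene names, and `toy_score` are illustrative names, not the console's actual API, and the scorer is a placeholder rather than a trained model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Stage:
    """One pipeline stage: its name, its gene panel, and a scoring function."""
    name: str
    gene_panel: List[str]
    score: Callable[[List[float]], float]  # expected to return a probability in [0, 1]

def run_stages(sample: Dict[str, float], stages: List[Stage]) -> Dict[str, float]:
    """Run every stage on one sample through the same steps.

    Each stage selects its genes (in panel order) from the shared sample and
    scores them. A gene missing from the sample raises KeyError.
    """
    results: Dict[str, float] = {}
    for stage in stages:
        features = [sample[gene] for gene in stage.gene_panel]
        results[stage.name] = stage.score(features)
    return results

def toy_score(features: List[float]) -> float:
    """Placeholder scorer: mean of the features, clipped to [0, 1]. Not a real model."""
    return min(1.0, max(0.0, sum(features) / len(features)))

stages = [
    Stage("diagnosis", ["g1", "g2"], toy_score),
    Stage("progression", ["g3"], toy_score),
]
print(run_stages({"g1": 0.25, "g2": 0.75, "g3": 1.5}, stages))
# → {'diagnosis': 0.5, 'progression': 1.0}
```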

Where we go next.

Current limitation: expects a specific input data format.
Integration need: preprocessing support for real-world data.
Next steps: prospective validation and gene-panel expansion.
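Because the console currently expects a specific input format, a first step towards real-world preprocessing support could be an explicit format check. Below is a minimal sketch, assuming each sample arrives as a mapping from gene symbol to expression value. `validate_sample` and the gene names are hypothetical.

```python
from math import isfinite

def validate_sample(sample, required_genes):
    """Check one sample (gene -> expression value) against a required gene panel.

    Returns a list of human-readable problems; an empty list means the sample
    matches the expected format.
    """
    problems = []
    missing = [g for g in required_genes if g not in sample]
    if missing:
        problems.append("missing genes: " + ", ".join(missing))
    for gene in required_genes:
        if gene not in sample:
            continue
        value = sample[gene]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{gene}: value is not numeric")
        elif not isfinite(value):
            problems.append(f"{gene}: value is not finite")
    return problems

print(validate_sample({"g1": 1.0}, ["g1", "g2"]))
# → ['missing genes: g2']
```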

Thank you.


Appendix · 03 / 06

Dataset sources.

  • Hung, T., Pratt, G. A., Sundararaman, B., Townsend, M. J., Chaivorapol, C., Bhangale, T. R., Graham, R. R., Ortmann, W., Behrens, T. W., Yeo, G. W., & Chaussabel, D. (2015). The Ro60 autoantigen binds endogenous retroelements and regulates inflammatory gene expression in systemic lupus erythematosus [Data set; GSE72509]. NCBI Gene Expression Omnibus. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE72509
  • Banchereau, R., Hong, S., Cantarel, B., Baldwin, N., Baisch, J., Edens, M., Cepika, A.-M., Acs, P., Turner, J., Anguiano, E., Vinod, P., Kahn, S., Obermoser, G., Blankenship, D., Wakeland, E., Nassi, L., Gotte, A., Punaro, M., Liu, Y.-J., … Pascual, V. (2016). Personalized immunomonitoring uncovers molecular networks that stratify lupus patients. Cell, 165(3), 551–565. https://doi.org/10.1016/j.cell.2016.03.008
  • Chiche, L., Jourde-Chiche, N., Whalen, E., Presnell, S., Gersuk, V., Dang, K., Anguiano, E., Quinn, C., Burtey, S., Berland, Y., Kaplanski, G., Harlé, J.-R., Pascual, V., & Chaussabel, D. (2014). Modular transcriptional repertoire analyses of adults with systemic lupus erythematosus reveal distinct type I and type II interferon signatures. Arthritis & Rheumatology, 66(6), 1583–1595. https://doi.org/10.1002/art.38628
  • NCBI Gene Expression Omnibus. (2023). Whole-blood microarray expression in lupus nephritis: Treatment response by SRI-4 [Data set; GSE224705]. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE224705

Appendix · 04 / 06

Libraries & frameworks.

  • R Core Team. (2024). R: A language and environment for statistical computing (Version 4.x) [Computer software]. R Foundation for Statistical Computing. https://www.R-project.org/
  • Liaw, A., & Wiener, M. (2002). Classification and regression by randomForest. R News, 2(3), 18–22.
  • Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785–794). Association for Computing Machinery. https://doi.org/10.1145/2939672.2939785
  • Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1–22. https://doi.org/10.18637/jss.v033.i01
  • Kursa, M. B., & Rudnicki, W. R. (2010). Feature selection with the Boruta package. Journal of Statistical Software, 36(11), 1–13. https://doi.org/10.18637/jss.v036.i11
  • Ritchie, M. E., Phipson, B., Wu, D., Hu, Y., Law, C. W., Shi, W., & Smyth, G. K. (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research, 43(7), e47. https://doi.org/10.1093/nar/gkv007
  • Siriseriwan, W. (2019). smotefamily: A collection of oversampling techniques for class imbalance problem based on SMOTE (Version 1.3.1) [R package]. https://CRAN.R-project.org/package=smotefamily
  • Schloerke, B., & Allen, J. (2024). plumber: An API generator for R [R package]. https://www.rplumber.io/

Appendix · 05 / 06

Research papers & models.

Clinical background, methods, prior art, and scoring referenced in the deck.

  • National Institute of Arthritis and Musculoskeletal and Skin Diseases. (2022). Systemic lupus erythematosus (lupus) [Last reviewed October 2022]. Retrieved April 23, 2026, from https://www.niams.nih.gov/health-topics/lupus
  • MedlinePlus. (2024). Lupus [Last updated July 1, 2024]. Retrieved April 23, 2026, from https://medlineplus.gov/lupus.html
  • National Health Service. (2023). Lupus [Page last reviewed July 19, 2023]. Retrieved April 23, 2026, from https://www.nhs.uk/conditions/lupus/
  • MedlinePlus Medical Encyclopedia. (2025). Systemic lupus erythematosus [Review date January 28, 2025]. Retrieved April 23, 2026, from https://medlineplus.gov/ency/article/000435.htm
  • Aringer, M., & Bertsias, G. (2025). Early diagnosis of systemic lupus erythematosus. Rare Disease and Orphan Drugs Journal, 4, 13. https://doi.org/10.20517/rdodj.2024.59
  • Lin, A., Wakhlu, A., & Connelly, K. (2024). Disease activity assessment in systemic lupus erythematosus. Frontiers in Lupus, 2. https://doi.org/10.3389/flupu.2024.1442013
  • Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. https://doi.org/10.1023/A:1010933404324
  • Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785–794). Association for Computing Machinery. https://doi.org/10.1145/2939672.2939785
  • Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297. https://doi.org/10.1007/BF00994018
  • Kursa, M. B., & Rudnicki, W. R. (2010). Feature selection with the Boruta package. Journal of Statistical Software, 36(11), 1–13. https://doi.org/10.18637/jss.v036.i11
  • Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321–357. https://doi.org/10.1613/jair.953
  • Lundberg, S. M., & Lee, S.-I. (2017). A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems (Vol. 30, pp. 4765–4774). Curran Associates.
  • Gladman, D. D., Ibañez, D., & Urowitz, M. B. (2002). Systemic lupus erythematosus disease activity index 2000. The Journal of Rheumatology, 29(2), 288–291.
  • Furie, R., Petri, M. A., Wallace, D. J., Ginzler, E. M., Merrill, J. T., Stohl, W., Chatham, W. W., Strand, V., Weinstein, A., & Chevrier, M. (2009). Novel evidence-based systemic lupus erythematosus responder index. Arthritis & Rheumatism, 61(9), 1143–1151. https://doi.org/10.1002/art.24698

Appendix · 06 / 06

Team & acknowledgements.


Team members · Biomed09
  • Manna Berry · Faculty of Engineering J12, The University of Sydney, NSW 2006 Australia
  • Lezhi Lin · School of Mathematics and Statistics F07, The University of Sydney, NSW 2006 Australia
  • Udit Samant · email pending · School of Computer Science J12, The University of Sydney, NSW 2006 Australia
  • Hadi Shafat · School of Computer Science J12, The University of Sydney, NSW 2006 Australia
  • Minh Hieu Tran · email pending · School of Computer Science J12, The University of Sydney, NSW 2006 Australia
  • Jillian Zhao · School of Computer Science J12, The University of Sydney, NSW 2006 Australia
Supervisors

Dr. Andy Tran · Elyna Lin

Many thanks to our supervisors Andy and Elyna for the weekly guidance, thoughtful feedback, and steady support throughout the project.

Acknowledgements
  • Submitted in partial fulfilment of the assessment requirements for DATA3888 Data Science Capstone at The University of Sydney.
  • We also acknowledge the original data contributors and study participants behind the public GEO cohorts. Their shared expression and clinical metadata made the modelling, validation, and patient-level demonstrations possible.
  • Our work builds on the efforts of open-source maintainers across R, Bioconductor, and the modelling libraries used here, and on the DATA3888 teaching team's project structure and course support.