Case study

How T-CAIREM and Upside built a secure and compliant data platform to accelerate AI research in healthcare

Client
T-CAIREM (University of Toronto)
Year
2021 - ongoing
Services

Software development consulting

Cloud engineering

Governance-first software development

Full-stack platform implementation

Data platform engineering

Secure data architecture

Privacy-compliant system design

Data governance modeling

UI/UX design

Overview

To advance AI research in medicine using real-world health data, the Temerty Centre for Artificial Intelligence Research and Education in Medicine (T-CAIREM) at the University of Toronto partnered with Upside to develop the Health Data Nexus (HDN) — a secure, cloud-based platform designed to streamline access to clinical datasets for researchers, educators, and data custodians.

Engineered to meet the highest privacy and ethical standards, HDN enables cross-institutional collaboration while complying with applicable privacy and research-ethics requirements, including Ontario's PHIPA and TCPS 2. Built on Google Cloud Platform, it balances security, usability, and scalability to meet the needs of a wide range of users.

By May 2025, HDN had onboarded nine datasets, including multimodal data sourced from real medical records, representing over 15,000 patients. Each dataset underwent rigorous legal, privacy, and governance reviews to ensure full regulatory compliance. The platform has supported researchers from five countries and powered several national medical datathons, advancing data-driven healthcare research and education.

Project Background

T-CAIREM, a global leader in AI health research, initiated the Health Data Nexus to advance medical innovation through secure and responsible data sharing. The platform was designed to support compliant research using real-world health data, including patient records, medical imaging, and population-level datasets.

To bring this vision to life, T-CAIREM selected Upside as its development partner to build a scalable, flexible, privacy-first platform that would meet the needs of data providers, researchers, and educators.

The Challenge

Given the scale and sensitivity of real-world health data, Upside was tasked with building a robust cloud infrastructure capable of supporting AI-driven medical research while meeting strict security and compliance requirements.

Beyond infrastructure, the platform also had to simplify complex governance workflows, including time-consuming data access approvals; support differentiated access for researchers and educators; and accommodate a broad spectrum of health data formats, including tabular data, imaging, and clinical text.

Key Challenges and Requirements

  • Data Sensitivity & Compliance: Manage de-identified patient data (e.g., clinical records, CT scans) in full compliance with Ontario's Personal Health Information Protection Act (PHIPA) and the Tri-Council Policy Statement on ethical conduct for research involving humans (TCPS 2). This required a formal Privacy Impact Assessment (PIA) and Threat Risk Assessment (TRA).
  • Security & Governance: Meet hospital-grade data protection requirements, including isolated cloud-based environments with no download capability, while also addressing governance requirements such as Research Ethics Board (REB) approvals and access controls.
  • Access & Approval Bottlenecks: Reduce administrative friction caused by lengthy data access approval workflows.
  • Multimodal & Poorly Standardized Data: Support diverse datasets, including tabular (e.g., St. Michael’s GIM dataset), imaging (e.g., CSpine, 1,000+ CT scans), and text, while overcoming inconsistent formatting and metadata.
  • Diverse Functional Requirements: The platform needed to support a wide range of use cases (secure data storage for providers, AI analysis tools for researchers, and simple, siloed access for professors and students), all managed through flexible role-based permissions and an intuitive interface.
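As a rough illustration of role-based permissions, the sketch below maps each user group to a set of allowed actions and lets a single user hold several roles at once. The role and permission names are hypothetical and do not reflect HDN's or PhysioNet's actual permission model.

```python
from enum import Enum, auto


class Permission(Enum):
    """Hypothetical platform actions, not HDN's actual permission names."""
    UPLOAD_DATASET = auto()
    MANAGE_ACCESS = auto()
    LAUNCH_WORKSPACE = auto()
    CREATE_COURSE = auto()
    VIEW_COURSE_DATA = auto()


class Role(Enum):
    """Hypothetical user groups mirroring the ones described above."""
    DATA_HOLDER = "data_holder"
    RESEARCHER = "researcher"
    EDUCATOR = "educator"
    STUDENT = "student"


# Each role maps to the set of actions it may perform.
ROLE_PERMISSIONS: dict[Role, frozenset[Permission]] = {
    Role.DATA_HOLDER: frozenset({Permission.UPLOAD_DATASET, Permission.MANAGE_ACCESS}),
    Role.RESEARCHER: frozenset({Permission.LAUNCH_WORKSPACE}),
    Role.EDUCATOR: frozenset({Permission.CREATE_COURSE, Permission.LAUNCH_WORKSPACE}),
    # Students are siloed: they only see data provisioned for their course.
    Role.STUDENT: frozenset({Permission.VIEW_COURSE_DATA}),
}


def is_allowed(roles: set[Role], permission: Permission) -> bool:
    """Grant the action if any one of the user's roles includes it."""
    return any(permission in ROLE_PERMISSIONS[role] for role in roles)
```

For example, `is_allowed({Role.RESEARCHER}, Permission.UPLOAD_DATASET)` returns `False`, while a user holding both the researcher and educator roles may create a course.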

How the solution was built

To address these complex challenges, Upside worked alongside T-CAIREM to deliver the first version of the Health Data Nexus within one year, applying Agile sprints with weekly check-ins to ensure fast, focused progress. The collaboration then continued over the following years to expand functionality, support adoption, and maintain long-term stability. 

A key technical foundation for the platform was PhysioNet, an open-source system developed at MIT. Upside extended PhysioNet to meet HDN’s privacy and scalability requirements and contributed enhancements—such as Kubernetes support—back to the project, becoming part of its ongoing development lifecycle.

Drawing on its experience in data-sensitive system development, Upside focused on six core areas to deliver a secure, compliant, and production-grade platform:

1. User-Centered Design

HDN was built around three core goals: maintaining datasets, enabling research, and supporting education. Upside collaborated with T-CAIREM to gather requirements and design the user experience, ensuring the platform aligned with the unique needs of:

  • Data Holders: Secure, cost-effective storage with governance tools and access control.
  • Researchers: Streamlined credentialing workflows and Jupyter/RStudio-based research environments.
  • Educators: Easy dataset provisioning, student access, and integrated learning materials.

2. Data Governance & Compliance Foundations

While designing the architecture, Upside worked with T-CAIREM to create a governance model aligned with PHIPA and TCPS 2:

  • Independent Privacy Impact Assessment and Threat Risk Assessment
  • Standardized Data Transfer Agreements
  • De-identification workflows led by Data Holders, who anonymized their datasets and provided the documentation and metadata needed for secure onboarding

3. Access Workflow & Credentialing Systems

To operationalize compliance and manage sensitive data access, Upside supported the implementation of a three-tiered access control model:

  • Zone 1: Credentialing, mandatory ethics training, and a Data Use Agreement
  • Zone 2: Adds a required research plan approved by the data holder
  • Zone 3: For highly sensitive datasets. Requires everything in Zone 2 plus Research Ethics Board (REB) approval, which is obtained externally through the relevant institutional bodies.
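Because each zone builds on the one below it, an access check can be expressed as a set of cumulative requirements. The following is a minimal sketch under that assumption: the `AccessRequest` fields and function names are hypothetical, each request is assumed to concern a single dataset, and the external REB review is modeled only as a flag recorded once approval is on file.

```python
from dataclasses import dataclass
from enum import IntEnum


class Zone(IntEnum):
    ZONE_1 = 1  # credentialing, ethics training, Data Use Agreement
    ZONE_2 = 2  # Zone 1 plus a research plan approved by the data holder
    ZONE_3 = 3  # Zone 2 plus external Research Ethics Board approval


@dataclass
class AccessRequest:
    """Hypothetical per-dataset request state, not HDN's schema."""
    credentialed: bool = False
    ethics_training_complete: bool = False
    dua_signed: bool = False
    research_plan_approved: bool = False
    reb_approval_on_file: bool = False


def highest_zone_satisfied(req: AccessRequest) -> int:
    """Return the highest zone whose cumulative requirements are met (0 if none)."""
    if not (req.credentialed and req.ethics_training_complete and req.dua_signed):
        return 0
    if not req.research_plan_approved:
        return Zone.ZONE_1
    if not req.reb_approval_on_file:
        return Zone.ZONE_2
    return Zone.ZONE_3


def can_access(req: AccessRequest, dataset_zone: Zone) -> bool:
    """A dataset is accessible if the requester satisfies its zone or a higher one."""
    return highest_zone_satisfied(req) >= dataset_zone
```

Modeling the zones as an ordered `IntEnum` reduces the check to a single comparison, so a requester cleared for Zone 3 on a dataset automatically meets the Zone 1 and Zone 2 requirements as well.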

Credentialing is streamlined through eduGAIN Single Sign-On and ORCID identification, reducing administrative burden and enabling secure, scalable onboarding.
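ORCID iDs carry a built-in checksum, so a registration form can reject mistyped identifiers before any further credentialing steps. The sketch below validates the display format and the ISO 7064 MOD 11-2 check character. It is an illustration rather than HDN's credentialing code; a production system would more likely confirm ownership of the iD through ORCID's own sign-in flow.

```python
import re

# ORCID iDs are 16 characters shown as four hyphen-separated groups; the last
# character is an ISO 7064 MOD 11-2 check character (0-9 or X).
ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_digit(base_digits: str) -> str:
    """Compute the MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check both the hyphenated display format and the checksum."""
    if ORCID_PATTERN.fullmatch(orcid) is None:
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]
```

For example, `is_valid_orcid("0000-0002-1825-0097")` returns `True`, while changing the final digit to `8` makes it return `False`.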

4. Cloud Infrastructure & Environment Design

Upside engineered the backend on Google Cloud Platform, providing:

  • Isolated research environments for each user, with preconfigured tools like Jupyter and RStudio
  • Cloud Storage FUSE integration for familiar file access
  • Kubernetes-based autoscaling for dynamic resource provisioning
  • Region-based data residency and download prevention

Additionally, the infrastructure supports external billing methods and shared billing accounts (e.g., for datathons), easing operational management for T-CAIREM and institutional partners.

5. Modular Software Architecture

To ensure long-term scalability and maintainability, Upside applied Twelve-Factor App principles and a modern engineering stack:

  • Python, Django, and Flask for the backend
  • Terraform and Kubernetes for infrastructure-as-code and container orchestration
  • Fully automated CI/CD pipelines for reliable, repeatable deployments

This modular architecture enables rapid iteration, seamless updates, and support for emerging research tools and data formats.
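One of the Twelve-Factor App principles is to keep deployment configuration in environment variables rather than in code, so the same build can run unchanged across environments. A minimal sketch of that idea follows; the `HDN_*` variable names and the `Settings` fields are hypothetical, not the platform's actual configuration.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Settings:
    """Immutable runtime configuration (hypothetical fields)."""
    gcp_project: str
    data_region: str
    debug: bool


def _require(env: Mapping[str, str], name: str) -> str:
    """Fail fast at startup if a mandatory variable is missing or empty."""
    value = env.get(name)
    if not value:
        raise RuntimeError(f"required environment variable {name} is not set")
    return value


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read configuration from the process environment or a supplied mapping."""
    env = os.environ if env is None else env
    return Settings(
        gcp_project=_require(env, "HDN_GCP_PROJECT"),
        data_region=_require(env, "HDN_DATA_REGION"),
        debug=env.get("HDN_DEBUG", "false").strip().lower() in {"1", "true", "yes"},
    )
```

Accepting an explicit mapping, as in `load_settings({"HDN_GCP_PROJECT": "demo", "HDN_DATA_REGION": "example-region"})`, keeps the loader testable without touching the real environment.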

6. Accessible & Familiar Interface

To ensure usability and seamless adoption, Upside customized the open-source PhysioNet platform as the frontend, offering:

  • Familiar UX for health researchers (e.g., MIMIC-IV users)
  • Native support for multimodal datasets (signals, images, tabular data, text)
  • Open-source extensibility (BSD 3-Clause license)
  • Event-based workflows for datathons and academic training
  • Metadata-rich dataset publishing and integrated credentialing

Results and Impact

Since launch, the Health Data Nexus has demonstrated tangible value across its three core pillars (data acquisition, research, and education), validating the collaborative work between T-CAIREM and Upside. These early results highlight the platform's scalability, usability, and relevance to the research community.

  1. Nine Datasets Hosted: As of May 2025, the platform hosts nine datasets spanning tabular data, imaging, voice recordings, and population-level health information. These include the St. Michael’s Hospital General Internal Medicine (GIM) dataset (14,000 patients across 22,000 visits with comprehensive clinical data), the CSpine dataset with over 1,000 CT scans, COVID-19 inpatient data, the Canadian Heart Health Database, randomized trial files for LLM research, national epidemiology datasets, the Bridge2AI voice dataset, and the Sunnybrook Hospital Sleep Laboratory dataset.

These contributions demonstrate the platform's flexibility in hosting multimodal data and its potential as a national health data infrastructure.

  2. Education and Training: HDN has been used in classrooms and workshops as a teaching tool, giving university students hands-on experience with real-world medical data. Notable examples include:
  • A biomedical engineering professor who has run student projects for over two years using datasets like GIM and the Canadian Heart Health Database to explore real-world data issues such as bias and class imbalance.
  • The VADA Summer Schools workshop, a week-long program that used HDN for student-led analytics projects.

Additionally, Upside’s work on credentialing, preloaded training, and dataset access workflows has made HDN a plug-and-play solution for data-centric workshops and university courses.

  3. Datathons Supported: Since its launch, the Health Data Nexus has supported four provincial and national datathons focused on health and medical research. These events gave researchers and students hands-on experience with real clinical datasets. Participant feedback led to meaningful platform enhancements, including improved data storage workflows, shared billing for collaborative teams, and greater computing capacity through access to Google Vertex AI.
  4. User Growth & Diversity: The Health Data Nexus has attracted a diverse and expanding user base, primarily researchers, academics, and students from institutions around the world. As of 2025, the platform has been adopted in more than 12 countries across four continents, reflecting its growing global reach in health data research.

Key learnings from Upside for a successful academic-industry collaboration

The success of HDN demonstrates what’s possible when academic and industry teams work as true partners. T-CAIREM contributed governance and clinical insight, while Upside delivered design, engineering, and implementation.

In Upside’s experience, delivering such data-sensitive, long-term research projects requires more than technical execution. Here are the key principles that guided Upside’s successful collaboration with T-CAIREM:

  • Embedded Collaboration Model: Upside operated not just as a vendor, but as an embedded partner in the project. The team actively challenged assumptions, refined requirements, and worked closely with T-CAIREM to shape effective solutions, while ensuring that critical knowledge was transferred to internal teams. This “one big team” approach, supported by weekly ideation sessions and shared product ownership, fostered strong alignment from day one.
  • Full Lifecycle Ownership: From initial concept to deployment and beyond, Upside took full responsibility for the platform’s lifecycle. This included supporting key events such as datathons, refining onboarding flows, and maintaining technical stability, ensuring the platform continues to deliver value at every stage. 
  • Future-Proof Architecture: Upside always designs with the client’s long-term goals in mind. For T-CAIREM, a Python-based stack was selected to align with scientific computing standards, while tools like Terraform and Kubernetes enabled automation and future scalability.
  • Multidisciplinary Expertise: Delivering a four-year project with strict security and compliance requirements demanded cross-functional knowledge. Upside brought together experts in cloud architecture, privacy compliance, DevOps, and academic workflows to accelerate execution and ensure robustness.
  • Open Source in Practice: In keeping with its mission to support transparent, scalable technology, Upside contributed directly to the MIT-developed PhysioNet codebase. These enhancements, such as Kubernetes support, benefit both HDN and the broader research community.

Want to learn more about Health Data Nexus or how we built it? Book a call with our team to explore the process.
