DICOM + Python in 2026: building an AI preprocessing pipeline from scratch: ML and AI

Last updated on 8 hours ago

ai programs

admin2Member

Posted 8 hours ago

Anika_Kowalczyk ✓ PACS Admin
Clinical Informatics · Warsaw
Mar 2026
The number of people building medical imaging AI pipelines without properly understanding DICOM metadata is alarming, and it causes silent bugs that are very hard to track down. The most common one: using pixel data from a CT series without checking the RescaleSlope and RescaleIntercept DICOM tags, which means you're working with raw stored values rather than Hounsfield Units. The formula is always HU = pixel_value * RescaleSlope + RescaleIntercept. Most CT series have a slope of 1 and an intercept of -1024, but not all: scanner manufacturers and reconstruction kernels vary, so read both values from the file instead of hard-coding them. Similarly, always check ImageOrientationPatient before assuming axial, coronal, or sagittal orientation. Some PACS export series in non-standard orientations, and your model will then silently receive flipped or transposed volumes. Pydicom's documentation at pydicom.github.io covers these tags thoroughly, and SimpleITK handles most of this correctly if you use its DICOM reader rather than rolling your own. Use SimpleITK where you can.
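A minimal sketch of the orientation check described above, assuming the six direction cosines from ImageOrientationPatient are passed in as a plain sequence. The function name dominant_plane and the 0.9 threshold for calling a slice oblique are illustrative choices, not part of pydicom or the DICOM standard.

```python
import numpy as np


def dominant_plane(image_orientation_patient, tol=0.9):
    """Classify a slice as axial, coronal, sagittal, or oblique.

    The input is the six direction cosines of ImageOrientationPatient:
    the first three describe the row direction, the last three the
    column direction, both in the patient coordinate system
    (x toward patient left, y toward posterior, z toward head).
    """
    iop = np.asarray(image_orientation_patient, dtype=float)
    if iop.shape != (6,):
        raise ValueError("expected six direction cosines")

    # The slice normal is the cross product of the row and column directions.
    normal = np.cross(iop[:3], iop[3:])
    norm = np.linalg.norm(normal)
    if norm == 0:
        raise ValueError("row and column directions are parallel or zero")
    normal = normal / norm

    # The axis the normal mostly points along decides the plane.
    axis = int(np.argmax(np.abs(normal)))
    if abs(normal[axis]) < tol:  # hypothetical threshold, tune as needed
        return "oblique"
    return ("sagittal", "coronal", "axial")[axis]
```

With pydicom this would typically be called as dominant_plane(ds.ImageOrientationPatient) on one slice of the series, and the pipeline can then reject or reorient anything that is not the plane the model was trained on.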

admin2Member

Posted 8 hours ago

Building on Anika's excellent metadata warning: here's a production-grade DICOM-to-numpy preprocessing function that handles the HU conversion, falls back to default values when the rescale tags are missing, and copes with multi-frame DICOM (which newer scanners increasingly use and many tutorials ignore entirely):

Code

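# Python - robust DICOM to HU numpy array
import os
import warnings

import numpy as np
import pydicom


def dicom_to_hu(filepath: str) -> np.ndarray:
    ds = pydicom.dcmread(filepath)

    # Check modality first: only CT has HU meaning
    modality = getattr(ds, "Modality", "CT")
    if modality != "CT":
        raise ValueError(f"HU conversion invalid for {modality}")

    px = ds.pixel_array.astype(np.float32)

    # Single-frame files are 2D, so add a slice dim to always return 3D.
    # Multi-frame (enhanced CT) files already come back as (frames, H, W).
    if px.ndim == 2:
        px = px[np.newaxis, ...]

    # Fall back to the common slope/intercept only when the tags are absent,
    # and warn so the guess is not silent. Enhanced CT may store these tags
    # per frame in the functional groups rather than at the top level.
    if "RescaleSlope" not in ds or "RescaleIntercept" not in ds:
        warnings.warn(f"{filepath}: rescale tags missing, assuming 1 / -1024")
    slope = float(getattr(ds, "RescaleSlope", 1.0))
    intercept = float(getattr(ds, "RescaleIntercept", -1024.0))

    return px * slope + intercept  # shape: (slices, H, W) or (1, H, W)


# For a full series: load all slices, sort by ImagePositionPatient Z
def load_series_hu(series_dir: str) -> np.ndarray:
    slices = [pydicom.dcmread(os.path.join(series_dir, f))
              for f in os.listdir(series_dir) if f.endswith(".dcm")]
    slices.sort(key=lambda s: float(s.ImagePositionPatient[2]))
    # Apply each slice's own rescale values; they can differ within a series
    return np.stack([
        s.pixel_array.astype(np.float32) * float(s.RescaleSlope)
        + float(s.RescaleIntercept)
        for s in slices
    ])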

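Also worth knowing: the InstanceNumber tag is NOT reliable for slice ordering. Sort by ImagePositionPatient[2] instead, which is correct for axial series; for coronal, sagittal, or oblique series, sort by the position projected onto the slice normal computed from ImageOrientationPatient. I've seen the InstanceNumber mistake in published research code, which is frightening.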
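For series that are not axial, sorting on the Z component alone can give the wrong order. A hedged sketch of a more general sort key follows: it projects each slice's ImagePositionPatient onto the slice normal derived from ImageOrientationPatient. The helper names are hypothetical, and the SimpleNamespace objects only stand in for pydicom datasets so the example runs without any files.

```python
from types import SimpleNamespace

import numpy as np


def slice_normal(image_orientation_patient):
    """Return the slice normal as the cross product of row and column cosines."""
    iop = np.asarray(image_orientation_patient, dtype=float)
    return np.cross(iop[:3], iop[3:])


def sort_slices_along_normal(slices):
    """Sort slice objects by their position projected onto the slice normal.

    Each object needs ImagePositionPatient and ImageOrientationPatient
    attributes, as pydicom datasets of a CT or MR series normally have.
    All slices are assumed to share the orientation of the first one.
    """
    if not slices:
        return []
    normal = slice_normal(slices[0].ImageOrientationPatient)

    def position_along_normal(s):
        position = np.asarray(s.ImagePositionPatient, dtype=float)
        return float(np.dot(position, normal))

    return sorted(slices, key=position_along_normal)


# Stand-in coronal slices: Z is identical everywhere, so sorting by
# ImagePositionPatient[2] could not order them, but the projection can.
coronal = [1, 0, 0, 0, 0, -1]
demo = [
    SimpleNamespace(InstanceNumber=1, ImagePositionPatient=[0, 30, 0],
                    ImageOrientationPatient=coronal),
    SimpleNamespace(InstanceNumber=2, ImagePositionPatient=[0, 10, 0],
                    ImageOrientationPatient=coronal),
    SimpleNamespace(InstanceNumber=3, ImagePositionPatient=[0, 20, 0],
                    ImageOrientationPatient=coronal),
]
ordered = sort_slices_along_normal(demo)
```

For a purely axial series the normal is (0, 0, 1), so this reduces to the ImagePositionPatient[2] sort used in the code above.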

admin2Member

Posted 8 hours ago

Nadia_Bassett
Research Physicist · UCSF
Apr 2026
If you're building a pipeline that needs to handle MRI alongside CT, be very aware that MRI pixel values have no absolute physical meaning: they're scanner-dependent, protocol-dependent, and even session-dependent for the same patient on the same machine. There's no MRI equivalent of Hounsfield Units. This means you can never do population-level intensity normalization the way you do with CT, where one fixed intensity window means the same thing for every scan. The standard approaches for MRI are z-score normalization per volume, percentile clipping (clip to the 1st and 99th percentiles, then scale to [0,1]), or histogram matching to a reference template. For brain MRI specifically, the Nyul & Udupa (2000) histogram standardization method, implemented in the intensity-normalization library (github.com/jcreinhold/intensity-normalization), is still widely used and works well. Not knowing this distinction between CT and MRI normalization requirements is a root cause of a lot of poor model performance that people incorrectly attribute to the model architecture.
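The first two approaches are short enough to sketch directly in numpy. This is an illustration under simple assumptions, not the intensity-normalization library's implementation: the function names are made up for this example, and the optional mask argument (for restricting the statistics to foreground voxels such as a brain mask) is an added convenience.

```python
import numpy as np


def zscore_normalize(volume, mask=None):
    """Per-volume z-score: subtract the mean and divide by the std.

    If a boolean mask is given, the statistics are computed only over the
    masked voxels, but the whole volume is rescaled.
    """
    vol = np.asarray(volume, dtype=np.float32)
    voxels = vol[mask] if mask is not None else vol
    std = float(voxels.std())
    if std == 0:
        raise ValueError("cannot z-score a constant volume")
    return (vol - float(voxels.mean())) / std


def percentile_clip_normalize(volume, lower=1.0, upper=99.0):
    """Clip to the given percentiles, then scale linearly to [0, 1]."""
    vol = np.asarray(volume, dtype=np.float32)
    lo, hi = (float(v) for v in np.percentile(vol, [lower, upper]))
    if hi <= lo:
        raise ValueError("percentile range is empty")
    return (np.clip(vol, lo, hi) - lo) / (hi - lo)
```

Both functions work on one volume at a time, which is the point: the statistics come from the scan itself rather than from a fixed, population-wide intensity window.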
