DICOM + Python in 2026: building an AI preprocessing pipeline from scratch

Last updated on 8 hours ago
A
admin2Member
Posted 8 hours ago
Anika_Kowalczyk ✓ PACS Admin
Clinical Informatics · Warsaw
Mar 2026
The number of people building medical imaging AI pipelines without properly understanding DICOM metadata is alarming, and it causes silent bugs that are incredibly hard to track down. The most common one: using pixel data from a CT series without applying the RescaleIntercept and RescaleSlope DICOM tags, which means you're working with raw stored values rather than Hounsfield Units. The formula is always HU = pixel_value * RescaleSlope + RescaleIntercept. Most CT series have a slope of 1 and an intercept of -1024, but not all: scanner manufacturers and reconstruction kernels vary, so read the tags from every file instead of hard-coding them. Similarly, always check ImageOrientationPatient before assuming axial/coronal/sagittal orientation. Some PACS export series in non-standard orientations, and your model will silently receive flipped or transposed volumes. Pydicom's documentation at pydicom.github.io covers these tags thoroughly, and SimpleITK handles most of this correctly if you use its DICOM reader rather than rolling your own. Use SimpleITK where you can.
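To make the orientation check concrete, here is a minimal sketch that classifies a slice from the six direction cosines in ImageOrientationPatient. The function name `classify_orientation` and the `min_alignment` threshold are illustrative assumptions, not part of pydicom or SimpleITK. It relies on the DICOM patient coordinate system, where x runs right to left, y anterior to posterior, and z feet to head.

```python
import numpy as np


def classify_orientation(iop, min_alignment=0.9):
    """Classify a slice as 'axial', 'coronal', 'sagittal', or 'oblique'.

    iop is the six-value ImageOrientationPatient: row direction cosines
    followed by column direction cosines. min_alignment is a hypothetical
    threshold on how closely the slice normal must follow a patient axis.
    """
    row = np.asarray(iop[:3], dtype=float)
    col = np.asarray(iop[3:], dtype=float)
    # The slice normal is perpendicular to both in-plane directions
    normal = np.cross(row, col)
    normal = normal / np.linalg.norm(normal)
    axis = int(np.argmax(np.abs(normal)))
    if abs(normal[axis]) < min_alignment:
        return "oblique"
    # Normal along x -> sagittal, along y -> coronal, along z -> axial
    return ("sagittal", "coronal", "axial")[axis]


print(classify_orientation([1, 0, 0, 0, 1, 0]))
print(classify_orientation([1, 0, 0, 0, 0, -1]))
```

With pydicom you would pass `ds.ImageOrientationPatient` directly. Anything reported as oblique, or as a plane you did not expect, is worth inspecting before it reaches the model.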
A
admin2Member
Posted 8 hours ago
Building on Anika's excellent metadata warning, here's a production-grade DICOM-to-numpy preprocessing function that handles the HU conversion, falls back to defaults when the rescale tags are missing, and returns the same 3-D shape for single-frame and multi-frame DICOM (which newer scanners increasingly use and many tutorials ignore entirely):

# Python - robust DICOM to HU numpy array
import os

import numpy as np
import pydicom


def dicom_to_hu(filepath: str) -> np.ndarray:
    ds = pydicom.dcmread(filepath)

    # Check modality before converting: only CT has HU meaning
    modality = getattr(ds, "Modality", "CT")
    if modality != "CT":
        raise ValueError(f"HU conversion invalid for {modality}")

    px = ds.pixel_array.astype(np.float32)

    # Single-frame files decode to (H, W): add a slice dim so the result
    # is always 3-D. Multi-frame files already decode to (frames, H, W).
    if px.ndim == 2:
        px = px[np.newaxis, ...]

    # Fallbacks are the common CT values and are only a guess when the
    # tags are absent. Enhanced multi-frame CT keeps these tags inside the
    # functional group sequences, which this top-level lookup does not read.
    slope = float(getattr(ds, "RescaleSlope", 1.0))
    intercept = float(getattr(ds, "RescaleIntercept", -1024.0))
    hu = px * slope + intercept

    return hu  # shape: (slices, H, W) or (1, H, W)


# For a full series: load all slices, sort by ImagePositionPatient Z
def load_series_hu(series_dir: str) -> np.ndarray:
    slices = [pydicom.dcmread(os.path.join(series_dir, f))
              for f in os.listdir(series_dir) if f.endswith(".dcm")]
    slices.sort(key=lambda s: float(s.ImagePositionPatient[2]))
    # Apply each slice's own rescale tags: they can differ within a series
    volume = np.stack([
        s.pixel_array.astype(np.float32) * float(s.RescaleSlope)
        + float(s.RescaleIntercept)
        for s in slices
    ])
    return volume  # shape: (slices, H, W)


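One gap worth sketching for multi-frame files: enhanced CT objects store RescaleSlope and RescaleIntercept inside PixelValueTransformationSequence, under the per-frame or shared functional groups, so a top-level lookup finds nothing. The helper below is a hedged sketch of a fallback lookup. The name `find_rescale` is hypothetical, and the `SimpleNamespace` objects are only stand-ins for a pydicom dataset so the example runs without a file.

```python
from types import SimpleNamespace


def find_rescale(ds, frame_index=0):
    """Return (slope, intercept) for one frame, or None if no tags exist.

    Looks at the top level first (classic single-frame CT), then at the
    per-frame and shared functional groups (enhanced multi-frame CT).
    """
    if hasattr(ds, "RescaleSlope") and hasattr(ds, "RescaleIntercept"):
        return float(ds.RescaleSlope), float(ds.RescaleIntercept)

    groups = []
    per_frame = getattr(ds, "PerFrameFunctionalGroupsSequence", None)
    if per_frame is not None and len(per_frame) > frame_index:
        groups.append(per_frame[frame_index])
    shared = getattr(ds, "SharedFunctionalGroupsSequence", None)
    if shared is not None and len(shared) > 0:
        groups.append(shared[0])

    for group in groups:
        pvt = getattr(group, "PixelValueTransformationSequence", None)
        if pvt is not None and len(pvt) > 0:
            item = pvt[0]
            return float(item.RescaleSlope), float(item.RescaleIntercept)
    return None  # caller decides: raise, or apply a documented default


# Stand-ins for datasets, for illustration only
classic = SimpleNamespace(RescaleSlope="1", RescaleIntercept="-1024")
transform = SimpleNamespace(RescaleSlope=2.0, RescaleIntercept=-1000.0)
enhanced = SimpleNamespace(SharedFunctionalGroupsSequence=[
    SimpleNamespace(PixelValueTransformationSequence=[transform])])
no_tags = SimpleNamespace()

print(find_rescale(classic), find_rescale(enhanced), find_rescale(no_tags))
```

Returning None instead of a default makes the missing-tag case explicit, so the decision to guess an intercept is made deliberately by the caller.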
Also worth knowing: the InstanceNumber tag is NOT reliable for slice ordering. Always sort by slice position: ImagePositionPatient[2] for axial series, as in the code above. I've seen this mistake in published research code, which is frightening.
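Sorting on ImagePositionPatient[2] assumes the slices are stacked along the patient z axis. For coronal, sagittal, or tilted series, a more general approach is to project each slice position onto the slice normal derived from ImageOrientationPatient. This is a minimal sketch under that assumption. The function name `slice_order` is hypothetical, and it expects all slices to share one orientation.

```python
import numpy as np


def slice_order(positions, iop):
    """Return the indices that order slices along the slice normal.

    positions: one ImagePositionPatient (x, y, z) per slice.
    iop: the ImageOrientationPatient shared by every slice in the series.
    """
    row = np.asarray(iop[:3], dtype=float)
    col = np.asarray(iop[3:], dtype=float)
    normal = np.cross(row, col)
    # Signed distance of each slice origin along the normal
    distance = np.asarray(positions, dtype=float) @ normal
    return np.argsort(distance, kind="stable")


# Coronal example: slices differ in y, so sorting on z alone cannot order them
coronal_positions = [[0, 3, 0], [0, 1, 0], [0, 2, 0]]
coronal_order = slice_order(coronal_positions, [1, 0, 0, 0, 0, -1])
print(coronal_order.tolist())
```

The direction of the ordering follows the sign of the normal, so check which end of the volume comes first if your model depends on it.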
A
admin2Member
Posted 8 hours ago
Nadia_Bassett
Research Physicist · UCSF
Apr 2026
If you're building a pipeline that needs to handle MRI alongside CT, be very aware that MRI pixel values have no absolute physical meaning: they're scanner-dependent, protocol-dependent, and even session-dependent for the same patient on the same machine. There's no MRI equivalent of Hounsfield Units. This means you cannot apply one fixed, population-level intensity normalization the way you can with CT. The standard approaches for MRI are z-score normalization per volume, percentile clipping (clip to the 1st and 99th percentiles, then scale to [0,1]), or histogram matching to a reference template. For brain MRI specifically, the Nyul & Udupa (2000) histogram standardization method, implemented in the intensity-normalization library (github.com/jcreinhold/intensity-normalization), is still widely used and works well. Not knowing this distinction between CT and MRI normalization requirements is a root cause of a lot of poor model performance that people incorrectly attribute to the model architecture.
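The two simplest per-volume approaches above can be sketched in a few lines of numpy. The function names and the optional `mask` argument are illustrative assumptions, not taken from the intensity-normalization library. Histogram matching is left out because it needs a reference template.

```python
import numpy as np


def zscore_normalize(volume, mask=None):
    """Z-score one volume using its own mean and standard deviation.

    mask is an optional boolean foreground mask (an assumption of this
    sketch), so that background air does not dominate the statistics.
    """
    vol = np.asarray(volume, dtype=np.float64)
    region = vol[mask] if mask is not None else vol
    std = region.std()
    if std == 0:
        raise ValueError("constant volume: z-score is undefined")
    return ((vol - region.mean()) / std).astype(np.float32)


def percentile_scale(volume, low=1.0, high=99.0):
    """Clip to the [low, high] percentiles, then scale to [0, 1]."""
    vol = np.asarray(volume, dtype=np.float64)
    p_low, p_high = np.percentile(vol, [low, high])
    if p_high <= p_low:
        raise ValueError("percentile range is empty")
    clipped = np.clip(vol, p_low, p_high)
    return ((clipped - p_low) / (p_high - p_low)).astype(np.float32)


# Synthetic volume standing in for one MRI acquisition
rng = np.random.default_rng(0)
demo_volume = rng.normal(500.0, 100.0, size=(4, 8, 8))
demo_z = zscore_normalize(demo_volume)
demo_scaled = percentile_scale(demo_volume)
```

Both functions compute their statistics from the volume being normalized, which is the point: nothing is carried over from other patients, scanners, or sessions.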