---
license: cc-by-nc-sa-4.0
library_name: keras
pipeline_tag: image-classification
language: en
tags:
- medical-imaging
- ct
- lung-cancer
- efficientnet-b0
- transfer-learning
- grad-cam
model-index:
- name: EfficientNetB0 Lung CT Classifier (4-class)
  results:
  - task:
      type: image-classification
      name: Image Classification
    dataset:
      name: Hany Lung Cancer CT (derived; cleaned)
      type: custom
      split: test
    metrics:
    - type: accuracy
      value: TODO:0.XX
    - type: precision
      value: TODO:0.XX
    - type: recall
      value: TODO:0.XX
    - type: f1
      value: TODO:0.XX
---

## Attribution

**Original Source:**
> Hany H. (2020). *Chest CT-Scan Images Dataset*. Kaggle.
> [https://www.kaggle.com/datasets/hanyhossam/chest-ctscan-images-dataset](https://www.kaggle.com/datasets/hanyhossam/chest-ctscan-images-dataset)

**Original License:**
> Database: Open Data Commons Open Database License (ODbL v1.0)
> [https://opendatacommons.org/licenses/odbl/1-0/](https://opendatacommons.org/licenses/odbl/1-0/)

**Derived Dataset Author:**
> Ashley Blackwell (2025). *Chest CT-Scan Images (Cleaned, Derived from Hany et al.)*. Hugging Face Datasets.
> https://huggingface.co/datasets/ashleyblackwell/lung-ct-cleaned-hany

---

## Cleaning & Preprocessing Summary

The original dataset was processed and curated to ensure **consistency, quality, and reproducibility** for use in deep-learning experiments (e.g., the *EfficientNet-B0 Lung CT Classifier*).

### Steps Performed
1. **Integrity Checks:** Removed corrupted or unreadable `.jpg` and `.png` files.
2. **Resolution Standardization:** Resized all images to `224 × 224` pixels.
3. **Color Normalization:** Converted grayscale scans to 3-channel RGB, giving a final shape of `224 × 224 × 3`.
4. **Class Organization:** Verified folder structure for four diagnostic categories:
   - Adenocarcinoma
   - Large-Cell Carcinoma
   - Squamous-Cell Carcinoma
   - Normal
5. **Stratified Splits:**
   - Train: 70%
   - Validation: 20%
   - Test: 10%
6. **Metadata File:** Generated `metadata.csv` containing filename, class label, and original resolution for traceability.

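
The stratified 70/20/10 split in step 5 can be sketched in plain Python as follows. This is a minimal illustration, not the script used to build this dataset: the function name `stratified_split`, the seed value, the rounding rule, and the toy label list are all assumptions made for the example. Shuffling within each class before slicing is what keeps the class proportions roughly equal across splits.

```python
import random
from collections import defaultdict


def stratified_split(labels, ratios=(0.7, 0.2, 0.1), seed=42):
    """Split sample indices into train/val/test, preserving class proportions.

    `labels` holds one class label per sample; three lists of indices are returned.
    """
    rng = random.Random(seed)  # fixed seed (assumed value) for repeatable splits
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_train = int(round(ratios[0] * len(indices)))
        n_val = int(round(ratios[1] * len(indices)))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # the remainder goes to test
    return train, val, test


# Toy input: four classes with 50 samples each (made-up counts, not the real dataset).
class_names = ("adenocarcinoma", "large_cell", "squamous_cell", "normal")
labels = [name for name in class_names for _ in range(50)]
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```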

---

## Dataset Overview

| Split | Approx. Images | Notes |
|:------|---------------:|:------|
| Train | ~TODO | Stratified by class |
| Validation | ~TODO | For hyperparameter tuning |
| Test | ~TODO | Final evaluation set |
| **Total** | ~TODO | All cleaned and standardized |

---

## Intended Use

- **Purpose:**
  Designed for research, coursework, and educational demonstrations in medical image classification, model interpretability (Grad-CAM), and reproducible machine learning pipelines.

- **Out of Scope:**
  This dataset **must not** be used for clinical diagnosis, treatment decisions, or commercial medical software development.

---

## Legal & License Information

### License
This dataset is distributed under the **Open Data Commons Open Database License (ODbL v1.0)**.
You are free to:
- **Share:** Copy, distribute, and use the database.
- **Create:** Produce works from the database.
- **Adapt:** Modify, transform, and build upon the database.

Full legal text:
[https://opendatacommons.org/licenses/odbl/1-0/](https://opendatacommons.org/licenses/odbl/1-0/)

---

## Scope

- **Intended**: Research, UMGC coursework, model-interpretability demos (Grad-CAM), benchmarking.
- **Out-of-scope**: Clinical diagnosis, patient triage, or any safety-critical application.

## Model Architecture

- **Backbone**: EfficientNet-B0 (ImageNet-initialized, fine-tuned)
- **Input size**: 224 × 224 × 3
- **Head**: GlobalAveragePooling → Dropout (TODO: rate) → Dense(4, softmax)
- **Loss**: Categorical Cross-Entropy
- **Optimizer**: TODO (e.g., Adam, lr = 1e-4 with decay)
- **Epochs / Batch size**: TODO
- **Class labels (index)**:
  - 0: Adenocarcinoma
  - 1: Large-Cell Carcinoma
  - 2: Squamous-Cell Carcinoma
  - 3: Normal

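
The index-to-label mapping above is what a `class_map.json` file would hold. The sketch below shows one plausible layout and a round trip through JSON. The exact contents of the published file are an assumption here, as are the helper names `save_class_map` and `load_class_map`. The point it illustrates is that JSON object keys are always strings, so a predicted index has to be converted with `str()` before lookup.

```python
import json
import os
import tempfile

# Index-to-label mapping copied from the class list above.
CLASS_MAP = {
    0: "Adenocarcinoma",
    1: "Large-Cell Carcinoma",
    2: "Squamous-Cell Carcinoma",
    3: "Normal",
}


def save_class_map(path, class_map=CLASS_MAP):
    """Write the mapping as JSON; integer keys are serialized as strings."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(class_map, f, indent=2)


def load_class_map(path):
    """Read the mapping back; keys come back as strings such as "0"."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Round trip through a temporary directory (hypothetical location).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "class_map.json")
    save_class_map(path)
    idx_to_label = load_class_map(path)

predicted_index = 3  # stand-in for the argmax of the softmax output
print(idx_to_label[str(predicted_index)])
```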

---

## Data & Preprocessing
- **Source**: Derived from the Hany Lung Cancer CT Scan dataset (Kaggle). Corrupted and irregular-resolution images were removed, and all remaining images were standardized to 224 × 224.
- **Split**: Train/Val/Test = 70/20/10 (stratified).
- **Transforms**: Resize → RGB conversion → scale pixel values to [0, 1] or apply `preprocess_input`. Use whichever matches the preprocessing applied during training.
- **Artifacts logged**: Confusion matrix, classification report, Grad-CAM overlays.
- **Attribution**: Credit the original dataset per its license when sharing or publishing.

---

## Evaluation
- **Test set size**: TODO:N
- **Metrics**: Accuracy, plus macro-averaged Precision, Recall, and F1

| Class | Precision | Recall | F1 | Support |
|:------|----------:|-------:|---:|--------:|
| Adenocarcinoma | TODO | TODO | TODO | TODO |
| Large-Cell | TODO | TODO | TODO | TODO |
| Squamous | TODO | TODO | TODO | TODO |
| Normal | TODO | TODO | TODO | TODO |
| Macro Avg | TODO | TODO | TODO | N |

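
For readers filling in the table, the per-class and macro-averaged values can be derived from a confusion matrix as sketched below. This is a plain-Python illustration under stated assumptions: the function names are hypothetical, and the 4 × 4 matrix is made up for the example, not a result from this model.

```python
def per_class_metrics(confusion):
    """Compute precision, recall, F1, and support for each class.

    `confusion[i][j]` counts samples whose true class is i and predicted class is j.
    """
    n = len(confusion)
    rows = []
    for k in range(n):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp  # predicted k, truly another class
        fn = sum(confusion[k]) - tp                       # truly k, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        rows.append({"precision": precision, "recall": recall, "f1": f1, "support": tp + fn})
    return rows


def macro_average(rows, key):
    """Unweighted mean of one metric across classes."""
    return sum(row[key] for row in rows) / len(rows)


def accuracy(confusion):
    """Fraction of all samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    return correct / total


# Made-up confusion matrix, for illustration only.
confusion = [
    [8, 1, 1, 0],
    [1, 7, 2, 0],
    [0, 2, 8, 0],
    [0, 0, 0, 10],
]
rows = per_class_metrics(confusion)
print(f"accuracy={accuracy(confusion):.3f}")
print(f"macro F1={macro_average(rows, 'f1'):.3f}")
```

Macro averaging weights every class equally regardless of support, so a weak minority class lowers the score as much as a weak majority class would.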
## Suggested Environment
- `tensorflow==2.15.0`
- `keras==2.15.0`
- `huggingface_hub>=0.23.0`
- `numpy>=1.24`

---

## Explainability (Grad-CAM)
- **Last conv layer**: `top_conv` for EfficientNet-B0.
- **Tip**: Use Grad-CAM to overlay heatmaps and validate that the model focuses on pathologically relevant regions.

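
The core Grad-CAM computation can be sketched in NumPy as below. It assumes the activations of the last conv layer and the gradients of the target class score with respect to them have already been extracted (in TensorFlow this is usually done with `tf.GradientTape`, which is not shown). The function name `grad_cam_heatmap` is hypothetical, and the random arrays are stand-ins shaped like a 7 × 7 × 1280 feature map, which is the expected `top_conv` output for a 224 × 224 input.

```python
import numpy as np


def grad_cam_heatmap(activations, gradients):
    """Combine conv activations and class-score gradients into a Grad-CAM map.

    activations: (H, W, C) output of the last conv layer for one image.
    gradients:   (H, W, C) gradient of the target class score w.r.t. those activations.
    Returns an (H, W) heatmap scaled to [0, 1].
    """
    weights = gradients.mean(axis=(0, 1))                       # (C,) per-channel importance
    cam = np.tensordot(activations, weights, axes=([2], [0]))   # (H, W) weighted channel sum
    cam = np.maximum(cam, 0.0)                                  # ReLU keeps positive evidence only
    peak = cam.max()
    return cam / peak if peak > 0 else cam


# Random stand-ins for real activations and gradients (assumed shapes).
rng = np.random.default_rng(0)
activations = rng.random((7, 7, 1280))
gradients = rng.normal(size=(7, 7, 1280))
heatmap = grad_cam_heatmap(activations, gradients)
print(heatmap.shape)
```

To produce an overlay, the heatmap is typically resized to the input resolution and blended with the original slice.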

## Limitations, Bias & Ethical Considerations
- **Domain shift**: CT protocols and scanners vary; this may affect generalization.
- **Label noise**: Community datasets can contain mislabels.
- **Generalization**: The model is not clinically validated.
- **Mitigation**: Use Grad-CAM audits and external validation before any applied use.

---

## Training & Reproducibility
- **Hardware**: TODO (e.g., NVIDIA T4 / A100 / local GPU).
- **Training time**: TODO
- **Seed / Determinism**: TODO
- **Reproduction steps**: TODO (link to notebook or script if available).

## License
- **Model weights & code**: CC BY-NC-SA 4.0 (non-commercial, share-alike, with attribution).
- **Dataset (derived)**: Follow the original dataset’s license terms and provide credit to the creator.

## Citation
If you use this model, please cite:

Blackwell, A. (2025). EfficientNet-B0 Lung CT Classifier (4-class) [Computer software]. Hugging Face. https://huggingface.co/TODO

```bibtex
@software{blackwell2025lungct,
  author = {Blackwell, Ashley},
  title = {EfficientNet-B0 Lung CT Classifier (4-class)},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/TODO}
}
```

## 👩‍🏫 Maintainers
Ashley Blackwell. **Questions and feedback welcome via the Hugging Face Discussions tab.**

## 🗒 Changelog
- 2025-10-06: Initial public release (.keras weights), added model card, class map, and metric placeholders.

---

## Citation

If you use this dataset, please cite both the original source and the derived version:

**Original dataset:**
> Hany H. (2020). *Chest CT-Scan Images Dataset*. Kaggle.
> https://www.kaggle.com/datasets/hanyhossam/chest-ctscan-images-dataset

**Derived version:**
> Blackwell, A. (2025). *Chest CT-Scan Images (Cleaned, Derived from Hany et al.)* [Dataset]. Hugging Face.
> https://huggingface.co/datasets/ashleyblackwell/lung-ct-cleaned-hany

```bibtex
@dataset{hany2020chestct,
  author = {Hany, H.},
  title = {Chest CT-Scan Images Dataset},
  year = {2020},
  publisher = {Kaggle},
  url = {https://www.kaggle.com/datasets/hanyhossam/chest-ctscan-images-dataset}
}

@dataset{blackwell2025lungctcleaned,
  author = {Blackwell, Ashley},
  title = {Chest CT-Scan Images (Cleaned, Derived from Hany et al.)},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/datasets/ashleyblackwell/lung-ct-cleaned-hany}
}
```

---

## How to Use (Load & Inference)

**Option A: Download from the Hub**

```python
from huggingface_hub import hf_hub_download
import json, numpy as np, tensorflow as tf
from tensorflow.keras.preprocessing import image

REPO_ID = "TODO:your-username/efficientnetb0-lung-ct-4class"

# Fetch the weights and the index-to-label map from the Hub
model_path = hf_hub_download(repo_id=REPO_ID, filename="model.keras")
class_map_path = hf_hub_download(repo_id=REPO_ID, filename="class_map.json")

model = tf.keras.models.load_model(model_path, compile=False)
with open(class_map_path) as f:
    idx_to_label = json.load(f)  # JSON keys are strings: "0", "1", ...

def preprocess(img_path):
    # load_img converts to RGB by default and resizes to the model's input size
    img = image.load_img(img_path, target_size=(224, 224))
    x = image.img_to_array(img)
    x = np.expand_dims(x, 0)  # add the batch dimension
    # Scale to [0, 1]; this must match the preprocessing used during training.
    # Note: tf.keras.applications.efficientnet.preprocess_input is a pass-through
    # in Keras (the model rescales internally), so it is not equivalent to x / 255.
    x = x / 255.0
    return x

x = preprocess("path/to/ct_slice.png")
probs = model.predict(x, verbose=0)[0]
for i, p in enumerate(probs):
    print(f"{idx_to_label[str(i)]}: {p:.3f}")
print("Predicted:", idx_to_label[str(int(np.argmax(probs)))])
```

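**Option B: Snapshot Download (Local Folder)**

```python
import os
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="TODO:your-username/efficientnetb0-lung-ct-4class")
# model.keras and class_map.json are now available inside local_dir
model_path = os.path.join(local_dir, "model.keras")
class_map_path = os.path.join(local_dir, "class_map.json")
```

Load the model and class map from these paths exactly as in Option A.

---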