---
license: cc-by-nc-sa-4.0
library_name: keras
pipeline_tag: image-classification
language: en
tags:
- medical-imaging
- ct
- lung-cancer
- efficientnet-b0
- transfer-learning
- grad-cam
model-index:
- name: EfficientNetB0 Lung CT Classifier (4-class)
  results:
  - task:
      type: image-classification
      name: Image Classification
    dataset:
      name: Hany Lung Cancer CT (derived; cleaned)
      type: custom
      split: test
    metrics:
    - type: accuracy
      value: TODO:0.XX
    - type: precision
      value: TODO:0.XX
    - type: recall
      value: TODO:0.XX
    - type: f1
      value: TODO:0.XX
---

	## Attribution

	Original Source:
	> Hany H. (2020). Chest CT-Scan Images Dataset. Kaggle.
	> [https://www.kaggle.com/datasets/hanyhossam/chest-ctscan-images-dataset](https://www.kaggle.com/datasets/hanyhossam/chest-ctscan-images-dataset)

	Original License:
> Database: Open Data Commons Open Database License (ODbL v1.0)
	> [https://opendatacommons.org/licenses/odbl/1-0/](https://opendatacommons.org/licenses/odbl/1-0/)

	Derived Dataset Author:
	> Ashley Blackwell (2025). Chest CT-Scan Images (Cleaned, Derived from Hany et al.). Hugging Face Datasets.
	> https://huggingface.co/datasets/ashleyblackwell/lung-ct-cleaned-hany

	---

	## Cleaning & Preprocessing Summary

The original dataset was processed and curated to ensure consistency, quality, and reproducibility for use in deep-learning experiments (e.g., the EfficientNet-B0 Lung CT Classifier).

	### Steps Performed
1. Integrity Checks: Removed corrupted or unreadable `.jpg` and `.png` files.
2. Resolution Standardization: Resized all images to `224 × 224` pixels (`224 × 224 × 3` arrays after RGB conversion).
3. Channel Conversion: Converted grayscale scans to 3-channel RGB.
4. Class Organization: Verified the folder structure for four diagnostic categories:
   - Adenocarcinoma
   - Large-Cell Carcinoma
   - Squamous-Cell Carcinoma
   - Normal
5. Stratified Splits:
   - Train: 70%
   - Validation: 20%
   - Test: 10%
6. Metadata File: Generated `metadata.csv` containing the filename, class label, and original resolution of each image for traceability.
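
The stratified split and metadata steps above can be sketched in plain Python. This is a minimal illustration, not the script that built the dataset: the function names `stratified_split` and `write_metadata`, the default seed, the CSV column names, and the extra `split` column are all hypothetical, and the original image sizes are assumed to have been read beforehand (e.g., with an image library).

```python
import csv
import random
from collections import defaultdict

# Fractions from the card; whatever is left after train + val goes to test (~10%).
TRAIN_FRAC, VAL_FRAC = 0.70, 0.20

def stratified_split(items, seed=42):
    """Map each filename to "train", "val" or "test", splitting class by class.

    items: list of (filename, label) pairs. Splitting within each class keeps
    the class proportions roughly equal across the three splits.
    """
    by_class = defaultdict(list)
    for filename, label in items:
        by_class[label].append(filename)

    rng = random.Random(seed)  # local RNG so the split is reproducible
    assignment = {}
    for label in sorted(by_class):
        files = sorted(by_class[label])  # fixed order before shuffling
        rng.shuffle(files)
        n_train = round(len(files) * TRAIN_FRAC)
        n_val = round(len(files) * VAL_FRAC)
        for i, filename in enumerate(files):
            if i < n_train:
                assignment[filename] = "train"
            elif i < n_train + n_val:
                assignment[filename] = "val"
            else:
                assignment[filename] = "test"
    return assignment

def write_metadata(path, items, sizes, assignment):
    """Write one CSV row per image: filename, label, original size, split.

    sizes: dict mapping filename -> (width, height) of the original image.
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["filename", "label", "orig_width", "orig_height", "split"])
        for filename, label in items:
            width, height = sizes[filename]
            writer.writerow([filename, label, width, height, assignment[filename]])
```

Shuffling with a dedicated `random.Random(seed)` instead of the global RNG keeps the split stable even if other code draws random numbers in between.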

	---

	## Dataset Overview

| Split | Approx. Images | Notes |
|:------|---------------:|:------|
| Train | ~TODO | Stratified by class |
| Validation | ~TODO | For hyperparameter tuning |
| Test | ~TODO | Final evaluation set |
| Total | ~TODO | All cleaned and standardized |

	---

	## Intended Use

	- Purpose:
	Designed for research, coursework, and educational demonstrations in medical image classification, model interpretability (Grad-CAM), and reproducible machine learning pipelines.

	- Out of Scope:
	This dataset must not be used for clinical diagnosis, treatment decisions, or commercial medical software development.

	---

	## Legal & License Information

	### License
	This dataset is distributed under the Open Data Commons Open Database License (ODbL v1.0).
	You are free to:
	- Share: Copy, distribute, and use the database.
	- Create: Produce works from the database.
	- Adapt: Modify, transform, and build upon the database.

	Full legal text:
	[https://opendatacommons.org/licenses/odbl/1-0/](https://opendatacommons.org/licenses/odbl/1-0/)

	---

	## Scope

	- Intended:
	Research, UMGC coursework, model-interpretability demos (Grad-CAM), benchmarking.

- Out-of-scope:
Clinical diagnosis, patient triage, or any safety-critical application.

## Model Architecture
- Backbone: EfficientNet-B0 (ImageNet-initialized, fine-tuned)
- Input size: 224 × 224 × 3
- Head: GlobalAveragePooling → Dropout (TODO: rate) → Dense(4, softmax)
- Loss: Categorical Cross-Entropy
- Optimizer: TODO (e.g., Adam, lr = 1e-4 with decay)
- Epochs / Batch size: TODO
- Class labels (index):
  - 0: Adenocarcinoma
  - 1: Large-Cell Carcinoma
  - 2: Squamous-Cell Carcinoma
  - 3: Normal
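
To make the head concrete, here is a NumPy sketch of what GlobalAveragePooling → Dense(4, softmax) computes on EfficientNet-B0's final feature map (7 × 7 × 1280 for a 224 × 224 input), plus categorical cross-entropy for a single example. The weights are random placeholders, not the trained model's, and dropout is omitted because Keras disables it at inference. The helper names are hypothetical.

```python
import numpy as np

# Class index order as listed on this card.
CLASS_NAMES = ["Adenocarcinoma", "Large-Cell Carcinoma", "Squamous-Cell Carcinoma", "Normal"]

def global_average_pool(feature_map):
    """(H, W, C) feature map -> (C,) vector, as GlobalAveragePooling2D computes."""
    return feature_map.mean(axis=(0, 1))

def softmax(logits):
    z = logits - logits.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def head_forward(feature_map, W, b):
    """GAP -> Dense(4, softmax). Dropout is skipped: it is inactive at inference."""
    pooled = global_average_pool(feature_map)
    return softmax(pooled @ W + b)

def categorical_cross_entropy(one_hot, probs, eps=1e-7):
    """Loss for one example: -sum(y * log(p)), with p clipped away from 0."""
    return float(-np.sum(one_hot * np.log(np.clip(probs, eps, 1.0))))

# Random stand-ins for the backbone output and the trained Dense weights.
rng = np.random.default_rng(0)
features = rng.standard_normal((7, 7, 1280))
W = rng.standard_normal((1280, 4)) * 0.01
b = np.zeros(4)
probs = head_forward(features, W, b)
print(CLASS_NAMES[int(np.argmax(probs))], probs.round(3))
```

Because the head averages away the spatial dimensions, the classifier itself has only 1280 × 4 + 4 trainable parameters; almost all capacity lives in the fine-tuned backbone.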

	---

	## Data & Preprocessing
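- Source: Derived from the Hany Lung Cancer CT Scan dataset (Kaggle). Corrupted and irregular-resolution images were removed, and all remaining images were standardized to 224 × 224.
- Split: Train/Val/Test = 70/20/10 (stratified).
- Transforms: Resize → RGB conversion → pixel scaling. Use the same scaling at inference as at training: either divide by 255 to get [0, 1], or pass raw [0, 255] pixels. Keras' EfficientNet models include their own rescaling layer, so `tf.keras.applications.efficientnet.preprocess_input` is a pass-through that expects [0, 255] input.
- Artifacts logged: Confusion matrix, classification report, Grad-CAM overlays.
- Attribution: Credit the original dataset per its license when sharing or publishing.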
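
The two scaling choices differ by a factor of 255, which is easy to get wrong at inference time. A minimal NumPy sketch of the channel conversion and both scaling modes follows; the function names and the `mode` strings are hypothetical, and resizing is assumed to have happened already (e.g., via `load_img(..., target_size=(224, 224))`).

```python
import numpy as np

def gray_to_rgb(gray):
    """Replicate a single-channel (H, W) or (H, W, 1) slice into (H, W, 3)."""
    if gray.ndim == 3:
        gray = gray[..., 0]
    return np.stack([gray, gray, gray], axis=-1)

def scale_pixels(rgb, mode):
    """Scale 0-255 pixel values for the model.

    mode="unit":  divide by 255 -> [0, 1]; only correct if training did the same.
    mode="keras": keep [0, 255]; Keras EfficientNet rescales inside the model,
                  so efficientnet.preprocess_input would leave these unchanged.
    """
    x = rgb.astype("float32")
    if mode == "unit":
        return x / 255.0
    if mode == "keras":
        return x
    raise ValueError(f"unknown mode: {mode!r}")
```

Feeding [0, 1] inputs to a model trained on [0, 255] (or the reverse) does not raise an error; it silently degrades predictions, so the chosen mode is worth recording alongside the weights.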

	---

	## Evaluation
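- Test set size: TODO:N
- Metrics: Accuracy, plus macro-averaged Precision, Recall, and F1

| Class | Precision | Recall | F1 | Support |
|:------|----------:|-------:|---:|--------:|
| Adenocarcinoma | TODO | TODO | TODO | TODO |
| Large-Cell | TODO | TODO | TODO | TODO |
| Squamous | TODO | TODO | TODO | TODO |
| Normal | TODO | TODO | TODO | TODO |
| Macro Avg | TODO | TODO | TODO | N |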
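
The per-class and macro rows of the table can be derived from a confusion matrix. The sketch below assumes rows are true classes and columns are predictions, in the class-index order used on this card; macro averages are unweighted means over the four classes, matching scikit-learn's `average="macro"`. The function name is hypothetical, and the toy counts are illustrative only, not this model's results.

```python
import numpy as np

def per_class_report(cm):
    """Precision, recall, F1, support per class, plus accuracy and macro averages.

    cm[i, j] counts test images whose true class is i and predicted class is j.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    predicted = cm.sum(axis=0)  # column sums: how often each class was predicted
    support = cm.sum(axis=1)    # row sums: true examples per class
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, support, out=np.zeros_like(tp), where=support > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    accuracy = tp.sum() / cm.sum()
    macro = (precision.mean(), recall.mean(), f1.mean())  # unweighted over classes
    return precision, recall, f1, support.astype(int), accuracy, macro

# Toy counts for illustration only (not this model's test results).
toy_cm = [[8, 1, 1, 0], [0, 5, 0, 0], [2, 0, 6, 0], [0, 0, 0, 7]]
precision, recall, f1, support, accuracy, macro = per_class_report(toy_cm)
```

Accuracy is reported on its own because it is a single ratio over all test images; averaging it per class would give balanced accuracy instead, which equals macro recall.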

	## Suggested Environment
- `tensorflow==2.15.0`
- `keras==2.15.0`
- `huggingface_hub>=0.23.0`
- `numpy>=1.24`

	---

	## Explainability (Grad-CAM)
- Last conv layer: `top_conv` (EfficientNet-B0).
- Tip: Overlay Grad-CAM heatmaps on the input slices to check that the model attends to pathologically relevant regions.
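
Grad-CAM weights each channel of the last conv layer by the spatially averaged gradient of the target class score, sums the weighted channels, and applies a ReLU. The NumPy sketch below covers only that combination step: it assumes the `top_conv` activations and their gradients were already computed (in TensorFlow, typically with `tf.GradientTape`), and the function name is hypothetical.

```python
import numpy as np

def grad_cam_heatmap(activations, gradients):
    """Combine conv activations and class-score gradients into a Grad-CAM map.

    activations: (H, W, K) output of the last conv layer (e.g. top_conv)
    gradients:   (H, W, K) gradient of the target class score w.r.t. activations
    """
    weights = gradients.mean(axis=(0, 1))  # alpha_k: spatially averaged gradients
    cam = activations @ weights            # sum_k alpha_k * A_k -> (H, W)
    cam = np.maximum(cam, 0.0)             # ReLU keeps only positive evidence
    peak = cam.max()
    return cam / peak if peak > 0 else cam  # scale to [0, 1] for overlaying
```

For EfficientNet-B0 at 224 × 224 the map is 7 × 7, so it is usually upsampled to the input size before being blended over the CT slice.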

## Limitations, Bias & Ethical Considerations
- Domain shift: CT protocols and scanners vary, which may limit generalization to scans from other sites.
- Label noise: Community datasets can contain mislabeled images.
- Generalization: The model is not clinically validated.
- Mitigation: Use Grad-CAM audits and external validation before any applied use.

	---

	## Training & Reproducibility
- Hardware: TODO (e.g., NVIDIA T4 / A100 / local GPU).
- Training time: TODO
- Seed / Determinism: TODO
- Reproduction steps: TODO (link to notebook or script if available).
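
One possible way to pin down the Seed / Determinism row is sketched below. The function name and any seed value are hypothetical; `tf.keras.utils.set_random_seed` and `tf.config.experimental.enable_op_determinism` are real TensorFlow APIs available in 2.15, but whether the original training run used them is not stated on this card.

```python
import random

import numpy as np

def set_global_seed(seed):
    """Seed Python and NumPy, and TensorFlow too if it is installed."""
    random.seed(seed)
    np.random.seed(seed)
    try:
        import tensorflow as tf
    except ImportError:
        return
    tf.keras.utils.set_random_seed(seed)  # seeds Python, NumPy and TF together
    tf.config.experimental.enable_op_determinism()  # deterministic kernels; can be slower
```

Call it once at the top of the training script, before building the model or any `tf.data` pipeline, so weight initialization and shuffling see the same seed.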

## License
- Model weights & code: CC BY-NC-SA 4.0 (non-commercial, share-alike, with attribution).
- Dataset (derived): ODbL v1.0, following the original dataset's license terms; credit the original creator.

## Citation (Model)
If you use this model, please cite:
> Blackwell, A. (2025). EfficientNet-B0 Lung CT Classifier (4-class) [Computer software]. Hugging Face. https://huggingface.co/TODO

    @software{blackwell2025lungct,
      author    = {Blackwell, Ashley},
      title     = {EfficientNet-B0 Lung CT Classifier (4-class)},
      year      = {2025},
      publisher = {Hugging Face},
      url       = {https://huggingface.co/TODO}
    }

## 👩‍🏫 Maintainers
Ashley Blackwell: questions and feedback are welcome via the Hugging Face Discussions tab.

## 🗒 Changelog
- 2025-10-06: Initial public release (`.keras` weights); added model card, class map, and metric placeholders.

	---


## Citation (Dataset)

	If you use this dataset, please cite both the original source and the derived version:

	Original dataset:
	> Hany H. (2020). Chest CT-Scan Images Dataset. Kaggle.
	> https://www.kaggle.com/datasets/hanyhossam/chest-ctscan-images-dataset

	Derived version:
	> Blackwell, A. (2025). Chest CT-Scan Images (Cleaned, Derived from Hany et al.) [Dataset]. Hugging Face.
	> https://huggingface.co/datasets/ashleyblackwell/lung-ct-cleaned-hany

	```bibtex
@dataset{hany2020chestct,
  author    = {Hany, H.},
  title     = {Chest CT-Scan Images Dataset},
  year      = {2020},
  publisher = {Kaggle},
  url       = {https://www.kaggle.com/datasets/hanyhossam/chest-ctscan-images-dataset}
}

@dataset{blackwell2025lungctcleaned,
  author    = {Blackwell, Ashley},
  title     = {Chest CT-Scan Images (Cleaned, Derived from Hany et al.)},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/datasets/ashleyblackwell/lung-ct-cleaned-hany}
}
```

	---

	## How to Use (Load & Inference)
Option A: Download from the Hub

    import json

    import numpy as np
    import tensorflow as tf
    from huggingface_hub import hf_hub_download
    from tensorflow.keras.preprocessing import image

    REPO_ID = "TODO:your-username/efficientnetb0-lung-ct-4class"

    # Fetch the weights and the index -> label map (cached locally after the first call).
    model_path = hf_hub_download(repo_id=REPO_ID, filename="model.keras")
    class_map_path = hf_hub_download(repo_id=REPO_ID, filename="class_map.json")

    model = tf.keras.models.load_model(model_path, compile=False)
    with open(class_map_path) as f:
        idx_to_label = json.load(f)  # JSON keys are strings: "0" .. "3"

    def preprocess(img_path):
        # load_img returns an RGB image by default, so grayscale slices become 3-channel.
        img = image.load_img(img_path, target_size=(224, 224))
        x = image.img_to_array(img)  # float32, values in [0, 255]
        x = np.expand_dims(x, 0)     # add batch dimension -> (1, 224, 224, 3)
        # Match the training-time scaling. Keras EfficientNet rescales internally
        # (preprocess_input is a pass-through), so drop this line if the model
        # was trained on raw [0, 255] pixels.
        x = x / 255.0
        return x

    x = preprocess("path/to/ct_slice.png")
    probs = model.predict(x, verbose=0)[0]
    for i, p in enumerate(probs):
        print(f"{idx_to_label[str(i)]}: {p:.3f}")
    print("Predicted:", idx_to_label[str(int(np.argmax(probs)))])

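Option B: Snapshot Download (Local Folder)

    import os
    from huggingface_hub import snapshot_download
    local_dir = snapshot_download(repo_id="TODO:your-username/efficientnetb0-lung-ct-4class")
    # Point at ./model.keras and ./class_map.json inside local_dir, then load them as in Option A.
    model_path = os.path.join(local_dir, "model.keras")
    class_map_path = os.path.join(local_dir, "class_map.json")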

	---