Introduction
Deep Learning has shown remarkable potential in medical diagnostics, particularly in detecting lung cancer early through medical imaging analysis. However, the lack of transparency and interpretability of deep learning models has raised concerns about their reliability in critical medical scenarios. To address this challenge, Explainable Deep Learning (xDL) techniques are being developed to provide clear and understandable explanations for AI predictions. xDL aims to bridge the gap between the capabilities of complex neural networks and the understanding of human experts. By offering transparent insights into how the AI arrives at its diagnostic decisions, xDL promotes a more collaborative human-machine interaction. In the context of lung cancer diagnosis, xDL allows medical practitioners to comprehend the rationale behind the AI’s conclusions based on analyzed imaging data. The integration of xDL in the lung cancer diagnosis process promises to revolutionize the interaction between medical professionals and AI systems. Instead of treating deep learning models as opaque black boxes, xDL empowers clinicians to validate the AI’s decision-making processes, building trust in its diagnostic capabilities and providing valuable insights into relevant patterns and features for lung cancer detection.
Lung cancer
Lung cancer is a dangerous form of cancer that originates in the lung cells and is responsible for a significant number of global cancer-related deaths. This condition arises when normal lung cells experience genetic mutations, leading to uncontrolled growth and the formation of tumors. Consequently, lung function is affected, causing symptoms like persistent cough, chest pain, shortness of breath, and coughing up blood. There are two main types of lung cancer: non-small cell lung cancer (NSCLC) and small cell lung cancer (SCLC), with NSCLC being more prevalent. Smoking is the primary risk factor for lung cancer, though it can also occur in non-smokers due to various environmental exposures. Early detection is critical for improving outcomes, and diagnosis involves imaging tests and biopsies. Treatment options include surgery, radiation therapy, chemotherapy, targeted therapy, and immunotherapy, tailored to individual factors such as cancer type and stage, overall health, and patient preferences.
HSA Kit
HSA Kit is a deep learning software package that analyzes lung cancer in high-resolution images of tissue.
HSA Kit detects lung cancer based on the two most important structures:
- stroma (marked in red)
- tumor (marked in green)
Whole slide images (WSI)
Whole Slide Images (WSI) are complete digital scans of tissue slides used in medical imaging and pathology. These images provide highly detailed representations of tissue samples at different magnifications, resembling traditional microscopy. WSI files can be quite large, but they offer the advantage of remote access, enabling easy collaboration and analysis. WSI finds applications in pathology diagnosis, research, education, and quality assurance. Despite some challenges like file management and data security, the use of WSI in healthcare is on the rise due to its potential benefits.
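As an illustration of working with such scans programmatically, the following minimal Python sketch uses the open-source OpenSlide library, which is an assumption for illustration rather than a tool named in this work; the file name is a placeholder.

```python
# Minimal sketch of reading a region from a whole slide image.
# Uses the open-source OpenSlide library (an illustrative choice);
# "sample_slide.svs" is a placeholder file name.
import openslide

slide = openslide.OpenSlide("sample_slide.svs")
print(slide.dimensions)    # full-resolution (width, height) in pixels
print(slide.level_count)   # number of magnification levels in the pyramid

# Read a 512x512 tile at the highest resolution (level 0),
# starting from the top-left corner of the slide.
tile = slide.read_region(location=(0, 0), level=0, size=(512, 512))
tile = tile.convert("RGB")  # read_region returns an RGBA PIL image
slide.close()
```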
Digital representation of images
Digital representation of images involves converting visual information, including colors and intensity, into a numerical format that can be processed and interpreted by computers and electronic devices. In this process, images are broken down into small square units called pixels, with each pixel assigned specific color and brightness values. These values are typically represented using the RGB color model, where combinations of red, green, and blue intensities create a wide range of colors. The resulting digital image is essentially a two-dimensional grid of pixels, with higher resolution images containing more pixels and finer detail. Various image file formats are used to store digital images, employing compression techniques to reduce file size while maintaining image quality. Once in digital form, images can be easily edited and manipulated using software, allowing for operations like resizing, color adjustments, and filters to alter the image’s appearance.
We have three types of slides:
- HE slides
- KI67 slides
- CK slides
The difference between the slides lies in their staining; each also requires a different exposure time and a different lens for scanning.
HE slides
HE slides in biology pertain to tissue samples that have been treated with Hematoxylin and Eosin staining techniques. This widely used method in histology and pathology involves coloring tissue components differently to facilitate microscopic examination. Through this technique, nuclei are stained in shades of blue-purple using hematoxylin, while eosin imparts various shades of pink to cytoplasm and extracellular structures. This staining process is crucial for identifying cellular structures and abnormalities within tissues, aiding in the diagnosis of medical conditions.
KI67 slides
KI67 slides in biology refer to tissue samples that have been prepared and stained for the Ki-67 protein. This protein is a marker of cell proliferation, and its presence indicates actively dividing cells within the tissue. Staining for Ki-67 involves using specific antibodies that bind to the protein, allowing researchers to visualize and quantify the number of proliferating cells under a microscope. This technique is commonly used in research and medical settings to assess the growth rate and activity of cells in various tissues, helping to understand processes like tissue regeneration, cancer growth, and other cellular dynamics.
CK slides
CK slides in biology involve tissue samples that have been subjected to staining techniques targeting Cytokeratins (CKs), a group of proteins found in cells, especially epithelial cells. These proteins play a key role in maintaining the structural integrity of cells and are particularly abundant in tissues like skin, glands, and lining of organs. Staining for CKs employs specific antibodies that bind to these proteins, allowing researchers to identify and visualize epithelial cells and their patterns under a microscope. This staining method is widely used in histology and pathology to classify and study various types of tumors, as different CKs can indicate the origin and characteristics of cancerous cells.
Digital representation of images involves converting a 2D image into a grid of pixels, forming a matrix with each pixel represented by numerical values. These values encode the color and intensity of the corresponding part of the image. In the RGB color model, each pixel is described by three numerical values for red, green, and blue intensities. These values range from 0 to 255, allowing a broad spectrum of colors to be displayed. The resolution of the image depends on the number of pixels, with higher resolution images containing more pixels and finer details, but also requiring more memory.
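For instance, such a pixel grid can be expressed directly as an array; the following minimal Python sketch shows a tiny 2x2 RGB image using the 0-255 values described above.

```python
# A digital image as a grid of pixels: a small 2x2 RGB image
# represented as a NumPy array of 8-bit values (0-255 per channel).
import numpy as np

image = np.array([
    [[255, 0, 0], [0, 255, 0]],     # red pixel,  green pixel
    [[0, 0, 255], [255, 255, 255]]  # blue pixel, white pixel
], dtype=np.uint8)

print(image.shape)  # (2, 2, 3): height x width x RGB channels
print(image[0, 1])  # [  0 255   0] -> the green pixel
```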
The accompanying table summarizes the relevant information: we had nine slides of lung cancer tissue, all of which were scanned, along with the number of annotations completed for each slide.
Deep learning (DL)
Deep Learning (DL) is a specialized field within machine learning that centers on artificial neural networks with multiple layers. These networks are designed to emulate the human brain’s structure and functioning, enabling them to comprehend intricate patterns and connections in data. DL has become increasingly popular due to its capacity to manage vast amounts of data and autonomously extract crucial features for decision-making. A crucial element of deep learning models is the artificial neuron, which processes input, applies mathematical transformations, and produces an output. Neurons are organized into layers, including input, hidden, and output layers. The deep neural networks consist of multiple hidden layers, allowing them to learn complex patterns and hierarchies. During training, the model adjusts the numerical parameters of neurons to minimize the difference between predicted and actual outputs. This training process involves forward propagation, where input data is passed through the network to calculate neuron activations, and backpropagation, which adjusts the neuron parameters based on error propagation. Deep learning has achieved significant success in image and speech recognition, natural language processing, recommendation systems, autonomous vehicles, and healthcare diagnostics. Its ability to learn complex representations from raw data, handle vast datasets, and leverage hardware advancements has made it a revolutionary technology in various domains.
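To make the forward propagation and backpropagation steps concrete, here is a minimal training-step sketch in PyTorch; the architecture, data, and learning rate are illustrative placeholders, not the models used later in this work.

```python
# Minimal sketch of forward propagation and backpropagation in PyTorch.
import torch
import torch.nn as nn

model = nn.Sequential(           # input layer -> hidden layer -> output layer
    nn.Linear(10, 32), nn.ReLU(),
    nn.Linear(32, 2),
)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(4, 10)           # a mini-batch of 4 samples, 10 features each
y = torch.tensor([0, 1, 0, 1])   # their ground-truth class labels

logits = model(x)                # forward propagation: compute activations
loss = loss_fn(logits, y)        # compare predictions with actual outputs
optimizer.zero_grad()
loss.backward()                  # backpropagation: propagate the error
optimizer.step()                 # adjust the neuron parameters
```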
Feature Extraction
Feature extraction is an essential process used in machine learning and computer vision to transform raw data into a more concise and meaningful representation. This step involves selecting or creating relevant features that capture important characteristics of the data, making it easier for machine learning algorithms to process and make accurate predictions. Feature extraction is performed to simplify the data while retaining its essential information, which can lead to more efficient processing and better generalization of machine learning models. Two types of feature extraction methods are handcrafted features, where experts manually design features based on domain knowledge, and learned features, which are automatically derived from the data using unsupervised or supervised learning techniques. Handcrafted features are designed to capture specific patterns relevant to the task, while learned features adapt to complex data distributions and often outperform handcrafted features in performance. Feature extraction plays a critical role in preparing data for various applications, including image recognition, speech processing, and natural language understanding.
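As a sketch of learned features, a pretrained CNN can be reused as a fixed feature extractor; the choice of ResNet-18 and the input size below are assumptions for illustration only.

```python
# Sketch of "learned" feature extraction: reusing a pretrained CNN as a
# fixed feature extractor. ResNet-18 is an illustrative choice.
import torch
import torchvision.models as models

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()  # drop the classifier, keep the features
backbone.eval()

image_batch = torch.randn(1, 3, 224, 224)  # placeholder preprocessed image
with torch.no_grad():
    features = backbone(image_batch)
print(features.shape)                      # torch.Size([1, 512])
```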
Ground truth data
Ground truth data is verified and labeled data used as a reliable reference to evaluate the performance of machine learning algorithms and models. It represents the accurate outcomes or attributes associated with the input data. In supervised learning, the algorithm learns from the input data and their corresponding ground truth labels to make predictions or classifications. By comparing its predictions to the correct labels, the algorithm refines its parameters to improve accuracy. Ground truth data is obtained through human annotation or expert judgment, ensuring its reliability. It is crucial for training machine learning models to make accurate predictions on real-world data. However, obtaining high-quality ground truth data can be time-consuming and costly. In cases where it is challenging to acquire ground truth data, researchers may use methods like crowdsourcing or active learning. Evaluating the model on separate data with known ground truth labels allows researchers to measure its performance metrics. Ground truth data plays a fundamental role in various tasks, such as image recognition, natural language processing, and data analysis, ensuring the effectiveness and accuracy of machine learning models in real-world applications. In the accompanying figure, you can see the number of files.
Neural Networks
Neural networks are artificial intelligence models inspired by the human brain’s structure and functioning. They are used to solve complex problems by learning patterns and relationships in data. Composed of interconnected artificial neurons, neural networks process input data through layers, applying mathematical operations to produce outputs. During training, neural networks adjust their connections (weights) using labeled data to minimize prediction errors. This process, called backpropagation, updates the network’s parameters to improve accuracy. Neural networks find applications in various domains, such as image and speech recognition, natural language processing, and decision-making. Their ability to handle complex patterns in data has led to their widespread adoption and success. Different types of neural networks, including feedforward, CNNs, and RNNs, cater to specific tasks like image processing, sequential data analysis, and language modeling. Despite their effectiveness, neural networks face challenges, such as overfitting and the need for ample labeled data. Ongoing research seeks to improve neural network performance and address these issues through advanced architectures and optimization techniques.
Convolutional Neural Network (CNN)
A Convolutional Neural Network (CNN) is a specialized type of artificial neural network that excels in analyzing visual data like images and videos. It mimics the human brain’s visual processing capabilities, automatically learning and extracting meaningful features from the input data. The CNN’s core component is the convolutional layer, which applies filters to detect patterns in the input data. These filters are adjusted during training to capture various visual features. Activation functions and pooling layers further enhance the network’s ability to recognize patterns efficiently. CNNs use fully connected layers to map the extracted features to the final output classes. They excel in learning hierarchical representations, enabling them to handle complex visual tasks. CNNs are trained through supervised learning, adjusting their parameters to minimize prediction errors using labeled training data. They are widely used in image recognition, object detection, and other visual recognition tasks.
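The following minimal PyTorch sketch illustrates the layer types described above (convolution, activation, pooling, and a fully connected classifier); all sizes are illustrative and unrelated to the models used later in this work.

```python
# A minimal convolutional network illustrating the layers described above:
# convolution -> activation -> pooling -> fully connected.
# Shapes assume 64x64 RGB input; all sizes are illustrative.
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # filters detect local patterns
            nn.ReLU(),                                   # activation function
            nn.MaxPool2d(2),                             # pooling: 64x64 -> 32x32
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 32x32 -> 16x16
        )
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)  # fully connected

    def forward(self, x):
        x = self.features(x)
        x = x.flatten(1)  # flatten feature maps for the linear layer
        return self.classifier(x)
```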
Mask Region-Based Convolutional Neural Network (Mask R-CNN)
Mask R-CNN is an extension of the Faster R-CNN object detector that performs instance segmentation: in addition to predicting a class label and bounding box for each detected object, it adds a parallel branch that predicts a pixel-level segmentation mask for every instance. This makes it well suited for separating individual structures, such as tumor regions, in tissue images.
Vision Transformer (ViT)
The Vision Transformer applies the transformer architecture, originally developed for natural language processing, to images. An image is split into fixed-size patches, each patch is linearly embedded and combined with a positional encoding, and the resulting sequence is processed by a standard transformer encoder. Given sufficient training data, ViTs can match or surpass CNNs on image classification tasks.
Explainable DL (xDL) and Human-Machine Interaction (HMI)
Explainable Deep Learning (xDL) is a specialized branch of deep learning that focuses on creating models and methods that can offer clear and understandable explanations for their predictions and decisions. Unlike traditional deep learning models that are often seen as inscrutable black boxes, xDL strives to produce interpretable outcomes, making it easier for humans to comprehend and trust the model’s decisions. xDL employs various techniques to visualize and explain the learned features, highlight the significant input factors, and provide insights into the model’s decision-making process. This transparency is especially valuable in critical applications like healthcare and finance, as it helps build user confidence and ensures that the model’s predictions are reliable and easily understandable. Human-Machine Interaction (HMI) involves studying and designing interfaces that enable efficient communication and collaboration between humans and machines. It encompasses the ways humans interact with machines, including computers and robots, to accomplish tasks, access information, and control functions. HMI focuses on creating user interfaces that are easy to use, intuitive, and responsive to human needs. The goal is to establish a smooth and natural interaction between humans and machines, minimizing the learning curve and enhancing the overall user experience. HMI considers various interaction modes, such as visual interfaces, voice recognition, touch screens, and gestures. HMI is critical in domains like consumer electronics, automotive systems, virtual reality, and healthcare. The effectiveness of HMI significantly impacts user productivity, safety, and satisfaction, making it an essential area of research and development in human-computer interaction.
Heat Map
A heat map is a graphical representation that uses colors to visualize data values within a dataset. It helps display large amounts of data in an easily interpretable way. In a heat map, the intensity of colors corresponds to the magnitude of the data values, with warmer colors representing higher values and cooler colors representing lower values. Heat maps are valuable for identifying patterns and trends in data, especially in large datasets or when dealing with tables or grids. They find applications in data analysis, finance, weather forecasting, biology, and sports analytics, among other fields.
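As a brief illustration, a heat map can be rendered in a few lines with matplotlib; the data below is a random placeholder.

```python
# Sketch of rendering a heat map with matplotlib: warmer colors mark
# higher values, cooler colors lower ones.
import numpy as np
import matplotlib.pyplot as plt

values = np.random.rand(10, 10)   # a 10x10 grid of data values
plt.imshow(values, cmap="hot")    # map magnitudes to colors
plt.colorbar(label="value")       # legend for the color scale
plt.title("Example heat map")
plt.show()
```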
Cost and loss functions
Cost and loss functions are crucial components in machine learning, particularly in supervised learning tasks. They serve as measures of the model’s performance and guide the model’s parameter adjustments during training. The loss function quantifies the discrepancy between the model’s predictions and the actual labels in the training data. The objective is to minimize this function, as a lower value indicates better prediction accuracy. The choice of the loss function depends on the task and data type. The cost function, which is synonymous with the objective function or loss function, aggregates the loss values across the entire training dataset. Minimizing the cost function helps find the best model parameters to fit the training data accurately and produce generalized predictions.
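Two common concrete choices, given here for illustration since the text does not fix one, are the mean squared error for regression and the cross-entropy for classification, where $y_i$ is the true label and $\hat{y}_i$ the prediction over $N$ samples (and $C$ classes):

```latex
\mathcal{L}_{\mathrm{MSE}} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2,
\qquad
\mathcal{L}_{\mathrm{CE}} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} y_{i,c}\,\log \hat{y}_{i,c}
```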
Stochastic Gradient Descent (SGD)
Stochastic Gradient Descent (SGD) is an optimization algorithm commonly used in training machine learning models, particularly in deep learning. It is a variation of the gradient descent algorithm used to minimize the loss function during model training. Instead of computing the gradient over the entire training dataset, SGD randomly selects a small subset of data (mini-batch) to estimate the gradient. By using mini-batches, SGD updates the model parameters more frequently, making it computationally efficient for large datasets. The random selection of mini-batches helps SGD avoid local minima and improve generalization to new data. However, the randomness can introduce more noise in the optimization process, causing fluctuations in the loss function during training. To address this, learning rate schedules and adaptive learning rate methods like AdaGrad, RMSprop, and Adam are often used with SGD to control the learning rate and improve convergence.
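In symbols, one SGD step on a mini-batch $B_t$ updates the parameters $\theta$ with learning rate $\eta$ as follows (a standard formulation, stated here for illustration):

```latex
\theta_{t+1} = \theta_t - \eta \, \nabla_{\theta}\, \mathcal{L}_{B_t}(\theta_t)
```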
Regularization
Regularization is a technique used in machine learning to prevent overfitting and enhance the model’s ability to generalize to new, unseen data. It does this by introducing a penalty term to the model’s objective function, discouraging overly complex patterns and reducing the impact of irrelevant features. Different types of regularization techniques, such as L1 and L2 regularization, add penalty terms based on the model’s weights, promoting sparsity and even weight distribution. Dropout is another regularization method that randomly deactivates neurons during training to improve the model’s robustness. Regularization is especially valuable for complex models with many parameters, as it helps strike a balance between model complexity and fit to the training data, leading to improved generalization and better performance on new data.
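As a minimal sketch, both techniques mentioned above are one-liners in PyTorch: dropout as a layer, and L2 regularization via the optimizer's weight_decay penalty; all sizes and rates below are illustrative.

```python
# Sketch of two regularization techniques: L2 regularization
# (via weight_decay) and dropout.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),  # randomly deactivates 50% of neurons during training
    nn.Linear(64, 2),
)

# weight_decay adds an L2 penalty on the weights to the objective.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)
```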
Encoder-Decoder Structure
The encoder-decoder structure is a framework used in machine learning and neural networks for tasks like sequence-to-sequence mapping and language translation. In this arrangement, the encoder processes input data, converting it into a condensed representation, often referred to as a context or thought vector. This representation captures the essential information from the input. The decoder then takes this context vector and generates the desired output sequence, such as translated text or another sequence of data. This structure is widely used in applications like machine translation, text generation, and speech recognition, enabling the model to understand and generate complex sequences by breaking down the process into encoding and decoding stages.
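A minimal sketch of the structure is shown below; real sequence-to-sequence models use recurrent or attention layers, so the plain linear layers here are a simplification for illustration only.

```python
# Minimal encoder-decoder sketch: the encoder compresses the input into a
# context vector; the decoder generates the output from that vector.
# Dimensions are illustrative placeholders.
import torch
import torch.nn as nn

class EncoderDecoder(nn.Module):
    def __init__(self, in_dim=100, context_dim=16, out_dim=100):
        super().__init__()
        self.encoder = nn.Linear(in_dim, context_dim)   # condense the input
        self.decoder = nn.Linear(context_dim, out_dim)  # generate the output

    def forward(self, x):
        context = torch.relu(self.encoder(x))  # the context ("thought") vector
        return self.decoder(context)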
Hyperparameters
Hyperparameters are the predefined settings of a machine learning model. They’re decided upon before training starts and remain unchanged during the training. Just as a chef decides on the oven temperature before baking, in machine learning, the right choice of these settings can determine the success of the model’s training.
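For example, the settings used for the models in this work (taken from the model table at the end of this section) can be collected in a simple configuration object; the dictionary form itself is illustrative.

```python
# Hyperparameters fixed before training, as listed in the model table below.
config = {
    "epochs": 200,
    "batch_size": 2,
    "learning_rate": 0.0001,
}
```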
Learning Methods
Deep learning, which is a part of machine learning, utilizes multi-layered neural networks to dissect data intricacies. The main methods for training these networks are supervised, unsupervised, and reinforcement learning. While supervised learning relies on data with predefined labels, unsupervised learning works with data without specific labels, and reinforcement learning focuses on making decisions based on rewards or penalties.
Metrics
Machine Learning Metrics: Within the realm of machine learning, metrics serve as vital measures to assess how effectively a model operates. These benchmarks help in fine-tuning and enhancing the model. Key metrics encompass accuracy, precision, recall, F1-score, Mean Absolute Error (MAE), and Root Mean Square Error (RMSE), and their usage varies based on whether the task is classification or regression.
Metric Status: Evaluating a metric involves checking its authenticity, consistency, relevance to the task, clarity of understanding, and its ability to detect variations. It’s vital to confirm that the metric accurately represents its purpose, offers stable results over time, fits the task’s needs, is comprehensible, and can recognize shifts in results.
Validity: Ensuring the metric accurately reflects its intended purpose.
Reliability: The metric’s stability across multiple instances of measurement.
Applicability: The appropriateness of a metric for a specific machine learning task or issue, like the potential unsuitability of accuracy for datasets with uneven classes.
Interpretability: The ease with which the metric’s results can be comprehended and used to derive meaningful conclusions.
Sensitivity: The metric’s ability to detect and respond to variations in how the model performs.
- True Positive (TP): Instances where the model’s positive prediction matches the actual positive outcome.
- False Negative (FN): Situations where the model misses a positive outcome, predicting it as negative.
- False Positive (FP): Cases where the model incorrectly flags a negative outcome as positive.
- True Negative (TN): Occasions where both the model’s prediction and the actual outcome are negative.
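From these four counts, the key classification metrics mentioned above follow directly:

```latex
\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
```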
Mean Average Precision (mAP)
In tasks like object recognition and image segmentation within deep learning, the Mean Average Precision (mAP) stands out as a crucial performance metric. It’s not just about spotting an object in a picture; it’s also about pinpointing its exact position using a bounding rectangle. This dual responsibility necessitates a holistic metric capturing both classification and localization accuracy.
mAP offers this holistic insight by integrating precision and recall at diverse decision boundaries. To break it down:
- Precision evaluates the accuracy of the objects the model identified.
- Recall assesses the model’s ability to spot all actual objects in a photo.
For each object category, a curve of precision versus recall is plotted by adjusting the model’s decision boundary. The area beneath this curve provides the average precision (AP) for that specific category. By averaging the APs for all categories, we obtain the mAP, which offers a consolidated view of the model’s efficacy across different object types.
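The following simplified Python sketch computes AP as the area under the precision-recall curve and averages per-category APs into mAP; real implementations (e.g., the COCO evaluator) add interpolation and averaging over IoU thresholds, and the PR points below are placeholders.

```python
# Simplified sketch of average precision (AP) as the area under the
# precision-recall curve.
import numpy as np

def average_precision(recall, precision):
    """Integrate precision over recall (sorted by increasing recall)."""
    return float(np.trapz(precision, recall))

# Illustrative PR points for one object category:
recall    = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
precision = np.array([1.0, 0.9,  0.8, 0.6,  0.4])
ap = average_precision(recall, precision)

# mAP is the mean of the per-category APs:
map_score = np.mean([ap])  # with a single category here, mAP == AP
print(ap, map_score)
```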
Implementation and Results
Creation of ground-truth data
Here’s a step-by-step guide on how to create ground-truth data:
Define the objective
First of all, we have to know what our target is and for which purpose we will collect data; in this case, our objective was lung cancer detection.
Data Collection
- Pinpoint where you’ll obtain the primary data. It might come from sensors, photographs, sound clips, written files, and more.
- Ensure that the collected data genuinely mirrors the real-world scenarios or situations for which you're designing the models.
Data collection in this case started with collecting slides of lung tissue and scanning them. Scanning each slide took me three hours, and the process was:
- preparing the slide
- placing it under the microscope (the microscope was connected to the computer)
- opening HSA Kit on that computer
- preparing HSA Kit to scan; each slide type required a different exposure time:
- HE slides were scanned with an exposure time of 250
- KI67 and CK slides were scanned with an exposure time of 600
- after that, preparing the vignette filter and displaying it
- then scanning the slide at very high quality, pixel by pixel; for any blurry parts, you have to use the sharpness knob on the right of the microscope
Annotation and Labeling
- Choose an appropriate tool or system to mark your specific kind of data.
- Provide annotators with detailed and unambiguous directions. This step is crucial for maintaining uniformity in the data annotations.
First, you have to create your own base ROI (region of interest) for annotating. After creating your base ROI, you have to annotate the tumor parts within that base ROI so the model knows which regions are tumor. After all of this, the stroma parts should be annotated as well.
Quality Assurance
- Frequently review the labeled data to confirm it adheres to set quality standards.
- Think about having more than one person annotate the same data and employ methods like majority consensus to determine the final label (see the sketch after this list).
- Use a portion of the labeled data to train a model and evaluate its efficacy on a separate portion. If results are subpar, consider revising the ground-truth dataset.
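As a minimal sketch of the majority-consensus step, assuming simple string labels per annotator (the labels below are placeholders):

```python
# Hypothetical sketch of majority consensus over multiple annotators'
# labels for the same sample.
from collections import Counter

def majority_label(labels):
    """Return the label most annotators agreed on."""
    return Counter(labels).most_common(1)[0][0]

print(majority_label(["tumor", "tumor", "stroma"]))  # -> "tumor"
```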
Human-Machine Interaction (HMI)
HMI is the relationship between humans and machines. To classify and create the ground-truth data (GTD), an annotation tool is used that handles large sets of data, from hundreds to thousands of samples. For an excellent and accurate model, a large database is needed. Accuracy is key in this work: good results depend on the amount of data provided, and increasing the data leads to better results.
The size of the data is important, but the quality of the generated samples matters no less.
| Model name | Model Type | Structures | Epochs | Batch Size | Learning Rate |
| --- | --- | --- | --- | --- | --- |
| HyperlungNet V1 | Instance segmentation | Stroma, tumor | 200 | 2 | 0.0001 |
| HyperlungNet V2 | Instance segmentation | Stroma, tumor | 200 | 2 | 0.0001 |
| HyperlungNet V3 | Segmentation | Stroma | 200 | 2 | 0.0001 |
| HyperlungNet V4 | Segmentation | Tumor | 200 | 2 | 0.0001 |
| HyperlungNet V5 | Segmentation | Tumor | 200 | 2 | 0.0001 |
| HyperlungNet V6 | Segmentation | Stroma, tumor | 200 | 2 | 0.0001 |
| HyperlungNet V7 | Segmentation | Stroma, tumor | 200 | 2 | 0.0001 |
| HyperlungNet V8 | Segmentation | Stroma, tumor | 200 | 2 | 0.0001 |
| HyperlungNet V9 | Instance segmentation | Tumor | 200 | 2 | 0.0001 |