Plankton, the collective term for aquatic organisms transported by tides and currents, sits at the base of the ecological pyramid and plays a crucial role in ocean ecosystems, from sustaining marine food webs to influencing the global carbon cycle. Understanding the relationships, distribution patterns, and demographic cycles of plankton, and their implications for marine food webs and global climate change, requires detecting, classifying, and monitoring plankton taxa in their ecosystem using specialized imaging devices that collect microscopic samples year-round. Researchers have begun to analyze plankton diversity and abundance with machine learning and computer vision. However, real-world plankton data gathered during monitoring follows an exponential distribution. This non-uniform, class-imbalanced distribution in datasets such as WHOI and NDSB poses a formidable challenge for classification, especially for identifying rare classes. Although several attempts have been made to automate the classification of these plankton categories, significant hurdles remain. To address this challenge, we present a novel and systematic approach to plankton classification on exponentially distributed datasets with non-uniform class samples, specifically WHOI and NDSB. Departing from traditional methods such as resampling and synthetic data generation, we introduce a two-stage complexity-mitigating treatment: Dataset Class Imbalance Treatment (DIT) and Dataset Class-Overlap Treatment (DOT). In the DIT stage, we prune imbalanced classes using exclusion criteria formulated with the imbalance ratio Ir; in the DOT stage, we use our proposed M2 measure to prune overlapping classes. We then train the classification model on this refined dataset.
For this purpose, we adopt a tailored knowledge transfer strategy: we train the ResNet model and fine-tune its hyperparameters using an optimizer with a customized cyclic learning rate (CLR) schedule. This strategy improves our classifier's ability to combine previously learned knowledge with new knowledge, and the results obtained are remarkable. [...]
Table of Contents
1 Introduction
1.1 Related works
2 Datasets
2.1 WHOI
2.2 Kaggle NDSB
2.3 Dataset characteristics Measure
2.3.1 Class-Imbalance Ratio Ir
2.3.2 Class Overlap Measure M2
3 Methodology for our Experimental design
3.1 Stage-1 DIT-Criteria
3.2 ResNet Architecture
3.2.1 Architecture Description
3.2.2 Implementation
3.3 Experimental Results Stage-1 Criteria
3.4 Performance metrics
3.5 Stage-2 DOT Criteria
3.6 Stage 2 Experimental Result
4 Results & Discussion
5 Conclusion
Research Objectives
The primary goal of this research is to develop a robust, automated plankton classification system capable of handling the challenges posed by exponentially distributed, complex, and imbalanced datasets, specifically the WHOI and NDSB plankton datasets, without relying on traditional synthetic data generation or class normalization techniques.
- Mitigation of class imbalance using a two-stage treatment approach (DIT and DOT).
- Application of class-exclusion criteria based on Geometric Mean (GM) and a novel complexity measure (M2).
- Implementation of a tailored knowledge transfer strategy utilizing ResNet-50/101 architectures.
- Integration of a customized Cyclic Learning Rate (CLR) policy for optimized model training.
- Validation of classification performance across various plankton taxa using F1-score, accuracy, and AUC-ROC metrics.
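The cyclic learning rate objective can be illustrated with a minimal sketch. The text does not specify the customized CLR policy, so the code below is not the author's schedule: it assumes the generic triangular policy, in which the learning rate rises linearly from a lower bound to an upper bound and falls back again over each cycle. The function name and the parameters base_lr, max_lr, and step_size are hypothetical.

```python
import math


def triangular_clr(iteration, base_lr, max_lr, step_size):
    """Learning rate at a given iteration under a generic triangular cyclic policy.

    Assumption: this is the standard triangular schedule, not the customized
    policy used in the book. One full cycle spans 2 * step_size iterations.
    """
    if step_size <= 0:
        raise ValueError("step_size must be positive")
    # Index of the current cycle (1-based).
    cycle = math.floor(1 + iteration / (2 * step_size))
    # Distance from the peak of the current cycle, scaled to [0, 1].
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)


# Hypothetical bounds and half-cycle length, chosen only for illustration.
schedule = [triangular_clr(i, base_lr=0.001, max_lr=0.01, step_size=100)
            for i in (0, 50, 100, 150, 200)]
```

With these illustrative values the rate starts at the lower bound, peaks at iteration 100, and returns to the lower bound at iteration 200, after which the cycle repeats.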
Excerpt from the Book
3.1 Stage-1 DIT-Criteria
DIT (Dataset Class Imbalance Treatment) is an approach designed to mitigate the effects of class imbalance in datasets whose class sample sizes follow an exponential distribution. The treatment systematically excludes (prunes) specific classes according to exclusion criteria with two limit points: a lower limit, which we refer to as the threshold value (T), and an upper limit, which we refer to as the cutoff point. The threshold value is determined by a central measure of the dataset's sample distribution, the geometric mean (GM), while the cutoff point, which fixes an upper limit on the number of samples per class, is based on the class-specific imbalance ratio of the dataset.
The threshold (T) in the exclusion criteria fixes the lower limit on the number of samples in a class: classes with fewer samples than T are pruned, and only the remaining classes proceed to imbalance treatment. From the dataset distribution plots, we observed that the WHOI plankton dataset follows a power-law pattern (y = ax^(-c)), while the NDSB dataset fits an exponential curve (y = ae^(-x)). Our per-class sample volumes indirectly represent plankton population or species abundance, which is commonly modeled with power-law or exponential functions. In ecology, such functions are ubiquitous in studies of species abundance distribution (SAD) models [13] [34] and species-area relationship (SAR) models [18], and the geometric mean is the central measure that represents this kind of distribution. In a uniform distribution, classes in the vicinity of the mean have nearly equal numbers of samples. In our datasets, by contrast, sample volume increases exponentially above the distribution mean and decreases exponentially below the GM, so classes that fall in the decreasing tail of the distribution suffer from inadequate sample representation. We therefore adopted the geometric mean as the threshold value that sets the lower limit. For WHOI, classes with fewer than 42 samples are pruned (the geometric mean of WHOI is 42); similarly, for NDSB, classes with fewer than 117 samples are pruned (the geometric mean of NDSB is 117).
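The lower-limit rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the function names and the toy class sizes are hypothetical, and only the threshold step is shown (the cutoff point is omitted because its formula is not given in this excerpt).

```python
import math


def geometric_mean(counts):
    """Geometric mean of strictly positive class sizes, computed via logarithms."""
    if not counts or any(c <= 0 for c in counts):
        raise ValueError("counts must be non-empty and strictly positive")
    return math.exp(sum(math.log(c) for c in counts) / len(counts))


def prune_below_threshold(class_counts, threshold=None):
    """Split classes into kept and pruned sets using a lower-limit threshold T.

    If no threshold is given, T is the geometric mean of the class sizes.
    Classes with fewer than T samples are pruned; all others are kept.
    """
    if threshold is None:
        threshold = geometric_mean(list(class_counts.values()))
    kept = {name: n for name, n in class_counts.items() if n >= threshold}
    pruned = {name: n for name, n in class_counts.items() if n < threshold}
    return kept, pruned, threshold


# Hypothetical class sizes for illustration only (not WHOI or NDSB figures).
toy_counts = {"class_a": 1000, "class_b": 100, "class_c": 10, "class_d": 1}
kept, pruned, T = prune_below_threshold(toy_counts)
```

For these toy sizes the geometric mean is about 31.6, so class_a and class_b are kept while class_c and class_d are pruned. Passing threshold=42 or threshold=117 would apply the WHOI or NDSB thresholds reported above to real per-class counts.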
Summary of Chapters
1 Introduction: Provides the rationale for plankton classification and outlines the fundamental challenges of automated ecosystem modeling and data patterns.
2 Datasets: Introduces the WHOI and NDSB datasets and details the methodology for quantifying data complexity through imbalance ratios and class overlap measures (M2).
3 Methodology for our Experimental design: Describes the two-stage treatment framework (DIT and DOT), the ResNet architecture adaptations, and the experimental setup for evaluation.
4 Results & Discussion: Analyzes the experimental outcomes, demonstrating the effectiveness of the proposed treatments in reducing class imbalance and overlap while improving model performance.
5 Conclusion: Summarizes the contributions of the proposed systematic approach and its success in improving classification performance for plankton recognition.
Keywords
Plankton classification, Deep learning, Class imbalance, Data complexity, Dataset Class Imbalance Treatment (DIT), Dataset Class-Overlap Treatment (DOT), ResNet, Transfer learning, OneCycleLR, Geometric Mean, Imbalance gap, F1-score, AUC-ROC, Taxonomic identification, Ecological informatics
Frequently Asked Questions
What is the primary objective of this study?
The study aims to create an effective automated plankton classification model by systematically addressing the significant challenges of class imbalance and class overlap found in real-world, exponentially distributed ecological datasets.
What are the central themes discussed in this work?
The central themes include machine learning in ecology, deep learning architectures for classification, data complexity measures, and the development of strategies to handle non-uniform data distributions without conventional data resampling.
What is the core research question being explored?
The research explores how to achieve robust, unbiased plankton classification accuracy and improved F1-scores while maintaining a high number of taxonomic classes, especially when datasets show exponential sample distribution.
Which scientific methods are utilized?
The authors use a two-stage approach: Dataset Class Imbalance Treatment (DIT) based on geometric mean thresholds, and Dataset Class-Overlap Treatment (DOT) using a custom M2 measure, alongside fine-tuning deep ResNet-50 and ResNet-101 models with a cyclic learning rate policy.
What is covered in the main body of the paper?
The main body covers dataset characterization, the formulation of the DIT and DOT pruning criteria, detailed architecture descriptions, experimental setup for fine-tuning, and a comprehensive analysis of classification results using metrics like F1-score and AUC-ROC.
Which keywords characterize this work?
Key terms include Plankton classification, Deep learning, Class imbalance, DIT, DOT, M2 measure, ResNet, Transfer learning, and AUC-ROC.
Why did the authors choose ResNet over other architectures?
ResNet was selected for its proven ability to train very deep networks without gradient degradation and for its efficient feature representation, which is critical for complex, imbalanced ecological data.
What is the function of the M2 measure in this study?
The M2 measure is a custom metric defined by the authors to quantify the degree of class overlap and ambiguity in the data, serving as a basis for pruning classes that confuse the classifier.
How does the proposed approach differ from standard data augmentation?
Unlike standard methods that rely on data resampling or synthetic sample generation (such as SMOTE) to force a uniform distribution, the proposed approach preserves the natural distribution of the dataset and uses selective pruning to improve overall robustness.
What do the final AUC-ROC scores indicate about the model?
The high AUC-ROC scores (reaching 98-99) indicate that the model has excellent discriminative power in distinguishing between dominant and rare plankton classes, even within complex high-dimensional feature spaces.
- Showkat Ahmad (Author), 2024, Navigating the Depths. Effective Classification of Imbalanced Plankton Classes, Munich, GRIN Verlag, https://www.grin.com/document/1523793