GB-BC-FS
GB-BC-FS (Graph-Based Budget-Constraint Feature Selection) is an approach designed to efficiently handle large datasets with numerous features under budget constraints. Unlike traditional methods that generate multiple candidate solutions, GB-BC-FS starts with a single solution and refines it to meet budgetary limits, significantly reducing computational time. The process utilizes the GB-AFS method, which selects a minimal set of features necessary for accuracy in multi-class classification by assessing the discriminative power of features across class pairs.
To ensure feature diversity and accommodate budget constraints, the method includes a heuristic refinement step that adjusts the initial feature set. This adjustment is based on a scoring function that favors lower-cost features, simplifying the feature selection process and enhancing robustness. The approach also evaluates the interrelationships among features, moving beyond traditional isolated assessments.
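The refinement idea can be illustrated with a small sketch. This is not the heuristic from the paper: the scoring rule, the greedy replacement strategy, and the names `score` and `refine_to_budget` are all assumptions made for illustration. It only shows how a single initial solution might be adjusted until its total cost fits a budget, with a score that favors lower-cost features.

```python
def score(separability, cost, max_cost, alpha=0.5):
    """Hypothetical score: reward separability, penalize normalized cost."""
    return alpha * separability - (1 - alpha) * (cost / max_cost)


def refine_to_budget(selected, candidates, separability, cost, budget, alpha=0.5):
    """Greedily drop or replace features until the total cost fits the budget."""
    selected = list(selected)
    pool = [f for f in candidates if f not in selected]
    max_cost = max(cost.values())

    def total(features):
        return sum(cost[f] for f in features)

    while selected and total(selected) > budget:
        # Drop the most expensive feature currently selected.
        worst = max(selected, key=lambda f: cost[f])
        selected.remove(worst)
        # Replace it with the best-scoring cheaper feature that still fits.
        room = budget - total(selected)
        affordable = [f for f in pool if cost[f] <= room and cost[f] < cost[worst]]
        if affordable:
            best = max(
                affordable,
                key=lambda f: score(separability[f], cost[f], max_cost, alpha),
            )
            selected.append(best)
            pool.remove(best)
    return selected


# Made-up separability scores and costs, for illustration only.
separability = {"f1": 0.9, "f2": 0.8, "f3": 0.6, "f4": 0.5}
cost = {"f1": 5.0, "f2": 4.0, "f3": 1.0, "f4": 2.0}
budget = 6.0

refined = refine_to_budget(["f1", "f2"], list(cost), separability, cost, budget)
print(refined, sum(cost[f] for f in refined))
```

In this toy run the initial set costs 9.0, so the most expensive feature is swapped for a cheaper one and the refined set fits the budget of 6.0.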
Using GB-BC-FS: Code Examples and Visualization
GBFS offers a versatile and user-friendly Python library for feature selection in multi-class classification tasks. This section guides you through initializing the GB-BC-FS object with your dataset, selecting features, and visualizing the feature space.
Initialization and Parameters
To start using GB-BC-FS, first initialize the GB-BC-FS object with your dataset and selection criteria:
[Code listing: main.py]
Parameters Explained
dataset_path: Path to your dataset file. Ensure your dataset is in CSV format or another compatible format.
separability_metric: Metric for evaluating feature separability.
dim_reducer_model: Dimensionality reduction model to apply to your dataset. Must implement a fit_transform method for compatibility.
budget: Numeric limit on the total allowable cost of the selected features.
label_column: Name of the column containing the labels in your dataset. Defaults to 'class'.
alpha: Parameter that defines the scoring method of the cost function used by the heuristic. Defaults to 0.5. See the paper for further details.
epochs: Number of iterations the heuristic uses to solve for each potential k value. Defaults to 100. See the paper for further details.
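The only stated requirement for dim_reducer_model is a fit_transform method, which is the interface that scikit-learn reducers such as TSNE and PCA expose. The sketch below shows that interface with a toy class. `KeepFirstTwoColumns` and `has_reducer_interface` are hypothetical names, and whether the library accepts plain Python lists rather than arrays is not confirmed here.

```python
class KeepFirstTwoColumns:
    """Toy reducer (hypothetical): keeps the first two coordinates of each row."""

    def fit_transform(self, X):
        # A real reducer would learn an embedding; this one only truncates rows.
        return [list(row[:2]) for row in X]


def has_reducer_interface(model):
    """Return True if the object exposes a callable fit_transform method."""
    return callable(getattr(model, "fit_transform", None))


reducer = KeepFirstTwoColumns()
embedding = reducer.fit_transform([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(has_reducer_interface(reducer), embedding)
```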
Currently supported values for separability_metric are jm, bhattacharyya, and wasserstein. To request support for additional metrics, please open an issue in the repository.
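To give a feel for what these metrics measure, here is a minimal sketch that scores how well a single feature separates two classes. It assumes jm refers to the Jeffries-Matusita distance, uses the textbook univariate-Gaussian form of the Bhattacharyya distance, and uses the equal-sample-size form of the 1-D Wasserstein distance. The function names are illustrative, and the library's own implementation may differ.

```python
import math
from statistics import mean, pvariance


def bhattacharyya_gaussian(a, b):
    """Bhattacharyya distance between two samples modeled as 1-D Gaussians."""
    mu_a, mu_b = mean(a), mean(b)
    var_a, var_b = pvariance(a), pvariance(b)
    term_mean = 0.25 * (mu_a - mu_b) ** 2 / (var_a + var_b)
    term_var = 0.5 * math.log((var_a + var_b) / (2 * math.sqrt(var_a * var_b)))
    return term_mean + term_var


def jeffries_matusita(a, b):
    """JM distance in one common convention: 2 * (1 - exp(-B)), bounded by 2."""
    return 2 * (1 - math.exp(-bhattacharyya_gaussian(a, b)))


def wasserstein_1d(a, b):
    """1-D Wasserstein distance between two samples of equal size."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes samples of equal size")
    return mean(abs(x - y) for x, y in zip(sorted(a), sorted(b)))


# Values of one feature for two classes (made-up numbers).
class_a = [1.0, 2.0, 3.0]
class_b = [4.0, 5.0, 6.0]

print(bhattacharyya_gaussian(class_a, class_b))
print(jeffries_matusita(class_a, class_b))
print(wasserstein_1d(class_a, class_b))
```

Larger values indicate that the feature separates the two classes more clearly. Repeating the computation over every pair of classes gives a per-feature vector of pairwise separability scores.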
NOTE: This method requires that the cost associated with each feature be specified in the dataset file, specifically on the second line. Please verify this arrangement to ensure proper functionality.
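The sketch below shows one way such a file could be read. It assumes that the first line holds the column names, the second line holds one cost per feature, and the data rows start on the third line. The blank cost cell under the label column and the helper name `read_dataset_with_costs` are hypothetical, so check the layout against your own file.

```python
import csv
import io


def read_dataset_with_costs(text, label_column="class"):
    """Split CSV text into a per-feature cost mapping and the data records."""
    rows = list(csv.reader(io.StringIO(text)))
    header, cost_row, data = rows[0], rows[1], rows[2:]
    # Line 2 carries the costs; the label column has no cost of its own.
    costs = {
        name: float(value)
        for name, value in zip(header, cost_row)
        if name != label_column
    }
    records = [dict(zip(header, row)) for row in data]
    return costs, records


# Inline sample standing in for a dataset file (made-up values).
sample = """f1,f2,f3,class
2.5,1.0,4.0,
0.1,0.7,0.3,a
0.9,0.2,0.8,b
"""

costs, records = read_dataset_with_costs(sample)
print(costs)
print(records)
```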
Feature Selection
Once the GB-BC-FS object is initialized, you can proceed with the feature selection process:
[Code listing: main.py]
Visualizing the Feature Space
GB-BC-FS includes a visualization method for presenting selected features within the feature space, offering insights into their distribution and separation. This visualization features two adjacent graphs: the left graph depicts the initial outcomes of the GB-AFS method, while the right graph shows the final results of the GB-BC-FS algorithm after heuristic adjustments have been applied.
The two graphs display similar feature spaces, but the clustering and selection of features may differ noticeably. Such differences arise only when the initial solution provided by the GB-AFS algorithm was deemed inadequate and the heuristic was activated.
[Code listing: main.py]
This method generates a scatter plot highlighting the selected features. Features are displayed with their separability power indicated by color intensity, and selected features are marked distinctly.
Get the Selected Features
You can get the selected features in a dictionary format using the following command:
[Code listing: main.py]
References and Further Reading
For a deeper understanding of the GB-AFS method and its background, consider exploring the official paper.