Marketclustering

flowtask.components.MarketClustering

MarketClustering

MarketClustering(loop=None, job=None, stat=None, **kwargs)

Bases: FlowComponent

Clusters stores offline using BallTree + DBSCAN (distances in miles or km), generates a fixed number of ghost employees for each cluster, refines the assignment when a store-to-ghost distance exceeds a threshold, and optionally checks daily route constraints.

Steps

1) Cluster with DBSCAN (haversine + approximate).
2) Create ghost employees at the cluster centroid (with a random offset).
3) Remove 'unreachable' stores if no ghost employee can reach them within a threshold (e.g. 25 miles).
4) Check whether a single ghost can cover up to max_stores_per_day in a route < day_hours or max_distance_by_day. If not, mark that store as 'rejected' too.
5) Return two DataFrames: final assignment + rejected stores.
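Step 3 above can be illustrated with a minimal, standard-library sketch. It assumes stores and ghost employees are plain `(lat, lon)` pairs in degrees and that straight-line haversine distance is the reachability test; the names `haversine_miles` and `split_reachable` are hypothetical and not part of the component.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; use 6371.0 to work in km


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def split_reachable(stores, ghosts, threshold_miles=25.0):
    """Assign each store to its nearest ghost, or reject it if too far.

    stores: dict of store_id -> (lat, lon)   (assumed layout)
    ghosts: dict of ghost_id -> (lat, lon)   (assumed layout)
    Returns (assigned, rejected): a dict store_id -> ghost_id and a list
    of store ids with no ghost within threshold_miles.
    """
    assigned, rejected = {}, []
    for store_id, (lat, lon) in stores.items():
        ghost_id, dist = min(
            ((g, haversine_miles(lat, lon, glat, glon))
             for g, (glat, glon) in ghosts.items()),
            key=lambda pair: pair[1],
        )
        if dist <= threshold_miles:
            assigned[store_id] = ghost_id
        else:
            rejected.append(store_id)
    return assigned, rejected


stores = {"A": (40.0, -75.0), "B": (40.1, -75.0), "C": (41.0, -75.0)}
ghosts = {"G1": (40.0, -75.0)}
assigned, rejected = split_reachable(stores, ghosts)
```

In this example store C lies roughly 69 miles from the only ghost, so it ends up in the rejected list while A and B are assigned to G1.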

Parameters:

cluster_radius (default: 150.0)

Purpose: Controls the search radius for the BallTree clustering algorithm.
Usage: Converted to radians and used in tree.query_radius() to find nearby stores during cluster formation.
Effect: Determines how far apart stores can be and still be considered for the same cluster during the initial clustering phase.
Location: Used in the _create_cluster() method.

max_cluster_distance (default: 50.0)

Purpose: Controls outlier detection within already-formed clusters.
Usage: Used in _detect_outliers() to check whether stores are too far from their cluster's centroid.
Effect: Stores farther than this distance from their cluster center are marked as outliers.
Location: Used in validation after clusters are formed.
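The roles of the two parameters can be sketched in plain Python. This is not the component's code: `neighbors_within` is a brute-force stand-in for a BallTree radius query, the centroid is assumed to be the simple mean of the coordinates, and all function names are hypothetical. The one detail taken from the description above is that a surface distance is converted to radians by dividing by the Earth's radius in the same unit.

```python
import math

EARTH_RADIUS = {"miles": 3958.8, "km": 6371.0}


def to_radians(distance, unit="miles"):
    """Convert a surface distance into an angle on the unit sphere."""
    return distance / EARTH_RADIUS[unit]


def angular_distance(p, q):
    """Haversine distance in radians between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * math.asin(math.sqrt(a))


def neighbors_within(points, index, cluster_radius, unit="miles"):
    """Indices of points within cluster_radius of points[index] (brute force)."""
    limit = to_radians(cluster_radius, unit)
    return [i for i, p in enumerate(points)
            if angular_distance(points[index], p) <= limit]


def detect_outliers(points, max_cluster_distance, unit="miles"):
    """Indices of points farther than max_cluster_distance from the centroid.

    The centroid is the mean latitude/longitude, an approximation that is
    reasonable only for clusters spanning a small area.
    """
    centroid = (sum(p[0] for p in points) / len(points),
                sum(p[1] for p in points) / len(points))
    limit = to_radians(max_cluster_distance, unit)
    return [i for i, p in enumerate(points)
            if angular_distance(centroid, p) > limit]


points = [(40.0, -75.0), (40.1, -75.0), (40.2, -75.0), (41.5, -75.0)]
candidates = neighbors_within(points, 0, cluster_radius=150.0)
outliers = detect_outliers(points, max_cluster_distance=50.0)
```

Here every point falls inside the 150-mile search radius of the first store, but the last point sits about 72 miles from the mean centroid and is flagged as an outlier under the 50-mile limit.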

get_rejected_stores

get_rejected_stores()

Return the DataFrame of rejected stores (those removed from any final market).

load_graph_from_pbf

load_graph_from_pbf(pbf_path, bounding_box)

Load a road network graph from a PBF file for the specified bounding box.

Args:
pbf_path (str): Path to the PBF file.
bounding_box: Bounding box coordinates (north, south, east, west), as floats.

Returns:
nx.MultiDiGraph: A road network graph for the bounding box.

run async

run()

1) Cluster with BallTree + K-Means validation.
2) Road-based validation: assign stores to ghost employees via VRP.
3) Remove any stores that cannot be assigned within constraints.
4) Re-assign rejected stores if possible.
5) Add cluster centroids to result DataFrame.
6) Return final assignment + rejected stores.

start async

start(**kwargs)

Validate input DataFrame and columns.

create_data_model

create_data_model(distance_matrix, num_vehicles, depot=0, max_distance=150, max_stores_per_vehicle=3)

Stores the data for the VRP problem.
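As a rough illustration, a data model for this signature could be a plain dictionary that bundles the arguments. The function name `build_vrp_data`, the dictionary keys, and the validation checks below are assumptions made for this sketch; the structure actually returned by create_data_model is not documented here.

```python
def build_vrp_data(distance_matrix, num_vehicles, depot=0,
                   max_distance=150, max_stores_per_vehicle=3):
    """Bundle VRP inputs into one dict (key names are illustrative only)."""
    size = len(distance_matrix)
    # A routing distance matrix must be square: one row and column per node.
    if any(len(row) != size for row in distance_matrix):
        raise ValueError("distance_matrix must be square")
    if not 0 <= depot < size:
        raise ValueError("depot index out of range")
    return {
        "distance_matrix": distance_matrix,
        "num_vehicles": num_vehicles,
        "depot": depot,
        "max_distance": max_distance,
        "max_stores_per_vehicle": max_stores_per_vehicle,
    }


# Node 0 is the depot (for example, a ghost employee's start location).
matrix = [
    [0, 12, 30],
    [12, 0, 25],
    [30, 25, 0],
]
data = build_vrp_data(matrix, num_vehicles=1)
```

Keeping the inputs in one dictionary follows the common pattern for routing solvers, where a single data object is passed to the solve step.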

print_routes

print_routes(routes, store_ids)

Prints the routes in a readable format.
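A possible shape for such output is sketched below. It assumes that `routes` is a list with one list of node indices per vehicle and that `store_ids` maps a node index to a store identifier; neither layout is confirmed by the documentation, and `format_routes` is a hypothetical helper that returns the text instead of printing it.

```python
def format_routes(routes, store_ids):
    """Render one line per vehicle, mapping node indices to store ids."""
    lines = []
    for vehicle, route in enumerate(routes):
        stops = " -> ".join(str(store_ids[node]) for node in route)
        lines.append(f"Vehicle {vehicle}: {stops or '(no stops)'}")
    return "\n".join(lines)


report = format_routes([[0, 2], [1], []], ["S1", "S2", "S3"])
print(report)
```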

solve_vrp

solve_vrp(data)

Solves the VRP problem using OR-Tools and returns the routes.