ML-2-DBSCAN

January 2, 2020

What is DBSCAN

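DBSCAN (Density-based spatial clustering of applications with noise) is a density-based clustering algorithm. A threshold on the number of points sets the minimum density that a cluster must reach. Density is measured through distance: it is the number of points that lie within a given radius of a point.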

Concept

Parameters

Density Definition

$N_{\varepsilon}(p) = \{q \mid d(p,q) \leq \varepsilon\}$

Figure: the ${\varepsilon}$-neighborhood of $p$ and the ${\varepsilon}$-neighborhood of $q$. With MinPts = 4, the density of $p$ is high and the density of $q$ is low.
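The neighborhood definition above can be sketched in a few lines of NumPy. This is only an illustration: the helper name `eps_neighborhood`, the toy points, and the choice of Euclidean distance for $d(p,q)$ are assumptions, not part of the original post. Note that $p$ counts as a member of its own neighborhood, since $d(p,p) = 0$.

```python
import numpy as np

def eps_neighborhood(X, p_index, eps):
    """Return indices of all points q with d(p, q) <= eps (Euclidean distance)."""
    dists = np.linalg.norm(X - X[p_index], axis=1)
    return np.flatnonzero(dists <= eps)

# Hypothetical toy data: four points packed together and one far away.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [5.0, 5.0]])
eps, min_pts = 0.5, 4

dense = eps_neighborhood(X, 0, eps)   # neighborhood of a point in the dense region
sparse = eps_neighborhood(X, 4, eps)  # neighborhood of the isolated point

print(len(dense) >= min_pts)   # True: density is high
print(len(sparse) >= min_pts)  # False: density is low
```

A point whose neighborhood holds at least MinPts points is in a dense region; the isolated point only finds itself.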

Core, Border & Outlier

Given ${\varepsilon}$ and MinPts, every object falls into one of three categories: core point, border point, or outlier.
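As a rough sketch of this three-way split, the snippet below uses the usual DBSCAN definitions: a core point has at least MinPts points in its ${\varepsilon}$-neighborhood, a border point is not a core point but lies inside the neighborhood of one, and an outlier is neither. The function name `classify_points`, the toy data, Euclidean distance, and counting a point as its own neighbor are all assumptions made for illustration.

```python
import numpy as np

def classify_points(X, eps, min_pts):
    """Label each row of X as 'core', 'border', or 'outlier'."""
    # Pairwise Euclidean distances, shape (n, n).
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    neighbors = dists <= eps  # each point is its own neighbor
    is_core = neighbors.sum(axis=1) >= min_pts
    # Border: not core, but inside the neighborhood of at least one core point.
    near_core = (neighbors & is_core[None, :]).any(axis=1)
    return np.where(is_core, "core", np.where(near_core, "border", "outlier"))

# Hypothetical toy data: a tight square, one point at its edge, one far away.
X = np.array([
    [0.0, 0.0], [0.3, 0.0], [0.0, 0.3], [0.3, 0.3],  # dense square
    [0.75, 0.3],                                     # close to one corner only
    [5.0, 5.0],                                      # isolated
])
kinds = classify_points(X, eps=0.5, min_pts=4)
print(kinds.tolist())
# ['core', 'core', 'core', 'core', 'border', 'outlier']
```

The pairwise distance matrix keeps the sketch short but needs O(n²) memory, so it is only suitable for small examples.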

Density-reachability(directly and indirectly)
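In the standard formulation, $q$ is directly density-reachable from $p$ when $p$ is a core point and $q$ lies in $N_{\varepsilon}(p)$. It is (indirectly) density-reachable when a chain of such direct steps leads from $p$ to $q$. The sketch below follows those chains with a breadth-first search. The function name `density_reachable`, the toy data, and the use of Euclidean distance are assumptions for illustration, and a non-core starting point is treated as reaching nothing.

```python
from collections import deque

import numpy as np

def density_reachable(X, start, eps, min_pts):
    """Return the sorted indices of points density-reachable from X[start]."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    neighbors = dists <= eps
    is_core = neighbors.sum(axis=1) >= min_pts
    if not is_core[start]:
        return []  # chains can only start from a core point
    reached = {start}
    queue = deque([start])
    while queue:
        p = queue.popleft()
        if not is_core[p]:
            continue  # border points are reached but do not extend the chain
        for q in np.flatnonzero(neighbors[p]):
            q = int(q)
            if q not in reached:
                reached.add(q)
                queue.append(q)
    return sorted(reached)

# Hypothetical toy data: five points on a line, 0.4 apart, plus one far away.
X = np.array([[0.0, 0.0], [0.4, 0.0], [0.8, 0.0], [1.2, 0.0], [1.6, 0.0], [5.0, 0.0]])

print(density_reachable(X, start=1, eps=0.5, min_pts=3))  # [0, 1, 2, 3, 4]
print(density_reachable(X, start=0, eps=0.5, min_pts=3))  # []
```

Point 2 is directly reachable from point 1, while point 4 is reached only through the chain 1, 2, 3. The relation is not symmetric: point 0 is reachable from point 1, but nothing is reachable from point 0 because it is not a core point.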

Example

import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets
from sklearn.cluster import DBSCAN

# Load the iris data set and keep all four features for clustering.
iris = datasets.load_iris()
X = iris.data[:, :4]
# Plot the raw samples. Columns 0 and 1 are sepal length and sepal width.
plt.scatter(X[:, 0], X[:, 1], c="red", marker='o', label='samples')
plt.xlabel('sepal length')
plt.ylabel('sepal width')
plt.legend(loc=2)
plt.show()

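# Cluster on all four features. Points labelled -1 are noise.
dbscan = DBSCAN(eps=0.4, min_samples=9)
dbscan.fit(X)
label_pred = dbscan.labels_

# Split the samples by cluster label; a label that does not occur gives an empty array.
x0 = X[label_pred == 0]
x1 = X[label_pred == 1]
x2 = X[label_pred == 2]
noise = X[label_pred == -1]
# Plot each group on the first two features (sepal length and sepal width).
plt.scatter(x0[:, 0], x0[:, 1], c="red", marker='o', label='label0')
plt.scatter(x1[:, 0], x1[:, 1], c="green", marker='*', label='label1')
plt.scatter(x2[:, 0], x2[:, 1], c="blue", marker='+', label='label2')
plt.scatter(noise[:, 0], noise[:, 1], c="gray", marker='x', label='noise')
plt.xlabel('sepal length')
plt.ylabel('sepal width')
plt.legend(loc=2)
plt.show()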
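Before reading the plot, it can help to check how many clusters were actually found and how many samples were marked as noise. scikit-learn's DBSCAN assigns the label -1 to noise points. The helper name `summarize_labels` and the example label vector below are hypothetical; the vector is not the actual iris result.

```python
import numpy as np

def summarize_labels(labels):
    """Return (number of clusters, number of noise points) for DBSCAN labels."""
    labels = np.asarray(labels)
    n_noise = int(np.sum(labels == -1))
    # Every distinct label other than -1 is one cluster.
    n_clusters = len(set(labels.tolist()) - {-1})
    return n_clusters, n_noise

# Hypothetical label vector, used only to show the helper.
example_labels = [0, 0, 1, -1, 1, -1, 0]
print(summarize_labels(example_labels))  # (2, 2)
```

With the example above, the same helper would be called as `summarize_labels(dbscan.labels_)`.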

Ref