3. KC Tools (module)
Some clustering-related utilities.
This module is a collection of many useful functions.
Contents
-
kClusterLib.kcTools.
centerVectors
(sensor_vectors) Applies mean centering and scaling to SPND data.
\(\tilde{s}_{ij} = \frac{ s_{ij} - \bar{s}_j }{ \sqrt{\sigma_j^2} }\)
Parameters: sensor_vectors (numpy.ndarray) – SPND data matrix, with SPNDs along columns.
Returns: - centered_array (numpy.ndarray) – Matrix (SPNDs along columns) with elements centered and scaled.
- mean_vector (list) – A row vector containing the sensor means \(\bar{s}_j\).
- sqrt_var_vector (list) – A row vector containing \(\sqrt{\sigma_j^2}\) for each sensor.
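The centering formula above can be illustrated with a minimal NumPy sketch. This is not the library's implementation: the name center_vectors_sketch is hypothetical, and the use of the population variance (ddof=0) is an assumption, since the documentation does not say which variance estimator is used.

```python
import numpy as np

def center_vectors_sketch(sensor_vectors):
    """Mean-center and scale each column (one SPND per column)."""
    data = np.asarray(sensor_vectors, dtype=float)
    mean_vector = data.mean(axis=0)       # per-sensor mean
    sqrt_var_vector = data.std(axis=0)    # per-sensor sqrt(variance); ddof=0 is an assumption
    # A constant column has zero variance and would divide by zero here.
    centered_array = (data - mean_vector) / sqrt_var_vector
    return centered_array, mean_vector.tolist(), sqrt_var_vector.tolist()

# Three readings from two hypothetical sensors.
readings = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
centered, means, scales = center_vectors_sketch(readings)
```

After this transformation every column has zero mean and unit variance, so sensors with different output ranges contribute comparably to later distance calculations.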
-
kClusterLib.kcTools.
getCentroids
(mat, clusters) Returns the centroids of the clusters. Requires the data matrix and the cluster map.
Note
This function is obsolete. Use Pycluster.clustercentroids instead.
-
kClusterLib.kcTools.
separateClusters
(kcluster, labels=[]) Returns a list of lists containing the separated clusters, i.e. SPNDs that belong to the same cluster are put together in one list. There are as many lists as there are clusters.
Parameters: kcluster (list) – A 1-D cluster list representing the {SPND <–> Cluster} mapping.
Returns: cluster_result – A list of member lists, each containing the integer indexes of the SPNDs belonging to one cluster.
Return type: list (of lists)
Example
>>> from kcTools import separateClusters
>>> clusters = [2, 0, 1, 3, 2, 0, 1, 3, 2, 0, 1, 3]
>>> print(separateClusters(clusters))
[[1, 5, 9], [2, 6, 10], [0, 4, 8], [3, 7, 11]]
-
kClusterLib.kcTools.
prettyPrint
(kcluster, labels=[]) Prints cluster data in a human-readable format.
Parameters: - kcluster (list) – Cluster list representing the SPND to cluster mapping.
- labels (list, optional) – List containing the string names of the SPNDs. If no list is passed, automatic numbering is used.
Returns: None. The output is displayed on stdout.
Example
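A minimal sketch of this kind of output, assuming the "cluster id: [member names]" layout that appears in the getOptimalCluster example. The function pretty_print_sketch is a hypothetical stand-in, not the library code, and the automatic numbering scheme ('0', '1', ...) is an assumption. Unlike prettyPrint, it also returns the printed lines so they can be inspected.

```python
def pretty_print_sketch(kcluster, labels=None):
    """Print one line per cluster in the form '<cluster id>: [member names]'."""
    if not labels:
        # Assumed fallback: number the SPNDs by their position in the list.
        labels = [str(i) for i in range(len(kcluster))]
    groups = {}
    for spnd_index, cluster_id in enumerate(kcluster):
        groups.setdefault(cluster_id, []).append(labels[spnd_index])
    lines = ["{}: {}".format(cid, groups[cid]) for cid in sorted(groups)]
    print("\n".join(lines))
    return lines

lines = pretty_print_sketch([2, 0, 1, 2, 0], ['a', 'b', 'c', 'd', 'e'])
```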
-
kClusterLib.kcTools.
getInterClusterS
(mat, clusters) Calculates the sum of inter-cluster distances.
Parameters: - mat (numpy.ndarray) – SPND data used for clustering.
- clusters (list) – Cluster map of the SPNDs.
Returns: interClusterDistance – Sum of the distances of the cluster means from the global mean.
Return type: float
Example
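One possible reading of this quantity, sketched under stated assumptions: SPNDs lie along the columns of mat, each cluster mean is the average of its member columns, and the distance is Euclidean. None of these choices is confirmed by the documentation, and inter_cluster_sum_sketch is a hypothetical name.

```python
import numpy as np

def inter_cluster_sum_sketch(mat, clusters):
    """Sum of Euclidean distances between each cluster mean and the global mean."""
    data = np.asarray(mat, dtype=float)
    cluster_ids = np.asarray(clusters)
    global_mean = data.mean(axis=1)  # average profile over all SPND columns
    total = 0.0
    for cid in np.unique(cluster_ids):
        centroid = data[:, cluster_ids == cid].mean(axis=1)  # mean of member columns
        total += float(np.linalg.norm(centroid - global_mean))
    return total

# Two samples (rows) from four hypothetical SPNDs (columns), two clusters.
mat = [[0.0, 0.0, 4.0, 4.0],
       [0.0, 0.0, 0.0, 0.0]]
total = inter_cluster_sum_sketch(mat, [0, 0, 1, 1])
```

Here the global mean is (2, 0) and the two cluster means are (0, 0) and (4, 0), so each contributes a distance of 2.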
-
kClusterLib.kcTools.
count_nan
(arr) Counts occurrences of numpy.nan in the passed data structure.
Parameters: arr (numpy.ndarray, list) – The dataset in which NaN values are to be counted.
Returns: count – Number of occurrences of NaN within the data passed.
Return type: int
Example
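A minimal sketch of NaN counting with NumPy, assuming the input can be converted to a float array. The helper count_nan_sketch is hypothetical and only illustrates the idea; it is not the library's code.

```python
import numpy as np

def count_nan_sketch(arr):
    """Count NaN entries in a list or array of numbers."""
    values = np.asarray(arr, dtype=float)
    return int(np.isnan(values).sum())

n_missing = count_nan_sketch([[1.0, np.nan], [np.nan, 4.0]])
n_clean = count_nan_sketch([1, 2, 3])
```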
-
kClusterLib.kcTools.
custom_filter
(mat) Converts negative elements to 0 (zero) in the passed data.
Parameters: mat (numpy.ndarray) – Matrix to be cleaned of negative values.
Returns: cleanMat – Matrix with negative elements converted to zero.
Return type: numpy.ndarray
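The same effect can be sketched with numpy.where. The name custom_filter_sketch is hypothetical, and returning a new array rather than modifying the input in place is an assumption.

```python
import numpy as np

def custom_filter_sketch(mat):
    """Return a copy of mat with every negative element replaced by zero."""
    values = np.asarray(mat, dtype=float)
    return np.where(values < 0, 0.0, values)

clean = custom_filter_sketch([[-1.5, 2.0], [3.0, -0.1]])
```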
-
kClusterLib.kcTools.
removeFaultySensors
(matSensor, labelSensor, minSensorOutput=0, allowedFault=25) Eliminates those SPNDs for which x% of the sensor readings are bad, where x is provided by the user.
Parameters: - matSensor (numpy.ndarray) – Data to be filtered.
- labelSensor (list) – List of SPND labels.
- minSensorOutput (float) – Bad-value threshold. A sensor value below this threshold is considered a bad value.
- allowedFault (float) – Percentage threshold for total bad values. If an SPND has this percentage (or more) of bad readings, it is eliminated. The corresponding label is also removed from the list of labels.
Returns: - matSensor (numpy.ndarray) – SPND data after removal of the faulty SPND column(s).
- labelSensor – Label list after removal of the faulty SPND label(s).
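The filtering rule described above can be sketched as follows. This is an illustration rather than the library's implementation: remove_faulty_sensors_sketch is a hypothetical name, and SPNDs are assumed to lie along the columns of the data matrix.

```python
import numpy as np

def remove_faulty_sensors_sketch(mat_sensor, label_sensor,
                                 min_sensor_output=0, allowed_fault=25):
    """Drop columns whose share of readings below the threshold is too high."""
    data = np.asarray(mat_sensor, dtype=float)
    # Percentage of readings per column that fall below the bad-value threshold.
    bad_percent = 100.0 * (data < min_sensor_output).mean(axis=0)
    keep = bad_percent < allowed_fault  # "this percentage (or more)" is removed
    kept_labels = [name for name, ok in zip(label_sensor, keep) if ok]
    return data[:, keep], kept_labels

# Columns A, B, C have 0%, 25% and 50% bad (negative) readings respectively.
data = [[1, -1,  1],
        [2,  2,  1],
        [3,  3, -1],
        [4,  4, -1]]
filtered, kept = remove_faulty_sensors_sketch(data, ['A', 'B', 'C'])
```

With the default allowedFault of 25, column B sits exactly on the threshold and is removed together with column C.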
-
kClusterLib.kcTools.
getOptimalCluster
(sensor_mat, Si_threshold=0.1, Kmax=8, **kwargs) Performs clustering multiple times and tries to find the optimal cluster.
Parameters: - sensor_mat (numpy.ndarray) – SPND data matrix.
- Si_threshold (float) – Desired ratio of intra- to inter-cluster distance. This value is a measure of the closeness of similar SPNDs in a cluster.
- Kmax (int) – Upper limit on the number of clusters. The clustering program starts from k=2 (two clusters) and then keeps increasing k to find a smaller Si.
Returns: - kcluster (list) – A list containing the cluster mapping of the SPNDs. The index corresponds to the SPND and the value corresponds to the cluster. E.g. kcluster = [2,0,1] signifies
\(0^{th}\ \text{SPND} \to \text{cluster 2}\)
\(1^{st}\ \text{SPND} \to \text{cluster 0}\)
\(2^{nd}\ \text{SPND} \to \text{cluster 1}\)
- error (float) – Represents the sum of intra-cluster distances.
- freq (int) – Represents how many times the optimal solution was found while clustering.
Example
>>> from pylab import randn
>>> import numpy
>>> from kcTools import getOptimalCluster, prettyPrint
>>> # Let's create a random matrix
>>> a = numpy.array(randn(100)).reshape(10,10)
>>> # perform clustering
>>> kc, er, fc = getOptimalCluster(a,0.5,5)
>>> prettyPrint(kc,['a','b','c','d','e','f','g','h','i','j'])
0: ['d', 'h']
1: ['a', 'g']
2: ['b', 'f', 'i']
3: ['c', 'e', 'j']
...
-
kClusterLib.kcTools.
loadPcaData
(**kwargs) Returns relMatrix and residueCovMat.
Loads the PCA model data. This function searches for the stored data in the current directory or in another directory (based on the cpath argument).
Parameters: - fname (str, optional) – Name of the cluster to load from the current directory.
- cpath (str, optional) – System path of the directory where the cluster is stored.
Returns: - relMatrix (ndarray) – PCA relations matrix.
- residueCovMat (ndarray) – Data covariance matrix.
Note: The order of arguments is to be tracked carefully.
-
kClusterLib.kcTools.
savePcaData
(relMat, residueCovMat, **kwargs) Saves the given PCA model data to disk.
Parameters: - relMat (ndarray) – PCA relations matrix.
- residueCovMat (ndarray) – Data covariance matrix.
Returns: name – <filename>.np if the data was successfully written to disk.
Return type: str
...
-
kClusterLib.kcTools.
loadKCFromDisk
(**kwargs) Returns clusterMap, CoVcombinedMat, spndLabels, spndMeans and spndVars from disk (saved data).
This function searches for the stored data in the current directory or in another directory (based on the cpath argument).
Parameters: - fname (str, optional) – Name of the cluster to load from the current directory.
- cpath (str, optional) – System path of the directory where the cluster is stored.
Returns: - clusterMap (list) – The cluster map, if a cluster is located in the given path (or the current directory); otherwise None is returned.
- CoVcombinedMat (numpy.ndarray) – The stored data matrix.
- spndLabels (list) – List of string names for the SPNDs.
- spndMeans (ndarray) – 1-D numpy array containing the mean of each SPND sensor.
- spndVars (ndarray) – 1-D numpy array containing the variance of each SPND sensor.
Example
Here’s a use case of this function
>>> from kcTools import *
>>> from pylab import randn
>>> import numpy as np
>>> mat = np.array(randn(1500)).reshape(150,10)
>>> kc = np.array(range(6))
>>> labels = ['a','b','c','d','e','f','g','h','i', 'j']
>>> means = [np.mean(mat[:,i]) for i in range(10)]
>>> vars = [np.var(mat[:,i]) for i in range(10)]
>>> saveKCToDisk(kc, mat, labels, means, vars)
>>> clusterMap, CoVcombinedMat, spndLabels, spndMeans, spndVars = loadKCFromDisk()
>>> print( "kc: {}\r\n\r\nmat:{} \r\n\r\nlbl:{} \r\n\r\nmeans:{} \r\n\r\nvars:{}".format(clusterMap, CoVcombinedMat, spndLabels, spndMeans, spndVars) )
-
kClusterLib.kcTools.
saveKCToDisk
(clusterMap, CoVcombinedMat, spndLabels, spndMeans, spndVars, **kwargs) Saves a given cluster to disk.
Parameters: - clusterMap (list) – kcluster information to be saved to disk.
- CoVcombinedMat (ndarray) – Data used for clustering (mean centered and scaled).
- spndLabels (list) – String names for the SPNDs.
- spndMeans (ndarray) – 1-D numpy array containing the mean of each SPND sensor.
- spndVars (ndarray) – 1-D numpy array containing the variance of each SPND sensor.
- fname (str, optional) – Name of the data file to be created and stored in the directory.
- cpath (str, optional) – Valid system path where the cluster must be stored.
Returns: name – <filename>.np if the data was successfully written to disk.
Return type: str
...
-
kClusterLib.kcTools.
getFreshCluster
(**kwargs) Searches for a valid cluster on disk.
Note
There is no need to use this function directly; use loadKCFromDisk() instead.
-
kClusterLib.kcTools.
get_datetime
(fstr) Converts a valid date-time string into a datetime object.
Parameters: fstr (str) – A valid datetime string.
Returns: timestamp – Valid datetime object. This allows checking the freshness of files, i.e. how much time has elapsed since a particular file was created.
Return type: datetime instance
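A small sketch of how such a conversion supports a freshness check. The accepted string format is not documented, so the format "%Y-%m-%d %H:%M:%S" used here is purely an assumption, and get_datetime_sketch is a hypothetical stand-in for get_datetime.

```python
from datetime import datetime

ASSUMED_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumption: the real format is not documented

def get_datetime_sketch(fstr, fmt=ASSUMED_FORMAT):
    """Parse a date-time string into a datetime object."""
    return datetime.strptime(fstr, fmt)

created = get_datetime_sketch("2015-06-01 12:00:00")
checked = get_datetime_sketch("2015-06-01 13:30:00")
elapsed = checked - created  # a timedelta describing the age of the file
```

Subtracting two datetime objects yields a timedelta, whose total_seconds() value can be compared against a maximum allowed age.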
-
kClusterLib.kcTools.
halt
() Halts the execution of the program until the user chooses to proceed.
Parameters: None
Returns: None
-
kClusterLib.kcTools.
isSingletonCluster
(cluster) Checks whether a singleton cluster exists in the passed 1-D SPND-cluster mapping array.
Parameters: cluster (list) – 1-D array of length (total number of SPNDs) containing the SPND-cluster mapping.
Returns: status – True if one or more singleton cluster labels occur in cluster.
Return type: bool
Example
>>> from kcTools import isSingletonCluster
>>> clustr = [2,3,0,1,2,0,3,2,0] # Here 1 is a singleton
>>> print(isSingletonCluster(clustr)) # printing
>>> clustr = [2,3,0,2,2,0,3,2,0] # No singleton
>>> print(isSingletonCluster(clustr)) # printing
Produces the following output
True
False
-
kClusterLib.kcTools.
mergeSingletonCluster
(kcluster, mat, **kwargs) Merges a singleton cluster (if present) into its nearest neighbour (S = 1 - abs(PC)) in the 1-D SPND to cluster map.
Parameters: - kcluster (list) – List containing the 1-D SPND to cluster-id mapping.
- mat (ndarray) – Matrix-like structure containing the SPND sensor values.
Returns: cluster – Updated cluster mapping.
Return type: list
Example
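A minimal sketch of the merging idea, under several assumptions that the documentation does not confirm: PC is taken to be the Pearson correlation between SPND columns, SPNDs lie along the columns of mat, and a singleton may only be merged into a cluster that already has more than one member. The function merge_singletons_sketch is hypothetical and is not the library's implementation.

```python
from collections import Counter

import numpy as np

def merge_singletons_sketch(kcluster, mat):
    """Reassign each singleton SPND to the cluster of its nearest neighbour."""
    data = np.asarray(mat, dtype=float)
    merged = list(kcluster)
    sizes = Counter(kcluster)
    singleton = np.array([sizes[cid] == 1 for cid in kcluster])
    if singleton.all():
        return merged  # no multi-member cluster to merge into
    # Dissimilarity S = 1 - |Pearson correlation| between SPND columns.
    dissimilarity = 1.0 - np.abs(np.corrcoef(data, rowvar=False))
    dissimilarity[:, singleton] = np.inf  # never pick a singleton as neighbour
    for spnd in np.flatnonzero(singleton):
        nearest = int(np.argmin(dissimilarity[spnd]))
        merged[spnd] = kcluster[nearest]
    return merged

# Five hypothetical SPNDs (columns); the last one forms a singleton cluster
# and is perfectly correlated with the first.
mat = [[1.0, 1.0,  1.0,  1.0, 2.0],
       [2.0, 2.0, -1.0, -1.0, 4.0],
       [3.0, 3.0, -1.0, -1.0, 6.0],
       [4.0, 4.5,  1.0,  1.2, 8.0]]
merged = merge_singletons_sketch([0, 0, 1, 1, 2], mat)
```

Because S uses the absolute value of the correlation, a strongly anti-correlated SPND counts as a near neighbour just like a strongly correlated one.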