3. KC Tools (module)

Clustering-related utilities.

This module is a collection of useful helper functions for clustering SPND data.


kClusterLib.kcTools.centerVectors(sensor_vectors)

Applies mean centering and scaling to SPND data.

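\(\tilde{s}_{ij} = \frac{s_{ij} - \bar{s}_j}{\sqrt{\sigma_j^2}}\), where \(\bar{s}_j\) and \(\sigma_j^2\) are the mean and variance of sensor \(j\) (column \(j\)).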

Parameters:	sensor_vectors (numpy.ndarray) – SPND data matrix, with SPNDs along columns.
Returns:

centered_array (numpy.ndarray) – Matrix (SPNDs along columns) with elements centered and scaled.
mean_vector (list) – A row vector containing the sensor means.
sqrt_var_vector (list) – A row vector containing \(\sqrt{\sigma^2}\), i.e. the standard deviation, of each sensor.
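The centering formula above can be sketched with NumPy as follows. This is a minimal illustration, not the library's implementation: it assumes the population standard deviation (`ddof=0`) and returns the mean and standard-deviation vectors as lists, matching the documented return types. The name `center_vectors_sketch` is hypothetical.

```python
import numpy as np


def center_vectors_sketch(sensor_vectors):
    """Hypothetical sketch of mean centering and scaling (SPNDs along columns)."""
    data = np.asarray(sensor_vectors, dtype=float)
    mean_vector = data.mean(axis=0)      # per-sensor mean
    sqrt_var_vector = data.std(axis=0)   # per-sensor sqrt(variance), assuming ddof=0
    # Broadcasting subtracts/divides the per-sensor values from every row (time sample).
    centered_array = (data - mean_vector) / sqrt_var_vector
    return centered_array, mean_vector.tolist(), sqrt_var_vector.tolist()


data = np.array([[1.0, 10.0], [3.0, 20.0], [5.0, 30.0]])
centered, means, stds = center_vectors_sketch(data)
print(means)                  # [3.0, 20.0]
print(centered.mean(axis=0))  # each column now has mean 0
```

A constant sensor column (zero variance) would cause a division by zero; the sketch does not guard against that case.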

kClusterLib.kcTools.getCentroids(mat, clusters)

Returns the centroids of the clusters. Requires the data matrix and the cluster map.

Note

This function is obsolete now. Use Pycluster.clustercentroids instead.

kClusterLib.kcTools.separateClusters(kcluster, labels=[])

Returns a list of lists of separated clusters, i.e. SPNDs that belong to the same cluster are grouped together in one list. There are as many lists as there are clusters.

Parameters:	kcluster (list) – A 1-D cluster list representing the SPND-to-cluster mapping.
Returns:	cluster_result – A list of member lists, each containing the integer indices of the SPNDs that belong to one cluster.
Return type:	list (of lists)

Example

>>> from kcTools import separateClusters
>>> clusters = [ 2, 0, 1, 3, 2, 0, 1, 3,  2, 0, 1, 3]
>>> print(separateClusters(clusters))
[[1, 5, 9], [2, 6, 10], [0, 4, 8], [3, 7, 11]]
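The grouping can be sketched in a few lines of Python. This assumes cluster ids are consecutive integers starting at 0, as in the example above, and ignores the labels argument; `separate_clusters_sketch` is a hypothetical name, not the library function.

```python
def separate_clusters_sketch(kcluster):
    """Hypothetical sketch: group SPND indices by cluster id (ids assumed to be 0..k-1)."""
    n_clusters = max(kcluster) + 1
    cluster_result = [[] for _ in range(n_clusters)]
    for spnd_index, cluster_id in enumerate(kcluster):
        cluster_result[cluster_id].append(spnd_index)  # list position == cluster id
    return cluster_result


clusters = [2, 0, 1, 3, 2, 0, 1, 3, 2, 0, 1, 3]
print(separate_clusters_sketch(clusters))
# [[1, 5, 9], [2, 6, 10], [0, 4, 8], [3, 7, 11]]
```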

kClusterLib.kcTools.prettyPrint(kcluster, labels=[])

Prints the cluster data in a human-readable format.

Parameters:

kcluster (list) – Cluster list representing the SPND-to-cluster mapping.
labels (list, optional) – List containing string names of the SPNDs. If no list is passed, automatic numbering is used.

Returns:	None (the output is displayed on stdout).
Return type:	None

Example
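A minimal sketch of this kind of output, assuming the `<cluster id>: [labels]` layout shown in the getOptimalCluster example and string indices for automatic numbering. `format_clusters_sketch` and `pretty_print_sketch` are hypothetical names, not part of kcTools.

```python
def format_clusters_sketch(kcluster, labels=None):
    """Hypothetical helper: build one 'id: [members]' line per cluster."""
    if not labels:
        # Assumed automatic numbering: SPND indices as strings.
        labels = [str(i) for i in range(len(kcluster))]
    lines = []
    for cluster_id in sorted(set(kcluster)):
        members = [labels[i] for i, c in enumerate(kcluster) if c == cluster_id]
        lines.append("{}: {}".format(cluster_id, members))
    return lines


def pretty_print_sketch(kcluster, labels=None):
    """Hypothetical sketch of prettyPrint: writes the formatted clusters to stdout."""
    for line in format_clusters_sketch(kcluster, labels):
        print(line)


pretty_print_sketch([1, 0, 1], ['a', 'b', 'c'])
# 0: ['b']
# 1: ['a', 'c']
```

Separating formatting from printing keeps the output testable, and `labels=None` avoids the shared mutable default that `labels=[]` would introduce.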

kClusterLib.kcTools.getInterClusterS(mat, clusters)

Calculates the sum of inter-cluster distances.

Parameters:

mat (numpy.ndarray) – SPND data used for clustering.
clusters (list) – Cluster map of the SPNDs.

Returns:	interClusterDistance – Sum of the distances of the cluster means from the global mean.
Return type:	float

Example
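The documented return value can be illustrated with a NumPy sketch. It assumes SPNDs along the columns of mat, cluster means taken over the member columns, and Euclidean distance; the library may define these differently. `inter_cluster_sum_sketch` is a hypothetical name.

```python
import numpy as np


def inter_cluster_sum_sketch(mat, clusters):
    """Hypothetical sketch: sum of Euclidean distances from each cluster mean to the global mean.

    Assumes SPNDs along columns, so each "mean" is an average of column vectors.
    """
    data = np.asarray(mat, dtype=float)
    labels = np.asarray(clusters)
    global_mean = data.mean(axis=1)  # average over all SPND columns
    total = 0.0
    for cluster_id in np.unique(labels):
        cluster_mean = data[:, labels == cluster_id].mean(axis=1)  # average of member columns
        total += np.linalg.norm(cluster_mean - global_mean)
    return float(total)


mat = np.array([[0.0, 0.0, 4.0, 4.0],
                [0.0, 0.0, 4.0, 4.0]])
print(inter_cluster_sum_sketch(mat, [0, 0, 1, 1]))  # 2 * sqrt(8), about 5.657
```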

kClusterLib.kcTools.count_nan(arr)

Counts occurrences of numpy.nan in the passed data structure.

Parameters:	arr (numpy.ndarray, list) – The dataset in which NaNs are to be counted.
Returns:	count – Number of occurrences of NaN within the data passed.
Return type:	int

Example
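A minimal NumPy sketch of the count, assuming rectangular (non-ragged) input; `count_nan_sketch` is a hypothetical name.

```python
import numpy as np


def count_nan_sketch(arr):
    """Hypothetical sketch: count numpy.nan entries in an array or (nested) list."""
    values = np.asarray(arr, dtype=float)  # lists are converted to a float array
    return int(np.isnan(values).sum())     # True counts as 1


print(count_nan_sketch([1.0, np.nan, 3.0, np.nan]))             # 2
print(count_nan_sketch(np.array([[np.nan, 0.0], [5.0, 6.0]])))  # 1
```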

kClusterLib.kcTools.custom_filter(mat)

Converts negative elements to 0 (zero) in the passed data.

Parameters:	mat (numpy.ndarray) – Matrix to be cleaned of negative values.
Returns:	cleanMat – Matrix with negative elements converted to zero.
Return type:	numpy.ndarray
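The same operation can be sketched with a boolean mask. The sketch assumes a copy is returned rather than the input being modified in place; `custom_filter_sketch` is a hypothetical name.

```python
import numpy as np


def custom_filter_sketch(mat):
    """Hypothetical sketch: return a copy of mat with negative entries set to zero."""
    clean_mat = np.array(mat, dtype=float)  # np.array copies, so the input is untouched
    clean_mat[clean_mat < 0] = 0.0          # boolean mask selects the negative entries
    return clean_mat


print(custom_filter_sketch([[1.5, -2.0], [-0.1, 3.0]]))
```

NaN entries are left as they are, because any comparison with NaN evaluates to False.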

kClusterLib.kcTools.removeFaultySensors(matSensor, labelSensor, minSensorOutput=0, allowedFault=25)

Eliminates SPNDs in which x % or more of the readings are bad, where x (allowedFault) is provided by the user.

Parameters:

matSensor (numpy.ndarray) – Data to be filtered.
labelSensor (list) – list of SPND labels.
minSensorOutput (float) – Bad-value threshold for the sensor output. A sensor reading below this value is considered bad.
allowedFault (float) – Percentage threshold for bad readings. If an SPND has this percentage (or more) of bad readings, it is eliminated and its label is removed from the list of labels.

Returns:

matSensor (numpy.ndarray) – SPND data after removal of the faulty SPND column(s).
labelSensor (list) – Label list after removal of the faulty SPND label(s).
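The filtering rule described above can be sketched as follows. This is a hedged illustration, not the library code: it uses hypothetical snake_case names, treats a reading as bad when it is strictly below the threshold, and removes a column when its bad-reading percentage is greater than or equal to allowed_fault.

```python
import numpy as np


def remove_faulty_sensors_sketch(mat_sensor, label_sensor, min_sensor_output=0, allowed_fault=25):
    """Hypothetical sketch: drop SPND columns whose share of bad readings is >= allowed_fault %.

    A reading counts as bad when it is below min_sensor_output (assumed strict '<').
    """
    data = np.asarray(mat_sensor, dtype=float)
    bad_percent = 100.0 * (data < min_sensor_output).mean(axis=0)  # per-column % of bad readings
    keep = bad_percent < allowed_fault                             # boolean mask over columns
    kept_labels = [label for label, ok in zip(label_sensor, keep) if ok]
    return data[:, keep], kept_labels


readings = np.array([[1.0, -1.0,  2.0],
                     [1.0, -1.0,  2.0],
                     [1.0,  5.0,  2.0],
                     [1.0,  5.0,  2.0],
                     [1.0,  5.0, -3.0]])
filtered, labels = remove_faulty_sensors_sketch(readings, ['a', 'b', 'c'])
print(labels)  # ['a', 'c']  ('b' has 40 % bad readings, 'c' has 20 %)
```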

kClusterLib.kcTools.getOptimalCluster(sensor_mat, Si_threshold=0.1, Kmax=8, **kwargs)

Runs the clustering multiple times and tries to find the optimal clustering.

Parameters:

sensor_mat (numpy.ndarray) – SPND data matrix.
Si_threshold (float) – Desired ratio of intra-cluster to inter-cluster distance. This value measures how close similar SPNDs are within a cluster.
Kmax (int) – Upper limit on the number of clusters. The clustering starts with k=2 (two clusters) and keeps increasing k to find a smaller Si.

Returns:

kcluster (list) – A list containing the cluster mapping of the SPNDs. The index corresponds to the SPND and the value to the cluster. For example, kcluster = [2,0,1] signifies:

\(0^{th}\) SPND \(\to\) cluster 2

\(1^{st}\) SPND \(\to\) cluster 0

\(2^{nd}\) SPND \(\to\) cluster 1
error (float) – The sum of intra-cluster distances.
freq (int) – Represents how many times the optimal solution was found while clustering.

Example

>>> import numpy
>>> from kcTools import getOptimalCluster, prettyPrint
>>> # Let's create a random matrix
>>> a = numpy.random.randn(100).reshape(10, 10)
>>> # Perform clustering
>>> kc, er, fc = getOptimalCluster(a, 0.5, 5)
>>> prettyPrint(kc, ['a','b','c','d','e','f','g','h','i','j'])
0: ['d', 'h']
1: ['a', 'g']
2: ['b', 'f', 'i']
3: ['c', 'e', 'j']

...

kClusterLib.kcTools.loadPcaData(**kwargs)

Returns relMatrix and residueCovMat.

Loads the stored PCA model data. This function searches for the stored data in the current directory or in another directory (given by the cpath argument).

Parameters:

fname (str, optional) – Name of the stored PCA data to load from the current directory.
cpath (str, optional) – System path of the directory where the PCA data is stored.

Returns:

relMatrix (ndarray) – PCA relations Matrix
residueCovMat (ndarray) – Data-Covariance matrix
Note

The order of the arguments must be tracked carefully.

kClusterLib.kcTools.savePcaData(relMat, residueCovMat, **kwargs)

Saves the given PCA model data to disk.

Parameters:

relMat (ndarray) – PCA relations matrix.
residueCovMat (ndarray) – Data-covariance matrix.

Returns:	name – `<filename>.np` if the data was successfully written to disk.
Return type:	str
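One way to persist and reload the two PCA matrices is a NumPy `.npz` archive with named entries, sketched below. The file layout, the `.npz` extension, the default file name `pcaData`, and the `(None, None)` result for a missing file are all assumptions (the documented functions use a `.np` name); `save_pca_sketch` and `load_pca_sketch` are hypothetical. Storing each array under an explicit key avoids depending on positional order.

```python
import os
import tempfile

import numpy as np


def save_pca_sketch(rel_mat, residue_cov_mat, fname="pcaData", cpath="."):
    """Hypothetical sketch: store both PCA matrices in one .npz archive and return its path."""
    path = os.path.join(cpath, fname + ".npz")
    np.savez(path, relMatrix=rel_mat, residueCovMat=residue_cov_mat)
    return path


def load_pca_sketch(fname="pcaData", cpath="."):
    """Hypothetical sketch: load the matrices back by name; (None, None) if no file exists."""
    path = os.path.join(cpath, fname + ".npz")
    if not os.path.exists(path):
        return None, None
    with np.load(path) as archive:  # arrays are read into memory before the file closes
        return archive["relMatrix"], archive["residueCovMat"]


with tempfile.TemporaryDirectory() as tmp:
    saved = save_pca_sketch(np.eye(3), np.ones((3, 3)), cpath=tmp)
    rel, cov = load_pca_sketch(cpath=tmp)
    print(os.path.basename(saved), rel.shape, cov.shape)  # pcaData.npz (3, 3) (3, 3)
```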

...

kClusterLib.kcTools.loadKCFromDisk(**kwargs)

Returns clusterMap, CoVcombinedMat, spndLabels, spndMeans and spndVars from previously saved data on disk.

This function searches for the stored data in the current directory or in another directory (given by the cpath argument).

Parameters:

fname (str, optional) – Name of the stored cluster to load from the current directory.
cpath (str, optional) – System path of the directory where the cluster is stored.

Returns:

clusterMap (list) – The stored cluster map if a cluster is found in the given path (or the current directory); otherwise None.
CoVcombinedMat (numpy.ndarray) – The stored data matrix used for clustering (mean centered and scaled).
spndLabels (list) – List of string names for the SPNDs.
spndMeans (ndarray) – 1-D numpy array containing the mean of each SPND sensor.
spndVars (ndarray) – 1-D numpy array containing the variance of each SPND sensor.

Example

Here is a use case of this function:

>>> from kcTools import *
>>> import numpy as np
>>> mat = np.random.randn(1500).reshape(150, 10)
>>> kc = [0, 1, 2, 0, 1, 2, 0, 1, 2, 0]  # one cluster id per SPND (10 SPNDs)
>>> labels = ['a','b','c','d','e','f','g','h','i','j']
>>> means = np.mean(mat, axis=0)      # mean of each SPND column
>>> variances = np.var(mat, axis=0)   # variance of each SPND column
>>> saveKCToDisk(kc, mat, labels, means, variances)
>>> clusterMap, CoVcombinedMat, spndLabels, spndMeans, spndVars = loadKCFromDisk()
>>> print("kc: {}\n\nmat: {}\n\nlbl: {}\n\nmeans: {}\n\nvars: {}".format(clusterMap, CoVcombinedMat, spndLabels, spndMeans, spndVars))

kClusterLib.kcTools.saveKCToDisk(clusterMap, CoVcombinedMat, spndLabels, spndMeans, spndVars, **kwargs)

Saves a given cluster to disk.

Parameters:

clusterMap (list) – kcluster information to be saved to disk.
CoVcombinedMat (ndarray) – Data used for clustering (mean centered and scaled).
spndLabels (list) – String names for the SPNDs.
spndMeans (ndarray) – 1-D numpy array containing the mean of each SPND sensor.
spndVars (ndarray) – 1-D numpy array containing the variance of each SPND sensor.
fname (str, optional) – Name of the data file to be created and stored in the directory.
cpath (str, optional) – Valid system path where the cluster must be stored.

Returns:	name – `<filename>.np` if the data was successfully written to disk.
Return type:	str

...

kClusterLib.kcTools.getFreshCluster(**kwargs)

Searches for a valid cluster on disk.

Note

There is no need to use this function directly; use loadKCFromDisk() instead.

kClusterLib.kcTools.get_datetime(fstr)

Converts a valid date-time string into a datetime object.

Parameters:	fstr (str) – A valid date-time string.
Returns:	timestamp – A datetime object. This allows checking the freshness of files or how much time has elapsed since a particular file was created.
Return type:	datetime instance
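A minimal sketch using `datetime.strptime`. The expected format of fstr is not documented, so the format string `%Y-%m-%d %H:%M:%S` is an assumption, and `get_datetime_sketch` is a hypothetical name. The usage lines show the kind of elapsed-time check described above.

```python
from datetime import datetime


def get_datetime_sketch(fstr, fmt="%Y-%m-%d %H:%M:%S"):
    """Hypothetical sketch: parse a date-time string; the default format is an assumption."""
    return datetime.strptime(fstr, fmt)


created = get_datetime_sketch("2016-03-01 12:00:00")
now = get_datetime_sketch("2016-03-02 18:30:00")
elapsed = now - created                   # subtracting datetimes gives a timedelta
print(elapsed.total_seconds() / 3600.0)  # hours elapsed: 30.5
```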

kClusterLib.kcTools.halt()

Halts the execution of the program until the user chooses to proceed.

Parameters:	None
Returns:	None
Return type:	None

kClusterLib.kcTools.isSingletonCluster(cluster)

Checks whether a singleton cluster (a cluster with exactly one SPND) exists in the passed 1-D SPND-to-cluster mapping array.

Parameters:	cluster (list) – 1-D array, of length equal to the total number of SPNDs, containing the SPND-to-cluster mapping.
Returns:	status – True if one or more singleton cluster labels occur in cluster, otherwise False.
Return type:	bool

Example

>>> from kcTools import isSingletonCluster
>>> clustr = [2,3,0,1,2,0,3,2,0]  # Here 1 is singleton
>>> print (isSingletonCluster(clustr))  # printing
>>> clustr = [2,3,0,2,2,0,3,2,0]  # No singleton
>>> print (isSingletonCluster(clustr))  # printing

Produces the following output:

True
False
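The check can be sketched by counting how often each cluster id occurs; `is_singleton_cluster_sketch` is a hypothetical name, not the library function.

```python
from collections import Counter


def is_singleton_cluster_sketch(cluster):
    """Hypothetical sketch: True if any cluster id occurs exactly once in the mapping."""
    return any(count == 1 for count in Counter(cluster).values())


print(is_singleton_cluster_sketch([2, 3, 0, 1, 2, 0, 3, 2, 0]))  # True
print(is_singleton_cluster_sketch([2, 3, 0, 2, 2, 0, 3, 2, 0]))  # False
```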

kClusterLib.kcTools.mergeSingletonCluster(kcluster, mat, **kwargs)

Merges the singleton cluster (if present) into its nearest neighbour in the 1-D SPND-to-cluster map, using the distance \(S = 1 - |PC|\), where \(PC\) is the Pearson correlation.

Parameters:

kcluster (list) – List containing the 1-D SPND-to-cluster-id mapping.
mat (ndarray) – Matrix-like structure containing the SPND sensor values.

Returns:	cluster – Updated cluster mapping.
Return type:	list

Example
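A hedged sketch of the merge rule. It assumes SPNDs along the columns of mat, interprets "nearest neighbour" as the single most similar other SPND under \(S = 1 - |r|\), with \(r\) the Pearson correlation from `numpy.corrcoef`, and moves the singleton into that SPND's cluster. The library may instead compare against cluster centroids; `merge_singleton_sketch` is a hypothetical name.

```python
import numpy as np


def merge_singleton_sketch(kcluster, mat):
    """Hypothetical sketch: move each singleton SPND into the cluster of its nearest
    other SPND, using the dissimilarity S = 1 - |r| (r = Pearson correlation).

    Assumes SPNDs along the columns of mat.
    """
    cluster = list(kcluster)  # work on a copy of the mapping
    corr = np.corrcoef(np.asarray(mat, dtype=float), rowvar=False)  # SPND-by-SPND correlations
    dissimilarity = 1.0 - np.abs(corr)
    for spnd in range(len(cluster)):
        if cluster.count(cluster[spnd]) == 1:           # spnd is alone in its cluster
            d = dissimilarity[spnd].copy()
            d[spnd] = np.inf                            # ignore the SPND itself
            cluster[spnd] = cluster[int(np.argmin(d))]  # adopt the nearest SPND's cluster
    return cluster


mat = np.array([[1.0,  2.0, 5.0, 4.0, -1.0],
                [2.0,  4.0, 1.0, 2.0, -2.0],
                [3.0,  6.0, 4.0, 5.0, -3.0],
                [4.0,  8.0, 2.0, 1.0, -4.0],
                [5.0, 11.0, 3.0, 3.0, -5.0]])
kc = [0, 0, 1, 1, 2]  # SPND 4 is alone in cluster 2
print(merge_singleton_sketch(kc, mat))  # [0, 0, 1, 1, 0]
```

Because \(|r|\) is used, the perfectly anti-correlated SPND 0 counts as the nearest neighbour of SPND 4. The emptied cluster id is left unused; the sketch does not renumber clusters, and a constant sensor column would produce NaN correlations.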
