Handling Tabulated Input (CSV Data)

The Kolbert components are intended for BI and interactive reporting, so tabulated data (including CSV files) is a common input that must be handled correctly. For this reason, the reporting components share a common model for fast and effective CSV analysis. Beyond analyzing columns and rows, tabulated input can be processed in different ways before it is consumed by the components. Several features and options are available to extract or generate the information that should be rendered and visualized.

The options described in the following sections mainly control data processing and the customization of the content to be extracted.

Let's consider the following CSV sample in our analysis of tabulated data parsing features:
Enterprise;Department;Technology;Name;Age;work_days;total_days
APPLE;IPod;Music;PM1;19;40;50
APPLE;IPod;Music;PM2;20;70;100
APPLE;IPod;Music;PM3;17;40;55
APPLE;IPod;Design;PD1;24;12;75
APPLE;IPod;Design;PD2;23;12;20
APPLE;IPod;Design;PD3;24;45;80
APPLE;IPod;Design;PD4;23;12;30
APPLE;IPod;Design;PD5;24;12;30
APPLE;IPod;Design;PD6;23;45;50
APPLE;IPod;Design;PD7;24;80;40
APPLE;IPod;Design;PD8;23;10;20
APPLE;IPhone;Telecom;PT1;19;10;15
APPLE;IPhone;Telecom;PT2;23;10;14
APPLE;IPhone;Telecom;PT3;19;20;24
APPLE;IPhone;Telecom;PT4;23;20;28
APPLE;IPhone;Telecom;PT5;19;23;26
APPLE;IPhone;Telecom;PT6;23;24;25
APPLE;IPhone;RD;PR1;23;50;55
APPLE;IPhone;RD;PR2;20;40;80
APPLE;IPhone;RD;PR3;23;60;70
APPLE;IPhone;RD;PR4;21;40;50
APPLE;IMAC;PC;M1;23;35;40
APPLE;IMAC;PC;M2;23;40;45
APPLE;IMAC;PC;M3;23;45;50
APPLE;IMAC;Electronics;E1;22;20;25
APPLE;IMAC;Electronics;E2;21;20;30
APPLE;IMAC;Electronics;E3;23;15;20
APPLE;IMAC;IRD;IE1;22;20;30
APPLE;IMAC;IRD;IE2;21;20;25
APPLE;IMAC;IRD;IE3;23;24;30

CSV delimiter (csvDelimiter)

The CSV delimiter is the character that separates two fields.
By default, csvDelimiter is ";" (which is also the delimiter used in our example). It can be changed if the CSV data source uses a different delimiter.
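
The effect of the delimiter can be illustrated with a small sketch. It is written in Python rather than ActionScript so that it runs on its own, and it is not the Kolbert parser: the split_fields helper and its csv_delimiter parameter are hypothetical names that only mimic the role of csvDelimiter.

```python
# Illustrative sketch only; split_fields is a hypothetical helper,
# not part of the Kolbert components.
def split_fields(line, csv_delimiter=";"):
    """Split one CSV line into fields on the given delimiter."""
    return line.split(csv_delimiter)


# First data row of the sample, which uses the default ";" delimiter.
row = "APPLE;IPod;Music;PM1;19;40;50"
fields = split_fields(row)
print(len(fields))  # 7 fields

# The same row exported with "," instead (assumed variant of the sample).
comma_row = "APPLE,IPod,Music,PM1,19,40,50"
print(len(split_fields(comma_row)))       # 1: the default delimiter does not match
print(len(split_fields(comma_row, ",")))  # 7: delimiter adjusted to the source
```

With the wrong delimiter the whole line ends up in a single field, which is why the property has to match the data source.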

Record delimiter (recordDelimiter)

The record delimiter is the character that encloses a field, if one is used. It is generally needed when the content of a data field contains the csvDelimiter string, because CSV parsing would otherwise produce wrong results. By default, the recordDelimiter property is "" and can be changed when needed.
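
A minimal sketch of why an enclosing character matters, using Python's standard csv module as a stand-in for the component's parser. The parse function and its parameter names are hypothetical, and the "IPod;Touch" value is an invented field, not part of the sample, chosen because it contains the ";" delimiter.

```python
import csv
import io


# Illustrative sketch only; parse() is a hypothetical stand-in for the
# component's CSV parsing, built on Python's csv module.
def parse(text, csv_delimiter=";", record_delimiter=""):
    """Parse CSV text; an empty record_delimiter disables field enclosing."""
    if record_delimiter:
        reader = csv.reader(io.StringIO(text), delimiter=csv_delimiter,
                            quotechar=record_delimiter)
    else:
        reader = csv.reader(io.StringIO(text), delimiter=csv_delimiter,
                            quoting=csv.QUOTE_NONE)
    return list(reader)


# Assumed row whose second field contains the ";" delimiter.
line = 'APPLE;"IPod;Touch";Music;PM1;19;40;50'

naive = parse(line)                           # no record delimiter
enclosed = parse(line, record_delimiter='"')  # fields enclosed in quotes

print(len(naive[0]))     # 8: the enclosed field was split in two
print(len(enclosed[0]))  # 7: the enclosed field stays whole
```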

With Headers (withHeaders)

The withHeaders property indicates whether the first row should be treated as column headers. If it is set to false, each column of the parsed CSV input is referenced by the string representation of its index. These references are also used as the property names of each component's unit data elements (nodes for the Visualizer, TreeMapNode for the TreeMap, and so on). If it is set to true (the default), the column headers are the properties of the constructed elements.
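
The two cases can be sketched as follows. This is a Python illustration under stated assumptions, not the component's code: to_records is a hypothetical helper, and the sketch assumes that column indexes start at 0, which the text does not specify.

```python
# Illustrative sketch only; to_records is a hypothetical helper.
def to_records(lines, with_headers=True, csv_delimiter=";"):
    """Turn CSV lines into dictionaries keyed by header or by column index."""
    rows = [line.split(csv_delimiter) for line in lines]
    if with_headers:
        # First row supplies the property names.
        keys, rows = rows[0], rows[1:]
    else:
        # Assumption: indexes start at 0 and are used as strings.
        keys = [str(i) for i in range(len(rows[0]))]
    return [dict(zip(keys, row)) for row in rows]


sample = [
    "Enterprise;Department;Technology;Name;Age;work_days;total_days",
    "APPLE;IPod;Music;PM1;19;40;50",
]

with_h = to_records(sample)                             # headers as properties
without_h = to_records(sample[1:], with_headers=False)  # indexes as properties

print(with_h[0]["Name"])  # PM1
print(without_h[0]["3"])  # PM1
```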

FilterPath (filterPath)

The filterPath is the equivalent of the analysisPath property in the Visualizer component.

radarChart.filterPath=["Enterprise","Department","Name"]

Unlike the Visualizer, the filter path of the other components can consist of a single column reference, because they display entities and do not depend on hierarchies.

Merge Descriptor (mergeDescriptor)

The MergeDescriptor class defines the merging tasks to apply to the CSV source. A MergeDescriptor is a set of MergeEntity instances, each of which defines columns to be merged into one column according to a merging function. A MergeEntity instance is created by defining the columns to merge, the name of the resulting column, and the merging function.

A MergeEntity is a unit merging task; to be taken into account, it must be added to the MergeDescriptor using the addMergeDescription(mergeEntity) method. When merging, you can either keep the merged columns alongside the resulting column, so that they are still taken into account during data processing, or replace them with the resulting column. This is controlled by setting the leaveMergeColumns property of the MergeDescriptor instance to true or false.

The following example shows how a MergeDescriptor can be applied to our CSV sample.

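// Merge work_days and total_days into a single "Holidays" column.
var mergeDescriptor:MergeDescriptor=new MergeDescriptor();
var mergeEntity:MergeEntity=new MergeEntity(["work_days","total_days"],"Holidays",mergeFunction);
mergeDescriptor.addMergeDescription(mergeEntity);
// false: the merged columns are replaced by the resulting column.
mergeDescriptor.leaveMergeColumns=false;
labComponent.mergeDescriptor=mergeDescriptor;
labComponent.build();

// Receives the values of the merged columns, in the order declared above.
private function mergeFunction(arr:Array):Number
{
   if(arr.length!=2)
      return 0;
   // total_days minus work_days
   return Number(arr[1])-Number(arr[0]);
}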
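
To make the result of this merge concrete, the following Python sketch applies the same subtraction to the first row of the sample. It only approximates the behaviour described in this section: apply_merge and its leave_merge_columns parameter are hypothetical names modelled on leaveMergeColumns, not the component's implementation.

```python
# Illustrative sketch only; names are hypothetical, not the Kolbert API.
def merge_function(values):
    """Same logic as the ActionScript mergeFunction: total_days - work_days."""
    if len(values) != 2:
        return 0
    return float(values[1]) - float(values[0])


def apply_merge(record, columns, target, func, leave_merge_columns=False):
    """Add the merged column and optionally drop the source columns."""
    merged = dict(record)
    merged[target] = func([record[c] for c in columns])
    if not leave_merge_columns:
        for c in columns:
            del merged[c]
    return merged


# First data row of the sample (PM1), reduced to the relevant columns.
record = {"Name": "PM1", "Age": "19", "work_days": "40", "total_days": "50"}
columns = ["work_days", "total_days"]

replaced = apply_merge(record, columns, "Holidays", merge_function)
kept = apply_merge(record, columns, "Holidays", merge_function,
                   leave_merge_columns=True)

print(replaced)  # {'Name': 'PM1', 'Age': '19', 'Holidays': 10.0}
print(sorted(kept))
```

In the second call the source columns remain available next to the merged one, which corresponds to setting leaveMergeColumns to true.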

Attributes Descriptor (attributesDescriptor)

When manipulating CSV files, some columns are often irrelevant to the current data analysis. The content of these columns is junk and can cause data overload, especially with large CSV files. The AttributesDescriptor addresses this problem. For the columns defined in the analysis or filter path (which are parsed as data items), the AttributesDescriptor assigns selected column headers to a buildPath column as attributes. The generated data sets then have the selected column headers as properties.
An AttributesDescriptor instance is a set of AttributesEntity instances that define the attributes of the data sets extracted from a buildPath column reference. An AttributesEntity is initialized with the column name (which must be a buildPath element) and an attributes array containing the selected column names. As with the MergeDescriptor class, an AttributesEntity must be registered in the AttributesDescriptor using the function addAttributesEntity(attributesEntity).


The following example shows how an AttributesDescriptor can be used (it is defined after the merge).

visualizer.buildPath=["Enterprise", "Name", "Department"];
var attributesDescriptor:AttributesDescriptor=new AttributesDescriptor(); 
var attributesEntity1:AttributesEntity=new AttributesEntity("Enterprise",["Technology","Age"]);
var attributesEntity2:AttributesEntity=new AttributesEntity("Name",["Holidays"]);
var attributesEntity3:AttributesEntity=new AttributesEntity("Department",["Age"]); 
attributesDescriptor.AddAttributesDescription(attributesEntity1); 
attributesDescriptor.AddAttributesDescription(attributesEntity2);
attributesDescriptor.AddAttributesDescription(attributesEntity3);
labComponent.attributesDescriptor=attributesDescriptor; 
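
The idea of keeping only selected columns as attributes of a data item can be sketched as below. This is a Python approximation under assumptions, not the component's algorithm: build_items is a hypothetical helper, and the sketch assumes that rows sharing the same value in the chosen column become one item whose attribute values are collected into arrays.

```python
from collections import defaultdict


# Illustrative sketch only; build_items is a hypothetical helper.
def build_items(records, column, attributes):
    """Group records by `column`, keeping only the listed attribute columns."""
    items = defaultdict(lambda: {a: [] for a in attributes})
    for rec in records:
        for a in attributes:
            items[rec[column]][a].append(rec[a])
    return dict(items)


# Three rows taken from the sample (PM1, PM2 and PT1).
records = [
    {"Department": "IPod", "Technology": "Music", "Name": "PM1", "Age": "19"},
    {"Department": "IPod", "Technology": "Music", "Name": "PM2", "Age": "20"},
    {"Department": "IPhone", "Technology": "Telecom", "Name": "PT1", "Age": "19"},
]

# Only "Age" is attached to the Department items; other columns are dropped.
items = build_items(records, "Department", ["Age"])
print(items)  # {'IPod': {'Age': ['19', '20']}, 'IPhone': {'Age': ['19']}}
```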

Types Descriptor (typesDescriptor)

The component can recognize the type of a column's content on its own when it is a standard type. In some cases, however, numbers refer to IDs and should be treated as Strings rather than Numbers. The typesDescriptor is a Dictionary that assigns a type Class to a given column, so the developer can describe the content of the data source columns through the typesDescriptor property without modifying the data source itself.

var typesDescriptor:Dictionary= new Dictionary();
typesDescriptor["Age"]=String;
labComponent.typesDescriptor=typesDescriptor; 
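
The difference between automatic type recognition and an explicit type assignment can be sketched in Python as follows. The infer and convert helpers are hypothetical, and the inference rule shown (integer, then float, otherwise string) is an assumption; the component's actual recognition rules are not described here.

```python
# Illustrative sketch only; infer and convert are hypothetical helpers.
def infer(value):
    """Guess a type for a raw CSV string: int, then float, else keep the str."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def convert(record, types_descriptor=None):
    """Apply explicit column types where given, inferred types elsewhere."""
    types_descriptor = types_descriptor or {}
    return {
        key: types_descriptor[key](value) if key in types_descriptor else infer(value)
        for key, value in record.items()
    }


record = {"Name": "PM1", "Age": "19", "work_days": "40"}

auto = convert(record)                  # Age is recognized as a number
typed = convert(record, {"Age": str})   # Age is forced to stay a string

print(auto)   # {'Name': 'PM1', 'Age': 19, 'work_days': 40}
print(typed)  # {'Name': 'PM1', 'Age': '19', 'work_days': 40}
```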

Reporting functions (reportingFunctions)

The reporting functions are an important part of CSV data analysis. When data items and their corresponding data are built from a given column (based on a ReducedTable analysis), several rows are grouped into one row, and the content of each data field (attribute column) is collected into an array. Transforming these arrays into actual values by applying standard or custom functions is important for obtaining correct and coherent data for the data items generated from the CSV analysis.
The reportingFunctions property is a Dictionary that assigns a reporting function to a given column key (the column name). This function should accept an array and a type class (Integer, for example) as parameters.
Currently, two standard functions can be accessed statically from the ReportingUtils class; the following example uses ReportingUtils.mean:

var reportingFunctions:Dictionary=new Dictionary();
reportingFunctions["Age"]=ReportingUtils.mean;
reportingFunctions["Holidays"]=ReportingUtils.mean;
labComponent.reportingFunctions=reportingFunctions;
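
As a rough illustration of what a reporting function does to the grouped arrays, the Python sketch below averages the values of the three IPod/Music rows of the sample (PM1, PM2, PM3). The mean function is a stand-in with the array-plus-type signature described above, not the code of ReportingUtils.mean, and apply_reporting is a hypothetical helper. The Holidays values are total_days minus work_days for those rows.

```python
# Illustrative sketch only; not the ReportingUtils implementation.
def mean(values, type_class=float):
    """Average the values after converting each one with type_class."""
    if not values:
        return 0
    converted = [type_class(v) for v in values]
    return sum(converted) / len(converted)


def apply_reporting(item, reporting_functions, types=None):
    """Reduce each attribute array with its reporting function, if any."""
    types = types or {}
    return {
        col: reporting_functions[col](vals, types.get(col, float))
        if col in reporting_functions else vals
        for col, vals in item.items()
    }


# Grouped attribute arrays for PM1, PM2 and PM3.
ipod_music = {"Age": ["19", "20", "17"], "Holidays": [10, 30, 15]}

reporting_functions = {"Age": mean, "Holidays": mean}
result = apply_reporting(ipod_music, reporting_functions, {"Age": int})

print(result)  # one averaged value per column instead of an array
```

A custom function with the same two parameters (for example, a sum or a maximum) could be registered in the dictionary in the same way.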