Improvement of Other Performance Points
❖Improvement of Expression Execution Efficiency
Currently, some expressions are evaluated by the product's built-in expression engine, while others are evaluated by jsEngine. If a single expression contains both functions supported by the built-in engine and functions that are not, the whole expression is evaluated by jsEngine.
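The routing rule above can be sketched as follows. This is a minimal illustration, not the product's parser: the function-call regex, the case-insensitive matching, the expression syntax in the examples, and the partial `BUILTIN_FUNCTIONS` set (only the 7.0 additions listed in this section) are all assumptions.

```python
import re

# Hypothetical, partial set of functions handled by the built-in engine.
# The real set is product-defined and larger than this.
BUILTIN_FUNCTIONS = {
    "right", "left", "trim", "upper", "lower",
    "weekdayname", "year", "month", "parsedate", "formatdate",
}

# An identifier immediately followed by "(" is treated as a function call.
_CALL = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)\s*\(")


def choose_engine(expression: str) -> str:
    """Return "builtin" if every called function is supported, else "jsEngine"."""
    called = {name.lower() for name in _CALL.findall(expression)}
    # A single unsupported function sends the whole expression to jsEngine.
    if called <= BUILTIN_FUNCTIONS:
        return "builtin"
    return "jsEngine"


print(choose_engine("upper(left(name, 3))"))   # builtin
print(choose_engine("upper(customFn(name))"))  # jsEngine
```

Because one unsupported call (here the hypothetical `customFn`) forces the entire expression onto the slower path, widening the built-in set directly increases how many expressions stay on the fast engine.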
When processing large volumes of data in the data mart, expressions evaluated by jsEngine run more slowly than those evaluated by the built-in engine. Version 7.0 expands the set of expressions supported by the built-in engine, mainly adding support for the following functions:
Function | Description |
---|---|
Right | Returns the specified number of characters from the right end of a string. If the string is shorter than the specified length, the entire string is returned. |
Left | Returns the specified number of characters from the left end of a string. If the string is shorter than the specified length, the entire string is returned. |
Trim | Removes the spaces at both ends of a string. |
Upper | Converts all characters to uppercase. |
Lower | Converts all characters to lowercase. |
Weekdayname | Returns the number of the day of the week; by default, Sunday is 1. |
Year | Returns the year of a date. |
Month | Returns the month of a date. |
parseDate | Parses a string into a date. |
FormatDate | Formats a date as a string. |
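The behavior described in the table can be sketched in Python as follows. This only illustrates the documented semantics and is not the product's implementation; the helper names, the handling of non-positive lengths, and the use of Python `strftime` codes (the product's own date pattern syntax may differ) are assumptions.

```python
from datetime import date, datetime


def right(s: str, n: int) -> str:
    """Last n characters; the whole string if it is shorter than n."""
    if n <= 0:  # assumption: non-positive length yields an empty string
        return ""
    return s[-n:]  # Python slicing already clamps to the string length


def left(s: str, n: int) -> str:
    """First n characters; the whole string if it is shorter than n."""
    return s[:max(n, 0)]


def trim(s: str) -> str:
    """Remove spaces (only the space character) at both ends."""
    return s.strip(" ")


def weekday_number(d: date) -> int:
    """Day-of-week number with Sunday = 1 ... Saturday = 7."""
    # isoweekday(): Monday = 1 ... Sunday = 7, so shift by one position.
    return d.isoweekday() % 7 + 1


def parse_date(s: str, pattern: str = "%Y-%m-%d") -> date:
    return datetime.strptime(s, pattern).date()


def format_date(d: date, pattern: str = "%Y-%m-%d") -> str:
    return d.strftime(pattern)


print(right("abcdef", 2), right("ab", 5))   # ef ab
print(weekday_number(date(2024, 1, 7)))     # 1 (a Sunday)
```

`Upper`, `Lower`, `Year`, and `Month` map directly onto `str.upper()`, `str.lower()`, `date.year`, and `date.month`, so they are not repeated here.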
❖Local Reduce optimization
In the MPP data mart, part of the Reduce computation can be performed in advance on the data nodes (Map nodes); this is called Local Reduce.
Some self-service analysis dashboards typically depend on large datasets and do not have fixed bound dimensions, so their result sets can be large. With a large result set, in-memory processing on the Reduce node easily becomes a bottleneck and cannot keep up with the computation. Local Reduce has therefore been optimized: when a computation qualifies for Local Reduce, the Map node waits for a certain period so that it can perform as much Local Reduce computation as possible. This makes full use of the data mart's computing resources and speeds up data mart computation.
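Local Reduce plays the same role as a combiner in MapReduce: each Map node pre-aggregates its own rows so that the Reduce node merges small partial results instead of raw rows. The sketch below assumes a SUM aggregation over hypothetical `(key, value)` rows and is not the product's code; the technique only applies to aggregates that can be merged from partial results, such as SUM, COUNT, MIN, and MAX.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, float]


def local_reduce(rows: Iterable[Row]) -> Dict[str, float]:
    """Runs on one Map node: SUM the values per key before sending them on."""
    partial: Dict[str, float] = defaultdict(float)
    for key, value in rows:
        partial[key] += value
    return dict(partial)


def final_reduce(partials: List[Dict[str, float]]) -> Dict[str, float]:
    """Runs on the Reduce node: merge the per-node partial sums."""
    result: Dict[str, float] = defaultdict(float)
    for partial in partials:
        for key, value in partial.items():
            result[key] += value
    return dict(result)


map_node_1 = [("east", 10.0), ("west", 5.0), ("east", 2.0)]
map_node_2 = [("east", 1.0), ("west", 4.0), ("west", 1.0)]

partials = [local_reduce(map_node_1), local_reduce(map_node_2)]
print(final_reduce(partials))  # {'east': 13.0, 'west': 10.0}
```

In this toy case the Reduce node receives 4 partial rows instead of 6 raw rows; with many rows per key the reduction is much larger, which is what relieves memory pressure on the Reduce node.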
Currently, whether a Map result undergoes Local Reduce is decided mainly by the following criteria:
•The computation is an aggregation (summary) computation, and the Map result set has fewer than 500,000 rows;
•The current Local Reduce compression ratio is less than 0.33;
•When the Map computation completes, any Local Reduce already performed must also meet the rules above.
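The first two criteria can be expressed as a simple check. This is a hypothetical sketch: it assumes the compression ratio means rows remaining after Local Reduce divided by rows before it, and that both criteria must hold together; the document does not define the ratio or state how the criteria are combined, and the third criterion is not modeled.

```python
def compression_ratio(rows_before: int, rows_after: int) -> float:
    """Assumed definition: rows left after Local Reduce / rows before it."""
    return rows_after / rows_before if rows_before else 1.0


def should_local_reduce(is_aggregation: bool, map_rows: int, reduced_rows: int,
                        max_rows: int = 500_000, max_ratio: float = 0.33) -> bool:
    """Hypothetical decision helper using the thresholds listed above."""
    if not is_aggregation or map_rows >= max_rows:
        return False
    # A ratio below 0.33 means Local Reduce shrinks the data to under a third.
    return compression_ratio(map_rows, reduced_rows) < max_ratio


print(should_local_reduce(True, 400_000, 40_000))   # True  (ratio 0.1)
print(should_local_reduce(True, 400_000, 200_000))  # False (ratio 0.5)
```

Under this reading, Local Reduce is only worth the extra wait on the Map node when it removes most of the rows; if keys are nearly unique, pre-aggregation saves little and the rows are sent on as they are.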
❖Multiple Reduce nodes participate in aggregate calculations
Multiple Reduce nodes can take part in an aggregation calculation, each returning its results to the client node, which assembles the final result set. This improves computational parallelism, relieves the memory pressure on each Reduce node, and enables streaming: the result of each individual Reduce node can be passed on promptly to subsequent calculations.
Note: By default, aggregation is still performed by a single Reduce node. To have multiple Reduce nodes participate, add the parameter _RED_TASK_COUNT_ (default 1) in the dataset or report creation module. Its value specifies the number of Reduce nodes that participate in the calculation. This requires that enough Reduce nodes are available.
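One common way to split an aggregation across several Reduce nodes is hash partitioning by group key, so that each key is aggregated on exactly one node and the client can simply combine the disjoint results. The sketch below assumes this technique (the document does not say how the product assigns keys); `red_task_count` is a hypothetical stand-in for _RED_TASK_COUNT_, and the reducers run one after another here, whereas real Reduce nodes run in parallel.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

Row = Tuple[str, float]


def partition(key: str, red_task_count: int) -> int:
    # crc32 is stable across processes (Python's str hash() is randomized),
    # so the same key always lands on the same Reduce node.
    return zlib.crc32(key.encode("utf-8")) % red_task_count


def run_reducers(rows: List[Row], red_task_count: int = 1) -> Iterator[Dict[str, float]]:
    if red_task_count < 1:
        raise ValueError("red_task_count must be at least 1")
    buckets: List[List[Row]] = [[] for _ in range(red_task_count)]
    for key, value in rows:
        buckets[partition(key, red_task_count)].append((key, value))
    for bucket in buckets:  # each bucket plays the role of one Reduce node
        totals: Dict[str, float] = defaultdict(float)
        for key, value in bucket:
            totals[key] += value
        yield dict(totals)  # streamed to the client as soon as it is ready


rows = [("east", 1.0), ("west", 2.0), ("north", 3.0), ("east", 4.0)]
result: Dict[str, float] = {}
for partial in run_reducers(rows, red_task_count=2):
    result.update(partial)  # key sets are disjoint across Reduce nodes
print(result)
```

Because the key sets never overlap, the client only concatenates results and can forward each node's output downstream without waiting for the others.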
1. Multiple Reduce nodes are enabled through a parameter. In the dataset creation interface or the report creation interface, click Edit Parameters, as shown below.
2. Click Add Parameter, enter _RED_TASK_COUNT_ as the name, and click OK, as shown below.
3. Set the parameter type to Numeric, select Single Value, enter the number of Reduce nodes that should participate in the calculation, and click OK. For example, to have 2 Reduce nodes participate, configure the parameter as shown in the following figure.
❖Separation of Management and Computing Tasks
Thread pools are used in many parts of the product, and some problems are caused by asynchronous thread pool tasks waiting on one another. Management tasks and computation tasks are now physically separated, so when tasks queue up, the two types do not interfere with each other.
Related property configuration:
The values in the following property configuration are the system defaults and can be adjusted as needed. In general, the defaults are suitable.
Property | Description |
---|---|
performance.channel.separate=true | Whether to isolate management tasks from computation tasks |
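The effect of isolating the two task types can be sketched with two separate thread pools. This is a hypothetical illustration: `TaskDispatcher`, its `separate` flag (standing in for performance.channel.separate), and the pool sizes are assumptions, not the product's classes or settings.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable


class TaskDispatcher:
    """Routes management and computation tasks to their own thread pools."""

    def __init__(self, separate: bool = True, workers: int = 4) -> None:
        self.compute_pool = ThreadPoolExecutor(max_workers=workers,
                                               thread_name_prefix="compute")
        # With separation off, management tasks share (and queue in) the compute pool.
        self.management_pool = (
            ThreadPoolExecutor(max_workers=workers, thread_name_prefix="management")
            if separate else self.compute_pool
        )

    def submit(self, kind: str, fn: Callable[..., Any], *args: Any) -> Future:
        pool = self.management_pool if kind == "management" else self.compute_pool
        return pool.submit(fn, *args)

    def shutdown(self) -> None:
        self.compute_pool.shutdown(wait=True)
        if self.management_pool is not self.compute_pool:
            self.management_pool.shutdown(wait=True)


release = threading.Event()
dispatcher = TaskDispatcher(separate=True, workers=2)
# Occupy every compute thread with a long-running task.
blockers = [dispatcher.submit("compute", release.wait) for _ in range(2)]
# A management task still runs immediately on its own pool.
status = dispatcher.submit("management", lambda: "ok")
print(status.result(timeout=1))  # ok
release.set()
dispatcher.shutdown()
```

With `separate=False` the management task would sit in the same queue behind the blocked computation tasks and `status.result(timeout=1)` would raise `TimeoutError`, which is the kind of waiting problem the separation avoids.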
❖Compression of Expression Columns
Expression columns in the data mart are compressed: after an expression's values are calculated, the column content is compressed to reduce memory and disk usage, and the stored files are compressed as well.
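The idea can be sketched as serializing a computed expression column and compressing it before it is kept in memory or written to disk. The JSON serialization, the zlib codec, and the helper names are assumptions for illustration; the document does not say which format or compression algorithm the data mart uses.

```python
import json
import zlib
from typing import List


def compress_column(values: List[str]) -> bytes:
    """Serialize computed expression values and compress them with zlib."""
    raw = json.dumps(values).encode("utf-8")
    return zlib.compress(raw, 6)  # level 6: a balance of speed and ratio


def decompress_column(blob: bytes) -> List[str]:
    """Restore the column when it is read back for further computation."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


# Hypothetical expression output, e.g. Upper(region), over about 10,000 rows.
column = ["EAST", "WEST", "NORTH"] * 3334
blob = compress_column(column)
print(len(json.dumps(column).encode("utf-8")), "->", len(blob), "bytes")
```

Expression results often repeat a small number of distinct values, as in this example, which is why compressing them after calculation can save substantial memory and disk space at the cost of some CPU time when the column is read back.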