Detailed Configuration Parameters


A distributed data mart system has two configuration files: the local property configuration (Local Properties) and the global property configuration (Global Properties).

 

Local Properties Configuration 

The local property configuration file is required on the node of every machine. By default it is saved at {bi.home}/bi.properties, where bi.home is the bihome directory under the installation path (YH\Yonghong\bihome).

| Attribute | Optional/Required | Description |
|---|---|---|
| dc.io.handlers=1 | Optional | Number of threads that handle I/O communication. One thread is generally enough. |
| dc.io.channels=2 | Optional | Maximum number of socket connections used to communicate with a node on another machine. |
| dc.io.ip= | Required | IP address of the local machine. Set it explicitly when the machine has multiple network interfaces. If it is not defined, the system tries to obtain the IP address from the operating system. |
| dc.node.types=mr | Required | Node type of the local machine: m = Map node, r = Reduce node, n = Naming node, c = Client node. The value is usually a combination of these letters. |
| dc.global.path=global_bi.properties | Required | Path of the configuration file shared by the nodes on all machines. |
| mem.serial.mem=700 | Required | Amount of memory that can be allocated in batches to in-memory calculation. The unit is MB. |
| mem.proc.count=2 | Required | Number of CPUs that can be used for in-memory calculation. |
| dc.block.units=4 | Optional | Number of data cells in one data block. A data block is the physical file that is loaded into or unloaded from memory. Data blocks are distributed to the Map nodes. |
| dc.unit.rows=262144 | Optional | Number of rows in one data cell. The resulting data block forms a physical file that is distributed to the Map nodes. |
| dc.fs.naming.paths= | Required | File path where the metadata of the Naming node is saved. Multiple paths can be given, separated by ';', which keeps the metadata file safer. Use absolute paths. The default value is {bihome}/cloud/cloud/qry_sub.m |
| dc.naming.waiting=30000 | Optional | How long after the Naming node starts before it can switch back to the available state. |
| dc.naming.maps=1 | Optional | Number of live Map nodes required after the Naming node starts before it can switch back to the available state. |
| dc.naming.reds=1 | Optional | Number of live Reduce nodes required after the Naming node starts before it can switch back to the available state. |
| dc.naming.check.file=true | Optional | Whether the metadata must be confirmed correct after the Naming node starts, before it switches back to the available state. Metadata is correct when all folders and files it lists are available. |
| dc.fs.sub.path= | Required | File path where the metadata of the Map node or Reduce node is saved. Use an absolute path. The default value is {bihome}/cloud/cloud/qry_sub.m |
| dc.fs.physical.path= | Required | Folder where the physical data of the Map node or Reduce node is saved. Use an absolute path. The default value is {bihome}/cloud/cloud |
| dc.col.cache.count=20 | Optional | Maximum number of in-memory caches for each storage type. |
| dc.data.debug=false | Optional | Whether to output data debugging information. |
| dc.inverted.supported=false | Optional | Whether to try to build column indexes to improve performance. |
| dc.inverted.ratio=3.1 | Optional | Average index size per row used when trying to build the index. |
| dc.buf.cache.count=10 | Optional | Number of in-memory caches in the data buffer area used for communication. |
| dc.float.frags=4 | Optional | Number of decimal places stored in the data mart for single-precision floating-point numbers. |
| dc.double.frags=4 | Optional | Number of decimal places stored in the data mart for double-precision floating-point numbers. |
| dc.mr.debug=false | Optional | Whether to print the execution progress of Map and Reduce tasks every 20 seconds while they run. |
| dc.orderby.limit=500000 | Optional | Maximum number of groups for which sorting is supported. |
| map.aggr.parallel=false | Optional | Whether the Map side processes a zb file in parallel, by data fragment and hash partition. |
| red.aggr.parallel=true | Optional | Whether the Reduce side processes data in parallel, by hash partition. |
| map.part.size=4 | Optional | Number of hash partitions on the Map side. |
| red.part.size=32 | Optional | Number of hash partitions on the Reduce side. |
| aggr.timeout=600000 | Optional | Timeout for waiting for the related threads to finish during parallel processing. |
| parallel.min.groups=10000 | Optional | Minimum number of groups at which the Reduce side uses parallel processing. |

Properties marked "Optional" have a system default value, which is the value shown in the first column.
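As an illustration of the local properties listed above, here is a minimal sketch of a bi.properties file for one machine that runs both a Map node and a Reduce node. The IP address and all paths are hypothetical placeholders, not values from the product. Forward slashes are used on the assumption that the file is parsed as a standard Java properties file, where a backslash is an escape character.

```properties
# bi.properties: hypothetical local settings for one machine that runs
# a Map node and a Reduce node. Addresses and paths are placeholders.

# IP address of this machine (set it explicitly with multiple network interfaces)
dc.io.ip=192.168.3.140
# m = Map node, r = Reduce node
dc.node.types=mr
# Configuration file shared by all machines
dc.global.path=global_bi.properties

# Memory and CPUs available to in-memory calculation
mem.serial.mem=700
mem.proc.count=2

# Naming node metadata: two absolute paths separated by ';' (placeholder file names)
dc.fs.naming.paths=/data1/yonghong/naming_meta.m;/data2/yonghong/naming_meta.m
# Map/Reduce node metadata file and physical data folder (absolute paths)
dc.fs.sub.path=/opt/yonghong/bihome/cloud/cloud/qry_sub.m
dc.fs.physical.path=/opt/yonghong/bihome/cloud/cloud

# Optional properties, shown with their defaults.
# One data block = 4 cells x 262144 rows = 1048576 rows.
dc.block.units=4
dc.unit.rows=262144
```

Only the required properties need to appear in the file. The optional ones are included here to show how the block size follows from dc.block.units and dc.unit.rows.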

 

Global Properties Configuration 

The global property configuration file holds the properties shared by all machines in the cluster. By default it is saved at {bi.home}/global_bi.properties, where bi.home is the bihome directory under the installation path (YH\Yonghong\bihome).

| Attribute | Optional/Required | Description |
|---|---|---|
| dc.io.local=true | Optional | Marks whether the system runs as the local standalone version or the multi-machine version. The default is the local standalone version. |
| dc.cache.max=5242880 | Optional | Maximum size of the memory cache. When the amount of data read or written exceeds this maximum, one physical read or write is triggered. |
| dc.io.timeout=15000 | Optional | Maximum waiting time for communication between the nodes of two machines. |
| dc.io.block=131072 | Optional | Size of the buffer for socket reads and writes. |
| dc.io.sport=5083 | Optional | Port used for communication among the nodes of each machine. |
| dc.io.fport=5066 | Optional | Port used for transferring files among the nodes of each machine. |
| dc.node.naming= | Required | IP address of the Naming node. It does not need to be defined for the local standalone version. |
| dc.fs.dup=2 | Optional | Number of file copies kept by the file system. |
| dc.update.period=15000 | Optional | Heartbeat period. In every heartbeat period, each Map/Reduce node sends a report to the Naming node to signal that it is alive. |
| dc.task.timeout=60000 | Optional | Maximum time allowed to complete one task. If a task has not completed within this time, the system tries to redistribute it. |
| dc.nodes.pin= | Optional | PIN code required for communication among nodes. If the PIN is null, it is not checked. The default value is null. |
| dc.doctor.repair=false | Optional | Whether to restore lost files. |
| dc.mismatch.remove=false | Optional | Whether to delete nonexistent zb files in Meta. |
| file.sync.interval=3600000 | Optional | Time interval between full updates of the metadata file. |
| global.data.timeout=600000 | Optional | Timeout for obtaining the dimension table. |
| zk.conn.timeout=120000 | Optional | Timeout for communication between the client and the ZooKeeper cluster nodes. |
| zk.conn.hosts | Optional | Addresses the client uses to connect to the ZooKeeper cluster. Multiple addresses are separated by commas, for example zk.conn.hosts=192.168.3.138:2181,192.168.3.138:2182,192.168.3.174:2181. |
| dc.use.backup=false | Optional | Whether the Naming backup mechanism is enabled. |
| dc.backup.max.bytes=1048576 | Optional | When the Naming backup mechanism is enabled, the maximum size of the log that each Naming node can transfer to ZooKeeper. |
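To show how the global properties fit together, the following is a minimal sketch of a global_bi.properties file for a multi-machine cluster. It rests on two assumptions that the table does not state outright: that dc.io.local=false selects the multi-machine version, and that the Naming backup mechanism relies on the ZooKeeper connection properties. The Naming node address and the PIN are hypothetical placeholders, and the ZooKeeper addresses reuse the example from the table.

```properties
# global_bi.properties: hypothetical shared settings for a multi-machine cluster.

# Assumption: false selects the multi-machine version (the default, true, is local standalone)
dc.io.local=false
# IP address of the Naming node (placeholder)
dc.node.naming=192.168.3.138

# Ports for node communication and file transfer (defaults)
dc.io.sport=5083
dc.io.fport=5066

# Number of file copies (default)
dc.fs.dup=2
# Heartbeat period and per-task time limit (defaults)
dc.update.period=15000
dc.task.timeout=60000

# PIN checked on communication among nodes (placeholder; leave empty to skip the check)
dc.nodes.pin=1234

# Naming backup, assumed here to use the ZooKeeper cluster below
dc.use.backup=true
zk.conn.hosts=192.168.3.138:2181,192.168.3.138:2182,192.168.3.174:2181
zk.conn.timeout=120000
dc.backup.max.bytes=1048576
```

Because this file is shared, every machine's bi.properties would point to it through dc.global.path.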