File System in Data Mart
❖File System
About Folder: A Folder is a set of files. Folders exist only on the Naming Node.
About File: A File is one document in a file set. A File takes one of three forms:
1. File on Client Node: at this stage, the File is being added to the cloud system by the Client Node.
2. File on Naming Node: at this stage, the File is metadata. This metadata is held in memory and also records which Map/Reduce Nodes hold the file.
3. File on Map/Reduce Node: at this stage, the File is metadata that also holds pointers to the physical files.
•File Set
A Folder is saved only on the Naming Node. A Folder is pure metadata and is made up of one or more Files.
•The name of Folder can include ‘/’
Why is that? Because the full name of every File in a Folder is Folder_Name/File_Name. If Folder_Name can itself contain '/', a physical file can be saved under a full name such as sales/product/file_1, where sales/product is the Folder_Name. Physical file storage is therefore not limited to a two-level directory and can use multi-level directories. In an operating system, access efficiency drops sharply when too many files sit in the same directory. The file system supports multi-level directories mainly to avoid this problem.
➢For example
When executing the cloud task in the task plan, set the name of the cloud folder to test/test1/test3 and the file prefix to test4, as shown in the image below.
After the task completes, the cloud folder and cloud file are generated in the specified path (the path given by dc.fs.physical.path=G\:/bihome/cloud in root/abc@123; the instructions for setting up the local data mart system describe this in detail).
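The naming rule above can be sketched in a few lines of Python. This is only an illustration of how a full name of the form Folder_Name/File_Name maps onto a multi-level directory. The function names `split_full_name` and `physical_path` are hypothetical and are not part of the data mart API, and the file name used in the usage note is made up.

```python
from pathlib import PurePosixPath
from typing import Tuple


def split_full_name(full_name: str) -> Tuple[str, str]:
    """Split 'Folder_Name/File_Name' at the LAST '/'.

    Everything before the last '/' is the Folder name (which may itself
    contain '/'); everything after it is the File name.
    """
    folder, sep, file_name = full_name.rpartition("/")
    if not sep or not folder or not file_name:
        raise ValueError("expected Folder_Name/File_Name, got %r" % full_name)
    return folder, file_name


def physical_path(root: str, full_name: str) -> PurePosixPath:
    """Map a full name to a nested path under a storage root (hypothetical layout)."""
    folder, file_name = split_full_name(full_name)
    # Each '/'-separated part of the Folder name becomes one directory level.
    return PurePosixPath(root).joinpath(*folder.split("/"), file_name)
```

For instance, `split_full_name("sales/product/file_1")` separates the Folder name `sales/product` from the File name `file_1`, and `physical_path("bihome/cloud", "sales/product/file_1")` places the file three levels below the root instead of in one flat directory.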
•File
File on Naming Node. As metadata, a File on the Naming Node is saved in a Folder. From the heartbeat reports that the Map/Reduce Nodes send continually, the Naming Node learns which Map/Reduce Nodes hold the physical files of a given File.
File on Map/Reduce Node. A File on a Map/Reduce Node is a metadata file that also holds a pointer to the physical files on that computer. Each Map/Reduce Node regularly sends a heartbeat to the Naming Node to report that the Node is available and to provide information about the Files saved on it.
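The heartbeat mechanism described above can be pictured with a small in-memory registry. This is a minimal sketch and not the product's implementation. The class name `NamingNodeRegistry`, the method names, and the timeout after which a silent node is treated as unavailable are all assumptions made for illustration.

```python
import time


class NamingNodeRegistry:
    """Toy model of what a Naming Node learns from heartbeats (hypothetical)."""

    def __init__(self, timeout_seconds=30.0):
        # Assumption: a node is considered unavailable if no heartbeat
        # arrived within timeout_seconds. The source states no timeout.
        self.timeout_seconds = timeout_seconds
        self.last_seen = {}      # node id -> time of the last heartbeat
        self.files_on_node = {}  # node id -> set of File full names

    def heartbeat(self, node_id, file_names, now=None):
        """Record that a node is alive and which Files it currently holds."""
        now = time.time() if now is None else now
        self.last_seen[node_id] = now
        self.files_on_node[node_id] = set(file_names)

    def available_nodes(self, now=None):
        """Return the ids of nodes whose last heartbeat is recent enough."""
        now = time.time() if now is None else now
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen <= self.timeout_seconds
        )

    def locations(self, file_name, now=None):
        """Return the available nodes that reported holding file_name."""
        return [
            node for node in self.available_nodes(now)
            if file_name in self.files_on_node[node]
        ]
```

Each heartbeat replaces the previous file list for that node, so the registry always reflects the most recent report rather than an accumulated history.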
•GMeta introduction
Each File on the Naming Node has a GMeta, which records summary information about the file.
1. The summary information stored in the GMeta of a File can be defined by the user. For example, the user can record that the data stored in this file is dated June 30, 2012. In addition, the system automatically adds the file name as a summary item: _FILE_NAME_.
2. A Folder is saved on the Naming Node and also has a GMeta, which records summary information about the folder.
Similarly, the summary information stored in the GMeta of a Folder can be defined by the user. The system automatically adds two items:
The Columns of the Data Grid corresponding to the Folder, and the available summary information stored in the GMeta of any child File, which is also saved in the form of Columns.
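To make the relationship between File GMeta and Folder GMeta concrete, here is a minimal sketch that models a GMeta as a plain dictionary. The real storage format of GMeta is not described here, so the dictionary representation and the helper names `make_file_gmeta` and `folder_gmeta_columns` are assumptions. Only the `_FILE_NAME_` item comes from the text above.

```python
def make_file_gmeta(file_name, user_items):
    """Build the GMeta of a File: user-defined items plus _FILE_NAME_."""
    gmeta = dict(user_items)
    # The system adds the file name automatically as a summary item.
    gmeta["_FILE_NAME_"] = file_name
    return gmeta


def folder_gmeta_columns(grid_columns, file_gmetas):
    """Collect the Columns recorded in the GMeta of a Folder.

    They consist of the Data Grid Columns of the Folder, followed by every
    summary item found in the GMeta of any child File, without duplicates.
    """
    columns = list(grid_columns)
    for gmeta in file_gmetas:
        for key in gmeta:
            if key not in columns:
                columns.append(key)
    return columns
```

A Folder whose Data Grid has the hypothetical Columns `product` and `amount`, and whose child Files carry a user-defined `date` item, would therefore expose `product`, `amount`, `date`, and `_FILE_NAME_`.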
•The function of GMeta
The summary information saved in GMeta plays an important role in the data mart system.
➢Example:
When creating a data mart data set, we need to know which Columns are available. This information can be obtained from the GMeta of the Folder, which stores it.
When creating a data mart data set, the user may not want the query to run against all files under the Folder, because doing so is unnecessary and very resource-consuming. In this case, the data mart data set can define a File Filter to limit which Files are accessed. The File Filter runs against the GMeta of the Files on the Naming Node and directly finds the Files the Query needs, which can significantly improve the running performance of the data set and reduce resource consumption.
Because the available summary information in the GMeta of each File is saved as Columns on the Folder, the GUI for defining a File Filter offers these Columns to the user.
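The idea behind a File Filter can be sketched as a predicate evaluated against the GMeta of each File, so that only matching Files are passed on to the Query. The dictionary form of GMeta, the `date` item, and the function name `filter_files` are assumptions for illustration. The actual File Filter is defined in the GUI and its syntax is not shown here.

```python
def filter_files(file_gmetas, predicate):
    """Return the names of the Files whose GMeta satisfies the predicate.

    Only the summary information is inspected; no physical file is opened.
    """
    return [g["_FILE_NAME_"] for g in file_gmetas if predicate(g)]


# Hypothetical GMeta of three Files in one Folder (ISO dates compare as text).
gmetas = [
    {"_FILE_NAME_": "file_1", "date": "2012-05-31"},
    {"_FILE_NAME_": "file_2", "date": "2012-06-30"},
    {"_FILE_NAME_": "file_3", "date": "2012-07-31"},
]

# Keep only Files holding data from June 2012 onward.
selected = filter_files(gmetas, lambda g: g["date"] >= "2012-06-01")
```

Because the filter works on metadata alone, the cost of selecting Files stays small even when the Folder contains many large physical files.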
•Mart data migration
To facilitate backup and migration of mart data, and to reduce inconsistencies between meta files and physical files, the way meta information is stored on the Map Node and the Naming Node changed in version 7.5. Previously, the meta information in the mart file was saved in the qry_naming.m file. Now the meta information for each folder is stored separately: the system creates a folder of the same name for every GSFolder to record that GSFolder's meta information, and this folder sits in the same directory as the qry_naming.m file. As for the meta information in the mart file, previously all the meta information on a single Map Node was saved in the qry_sub.m file; now it is saved directly in the mart file. For data migration or backup, only the data itself and the corresponding GSFolder meta file need to be migrated.
In version 8.5, qry_sub.m is split and stored separately, in the same way as qry_naming.m. The system generates a subname.meta file, in the Cloud directory at the same level as zb, to record the meta information of zb. For data migration or backup, you only need to migrate the data itself and the single meta file corresponding to the GSFolder.
➢Explanation: the corresponding meta information is kept in distributed storage and, at the same time, is still saved in qry_naming.m and qry_sub.m.
➢Attention: after migrating the mart data to the new environment, the paths of the data itself and of the single meta file must be the same as before the migration. For example, if the relative path of the meta file before migration is bihome/cloud/coffee sales statistics/Eastern market, the relative path after migration should also be bihome/cloud/coffee sales statistics/Eastern market.
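The path requirement above can be checked before a migration is declared complete. The following is a minimal sketch using only the Python standard library. The installation roots `/data/old` and `/srv/new` and the function names are hypothetical; only the relative path comes from the example above.

```python
from pathlib import PurePosixPath


def relative_to_root(path, root):
    """Return path relative to an installation root, as a '/'-separated string."""
    # relative_to raises ValueError if path is not located under root.
    return PurePosixPath(path).relative_to(PurePosixPath(root)).as_posix()


def paths_consistent(old_path, old_root, new_path, new_root):
    """True if a migrated item keeps the same relative path as before."""
    return relative_to_root(old_path, old_root) == relative_to_root(new_path, new_root)


# Hypothetical roots; the relative part must be identical on both sides.
ok = paths_consistent(
    "/data/old/bihome/cloud/coffee sales statistics/Eastern market", "/data/old",
    "/srv/new/bihome/cloud/coffee sales statistics/Eastern market", "/srv/new",
)
```

Running such a check for both the data and the single meta file of each GSFolder would catch a folder that was renamed or moved one level up or down during the copy.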
•Status
Whether it is a Folder, a File on the Naming Node, or a File on a Map/Reduce Node, each has a status that marks it as available or unavailable.
Folder Active: If the Folder is not Active, Queries based on this Folder cannot run, and the system neither checks whether the Folder needs automatic correction nor carries out automatic correction.
Active of File on Naming Node: If the File is not Active, Queries based on its Folder cannot see this File, and the system neither checks whether the File needs automatic correction nor carries out automatic correction.
Active of File on Map/Reduce Node: If the File is not Active, the File is invisible in heartbeat reports.
❖Automatic management
On the Naming Node, a full-time Doctor carries out automatic management of the cloud system.
The Doctor corrects mistakes such as the following:
Unmatched File. For example, a File appears in the heartbeat report of a Map Node but its version number has expired; the Doctor then has this Map Node delete the unmatched file. Likewise, if the Folder corresponding to the File no longer exists, the Doctor has this Map Node delete the unmatched file.
Excessive backup of File. For example, the number of backups of a File on the Naming Node is 3, but the system-defined reasonable number is 2; the Doctor then chooses the excess copy to delete. The deletion algorithm chooses the Map/Reduce Node with a relatively heavy burden.
Insufficient backup of File. For example, the number of backups of a File on the Naming Node is 2, but the system-defined reasonable number is 3; the Doctor then chooses a Map/Reduce Node with a relatively light burden to back up this File.
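The two backup corrections can be sketched as a single planning function. This is an illustration of the selection rule only, not the Doctor's actual algorithm. How the system measures the burden of a node is not stated, so the numeric `node_load` values, the function name, and the action tuples are assumptions.

```python
def plan_backup_corrections(holders, node_load, target_copies):
    """Plan corrections so that a File ends up with target_copies backups.

    holders:       nodes that currently hold a copy of the File
    node_load:     node id -> load figure (assumed: higher means heavier burden)
    target_copies: the system-defined reasonable number of backups
    """
    holders = list(holders)
    actions = []
    excess = len(holders) - target_copies
    if excess > 0:
        # Too many copies: delete from the most heavily loaded holders first.
        by_heaviest = sorted(holders, key=lambda n: node_load[n], reverse=True)
        actions = [("delete", node) for node in by_heaviest[:excess]]
    elif excess < 0:
        # Too few copies: copy to the most lightly loaded nodes without a copy.
        candidates = [n for n in node_load if n not in holders]
        by_lightest = sorted(candidates, key=lambda n: node_load[n])
        actions = [("copy", node) for node in by_lightest[:-excess]]
    return actions
```

With loads `{"a": 5, "b": 9, "c": 2, "d": 1}`, a File held by `a`, `b`, and `c` with a target of 2 would lose its copy on `b`, while a File held by `a` and `b` with a target of 3 would gain a copy on `d`.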
❖Manual management
The data mart system provides some API management commands.
AddFolderTask: this Task adds a new Folder, or appends a new File to an existing Folder.
RemoveFolderTask: this Task deletes a Folder.
RemoveGSFileTask: this Task deletes a File.
QueryTask: this Task can query (read and write) information about a Node on a Host. The following sub-functions are currently supported:
Naming Node: active queries whether the data mart system is available.
Naming Node: activeFolder <folder> makes a Folder available.
Naming Node: deactivateFolder <folder> makes a Folder unavailable.
Naming Node: activateFile <file> makes a File available.
Naming Node: deactivateFile <file> makes a File unavailable.
Naming Node: exsitsFolder <folder> queries whether a Folder exists.
Naming Node: folders queries the existing Folders and uses the symbol "_sep_" to separate them.
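A client that receives the result of the folders sub-function has to split it on the "_sep_" symbol. The sketch below assumes the result arrives as a single string; how the QueryTask result is actually returned is not described here, and the function name `parse_folders` is hypothetical.

```python
def parse_folders(response):
    """Split the result of the 'folders' sub-function into Folder names.

    Assumes the names are joined by the literal text '_sep_'.
    Empty pieces (for example from an empty response) are dropped.
    """
    return [name for name in response.split("_sep_") if name]


folders = parse_folders("sales/product_sep_test/test1/test3")
```

Folder names may contain '/', so splitting on '/' would be wrong; note also that this simple approach cannot distinguish a separator from a Folder whose own name happens to contain "_sep_".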