Advanced Analytics > Script

A Python script node can connect to a data node and run a script for modeling, or connect to another Python script node for validation.

Drag a data set onto the canvas and connect it to a Python script node. When the Python script node is selected, its settings and display area contains a single page: Configuration.


❖Configuration of Python script

•Version 8.5 to 9.0

The Python script node provides input variables, output variables, and system methods.

Input variables

_input_table_ : A pandas.DataFrame that holds the data output by the preceding node.

_input_model_ : Usually an algorithm object (classifier or regressor), or any other object that can be pickled, received from the preceding node. It can be used for prediction on a test data source.

Output variables

_output_model_ : Usually assigned an algorithm object (classifier or regressor), which is passed to the succeeding node and can be used for prediction on a test data source.

_pmml_ : Usually assigned the PMML text of a trained model, as one of the node outputs. It is used together with the method "to_pmml( )".

_plot_ : An object of "matplotlib.pyplot", used to output image data.
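Because a model handed from one node to the next has to be picklable, it can help to check an object offline before assigning it to the model output variable. The following is a minimal sketch using only the standard library; is_picklable is a hypothetical helper and not part of the product, and the dictionary merely stands in for a trained model.

```python
import pickle


def is_picklable(obj):
    """Hypothetical helper: True if obj survives a pickle round trip."""
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# A plain dictionary of parameters stands in for a trained model here.
model_like = {"weights": [0.4, -1.2], "bias": 0.1}

# A lambda cannot be pickled, so it could not be handed to another node.
not_transferable = lambda row: row["amount"] * 2

print(is_picklable(model_like))        # True
print(is_picklable(not_transferable))  # False
```

Common scikit-learn style classifiers and regressors are generally picklable; objects that hold lambdas, open files, or database connections usually are not.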

System methods

to_pmml(model, features, target) : Exports a model as PMML text. The parameter "model" is your trained model, "features" is a list-like object that contains, in order, the names of all input features used to train the model, and "target" is a string that contains the name of the target column in the training data set. When the method is invoked, the generated PMML text must be assigned to the variable "_pmml_".

•Version 9.1

Changes to the Python script node:

(1) The 8.5-9.0 interface remains compatible, but only so that experiments created in an older version can still run normally when imported into 9.1. Because the previous interface does not support multiple data set inputs, do not use the 8.5-9.0 interface in experiments newly created in version 9.1;

(2) The new API is used with DM-Engine_v1.2 from product 9.1, and scripts can be written offline using Python SDK v0.2.

yonghong.script.port is a sub-toolkit for writing custom Python scripts. It includes two tool classes, EntryPoint and ResourceType:

EntryPoint: The input and output ports of the current script node.

Input: The data input of this node. To access the data of a given pre-node, index by node name and then by resource type. For example, entry.input['Data Set Node 1'][ResourceType.DATAFRAME] obtains the output data set of "Data Set Node 1";

Output: The data output of this node. It has the following members:

dataset: The data set to be output;

model: The model to be output;

pmml: The PMML text to be output;

images: The PNG image data to be output. Usage: images.put_image_from_plot("Picture Name 1", plot)

ResourceType: The enumerated type of input and output data. DATAFRAME means a data set and MODEL means a model.
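The shape of these two classes can be mimicked offline to try out script logic before running it in the product. The sketch below uses hypothetical stand-in classes (FakeResourceType, FakeOutput, FakeEntryPoint) that only copy the structure described above. They are not the real yonghong.script.port classes, the images member is omitted, and a list of dictionaries stands in for a pandas.DataFrame.

```python
from enum import Enum


class FakeResourceType(Enum):
    """Stand-in for ResourceType (assumption: only these two members)."""
    DATAFRAME = "dataframe"
    MODEL = "model"


class FakeOutput:
    """Stand-in for the output port: dataset, model and pmml slots."""
    def __init__(self):
        self.dataset = None
        self.model = None
        self.pmml = None


class FakeEntryPoint:
    """Stand-in for EntryPoint: input is keyed by pre-node name, then type."""
    def __init__(self, inputs):
        self.input = inputs
        self.output = FakeOutput()


def run_script(entry):
    # Read the data set of the pre-node named "Data Set Node 1".
    rows = entry.input["Data Set Node 1"][FakeResourceType.DATAFRAME]
    # Keep only rows with amount above 10 and publish them as the output.
    entry.output.dataset = [row for row in rows if row["amount"] > 10]


entry = FakeEntryPoint({
    "Data Set Node 1": {
        FakeResourceType.DATAFRAME: [{"amount": 5}, {"amount": 20}],
    },
})
run_script(entry)
print(entry.output.dataset)  # [{'amount': 20}]
```

Keeping the script body in a function that takes the entry object makes it easy to swap the stand-in for the real EntryPoint later.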

❖New API Use Cases

Case 1: The pre-nodes are a data set node and a trained-model node (the trained model is exported by a Python script); the node output is a data set

# EntryPoint: The input and output ports of the current script node

# ResourceType: the enumerated type of input and output data, DATAFRAME means data set, MODEL means model

from yonghong.script.port import EntryPoint, ResourceType

entry = EntryPoint() # holds the input and output of this node

input_df = entry.input['sample data set'][ResourceType.DATAFRAME] # Get the output data of the pre-node 'sample data set'

input_model = entry.input['Decision Tree Model'][ResourceType.MODEL] # Get the trained model output by the pre-node 'Decision Tree Model'

import pandas as pd

entry.output.dataset = pd.DataFrame({'field name':input_model.predict(input_df)}) # Output the prediction result, if there is a post node, it will be received by the post node


Case 2: The pre-nodes are a data set node and a plug-in node (with an output data set); the node output is a trained model

# EntryPoint: The input and output ports of the current script node

# ResourceType: the enumerated type of input and output data, DATAFRAME means data set, MODEL means model

from yonghong.script.port import EntryPoint, ResourceType

from xgboost import XGBClassifier

entry = EntryPoint() # holds the input and output of this node

input_dataset1 = entry.input['sample data set'][ResourceType.DATAFRAME] # Get the output data set of the pre-node 'sample data set'

input_dataset2 = entry.input['standardized'][ResourceType.DATAFRAME] # Get the output data set of the pre-node 'standardized'

features = ['V' + str(i) for i in range(1, 29)] + ['Amount'] # Specify the feature fields: V1 to V28 plus Amount.

target = 'Class'

train_X = input_dataset1[features] # Read in the feature data of the training set.

train_y = input_dataset1[target] # Read in the classification value of the training set.

clf = XGBClassifier(n_jobs=3)

clf.fit(X=train_X, y=train_y)

import pandas as pd

entry.output.dataset = pd.DataFrame({'field name':clf.predict(input_dataset2[features])}) # Output the prediction result, if there is a post node, it will be received by the post node

entry.output.model = clf # If you need to pass the classifier clf to the next node, assign clf to the model.

entry.output.pmml = to_pmml(clf, features, target) # Generate the PMML text of the model.
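The feature list in Case 2 is built with a list comprehension, and range(1, 29) stops at 28. The short sketch below shows what the list contains and adds a hypothetical helper, missing_columns, for checking that a data set carries every required field before training. Neither the helper nor the example column list comes from the product.

```python
# Same expression as in Case 2: V1 ... V28 followed by Amount.
features = ['V' + str(i) for i in range(1, 29)] + ['Amount']
target = 'Class'

print(len(features))                            # 29
print(features[0], features[27], features[-1])  # V1 V28 Amount


def missing_columns(columns, required):
    """Hypothetical helper: required names that are absent from columns."""
    return [name for name in required if name not in columns]


# Hypothetical column list that lacks the Amount field.
columns = ['Time'] + features[:-1] + ['Class']
print(missing_columns(columns, features + [target]))  # ['Amount']
```

With a pandas.DataFrame, the same check can be run by passing list(df.columns) as the first argument.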


Case 3: The pre-nodes are a data set node and a plug-in node (with an output data set); the node output is a PNG image

# EntryPoint: The input and output ports of the current script node

# ResourceType: the enumerated type of input and output data, DATAFRAME means data set, MODEL means model

from yonghong.script.port import EntryPoint, ResourceType

from xgboost import XGBClassifier

entry = EntryPoint() # holds the input and output of this node

input_dataset1 = entry.input['sample data set'][ResourceType.DATAFRAME] # Get the output data set of the pre-node 'sample data set'

input_dataset2 = entry.input['standardized'][ResourceType.DATAFRAME] # Get the output data set of the pre-node 'standardized'

# Drawing section

import matplotlib.pyplot as plt

from mpl_toolkits.mplot3d import Axes3D

from sklearn import manifold

df = input_dataset1

df_lle, _ = manifold.locally_linear_embedding(df.drop(columns='ElectricLeakage'), n_neighbors=30, n_components=2) # Perform dimensionality reduction operations on the data

fig = plt.figure()

ax = fig.add_subplot(211, projection='3d')

ax.scatter(df['LineLoss(%)'], df['ElectricQuantityDowntrend'], df['AlarmNumber'], c=df['ElectricLeakage'], cmap=plt.cm.Spectral)

ax.set_xlabel('LineLoss(%)')

ax.set_ylabel('EQD')

ax.set_zlabel('AlarmNumber')

ax.set_title('Original data')

ax = fig.add_subplot(212)

ax.scatter(df_lle[:, 0], df_lle[:, 1], c=df['ElectricLeakage'], cmap=plt.cm.Spectral)

ax.set_xlabel('x1')

ax.set_ylabel('x2')

ax.set_title('Projected data')

plt.axis('tight')

# After configuring the plot, do not call plt.show() to display the figure; it has no effect in this node

entry.output.images.put_image_from_plot("case picture name 1", plt) # Pass plt to the images output so that the chart is drawn on the page


❖Run Python Script Node

In the Python script node's right-click menu, select "Run" to run the node.

❖Rename Python Script Node

In the Python script node's right-click menu, select "Rename" to rename the node.

❖Refresh Python Script Node

In the Python script node's right-click menu, select "Refresh" to synchronize the node with the latest data or parameter information.

❖Save as Composite Node

In the node's right-click menu, select "Save as composite node" to save the selected node as a composite node so that it can be reused. The parameters of the saved node are the same as those of the original node.

❖Copy/Cut/Paste/Delete Script Nodes

The script node's right-click menu supports copy, cut, paste, and delete operations.

【Copy】 Copy the script node.

【Cut】 Cut the script node.

【Paste】 After copying, right-click a blank area of the canvas and select Paste to paste the script node.

【Delete】 Select Delete in the node's right-click menu, or press the Delete key on the keyboard, to delete the node together with its input and output connections.