We have a data generator for a KMeans benchmark and want to use it with the PEEL framework.
The generator runs as a Flink job and produces two files, points and centers. We want to save these files in <hdfs-root-directory>/kmeans using the GeneratedDataSet class and then pick them up with the KMeans Flink job.
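To make the layout we are aiming for concrete, here is a minimal sketch. It is not the actual generator: the class name `KMeansLayoutSketch`, the `generate` helper, and the line formats are hypothetical, and a local temporary directory stands in for the HDFS input path. It only illustrates that the output argument is treated as a directory that must exist before the two files are written into it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Random;

// Hypothetical stand-in for our generator, NOT the Flink KMeansDataGenerator.
// Assumption: the local filesystem is used here in place of HDFS.
public class KMeansLayoutSketch {

    // Creates outputDir if needed, then writes outputDir/points and outputDir/centers.
    public static void generate(Path outputDir, int numPoints, int k, long seed) throws IOException {
        Files.createDirectories(outputDir); // the directory must exist before the files are written
        Random rnd = new Random(seed);

        // One line per point: "x y" (assumed format).
        List<String> points = new ArrayList<>();
        for (int i = 0; i < numPoints; i++) {
            points.add(String.format(Locale.ROOT, "%.2f %.2f",
                    rnd.nextDouble() * 100, rnd.nextDouble() * 100));
        }

        // One line per center: "id x y" (assumed format).
        List<String> centers = new ArrayList<>();
        for (int id = 1; id <= k; id++) {
            centers.add(String.format(Locale.ROOT, "%d %.2f %.2f",
                    id, rnd.nextDouble() * 100, rnd.nextDouble() * 100));
        }

        Files.write(outputDir.resolve("points"), points);
        Files.write(outputDir.resolve("centers"), centers);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for ${system.hadoop-2.path.input}/kmeans
        Path out = Files.createTempDirectory("peel-input").resolve("kmeans");
        generate(out, 10, 3, 42L);
        System.out.println("points exists:  " + Files.exists(out.resolve("points")));
        System.out.println("centers exists: " + Files.exists(out.resolve("centers")));
    }
}
```

The relevant point for the data set definition is that the destination is a directory containing two files rather than a single file path.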
My question is: how can we configure PEEL to create the directory kmeans in HDFS and then copy the files to that directory? Our current configuration, shown below, does not work.
<!--************************************************************************
* Data Generators
*************************************************************************-->
<bean id="datagen.kmeans" class="org.peelframework.flink.beans.job.FlinkJob">
    <constructor-arg name="runner" ref="flink-1.0.3"/>
    <constructor-arg name="command">
        <value><![CDATA[
          -v -c org.apache.flink.examples.java.clustering.util.KMeansDataGenerator \
          ${app.path.datagens}/KMeans.jar \
          --points ${datagen.points} \
          --k ${datagen.k} \
          --output ${system.hadoop-2.path.input}/kmeans
        ]]>
        </value>
    </constructor-arg>
</bean>
<!--************************************************************************
* Data Sets
*************************************************************************-->
<bean id="dataset.kmeans.generated" class="org.peelframework.core.beans.data.GeneratedDataSet">
    <constructor-arg name="src" ref="datagen.kmeans"/>
    <constructor-arg name="dst" value="${system.hadoop-2.path.input}/kmeans"/>
    <constructor-arg name="fs" ref="hdfs-2.7.1"/>
</bean>
The usage of our data generator is similar to the WordGenerator, except that it produces two files instead of just one.
Do you have an idea how we could solve this problem with PEEL, or do we have to adjust our data generator?
Thanks!