Saturday, July 30, 2016

[WSO2 DAS] How to Control Spark Log File Generation.


When Spark scripts are executed in the DAS server, several Spark log files are created in the <DAS_HOME>/work directory. If you execute a number of Spark scripts often, you might run into disk space issues, as a lot of log files will accumulate.

If you want to limit Spark log files to a predefined amount of disk space, add the following configuration to <DAS_HOME>/repository/conf/analytics/spark/spark-defaults.conf:
   
   spark.executor.logs.rolling.strategy size  
   spark.executor.logs.rolling.maxSize 10000000  
   spark.executor.logs.rolling.maxRetainedFiles 10  
   


spark.executor.logs.rolling.maxSize is the maximum size (in bytes) a log file can reach before it is rolled over into a new file.

spark.executor.logs.rolling.maxRetainedFiles is the maximum number of rolled log files kept in the log directory at any given time; older log files are deleted as new ones are generated. With the values above, the log directory is bounded at roughly 10 files × 10 MB ≈ 100 MB per executor.

spark.executor.logs.rolling.strategy is the strategy used to roll the log files. With the above configuration, log files are rolled based on their size; Spark also supports rolling based on time, as sketched below.
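As a minimal sketch of the time-based alternative (the daily interval here is only illustrative; Spark also accepts hourly, minutely, or an interval in seconds):

   spark.executor.logs.rolling.strategy time  
   spark.executor.logs.rolling.time.interval daily  
   spark.executor.logs.rolling.maxRetainedFiles 10  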

For other related configurations, see the Spark documentation.

If you want to change the default log directory location, change the spark.worker.dir config in <DAS_HOME>/repository/conf/analytics/spark/spark-defaults.conf as below.
   
   spark.worker.dir /home/ubuntu/sparkout  
   
Make sure to restart the server for the above configurations to take effect.
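After the restart, the executor logs should be written under the new directory. Assuming the standard Spark worker layout (the application and executor IDs below are purely illustrative), it would look something like:

   /home/ubuntu/sparkout/app-20160730120000-0000/0/stdout  
   /home/ubuntu/sparkout/app-20160730120000-0000/0/stderr  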

