Abstract
The next generation of tokamaks, e.g. ITER, will have data collection rates significantly larger than those of present tokamaks, with consequential new challenges in data management, data analysis and integrated modelling. One of these challenges is to ensure that the appropriate data is made available efficiently, when it is required and where it is consumed. Given these data volumes and limited network capacity, not all data can be delivered in time once a data-object is requested. One possible solution is to preemptively identify the relevant data and distribute it efficiently across the storage services before a user or an application requests it. Preemptive data distribution relies on analysis of historical access patterns to identify a set of rules from which, following a data-object request, the most probable set of next requests can be inferred. Implementing these rules requires the inferred sets of data to be moved close to the data-object consumer. The work presented will describe the Apache Spark Machine Learning tools, the results of the analysis, and an implementation of the preemptive distribution experimental platform at CCFE, together with plans for its future integration and testing on the upcoming SAGE platform.
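The abstract does not say which Spark ML algorithm produces the rules. As an illustration only, the sketch below swaps in a simple first-order next-request frequency model in plain Python: it counts how often object B is requested directly after object A in historical sessions, keeps rules A -> B that pass assumed support and confidence thresholds, and returns the most probable next objects to move close to the consumer. The function names, thresholds, and the access log are hypothetical. A Spark-based version might instead use the frequent-pattern tools in `pyspark.ml.fpm` (e.g. `FPGrowth` with its `associationRules`, or `PrefixSpan` for ordered sequences).

```python
from collections import Counter, defaultdict


def mine_next_request_rules(sessions, min_support=2, min_confidence=0.3):
    """Mine first-order "A is followed by B" rules from historical access logs.

    sessions: iterable of request sequences (lists of object IDs in time order).
    min_support: minimum absolute number of observed A -> B transitions (assumed).
    min_confidence: minimum estimated P(next request is B | current is A) (assumed).
    Returns {A: [(B, confidence), ...]} with the most probable B first.
    """
    pair_counts = Counter()    # (A, B) -> times B directly followed A
    source_counts = Counter()  # A -> times A was followed by any request
    for seq in sessions:
        for current, nxt in zip(seq, seq[1:]):
            pair_counts[(current, nxt)] += 1
            source_counts[current] += 1

    rules = defaultdict(list)
    for (current, nxt), n in pair_counts.items():
        confidence = n / source_counts[current]
        if n >= min_support and confidence >= min_confidence:
            rules[current].append((nxt, confidence))
    for candidates in rules.values():
        # Highest confidence first; ties broken by object ID for determinism.
        candidates.sort(key=lambda rule: (-rule[1], rule[0]))
    return dict(rules)


def objects_to_prefetch(rules, requested, top_k=3):
    """Hypothetical helper: objects to stage near the consumer after `requested`."""
    return [obj for obj, _ in rules.get(requested, [])[:top_k]]


# Hypothetical access log: each list is one session's requests in time order.
history = [
    ["magnetics", "thomson", "efit"],
    ["magnetics", "thomson"],
    ["magnetics", "efit"],
]
rules = mine_next_request_rules(history)
print(objects_to_prefetch(rules, "magnetics"))  # → ['thomson']
```

Here "thomson" followed "magnetics" in 2 of 3 cases (confidence about 0.67), so it is the only candidate that passes the default thresholds; lowering `min_support` to 1 would also admit "efit". A first-order model is chosen for brevity; it ignores longer request histories that sequence-mining methods can exploit.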