Understanding Spark: Part 2: Architecture

  • After introducing the spark in the previous blog, I will try to explain the architecture of the spark in thi blog. The objective is to give an quick overview of various components in spark architecture, what their functinalities and how they enable spark to process large amount of data fast.
  • The assumtion is that the reader must have prior understanding of the map reduce paradigm and some knowedge on Hadoop architecture.

Spark Architecture

1. What are the key components of Spark application?

    Every spark application has two main components
    • One Driver
    • A set of Executors (one or many)
    Driver - Is the coordinator of the spark application and hosts the spark Context object, which is the entry point to the application.
    • Driver negotiates with the external resource managers to provision all required resources for the spark application.
    • Manages the executor tasks.
    • Converts all map reduce operations and create tasks for the execturs to perform.
    • Collects all metrics about the execution of spark application and its components.
  • Executors - are the actual work horses of the spark applications. There might be one or more executors provisioned for a spark applicaiton. Execturos are actually java containers running on physical or virtual machines, which in turn are managed under cluster mangers like YARN or Mesos.
    • Number of executor resources and their capacities in terms of virtual core and RAM must be specified before starting a spark application. (There is an exception to this where resources can be provisioned dynamically).
    • Let's assume that we are using YARN managed cluster.
    • Driver negotiates with the resoruce manager of YARN to provision these resources in the cluster.
    • Then node manger of YARN spawns these processes and then executors are registered ( handed over ) to the driver for control, allocation and coordination of tasks among executors.
  • The following diagram depicts the architecture of spark.

Fig 1: Spark Components: Driver and Executors

    Executors load the external data (for example, files from HDFS) and load onto the memory of executors. For example here two blocks loaded into each executor memory. The in memory representation of these data partitions are called RDD (Resilient Distributed Datasets). Each chuck of data in memory is called partitions. The algorithm is expressed in terms of map reduce stages and driver pushes these map reduce tasks to the executors. Mappers can run in parallel across each RDD partitions in executors. If a reduce operation is assgined, then executors wait until all paritions are completed and proceed for data shuffle. After data shuffle is over, then executors can again run operation in parallel on these shuffled partitions. Finally, the resulting partitions after completion of all map reduce task are saved into an external systems, which is defined in the code submitted to spark. These serializing of resulting partitions can be accomplished in parallel by the executors. As you can see, the executors actually load data in terms of RDD and its partitions and apply operations on those RDD partitions and driver only assignd and coordinates these task with the executors .

2. How the executors are provisioned?

    The number of executors and their capacity in terms of cpu and memory are specified during the submission of the application. Driver then negotiates with the cluster manager e.g. Resource Manager in YARN. Yarn manager finds the best resources to schedule the executors and instructs the node managers to spawn these processes. Once the exectuors are started, then register with the Driver for further assignment and coordination of tasks. The machines (physical or virtual) managed by cluster manager are typically called slaves or workers. The number of executors requested are optimally allocated in available workers. It is possible that some workers might have been assigned more than one executors. Irrespective of wherever the executors are assigned, the capacity requested by the spark application is guaranteed by the YARN resource manager.

3. How data is read into spark application?

    Data can be read into spark application from any external systems. Spark is not tightly coupled with any specific file system or storage systems. Data can be loaded onto spark by two methods. Driver can read data onto a buffer and then parallelize (divide into smaller chunks and send to) to executors. But the amount of data that can be read and processed in this fashion is very limited. Driver can give location of the files in external system and coordinate read of the data by executors directly. For example, which blocks would be read by which executors from HDFS file system.

How map reduce operations are executed optimally in spark?

    All operations are applied on RDD partitions in terms of map or reduce operations. All data analysis logics are expressed in terms of map and reduce operations. An example of map operation would be filtered or selecting data. An example of redue operation would be group by or sort by operations. Here is an example of a series of map and reduce operations.
    • Load data -> map1 -> map2 -> map3 -> reduce1 -> map4 -> reduce2 -> reduce3 -> save results
    Once driver read the sequence of operations, it sends these as tasks to the executors. But it has to coordinate the execution of task to resolve any dependency between the RDD partitions across multiple executors. In this case the first operation is read data and map1. Let's say executor 1 finished map1 operation on P0 partition, before P1 partition and executor 2 finishes the map1 operation on P2 and P3 partitions.
    • Does, the executor need to wait for map1 operation to complete across all partitions, before it start map2 operation?
    The answer is no, as map2 operation is independent of other partition data, so executor can proceed with map2 operation. The only time, executors need to wait before proceeding further is when there is a reduce operation. As reduce operation will depend on the data across all paritions. The data need to shuffled across exectuors before reduce operation can be applied. Driver understands this dependencies, given a sequence of map reduce tasks and then combine these operations into stages. Each stage can be processes in parallel across executors, but need to wait for all executors before proceeding to next stage. So, given the above sequence, driver divides the task into four stages as below.
      Stage 1: load -> map1 -> map2 -> map3 Stage 2: reduce1 -> map4 Stage 3: reduce2 Stage 4: reduce3 -> save

  • The diagram above depicts the stages created by driver and executed by executors.
  • Not only the stages are executed in parallel, they can be done in parallel with an executor. Each Executor may have multiple paritions loaded onto their memory and can process these stages in parallel across partitions within the same executor. Processing the partitions in parallel is calle tasks.
  • But, to process partitions in parallel the executor should start multiple threads. And these threads can run in parallell in true sense, only if the executors have access to multiple CPUs.
  • So, each executor should be allocated with multiple CPUs or cores, if we intend to run the task in parallel.

Conclusion:

  • In this blog, we delved into spark architecture quickly to undestand its components and their internal workings. In the next blog, we will dive more deeper to understand how spark manages memory and when it actually it evaluates and executes tasks.