Impact of Distributed-Memory Parallel Processing Approach on Performance Enhancing of Multicomputer-Multicore Systems: A Review

Article History: Received: 2/10/2021 Accepted: 16/11/2021 Published: Autumn 2021

Distributed memory is a term used in computer science to describe a multiprocessor computer system in which each processor has its own private memory. Computational jobs can only work with local data, so access to remote data requires communication with one or more remote processors. Parallel and distributed computing are frequently used together. Distributed parallel computing employs many computing devices to process tasks in parallel, whereas parallel computing on a single computer uses multiple processors to execute tasks in parallel. Distributed systems are designed separately from the core network. There are different kinds of distributed systems, such as peer-to-peer (P2P) networks, clusters, grids, and distributed storage systems. Multicore processors can be classified into two types: homogeneous and heterogeneous. This paper reviews the impact of the distributed-memory parallel processing approach on performance-enhancing of multicomputer


Introduction
A distributed system is a group of standalone computers that appears to its users as one coherent system. Distributed computing uses many geographically remote computers to solve large and complex tasks with high efficiency (Agarwal 2004) (Dino et al. n.d.). However, when distributed applications are deployed in today's data centres, it is possible to co-design distributed systems with their own network layer, which can offer significant benefits (Ports et al. 2015). Distributed systems offer a better price-performance ratio than centralized systems (Van Steen and Tanenbaum 2017). Computing power can be added to distributed systems in small increments. Distributed systems allow multiple users to access a shared computing resource, providing resource sharing (Bansal, Sharma, and Trivedi 2011). Multicore processors have become popular and offer compact parallel computing power that cannot be fully used unless the running program is written accordingly. Writing an effective, scalable parallel program is very complex. To get the most out of multi-core processors, a program must expose enough parallelism to run effectively on a larger number of cores, or several programs must execute simultaneously on several cores (Bridges et al. 2008) (McCool 2008). Multi-core processors have been around for over a decade, but they gained importance later because of the technological limits that single-core processors face today (Wang et al. 2010) in delivering high throughput and long battery life with high energy efficiency (Ramanathan 2006). Multicore processors can be

Distributed system
Distributed systems are traditionally designed independently of the underlying network, making worst-case assumptions about its behaviour (Ports et al. 2015). Furthermore, work on operating systems for distributed systems proposes new solutions to existing problems (Lu et al. 2016). Distributed systems allow multiple users to access a shared computing resource, providing resource sharing. Examples of distributed computing include online rail reservation systems, air traffic control, and internet banking (Bansal et al. 2011).

a. Peer-to-Peer (P2P) Distributed Systems
P2P is a class of applications that takes advantage of resources (storage, cycles, content, and human presence) available at the edges of the Internet. Because accessing these decentralized resources means operating in an environment of unstable connectivity and unpredictable IP addresses, P2P nodes must operate outside the DNS and have significant or total autonomy from central servers (Wang and Li 2003). Existing P2P networks can be categorized into three generations. First generation: the first P2P networks, such as Napster and Gnutella, aimed at easy and quick deployment. Second generation: P2P networks typically use distributed hash table (DHT) technology to achieve better scalability and query efficiency, and provide load balancing and deterministic search guarantees. Third generation: recently proposed P2P networks aim to provide high resilience under the assumption that nodes fail with some probability (Fiat and Saia 2002). In structured P2P networks the topology is tightly controlled, for example as a mesh (Zhao, Kubiatowicz, and Joseph 2001) (Rowstron and Druschel 2001), a ring (Stoica et al. 2001), or a d-dimensional torus (Ratnasamy et al. 2001).
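To make the DHT idea behind second-generation P2P networks concrete, the following is a minimal sketch of key placement on a ring-shaped identifier space. It is an illustration only, not the protocol of any cited system: the class name HashRing, the function ring_id, the 32-bit identifier space, and the node names are all assumptions, and routing tables and network communication are omitted.

```python
import hashlib
from bisect import bisect_left

RING_BITS = 32  # assumed size of the identifier space (2**32 positions)


def ring_id(name: str) -> int:
    """Hash a node name or a key onto the identifier ring."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << RING_BITS)


class HashRing:
    """Toy structured overlay: each key belongs to its clockwise successor node."""

    def __init__(self, node_names):
        # Keep (identifier, name) pairs sorted by position on the ring.
        self._nodes = sorted((ring_id(n), n) for n in node_names)

    def lookup(self, key: str) -> str:
        """Return the name of the node responsible for the given key."""
        ids = [node_id for node_id, _ in self._nodes]
        # First node whose identifier is >= the key's identifier,
        # wrapping around to the first node at the end of the ring.
        pos = bisect_left(ids, ring_id(key)) % len(self._nodes)
        return self._nodes[pos][1]

    def remove(self, name: str) -> None:
        """Simulate a node leaving or failing."""
        self._nodes = [(i, n) for i, n in self._nodes if n != name]


ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
owner = ring.lookup("song.mp3")  # one of the four node names
```

In this sketch, when a node leaves, only the keys it owned move to its successor, which is the property that lets such networks scale and balance load without a central index.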

b. Clusters Distributed Systems
A computer cluster is a group of interconnected computers that are usually linked by a fast local area network (LAN). Computer clusters are the preferred type of distributed system architecture because of their high performance-to-price ratio; the rest is made up of Massively Parallel Processor (MPP) and Pleiades systems (Grudenić and Bogunović 2009). Clusters, built on a local area network with high-speed connections, are the more common choice in modern high-performance computing. A number of computers are grouped so that they act as a single resource pool: a job is divided into smaller jobs, and the results of the smaller jobs are then merged to form the final outcome (Amir et al. 2004). Cluster computing helps organizations raise their computing power by applying generally available standard technology. This hardware and software, known as commodities, can be bought on the market at a comparatively small cost (Lakshmanan, Ahamad, and Venkateswaran 2003).
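The divide-and-merge pattern described above can be sketched as follows. This is a minimal illustration under stated assumptions: worker threads on one machine stand in for cluster nodes, and the function names split_job, partial_sum_of_squares, and run_on_cluster, as well as the sum-of-squares workload, are hypothetical. A real cluster would distribute the chunks over the network, for example with a message-passing library.

```python
from concurrent.futures import ThreadPoolExecutor


def split_job(data, n_parts):
    """Divide data into n_parts contiguous chunks of nearly equal size."""
    size, extra = divmod(len(data), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        # The first `extra` chunks receive one additional element.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def partial_sum_of_squares(chunk):
    """Work done independently by one worker on its own chunk."""
    return sum(x * x for x in chunk)


def run_on_cluster(data, n_workers=4):
    """Scatter chunks to workers, then merge the partial results."""
    chunks = split_job(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_sum_of_squares, chunks))
    return sum(partials)  # merge step: combine the smaller results


total = run_on_cluster(list(range(1000)))
```

Each worker touches only its own chunk, which mirrors the distributed-memory constraint that a job works on local data and shares only its result.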

c. Grids Distributed Systems
Grid computing is a collection of computing resources from multiple administrative domains applied to a common task, usually a scientific, technical, or commercial problem that requires a large number of processing cycles or the processing of large amounts of data. It is a type of parallel and distributed system that enables the dynamic sharing, selection, and aggregation of autonomous, geographically distributed resources at runtime, depending on their availability, capability, performance, cost, and users' quality-of-service requirements (Saafan 2009) (Rashid et al. 2018). The broker used in a grid is responsible for assigning tasks. The size of a grid can vary from a few hundred computers in a large organization to thousands of nodes across many organizations. Smaller grids confined to one organization are commonly known as intra-node cooperation, while a larger system is referred to as inter-node cooperation (Puttaswamy, Zheng, and Zhao 2008) (Kaur, Kaur, and Kaur 2013).

d. Distributed Storage Systems
The rapid growth of storage capacity, computing resources, and bandwidth, combined with the falling cost of storage equipment, has increased the popularity of distributed storage systems (Harinath et al. 2015). The main kinds of distributed storage systems are the redundant array of inexpensive disks (RAID), centralized RAID, network-attached storage (NAS), and the storage area network (SAN). These are the four most common distributed storage technologies.

Multicore
Usually, a multi-core processor is a single processor that has several cores on one chip (Wang et al. 2010), which gives it a parallel processing capability that a single-core processor lacks. CPUs are available with different core counts: two, four, six, eight, ten, and more (Sondhi and Ganesh n.d.). The multi-core processor is an integrated circuit (IC) with two or more processors linked to improve performance, reduce energy consumption, and handle multiple tasks simultaneously and more efficiently (Rouse 2013). Multi-core processors can be classified into two types: homogeneous and heterogeneous. Symmetric multiprocessing (SMP) operating systems are usually run on homogeneous multi-core processors for high-performance computing. Heterogeneous multi-core processors, on the other hand, consist of different cores dedicated to specific applications and are better suited to embedded systems (Wei et al. 2011). In recent years, CPU designers have moved from raising clock speeds to multi-core architectures with multiple processing threads (Rao, Prasad, and Venkateswarlu 2009). A core can be regarded as one processor. A dual-core chip contains two internal processors on a single chip. A quad-core chip has four processors, equivalent to two dual-core processors, on one chip. More cores are useful for a variety of tasks, but a single-threaded application can use only one core, leaving the other cores idle. Core i3, i5, and i7 processors each have four cores (Rao, Moturi, and KLEF 2018).
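The observation that a single-threaded application leaves the other cores idle can be quantified with Amdahl's law, a standard model that is not taken from the reviewed papers. The sketch below assumes that a fixed fraction of the run time parallelizes perfectly and ignores communication overhead; the function names are hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup when only part of a program runs in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part takes the same time; the parallel part is split over the cores.
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def parallel_efficiency(parallel_fraction: float, cores: int) -> float:
    """Speedup per core: 1.0 means every core is fully used."""
    return amdahl_speedup(parallel_fraction, cores) / cores


# A purely single-threaded program gains nothing from four cores.
single_threaded = amdahl_speedup(0.0, 4)
# A fully parallel program scales with the core count.
fully_parallel = amdahl_speedup(1.0, 4)
```

Under this model, a program that is half serial cannot run more than twice as fast however many cores are added, which is why the program itself has to be written for parallel execution.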

Literature Review
V. Sklyarov et al. (Sklyarov et al. 2016) in 2016 explored distributed computing systems that can be used efficiently to process frequently requested information in electronic, environmental, medical, and biological applications. Pre-processing can be done in highly parallel accelerators mapped onto reconfigurable devices. The core of the accelerator is a sort/search network that is implemented either in an FPGA or in a programmable system-on-chip (such as Zynq devices). Data is sent to the computer via the high-bandwidth PCI-Express bus.

B. Sreevidya et al. (Sreevidya, Rajesh, and Mamatha 2018) in 2018 used NS2 simulation to evaluate the performance of a wireless sensor network (WSN) under an RSA-based security scheme that deals with false data injected into the network. The RSA algorithm applied in the modified scheme provides greater system security than the MAC algorithm of the BECAN scheme. The MAC algorithm provides only authentication, whereas the RSA algorithm makes the system more secure through public-key encryption.

S. Phoemphon et al. (Phoemphon, So-In, and Nguyen 2018) in 2018 studied the possible integration of two computing technologies, fuzzy logic (FL) and extreme learning machines (ELMs), with the aim of improving estimated localization accuracy. Localization is one of the major challenges in wireless sensor networks, especially in the absence of GPS equipment. Unlike ELMs, FL methods provide high accuracy under limited node density and coverage conditions.

M. Hossain et al. (Hossain et al. 2018) in 2018 measured the latency and throughput of their designed network and compared them with the real-time online interactive application (ROIA) multi-box schedule and NOX. The main technology of SDN applications is the separation of the data level from the management level, together with network virtualization through programming. The total time taken to respond to a user is called the response time. Throughput is the rate at which the network transmits data.

A. Rauniyar et al. (Rauniyar, Engelstad, and Moen 2018) in 2018 proposed a new distributed localization algorithm for the Internet of Things based on social learning particle swarm optimization (SL-PSO). With the SL-PSO algorithm, the method aims to localize the deployed sensor nodes precisely and to reduce computational complexity, which in turn extends the lifespan of these resource-limited sensing nodes. Extensive simulations demonstrate the effective performance of the SL-PSO algorithm in fine localization.

Y. Liu et al. (Liu et al. 2019) in 2019 proposed a lightweight blockchain system called LightChain that is resource-efficient and suitable for power-constrained Industrial Internet of Things (IIoT) scenarios. In particular, they presented a green consensus mechanism called Synergistic Multiple Proof (SMP) to facilitate collaboration among IIoT devices, and a light data structure called LightBlock (LB) to streamline broadcast content. Moreover, they designed a new mechanism to avoid unbounded growth of the ledger without affecting the blockchain's traceability.
F. Al-Wesabi et al. (Al-Wesabi, Iskandar, and Ghilan 2019) in 2019 extended the work on the E-Avala approximation to improve the overall performance of the study by increasing its power recovery capabilities. The study was implemented and compared with the previous study using square modelling with different configuration parameters. The comparative results show that the proposed study achieves a greater improvement in recycling capabilities at different levels.

Vol. (6), No (4), Autumn 2021 ISSN 2518-6566 (Online) -ISSN 2518-6558 (Print)

L. Tan et al. (Tan, Zhao, and Zhang 2019) in 2019 demonstrated the use of iTrace to collect traces on iOS. This creates opportunities for many types of investigation that were previously impossible on iOS. The high speed of iTrace also allows time-sensitive analysis. They collected system-call traces for commonly used applications and summarized and displayed the results, hoping to highlight application behaviour and open up areas for further investigation.

F. Noor et al. (Noor, Ibrahim, and AlKhattab 2020) in 2020 proposed the parallel distributed bat algorithm (PDBA), implemented in the C programming language using the Message Passing Interface (MPI) on a computer cluster. The running time and complexity of PDBA are determined, and the results are presented in terms of speed, efficiency, elapsed time, and the number of fitness function evaluations. The bat algorithm is an optimization algorithm that is practical and efficient for obtaining good approximate solutions to nonlinear problems.

M. Bhatia et al. (Bhatia, Sood, and Kaur 2020) in 2020 proposed a quantitative approach to scheduling homogeneous tasks in fog-based applications. In particular, the scale of a given node is determined by calculating the node computational index, which estimates the computational power of the fog computing nodes. Moreover, a QCI neural network model is proposed to optimize the node that handles a real-time task. A comparative analysis against advanced scheduling models such as heterogeneous completion time, Min-Max, and Round Robin was performed to determine the performance improvements.

K. Warasup et al. (Warasup, Hamamura, and Pattaramalai 2020) in 2020 introduced a new MAC protocol that enhances network performance for wireless LANs with Multiple Packet Reception (MPR) capability. The main feature of the proposed protocol is that multiple asynchronous RTS transmissions are allowed. As a result, the method achieves a higher probability of packet transmission when the network is operating at the

Bianco et al. (Bianco 2020) in 2020 discussed the design and development of an FPGA-based NIC that overcomes the performance bottlenecks and lack of flexibility of commercial NICs. Network interface cards (NICs) are gaining more and more attention in the research community because they can provide packet transmission speeds of several gigabits per second, a performance similar to that of low-to-mid-range routers. The effectiveness and limits of the proposed approach were comprehensively discussed.

P. Neelima et al. (Neelima and Reddy 2020) in 2020 introduced an ADA-based multi-load balancing method, which was tested through simulation experiments. The approach is designed to deliver a well-balanced load across virtual machines and to complete tasks on time at the lowest cost. Moreover, the proposed algorithm demonstrates the ability of ADA to improve task scheduling and resource allocation in the cloud computing environment. To assess the performance of the proposed system, three problem cases are analysed.

S. Yousefi et al. (Yousefi et al. 2020) in 2020 introduced a new route planning mechanism for collecting data in the Internet of Things using mobile software agents. The proposed mechanism includes two basic stages. The first stage groups the IoT devices; the group leaders are assigned mobile agents through a wedge-based operation. The basic purpose of the second stage is to provide a route plan for the mobile agent of each group leader so that data are collected effectively, using a Markov Decision Process (MDP).

F. Shahid et al. (Shahid, Khan, and Jeon 2020) in 2020 proposed Distributed Ledger for Internet of Things (DL-for-IoT), which addresses the challenge of integrating two technologies: the distributed ledger and the Internet of Things. Notable features of DL-for-IoT include transaction hierarchy, floor cuts, lightweight compatibility, and quantum-secure digital signatures. The basic building block of DL-for-IoT is a new one-time signature (OTS) method called DL-OTS. By comparing DL-OTS with general one-time signature schemes, they concluded that DL-OTS is a compact, fast, and energy-saving signature scheme.

X. Zhao et al. (Zhao et al. 2020) in 2020 designed and developed a system, based on distributed heterogeneous medical data, to share and merge distributed medical data, and applied it in several hospitals. A smart medical information network platform was thereby created so that patients can enjoy high-quality, safe, and appropriate diagnosis and treatment with short waiting times and essential medical costs. Finally, the results of questionnaires and interviews were compared with those for previous treatments.

S. Dhingra et al. (Dhingra et al. 2020) in 2020 introduced a fog-cloud framework for real-time traffic monitoring that overcomes the latency limits of purely cloud-based services. They implemented a prototype Smart Traffic Monitoring System (STMS) designed to monitor traffic congestion. It can also be adapted to detect road accidents that require immediate assistance during congestion. Within this framework, a small computer-on-module acts as a fog node, collecting real-time data from geographically dispersed sensors and moving it to the cloud for storage and processing.

Parra et al. (Parra et al. 2020) in 2020 proposed a cloud-based distributed deep learning framework for detecting phishing attacks. The model consists of two main security mechanisms that operate cooperatively. The first is a distributed convolutional neural network (DCNN) model embedded as a security add-on in the IoT device to detect phishing and application-layer distributed denial-of-service (DDoS) attacks. The second is a cloud-based Long Short-Term Memory (LSTM) model hosted on the back end to detect botnet attacks, which also ingests CNN embeddings to detect phishing attacks distributed across many IoT devices. The distributed CNN model is included in the ML engine of the customer's IoT device.

Y. Gao et al. (Gao et al. 2020) in 2020 evaluated and compared federated learning (FL) and split neural networks (SplitNN) in real-world IoT settings in terms of learning performance and hardware implementation overhead. They considered a wide variety of data sets, different model structures, multiple clients, and different performance metrics. Learning performance, measured by model accuracy and convergence speed, was evaluated empirically for both FL and SplitNN under different types of data distribution, such as imbalanced data and non-independent and identically distributed (non-IID) data.

Y. Ren et al. (Ren et al. 2020) in 2020 introduced the DCOMB (Dual Combination Bloom Filter) method to convert the computing power of bitcoin mining into query processing power. In addition, they used the DCOMB method to create a blockchain-based IoT data query model. This model combines the IoT data stream with the timing chain of the blockchain, improving the interoperability and versatility of the data in the IoT database system.

W. Yánez et al. (Yánez et al. 2020) in 2020 proposed a new context-sensitive mechanism for allocating data on-chain in IoT blockchain systems. In particular, they developed a data controller based on fuzzy logic that calculates the allocation value for each data request, taking into account multiple context parameters: data, network, quality, and their chain distribution. Moreover, they showed how the design and implementation of the mechanism improve the two common architectural styles of the Internet of Things (block, fog).

Discussion
This part depends on the details explained in Table 1. All the literature above used different techniques in distributed systems and reported significant results. For example, V. Sklyarov et al. used scheduling algorithms and reported the results of pre-processing, statistical treatment, analysis of existing and acquired groups, and data mining. B. Sreevidya et al. used the MAC and RSA algorithms, and their results indicate that the scheme performs better than current schemes that provide either data integrity or sender authentication. On the other hand, the programming language is a crucial part of any paper, and the reviewed literature uses different programming languages, such as Matlab, Java, Python, and C++. The network is also a cornerstone of a distributed system, because the literature shows that a good network and a fast internet connection lead to a good outcome. Finally, each of the reviewed works used an effective technique and reported significant results; the best-performing ones are identified in the final part of this paper.

(Sklyarov et al. 2016): Scheduling algorithms; the results of pre-processing, statistical treatment, analysis of existing and acquired groups, and data mining; 32-bit Linux, dual-core; C language; wired.
(Sreevidya et al. 2018): MAC and RSA algorithm; results indicate that the scheme's performance is better than the current schemes that provide either data integrity or sender authentication.

Conclusion
Distributed systems are traditionally designed independently of the underlying network, making worst-case assumptions about its behaviour. Multicore processors can be classified into two types: homogeneous and heterogeneous. This paper reviewed the impact of the distributed-memory parallel processing approach on enhancing the performance of multicomputer-multicore systems. A number of methods that apply this approach were introduced, and the best of them were explained, with a focus on speed as the measure of performance enhancement for multicore processors in distributed systems. Based on the details given in Table 1 in the discussion section, it can be concluded that the best methods were those proposed by Y. Liu et al. and F. Shahid et al. Firstly, Liu et al. used the GNU/Linux 4.8.0-36 operating system, an Intel Xeon 2.5 processor, and the Python programming language. The proposed method reduced the individual computational cost by 39.32% and sped up block generation by up to 74.06%. Secondly, F. Shahid et al. used the Windows 8.1 32-bit operating system on an Intel Core i5 with the Python programming language; their method reduces signing time by 76% and saves 48.7% of the energy compared to the Winternitz OTS scheme.