Cloud Computing Unit - 3 Final
Parallel and Distributed Programming Paradigms – Map Reduce, Twister and Iterative Map
Reduce – CGL– Map Reduce – Programming models for Aneka –Hadoop Library from Apache
– Mapping Applications – Programming Support –Google App Engine, Amazon AWS – Cloud
Software Environments –Eucalyptus,Open Nebula, Open Stack, CloudSim – SAP Labs – EMC – Sales
force – VMware.
Parallel Computing:
The systems that support parallel computing can have a shared memory or distributed memory.
➢ In shared memory systems, all the processors share the memory.
➢ In distributed memory systems, memory is divided among the processors.
➢ There are multiple advantages to parallel computing.
➢ As there are multiple processors working simultaneously, it increases the CPU utilization and
improves the performance.
➢ Moreover, failure in one processor does not affect the functionality of other processors.
➢ Therefore, parallel computing provides reliability.
➢ On the other hand, increasing processors is costly.
➢ Furthermore, if one processor depends on instructions or results from another, the waiting
involved can introduce latency.
Distributed Computing:
➢ Distributed computing allows scalability and makes it easier to share resources. It also helps
to perform computation tasks efficiently.
➢ On the other hand, it is difficult to develop distributed systems.
➢ Moreover, there can be network issues.
Difference Between Parallel and Distributed Computing
Definition
➢ Parallel computing is a type of computation in which many calculations or execution of
processes are carried out simultaneously.
➢ Whereas, a distributed system is a system whose components are located on different
networked computers which communicate and coordinate their actions by passing messages to
one another.
➢ Thus, this is the fundamental difference between parallel and distributed computing.
Number of computers
➢ The number of computers involved is a difference between parallel and distributed computing.
➢ Parallel computing occurs in a single computer whereas distributed computing involves
multiple computers.
Functionality
➢ In parallel computing, multiple processors execute multiple tasks at the same time.
➢ However, in distributed computing, multiple computers perform tasks at the same time.
Memory
➢ Moreover, memory is a major difference between parallel and distributed computing.
➢ In parallel computing, the computer can have a shared memory or distributed memory.
➢ In distributed computing, each computer has its own memory.
Communication
➢ Also, one other difference between parallel and distributed computing is the method of
communication.
➢ In parallel computing, the processors communicate with each other using a bus.
➢ In distributed computing, computers communicate with each other via the network.
Usage
➢ Parallel computing is used mainly to increase performance and speed up computation,
whereas distributed computing is used to share resources and achieve scalability.
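To make the shared-memory model above concrete, here is a minimal, self-contained Java sketch (illustrative, not from the syllabus sources): four worker threads on one machine compute partial sums and combine them through a single shared variable. In a distributed system, that final combination step would instead require passing messages over the network.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class SharedMemorySum {
    public static void main(String[] args) throws InterruptedException {
        AtomicLong total = new AtomicLong();                 // memory shared by all workers
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int w = 0; w < 4; w++) {
            final int worker = w;
            pool.submit(() -> {
                long partial = 0;
                for (long i = worker; i < 1_000_000; i += 4) // each worker takes every 4th number
                    partial += i;
                total.addAndGet(partial);                    // combine results via shared memory
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println("Sum = " + total.get());
    }
}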
What is MapReduce?
➢ MapReduce is a data processing tool which is used to process data in parallel in a distributed
form.
➢ It was introduced in 2004 in the Google paper titled "MapReduce: Simplified Data Processing
on Large Clusters."
➢ The MapReduce is a paradigm which has two phases, the mapper phase, and the reducer phase.
➢ In the Mapper, the input is given in the form of a key-value pair.
➢ The output of the Mapper is fed to the reducer as input.
➢ The reducer runs only after the Mapper is over. The reducer too takes input in key-value format,
and the output of reducer is the final output.
Steps in Map Reduce
o The map takes data in the form of <key, value> pairs and returns a list of <key, value> pairs.
The keys will not necessarily be unique in this case.
o Using the output of Map, sort and shuffle are applied by the Hadoop architecture. This sort and
shuffle acts on the list of <key, value> pairs and sends out unique keys and a list of values
associated with each unique key <key, list(values)>.
o The output of sort and shuffle is sent to the reducer phase. The reducer performs a defined
function on the list of values for each unique key, and the final output <key, value> is stored/displayed.
Sort and Shuffle
➢ The sort and shuffle occur on the output of Mapper and before the reducer.
➢ When the Mapper task is complete, the results are sorted by key, partitioned if there are multiple
reducers, and then written to disk.
➢ Using the input from each Mapper, <k2, v2>, we collect all the values for each unique key k2.
➢ This output from the shuffle phase, in the form of <k2, list(v2)>, is sent as input to the reducer phase.
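As a worked example of these phases: for the input "the cat sat on the mat", the map phase emits <the,1>, <cat,1>, <sat,1>, <on,1>, <the,1>, <mat,1>; sort and shuffle groups this into <the, [1,1]>, <cat, [1]>, and so on; and the reduce phase sums each list. The classic word-count program below is a minimal sketch of exactly this, written against the standard Hadoop MapReduce Java API (class name and paths are illustrative):

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    // Map phase: emit <word, 1> for every word in the input split
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private final Text word = new Text();
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }
    // Reduce phase: receives <word, list(counts)> after sort/shuffle and sums the list
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            context.write(key, new IntWritable(sum));
        }
    }
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input HDFS path
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output HDFS path
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The job takes the input and output HDFS paths as arguments; Hadoop itself performs the sort and shuffle between TokenizerMapper and IntSumReducer.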
Usage of MapReduce
o It can be used in various applications such as document clustering, distributed sorting, and web
link-graph reversal.
o It can be used for distributed pattern-based searching.
o We can also use MapReduce in machine learning.
o It was used by Google to regenerate Google's index of the World Wide Web.
o It can be used in multiple computing environments such as multi-cluster, multi-core, and
mobile environment.
TWISTER ARCHITECTURE
The messaging infrastructure in Twister is called the broker network, and it is responsible for
performing data transfers using publish/subscribe messaging.
Access Data
1. To access input data for a map task, Twister either reads data from the local disks of the worker nodes, or
2. receives data directly via the broker network.
Twister keeps all data read as files, and having data as native files allows Twister to pass data
directly to any executable.
Additionally, Twister provides a tool to perform typical file operations such as:
(i) create directories, (ii) delete directories, (iii) distribute input files across worker nodes, (iv)
copy a set of resources/input files to all worker nodes, (v) collect output files from the worker
nodes to a given location, and (vi) create partition-file for a given set of data that is distributed
across the worker nodes.
Intermediate Data
The intermediate data are stored in the distributed memory of the worker nodes. Keeping the
map output in distributed memory enhances the speed of the computation by sending the
output of the map from this memory directly to the reducers.
Messaging
All communication between the Twister components is carried out by the publish/subscribe
broker network described above.
Fault Tolerance
There are three assumptions for providing fault tolerance for iterative MapReduce:
(i) failure of the master node is rare, and no support is provided for that;
(ii) independent of the Twister runtime, the communication network can be made fault tolerant;
(iii) the data is replicated among the nodes of the computation infrastructure. Based on these
assumptions, Twister tries to handle failures of map/reduce tasks, daemons, and worker
nodes.
Why Iterative
MapReduce frameworks like Hadoop and Dryad have been very successful in fulfilling the
need to analyze huge files and compute data-intensive problems.
➢ Although they take care of many problems, many data analysis techniques require
iterative computations,
➢ including PageRank, HITS (Hypertext-Induced Topic Search), recursive relational
queries, clustering, neural-network analysis, social network analysis, and network
traffic analysis.
These techniques have a common trait: data are processed iteratively until the computation
satisfies a convergence or stopping condition.
➢ Most iterative algorithms run a computation repeatedly, with the output of one
iteration combined with or fed back in as the input of the next.
➢ This type of program terminates only when a fixed point is reached, i.e., the result
does not change from one iteration to the next.
➢ The MapReduce framework does not directly support these iterative data analysis
applications.
➢ Instead, programmers must implement iterative programs by manually issuing
multiple MapReduce jobs and orchestrating their execution using a driver program
(sketched below). Even platforms in which the data flow takes the form of a directed
acyclic graph of operators lack built-in support for iterative programs.
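A minimal sketch of such a driver program in Java is shown below. It assumes a hypothetical runMapReduceJob helper that configures and submits one complete MapReduce job, and a hasConverged check that compares successive outputs; both names are illustrative and not part of any framework.

public class IterativeDriver {
    public static void main(String[] args) throws Exception {
        String input = "/data/iter0";                    // hypothetical HDFS path
        boolean converged = false;
        int iteration = 0;
        while (!converged && iteration < 30) {           // safety cap on iterations
            String output = "/data/iter" + (iteration + 1);
            runMapReduceJob(input, output);              // manually issue one full MapReduce job
            converged = hasConverged(input, output);     // fixed point: output stopped changing
            input = output;                              // this output feeds the next iteration
            iteration++;
        }
    }

    static void runMapReduceJob(String in, String out) {
        // placeholder: configure and submit a MapReduce job here
    }

    static boolean hasConverged(String in, String out) {
        // placeholder: compare the two outputs and return true at the fixed point
        return false;
    }
}

Iterative runtimes such as Twister exist precisely to avoid this manual loop of repeated job submissions.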
ANEKA :
o Aneka includes an extensible set of APIs associated with programming models like
MapReduce.
o These APIs support different cloud models like a private, public, or hybrid Cloud.
o Manjrasoft, the company behind Aneka, focuses on creating innovative software technologies
to simplify the development and deployment of private or public cloud applications; its
product plays the role of an application platform as a service for multiple cloud computing
infrastructures.
o Aneka is a software platform for developing cloud computing applications; in Aneka, cloud
applications are executed.
o Aneka is a pure PaaS solution for cloud computing.
o Aneka is a cloud middleware product.
o Aneka can be deployed over a network of computers, a multicore server, a data center, a
virtual cloud infrastructure, or a combination thereof.
The services of the Aneka container can be classified into three major categories:
1. Fabric Services:
Fabric Services define the lowest level of the software stack that represents the Aneka container.
They provide access to the resource-provisioning subsystem and the monitoring features
implemented in Aneka, and define the infrastructure-management features of the system.
2. Foundation Services:
Foundation Services are the core services of the Aneka Cloud and are concerned with the logical
management of a distributed system built on top of the infrastructure; they provide ancillary
services for delivering applications.
3. Application Services:
Application services manage the execution of applications and constitute a layer that varies
according to the specific programming model used to develop distributed applications on top of
Aneka.
Aneka consists of two key components:
1. The SDK (Software Development Kit), which includes the Application Programming Interface
(API) and tools needed for the rapid development of applications. The Aneka API supports three
popular cloud programming models: Task, Thread, and MapReduce.
2. A runtime engine and platform for managing the deployment and execution of applications on a
private or public cloud.
One of the notable features of Aneka PaaS is its support for provisioning resources from private
clouds (desktops, clusters, or virtual data centers using VMware and Citrix XenServer) and public
cloud resources such as Windows Azure, Amazon EC2, and the GoGrid cloud service.
Aneka's potential as a Platform as a Service has been successfully harnessed by its users and
customers in several areas, including engineering, life sciences, education, and business
intelligence.
Architecture of Aneka
➢ Aneka is a platform and framework for developing distributed applications on the Cloud.
➢ It uses desktop PCs on-demand and CPU cycles in addition to a heterogeneous network of
servers or datacenters.
➢ It can be a public cloud available to anyone via the Internet or a private cloud formed by
nodes with restricted access.
➢ An Aneka-based computing cloud is a collection of physical and virtualized resources
connected via a network, either the Internet or a private intranet. Aneka provides a rich set of
APIs for developers to transparently exploit such resources and express the business logic of
applications using preferred programming abstractions.
➢ System administrators can leverage a collection of tools to monitor and control the deployed
infrastructure
➢ Each resource hosts an instance of the Aneka container that represents the runtime environment
where distributed applications are executed.
➢ The container provides the basic management features of a single node and takes advantage
of all the other functions of its hosted services.
➢ Services are divided into fabric, foundation, and execution services.
➢ Foundation services identify the core system of the Aneka middleware, which provides a set of
basic features that enable Aneka containers to perform specialized and specific sets of tasks.
➢ Fabric services interact directly with nodes through the Platform Abstraction Layer (PAL)
and perform hardware profiling and dynamic resource provisioning.
➢ Execution services deal directly with scheduling and executing applications in the Cloud.
➢ One of the key features of Aneka is its ability to provide a variety of ways to express
distributed applications by offering different programming models;
➢ Execution services are mostly concerned with providing middleware with the
implementation of these models.
➢ Additional services such as persistence and security are transversal to the entire stack of
services hosted by the container.
➢ At the application level, a set of different components and tools are provided to simplify the
development of applications and the management of the Aneka cloud.
In a common deployment of Aneka, if the deployment identifies a private cloud, all resources are
in-house, for example, within the enterprise.
HADOOP LIBRARY FROM APACHE :
Hadoop is an Apache open-source framework written in Java that allows distributed processing of
large datasets across clusters of computers using simple programming models.
The Hadoop framework application works in an environment that
provides distributed storage and computation across clusters of computers.
Hadoop is designed to scale up from a single server to thousands of machines, each offering local
computation and storage.
Hadoop Architecture
MapReduce
MapReduce is a parallel programming model devised at Google for writing distributed applications
that efficiently process large amounts of data (multi-terabyte datasets) on large clusters (thousands
of nodes) of commodity hardware in a reliable, fault-tolerant manner.
The MapReduce program runs on Hadoop which is an Apache open-source framework.
The Hadoop Distributed File System (HDFS) is based on the Google File System (GFS) and
provides a distributed file system that is designed to run on commodity hardware.
It has many similarities with existing distributed file systems. However, the differences from other
distributed file systems are significant.
It is highly fault-tolerant and is designed to be deployed on low-cost hardware.
It provides high throughput access to application data and is suitable for applications having large
datasets.
Apart from the above-mentioned two core components, Hadoop framework also includes the
following two modules −
• Hadoop Common − These are Java libraries and utilities required by other Hadoop
modules.
• Hadoop YARN − This is a framework for job scheduling and cluster resource management.
It is quite expensive to build bigger servers with heavy configurations that handle large-scale
processing. As an alternative, you can tie together many commodity computers, each with a single
CPU, as a single functional distributed system; practically, the clustered machines can read the
dataset in parallel and provide much higher throughput.
Moreover, it is cheaper than one high-end server. So this is the first motivational factor behind
using Hadoop that it runs across clustered and low-cost machines.
Hadoop runs code across a cluster of computers. This process includes the following core tasks that
Hadoop performs −
• Data is initially divided into directories and files. Files are divided into uniform-sized blocks
of 64 MB or 128 MB (preferably 128 MB).
• These files are then distributed across various cluster nodes for further processing.
• HDFS, being on top of the local file system, supervises the processing.
• Blocks are replicated for handling hardware failure.
• Checking that the code was executed successfully.
• Performing the sort that takes place between the map and reduce stages.
• Sending the sorted data to a certain computer. Writing the debugging logs for each job.
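The block size and replication factor of a stored file can be inspected programmatically. The short sketch below uses the standard Hadoop FileSystem Java API; the file path is hypothetical, and the program assumes fs.defaultFS in the configuration points at a running HDFS cluster.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsBlockInfo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // reads core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);
        // Hypothetical file path; substitute a file that exists on your cluster
        FileStatus status = fs.getFileStatus(new Path("/data/input.txt"));
        System.out.println("Block size (bytes): " + status.getBlockSize());
        System.out.println("Replication factor: " + status.getReplication());
        fs.close();
    }
}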
Advantages of Hadoop
• The Hadoop framework allows the user to quickly write and test distributed systems. It is
efficient, and it automatically distributes the data and work across the machines and, in turn,
utilizes the underlying parallelism of the CPU cores.
• Hadoop does not rely on hardware to provide fault-tolerance and high availability (FTHA),
rather Hadoop library itself has been designed to detect and handle failures at the
application layer.
• Servers can be added or removed from the cluster dynamically and Hadoop continues to
operate without interruption.
• Another big advantage of Hadoop is that, apart from being open source, it is compatible with
all platforms, since it is Java-based.
MAPPING APPLICATIONS :
➢ Application mapping refers to the process of identifying and mapping interactions and
relationships between applications and the underlying infrastructure.
➢ An application map, or network map, visualizes the devices on a network and how they are
related.
➢ It gives users a sense of how the network performs in order to run analysis and avoid data
bottlenecks.
➢ For containerized applications, it depicts the dynamic connectivities and interactions
between the microservices.
Types of Application Maps
• SNMP-Based Maps — Simple Network Management Protocol (SNMP) monitors the health
of computer and network equipment such as routers. An SNMP-based map uses data from
router and switch management information bases (MIBs).
• Active Probing — Creates a map with data from packets that report IP router and switch
forwarding paths to the destination address. The maps are used to find “peering links” between
Internet Service Providers (ISPs). The peering links allow ISPs to exchange customer traffic.
• Visibility – locates exactly where applications are running, so teams can plan accordingly for
system failures.
IT personnel use app maps to conceptualize the relationships between devices and transport
layers that provide network services. Using the application map, IT can monitor network
statuses, identify data bottlenecks, and troubleshoot when necessary.
Application owners and operations teams use app maps to conceptualize the relationships between
software components and application services. Using the application map, DevOps team can
monitor application health, identify security policy breaches, and troubleshoot when necessary.
An application map provides visual insights into inter-app communications in
a container-based microservices application deployment. It captures the complex relationships of
containers. An application map can graph the latency, connections, and throughput information
of microservice relationships.
GOOGLE APP ENGINE :
➢ Google App Engine is a scalable runtime environment mostly used to run Web
applications.
➢ These applications scale dynamically as demand changes over time, thanks to Google's vast
computing infrastructure.
➢ Because it offers a secure execution environment in addition to a number of services, App
Engine makes it easier to develop scalable and high-performance Web apps.
➢ Google’s applications will scale up and down in response to shifting demand.
➢ Cron tasks, communications, scalable data stores, work queues, and in-memory caching
are some of these services.
After creating a Cloud account, you may start building your app using:
• The Go template/HTML package
• Python-based webapp2 with Jinja2
• PHP and Cloud SQL
• Java's Maven
➢ The app engine runs the programs on various servers while “sandboxing” them.
➢ The app engine allows the program to use more resources in order to handle increased
demands.
➢ The app engine powers programs like Snapchat, Rovio, and Khan Academy.
To create an application for the app engine, you can use Go, Java, PHP, or Python. You can develop
and test an app locally using the SDK’s deployment toolkit. Each language’s SDK and runtime
are unique. Your program is run in a:
• Java Runtime Environment version 7
• Python runtime environment version 2.7
• PHP runtime’s PHP 5.4 environment
• Go runtime 1.2 environment
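For the Java 7 runtime, an App Engine application is essentially a standard Java servlet. The sketch below is a minimal illustrative example (the class name is hypothetical); it would be mapped to a URL in the application's web.xml and deployed with the SDK's toolkit, after which App Engine sandboxes it and scales instances with demand.

import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class HelloAppEngineServlet extends HttpServlet {
    @Override
    public void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        // App Engine runs this handler inside its sandboxed Java 7 runtime
        resp.setContentType("text/plain");
        resp.getWriter().println("Hello from Google App Engine!");
    }
}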
Generally Available Features
➢ These are protected by the service-level agreement and deprecation policy of the app
engine. The implementation of such a feature is often stable, and any changes made to it
are backward-compatible.
➢ These include communications, process management, computing, data storage, retrieval,
and search, as well as app configuration and management.
➢ Features like the HRD migration tool, Google Cloud SQL, logs, datastore, dedicated
Memcached, blob store, Memcached, and search are included in the categories of data
storage, retrieval, and search.
Features in Preview
➢ In a later iteration of the app engine, these functions will undoubtedly be made broadly
accessible.
➢ However, because they are in the preview, their implementation may change in ways that
are backward-incompatible.
➢ Sockets, MapReduce, and the Google Cloud Storage Client Library are a few of them.
Experimental Features
➢ These might or might not be made broadly accessible in the next app engine updates.
They might be changed in ways that are backward-incompatible.
➢ The “trusted tester” features, however, are only accessible to a limited user base and
require registration in order to utilize them.
➢ The experimental features include Prospective Search, Page Speed, OpenID,
Restore/Backup/Datastore Admin, Task Queue Tagging, MapReduce, Task Queue
REST API, OAuth, and app metrics analytics.
Third-Party Services
➢ Google provides documentation and helper libraries that expand the capabilities of the
app engine platform, so your app can perform tasks that are not built into the core App
Engine product.
➢ To do this, Google collaborates with other organizations. Along with the helper libraries,
the partners frequently provide exclusive deals to app engine users
Advantages of Google App Engine :
The Google App Engine has a lot of benefits that can help you advance your app ideas. This
comprises:
1. Infrastructure for Security: The Internet infrastructure that Google uses is arguably the
safest in the entire world. Since the application data and code are hosted on extremely secure
servers, there has rarely been any kind of illegal access to date.
2. Faster Time to Market: For every organization, getting a product or service to market
quickly is crucial. When it comes to quickly releasing the product, encouraging the
development and maintenance of an app is essential. A firm can grow swiftly with Google
Cloud App Engine’s assistance.
3. Quick to Start: You don’t need to spend a lot of time prototyping or deploying the app to
users because there is no hardware or product to buy and maintain.
4. Easy to Use: The tools that you need to create, test, launch, and update the applications are
included in Google App Engine (GAE).
5. Rich set of APIs & Services: A number of built-in APIs and services in Google App Engine
enable developers to create strong, feature-rich apps.
6. Scalability: This is one of the deciding variables for the success of any software. When using
the Google app engine to construct apps, you may access technologies like GFS, Big Table,
and others that Google uses to build its own apps.
7. Performance and Reliability: Among international brands, Google ranks among the top
ones. Therefore, you must bear that in mind while talking about performance and reliability.
8. Cost Savings: To administer your servers, you don’t need to employ engineers or even do it
yourself. The money you save might be put toward developing other areas of your company.
9. Platform Independence: Since the app engine platform only has a few dependencies, you
can easily relocate all of your data to another environment.
What is AWS?
o AWS stands for Amazon Web Services. The AWS service is provided by the Amazon that uses
Uses of AWS
o A small manufacturing organization uses their expertise to expand their business by leaving
their IT management to the AWS.
o A large enterprise spread across the globe can utilize the AWS to deliver the training to the
distributed workforce.
o An architecture consulting company can use AWS to get high-compute rendering of
construction prototypes.
o A media company can use AWS to provide different types of content, such as e-books or
audio files, to users worldwide.
Pay-As-You-Go
Based on the concept of Pay-As-You-Go, AWS provides its services to customers.
AWS provides services to customers when required, without any prior commitment or upfront
investment, in areas such as:
o Computing
o Programming models
o Database storage
o Networking
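As an illustration of the pay-as-you-go computing service, the sketch below launches a single virtual server using the AWS SDK for Java (v1). The AMI id is a placeholder, and credentials and region are assumed to come from the SDK's default provider chain; the instance only incurs charges while it runs.

import com.amazonaws.services.ec2.AmazonEC2;
import com.amazonaws.services.ec2.AmazonEC2ClientBuilder;
import com.amazonaws.services.ec2.model.RunInstancesRequest;
import com.amazonaws.services.ec2.model.RunInstancesResult;

public class LaunchEc2Instance {
    public static void main(String[] args) {
        // Credentials and region come from the default provider chain
        AmazonEC2 ec2 = AmazonEC2ClientBuilder.defaultClient();

        RunInstancesRequest request = new RunInstancesRequest()
                .withImageId("ami-12345678")   // placeholder AMI id
                .withInstanceType("t2.micro")  // small, low-cost instance type
                .withMinCount(1)
                .withMaxCount(1);

        RunInstancesResult result = ec2.runInstances(request);
        String id = result.getReservation().getInstances().get(0).getInstanceId();
        System.out.println("Launched instance: " + id);
    }
}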
Advantages of AWS
1) Flexibility
o We can get more time for core business tasks due to the instant availability of new features
and services in AWS.
o It provides effortless hosting of legacy applications. AWS does not require learning new
technologies, and migrating applications to AWS provides advanced computing and
efficient storage.
o AWS also offers the choice of whether to run applications and services together or not. We
can also choose to run a part of the IT infrastructure in AWS and the remaining part in
on-premises data centres.
2) Cost-effectiveness
AWS requires no upfront investment, long-term commitment, and minimum expense when
compared to traditional IT infrastructure that requires a huge investment.
3) Scalability/Elasticity
➢ Through AWS autoscaling and elastic load balancing, capacity is automatically scaled up
or down as demand increases or decreases.
➢ AWS techniques are ideal for handling unpredictable or very high loads.
➢ Due to this reason, organizations enjoy the benefits of reduced cost and increased user
satisfaction.
4) Security
AWS provides end-to-end security and privacy, with a virtual infrastructure that offers high
availability while maintaining full privacy and isolation of customer operations.
EUCALYPTUS :
➢ The open-source cloud refers to software or applications publicly available for users in the
cloud computing world to use, modify, and deploy.
➢ Eucalyptus is a Linux-based open-source software architecture for cloud computing and also a
storage platform that implements Infrastructure as a Service (IaaS).
➢ Eucalyptus was designed to provide services compatible with Amazon’s EC2 cloud and Simple
Storage Service (S3).
Eucalyptus Architecture :
➢ Eucalyptus CLIs can manage both Amazon Web Services and their own private instances.
➢ Clients have the independence to transfer instances from Eucalyptus to Amazon Elastic
Compute Cloud.
➢ The virtualization layer oversees the network, storage, and computing.
➢ Instances are isolated from one another by hardware virtualization.
Important Features are:-
1. Images: A good example is the Eucalyptus Machine Image, which is software bundled as a
module and uploaded to the Cloud.
2. Instances: When we run an image and utilize it, it becomes an instance.
3. Networking: It can be further subdivided into three modes: Static mode (allocates IP addresses to
instances), System mode (assigns a MAC address and attaches the instance’s network interface to
the physical network via the NC), and Managed mode (manages a local network of instances).
4. Access Control: It is used to apply restrictions to clients.
5. Elastic Block Storage: It provides block-level storage volumes to attach to an instance.
6. Auto-scaling and Load Balancing: It is used to create or destroy instances or services
based on requirements.
Components of Architecture
• Node Controller manages the lifecycle of instances running on each node. It interacts with the
operating system, the hypervisor, and the Cluster Controller, and it controls the working of VM
instances on the host machine.
• Cluster Controller manages one or more Node Controllers and communicates with the Cloud
Controller. It gathers information and schedules VM execution.
• Storage Controller (Walrus) is a simple file storage system that provides persistent block
storage for VM instances and allows the creation of snapshots of volumes. It stores images and
snapshots, and it stores and serves files using S3 (Simple Storage Service) APIs.
• Cloud Controller is the front-end for the entire architecture. It acts as a compliant Web services
endpoint for client tools on one side and interacts with the rest of the components on the other
side.
• Managed Mode:
Offers numerous security groups to users, as the network is large. Each security group is
assigned a set or a subset of IP addresses. Ingress rules are applied through the security groups
specified by the user. The network is isolated by VLAN between the Cluster Controller and the
Node Controller. Two IP addresses are assigned to each virtual machine.
• Managed (No VLAN) Mode:
The root user on a virtual machine can snoop into other virtual machines running on the
same network layer. This mode does not provide VM network isolation.
• System Mode:
Simplest of all modes, with the fewest features. A MAC address is assigned to a virtual
machine instance and attached to the Node Controller’s bridged Ethernet device.
• Static Mode:
Similar to System mode but with more control over the assignment of IP addresses: each MAC
address/IP address pair is mapped to a static entry within the DHCP server, and instances are
assigned the next available MAC/IP pair.
Advantages Of The Eucalyptus Cloud
1. Eucalyptus can be utilized to benefit both private and public Eucalyptus clouds.
2. Amazon or Eucalyptus machine images can be run on either cloud.
3. Its API is fully compatible with all the Amazon Web Services.
4. Eucalyptus can be utilized with DevOps tools like Chef and Puppet.
5. Although it is not as popular yet, it has the potential to be an alternative to OpenStack and
CloudStack.
6. It is used to build hybrid, public and private clouds.
7. It allows users to turn their own data centers into a private cloud and, hence, extend the
services to other organizations.
OPEN NEBULA :
➢ OpenNebula is a free and open source software solution for building clouds and for data
centre virtualisation.
➢ It is based on open technologies and is distributed under the Apache License 2.
➢ OpenNebula has features for scalability, integration, security and accounting.
➢ It offers cloud users and administrators a choice of interfaces.
➢ OpenNebula is an open source platform for constructing virtualised private, public and
hybrid clouds.
➢ It is a simple yet feature-rich, flexible solution to build and manage data centre
virtualisation and enterprise clouds.
➢ So, with OpenNebula, virtual systems can be administered and monitored centrally on
different hypervisor and storage systems.
➢ When a component fails, OpenNebula restarts the affected virtual instances on a different
host system.
➢ The integration and automation of an existing heterogeneous landscape is highly flexible
without further hardware investments.
Benefits of OpenNebula:
Its broad hypervisor support and platform-independent architecture make OpenNebula
the ideal solution for heterogeneous computing centre environments.
• It is 100 per cent open source and offers all the features in one edition.
• It provides control via the command line or Web interface, which is ideal for a variety of user
groups and needs.
• OpenNebula is available for all major Linux distributions, thus simplifying installation.
• The long-term use of OpenNebula in large scale production environments has proven its stability
and flexibility.
• OpenNebula is interoperable and supports OCCI (Open Cloud Computing Interface) and AWS
(Amazon Web Services).
By using the OpenNebula CLI or Web interface, you can keep track of activities at any time.
There is a central directory service through which you can add new users, and those users can
be individually entitled. Managing systems, configuring new virtual systems or even targeting
the right users and groups is easy in OpenNebula.
OpenNebula not only takes care of the initial provisioning, but the high availability of its
cloud environment is much better compared to other cloud solutions. Of course, the central
OpenNebula services can be configured for high availability, but this is not absolutely
necessary. All systems continue to operate in their original condition and are automatically
reintegrated once the availability of the control processes is restored.
In virtual environments, one lacks the ability to directly access the system when there are
operational problems or issues with the device. Here, OpenNebula offers an easy solution —
using the browser, one can access the system console of the host system with a VNC
integrated server.
All host and guest systems are constantly monitored in OpenNebula, which keeps the host
and VM dashboards up to date at all times. Depending on the configuration, a virtual machine
is to be restarted in case of the host system failing or if migrating to a different system. If a
data store is used with parallel access, the systems can of course be moved, while in
operation, on to other hardware. The maintenance window can be minimised and can often be
completely avoided.
• Open standards
OpenNebula is 100 per cent open source under the Apache License. By supporting open
standards such as OCCI and a host of other open architecture, OpenNebula provides the
security, scalability and freedom of a reliable cloud solution without vendor lock-in, which
involves considerable support and follow-up costs.
Figure 2: OpenNebula architecture
• APIs and interfaces: These are used to manage and monitor OpenNebula components. To
manage physical and virtual resources, they work as an interface.
• Users and groups: These support authentication, and authorise individual users and groups
with the individual permissions.
• Hosts and VM resources: These are a key aspect of a heterogeneous cloud that is managed
and monitored, e.g., Xen, VMware.
• Storage components: These are the basis for centralised or decentralised template
repositories.
• Network components: These can be managed flexibly. Naturally, there is support for VLANs
and Open vSwitch.
The front-end
• The machine that has OpenNebula installed on it is known as the front-end machine, which is
also responsible for executing OpenNebula services.
• The front-end needs to have access to the image repository and network connectivity to each
node.
• It requires Ruby 1.8.7 or above.
• OpenNebula’s services are listed below:
1. Management daemon (Oned) and scheduler (mm_sched)
2. Monitoring and accounting daemon (Onecctd)
3. Web interface server (Sunstone)
4. Cloud API servers (EC2-query or OCCI)
Virtualisation hosts
• To run the VMs, we require some physical machines, which are called hosts.
• The virtualisation sub-system is responsible for communicating with the hypervisor and
taking the required action for any node in the VM life cycle.
• During the installation, the admin account should be enabled to execute commands with root
privileges.
Storage
Data stores are used to handle the VM images, and each data store must be accessible by the
front-end, using any type of storage technology.
• File data store – used to store plain files (not disk images).
• Image data store – the type of image data store depends on the storage technology used; for
example:
• LVM – reduces the overhead of having the file system in place; the LVM is used to store
virtual images instead of plain files.
Networking :
There must be at least two physical networks configured in OpenNebula:
• Service network – to access the hosts to monitor and manage hypervisors, and to move VM
images.
• Instance network – to offer network connectivity between the VMs across the different hosts.
Whenever any VM gets launched, OpenNebula will connect its network interfaces to the
bridge described in the virtual network definition.
OPEN STACK :
Introduction to OpenStack
➢ OpenStack lets users deploy virtual machines and other instances that handle different tasks
for managing a cloud environment on the fly.
➢ It makes horizontal scaling easy, which means that tasks that benefit from running
concurrently can easily serve more or fewer users on the fly by just spinning up more
instances.
➢ For example, a mobile application that needs to communicate with a remote server might
be able to divide the work of communicating with each user across many different
instances, all communicating with one another but scaling quickly and easily as the
application gains more users.
➢ And most importantly, OpenStack is open source software, which means that anyone who
chooses to can access the source code, make any changes or modifications they need, and
freely share these changes back out to the community at large.
➢ It also means that OpenStack has the benefit of thousands of developers all over the world
working in tandem to develop the strongest, most robust, and most secure product that they
can.
➢ The cloud is all about providing computing for end users in a remote environment, where
the actual software runs as a service on reliable and scalable servers rather than on each
end-user's computer.
➢ Cloud computing can refer to a lot of different things, but typically the industry talks about
running different items "as a service"—software, platforms, and infrastructure. OpenStack
falls into the latter category and is considered Infrastructure as a Service (IaaS).
➢ Providing infrastructure means that OpenStack makes it easy for users to quickly add new
instances, upon which other cloud components can run.
➢ Typically, the infrastructure then runs a "platform" upon which a developer can create
software applications that are delivered to the end users.
OpenStack is made up of many different moving parts. Because of its open nature, anyone can add
additional components to OpenStack to help it to meet their needs.
But the OpenStack community has collaboratively identified nine key components that are a part
of the "core" of OpenStack, which are distributed as a part of any OpenStack system and officially
maintained by the OpenStack community.
• Nova is the primary computing engine behind OpenStack. It is used for deploying and
managing large numbers of virtual machines and other instances to handle computing tasks.
• Swift is a storage system for objects and files. Rather than the traditional idea of referring to
files by their location on a disk drive, developers can instead refer to a unique identifier for
the file or piece of information and let OpenStack decide where to store this
information. This makes scaling easy, as developers don’t have to worry about the capacity
on a single system behind the software. It also allows the system, rather than the developer, to
worry about how best to make sure that data is backed up in case of the failure of a machine or
network connection.
• Cinder is a block storage component, which is more analogous to the traditional notion of a
computer being able to access specific locations on a disk drive. This more traditional way of
accessing files might be important in scenarios in which data access speed is the most
important consideration.
• Neutron provides the networking capability for OpenStack. It helps to ensure that each of the
components of an OpenStack deployment can communicate with one another quickly and
efficiently.
• Horizon is the dashboard behind OpenStack. It is the only graphical interface to OpenStack,
so for users wanting to give OpenStack a try, this may be the first component they actually
“see.” Developers can access all of the components of OpenStack individually through an
application programming interface (API), but the dashboard provides system administrators a
look at what is going on in the cloud, and to manage it as needed.
• Keystone provides identity services for OpenStack. It is essentially a central list of all of the
users of the OpenStack cloud, mapped against all of the services provided by the cloud, which
they have permission to use. It provides multiple means of access, meaning developers can
easily map their existing user access methods against Keystone.
• Glance provides image services to OpenStack. In this case, "images" refers to images (or
virtual copies) of hard disks. Glance allows these images to be used as templates when
deploying new virtual machine instances.
• Ceilometer provides telemetry services, which allow the cloud to provide billing services to
individual users of the cloud. It also keeps a verifiable count of each user’s system usage of
each of the various components of an OpenStack cloud. Think metering and usage reporting.
• Heat is the orchestration component of OpenStack, which allows developers to store the
requirements of a cloud application in a file that defines what resources are necessary for that
application. In this way, it helps to manage the infrastructure needed for a cloud service to run.
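Each of these components exposes an API. As an illustration (not part of OpenStack itself), the sketch below uses the third-party OpenStack4j Java client to authenticate against Keystone and list the servers managed by Nova; the endpoint, credentials, and project names are placeholders.

import java.util.List;
import org.openstack4j.api.OSClient.OSClientV3;
import org.openstack4j.model.common.Identifier;
import org.openstack4j.model.compute.Server;
import org.openstack4j.openstack.OSFactory;

public class ListNovaServers {
    public static void main(String[] args) {
        // Authenticate against Keystone (identity service); all values are placeholders
        OSClientV3 os = OSFactory.builderV3()
                .endpoint("http://controller:5000/v3")
                .credentials("admin", "secret", Identifier.byName("Default"))
                .scopeToProject(Identifier.byName("demo"), Identifier.byName("Default"))
                .authenticate();

        // Ask Nova (compute service) for the instances visible to this project
        List<? extends Server> servers = os.compute().servers().list();
        for (Server s : servers)
            System.out.println(s.getName() + " : " + s.getStatus());
    }
}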
CLOUD SIM :
➢ It is used for modelling and simulating a cloud computing environment as a means for
evaluating a hypothesis prior to software development in order to reproduce tests and results.
➢ For example, if you were to deploy an application or a website on the cloud and wanted to
test the services and load that your product can handle and also tune its performance
to overcome bottlenecks before risking deployment, then such evaluations could be performed by simply
coding a simulation of that environment with the help of various flexible and scalable classes provided by the
CloudSim package, free of cost.
Features of CloudSim:
• Open source and free of cost, so it favours researchers/developers working in the field.
• Easy to download and set-up.
• It is more generalized and extensible to support modelling and experimentation.
• Does not require any high-specs computer to work on.
• Provides pre-defined allocation policies and utilization models for managing resources, and
allows implementation of user-defined algorithms as well.
• The documentation provides pre-coded examples for new developers to get familiar with the
basic classes and functions.
• Tackle bottlenecks before deployment to reduce risk, lower costs, increase performance, and
raise revenue.
CloudSim Architecture:
CloudSim Core Simulation Engine provides interfaces for the management of resources such as
VM, memory and bandwidth of virtualized Datacenters.
CloudSim layer manages the creation and execution of core entities such as VMs, Cloudlets,
Hosts etc.
It also handles network-related execution along with the provisioning of resources and their
execution and management.
User Code is the layer controlled by the user. The developer can write the requirements of the
hardware specifications in this layer according to the scenario. Some of the most common
classes used during simulation are:
• Datacenter: used for modelling the foundational hardware equipment of any cloud
environment, that is the Datacenter. This class provides methods to specify the functional
requirements of the Datacenter as well as methods to set the allocation policies of the VMs
etc.
• Host: this class executes actions related to management of virtual machines. It also defines
policies for provisioning memory and bandwidth to the virtual machines, as well as allocating
CPU cores to the virtual machines.
• VM: this class represents a virtual machine by providing data members defining a VM’s
bandwidth, RAM, mips (million instructions per second), size while also providing setter and
getter methods for these parameters.
• Cloudlet: a cloudlet class represents any task that is run on a VM, like a processing task, or a
memory access task, or a file updating task etc. It stores parameters defining the
characteristics of a task such as its length, size, mi (million instructions) and provides
methods similarly to VM class while also providing methods that define a task’s execution
time, status, cost and history.
• DatacenterBroker: is an entity acting on behalf of the user/customer. It is responsible for
functioning of VMs, including VM creation, management, destruction and submission of
cloudlets to the VM.
• CloudSim: this is the class responsible for initializing and starting the simulation
environment after all the necessary cloud entities have been defined and later stopping after
all the entities have been destroyed.
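Putting these classes together, the following condensed sketch (based on the CloudSim 3.x API; all capacities are arbitrary example values) builds one datacenter with one host, submits one VM and one cloudlet through a broker, runs the simulation, and prints the cloudlet's finish time.

import java.util.ArrayList;
import java.util.Calendar;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;
import org.cloudbus.cloudsim.*;
import org.cloudbus.cloudsim.core.CloudSim;
import org.cloudbus.cloudsim.provisioners.BwProvisionerSimple;
import org.cloudbus.cloudsim.provisioners.PeProvisionerSimple;
import org.cloudbus.cloudsim.provisioners.RamProvisionerSimple;

public class MinimalCloudSim {
    public static void main(String[] args) throws Exception {
        CloudSim.init(1, Calendar.getInstance(), false);   // 1 cloud user, no trace

        // One host with a single 1000-MIPS core (Pe = Processing Element)
        List<Pe> peList = new ArrayList<Pe>();
        peList.add(new Pe(0, new PeProvisionerSimple(1000)));
        Host host = new Host(0, new RamProvisionerSimple(2048),
                new BwProvisionerSimple(10000), 1000000, peList,
                new VmSchedulerTimeShared(peList));
        List<Host> hostList = Collections.singletonList(host);

        // Datacenter with example cost/characteristics values
        DatacenterCharacteristics characteristics = new DatacenterCharacteristics(
                "x86", "Linux", "Xen", hostList, 10.0, 3.0, 0.05, 0.001, 0.0);
        new Datacenter("Datacenter_0", characteristics,
                new VmAllocationPolicySimple(hostList), new LinkedList<Storage>(), 0);

        DatacenterBroker broker = new DatacenterBroker("Broker_0");

        // One VM and one cloudlet (task) to run on it
        Vm vm = new Vm(0, broker.getId(), 1000, 1, 512, 1000, 10000,
                "Xen", new CloudletSchedulerTimeShared());
        broker.submitVmList(Collections.singletonList(vm));

        UtilizationModel full = new UtilizationModelFull();
        Cloudlet cloudlet = new Cloudlet(0, 400000, 1, 300, 300, full, full, full);
        cloudlet.setUserId(broker.getId());
        broker.submitCloudletList(Collections.singletonList(cloudlet));

        CloudSim.startSimulation();
        CloudSim.stopSimulation();

        for (Cloudlet c : broker.getCloudletReceivedList())
            System.out.println("Cloudlet " + c.getCloudletId()
                    + " finished at time " + c.getFinishTime());
    }
}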
SAP Platform :
➢ SAP Cloud Platform (SCP) is a platform-as-a-service (PaaS) product that provides a
development and runtime environment for cloud applications.
➢ Based on SAP HANA in-memory database technology, and using open source and open
standards, SCP allows independent software vendors (ISVs), startups and developers to
create and test HANA-based cloud applications.
➢ According to SAP, SCP is primarily intended to allow organizations to extend existing
on-premises or cloud-based ERP applications with next-generation technology,
➢ Such as advanced analytics, blockchain or machine learning; build and deploy new
enterprise business cloud and mobile apps; integrate and connect enterprise applications
regardless of the application location or data source; and connect enterprise applications and
data to IoT.
➢ For example, SCP facilitates the integration of SAP S/4HANA Finance with cloud
applications like SAP Ariba or SAP SuccessFactors.
➢ It can also integrate these applications with non-SAP systems and data sources, including
social media sites and other vendors' enterprise applications.
➢ SCP is based on open standards and offers developers flexibility and control over which
clouds, frameworks and applications to deploy, according to SAP.
➢ SCP uses different development environments, including Cloud Foundry and Neo, and
provides a variety of programming languages.
SAP Cloud Platform use cases
➢ Although the applications developed and running on SCP provide widely divergent
functions and benefits, they share a common characteristic of enabling business
digital transformation.
➢ A number of custom use cases are available on SCP, including:
➢ Building custom, SAP Fiori-like user experience (UX) apps for SAP S/4HANA.
➢ There are also a number of early SCP customers who have implemented its services
and technology in production environments, according to SAP.
➢ For example, German robotics firm Kuka AG uses SAP Cloud Platform to connect
robotics in manufacturing processes.
➢ Mitsubishi Electric Europe incorporates IoT into its industrial automation technology
via SCP.
➢ Global healthcare company Aesculap developed an Apple iOS app on SCP that
manages and simplifies the use of sterile containers in surgeries.
➢ SCP services include Analytics, which allows you to embed advanced analytics into
applications for real-time results,
➢ and User Experience, which lets you develop personalized and simple user interactions.
➢ Although SAP Cloud Platform shares a similar name with SAP HANA Enterprise
Cloud (HEC), the two platforms have different intents and purposes.
➢ Both are variations of HANA cloud technology, but the two products use different
service models.
➢ While SCP offers a PaaS tool intended for developing and running cloud-based
applications,
➢ HEC is an infrastructure-as-a-service (IaaS) tool that enables companies to run
SAP-based operations in a hosted environment.
➢ SAP hosts HEC applications in several data centers located around the world and
provides ongoing application support and management, including upgrades, backups,
patches, restoration and recovery, infrastructure monitoring and event detection.
VMWARE :
➢ VMware Cloud services are services that enable you to integrate, manage, and secure
applications on cloud resources.
➢ These services work for any cloud service using VMware and can help you centralize
the management and maintenance of hybrid or multi-cloud environments.
➢ VMware Cloud services enable you to determine how resources are used and where
workloads are deployed while applying a single operational model.
➢ This enables you to standardize security, reduce management complexity, and
improve your ROI.
➢ You can use VMware Cloud services with either public or private clouds.
➢ When integrating these services you do not need to re-architecture applications or
convert data. This can help you simplify app modernization and ensure high
performance.
➢ VMware Cloud services are available in a variety of technologies provided as part of
a VMware Cloud subscription. This subscription offers a wide range of services.
➢ The following services are particularly helpful for monitoring and managing your
cloud environments:
• VMware Cloud on AWS
• Cloud Provider Metering
• vRealize Network Insight Cloud
• vRealize Log Insight
• vRealize Automation
VMware Cloud on AWS :
➢ This integration was developed jointly by AWS and VMware and applies VMware’s
Software-Defined Data Center (SDDC) technology to AWS infrastructure.
➢ You can use this integration to extend on-premises or other cloud services to AWS.
➢ When you integrate VMware Cloud with AWS, you gain access to a single-tenant
infrastructure. In your deployment, you can control scaling through options for the number
of hosts.
➢ EC2 instances are optimized for high-volume input/output operations and NVMe storage.
➢ Additionally, VMware Cloud on AWS enables you to run the VMware Software-Defined
Data Center (SDDC) stack on AWS.