Distributed Computing Applied in Oil Exploration
Cloud Computing and Big Data for Oil and Gas Industry
Application in China
Yang Zhifeng1,2,3*, Han Fei4, Feng Xuehui3, Yuan Qi3, Cao Zhen3, Zhang Yidan5
1Postdoctoral Workstation, Xinjiang Oilfield Company, PetroChina, Karamay, Xinjiang, China.
2State Key Laboratory of Petroleum Resource and Prospecting, China University of Petroleum, Beijing, China.
3Sugon Information Industry Co., Ltd., Beijing, China.
4Lenovo (Beijing) Co., Ltd., Beijing, China.
5Xinjiang Oilfield Company, PetroChina, Karamay, Xinjiang, China.
Abstract: The oil and gas industry is a complex, data-driven industry that is compute-intensive, data-intensive
and business-intensive. Cloud computing and big data have broad application prospects in the oil and
gas industry. This research aims to highlight the cloud computing and big data issues and challenges arising from
informatization in the oil and gas industry. The paper first focuses on the distributed cloud storage architecture
and its applications for seismic data. It then introduces cloud desktops for oil and gas industry applications in
terms of efficiency, security and usability. Finally, it analyzes the big data architecture and security issues of
the oil and gas industry. Cloud computing and big data architectures have advantages in many respects, such as
system scalability, reliability, and serviceability. The paper also briefly describes the future development of
cloud computing and big data in the oil and gas industry, where they can provide convenient information sharing
and high-quality service.
Key words: Big data, cloud computing, cloud desktop, oil and gas industry, China.
1. Introduction
The evolution of cloud computing over the past few years is potentially one of the major advances in the
history of computing. A great deal of research on cloud computing and big data has been applied both in China
and abroad. Google was the first company to implement cloud computing services. IBM released its "Blue Cloud"
platform plan in 2007. Microsoft developed Windows Live online services. Governments are also paying more
attention to cloud computing. The U.S. government's tax monitoring platform was deployed on the Amazon cloud
computing platform, making it the first government in the world to use commercial cloud computing services.
The purpose of cloud computing is to reduce costs, help users focus on their critical applications, and avoid
the constraints of traditional IT. Cloud computing transfers the risks of over-provisioning or under-provisioning
to the cloud computing vendors, who mitigate those risks through statistical analysis of their customer base.
The vendors provide high-quality and convenient services to IT clients at a relatively low price [1].
With the rapid development of information technology in recent years, cloud computing and big data
technologies have also been introduced into the oil and gas industry. The upstream oil and gas industry
engages in exploration and production, the midstream covers storage and transportation, and the downstream
involves refining, chemicals and sales. Oil exploration and production is characterized by large data volumes
and demands high I/O read and write performance, high reliability and linear expansion of data storage nodes.
A new generation of cloud platforms will help meet these challenges. Cloud computing integrates the application
software platforms of big data and serves the different departments of the oil and gas industry. It provides a
unified data storage and data access interface for seismic exploration, drilling and other departments, delivers
information services, and establishes an integrated business information platform. Mykola Gordii stated that
global oil and gas industry investment in the public cloud would rise from USD 1 billion in 2012 to more than
USD 2 billion in 2014 [2]. Spending on cloud computing and big data technologies in the next few years may climb
as high as 42 billion/year [3]. Based on the rapid development of cloud computing and big data technologies in
China, new cloud storage, cloud desktop and big data architectures for the oil and gas industry are presented
in this study.
2. Cloud Computing
2.1. Cloud Computing Definition
Cloud computing is defined by the National Institute of Standards and Technology (NIST) as a model for
enabling convenient, on-demand network access to a shared pool of configurable computing resources (such as
servers, storage devices, networks, applications, and services) that can be rapidly deployed and released with
low management cost and minimal service-provider interaction [4]. Cloud computing is an integrated product that
combines grid computing, distributed computing, network storage, virtualization, load balancing and other
traditional computing technologies [5]. Its main feature is the delivery of services distributed across a large
number of scattered computers [6]. A cloud-based enterprise data center can migrate resources to meet
application requirements and provides access to all storage devices. Cloud computing is emerging as a model in
support of an "everything-as-a-service" architecture, in which hardware, software and networks are available to
users in the form of services [7].
Xu summarized the characteristics of cloud computing as follows: low-cost information technology, real-time
dynamic resource deployment, flexibility, standardization, and ultra-large scale [8]. The advantages of cloud
computing in the oil and gas industry can be summarized as follows. First, it improves resource utilization.
Second, it provides free download and installation and service without time limits, which can reduce IT costs.
Third, it has an unparalleled advantage in real-time data backup and high reliability for oil and gas industry
information management. Fourth, the virtual data center of cloud computing can reduce energy consumption and
carbon emissions, which meets the demands of the green IT era.
very useful bridge that lets customers move some computing resources to the public cloud while still holding on
to legacy systems [10]. Feblowitz stated that the public cloud is a deployment model in which the cloud is open
to a largely unrestricted set of potential clients [16]. The private cloud is designed to restrict access to a
single enterprise. A hybrid cloud solution, which combines private cloud with public cloud resources, is a smart
way to reap many benefits of the public cloud while ensuring data security.
Xu Weiping suggested that the critical applications of the oil and gas industry should be deployed in an
enterprise private cloud [8]. Non-critical applications, which can gain flexibility and low-cost management from
a SAAS vendor's service, should be deployed on the public cloud. The enterprise hybrid cloud model, which
combines private and public clouds, can strike a balance between security and efficiency. The author stressed
that cloud-based information management in the oil and gas industry cannot be achieved overnight. First, private
clouds should be built to meet the specific business needs of the industry's working environment. Second,
services and data should continue to be gathered and expanded on the platform. Third, as the data continue to
grow, a Chinese "oil industry cloud" should be established to serve the industry as a whole and even other
countries.
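The placement rule described above (critical applications on the private cloud, non-critical ones on the public
cloud) can be illustrated with a minimal Python sketch. The Application class, the critical flag and the example
application names are hypothetical and serve only to make the rule concrete; the source does not describe how
criticality is actually assessed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Application:
    name: str
    critical: bool  # hypothetical flag: True for core business systems


def choose_deployment(app: Application) -> str:
    """Return 'private' for critical applications and 'public' otherwise."""
    return "private" if app.critical else "public"


def plan_hybrid_cloud(apps):
    """Group application names by the cloud they would be deployed on."""
    plan = {"private": [], "public": []}
    for app in apps:
        plan[choose_deployment(app)].append(app.name)
    return plan


# Hypothetical example applications.
apps = [
    Application("seismic_processing", critical=True),
    Application("office_mail", critical=False),
]
plan = plan_hybrid_cloud(apps)
print(plan)
```

In practice the decision would depend on more than a single flag (data sensitivity, cost, regulation), so the
sketch should be read as a starting point rather than a complete policy.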
3. Big Data
3.1. Big Data Definition
Big data is information that is so large, complex and fast-moving that it is difficult to handle using everyday
data management tools [17]. Typically, the data sets are so large and complex that they require advanced data
storage, management, analysis, and visualization technology [18]. David Cameron indicated that big data
integrates a variety of IT technologies, including regular analysis, large-scale parallel databases, in-memory
computing, fast retrieval, natural language processing and the statistical analysis of large datasets [19].
J. Johnston proposed that big data reveals hidden patterns and unknown correlations in the data sets examined [20].
Big data was first defined by three features: Volume, Velocity and Variety. Based on data quality, IBM added a
fourth V, Veracity, while Oracle added a fourth V, Value, to emphasize the value of big data [21]. A recent
McKinsey Global Institute report defines big data as "data sets whose size is beyond the ability of typical
database software tools to capture, store, manage and analyze". Big data technologies are essentially based on
the Apache™ Hadoop® project, which is open-source software for reliable, scalable, distributed computing [22].
data processing. Streaming data structures are diverse and carry timestamp or ordering attributes. Streaming
data arrive as new, continuous data streams whose volume is unknown or unbounded in advance and which are
neither stored in full nor predictable in memory. The data are processed quickly and the results are obtained
immediately, with latency at the level of seconds to milliseconds. A typical application of streaming data
processing is data collection (logs, sensors and web pages).
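To illustrate how timestamped records can be processed as they arrive, without storing the whole stream, the
following minimal Python sketch computes a running mean over a trailing time window. The function name
windowed_mean, the one-second window and the sample sensor readings are assumptions made for illustration;
they are not taken from any system described in this paper.

```python
from collections import deque


def windowed_mean(stream, window_ms):
    """Yield (timestamp_ms, mean) over readings newer than ts - window_ms.

    `stream` is any iterable of (timestamp_ms, value) pairs in time order.
    Only the readings inside the current window are kept in memory.
    """
    window = deque()
    total = 0.0
    for ts, value in stream:
        window.append((ts, value))
        total += value
        # Evict readings that have fallen out of the trailing window.
        while window and window[0][0] <= ts - window_ms:
            _, old_value = window.popleft()
            total -= old_value
        yield ts, total / len(window)


# Hypothetical sensor readings: (timestamp in ms, measured value).
readings = [(0, 10.0), (500, 20.0), (1000, 30.0), (1600, 40.0)]
results = list(windowed_mean(readings, window_ms=1000))
for ts, mean in results:
    print(ts, mean)
```

Because the generator consumes one record at a time, the same function works on an unbounded source such as a
log tail or a sensor feed, at the cost of keeping only one window of history.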
4. Application of Cloud Computing and Big Data in the Oil and Gas Industry
The oil and gas industry faces many problems, such as data integration, sharing of results and collaborative
work. Cloud computing not only provides massive data storage and sharing solutions, but also integrates
various types of hardware and software resources to meet the informatization requirements of the oil and gas
industry. At the same time, data integration technology is used to manage all kinds of data and information
in a unified and effective way, and to break down the access restrictions between different departments of the
industry. Cloud storage connects the various information islands to achieve a unified data information service
and sharing mechanism.
Cloud computing in the oil and gas industry mainly comprises the research cloud and the desktop cloud. The
research cloud improves the work efficiency of exploration and production. Cloud computing enables collaborative
work and remote visualization across an oil company's headquarters, branch companies, and even remote sites.
Petroleum companies will leverage cloud services to enhance existing supercomputing capabilities, process the
massive seismic data generated by ultra-sensitive seismic sensors, and reduce imaging analysis time. Cloud
computing in the midstream and downstream oil and gas industry is mainly concentrated on cloud desktop
applications, management and online software services.
Fig. 1. Cloud computing and big data of oil and gas industry.
grid technology, and distributed file systems, cloud storage integrates a large number of different storage
devices. A large-scale distributed cloud storage system has massive storage space and provides flexible,
efficient, high-performance and reliable service for shared file storage and highly concurrent access.
At present, research on cloud storage is mainly focused on the security of data storage and on cloud storage
architecture. With the development of cloud storage technology, many scholars have applied it to critical
application systems in different industries.
Naresh Vurukonda examined the security of data storage in cloud computing and proposed an effective solution
for the privacy and encryption of stored data in the cloud environment [23]. Swapnali More proposed a public
audit architecture for privacy protection based on a third-party audit platform. With this scheme, cloud storage
provides privacy protection, public auditing and data integrity [24]. Li addressed the security issues of cloud
data storage and proposed a way to prevent cloud administrators from obtaining sensitive client data [25]. Wang
studied the data replication techniques of distributed storage systems and evaluated their performance [26]. Wu
developed a cloud storage system named MingCloud with high availability and high performance, built on a
master/slave architecture [27]. Li used dual active master nodes and a multi-service heartbeat detection
algorithm to build a distributed, highly available framework for smart grid design [28]. Liao built a
distributed storage architecture for multimedia file segmentation and editing storage with a cluster of video
servers [29]. Based on the business characteristics of the relay satellite system, Chen designed a cloud storage
architecture that meets the demand for massive data storage and removes the bottleneck of large-capacity data
storage in the relay satellite system [30].
4.1.1. Comparison of distributed cloud storage technology
As the application requirements of big data expand and change, the traditional GFS, HDFS and Lustre
distributed file systems show obvious shortcomings in massive-file processing scenarios and in data storage
fault tolerance. In contrast to these traditional distributed file systems, this paper introduces Parastor, a
new and efficient distributed parallel cloud storage architecture. Parastor is an asymmetric storage system for
massive unstructured data processing. It can provide TB/s-level bandwidth and EB-level storage space, with very
strong scale-out capacity. It far exceeds traditional NAS, SAN and traditional distributed storage in system
capacity and linearly aggregated bandwidth. Its typical features include high reliability, scalability,
flexibility and excellent storage performance (Table 1).
4.1.2. Cloud storage application in oil and gas exploration
Petroleum exploration is characterized by a single data source, compute-intensive workloads and complicated
processes. To meet the storage requirements of the oil and gas industry, a distributed parallel cloud storage
system is used to build a storage architecture with high stability, high security and sustainability. It can
meet the industry's demand for massive seismic data storage, high I/O read and write bandwidth and high
floating-point computing performance. The cloud storage architecture has four important modules: the storage
subsystem, the computing subsystem, the network subsystem, and the seismic data processing and interpretation
subsystem. The storage subsystem includes the management node, metadata nodes and data nodes. The POSIX
protocol is configured for high-performance computing scenarios, offering higher bandwidth and consistency in
the data cache. The computing subsystem is a critical part of oil exploration and production and is mainly
engaged in seismic data processing. Blade servers and fat-node servers are usually selected to meet users'
demand for high floating-point computing capacity, high scalability and high system memory bandwidth in seismic
data processing. The entire architecture uses a 10GbE, FDR or EDR high-speed network to ensure the data
transmission rate and meet customer requirements for high-speed networking. The workstation displays the results
of seismic data processing and supports the interpretation of strata and faults. It provides geological
engineers with information for drilling decision analysis.
optimized.
On the basis of performance optimization, a distributed parallel cloud storage system architecture is
deployed. The architecture includes two dual-active redundant metadata nodes (responsible for metadata
access, storage system monitoring and management) and six data nodes (responsible for data access
requests and equipped with 216 TB of raw capacity). The hard disks use an erasure code protection strategy of
8 + 2:1, with up to 75% utilization of storage space. In testing, the aggregate write/read bandwidth and
I/O throughput reach up to 5 GB/s, IOPS is 120,000 and the I/O response time is 9 ms (Table 2). The results show
that the distributed parallel cloud storage system meets the oil field's requirements for high reliability and
high-speed access and storage.
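The capacity figures above can be checked with a short Python sketch. It assumes that "8 + 2" denotes 8 data
blocks and 2 parity blocks per stripe and that the 216 TB of raw capacity is the total across the six data
nodes; the ":1" suffix is not modeled. Both assumptions are interpretations rather than statements from the
source. Under them the nominal utilization is 8 / (8 + 2) = 80%, slightly above the reported figure of up to
75%; the source does not explain the difference, so the sketch simply computes both.

```python
def erasure_utilization(data_blocks: int, parity_blocks: int) -> float:
    """Fraction of raw capacity that holds user data in a k+m erasure code."""
    return data_blocks / (data_blocks + parity_blocks)


def usable_capacity_tb(raw_tb: float, utilization: float) -> float:
    """Usable capacity in TB for a given raw capacity and utilization."""
    return raw_tb * utilization


RAW_TB = 216  # assumed to be the total raw capacity of the six data nodes

nominal = erasure_utilization(8, 2)                    # nominal 8+2 ratio
nominal_usable = usable_capacity_tb(RAW_TB, nominal)   # capacity at the nominal ratio
reported_usable = usable_capacity_tb(RAW_TB, 0.75)     # capacity at the reported 75%

print(f"nominal utilization: {nominal:.0%}")
print(f"usable at nominal ratio: {nominal_usable:.1f} TB")
print(f"usable at reported 75%: {reported_usable:.1f} TB")
```

The same helper can be used to compare other stripe layouts, for example a 4 + 2 layout, which trades capacity
for the same number of tolerated block losses.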
Fig. 3. Business performance optimization of Parastor: (a) changes in ls -l monitoring of business data;
(b) single-file large-block random read scenario.
server for image processing, which not only improves application performance but also enables VDI to meet
the needs of more users.
4.2.2. Characteristic of SCADA system
The SCADA system is the key business system in the midstream oil and gas industry. It collects real-time data
and realizes local or remote control, performing comprehensive, real-time monitoring of the production process.
It provides the necessary reference data for production, dispatching and management. The software of the
business system includes three parts: the computer operating system, the SCADA system, and the application
software. The data transmission throughput is large and the real-time requirements are strict. According to
rough statistics, the number of data points monitored in parallel approaches the millions, and the time
precision is usually at the millisecond level. The hardware of the SCADA system includes three parts: the host
computer system, the lower computer system, and the automation instruments and actuating mechanisms.
The host computer system can operate and control the instructions from other devices, such as the SCADA
server and PLC. The lower computer system is responsible for sending data to the upper device and executing
operation instructions from sensors and actuators. Data transmission is asymmetrical: the volume of data
collected by the lower computer system and sent to the host computer system is massive.
4.2.3. Cloud desktop architecture of midstream in oil and gas industry
GPU servers, on which the virtual desktop clusters are deployed, are used as the underlying hardware.
A hot standby node is configured to ensure reliability, continuity and stability for the user's business system.
The GPU servers provide graphics virtualization for the SCADA system. Desktop virtualization software is used
to pool the underlying physical resources. Each virtual desktop is allocated 50 GB for the operating system,
50 GB for data storage and 300 GB for system management. The virtual desktop and the customer's SCADA
monitoring system are deployed in virtual machines. Virtual desktop terminals are deployed as a cluster of thin
clients, which access the core switching network of the SCADA system. Independent display devices are used
to monitor the oil pipeline system and the crude oil pipeline system.
Fig. 6. Data characteristics of big data in the oil and gas industry: (a) data scale; (b) data type.
Petroleum companies use tape libraries for archiving and disk arrays for storing business data and enterprise
office data. The big data storage architecture of the oil and gas industry uses traditional disk arrays together
with a distributed cloud storage solution. Seismic data, the largest data volume, are stored in the cloud
storage architecture, and other structured data are stored in the disk array. Data cleaning and standardization
are performed first, followed by big data analysis. Spark Streaming, from the Hadoop ecosystem, is used to
access real-time data of the upstream oil and gas industry, such as data from the actuators and sensors of the
SCADA system. The results of in-memory computing can be imported into Hadoop for relational data analysis and
batch processing. The big data architecture will establish a data model to support exploration and production
decision-making by geologists and petroleum engineers. The marketing data of oil enterprises are stored in
distributed storage, and users' consumption behavior is described by correlation analysis in Hadoop. After the
analysis, the results are displayed on a large screen. The value extracted by data mining will support analysis
and decisions for exploration and production.
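The data flow described above (clean and standardize each record, then send it to a storage tier and analysis
path that depend on the kind of data) can be sketched in plain Python. The DataKind enumeration, the ROUTES
table, the cleaning rules and the sample record are all hypothetical names and values chosen for illustration;
the sketch does not call Spark or Hadoop and is not the architecture's actual implementation.

```python
from enum import Enum


class DataKind(Enum):
    SEISMIC = "seismic"
    STRUCTURED = "structured"
    REALTIME = "realtime"
    MARKETING = "marketing"


# (storage tier, analysis path) per data kind, following the description above.
ROUTES = {
    DataKind.SEISMIC: ("distributed cloud storage", "batch analysis"),
    DataKind.STRUCTURED: ("disk array", "batch analysis"),
    DataKind.REALTIME: ("in-memory stream", "stream processing, then batch import"),
    DataKind.MARKETING: ("distributed storage", "correlation analysis"),
}


def clean(record: dict) -> dict:
    """Drop empty fields and normalize keys and string values."""
    cleaned = {}
    for key, value in record.items():
        if value is None:
            continue
        if isinstance(value, str):
            value = value.strip()
            if not value:
                continue
        cleaned[key.strip().lower()] = value
    return cleaned


def route(kind: DataKind, record: dict) -> dict:
    """Clean a record and attach the storage tier and analysis path for its kind."""
    storage, analysis = ROUTES[kind]
    return {"storage": storage, "analysis": analysis, "record": clean(record)}


# Hypothetical SCADA sensor reading.
routed = route(
    DataKind.REALTIME,
    {" Sensor ": " P-101 ", "pressure_kpa": 812.5, "note": None},
)
print(routed)
```

Keeping the routing table separate from the cleaning step makes it straightforward to add a new data kind
without touching the record-level logic.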
5. Cloud Computing and Big Data: State of the Art and Challenge
5.1. Big Data in Cloud Computing
Big data has the following characteristics: large volume, high velocity, and a need for secure storage. The
storage requirements of big data cannot be separated from cloud computing. High-velocity big data can only be
processed within an acceptable waiting time by cloud computing. At the same time, cloud computing is a feasible
way to analyze and understand big data. The value of big data can be discovered through data mining, which finds
potential value in data of low value density, and the implementation of big data mining technology is
inseparable from cloud computing. In short, cloud computing is the core supporting technology for big data
processing and provides its processing capabilities. Cloud-based big data analysis is a service model in which
elements of the big data analysis process are provided through public and private clouds.
6. Conclusion
Our case study has reviewed the development of China's cloud computing and big data in the oil and gas industry
and described the opportunities and challenges of these technologies for oil and gas industry applications.
Parastor, a distributed parallel storage architecture, can provide high-quality cloud storage service.
It can meet the demand for massive data storage of the growing petroleum exploration system and remove the
bottleneck of massive data storage. Cloud computing in the oil and gas industry mainly comprises the research
cloud and the desktop cloud. Desktop cloud applications mainly refer to the office desktops and monitoring
desktops of oil and gas storage and transportation. Big data research in the upstream oil and gas industry
focuses mainly on seismic data processing, intelligent decision-support systems, drilling-based recognition of
multiple conditions, and production data processing. In the midstream oil and gas industry, big data mainly
focuses on oil and gas storage and transportation. In the downstream oil and gas industry, the application of
big data mainly focuses on gas station management, sales activity analysis, and customer behavior and
preferences. Cloud computing and big data technology can provide convenient information sharing and high-quality
service for the oil and gas industry.
Abbreviations
NIST: National Institute of Standards and Technology; SAAS: software as a service; PAAS: platform as a service;
IAAS: infrastructure as a service; GFS: Google File System; HDFS: Hadoop Distributed File System; OLTP: online
transaction processing; OLAP: online analytical processing; NAS: network-attached storage; SAN: storage area
network; FDR: fourteen data rate; EDR: enhanced data rate; BGP: Bureau of Geophysical Prospecting Inc., China
National Petroleum Corporation; VDI: virtual desktop infrastructure; GPU: graphics processing unit; SCADA:
supervisory control and data acquisition; PPDM: Professional Petroleum Data Management; SEG: Society of
Exploration Geophysicists; WITSML: Wellsite Information Transfer Standard Markup Language
Acknowledgment
The authors thank Zhang Yulong for his initial contribution.
References
[1] Armbrust, M. (2009). Above the clouds: A Berkeley view of cloud computing. Science, 53(4), 50-58.
[2] Gordij., & Mykola. (2013). Use of cloud computing in oil and gas industry. Geomatics and Environmental
Engineering, 7(2), 35-41.
[3] Buyya, R., Pandey, S., & Vecchiola, C. (2009). Cloudbus toolkit for market-oriented cloud computing.
Computer Science, 5931, 24-44.
[4] Garfinkel, S. L. (2011). Cloud computing defined business impact report series. Technology Review
Magazine.
[5] Vaquero, L. M., Rodero-Merino, L., & Lindner, M. (2008). A break in the clouds: Towards a cloud definition.
ACM SIGCOMM Computer Communication Review, 39(1), 50-55.
[6] Zheng, L., Chen, S., & Hu, Y. (2011). Applications of cloud computing in the smart grid. Proceedings of
International Conference on Artificial Intelligence, Management Science and Electronic Commerce (pp. 203-
206).
[7] Lenk, A., Klems, M., & Nimis, J. (2009). What’s inside the Cloud? An architectural map of the Cloud
landscape. IEEE Computer Society, 23-31.
[8] Xu, W. P., & Zhao, H. (2013). Research of cloud computing information management mode for oil
enterprise. Applied Mechanics & Materials, 336-338.
[9] Janssen, M., & Joha, A. (2011). Challenges for adopting cloud-based software as a service (SAAS) in the
public sector. European Conference on Information Systems.
[10] Sotomayor, B., Montero, R. S., & Llorente, I. M. (2009). Virtual infrastructure management in private and
hybrid clouds. IEEE Internet Computing, 13(5), 14-22.
[11] Owens, D. (2010). Securing elasticity in the cloud. ACM, 53(53), 46-51.
[12] Abokhodair, N., Taylor, H., & Hasegawa, J. (2016). Heading for the Clouds, Implications for Cloud Computing
Adopters, 1-9.
[13] Perrons, R. K., & Hems, A. (2013). Cloud computing in the upstream oil & gas industry, a proposed way
forward. Energy Policy, 56, 732-737.
[14] Geczy, P., Izumi, N., & Hasida, K. (2012). Cloudsourcing, managing cloud adoption. Global Journal of
Business Research, 6(2), 57-70.
[15] Hofmann, P., & Dan, W. (2010). Cloud computing, the limits of public clouds for business applications.
IEEE Internet Computing, 14(6), 90-93.
[16] Feblowitz, J. (2011). Oil and gas, into the cloud? Journal of Petroleum Technology, 63(5), 32-33.
[17] Hems, A., Soofi, A., & Perez, E. (2013). Drilling for new business value, how innovative oil and gas companies
are using big data to outmaneuver the competition. A Microsoft White Paper.
[18] Chen, H., Chiang, R. H. L., & Storey, V. C. (2012). Business intelligence and analytics: From big data to big
impact. Society for Information Management and the Management Information System Research Center,
36(4), 1165-1188.
[19] Cameron, D. (2014). Big data in exploration and production, silicon snake-oil, magic bullet, or useful tool.
Society of Petroleum Engineers.
[20] Johnston, J., & Guichard, A. (2015). New findings in drilling and wells using big data analytics.
Proceedings of Offshore Technology Conference.
[21] Baaziz, A., & Quoniam, L. (2013). How to use big data technologies to optimize operations in Upstream
petroleum industry. International Journal of Innovation, 1(1).
[22] What Is Apache Hadoop? Apache™ Hadoop® Website, last update. Retrieved from
http://hadoop.apache.org/#What+Is+Apache+Hadoop%3F
[23] Vurukonda, N., & Rao, B. T. (2016). A study on data storage security issues in cloud computing.
Procedia Computer Science, (92), 128-135.
[24] More, S., & Chaudhari, S. (2016). Third party public auditing scheme for cloud storage. Procedia Computer
Science, 79, 69-76.
[25] Li, Y., Gai, K., & Qiu, L. (2016). Intelligent cryptography approach for secure distributed big data storage
Yang Zhifeng was born in China in 1987. In 2012, he graduated from China University of
Petroleum (Huadong) with bachelor's and master's degrees in geology. In 2016, he graduated
from China University of Petroleum (Beijing) with a Ph.D. degree in reservoir exploration and
production. He is currently working at the postdoctoral workstation of Xinjiang Oilfield Company,
PetroChina. His main research directions are cloud computing, big data management, data
aggregation and information development of the petroleum industry.
Han Fei was born in China in 1985. In 2017, she graduated from China University of
Petroleum (Beijing) with a Ph.D. degree in computer science. She is currently working
at Lenovo (Beijing) Co., Ltd., Beijing. Her main research directions are high performance
computing, big data and artificial intelligence.