
Cloudera Professional Services

Security Engagement Prerequisites


Contents
1 Introduction
2 Professional Services Prerequisites
3 Introduction to Hadoop Security
    3.1 Background
    3.2 Kerberos Principals and Keytabs
    3.3 Identity Management
    3.4 Group Membership
4 Security Reference Architecture
    4.1 Logical Architecture
    4.2 Architecture Summary
5 Authentication and Authorization Prerequisites
    5.1 License Keys
    5.2 Root/sudo access
    5.3 Authentication Packages
    5.4 Identity Management Integration
    5.5 Active Directory OU and OU user
    5.6 Enable SSL/TLS on Active Directory
    5.7 Principals and Keytabs
    5.8 Active Directory Bind Account
    5.9 Active Directory Groups for Privileged Users
    5.10 Active Directory Groups for Cloudera Manager and Navigator Roles
    5.11 Sentry and Navigator Databases
    5.12 Active Directory Test Users and Groups
    5.13 Multiple Realms/Active Directory Domains
    5.14 SAML configuration
    5.15 TLS/SSL Certificates
    5.16 Firewall
    5.17 Windows Registry Key Setup
    5.18 Admin/Root Access for Installing ODBC Drivers
    5.19 Cloud Considerations
6 Encryption Prerequisites
    6.1 Navigator Key Trustee
    6.2 HDFS Encryption
    6.3 Navigator Encrypt
    6.4 Entropy Requirements
7 Security Prerequisites Checklist
Appendix A: Example MIT KDC Configuration
Appendix B: Example Kerberos Client Configuration
Appendix C: Example Custom Kerberos Keytab Retrieval Script
Appendix D: Example Scripts for Creating Principals and Keytab Files in Active Directory
Appendix E: Test for Active Directory OU and CM Principal Privilege Before Enabling Kerberos

1 Introduction
This document describes Cloudera's Security Reference Architecture for configuring authentication,
authorization, and encryption in an Enterprise Data Hub (EDH). Its aim is to explain to the reader
Cloudera's recommended approach for setting up Hadoop security in enterprise environments and the
associated prerequisite steps that need to be completed prior to a Cloudera Professional Services
engagement.

 Note: Should the reader identify any potential exceptions to following the Security Reference
Architecture (for example, having to use a local MIT KDC instead of integrating directly with Active
Directory) then these should be communicated to Cloudera prior to any Professional Services
engagement.

2 Professional Services Prerequisites


Prior to Cloudera Professional Services arriving onsite the customer must complete the Cloudera
Professional Services Prerequisites* and the prerequisites from this document.

* Cloudera Professional Services Prerequisites v1.9

3 Introduction to Hadoop Security
3.1 Background
Hadoop security leverages Kerberos † to perform user authentication and prevent malicious user
impersonation. Kerberos is a network authentication protocol designed by Massachusetts Institute of
Technology (MIT) to provide strong authentication for client/server applications. Kerberos forms the
basis for many authentication systems such as MIT Kerberos on Linux and Microsoft Active Directory
on Windows.

Without Kerberos enabled, Hadoop makes no effort to verify that a user is who they say they are.
Without third-party verification, users can easily impersonate other users to circumvent any authorization
controls that have been put in place (e.g. HDFS file permissions).

With Kerberos enabled, users must first authenticate with a Kerberos Key Distribution Centre (KDC) to
obtain a valid Ticket-Granting-Ticket (TGT). This ticket is used by Hadoop to ensure that a user is
authenticated. Microsoft's Active Directory provides a Kerberos KDC as well as LDAP directory
services to solve Hadoop authentication across the Cloudera platform.

Kerberos authentication coupled with Hadoop's authorization mechanisms, such as HDFS POSIX
permissions and Sentry, gives a robust security model to protect access to data in a cluster.

3.2 Kerberos Principals and Keytabs


A user in Kerberos is called a principal, and a principal is made up of three components: primary,
instance, and realm:

 The primary component is an arbitrary string that is typically a user's operating system
username or the name of a service (e.g. hdfs );
 The instance component is an optional string that qualifies the primary. It is separated from the
primary by a slash (/).
o For example, Cloudera Manager's principal qualifies its primary as an admin instance
(e.g. cloudera-scm/admin@[Link] ) and the Hadoop service principals use the
instance to define the host on which a service runs (e.g.
hdfs/[Link]@[Link] );
 The realm is used to logically define a related group of principals. Large organizations
commonly create distinct realms to delegate administration of groups within the enterprise. A
Hadoop cluster deployment will typically have its own realm. Realms, by convention, are
written in uppercase.

† [Link]

The Hadoop service principals have the form:
service_name/[Link]@[Link] , where service_name is the
username of an existing Unix account that is used to run a Hadoop process, such as hdfs or yarn .
These are commonly referred to as Service Principal Names (SPN).

Users who want to access the Hadoop cluster will also need to have Kerberos principals. User principals
typically have the form: username@[Link] , where username refers to the username of the user's
Unix account. User principals typically do not include the instance component.

There are two ways to authenticate a principal in Kerberos: password login or keytab. Users will
typically authenticate via a password login and Hadoop processes will authenticate via keytab files that
are either created on successful password login or referenced from a prebuilt keytab file.

A keytab is a file containing one or more Kerberos principals and their encrypted keys. A keytab file is
used to authenticate a principal to Kerberos without human interaction. Because having access to a
keytab file allows one to act as the principal(s) it contains, the keytab files should be tightly secured (i.e.
readable by a minimal set of users, stored on the local disk, and not included in the host backup, unless
access to those backups is secure). Every Hadoop process running in the cluster has its own keytab file
since the principals are unique to each service and host.
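
As a quick illustration of the two authentication paths, the commands below show a password-based login for a user principal and a keytab-based login for a service principal. This is a minimal sketch: the principal names, realm, and keytab path are placeholders and will differ in your environment.

# Password login for a user principal (prompts for the user's password)
kinit alice@EXAMPLE.COM

# Inspect which principals a keytab file contains
klist -kt /etc/security/keytabs/hdfs.keytab

# Non-interactive login using a keytab, as a Hadoop process would
kinit -kt /etc/security/keytabs/hdfs.keytab hdfs/host01.example.com@EXAMPLE.COM

# Show the ticket cache to confirm a TGT was obtained
klist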

3.3 Identity Management


A secure Hadoop cluster requires that all users (including service users) have Unix accounts available to
every cluster host. These accounts are used to own the executing Hadoop processes and job tasks. For
example, the hdfs service account will own the NameNode and DataNode processes, and a given user
account will own the executing tasks for a submitted job.

Cloudera recommends that all the service accounts be created locally on each host across the cluster.
There is no additional step to create these accounts, as installation of Cloudera software will
automatically create them.

Cloudera recommends that all user accounts and groups be managed in Active Directory (or other
identity management system) and the cluster's Linux hosts be configured to cache these accounts using
integration software such as SSSD‡, Centrify, or QAS/VAS. This provides fast lookup for user and
group information that is critical for proper functioning of Hadoop Authorization.
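
As a sanity check, the following commands can be run on a cluster host to confirm that Active Directory accounts and groups are being resolved through the chosen integration layer. This is a sketch only; the username and group name below are placeholders.

# Confirm an AD user resolves to a Unix account with a uid and gid
id jdoe

# Confirm the account and its groups are visible through the name service switch
getent passwd jdoe
getent group hadoop-users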

 Note: Prior to a Cloudera Professional Services engagement the customer should have their desired
identity management solution and Linux identity integration in place.

‡[Link]
US/Red_Hat_Enterprise_Linux/6/html/Deployment_Guide/[Link]

3.4 Group Membership
By default Hadoop retrieves a user's group membership information from Linux, which in turn can use
local groups or groups defined in the central identity management system, as discussed above.

Hadoop can also be configured to retrieve group membership information directly from LDAP. However,
care must be taken when choosing this option: it incurs additional network traffic, may degrade cluster
performance, and in some cases can place significant additional load on the backend LDAP servers. For
these reasons it is not recommended.
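
For example, you can compare the group membership seen by the operating system with the membership seen by Hadoop; the two should agree when the default Linux-based group mapping is used. The username below is a placeholder.

# Groups as resolved by Linux (local or via SSSD/Centrify)
id -Gn jdoe

# Groups as resolved by Hadoop (asks the NameNode for its view of the user's groups)
hdfs groups jdoe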

4 Security Reference Architecture
This section describes Cloudera's Security Reference Architecture for configuring authentication,
authorization, and encryption in an Enterprise Data Hub (EDH). Not every customer will be able to
setup Hadoop security inline with these recommendations (e.g. due to individual security policies) so
areas where alternate approaches can be taken are highlighted.

4.1 Logical Architecture

4.2 Architecture Summary


The Security Reference Architecture focuses on four key elements:

 Perimeter security: guarding access to cluster services and only exposing specific interfaces to
external business users, applications, and networks (firewalls and authentication)
 Access control: controlling what data users and applications can access, and what compute
resources they can utilize (authorization and multitenancy)
 Data visibility: reporting on where data is and how it is being used (auditing and lineage)

 Data protection: protecting the visibility of sensitive data (encryption)

Firewall

The first line of defence for managing access to cluster services is a firewall. A correctly configured
firewall will secure the cluster perimeter and only allow access to specific services, such as Cloudera
Manager, Navigator, and the edge node interfaces. A full list of services and ports can be found in
Cloudera's online documentation: [Link]
us/documentation/enterprise/latest/topics/cm_ig_ports.html.

Authentication

Cloudera recommends integrating directly with Active Directory for cluster authentication. In this
architecture:

 All service and user principals are created in Active Directory


 All cluster hosts are configured to authenticate with Active Directory using [Link]
 Cloudera Manager will connect directly to Active Directory to create and manage the service
principals for the CDH services. To do this Cloudera Manager uses its own Active Directory
account that has the privileges to create other accounts within a given Organisational Unit (OU)
 All service and user principals are authenticated by Active Directory
 All cluster hosts cache user accounts and groups from Active Directory using SSSD (or other
Linux/AD integration tool)
 Active Directory is used to authenticate users with Cloudera Manager, Cloudera Navigator, Hue,
and Impala. Cloudera Manager, Navigator, and Hue also support SAML authentication for Single
Sign-On (SSO)

 Note: If it is not possible to create a dedicated Active Directory account for Cloudera Manager with the
required privileges, then the CDH service principals will need to be created manually. The
corresponding keytab files can then be securely stored on the Cloudera Manager host before being
imported into Cloudera Manager using a Custom Kerberos Keytab Retrieval Script.

Please refer to the appendix for example scripts for manually generating the CDH service principals
in Active Directory, and an example Custom Kerberos Keytab Retrieval Script.

 Note: If it is not possible to integrate directly with an Active Directory KDC for Kerberos
authentication then the alternative is to setup a local MIT KDC. Cloudera Manager will connect
directly to the local MIT KDC to create and manage the CDH service principals.

A cross-realm trust can then be established between the local MIT KDC and a central Active
Directory KDC for user authentication. If it is not possible to set up a cross-realm trust, then a
Hadoop administrator must create and manage all user accounts in the local MIT KDC.

[Link]
us/documentation/enterprise/latest/topics/cm_sg_kdc_def_domain_s2.html

[Link]

Authorization and multitenancy

 HDFS POSIX permissions and extended ACLs are used to secure directories and files
 Sentry is used to provide fine-grained access control to Hive/Impala/Solr objects
 Static service pools (cgroups) are used to divide up compute resources at the CDH service level
 YARN/Impala dynamic resource pools are used with ACLs to control user and group access to
compute resources
 Active Directory user accounts and groups being cached in Linux are used for authorization in
HDFS, Sentry, and YARN

Auditing and lineage

 Cloudera Navigator is used for auditing, lineage, and business metadata capabilities

Encryption

 If encrypting data at rest is a requirement, Cloudera recommends the use of HDFS-level encryption,
which can be applied only to the HDFS directories where it is needed
 Cloudera Navigator Key Trustee is used for enterprise-grade reliable key storage. The default
Java keystore should be used for test purposes only
 If network-level encryption is a requirement then this can be enabled for data sent from client
user interfaces, service-to-service communication (e.g. RPC), and Cloudera Manager Agent
communication. This protection uses industry-standard protocols such as SSL/TLS. Even when
network-level encryption is not a requirement for the data itself, Cloudera recommends that
certain interfaces be encrypted so that the end-user or the service account credentials are not
transmitted in clear text.

5 Authentication and Authorization Prerequisites
5.1 License Keys
License keys for any new components must be acquired by the customer from Cloudera. Encryption at
rest with Navigator Key Trustee Server and/or Cloudera Navigator requires an EDH license.

5.2 Root/sudo access


Additional packages will be installed from the OS repository. Root access or sudo access to root will be
needed to install additional packages.

5.3 Authentication Packages


The following Kerberos package and its dependencies will be installed on all hosts and the OpenLDAP
package and its dependencies will be installed on the Cloudera Manager Server host.

Packages to be installed, by OS:

RHEL/CentOS 5, 6, & 7:
  openldap-clients on the Cloudera Manager Server host
  krb5-workstation on ALL hosts

SLES:
  openldap2-client on the Cloudera Manager Server host
  krb5-client on ALL hosts

Ubuntu or Debian:
  ldap-utils on the Cloudera Manager Server host
  krb5-user on ALL hosts
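
For example, on a RHEL/CentOS cluster the packages can be installed as follows; this is a sketch only, and the equivalent zypper or apt-get commands with the package names above apply on SLES and Ubuntu/Debian.

# On ALL cluster hosts
sudo yum install -y krb5-workstation

# On the Cloudera Manager Server host only
sudo yum install -y openldap-clients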

5.4 Identity Management Integration


All cluster hosts should be integrated with the customer's internal Active Directory infrastructure using
SSSD, Centrify, Quest Authentication Services (QAS), or a similar tool. It is also recommended that a
mechanism be put in place to allow users to log in to edge nodes only; only admin users should be able
to log in to all cluster hosts.

 Note: If Centrify, QAS, or a similar tool is chosen, then the customer may have to purchase licenses
from the respective vendor. This should be planned in advance as it may take some time to procure
licenses and configure the tool.

 Note: Care should be taken to ensure that the identity management tool does not associate the
service principal names (SPNs) with the host principals when the hosts are joined to the Active
Directory domain. For example, Centrify by default associates the HTTP SPN with the host principal,
so the HTTP SPN should be excluded when the hosts are joined to the domain. More information
can be found here: [Link]

 Note: When using SSSD to integrate with AD, AD must support the POSIX schema, which is available
from Windows Server 2008 onwards. For older versions of AD, additional components to support the
POSIX schema may need to be installed. Unique IDs should be assigned to the uidNumber and
gidNumber LDAP attributes.

5.5 Active Directory OU and OU user


The customer should create a separate, dedicated OU in Active Directory for the cluster, and create an
account that has privileges to create additional accounts in that OU. More information on how to set up
the account for delegation can be found here, along with the corresponding blog:
[Link]
[Link]for-kerberos-authentication

Recommendations on AD OU structure and default groups can be found here:


[Link]

Refer to Appendix E for a sample script to verify the OU and the delegation account's privileges.

5.6 Enable SSL/TLS on Active Directory


The customer should enable SSL/TLS in Active Directory and Cloudera Manager should be able to
connect to Active Directory on the LDAPS port (TCP 636).
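
Connectivity to the LDAPS endpoint can be verified from the Cloudera Manager host ahead of the engagement; the AD hostname below is a placeholder.

# Should print the AD server's certificate chain and end with "Verify return code: 0 (ok)"
# when the signing CA is present in the local trust store
openssl s_client -connect ad01.example.com:636 -showcerts </dev/null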

5.7 Principals and Keytabs


When implementing the recommended direct-to-AD approach all required service principals and
keytabs are created and managed by Cloudera Manager. If the direct-to-AD approach cannot be used
then the customer should manually create the service principals in Active Directory and provide
associated keytab files.

The accounts will have the User Principal Name (UPN) set to service/fqdn@REALM . The account
will also have the Service Principal Name (SPN) set to service/fqdn . The principal name in the
keytab file will be the UPN of the account. The keytab files will have the naming convention
servicename_fqdn.keytab .

The following principals need to be created for each host they run on:

accumulo/fqdn@REALM
flume/fqdn@REALM
hbase/fqdn@REALM
hdfs/fqdn@REALM
hive/fqdn@REALM
HTTP/fqdn@REALM
httpfs/fqdn@REALM
hue/fqdn@REALM
impala/fqdn@REALM
kms/fqdn@REALM
mapred/fqdn@REALM
oozie/fqdn@REALM
sentry/fqdn@REALM
solr/fqdn@REALM
spark/fqdn@REALM
webhcat/fqdn@REALM
yarn/fqdn@REALM
zookeeper/fqdn@REALM

5.8 Active Directory Bind Account


The customer should create an Active Directory account that can be used for LDAP binding in Cloudera
Manager, Navigator, Hue and Impala for LDAP(s) authentication.

5.9 Active Directory Groups for Privileged Users


Groups should be created in Active Directory and corresponding members should be added for
authorized users, HDFS admins, and HDFS superuser groups.

The purpose of these groups:


 Authorized users: a group consisting of all users that need access to the cluster
 HDFS Admins: group of users that need to run HDFS administrative commands
 HDFS superusers: group of users that have superuser privilege (r/w access to all data and
directories in HDFS)

It is not recommended to put regular users into the HDFS superuser group. Instead, an account that
administrators escalate to should be part of the HDFS superuser group.

5.10 Active Directory Groups for Cloudera Manager and Navigator Roles
Groups should be created in Active Directory and corresponding members should be added for the roles
that will be used for role-based access in Cloudera Manager and Navigator.

The Cloudera Manager roles and their definitions are available here:
 [Link]r_roles.html

Cloudera Navigator roles and their definitions are available here:
 [Link]

5.11 Sentry and Navigator Databases


Create a separate database for storing Sentry and/or Navigator metadata. Admin credentials for these
databases will be requested during Sentry and/or Navigator installation.

For an up-to-date list of supported databases visit:


[Link]
[Link]?scroll=topic_2_unique_5

For an up-to-date list of supported databases for Navigator visit:


[Link]
pt_oqr_j4k_np_unique_1

Sentry high availability (HA) has been available since CDH 5.13. Sentry HA provides automatic failover
in the event that your primary Sentry host goes down or is unavailable; as of CDH 5.13.0 you can have
two Sentry hosts. For information about how to configure high availability for Sentry, see Sentry High
Availability. Sentry also supports Hive Metastore high availability.

5.12 Active Directory Test Users and Groups


At least one Active Directory user and group should be provided to test HDFS permissions, Sentry, and
YARN resource pools.

5.13 Multiple Realms/Active Directory Domains


If users from multiple Kerberos realms will be using the cluster, then at least a one-way trust is needed
so that users from external domains are trusted in the realm in which the cluster is secured.

5.14 SAML configuration


If SAML authentication is required then the following steps must be completed:

1. Generate a private key and certificate pair for signing/encrypting SAML data

2. Retrieve the SAML metadata XML file from your IDP. This file must contain the public
certificates needed to verify the sign/encrypt key used by the IDP per the SAML Metadata
Interoperability Profile
3. Get the entity ID that should be used to identify the Cloudera Manager/Navigator/Hue instance
4. Know how the user ID is passed in the SAML authentication response:
a. As an attribute. If so, what identifier is used?
b. As the NameID
5. Know the method by which the Cloudera Manager/Navigator/Hue role will be established:
a. From an attribute in the authentication response?
i. What identifier will be used for the attribute?
ii. What values will be passed to indicate each role?
b. From an external script that will be called for each request (a minimal sketch follows the exit codes below)
i. The script is passed the user ID as $1
ii. The script sets an exit code to reflect the assigned role (see below)
iii. A negative value is returned for a failure to authenticate

5.14.1 Script exit codes:


 0 - Administrator
 1 - Read-only
 2 - Limited operator
 3 - Operator
 4 - Configurator
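
For reference, a minimal sketch of such a role-assignment script is shown below. The group names are hypothetical, and the mapping via cached AD group membership is only one possible approach; the customer would substitute whatever logic they use to determine roles.

#!/bin/bash
# The user ID is passed as $1; the exit code communicates the assigned role (see the codes above).

USER_ID="$1"

# Hypothetical mapping based on group membership cached on this host
if id -Gn "$USER_ID" 2>/dev/null | grep -qw "cm-admins"; then
  exit 0    # Administrator
elif id -Gn "$USER_ID" 2>/dev/null | grep -qw "cm-operators"; then
  exit 3    # Operator
elif id "$USER_ID" >/dev/null 2>&1; then
  exit 1    # Read-only for any other known user
else
  exit 255  # Failure to authenticate (the documented negative value wraps to 255 in the shell)
fi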

5.14.2 Additional SAML packages for Hue


For SAML authentication with Hue these additional packages will need to be installed on the Hue host:
 openssl and swig from the package repository
 xmlsec1 package from [Link]
 Python packages for pysaml2 and djangosaml2 from Github

5.15 TLS/SSL Certificates


Communication between Cloudera Manager and Cloudera Agents can be encrypted so that sensitive
information such as Kerberos keytab files are not communicated from Cloudera Manager to the Agents
in clear text.

There are different levels of protection available (levels 1 to 3). Level 3 (the highest level) requires each
Cloudera Agent to have its own certificate. If this level of protection is required then the customer
should provide certificates for each node in the cluster.

At a minimum, certificates are needed for Cloudera Manager and the edge nodes to secure the web
interfaces. Either the CN or the subjectAltName of the certificate should match the valid fully
qualified DNS hostname of the host the certificate belongs to. If one or more subjectAltName entries
are present in the certificate, then at least one subjectAltName should be the same as the CN in the
Subject of the certificate. If HAProxy or a load balancer will be used for accessing cluster services such
as Impala, Solr, etc., then the DNS name of the hosts where HAProxy is running, or the load balancer
DNS name (VIP), should also be specified in the subjectAltName of all the certificates that will be part
of the load-balanced services. As a best practice, Cloudera recommends that the certificates have 2048-bit
RSA keys and be signed with the SHA-256 hashing algorithm. The certificates should either have no
restrictions or have these properties set:
keyUsage = digitalSignature, keyEncipherment
extendedKeyUsage = serverAuth, clientAuth

If the CA that signs these certificates is internal, then the customer will also need to provide the
certificate chain of that CA. It should be noted that for CAs that sign certificates with SHA-1, browsers
may reject the TLS certificates until SHA-2 signing is available from the CA. When creating certificates
for a cluster, the same password must be used for all hosts.

The same certificates can also be used to encrypt Cloudera Manager, Hue, Hadoop web interfaces,
HiveServer2, Impala JDBC/ODBC, and YARN encrypted shuffle. A node does not need more than one
cert.
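
The relevant certificate attributes can be checked with openssl before the certificates are handed over; the file name below is a placeholder.

# Display the subject (CN), SAN entries, key usage, and signature algorithm of a certificate
openssl x509 -in host01.example.com.pem -noout -text | \
  grep -A1 -E "Subject:|Subject Alternative Name|Key Usage|Signature Algorithm"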

 Note: If any external services such as LDAPS or SAML use certificates from an internal CA, then the
public certificate of the Root CA and any Intermediate CA in the chain should be provided.
Intermediate CAs will only be trusted if the corresponding server does not provide full certificate
chain in the TLS/SSL connection.

5.16 Firewall
If perimeter security is in place and there is a firewall in between the cluster and Active Directory, then
the following ports should be opened to all cluster hosts.

Component   Service              Qualifier   Port   Protocol   Access Requirement
Kerberos    KRB5 KDC Server      Secure      88     UDP/TCP    External
Kerberos    KRB5 Admin Server    Secure      749    TCP        Internal
LDAP        LDAP                 non-ssl     389    TCP        External
LDAP        LDAP SSL             Secure      636    TCP        External
LDAP        AD Global Catalog                3268   TCP        External
LDAP        AD Global Catalog    Secure      3269   TCP        External
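
Reachability of these ports from a cluster host can be spot-checked before the engagement; a minimal sketch using bash's /dev/tcp redirection is shown below, with a placeholder AD hostname.

# Each command returns success only when the port is reachable from this host
timeout 3 bash -c '</dev/tcp/ad01.example.com/88' && echo "KDC port 88 reachable"
timeout 3 bash -c '</dev/tcp/ad01.example.com/636' && echo "LDAPS port 636 reachable"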

5.17 Windows Registry Key Setup


If Windows clients require access to HiveServer2 and Impala via JDBC then the following registry key
should be added to the Windows clients:

Windows XP:
Add a new DWORD_VALUE called “allowtgtsessionkey”, and assign it a value of 1.

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Lsa\Kerberos\
REG_DWORD name: allowtgtsessionkey
Value: 1

Windows 2000, Windows Vista and above:


Add a new DWORD_VALUE called “allowtgtsessionkey”, and assign it a value of 1.

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Lsa\Kerberos\Parameters
REG_DWORD name: allowtgtsessionkey
Value: 1

5.18 Admin/Root Access for Installing ODBC Drivers


Clients that need access to HiveServer2 or Impala via ODBC will need admin/root privileges on the
client machine to install the ODBC drivers and to configure DSNs.

5.19 Cloud Considerations


While the options to kerberize the cluster remain the same for cloud deployments (as do all the caveats
mentioned in those sections), there are some additional considerations for approaches that integrate
with Active Directory.

5.19.1 Considerations for the direct-to-AD approach

5.19.1.1 Kerberos
In this deployment all the CDH services talk to AD domain controller(s) for authentication. When an
end user has to authenticate, the request is sent to AD as well. So latency from the cluster to AD is a
concern for the stability of the cluster services as well as user experience.

5.19.1.2 LDAP
Since various components in CDH have the ability to use LDAP authentication, the latency of LDAP
requests to AD also matters when LDAP authentication is configured. Cloudera Manager, Hue, Hive,
and Impala are the components that can use LDAP authentication.

5.19.1.3 User/Group information on Linux


Tools like SSSD and Centrify also need direct access to the AD environment. While each of these tools
can cache user/group information on the cluster nodes, they can run into issues when the latency from
cloud nodes to the AD infrastructure is high.

5.19.2 AD domain controller placement


As discussed in the previous sections, the direct-to-AD approach is the preferred way to provision
Kerberos for on-premises environments. For cloud deployments, the direct-to-AD approach is only
recommended if it is possible to deploy AD domain controllers in the cloud. You can either deploy a new
cloud-specific domain ([Link]) and kerberize the cluster with that domain, or deploy domain
controllers that serve the existing domains ([NA,EU].[Link]) in the cloud environment. If the AD
team does not want the domain controllers in the cloud to serve the existing domains, then the
cloud-specific domain approach is more desirable. A one-way trust can be deployed between the cloud
domain and the existing domains if a two-way trust is undesirable.

6 Encryption Prerequisites
6.1 Navigator Key Trustee
6.1.1 Key Trustee Server minimum requirements
The customer should allocate two servers to host the Key Trustee Server that meet the minimum system
requirements. The latest minimum requirements for Key Trustee Server can be found at
[Link]

 Note: The minimum storage requirements cannot be allocated to one partition/volume. KTS
installation needs space in /usr, /opt and /var/lib, /var/log and /var/run

6.1.2 Navigator Key Trustee parcel download


The customer should acquire the Navigator Key Trustee repository parcels by downloading them from
the downloads section of the Cloudera website at [Link].

6.1.3 Package dependencies


The customer should be aware of the required package dependencies (covered in the Navigator Key
Trustee Installation and Maintenance Guide).

6.1.4 Backup and Failure Options


The customer should be aware of the available backup and fail-over options for Key Trustee Server.

6.1.5 Firewall
The network access required through the firewall is described here:
[Link]kmm_3q_unique_1__section_qjc_tgb_kq_unique_1

The software is downloaded from Cloudera's repository ([Link]), so the customer
must be able to access this URL. Optionally, if email is configured, outbound TCP traffic over port 25
(SMTP) will be required.

6.1.6 TLS
If the customer would like to protect Navigator Key Trustee traffic with TLS, the following criteria
should be met:

 Customer should provide a certificate, which has been properly signed by a trusted CA
 Customer should provide the private key used to generate the Certificate Signing Request (CSR) for
the above certificate
 Customer should provide the chain file for the CA that signed the above certificate
 The DNS hostname of the Key Trustee Server should either match the Common Name (CN) of
the above certificate or match one of the hostnames in subjectAltName. If one or more
subjectAltName entries are present in the certificate then at least one subjectAltName should be
the same as the CN in the Subject of the certificate.
 The 'SSL Verification Commands' below have been run and returned successfully

6.1.7 TLS verification


If the customer would like to encrypt Key Trustee Server traffic with TLS, they should provide a
CA-signed PEM certificate. This certificate should match the DNS hostname of the Key Trustee machine
and must be signed by a trusted authority (or CA).

The following command should complete successfully without error:


openssl verify -CAfile [Link] [Link]

6.2 HDFS Encryption


6.2.1 Key Management Server (KMS) minimum requirements
The customer should allocate a pair of servers (a primary and a failover; VMs preferred) to host the KMS
servers, meeting the minimum system requirements. The hardware requirements for the KMS servers are
the same as for the KTS servers, except that the KMS servers need an additional 10 GB of space in the
/opt volume, as CDH packages are also installed on the KMS servers.
 Note: The minimum storage requirements cannot be allocated to one partition/volume. KTS
installation needs space in /usr, /opt and /var/lib, /var/log and /var/run

6.3 Navigator Encrypt
Please note that Navigator Encrypt is currently not supported on the Azure environment.

6.3.1 Navigator encrypt minimum requirements


The minimum requirements for Navigator Encrypt can be found at
[Link]

6.3.2 Downtime Required


HDFS data will not be accessible during the encryption process; please submit any necessary downtime
requests.

6.3.3 Navigator Encrypt binaries download


The customer should acquire the Navigator Encrypt repository packages by downloading them from
the downloads section of the Cloudera website at [Link]

6.3.4 Package Dependencies


The customer should be aware of the required package dependencies explained at
[Link]

6.3.5 Firewall
The customer will need to allow TCP traffic to the Navigator Key Trustee Server, as explained in the
network requirements section at
[Link]

6.3.6 TLS
If a self-signed (or otherwise non-CA-signed) certificate is being used on the Key Trustee Server, the
Navigator Encrypt hosts should be updated to trust that certificate.

6.4 Entropy Requirements

Cryptographic operations require entropy to ensure randomness.

You can check the available entropy on a Linux system by running the following command:

$ cat /proc/sys/kernel/random/entropy_avail

The output displays the entropy currently available. Check the entropy several times to determine the
state of the entropy pool on the system. If the entropy is consistently low (500 or less), you must
increase it by installing rng-tools and starting the rngd service. Run the following commands on RHEL
6-compatible systems:
$ sudo yum install rng-tools
$ echo 'EXTRAOPTIONS="-r /dev/urandom"' | sudo tee -a /etc/sysconfig/rngd
$ sudo service rngd start
$ sudo chkconfig rngd on

For RHEL 7, run the following commands:


$ cp /usr/lib/systemd/system/[Link] /etc/systemd/system/
$ sed -i -e 's/ExecStart=\/sbin\/rngd -f/ExecStart=\/sbin\/rngd -f -r \/dev\/urandom/' /etc/systemd/system/[Link]
$ systemctl daemon-reload
$ systemctl start rngd
$ systemctl enable rngd

Make sure that the hosts running Key Trustee Server, Key Trustee KMS, and Navigator Encrypt have
sufficient entropy to perform cryptographic operations.

7 Security Prerequisites Checklist
For detailed explanations of each task, see the table following the checklist.

Task                                                                    Completed Date    Completed By

Authentication and Authorization

☐ Opened required firewall ports
☐ Acquired license keys from Cloudera
☐ Setup root/sudo access on the cluster hosts
☐ Installed required authentication packages on all cluster hosts
☐ Installed and configured chosen Linux/AD integration tool (e.g. SSSD) on all cluster hosts
☐ Created Active Directory OU and OU user account for Cloudera Manager
☐ Enabled SSL/TLS on Active Directory
☐ [Exception] Created CDH service principals and keytab files (only required if not using the recommended direct-to-AD architecture)
☐ Created Active Directory Bind Account
☐ Created Active Directory groups for privileged users
☐ Created Active Directory groups for Cloudera Manager and Navigator roles
☐ Created databases for Cloudera Navigator and Sentry
☐ Created Active Directory test users and groups
☐ Configured cross-realm trusts for multi-realm authentication
☐ [SAML only] Created certificates and gathered required information from IDP for SAML authentication
☐ Generated SSL/TLS certificates
☐ [Windows clients only] Created Windows registry keys
☐ Installed Hive and/or Impala ODBC drivers on client machines

Encryption

☐ Completed Cloudera Navigator Key Trustee Server prerequisites (covered in the Installation and Maintenance Guide)
☐ Provisioned Key Management Server (KMS) hosts for HDFS-level encryption
☐ [Navigator Encrypt only] Installed required Encrypt packages
☐ Opened firewall ports to Navigator Key Trustee Server
☐ Sufficient entropy validated on KMS and KT hosts

7.1 Item Description

Opened required firewall ports: Internally for the cluster there should be no firewalls. Cloudera recommends completely disabling iptables/firewalld on all nodes.

Acquired license keys from Cloudera: Requires a proper EDH license key from Cloudera; ask your account executive if you don't have one.

Setup root/sudo access on the cluster hosts: Ideally root access with passwordless SSH on all nodes. If company policy does not allow root, then an alternate login with sudo capability is required. Passwordless SSH should be enabled for the alternate sudo account.

Installed required authentication packages on all cluster hosts: The openldap-clients and krb5-workstation packages should be installed on all cluster nodes.

Installed and configured chosen Linux/AD integration tool (e.g. SSSD) on all cluster hosts: Existing AD users and their groups need to be enabled on cluster nodes. A Linux user's shell group mapping is used in HDFS. Normally a subset of users (Hadoop users) should be enabled on cluster nodes. There are commercial tools like Centrify, Quest/VAS, and others that can seamlessly enable AD users on Linux machines. The open-source SSSD can be used but requires proper configuration by an SSSD admin.

Created Active Directory OU and OU user account for Cloudera Manager: The EDH Kerberos wizard can use AD's built-in Kerberos server. To do this it requires an account that can add/delete/modify other principals in AD's LDAP structure. Normally, an OU is created in AD for Hadoop Kerberos principals; within the OU the account can add/delete/modify principals. The EDH Kerberos wizard allows for a prefix to identify all Hadoop cluster Kerberos principals added within the OU.

Enabled SSL/TLS on Active Directory: AD's LDAP server connection is normally secured with SSL/TLS enabled. If it is not, ask the AD admin. The LDAPS protocol uses port 636; if this has been changed, note the new port number.

[Exception] Created CDH service principals and keytab files (only required if not using the recommended direct-to-AD architecture): When using a user-installed KDC server, the EDH Kerberos wizard can create the cluster's principals. However, if other principals are required (e.g. for tools like Tableau), then create the principal appropriately and obtain the keytab for the principal.

Created Active Directory Bind Account: To connect to AD's LDAP, a proper bind account is required that can access and read from LDAP.

Created Active Directory groups for privileged users: In AD's LDAP, a group that has admin rights in the CDH cluster should be created.

Created databases for Cloudera Navigator and Sentry: Cloudera recommends usage of an external database like Postgres, MySQL, or Oracle to house data for Navigator and Sentry. Empty database instances and grants must be provisioned prior to configuration in CDH. Please consult Cloudera's online docs for deploying and configuring external databases. Note: the Navigator database requires additional space.

Created Active Directory test users and groups: It is recommended that at least 3 test users and groups be created in AD to start testing role-based privileges, to include both read and write access.

Configured cross-realm trusts for multi-realm authentication: A realm constitutes the boundary for a Kerberos cluster. Sometimes enterprises have separate realms for computers versus users. When multiple realms are involved, a trust needs to be established. For example, if computers are in realm B and users in realm A, a trust is required for realm B to contact realm A users.

[SAML only] Created certificates and gathered required information from IDP for SAML authentication: The following properties must be gathered: Path to SAML IDP Metadata File (points to the IDP metadata file), Path to SAML Keystore File (points to the Java keystore), SAML Keystore Password, Alias of SAML Sign/Encrypt Private Key, SAML Sign/Encrypt Private Key Password, SAML Entity ID, Source of user ID in SAML response, and SAML Role assignment mechanism.

Generated SSL/TLS certificates: Certificates should be CA-signed and issued per node (the CN should contain the FQDN of the node). For level 3 TLS, both web server and web client usages should be present in the certificate. A strong hashing algorithm like SHA-256 is recommended.

[Windows clients only] Created Windows registry keys: Consult the Microsoft Windows registry docs.

Installed Hive and/or Impala ODBC drivers on client machines: If there are external applications on client machines that need access to Hive/Impala, then Cloudera recommends that a proper JDBC/ODBC driver be installed on the client machine. Cloudera has the latest driver available for download.

Encryption

Completed Cloudera Navigator Key Trustee Server prerequisites (covered in the Installation and Maintenance Guide): Cloudera recommends the usage of Navigator Key Trustee Server (KTS) for production clusters. HA is only supported with Navigator KTS. Additionally, support for external HSM keystores can be integrated only via Navigator KTS.

Provisioned Key Management Server (KMS) hosts for HDFS-level encryption: Cloudera recommends separate nodes for KMS in the cluster.

[Navigator Encrypt only] Installed required Encrypt packages: Navigator Encrypt requires these packages:
 dkms
 keyutils
 ecryptfs-utils
 libkeytrustee
 navencrypt-kernel-module
 openssl
 lsof
 gcc, gcc3, and gcc4
 cryptsetup

Opened firewall ports to Navigator Key Trustee Server: KTS requires ports 11371 and 11381 to be opened.

Sufficient entropy validated on KMS and KT hosts: Entropy on the system ensures good randomized key generation. Check whether this number is high (around 3000; cat /proc/sys/kernel/random/entropy_avail). If not, you have to install rng-tools.

Appendix A: Example MIT KDC Configuration

# /var/kerberos/krb5kdc/[Link]

[kdcdefaults]
kdc_ports = 88
kdc_tcp_ports = 88

[realms]
[Link] = {
master_key_type = aes256-cts
acl_file = /var/kerberos/krb5kdc/[Link]
dict_file = /usr/share/dict/words
admin_keytab = /var/kerberos/krb5kdc/[Link]
supported_enctypes = aes256-cts:normal aes128-cts:normal
max_life = 1d 0h 0m 0s
max_renewable_life = 7d 0h 0m 0s
}

 Note: When integrating with Microsoft Active Directory (AD) the AES encryption types are not
supported until Windows Server 2008. For older versions include arcfour-hmac in
supported_enctypes .

Weak encryption types should not be supported, e.g. des* and arcfour-hmac-exp

Appendix B: Example Kerberos Client Configuration

# /etc/[Link]

[libdefaults]
default_realm = [Link]
ticket_lifetime = 1d 0h 0m 0s
renew_lifetime = 7d 0h 0m 0s
forwardable = true
udp_preference_limit = 1
kdc_timeout = 3000

[realms]
[Link] = {
kdc = [Link]
admin_server = [Link]
default_domain = [Link]
}

[domain_realm]
.[Link] = [Link]
[Link] = [Link]

[logging]
default = FILE:/var/log/[Link]
kdc = FILE:/var/log/[Link]
admin_server = FILE:/var/log/[Link]

Appendix C: Example Custom Kerberos Keytab Retrieval Script
The script should take two arguments: a destination file to write the keytab to, and the full principal
name to retrieve the key for.

#!/bin/bash

# Custom Kerberos Keytab Retrieval Script


# This script is called by Cloudera Manager Server. It takes two arguments:
# A destination file to write the key to, and
# The full principal name to retrieve the keytab for

# CM will input /tmp for DEST


DEST=$1

# CM will input principal name in format <service>/<fqdn>@REALM


PRINC=$2

# BASEDIR is the location on the Cloudera Manager Server host where the keytabs are stored
# BASEDIR=/etc/cloudera-scm-server/service_keytabs
BASEDIR=/root/cdh_keytabs

# Parse PRINC to determine keytab filename


SERV=${PRINC%%/*}
NORM=${PRINC%%@*}
FQDN=${NORM#*/}

# Keytab filenames should be in format <service>_<fqdn>.keytab


FILE=${BASEDIR}/${SERV}_${FQDN}.keytab
echo ${FILE}

if [ ! -e ${FILE} ] ; then
  # Keytab not found
  echo "Keytab not found: ${FILE}"
  echo "Keytab not found: ${FILE}" > ${BASEDIR}/[Link]
  exit 1
fi

cp ${FILE} ${DEST}
exit 0
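
A quick manual test of the script helps confirm the keytab naming convention before it is configured in Cloudera Manager. The script name, destination path, and principal below are placeholders.

# Should copy /root/cdh_keytabs/hdfs_host01.example.com.keytab to /tmp/hdfs.keytab
./retrieve_keytab.sh /tmp/hdfs.keytab hdfs/host01.example.com@EXAMPLE.COM

# Verify the retrieved keytab
klist -kt /tmp/hdfs.keytab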

Appendix D: Example Scripts for Creating Principals and Keytab
Files in Active Directory
Windows
The following bash script generates the appropriate DOS commands for creating principals and
extracting keytab files from Active Directory.

#!/bin/bash
set -x

#hbase/[Link]@[Link]
#hdfs/[Link]@[Link]

while read a; do
  user=${a%%/*}
  host=${a%@*}
  host=${host#*/}
  shost=${host%%.*}
  uid="${user}_$shost"
  realm=${a##*@}

  if (( ${#uid} > 19 )); then

    new_shost=`echo $shost | sed 's|-||g' | sed 's|centos|cent|g' | sed 's|cento|cent|g'`

    if [ "$user" = "zookeeper" ]; then
      new_uid="zk_$new_shost"
    elif [ "$user" = "impala" ]; then
      new_uid="imp_$new_shost"
    elif [ "$user" = "mapred" ]; then
      new_uid="MR_$new_shost"
    elif [ "$user" = "hbase" ]; then
      new_uid="hb_$new_shost"
    else
      new_uid="${user}_$new_shost"
    fi

    if (( ${#new_uid} > 19 )); then
      echo name too long \"$new_uid\"
      exit 1
    else
      uid="$new_uid"
    fi
  fi

  echo "dsadd user \"CN=$uid,CN=Users,dc=qa,dc=cloudera,dc=com\" -upn \"$user/$host\" -fn \"$user\" -ln \"$shost\" -display \"${uid}\" -pwd Cloud3r@457"

  echo "ktpass /out ${user}_$[Link] /crypto All /princ $a /mapuser ${uid} /ptype KRB5_NT_PRINCIPAL /mapop add /pass Cloud3r@457"

done

#dsadd user "CN=hbase_gs-hue-50-2,CN=Users,dc=qa,dc=cloudera,dc=com" -upn "hbase/[Link]" -fn "hbase" -ln "gs-hue-50-2" -display "hbase_gs-hue-50-2" -pwd Cloud3r@457

#ktpass /out hbase_gs-[Link] /crypto All /princ hbase/gs-hue-50-[Link]@[Link] /mapuser hbase_gs-hue-50-2 /ptype KRB5_NT_PRINCIPAL /mapop add /pass Cloud3r@457

This script is run against a list of principals stored in a file. If you have previously set up Hadoop security
with a local MIT KDC, then the list of principals can be obtained from Cloudera Manager under
Administration > Kerberos.
The resulting DOS commands should be saved to a file and copied to the Active Directory host where
they can be executed.

After the DOS commands have been run, the extracted keytab files can be copied to the Cloudera
Manager host and the Custom Kerberos Keytab Retrieval Script configured in Cloudera Manager. See
Appendix C: Example Custom Kerberos Keytab Retrieval Script.

Appendix E: Test for Active Directory OU and CM Principal Privilege
Before Enabling Kerberos
The following script creates a test account within the Active Directory OU provisioned for kerberizing
the cluster:

#!/bin/bash
set -x
LDAPTLS_CACERT=</path/to/cacert/for/tls> ldapmodify -H ldaps://<[Link]:port> \
  -x -D "serviceaccount" -w "Password" <<EOF
dn: CN=test,OU=CDH,OU=ENV,OU=Hadoop,DC=region,DC=company,DC=com
changetype: add
objectClass: top
objectClass: person
objectClass: organizationalPerson
objectClass: user
sAMAccountName: <test>
accountExpires: 0
unicodePwd:: <IgBBAG4ARQB4AGEAbQBwAGwAZQBQAGEAcwBzAHcAbwByAGQAMQAhACIA>
EOF

Execute the script after substituting appropriate values for the highlighted portions.

If the script throws an exception, please debug to ensure that the values are correct and the service
account has delegation privileges on the OU provisioned.

If the script completes successfully, you can query for the account using ldapsearch (as shown below) or
look it up in Active Directory to ensure that it was successfully created.
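
A minimal sketch of such a query, using the same placeholder bind account, CA certificate, and OU as the scripts in this appendix:

LDAPTLS_CACERT=</path/to/cacert/for/tls> ldapsearch -H ldaps://<[Link]:port> -x \
  -D "serviceaccount" -w "Password" \
  -b "OU=CDH,OU=ENV,OU=Hadoop,DC=region,DC=company,DC=com" "(cn=test)" dn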

Upon successful verification, you can delete the test account created using the script below:

#!/bin/bash
set -x
LDAPTLS_CACERT=</path/to/cacert/for/tls> ldapmodify -H ldaps://<[Link]:port> \
  -x -D "serviceaccount" -w "Password" <<EOF
dn: CN=test,OU=CDH,OU=ENV,OU=Hadoop,DC=region,DC=company,DC=com
changetype: delete
EOF

