Why is Hadoop cluster security required?


In early versions of Hadoop, access restrictions were designed to prevent accidental data loss rather than unauthorized access to data. The file permission system in HDFS prevents one user from accidentally wiping out the whole file system through a program, but it does not prevent a malicious user from assuming root’s identity to access or delete any data in the cluster.

HDFS file permissions provide only a mechanism for authorization. Authorization alone is not enough for security, because without authentication a user's claimed identity is never verified and the system is still open to abuse.

In 2009, the first Hadoop authentication mechanism was implemented at Yahoo!. In this design, Hadoop itself does not manage user authentication; instead, it relies on Kerberos. Kerberos is a mature open-source network authentication protocol. Kerberos does not manage user permissions; it only performs user authentication. It is Hadoop's job to determine whether an authenticated user has permission to perform a given action.

Kerberos principal (user) components

In Kerberos, a user is called a principal, which is made up of three components: the primary, the instance, and the realm.

The first component, the primary, is a string. It may be the operating system username of a user or the name of a service.

The second component, the instance, is optional and follows the primary. It may define a user role or the host name on which the service is running. The instance is separated from the primary by a slash (/).

The third component, the realm, is similar to a domain in DNS. A realm in Kerberos defines a group of principals.

The examples below show Kerberos principals:

hadoopuser@HADOOP.CLOUDERA.COM: a standard user principal, user hadoopuser in realm HADOOP.CLOUDERA.COM.
hadoopuser/admin@HADOOP.CLOUDERA.COM: user hadoopuser with the admin role in realm HADOOP.CLOUDERA.COM.
hdfs/hadoop01.mydomain.com@HADOOP.CLOUDERA.COM: the hdfs service on host hadoop01.mydomain.com in realm HADOOP.CLOUDERA.COM.
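To make the principal format concrete, here is a minimal Python sketch that splits a principal string into its primary, instance, and realm. The names Principal and parse_principal are hypothetical and are not part of Hadoop or any Kerberos library. The sketch also assumes the simple primary[/instance]@REALM form used in the examples and ignores escaped characters, which real Kerberos implementations handle.

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    """Hypothetical container for the three principal components."""
    primary: str
    instance: Optional[str]  # None when the optional instance is absent
    realm: str


def parse_principal(name: str) -> Principal:
    """Split 'primary[/instance]@REALM' into its components.

    Simplifying assumption: no escaped '/' or '@' characters appear.
    """
    # The realm is everything after the last '@'.
    local, at, realm = name.rpartition("@")
    if not at or not local or not realm:
        raise ValueError(f"expected primary[/instance]@REALM, got {name!r}")

    # The optional instance follows the first '/' in the local part.
    primary, slash, instance = local.partition("/")
    if not primary or (slash and not instance):
        raise ValueError(f"malformed primary or instance in {name!r}")

    return Principal(primary, instance if slash else None, realm)


if __name__ == "__main__":
    for example in (
        "hadoopuser@HADOOP.CLOUDERA.COM",
        "hadoopuser/admin@HADOOP.CLOUDERA.COM",
        "hdfs/hadoop01.mydomain.com@HADOOP.CLOUDERA.COM",
    ):
        print(parse_principal(example))
```

Returning None for a missing instance keeps the distinction between a plain user principal and one that carries a role or host name explicit.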
The following diagram shows the three-step Kerberos ticket exchange protocol.