CDOSS Certificate
BIG DATA ADMIN MASTERY
Profiles suited to prepare for this certification: Data Engineer, Big Data Consultant.
Knowledge required to pass this certification:
- Kerberos Fundamentals
- KDC Setup: krb5-kdc, kdb5_util create -s, realm configuration (/etc/krb5.conf)
- Admin Tools: kadmin.local, addprinc, ACL management (kadm5.acl)
- HA KDC: Database replication (kdb5_util dump/load), multi-KDC config
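The realm configuration step above can be sketched as a small shell helper. This is a minimal illustration, not a complete KDC setup: the function name write_krb5_conf, the realm EXAMPLE.COM, and the host kdc1.example.com are placeholders, and the file is written to a temporary location instead of /etc/krb5.conf.

```shell
# Write a minimal realm configuration to the given file.
# Usage: write_krb5_conf REALM KDC_HOST OUTFILE   (hypothetical helper)
write_krb5_conf() {
    cat > "$3" <<EOF
[libdefaults]
    default_realm = $1

[realms]
    $1 = {
        kdc = $2
        admin_server = $2
    }
EOF
}

# Demo with placeholder values, written to a temp file.
KRB5_DEMO=$(mktemp)
write_krb5_conf EXAMPLE.COM kdc1.example.com "$KRB5_DEMO"
cat "$KRB5_DEMO"

# On a real KDC host the remaining steps would be (not executed here):
#   kdb5_util create -s                      # create the database and stash file
#   kadmin.local -q "addprinc admin/admin"   # create an admin principal
#   kdb5_util dump /path/to/dumpfile         # on the primary, for replication
#   kdb5_util load /path/to/dumpfile         # on the replica
```

For a multi-KDC setup, additional kdc = lines can be listed inside the realm block so that clients fall back to a replica.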
- Time Synchronization
- Chrony/NTP: Server/client configurations (allow subnet directives), iburst for rapid sync
- Time Drift Impact: Kerberos authentication fails when clock skew exceeds the allowed tolerance (5 minutes by default)
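As a rough sketch of the two ideas above, the helpers below emit the chrony directives for a time server and check whether two clocks are within the Kerberos tolerance. The function names, the upstream host, and the subnet are illustrative assumptions; 300 seconds is the MIT Kerberos default clock skew.

```shell
# Emit chrony.conf lines for a server that syncs upstream and serves a subnet.
# Usage: chrony_server_conf UPSTREAM_HOST CLIENT_SUBNET   (hypothetical helper)
chrony_server_conf() {
    printf 'server %s iburst\n' "$1"   # iburst: burst of requests for a fast first sync
    printf 'allow %s\n' "$2"           # permit NTP clients from this subnet
}

# Succeed when two epoch timestamps differ by no more than the allowed skew.
# Usage: skew_ok KDC_EPOCH LOCAL_EPOCH [MAX_SECONDS]   (default 300)
skew_ok() {
    skew_diff=$(( $1 - $2 ))
    [ "$skew_diff" -lt 0 ] && skew_diff=$(( 0 - skew_diff ))
    [ "$skew_diff" -le "${3:-300}" ]
}

# Demo with placeholder values.
chrony_server_conf pool.example.org 192.168.10.0/24
if skew_ok 1700000000 1700000420; then
    echo "clock skew within tolerance"
else
    echo "clock skew too large for Kerberos"
fi
```

On client hosts only the server line is needed, pointing at the internal time server.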
- Cloudera Manager Security
- Kerberos Enablement:
- Wizard steps (realm, KDC server, encryption types: aes128-cts, arcfour-hmac)
- Principal authentication (admin/admin)
- Service Restarts: Cluster relaunch post-Kerberization
- HDFS Security & HA
- NameNode HA:
- Failover testing (kill active NN → verify standby takeover via master:9870)
- Kerberized Operations:
- Directory creation: sudo -u hdfs kinit -kt hdfs.keytab → hdfs dfs -mkdir
- Ownership: hdfs dfs -chown user:group /path
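The Kerberized directory workflow and the failover check can be sketched as follows. This is a dry-run illustration: commands are only printed unless DRY_RUN=0 is set, and the keytab path, principal, directory, owner, and NameNode service ID (nn1) are placeholders that depend on the cluster.

```shell
# Print each command; execute it only when DRY_RUN is set to 0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Authenticate as the hdfs superuser, create a directory, assign its owner.
# Usage: make_hdfs_dir KEYTAB PRINCIPAL PATH OWNER:GROUP   (hypothetical helper)
make_hdfs_dir() {
    run sudo -u hdfs kinit -kt "$1" "$2"
    run sudo -u hdfs hdfs dfs -mkdir -p "$3"
    run sudo -u hdfs hdfs dfs -chown "$4" "$3"
}

# Query the HA state (active or standby) of one NameNode.
# Usage: check_nn_state SERVICE_ID
check_nn_state() {
    run hdfs haadmin -getServiceState "$1"
}

# Demo with placeholder values.
make_hdfs_dir /path/to/hdfs.keytab hdfs@EXAMPLE.COM /data/project alice:analysts
check_nn_state nn1
```

Running check_nn_state before and after killing the active NameNode gives a command-line counterpart to checking the web UI on port 9870.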
- Hive Authorization
- Impersonation: hive.server2.enable.doAs=true (user differentiation)
- RBAC:
- Enable: hive.security.authorization.enabled=true
- Admin role: hive.users.in.admin.role=hive
- Privileges: GRANT/REVOKE SELECT ON TABLE …
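The GRANT/REVOKE statements above can be generated with a small helper, shown here as a sketch. The function name hive_select_sql and the table and user names are hypothetical; the statements follow Hive's SQL standard-based authorization syntax.

```shell
# Emit HiveQL that grants or revokes SELECT on a table for a user.
# Usage: hive_select_sql grant|revoke TABLE USER   (hypothetical helper)
hive_select_sql() {
    case "$1" in
        grant)  echo "GRANT SELECT ON TABLE $2 TO USER $3;" ;;
        revoke) echo "REVOKE SELECT ON TABLE $2 FROM USER $3;" ;;
        *)      echo "usage: hive_select_sql grant|revoke TABLE USER" >&2
                return 1 ;;
    esac
}

# Demo with placeholder table and user names.
hive_select_sql grant sales.orders alice
hive_select_sql revoke sales.orders bob

# To apply against HiveServer2 as a member of the admin role
# (JDBC_URL is a placeholder; not executed here):
#   beeline -u "$JDBC_URL" -e "SET ROLE ADMIN; $(hive_select_sql grant sales.orders alice)"
```

With hive.server2.enable.doAs=true, queries run as the connecting user, so the grants take effect per end user and not per service account.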
- HBase ACLs
- Coprocessors: AccessController for master/regionserver
- Permission Letters:
- R (Read), W (Write), X (Execute), C (Create), A (Admin)
- Commands: grant, revoke, user_permission
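A minimal sketch of the ACL commands above: the helper validates the permission letters and prints the matching HBase shell command. The function name hbase_grant_cmd and the user and table names are assumptions for illustration. Besides R, W, C and A, HBase also accepts X (Execute).

```shell
# Emit an HBase shell grant command after validating the permission letters.
# Usage: hbase_grant_cmd USER PERMS TABLE   (hypothetical helper)
hbase_grant_cmd() {
    case "$2" in
        ""|*[!RWXCA]*)
            echo "invalid permissions: $2" >&2
            return 1 ;;
    esac
    echo "grant '$1', '$2', '$3'"
}

# Demo: build a short script of HBase shell commands (placeholder names).
hbase_grant_cmd alice RW orders
echo "user_permission 'orders'"
echo "revoke 'alice', 'orders'"

# The printed lines can be piped into a non-interactive shell on a cluster
# where the AccessController coprocessor is enabled (not executed here):
#   hbase_grant_cmd alice RW orders | hbase shell -n
```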
- YARN Queue Management
- Dynamic Queues:
- Placement rules (e.g., route user → root.system.user)
- Auto-queue creation
- Legacy vs. Modern:
- capacity-scheduler.xml (old) vs. CM Queue Manager UI + ZK (new)
- User Provisioning
- Linux: adduser, usermod -aG group
- Kerberos: kadmin.local addprinc user@REALM
- HDFS Home Dirs: hdfs dfs -mkdir /user/<name>, then hdfs dfs -chown <name>:<group> /user/<name>
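The three provisioning layers above (Linux account, Kerberos principal, HDFS home directory) can be chained in one sketch. It runs as a dry run by default and only prints the commands. The helper names and the user, group, and realm values are placeholders, and the /user/NAME home directory layout is the usual HDFS convention, assumed here.

```shell
# Print each command; execute it only when DRY_RUN is set to 0.
# Note: the printed form does not preserve argument quoting.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Create the OS user, the Kerberos principal, and the HDFS home directory.
# Usage: provision_user NAME GROUP REALM   (hypothetical helper)
provision_user() {
    run sudo adduser "$1"
    run sudo usermod -aG "$2" "$1"
    run sudo kadmin.local -q "addprinc $1@$3"      # prompts for a password
    # On a Kerberized cluster the hdfs user needs a valid ticket (kinit) first.
    run sudo -u hdfs hdfs dfs -mkdir -p "/user/$1"
    run sudo -u hdfs hdfs dfs -chown "$1:$2" "/user/$1"
}

# Demo with placeholder values.
provision_user alice analysts EXAMPLE.COM
```

The Linux steps typically have to be repeated on every cluster node that resolves users locally, while the principal and the HDFS directory are created once.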