[Computing] Configuring the Hadoop daemons

Hadoop settings / configuration /

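Two kinds of config files

Hadoop reads two kinds of configuration files: the read-only defaults (core-default.xml, hdfs-default.xml, yarn-default.xml, mapred-default.xml) and the site-specific files listed below, which override those defaults.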

configurations

Ref. 1 gives the default value and the meaning of each setting.

etc/hadoop/core-site.xml

  • fs.defaultFS
  • io.file.buffer.size

etc/hadoop/hdfs-site.xml

  • Name node
    • dfs.namenode.name.dir
    • dfs.hosts / dfs.hosts.exclude
    • dfs.blocksize
    • dfs.namenode.handler.count
  • Data Node
    • dfs.datanode.data.dir

etc/hadoop/yarn-site.xml

  • Resource Manager, Node Manager
    • yarn.acl.enable
    • yarn.admin.acl
    • yarn.log-aggregation-enable
  • Resource Manager
    • yarn.resourcemanager.address
    • yarn.resourcemanager.scheduler.address
    • yarn.resourcemanager.resource-tracker.address
    • yarn.resourcemanager.admin.address
    • yarn.resourcemanager.webapp.address
    • yarn.resourcemanager.hostname
    • yarn.resourcemanager.scheduler.class
    • yarn.scheduler.minimum-allocation-mb
    • yarn.scheduler.maximum-allocation-mb
    • yarn.resourcemanager.nodes.include-path
  • Node Manager
    • yarn.nodemanager.resource.memory-mb
    • yarn.nodemanager.vmem-pmem-ratio
    • yarn.nodemanager.local-dirs
    • yarn.nodemanager.log-dirs
    • yarn.nodemanager.log.retain-seconds
    • yarn.nodemanager.remote-app-log-dir
    • yarn.nodemanager.remote-app-log-dir-suffix
    • yarn.nodemanager.aux-services
  • History Server
    • yarn.log-aggregation.retain-seconds
    • yarn.log-aggregation.retain-check-interval-seconds
  • Settings used for NodeManager health monitoring
    • yarn.nodemanager.health-checker.script.path
    • yarn.nodemanager.health-checker.script.opts
    • yarn.nodemanager.health-checker.interval-ms
    • yarn.nodemanager.health-checker.script.timeout-ms

etc/hadoop/mapred-site.xml

  • MapReduce Applications
    • mapreduce.framework.name
    • mapreduce.map.memory.mb
    • mapreduce.map.java.opts
    • mapreduce.reduce.memory.mb
    • mapreduce.reduce.java.opts
    • mapreduce.task.io.sort.mb
    • mapreduce.task.io.sort.factor
    • mapreduce.reduce.shuffle.parallelcopies
  • MapReduce JobHistory Server
    • mapreduce.jobhistory.address
    • mapreduce.jobhistory.webapp.address
    • mapreduce.jobhistory.intermediate-done-dir
    • mapreduce.jobhistory.done-dir

table

Each entry lists the parameter, then its value and notes.

etc/hadoop/core-site.xml

  • fs.defaultFS
    • Value: NameNode URI
    • Notes: hdfs://host:port/
  • io.file.buffer.size
    • Value: 131072
    • Notes: Size of the read/write buffer used in SequenceFiles.
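
As a rough illustration of how these two settings end up in etc/hadoop/core-site.xml, the sketch below renders them in Hadoop's `<configuration>`/`<property>` layout. The helper name `build_site_xml` and the host `namenode.example.com:9000` are assumptions made up for this example, not values from ref. 1.

```python
import xml.etree.ElementTree as ET


def build_site_xml(properties):
    """Render a dict of settings as Hadoop <configuration> XML text."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical NameNode host and port; substitute your own.
core_site = build_site_xml({
    "fs.defaultFS": "hdfs://namenode.example.com:9000/",
    "io.file.buffer.size": 131072,
})
print(core_site)
```

The same helper could be reused for hdfs-site.xml, yarn-site.xml, and mapred-site.xml, since all four files share this layout.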




etc/hadoop/hdfs-site.xml (NameNode)

  • dfs.namenode.name.dir
    • Value: Path on the local filesystem where the NameNode persistently stores the namespace and transaction logs.
    • Notes: If this is a comma-delimited list of directories, the name table is replicated in all of them for redundancy.
  • dfs.hosts / dfs.hosts.exclude
    • Value: List of permitted/excluded DataNodes.
    • Notes: If necessary, use these files to control the list of allowable DataNodes.
  • dfs.blocksize
    • Value: 268435456
    • Notes: HDFS block size of 256 MB, for large file systems.
  • dfs.namenode.handler.count
    • Value: 100
    • Notes: More NameNode server threads to handle RPCs from a large number of DataNodes.
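
To make the dfs.blocksize figure concrete, here is a small sketch that converts 268435456 bytes to MB and counts how many blocks a file would occupy. The function name `block_count` is only an illustrative helper, not part of Hadoop.

```python
BLOCK_SIZE = 268435456  # dfs.blocksize from the table above, in bytes


def block_count(file_size_bytes, block_size=BLOCK_SIZE):
    """Number of HDFS blocks needed to hold a file of the given size."""
    # Integer ceiling division avoids floating-point rounding on large sizes.
    return -(-file_size_bytes // block_size)


print(BLOCK_SIZE // (1024 * 1024))  # block size in MB
print(block_count(1024 ** 3))       # blocks for a 1 GiB file
```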




etc/hadoop/hdfs-site.xml (DataNode)

  • dfs.datanode.data.dir
    • Value: Comma-separated list of paths on the local filesystem of a DataNode where it stores its blocks.
    • Notes: If this is a comma-delimited list of directories, data is stored in all named directories, typically on different devices.




etc/hadoop/yarn-site.xml (ResourceManager and NodeManager)

  • yarn.acl.enable
    • Value: true / false
    • Notes: Whether to enable Access Control Lists (ACLs). Defaults to false.
  • yarn.admin.acl
    • Value: Admin ACL
    • Notes: ACL that sets the admins of the cluster. An ACL takes the form of comma-separated users, then a space, then comma-separated groups. Defaults to the special value *, which means anyone. The special value of just a space means no one has access.
  • yarn.log-aggregation-enable
    • Value: false
    • Notes: Enables or disables log aggregation.
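
The ACL string format described for yarn.admin.acl can be easier to follow with a parser in front of you. The sketch below is a simplified reading of that format, not YARN's own implementation; `parse_acl` and the user and group names are hypothetical.

```python
def parse_acl(acl):
    """Split a YARN-style ACL string into (users, groups, anyone_allowed)."""
    if acl.strip() == "*":
        return set(), set(), True  # "*" means anyone
    # Users come before the first space, groups after it.
    users_part, _, groups_part = acl.partition(" ")
    users = {u.strip() for u in users_part.split(",") if u.strip()}
    groups = {g.strip() for g in groups_part.split(",") if g.strip()}
    return users, groups, False


print(parse_acl("alice,bob admins"))  # two users and one group
print(parse_acl(" "))                 # just a space: no one has access
```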




etc/hadoop/yarn-site.xml (ResourceManager)

  • yarn.resourcemanager.address
    • Value: ResourceManager host:port for clients to submit jobs.
    • Notes: host:port. If set, overrides the hostname set in yarn.resourcemanager.hostname.
  • yarn.resourcemanager.scheduler.address
    • Value: ResourceManager host:port for ApplicationMasters to talk to the Scheduler to obtain resources.
    • Notes: host:port. If set, overrides the hostname set in yarn.resourcemanager.hostname.
  • yarn.resourcemanager.resource-tracker.address
    • Value: ResourceManager host:port for NodeManagers.
    • Notes: host:port. If set, overrides the hostname set in yarn.resourcemanager.hostname.
  • yarn.resourcemanager.admin.address
    • Value: ResourceManager host:port for administrative commands.
    • Notes: host:port. If set, overrides the hostname set in yarn.resourcemanager.hostname.
  • yarn.resourcemanager.webapp.address
    • Value: ResourceManager web-ui host:port.
    • Notes: host:port. If set, overrides the hostname set in yarn.resourcemanager.hostname.
  • yarn.resourcemanager.hostname
    • Value: ResourceManager host.
    • Notes: host. A single hostname that can be set in place of setting all the yarn.resourcemanager*address values. The ResourceManager components then use their default ports.
  • yarn.resourcemanager.scheduler.class
    • Value: ResourceManager Scheduler class. Set this to change which scheduler is used.
    • Notes: CapacityScheduler (recommended), FairScheduler (also recommended), or FifoScheduler. Use a fully qualified class name, e.g., org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.
  • yarn.scheduler.minimum-allocation-mb
    • Value: Minimum limit of memory to allocate to each container request at the ResourceManager.
    • Notes: In MB.
  • yarn.scheduler.maximum-allocation-mb
    • Value: Maximum limit of memory to allocate to each container request at the ResourceManager.
    • Notes: In MB.
  • yarn.resourcemanager.nodes.include-path / yarn.resourcemanager.nodes.exclude-path
    • Value: List of permitted/excluded NodeManagers.
    • Notes: If necessary, use these files to control the list of allowable NodeManagers.
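
The interplay between yarn.resourcemanager.hostname and the individual address settings can be sketched as follows. This only mimics the override rule stated in the table. The port numbers and the 0.0.0.0 fallback are what I believe yarn-default.xml ships with, so verify them against your Hadoop version. `resolve_rm_addresses` and the host `rm.example.com` are hypothetical.

```python
# Assumed default ports from yarn-default.xml; check your Hadoop version.
DEFAULT_PORTS = {
    "yarn.resourcemanager.address": 8032,
    "yarn.resourcemanager.scheduler.address": 8030,
    "yarn.resourcemanager.resource-tracker.address": 8031,
    "yarn.resourcemanager.admin.address": 8033,
    "yarn.resourcemanager.webapp.address": 8088,
}


def resolve_rm_addresses(conf):
    """Return each RM address, preferring explicit settings over the hostname."""
    hostname = conf.get("yarn.resourcemanager.hostname", "0.0.0.0")
    return {
        key: conf.get(key, f"{hostname}:{port}")
        for key, port in DEFAULT_PORTS.items()
    }


conf = {
    "yarn.resourcemanager.hostname": "rm.example.com",
    # An explicit address overrides the hostname-derived one.
    "yarn.resourcemanager.webapp.address": "rm.example.com:18088",
}
addresses = resolve_rm_addresses(conf)
for key, value in addresses.items():
    print(key, "=", value)
```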




etc/hadoop/yarn-site.xml (NodeManager)

  • yarn.nodemanager.resource.memory-mb
    • Value: Resource, i.e. available physical memory, in MB, for the given NodeManager.
    • Notes: Defines the total resources on the NodeManager that are made available to running containers.
  • yarn.nodemanager.vmem-pmem-ratio
    • Value: Maximum ratio by which the virtual memory usage of tasks may exceed physical memory.
    • Notes: The virtual memory usage of each task may exceed its physical memory limit by this ratio (in other words, how many times the physical memory may be used as virtual memory). The total amount of virtual memory used by tasks on the NodeManager may exceed its physical memory usage by this ratio.
  • yarn.nodemanager.local-dirs
    • Value: Comma-separated list of paths on the local filesystem where intermediate data is written.
    • Notes: Multiple paths help spread disk I/O.
  • yarn.nodemanager.log-dirs
    • Value: Comma-separated list of paths on the local filesystem where logs are written.
    • Notes: Multiple paths help spread disk I/O.
  • yarn.nodemanager.log.retain-seconds
    • Value: 10800
    • Notes: Default time (in seconds) to retain log files on the NodeManager. Only applicable if log aggregation is disabled.
  • yarn.nodemanager.remote-app-log-dir
    • Value: /logs
    • Notes: HDFS directory to which the application logs are moved on application completion. Appropriate permissions need to be set. Only applicable if log aggregation is enabled.
  • yarn.nodemanager.remote-app-log-dir-suffix
    • Value: logs
    • Notes: Suffix appended to the remote log dir. Logs are aggregated to ${yarn.nodemanager.remote-app-log-dir}/${user}/${thisParam}. Only applicable if log aggregation is enabled.
  • yarn.nodemanager.aux-services
    • Value: mapreduce_shuffle
    • Notes: Shuffle service that needs to be set for MapReduce applications.
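
The meaning of yarn.nodemanager.vmem-pmem-ratio is easier to see as arithmetic: a container's virtual memory ceiling is its physical memory limit multiplied by the ratio. The sketch below is a simplified model of that rule; the function names are hypothetical, and the ratio 2.1 is used as an example value (I believe it is the shipped default, but ref. 1 is the authority).

```python
def vmem_limit_mb(container_pmem_mb, vmem_pmem_ratio):
    """Virtual memory ceiling for a container, in MB."""
    return container_pmem_mb * vmem_pmem_ratio


def exceeds_vmem(used_vmem_mb, container_pmem_mb, vmem_pmem_ratio):
    """True if the container's virtual memory use is over its ceiling."""
    return used_vmem_mb > vmem_limit_mb(container_pmem_mb, vmem_pmem_ratio)


# Example: a 1536 MB container with an assumed ratio of 2.1.
print(round(vmem_limit_mb(1536, 2.1), 1))
print(exceeds_vmem(4000, 1536, 2.1))
```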




etc/hadoop/yarn-site.xml (History Server)

  • yarn.log-aggregation.retain-seconds
    • Value: -1
    • Notes: How long to keep aggregated logs before deleting them. -1 disables deletion. Be careful: set this too small and you will spam the NameNode.
  • yarn.log-aggregation.retain-check-interval-seconds
    • Value: -1
    • Notes: Time between checks for aggregated log retention. If set to 0 or a negative value, the value is computed as one-tenth of the aggregated log retention time. Be careful: set this too small and you will spam the NameNode.
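
The one-tenth rule for the check interval can be written out as a small function. This is a sketch of the rule as the table states it, not Hadoop's code; `retain_check_interval` is a hypothetical name, and treating every negative retention value (not only -1) as "disabled" is an assumption.

```python
def retain_check_interval(retain_seconds, check_interval_seconds):
    """Effective seconds between retention checks, or None if deletion is off."""
    if retain_seconds < 0:
        return None  # -1 disables deletion of aggregated logs
    if check_interval_seconds <= 0:
        # 0 or negative: fall back to one-tenth of the retention time.
        return retain_seconds / 10
    return check_interval_seconds


print(retain_check_interval(-1, -1))       # deletion disabled
print(retain_check_interval(604800, -1))   # 7 days retention, derived interval
print(retain_check_interval(604800, 3600)) # explicit interval wins
```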




etc/hadoop/mapred-site.xml (MapReduce Applications)

  • mapreduce.framework.name
    • Value: yarn
    • Notes: Execution framework set to Hadoop YARN.
  • mapreduce.map.memory.mb
    • Value: 1536
    • Notes: Larger resource limit for maps.
  • mapreduce.map.java.opts
    • Value: -Xmx1024M
    • Notes: Larger heap size for child JVMs of maps.
  • mapreduce.reduce.memory.mb
    • Value: 3072
    • Notes: Larger resource limit for reduces.
  • mapreduce.reduce.java.opts
    • Value: -Xmx2560M
    • Notes: Larger heap size for child JVMs of reduces.
  • mapreduce.task.io.sort.mb
    • Value: 512
    • Notes: Higher memory limit while sorting data, for efficiency.
  • mapreduce.task.io.sort.factor
    • Value: 100
    • Notes: More streams merged at once while sorting files.
  • mapreduce.reduce.shuffle.parallelcopies
    • Value: 50
    • Notes: Higher number of parallel copies run by reduces to fetch outputs from a very large number of maps.
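
In the values above, each JVM heap (-Xmx) is smaller than the matching container limit (memory.mb), leaving room for non-heap memory. A quick sketch to check that gap for your own settings follows; `heap_mb` and `headroom_mb` are hypothetical helper names, and the parser only understands -Xmx values with an M or G suffix.

```python
import re


def heap_mb(java_opts):
    """Extract the -Xmx value from a JVM options string, in MB."""
    match = re.search(r"-Xmx(\d+)([mMgG])", java_opts)
    if match is None:
        raise ValueError(f"no -Xmx setting in {java_opts!r}")
    size, unit = int(match.group(1)), match.group(2).lower()
    return size * 1024 if unit == "g" else size


def headroom_mb(container_mb, java_opts):
    """Memory left in the container after the JVM heap, in MB."""
    return container_mb - heap_mb(java_opts)


print(headroom_mb(1536, "-Xmx1024M"))  # map tasks
print(headroom_mb(3072, "-Xmx2560M"))  # reduce tasks
```

A negative headroom would mean the heap alone is larger than the container limit, which is worth catching before a job is submitted.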




etc/hadoop/mapred-site.xml (MapReduce JobHistory Server)

  • mapreduce.jobhistory.address
    • Value: MapReduce JobHistory Server host:port
    • Notes: Default port is 10020.
  • mapreduce.jobhistory.webapp.address
    • Value: MapReduce JobHistory Server Web UI host:port
    • Notes: Default port is 19888.
  • mapreduce.jobhistory.intermediate-done-dir
    • Value: /mr-history/tmp
    • Notes: Directory where history files are written by MapReduce jobs.
  • mapreduce.jobhistory.done-dir
    • Value: /mr-history/done
    • Notes: Directory where history files are managed by the MR JobHistory Server.

References

  1. Apache Hadoop 2.10.1 – Hadoop Cluster Setup
