쿠...sal: deep

레이블이 deep인 게시물을 표시합니다. 모든 게시물 표시

[컴] ARM 의 license

Apple 의 CPU(A4, A5, A6 …) / Qualcomm 의 Snapdragon / 삼성의 Exynsos / 애플의 cpu 와 삼성의 cpu 의 차이 / 삼성의 비메모리 반도체 수준

qualcomm 의 snapdragon 이나 삼성의 엑시노스(Exynose) 와 차이가 궁금해서 조금 찾아봤다.

여기서 말하고자 하는 결론은 "Microarchitecture" 의 차이가 있다." 이다.

퀄컴도 삼성도, 그리고 애플도 전부 ARM core 를 사용한다. 처음 생각에는 이 ARM core 를 license 하는 거면, 결국 그냥 설계도 license 해서 fab(공장) 에 의뢰해서 찍어내면, core 는 다 비슷한 것 아닌가 라고 생각했다. 물론 SoC 이기 때문에 안에 들어가는 녀석들에 대한 디자인이나, 다른 core(GPU 같은) 들이 다른 성능을 가지기 때문에 그 부분이 다른 것이라고 생각했다.

ARM 의 license

그런데, 잘 못 알고 있었다. 알고 보니, ARM license 가 아래와 같이 2가지가 있다[ref. 1].

processor core license : Cortex A8, A9, A15…
ARM instruction set architecture license : ARMv7 ISA

core license 를 가진 경우에는 그냥 찍어내면 되는 것이고, instruction set architecture(ISA) 이 관한 license 가 있는 경우에는 ARM instruction set 을 기반으로 자신들이 알아서 원하는 대로 구현을 하는 것이다.

license a specific processor core (e.g. Cortex A8, A9, A15)

Some Qualcomm SoCs (e.g. the MSM8x25/Snapdragon S4 Play uses ARM Cortex A5s)
Apple A5 : Cortex-A9
Apple A4 : Cortex-A8
Samsung : Exynose seriese

license an ARM instruction set architecture (e.g. ARMv7 ISA)

Qualcomm Scorpion/Krait
Apple A6 : Swift; ARMv7-A compatible

Arm Flexible Access for Startups

ARM 이 2020년 4월30일에 발표한 새로운 라이센스 정책이다.[ref.8]

새로운 정책은 처음(early-stage, $500만 달러이하 펀딩을 받은 스타트업) 에는 돈을 안내고, 그들이 상업적인 실리콘과 비지니스 규모가 되면 그때 돈을 받는 구조이다.

Flexible Access for Startups – Arm

A6, Exynos, Snapdragon

Apple 이나 Qualcomm 은 이 두 가지 license 를 모두 갖고 있는 듯 하다.[ref. 1]

여하튼 그래서 Apple 은 A6 부터 자신만의 CPU 를 design 해서 만들기 시작한 듯 하고[ref. 1], Qualcomm 은 Scorpion 이라고 불리는 시점부터 아니면 그 이전부터 자신들이 직접 core 를 디자인 한 듯 하다.

삼성의 Exynos 는 아직 그런 수준에 이르지는 못한 것으로 보인다. 적어도 지금까지 나온 Exynos 의 사양을 보면 ARM processor core 를 license 해서 그대로 쓰고 있는 듯 보인다.[ref. 6] ref.7 을 보면 이제 더이상 core 개발을 하지 않는다고 한다.

물론 직접 디자인 한 것이 꼭 좋다고는 할 수 없겠으나, ARM 쪽에서 특정 vendor 만을 위한 core design 을 해 줄 까닭이 없기 때문에 아무래도 이런 능력을 가지고 있으며, 자신의 제품을 만들고 있는 Apple 쪽의 전망이 유망해 보이기는 한다.

References

[컴][자바] java 의 G1 collector 1/2

자바 / 가비지 컬렉터 /

G1 Garbage Collector 1/2

server-style garbage collector
multi-processor machines 를 target 으로 한다.
높은 처리량(througput)을 달성하면서 높은 확률로 garbage collection pause time 목표를 충족시키다.
Oracle JDK 7 update 4 부터 지원시작
다음과 같은 어플리케이션을 위해 디자인 됐다.
- CMS collector 와 같은 응용 프로그램 스레드와 동시에 작동 하는 application.
- pause time 을 유발하는 긴 GC가 없이 여유 공간을 compact 하는 application
- 더 예측 가능한 GC pause durations가 필요한 application
- 많은 throughput performance 를 희생하고 싶지 않은 application.
- 훨씬 더 큰 Java 힙이 필요하지 않는 application.

G1은 CMS (Concurrent Mark-Sweep Collector)의 장기 교체 계획이다.

cms 참고글 : Concurrent Mark Sweep (CMS) collector

G1과 CMS를 비교해서, G1의 더 나은 점은

G1이 compacting collector
- G1 은 충분히 압축한다
  - allocation을 위한 fine-grained free list 들의 사용을 완전하게 피하고, 대신에 region 들에 의존하기 위해서 --> 연속된 메모리로 옮겨놓는 것, 이로 인해 분산되어있는 것들에 대한 접근을위해 리스트를 traverse 해야 하지만, 한곳에 모아놓으면 traverse time 을 줄이는 것 같다.
  - 이는 collector의 일부를 상당히 단순화하고 잠재적인 fragmentation 문제를 대부분 제거한다
또한 G1은 CMS collector 보다 좀 더 예측 가능한 garbage collection pauses를 제공하며 사용자가 원하는 desired pause 목표를 지정할 수 있다.

G1 동작 overview

오래된 garbage collectors (직렬, 병렬, CMS)는 모두 heap을 fixed memory size의

young generation
old generation
permanent generation

의 세 섹션으로 구성한다.

모든 메모리 객체는 이 세 섹션 중 하나에 들어간다.

G1 collector는 다른 접근 방식을 취한다.

heap은 equal-sized heap regions 의 집합으로 나뉜다. 각각의 region은 virtual memroy 의 연속된 범위이다.

heap  ---> heap region
      |
      +---> heap region
      |
      +---> heap region

특정 region 집합에는 이전 collector와 동일한 역할 (eden, survivor, old)이 할당되지만 정의된 fixed size 는 없다. 이것이 메모리 사용에 있어서 더 큰 유연성을 제공한다.

marking phase

garbage collections 을 수행 할 때 G1은 CMS collector(Concurrent Mark-Sweep Collector)와 유사한 방식으로 작동한다.

G1은 동시 전역 마킹 단계(concurrent global marking phase)를 수행하여 heap 전체에서 객체의 liveness 를 결정한다.

마크 단계가 완료되면 G1은 어느 region이 많이 비어 있는지 알게 된다. 이 region에서 먼저 collect하여 일반적으로 많은 양의 여유 공간을 생성한다. 이러한 garbage collection 방법이 Garbage-First 라고 불리는 이유이다.

reclaimable heap region

이름에서 알 수 있듯이 G1은 회수 가능한(reclaimable) object들, 즉 garbage로 가득 찬 heap region에 수집 및 압축 활동(collection and compaction activity)을 집중시킨다.

G1은 pause prediction model(정지 예측 모델)을 사용하여 user-defined pause time target(유저가 정의한 정지 시간 목표)를 충족시키고 지정된 pause time target를 기반해서 "수집 할 영역(region)의 수"를 선택합니다.

evacuation

G1 이 reclamation 을 할 만큼 오래(ripe)됐다고 판단한 region은 evacuation(방출) 을 사용하여 garbage collect 가 된다.

G1은 "힙의 하나 이상의 region"에서 "힙의 단일 region"으로 객체를 복사하고 프로세스에서 둘 영역 모두의 메모리를 압축하고 해제한다. --> 여러개로 흩어진 region 을 하나의 연속된 공간으로 모으고, 이것을 하나의 region 으로 처리한다?

이러한 evacuation은 multi processors 에서 병렬로 수행되어 pause times을 줄이고 처리량을 증가시킨다.

따라서 G1은 각각의 garbage collection에서 지속적으로 동작하여 단편화를 줄이고 user-defined pause times 내에서 작업한다.

이것은 이전의 두 가지 방법으로는 불가능하다. CMS (Concurrent Mark Sweep) garbage collector 는 압축(compact)을 수행하지 않는다. ParallelOld garbage collection은 전체 힙 압축(whole-heap compaction) 만 수행하므로 상당한 pause times이 발생한다.

G1은 real-time collector가 아니다. 즉, real-time os 처럼 time critical 하지 않다. 높은 확률로 pause time target를 충족하지만, 절대적으로 확신할 수 있는 것은 아니다.

이전 collection 의 데이터를 기반으로 G1은 사용자가 지정한 target time 내에 collect 할 수있는 "region의 수"를 추정한다.

따라서 collector 는 region을 수집하는 비용에 대해 reasonably accurate model(합리적으로 정확한 모델)을 갖고있고, 이 모델을 사용하여 pause time target 내에 어느 region 을 collect 할지 얼마나 많은 region 을 collect 할 지를 결정한다.

참고 : G1에는 concurrent (애플리케이션 스레드와 함께 실행 (예 : refinement, marking, cleanup))와 parallel (멀티 스레드, 예를 들어 stop-the-world) 단계가 있다. full garbage collection은 여전히 단일 스레드이지만, 잘 tuning 하면 애플리케이션이 full GC를 피할 수 있다.

G1 발자국

ParallelOldGC 또는 CMS collector에서 G1으로 마이그레이션하면 JVM 프로세스 크기가 더 커질 수 있다. 이것은 Remembered Sets 및 Collection Sets과 같은 "accounting" data structures 때문이다.

RSets

Remembered Sets(RSets)는 특정 region 에 대한 obejct referece 들을 추적한다.
heap에는 region 당 하나의 RSet이 있다.
RSet을 사용해서 region의 병렬 및 독립 collection이 가능하게 한다.
RSets의 전체 foot printing 영향은 5 % 미만이다.

CSets

Collection Sets(CSets) GC에서 collect 될 region 들의 집합.
CSet의 모든 라이브 데이터는 GC 중에 evacuated 된다.(copied / moved).
regions 의 집합은 Eden, survivor 및 / 또는 old generation 이 될 수 있다.
CSets는 JVM 크기에 1 % 미만의 영향을 미친다.

G1의 권장 사용 사례

G1의 첫 번째 초점은 limited GC latency time 의 큰 heap이 필요한 애플리케이션을 실행하는 사용자에게 솔루션을 제공하는 것이다.

약 6GB 이상의 힙 크기와 0.5 초 미만의 안정적이고 예측 가능한 pause time 을 의미한다.

CMS 또는 ParallelOldGC garbage collector로 실행중인 응용 프로그램 중 G1로 전환하는 것이 좋은 경우(다음중 하나에 해당하면 G1 을 사용하는 것이 좋다.)

Full GC duration이 너무 길거나 너무 자주 일어난다.
object allocation rate 의 비율 또는 object promotion 비율이 크게 다르다.
원하지 않는 긴 garbage collection 또는 compaction pauses (0.5-1 초 이상)

참고 :

CMS 또는 ParallelOldGC를 사용하고 있고 응용 프로그램에 garbage collection pauses가 길지 않으면 현재 collector를 유지하는 것이 좋다.

최신 JDK를 사용해도 G1 collector 를 꼭 사용하지 않아도 된다.

Reference

Getting Started with the G1 Garbage Collector

[컴][OS] Spinlock, Mutext lock

Spinlock, Mutext lock

Mutex lock 은 os 에서 software 로 lock 을 만들었는데 그중 가장 간단한 도구가 mutex lock 이다.

Mutex lock 을 spinlock 이라고도 부른다. 왜냐하면, process 가 lock 이 available 될 때까지 계속 돌기 때문이다.(spin)

이 spinlock 은 busy waiting 을 만든다. 하지만 반대로 그래서 context switch 가 필요없다.

멀티코어 시스템에서 특정한 경우에는 spinlock 이 훨씬 좋은 선택이 된다.

만약 lock 이 짧은 순간이라면, 다른 thread 는 또다른 core 에서 critical section 을 수행하는 동안 하나의 thread 가 하나의 core 에서 lock 을 기다릴 수 있다.

Spinlock 은 multiprocessor 시스템에서 쓰는 locking mechanism 중 하나이다.
짧은 시간에 lock 을 할 때 사용한다.이 짧은 시간의 정의는

lock 은 2개의context switch 가 필요하다.
1. thread 를 waiting state 로 옮기는 context switch
2. lock 이 가능하게 됐을 때 waiting thread 를 복구 시키는 context switch
일반적인 spinlock 을 사용하는 규칙은 lock 이 2개 미만의 context switch 의 시간동안 lock 을 잡을 때 사용한다.

References

Operating System Concepts 10th edition

[컴][OS] 리눅스 부팅과정

linux boot process / 리눅스 부팅 절차 / 부팅 순서 / 부팅 되는 법 /

리눅스 boot process

리눅스 커널 이미지 (Linux kernel image) 는 압축된 file 이다. 그래서 이것은 memory 로 load 된 이후에 압축을 풀게 된다.
boot process 동안에, boot loader 는 일반적으로 initramfs 라 불리는 임시 RAM file system 을 만든다.
이 파일시스템에는 "실제 root file system"(이것은 main memory 에 없다.) 을 지원하기 위해 설치되어야만 하는 드라이버들과 커널모듈들을 포함하고 있다.
커널이 시작되고, 필요한 드라이버들이 설치되자마자, 커널은 root file system 의 위치를 임시 RAM 에서 적절한 root file system 위치로 옮긴다.
마지막으로 리눅스는 systemd process 를 생성한다. systemd process 가 system 에서 최초의 process 이다.
그리고 나서 다른 서비스들을 시작한다.(웹서버, db 같은..)
최종적으로 system 은 login prompt 를 보여준다.

Android

linux 에서 일반적으로 쓰이는 boot loader 인 GRUB 대신에 boot loader 를 vendors 가 알아서 제작한다.
가장 일반적인 안드로이드 boot loader 는 LK(little kernel) 이다.
안드로이드에서도 압축된 kernel image 를 이용하고, initramfs 를 이용한다.
안드로이드는 initramfs 를 device 의 root file system 으로 이용한다. 하지만, Linux 에서는 모든 필요한 driver 들이 loaded 되고 나서 initramfs 를 버린다.
kernel 이 load 되고, root file system 이 mounted 되면,
Android 는 init process 를 시작하고, 여러개의 service 들을 실행한다.

대부분의 boot loader 들은 recovery mode, 또는 single-user mode 등으로 booting 을 제공해준다.(windows 안전모드 같은)

Reference

Operating System Concepts 10th edition

[컴] V8 관련 글들

크롬엔진 / 크로미엄 엔진 / 노드 엔진 / javascript engine / nodejs engine / j

V8 관련 글들

[컴][머신러닝] 파이토치(Pytorch)에 대한 간략한 설명

What if pytorch / pytorch / desc / 파이토치 설명 / 파이토치 차이점. / Machine learning / ML / 머신러닝 / 기계학습

파이토치에 대한 간략한 설명

파이토치는 문턱이 낮다
파이토치는 공식 문서가 잘 만들어져 있다.
자체 운영 포럼이 있다.
연산 그래프 설정에서도 장점
- 파이토치는 ‘Define by Run’ 방식, 텐서플로우가 ‘Define and Run’ 방식
- ‘Define and Run’ 프레임워크에서는 연산 그래프를 먼저 정의하고 값을 넣어 결과를 얻는다. ‘Define by Run’ 프레임워크는 연산 그래프가 정의되는 동시에 연산이 이루어진다.(인용)
- 텐서플로우 2.0의 ‘즉시실행(Eager execution)’ 모드와 비슷하다.
  - 성능 문제, Very bad performance using Gradient Tape · Issue #30596 · tensorflow/tensorflow · GitHub

파이토치 tutorial

Reference

[컴] 오브젝트 스토리지(Object Storage)

백업 방법 / 여러리즌으로 / back up / 운용 / 주기적인 백업 / object storage

오브젝트 스토리지(Object Storage)

아마존의 심플 스토리지 서비스(Simple Storage Service, S3)
애저의 블롭 스토어
구글의 클라우드 스토리지

Object storage System

계층적 디렉토리 및 서브디렉토리 구조가 없는 일종의 파일 시스템

오브젝트 스토리지 시스템에 저장된 모든 오브젝트는 콘텐츠에 따라 고유 식별자(Unique IDentifier, UID)가 주어진다.
보통 uid 는 SHA-1 같은 hash algorithm 응 이용해서 만든다.

오브젝트 스토리지와 블록 스토리지 간의 큰 차이

오브젝트 스토리지에 저장된 모든 오브젝트는 최소한 3곳의 가용 구역으로 자동으로 복사된다
오브젝트 단위로 복제를 한다.
- 클라우드 블록 스토리지 및 보편적 RAID 시스템은 블록 수준에서 복제가 이루어진다.
오브젝트는 절대로 변경되지 않는다. (위키의 편집같은 것을 생각해보자.)
- 오브젝트가 변경되어야 한다면 그냥 새 오브젝트로 저장
- 버전 분류가 지원된다면 이전 오브젝트 버전은 역사적 목적으로 저장
오브젝트 스토리지는 위해를 끼칠 수 있는 여러 원인을 방어하는 여러 보호 수준을 자동으로 포함
선택적인 WORM(Write-Once-Read-Many) 보호
교차-지역 복제 등 이용 가능한 모든 베스트 프랙티스를 준수
오브젝트 스토리지가 여전히 오류를 범할 수 있는 인간에 의해 작성 –> 오브젝트 스토리지 내 데이터가 미션 크리티컬이라면 백업하는 것이 적절

다른 백업 서비스 이용

AWS 글레이저 딥 아카이브
애저 아카이브 스토리지
구글 콜드라인
상이한 계정 및 지역 을 이용해야 한다.

블록 볼륨 백업

블록 볼륨은 백업해야 한다.
스스로 이들을 백업.
블록 스토리지 스냅샷
- 다른 지역 및 계정으로 복제

오브젝트 스토리지 백업

오브젝트 스토리지는 여러 가용 구역으로 복제가 자동으로 이루어진다.
하지만 완벽이란 없다 –> 필요하다고 판단되면 백업을 해라.

Reference

클라우드 오브젝트 스토리지, 꼭 백업해야 하나?, 2019-10-29

[컴] GSMA IMEI Database

imei blacklist / IMEI 블랙 리스트 / 도난 당한 폰 / 폰 도난 방지 시스템 / 중고 / 해외에 중고로 판대

GSMA IMEI Database

GSMA 가 International Mobile Equipment Identity Database (IMEI Db) 로 알려진 유일한 시스템을 유지보수한다.

블랙리스트에 올라간 IMEI 들이 GSMA 중앙 IMEI DB 에 제공된다. 그러면 이 DB를 이용해서 통신업자들이 국제, 국내망 여러 네트워크에서 그 기기를 block 할 수 있다.

하지만, 이것은 완전하진 않다. ref.2 에서 이야기하듯이 IMEI 값은 변경이 가능하다.

GSMA IMEI DB 에 기록하는 정보

IMEI 는 15자리의 숫자이며, 모바일 네트워크에서 기기를 구분하기 위해 사용된다.

IMEI DB에는 “3GPP 를 따르는 기기들” 제조사들의 모든 공식적인 기기들의 정보를 기록한다.

IMEI 번호 : “3GPP 를 따르는 기기들” 제조사들의 모든 공식적인
제조사 이름
모델이름
주요 네트워크 능력(주파수 대역, 기기타입 등)

References

[컴][머신러닝] Recurrent Neural Networks 예제

ML / machine learning

Recurrent Neural Networks, RNN

python 3 버전

# encoding=utf8

"""
Minimal character-level Vanilla RNN model. Written by Andrej Karpathy (@karpathy)
BSD License
"""
import numpy as np
import codecs

# data I/O
with codecs.open('input.txt', 'r', encoding='utf-8') as fp:
    data = fp.read()
    # data = open('input.txt', 'r').read() # should be simple plain text file
    chars = list(set(data))
    data_size, vocab_size = len(data), len(chars)
    print ('data has %d characters, %d unique.' % (data_size, vocab_size))
    char_to_ix = { ch:i for i,ch in enumerate(chars) }
    ix_to_char = { i:ch for i,ch in enumerate(chars) }

    # hyperparameters
    hidden_size = 100 # size of hidden layer of neurons
    seq_length = 25 # number of steps to unroll the RNN for
    learning_rate = 1e-1

    # model parameters
    Wxh = np.random.randn(hidden_size, vocab_size)*0.01 # input to hidden
    Whh = np.random.randn(hidden_size, hidden_size)*0.01 # hidden to hidden
    Why = np.random.randn(vocab_size, hidden_size)*0.01 # hidden to output
    bh = np.zeros((hidden_size, 1)) # hidden bias
    by = np.zeros((vocab_size, 1)) # output bias

    def lossFun(inputs, targets, hprev):
      """
      inputs,targets are both list of integers.
      hprev is Hx1 array of initial hidden state
      returns the loss, gradients on model parameters, and last hidden state
      """
      xs, hs, ys, ps = {}, {}, {}, {}
      hs[-1] = np.copy(hprev)
      loss = 0
      # forward pass
      for t in range(len(inputs)):
        xs[t] = np.zeros((vocab_size,1)) # encode in 1-of-k representation
        xs[t][inputs[t]] = 1
        hs[t] = np.tanh(np.dot(Wxh, xs[t]) + np.dot(Whh, hs[t-1]) + bh) # hidden state
        ys[t] = np.dot(Why, hs[t]) + by # unnormalized log probabilities for next chars
        ps[t] = np.exp(ys[t]) / np.sum(np.exp(ys[t])) # probabilities for next chars
        loss += -np.log(ps[t][targets[t],0]) # softmax (cross-entropy loss)
      # backward pass: compute gradients going backwards
      dWxh, dWhh, dWhy = np.zeros_like(Wxh), np.zeros_like(Whh), np.zeros_like(Why)
      dbh, dby = np.zeros_like(bh), np.zeros_like(by)
      dhnext = np.zeros_like(hs[0])
      for t in reversed(range(len(inputs))):
        dy = np.copy(ps[t])
        dy[targets[t]] -= 1 # backprop into y. see http://cs231n.github.io/neural-networks-case-study/#grad if confused here
        dWhy += np.dot(dy, hs[t].T)
        dby += dy
        dh = np.dot(Why.T, dy) + dhnext # backprop into h
        dhraw = (1 - hs[t] * hs[t]) * dh # backprop through tanh nonlinearity
        dbh += dhraw
        dWxh += np.dot(dhraw, xs[t].T)
        dWhh += np.dot(dhraw, hs[t-1].T)
        dhnext = np.dot(Whh.T, dhraw)
      for dparam in [dWxh, dWhh, dWhy, dbh, dby]:
        np.clip(dparam, -5, 5, out=dparam) # clip to mitigate exploding gradients
      return loss, dWxh, dWhh, dWhy, dbh, dby, hs[len(inputs)-1]

    def sample(h, seed_ix, n):
      """ 
      sample a sequence of integers from the model 
      h is memory state, seed_ix is seed letter for first time step
      """
      x = np.zeros((vocab_size, 1))
      x[seed_ix] = 1
      ixes = []
      for t in range(n):
        h = np.tanh(np.dot(Wxh, x) + np.dot(Whh, h) + bh)
        y = np.dot(Why, h) + by
        p = np.exp(y) / np.sum(np.exp(y))
        ix = np.random.choice(range(vocab_size), p=p.ravel())
        x = np.zeros((vocab_size, 1))
        x[ix] = 1
        ixes.append(ix)
      return ixes

    n, p = 0, 0
    mWxh, mWhh, mWhy = np.zeros_like(Wxh), np.zeros_like(Whh), np.zeros_like(Why)
    mbh, mby = np.zeros_like(bh), np.zeros_like(by) # memory variables for Adagrad
    smooth_loss = -np.log(1.0/vocab_size)*seq_length # loss at iteration 0
    while True:
      # prepare inputs (we're sweeping from left to right in steps seq_length long)
      if p+seq_length+1 >= len(data) or n == 0: 
        hprev = np.zeros((hidden_size,1)) # reset RNN memory
        p = 0 # go from start of data
      inputs = [char_to_ix[ch] for ch in data[p:p+seq_length]]
      targets = [char_to_ix[ch] for ch in data[p+1:p+seq_length+1]]

      # sample from the model now and then
      if n % 100 == 0:
        sample_ix = sample(hprev, inputs[0], 200)
        txt = ''.join(ix_to_char[ix] for ix in sample_ix)
        print ('----\n %s \n----' % (txt, ))

      # forward seq_length characters through the net and fetch gradient
      loss, dWxh, dWhh, dWhy, dbh, dby, hprev = lossFun(inputs, targets, hprev)
      smooth_loss = smooth_loss * 0.999 + loss * 0.001
      if n % 100 == 0: print ('iter %d, loss: %f' % (n, smooth_loss)) # print progress
      
      # perform parameter update with Adagrad
      for param, dparam, mem in zip([Wxh, Whh, Why, bh, by], 
                                    [dWxh, dWhh, dWhy, dbh, dby], 
                                    [mWxh, mWhh, mWhy, mbh, mby]):
        mem += dparam * dparam
        param += -learning_rate * dparam / np.sqrt(mem + 1e-8) # adagrad update

      p += seq_length # move data pointer
      n += 1 # iteration counter

fdsfds

결과

무한 loop 이라, 적정한 시점에 알아서 멈추면 된다.

아래 결과는 대략 12시간 정도를 돌렸다. 컴퓨터 스펙은 아래와 같다.

CPU: i7-6700, 3.40GHz
RAM: 16GB

----
 beahngy
amo k ns aeo?cdse nh a taei.rairrhelardr nela haeiahe.
Ddelnss.eelaishaner” cot AAfhB ht ltny
ehbih a”on bhnte ectrsnae abeahngy
amo k ns aeo?cdse nh a taei.rairrhelardr nol e iohahenasen
----
iter 9309400, loss: 0.000086
----
 e nh a taei.rairrhelardr naioa aneaa ayio pe e bhnte ayio pe e h’e btentmuhgehi bcgdltt. gey heho grpiahe.
Ddelnss.eelaishaner” cot AAfhB ht ltny
ehbih a”on bhnte ectrsnae abeahngy
amo k ns aeo?cds
----
iter 9309500, loss: 0.000086
----
 jCTCnhoofeoxelif edElobe negnk e iohehasenoldndAmdaI ayio pe e h’e btentmuhgehi bcgdltt. gey heho grpiahe.
Ddelnss.eelaishaner” cot AAfhB ht ltny
ehbih a”on bhnte ectrsnae abeahngy
amo k ns aeo?cds
----
iter 9309600, loss: 0.000086
----
 negnk e iohehasenoldndAmdaI ayio pe e h’e btentmuhgehi bcgdltt. gey heho grpiahe.
Ddelnss.eelaishaner” cot AAfhB ht ltny
ehbih a”on bhnte ectrsnae abeahngy
amo k ns aeo?cdse nh a taei.rairrhelardr
----
iter 9309700, loss: 0.000086
----
 aI ayio pe e h’e btentmuhgehi bcgdltt. gey heho grpiahe.
Ddelnss.eelaishaner” cot AAfhB ht ltny
ehbih a”on bhnte ectrsnae abeahngy
amo k ns aeo?cdse nh a taei.rairrhelardr neli ae e angnI hyho gben
----
iter 9309800, loss: 0.000086
----
 gehi bcgdltt. gey heho grpiahe.
Ddelnss.eelaishaner” cot AAfhB ht ltny
ehbih a”on bhnte ectrsnae abeahngy
amo k ns aeo?cdse nh a taei.rairrhelardr nela dr iohecgrpiahe.
Ddelnss.eelaishaner” cot AA
----
iter 9309900, loss: 0.000086
----
 piahe.
Ddelnss.eelaishaner” cot AAfhB ht ltny
ehbih a”on bhnte ectrsnae abeahngy
amo k ns aeo?cdse nh a taei.rairrhelardr nol e iohahenasenese hbea bhnte ectrsnae abeahngy
amo k ns aeo?cdse nh a t
----
iter 9310000, loss: 0.000086
----
 er” cot AAfhB ht ltny
ehbih a”on bhnte ectrsnae abeahngy
amo k ns aeo?cdse nh a taei.rairrhelardr nela hamnaI ayio pe e h’e btentmuhgnhi beahe
Ddabealohe bee amoi bcgdltt. gey heho grpiahe.
Ddeln
----
iter 9310100, loss: 0.000086
----
 bih a”on bhnte ectrsnae abeahngy
amo k ns aeo?cdse nh a taei.rairrhelardr nol gyio pe e h’e btentmuhgehi bcgdltt. gey heho grpiahe.
Ddelnss.eelaishaner” cot AAfhB ht ltny
ehbih a”on bhnte ectrsnae
----
iter 9310200, loss: 0.000086
----
 beahngy
amo k ns aeo?cdse nh a taei.rairrhelardr ntlhnegnns. e
amo k ns aeh?cdse nh a taei.rairrhelardr nol e iohehengrpiahe.
Ddelnss.eelaishaner” cot AAfhB ht ltny
ehbih a”on bhnte ectrsnae abeah
----
iter 9310300, loss: 0.000086
----
 e nh a taei.rairrhelardr nol’e btentmuhgehi gcdslatha arenbggcodaeta tehr he ni.rhelaney gehnha e ar i ho  bee amote ectrsnae abeahngy
amo k ns aeo?cdse nh a taei.rairrhelardr nol nyio chge heiohecgr
----
iter 9310400, loss: 0.000086
----
 jCTCnhoofeoxelif edElobe negnk e iohehasenoldndAmdaI ayio pe e h’e btentmuhgehi bcgdltt. gey heho grpiahe.
Ddelnss.eelaishaner” cot AAfhB ht ltny
ehbih a”on bhnte ectrsnae abeahngy
amo k ns aeo?cds
----
iter 9310500, loss: 0.000086
----
 negnk e iohehasenoldndAmdaI ayio pe e h’e btentmuhgehi bcgdltt. gey heho grpiahe.
Ddelnss.eelaishaner” cot AAfhB ht ltny
ehbih a”on bhnte ectrsnae abeahngy
amo k ns aeo?cdse nh a taei.rairrhelardr
----
iter 9310600, loss: 0.000086
----
 aI ayio pe e h’e btentmuhgehi bcgdltt. gey heho grpiahe.
Ddelnss.eelaishaner” cot AAfhB ht ltny
ehbih a”on bhnte ectrsnae abeahngy
amo k ns aeo?cdse nh a taei.rairrhelardr nelardae abeahngy
amo k
----
iter 9310700, loss: 0.000086
----
 gehi bcgdltt. gey heho grpiahe.
Ddelnss.eelaishaner” cot AAfhB ht ltny
ehbih a”on bhnte ectrsnae abeahngy
amo k ns aeo?cdse nh a taei.rairrhelardr ntl negnk t hi rsnse nhk br ne” a naeiarairr elirs
----
iter 9310800, loss: 0.000086
----
 piahe.
Ddelnss.eelaishaner” cot AAfhB ht ltny
ehbih a”on bhnte ectrsnae abeahngy
amo k ns aeo?cdse nh a taei.rairrhelardr nelardaenabeahngelareierhi. aif edElobe negrcih gey gey heho grpiahe.
Ddel
----

[컴] git rebase 설명

git rebase / 깃 리베이스 /

git rebase

git rebase 는 아래처럼 설명이 된다.

Git - Rebase하기

실제로 일어나는 일을 설명하자면 일단 두 브랜치가 나뉘기 전인 공통 커밋으로 이동하고 나서

그 커밋부터 지금 Checkout한 브랜치가 가리키는 커밋까지 diff를 차례로 만들어

어딘가에 임시로 저장해 놓는다.

Rebase할 브랜치(역주 - experiment)가 합칠 브랜치(역주 - master)가 가리키는 커밋을 가리키게 하고

아까 저장해 놓았던 변경사항을 차례대로 적용한다.

그림 3-29는 이러한 과정을 나타내고 있다.

다른 vcs 와 연관지어서...

이보다 이해를 돕기 위해 좀 더 근원적인 설명을 해보려 한다.

일단 rebase 의 의미에서 시작하자. rebase 라는 용어는 git 에서 처음사용된 것이 아니다. 이전의 version control software 에서 존재하던 용어이다.

그냥 단순하게 해석하면 "base 를 다시(re)" 만드는 것이다.

그럼 base 가 무엇을 이야기 하는 것일까. 그것은 현재 내가 작업하고 있는 source 다. 그래서 rebase 를 한다는 것은 현재 내가 base 로 사용하던 source 를 새로 다시 받는 것(rebase)이라고 보면 된다.

이것이 결국 git 에서도 같은 방식으로 동작한다.

git rebase 를 하면 현재 내가 branch 를 받은 시점, 즉 내 작업들이 적용되기 전,(commit 을 하기전) 으로 돌아가서, 나의 base 를 새로운 내용이 적용된 base 로 변경한다. 그리고 내가 작업한 내용(commit) 을 적용하게 되는 것이다.

git rebase 는 여기에 더해 훨씬 다양한 기능을 제공한다. 자세한 내용은 git rebase man page 를 확인하자.

[컴][js] redux-saga 동작 분석

redux-saga 동작 분석

source from : https://github.com/redux-saga/redux-saga-beginner-tutorial

위의 예제로 분석할 것이다.

// saga.js
import { put, takeEvery, all  } from 'redux-saga/effects'

const delay = (ms) => new Promise(res => setTimeout(res, ms))


export function* helloSaga() {
  console.log('Hello Sagas!')
}


// ...

// Our worker Saga: will perform the async increment task
export function* incrementAsync() {
  yield delay(1000)
  yield put({ type: 'INCREMENT' })
}

// Our watcher Saga: spawn a new incrementAsync task on each INCREMENT_ASYNC
export function* watchIncrementAsync() {
  yield takeEvery('INCREMENT_ASYNC', incrementAsync)
}



// notice how we now only export the rootSaga
// single entry point to start all Sagas at once
export default function* rootSaga() {
  yield all([
    helloSaga(),
    watchIncrementAsync()
  ])
}

// main.js
import "babel-polyfill"

import React from 'react'
import ReactDOM from 'react-dom'
import { createStore, applyMiddleware } from 'redux'

import Counter from './Counter'
import reducer from './reducers'
//// default export 인 sagaMiddlewareFactory 를 createSagaMiddleware 에 assign 한 것
import createSagaMiddleware from 'redux-saga'
// import { rootSaga } from './sagas'
import rootSaga from './sagas'

// const store = createStore(reducer)
const sagaMiddleware = createSagaMiddleware()
const store = createStore(
  reducer,
  applyMiddleware(sagaMiddleware)
)
sagaMiddleware.run(rootSaga)

const action = type => store.dispatch({type})

function render() {
  ReactDOM.render(
    <Counter
      value={store.getState()}
      onIncrement={() => action('INCREMENT')}
      onDecrement={() => action('DECREMENT')}
      onIncrementAsync={() => action('INCREMENT_ASYNC')} />,
    document.getElementById('root')
  )
}

render()
store.subscribe(render)

createSagaMiddleware()

createSagaMiddleware() --> sagaMiddlewareFactory() 가 호출된다. sagaMiddlewareFactory 를 확인해보면 대체로 이전버전의 사용법에 대한 error 를 던져주기 위한 용도인듯 하다. 그외에는 sagaMiddleware 라는 function 을 만들어서 return 해준다.

sagaMiddleware() 를 호출할 때 runSaga 에 첫번째 argument 를 assign 해준다.(이것이 boundRunSaga) 그리고 sagaMiddleware.run 을 하면, 이 boundRunSaga 를 호출한다.

createStore() / applyMiddleware()

applyMiddleware() 는 단순하다. 기존의 createStore() 를 실행하고, 그 이외에 다른 동작을 하기 위한 함수라고 보면 된다. python 의 decorator 같은 역할이다.

물론 로직은 좀 더 복잡하게 짜여있다. applyMiddleware() 를 직접실행시키는 것이 아니라createStore 내에서 applyMiddleware() 가 decorator 처럼 동작해야 해서 createStore 내에서 다시 호출(call)한다. 그런 이유로 applyMiddleware() 가 function 을 return 한다. 말이 복잡하다. source code 를 확인하자.

export default function applyMiddleware(...middlewares) {
  return createStore => (...args) => { // pass the createStore as an argument
    const store = createStore(...args)
    let dispatch = () => {
      throw new Error(
        'Dispatching while constructing your middleware is not allowed. ' +
          'Other middleware would not be applied to this dispatch.'
      )
    }

    const middlewareAPI = {
      getState: store.getState,
      dispatch: (...args) => dispatch(...args)
    }
   // 기존의 dispatch 에 middleware 를 추가해서 dispatch 를 새롭게 만든다.
    const chain = middlewares.map(middleware => middleware(middlewareAPI))
    dispatch = compose(...chain)(store.dispatch)

    return {
      ...store,
      dispatch
    }
  }
}
...
export default function createStore(reducer, preloadedState, enhancer) {
  ...
  if (typeof enhancer !== 'undefined') {
    if (typeof enhancer !== 'function') {
      throw new Error('Expected the enhancer to be a function.')
    }

    return enhancer(createStore)(reducer, preloadedState)
  }
  ...
}

sagaMiddleware.run(rootSaga)

sagaMiddleware.run(rootSaga) 를 하면 runSaga() 가 호출된다. runSaga() 에서는 필요한 설정들에 대한 값들을 set 하고(sagaMonitor 같은) proc() 를 호출한다.(proc.js)

proc(env, iterator, context, effectId, getMetaInfo(saga), /* isRoot */ true, noop)

proc 에서 stdChannel 를 생성하는데, 이녀석은 약간의 경고문구만 추가한 eventChannel 이다. 이녀석은 createSagaMiddleware 를 할 때 생성된다.

proc 에서 next() 를 호출하고, next 는 iterator.next() 를 호출한다. iterator 를 이용해서 all() 로 묶인 task 들을 한번에 처리한다.

// saga.js
export default function* rootSaga() {
  yield all([
    helloSaga(),
    watchIncrementAsync()
  ])
}
...

// runSaga.js
function runSaga(_ref, saga) {
  ...

  var iterator = saga.apply(void 0, args);
  ...
  return immediately(function () {
    var task = proc(env, iterator, context, effectId, __chunk_1.getMetaInfo(saga),
    /* isRoot */
    true, __chunk_1.noop);

    if (sagaMonitor) {
      sagaMonitor.effectResolved(effectId, task);
    }

    return task;
  });
}

function proc(env, iterator, parentContext, parentEffectId, meta, isRoot, cont) {
  if (iterator[__chunk_1.asyncIteratorSymbol]) {
    throw new Error("redux-saga doesn't support async generators, please use only regular ones");
  }
  ...
  next(); // then return the task descriptor to the caller
  
  return task;
  /**
   * This is the generator driver
   * It's a recursive async/continuation function which calls itself
   * until the generator terminates or throws
   * @param {internal commands(TASK_CANCEL | TERMINATE) | any} arg - value, generator will be resumed with.
   * @param {boolean} isErr - the flag shows if effect finished with an error
   *
   * receives either (command | effect result, false) or (any thrown thing, true)
   */

  function next(arg, isErr) {
    try {
      var result;
      
      if (isErr) {
        result = iterator.throw(arg); // user handled the error, we can clear bookkept values
        clear();
      } else if (__chunk_1.shouldCancel(arg)) {
        /**
          getting TASK_CANCEL automatically cancels the main task
          We can get this value here
           - By cancelling the parent task manually
          - By joining a Cancelled task
        **/
        ...
      } else if (__chunk_1.shouldTerminate(arg)) {
        // We get TERMINATE flag, i.e. by taking from a channel that ended using `take` (and not `takem` used to trap End of channels)
        ...
      } else {
        result = iterator.next(arg);
      }

      ...
    } catch (error) {
      if (mainTask.status === CANCELLED) {
        throw error;
      }

      mainTask.status = ABORTED;
      mainTask.cont(error, true);
    }
  }
  ...
}

redux-saga 사용

redux-saga 는 특정 이벤트가 끝날때에 대한 작업을 미리 등록해 놓을 수 있다.(saga.js)

그래서 원래라면,

특정 이벤트가 발생하고,
그 작업을 수행한 후(event handler)에
우리가 원하는 작업을 실행하기 위해서 우리의 code를 추가해야 한다.(callback)

하지만 redux-saga 는 그부분의 대한 구현에 관계없이 구현을 하면 된다.

redux-saga 를 이용해서 event 가 끝나는 시점에 호출되는 대신에,

먼저 특정 event 를 기다리는 code 를 실행하고
event 가 발생할 때 까지 control 을 다른 곳에 넘겨준다. 그리고 event 가 발생하면 다시 control 을 받아온다.(yield)

  yield takeEvery('INCREMENT_ASYNC', incrementAsync)

간단히 이야기 하면, await/async 를 사용하게 해주는 것이라 보면 된다.

아래 글을 읽으면 자세한 이야기를 알 수 있다.

https://redux-saga.js.org/docs/advanced/FutureActions.html

[컴][NodeJs] CPU Intensive job 을 Node.js 에서 처리할 때

노드js nodejs 에서 싱글 쓰레드 / single thread job / cpu 사용량이 큰 작업 처리 방법 / node multi / nodejs / multi / multi thread / multithread

CPU Intensive job 을 Node.js 에서 처리할 때

node js - single thread

기본적으로 single thread 로 알려져 있는데, 이것은 약간 이해를 잘 못 할 수 있어서 간단한 설명을 하자면, ref. 2 에서 여기 를 보면 node js 의 구성을 살짝 엿볼 수 있는데, libeio 를 통해 thread pool 을 만들어서 사용한다. (ref. 3)

Node.js Event Loop

그렇기 때문에 이것이 single threaded 인 것은 user 의 source code 를 run 하는 부분을 이야기 한다. 아래 코드를 예로 들면 user 의 code 부분은 하나의 thread 에서 동작한다. 하지만 fs.readFile 같은 부분은 Thread Pool 로 던져질 것이다. 그리고 response 가 오기까지 또 이 single thread 는 console.log('right after') 부분으로 처리할 수 있게 되는 것이다.

var fs = require('fs');  
fs.readFile(‘/files/help.txt’, function(err, buf) {  
    console.log('done');
});
console.log('right after');

from : ref. 1

cpu intensive job

그러므로 nodejs 로 만든 server 들은 기본적으로 single thread 로 user request 를 처리하게 된다. 하지만 만약 request 하나에서 cpu intensive 한 작업을 처리하는 경우가 있는 request 에 대해서는 nodejs 에서 처리가능한 request 양이 급격히 줄어들 수 밖에 없다. 이런 문제점에 대한 해결책에 대한 이야기가 아래 글들에 담겨있다.

대략적인 내용은 cluster 를 사용해서 request 마다 process 를 하나 더 만들어서 cpu intensive job 을 그쪽으로 넘기는 것인데, 문제는 이것이 너무 많은 memory 를 할당해야 한다는 것이다. 그래서 cluster 와 queue(kue library) 를 이용해서 일종의 process pool 을 만들어서 사용하는 해결책을 제시한다.

여담으로, 이 방법들을 살펴보고 있으려니까 아주 오래전에 linux 에서 c 로 server programming 을 하던 때의 생각이 난다.

node js http server timeout

What is the default timeout of a req in express? · Issue #3330 · expressjs/express · GitHub

nodejs v6.x 에서 일단 기본 timeout 이 2 분으로 설정되어 있다고 한다. 즉, request 를 보낸지 2분후 까지 response 가 없으면 error 가 된다. 이 timeout 내에 더 많은 request 를 처리하기 위해서라도 "cluster 의 사용"은 필요할 듯 하다.

References

Introduction to NodeJS, A SSJS: Part II - EventLoop Explained
https://youtu.be/L0pjVcIsU6A?t=366

동영상에서는 Node.js 의 철학? 을 알 수 있다.

javascript - Nodejs Event Loop - Stack Overflow
Cluster | Node.js v12.3.1 Documentation

[컴][웹][js] v8 Scanner

from: https://v8.dev/blog/scanner

v8 Scanner

v8 이 소스를 파싱해서 AST 로 만든다.abstract syntax tree (AST)
이걸로 프로그램 구조를 표현한다.

v8은 complile 해야 프로그램을 실행할 수 있다.
v8 parser 은 scanner 에서 만들어놓은 token 을 소모한다.

이때 scanner 가 사용하는 character stream의 encoding 은 utf16만 지원한다.
scanner 에 input 으로 제공하기 전에 이전에 모든 encoding 을 utf16으로 convert 하는 과정을 거친다.

scanner는 이 utf16을 decode 해서 unicode 로 만들어서 사용한다.

Scanner::Advance()

이 decoded character 들을 이제 가져다 scanner 가 token 을 만든다.

Scanner::ScanString()

Whitespace

2개의 token 사이에 whitespace 가 있으면 거기에 ;(semicolon)을 넣는다.(ECMAScript® 2020 Language Specification > Auto semicolon insertion)

다음 token 을 scan 하기 전까지 모든 whitespace는 skip 된다.(newline 이 발생하는지 추적하면서)

real world 에서는 대체로 js 들이 minified 돼서 여러개의 whitespace 가 연속해서 나타나는 경우가 드물다.

그래서 V8 은 whitespace 도 하나의 token 으로 취급한다. (Token::WHITESPACE) 그래서 '//' 같이 comment 가 오는 경우에도 Token::WHITESPACE 를 return 해 버린다. 그러면 paring 하는 loop 에서 다른 Token이 나올때 까지 계속 scan 을 이어나갈 수 있게 된다.

(역주: 다시 말하면, // 가 나와서 이것을 다른 token 으로 정의(Token::COMMENT) 등으로 정의해서 사용하게 되면, loop 을 한번 벗어난 후에 다시 parsing 을 시작해야 하지만, 이것을 whitespace 와 같은 속성으로 취급해 버리면, 이 overhead 를 없앨 수 있게 되는 것이다.)

Scanner::ScanSingleToken()

V8_INLINE Token::Value Scanner::ScanSingleToken() {
  Token::Value token;
  do {
    next().location.beg_pos = source_pos();

    if (V8_LIKELY(static_cast<unsigned>(c0_) <= kMaxAscii)) {
      token = one_char_tokens[c0_];

      switch (token) {
        ...
        
        case Token::DIV:
          // /  // /* /=
          Advance();
          if (c0_ == '/') {
            uc32 c = Peek();
            if (c == '#' || c == '@') {
              Advance();
              Advance();
              token = SkipSourceURLComment();
              continue;
            }
            token = SkipSingleLineComment();
            continue;
          }
          if (c0_ == '*') {
            token = SkipMultiLineComment();
            continue;
          }
          if (c0_ == '=') return Select(Token::ASSIGN_DIV);
          return Token::DIV;
          
        ...

        case Token::WHITESPACE:
          token = SkipWhiteSpace();
          continue;

        case Token::NUMBER:
          return ScanNumber(false);

        case Token::IDENTIFIER:
          return ScanIdentifierOrKeyword();

        default:
          UNREACHABLE();
      }
    }

    if (IsIdentifierStart(c0_) ||
        (CombineSurrogatePair() && IsIdentifierStart(c0_))) {
      return ScanIdentifierOrKeyword();
    }
    if (c0_ == kEndOfInput) {
      return source_->has_parser_error() ? Token::ILLEGAL : Token::EOS;
    }
    token = SkipWhiteSpace();

    // Continue scanning for tokens as long as we're just skipping whitespace.
  } while (token == Token::WHITESPACE);

  return token;
}

Identifier scanning
Internalizing minified identifiers
Keywords
Surrogate pairs
AdvanceUntil

썬더클랩(Thunderclap)

이 취약점은 물리적인 접근을 한 공격자에게 Thunderbolt(썬더볼트) 포트를 이용해서 해당기계를 위험하게 한다.
공격자가 가장높은 권한으로 임의의 코드를 실행시킬 수 있게 해준다. 그래서 잠재적으로 암호, 은행 로그인, 암화화 키, 개인파일등에 접근할 수 있게 해준다.
또한, 공격자가 이 취약점을 이용하면 겉으로 무해해 보이는 충전기, 프로젝터 같은 주변기기가 충전이나, 영상을 쏘아 주는 사이에 "host 머신"을 위험하게 할 수 있다.(이러면, 우리가 제3자의 충전기도 마음대로 사용하기 힘들어진다. 마치 공용 wifi 같이)

동작

네트워크 카드의 동작

OS 가 network interface card(NIC) 에 packet 을 보내라고 요청할 때
OS 는 NIC에 보낼 data 에 대한 address 를 제공해 준다.
그러면 NIC 의 payload 함수는 plaintext data를 찾으려고 memory 근처를 검색할 것이다.

Thunderclap 방법

When an IOMMU is in operation, the most obvious way touse it for protection requires some changes to ring buffer usage.

First, packet data must be allocated from a pool of physical memory that allows exposure to devices (some memory maybe inaccessible due to hardware limitations).
Second, before a data block is placed in the ring buffer for transmission, a window must be opened for it to be accessible by the device. This involves creating a mapping for the block in the IOMMU page table.
Third, the address written into the ring buffer is now the I/O virtual address of the mapping, rather than the physical address.
Finally, when the device is finished with the data, the operating system should close the window again, revoking the mapping from the IOMMU page table and IOTLB.

How do the Thunderclap vulnerabilities work?

The Thunderclap vulnerabilities stem from the fact that computer peripherals such as network cards and GPUs have traditionally been trusted parts of a computer system: they have direct memory access (DMA), which allows them to read and write all of system memory without operating system oversight. DMA allows peripherals to bypass operating system security policies, and DMA attacks abusing this access have been widely employed by hackers and the intelligence community to take control of and exfiltrate sensitive data from target machines. This means passwords, banking logins, private files and browser activity are all exposed, and an attacker can inject any code they wish onto your machine.

Current systems feature input-output memory management units (IOMMUs), protection mechanisms that allow the operating system to restrict peripheral-device memory access. With IOMMU usage enabled, operating systems can protect against DMA attacks by restricting memory access to peripherals that perform legitimate functions and only allowing access to non-sensitive regions of memory. Unfortunately, IOMMU protection is turned off by default in many systems.

Our work leverages vulnerabilities in operating system IOMMU usage to compromise a target system via DMA, even in the presence of an IOMMU that is enabled and configured to defend against DMA attacks. The novel Thunderclap security evaluation platform, built on field-programmable gate array (FPGA) hardware, mimics the functionality of a legitimate peripheral device to convince a target operating system to grant it access to regions of memory. It then examines those regions of memory to find a rich and nuanced attack surface of vulnerable structures that can be exploited to take control of the system.

The rise of hardware interconnects like Thunderbolt 3 over USB-C that combine power input, video output, and peripheral device DMA over the same port greatly increases the real-world applicability of Thunderclap vulnerabilities. Thunderbolt can allow potentially malicious devices to hotplug into a running machine and obtain direct memory access, which makes DMA attacks against temporarily unattended targets feasible. Furthermore, the confusion of power, video, and DMA facilitates the creation of malicious charging stations or projectors that take control of connected machines.

Additionally, our work shows that the Thunderclap vulnerabilities can also be exploited by compromised firmware on existing PCI Express devices, for example network cards or baseboard management controllers (BMCs) integrated into servers. A firmware compromise might be introduced via a firmware vulnerability or a compromise in the device supply chain or factory.

How does this work differ from earlier DMA attacks such as Inception?

Early DMA attacks relied on the absence of an IOMMU. They involved scanning all of a system’s memory for sensitive data from devices that did not appear to the system as legitimate peripherals. These attacks were addressed by the introduction of IOMMUs, which block all memory access from unrecognized devices.

Some previous DMA attacks have taken advantage of weaknesses in IOMMU configuration or setup to disable IOMMU protections. Thunderclap explores serious vulnerabilities that are present even once the IOMMU is configured correctly.

Technical details of the Thunderclap platform

The Thunderclap platform consists of an FPGA that runs the Thunderclap application. The FPGA then plugs into a computer via PCI Express or Thunderbolt. The Thunderclap application makes the FPGA behave to the computer like a genuine Ethernet card (the Intel 82574L network interface card or NIC). The operating system will identify the ethernet peripheral, load drivers, allow the device to access memory (via DMA and an IOMMU if enabled), and ask it to send and receive packets.

With this deep interaction with the operating system, Thunderclap’s device model provides hooks that allow payload functions to be added to device behavior. For example, when the operating system asks the NIC to send a packet, it provides the NIC with the address of the data to send. A payload function might search nearby memory looking for plaintext data that was intended for a different network device.

The Thunderclap application runs on Intel/Altera FPGA boards:

Intel Arria 10 SoC Development Kit ($4500) with Samtec HDR-181157-01-PCIEC cable (available from Samtec direct) - currently recommended
Enclustra Mercury+ AA1 module (ME-AA1-270-3E4-D11E) on PE1 carrier board (~EUR 800) - work in progress
Terasic DE5-Net board (Stratix V) with BERI soft-CPU - no longer supported
As far as we can ascertain, Xilinx, Lattice and Intel Cyclone FPGAs don’t allow us to replace the vendor-supplied implementation of configuration registers with our own (Intel calls it ‘config bypass’ mode) which we require.

It is composed of several pieces:

The underlying FPGA bitfile, containing the hardware that receives PCIe packets (TLPs) and delivers them to software. The FPGA contains an Arm Cortex A9 CPU (hard processor system or HPS) to run our software stack. GitHub repo
The Ubuntu 16.04 operating system running on the on the Arm, including kernel, device tree and u-boot bootloader (which also loads the FPGA bitfile at boot time). Automated build scripts (work in progress): GitHub repo
The Thunderclap application, which is a substantially cut down version of QEMU, based on its e1000e device. This runs in Ubuntu on the ARM core and connects directly to the PCIe queues provided by the hardware. GitHub repo

[컴] V8 의 메모리 관련, Garbage Collector GC

가비지 컬렉터 ,

V8 의 메모리 관련 글

Trash talk: the Orinoco garbage collector · V8, 2019.01.03 : V8의 GC : Orinoco garbage collector
https://developers.google.com/web/tools/chrome-devtools/memory-problems/memory-101 : V8 의 메모리관련 정보
High-performance garbage collection for C++ · V8

브라우저의 대부분과 Blink rendering 은 c++ 로 되어 있다.
v8은 Blink 안에 들어가 있다.
Oilpan 은 c++ memory 를 관리하기 위해 만들어진 garbage collector 이다. c++ 로 되어 있다.
이 c++ memory 는 cross-component tracing 을 이용해서 v8 과 연결될 수 있다.
이 cross-component tracing 은 얽혀있는 c++/javascript object graph 를 하나의 heap 처럼 다룬다.

iOS 보안-시스템 보안

기기암호화(device encryption) 는 기본적으로 설정되어 있으며, 설정을 변경할 수 없다.

Secure Boot Chain

iOS 켜기
--> AP 가 Boot ROM 의 code 를 실행

the hardware root of trust 라고 알려져 있다.
the chain of trust 의 첫번째 단계(first step) 이다.
이 code 는 변경불가능(immutable code)
칩제조(chip fabrication) 때 만들어 진다.
implicitly trusted
Apple Root CA public key 를 가지고 있다.

이 public key 는 iBoot bootloader 가 load 되기 전에 Apple 에 의해 서명(sign) 됐다는 것을 확인(verify) 하는데 사용된다.(아래 boot-time verification 참고.)

Boot ROM 에 의해 추가적인 Low-Level Bootloader(LLB) stage 가 load 되고, 확인(verify) 된다.

A9를 포함한 A9 이전의 A시리즈 processor 에만 존재.

만약 Boot ROM 이 LLB 또는 iBoot 을 load 하는 것을 실패하면, 기기는 DFU mode (Device Firmware Upgrade)로 들어가게 된다.

오래된 기기들에서 LLB 를 load 하는 것을 실패 하거나,
새로운 기기들에서 iBoot 을 load 하는 것을 실패하는 것

LLB 나 iBoot 이 다음 작업(next step) 을 load 하거나 verify 하는 것을 실패하는 경우

startup 은 멈춰지고(halt) iTunes 에 접속하라는 화면이 보여진다.
--> recovery mode 이다.

DFU mode 로 가거나, recovery mode 로 들어가면, USB 를 이용해서 iTunes 에 접속해서 "공장 기본 세팅(factory default settings)" 로 복구되어 져야만 한다.
기기가 Recovery Mode 와 DFU Mode 로 들어가기 전에 The Boot Progress Register (BPR) 가 update 된다. 이 update 된 값을 보고 Secure Enclave 가 user data 에 대한 접근제한을 하게 된다.

Recovery Mode

A10, S2 그리고 최신의 SoCs 를 가진 기기에서 iBoot 에 의해 BPR 이 set.

DFU Mode : A12 SoC 를 가진 기기에서 Boot ROM 에 의해 BPR 이 set.

--> iBoot 이 일들을 마치면
--> iBoot 은 iOS kernel 을 확인(verify) 하고, 실행한다.

baseband subsystem 또한 이것과 비슷한 자신의 secure booting 을 이용한다.
Secure Enclave coprocessor 또한 secure boot process 를 이용하고, 그로인해 그것의 분리되 소프트웨어가 verified and signed by Apple 되었다는 것을 확신시켜준다.

secure boot

시스템 소프트웨어 권한부여(System Software Authorization)

만약 과거버전의 iOS의 설치가 가능하다면 해커는 과거버전의 iOS를 깔고 권한을 가져올 수 있다. 이것을 막기 위해서 iOS는 System Software Authorization 을 이용한다. Secure Enclave 도 이것을 이용한다.

iOS update

iOS update 는 2가지 방식(iTunes, OTA) 으로 가능하다.

iTunes 로 업데이트를 할 때

iOS image 전체를 download
Content Caching 옵션을 켠 macOS High Sierra 를 돌리는(run) Mac 을 사용한다면, 미리 cache 되어서, iOS 의 update 를 바로 할 수 있게 해준다.

OTA 로 할 때: patch 를 할 내용만 다운로드한다.

installation authorization server

iOS update 할때 기기는 Apple 의 installation authorization server 에 접근한다.
그리고 다음 3가지를 확인 한다.

bundle의 measurements: 설치될 installation bundle들에 대한 암호화된 measurements(측정)

installation bundle : iBoot, kernel, iOS Image

nonce : signed data 를 가져다 다른 기기에 쓰는 것을 막아주고, system software 를 변경하지 못하게 막는다.
ECID: 기기의 고유한 ECID(Exclusive Chip Identification) 를 서버로 보내게 된다.

installation authorization server 에서 하는일

서버에서는 이 installation bundle 의 버전들이 맞는 구성인지를 확인한다. 예를 들면 특정버전의 iBoot 은 특정 version 의 kernel 을 써야만 하는식이다. 이렇게 버전이 맞으면 이 measurement 에 ECID 를 더해서 sign 을 하고 이 sign 된 data 를 기기에 보내준다.

sign(measurment + ECID) ---> device

boot-time verification

위에 설명한 startup process 가 "Apple 이 sign 한 code" 만 기기에 설치되는 것을 보장해 준다.

Apple 의 signature : boot time 에 일어나는 chain-of-trust evaluation 은 Apple 의 signature 를 verify 한다.
disk 에서 load 되는 item 의 measurement: "disk 에서 load 되는 item 의 measurement (기기의 ECID를 이용해서 만들어져 있다.)"와 "signature 에 의해 감싸진 measurement(위의 installation authorization server 에서 받은)"가 맞는지를 verify 한다.

이 절차들이 특정기기에 대한 authorization 을 해주고, 기기에서 옛 iOS버전이 다른 곳으로 copy 안되도록 한다.

Secure Enclave

Secure Enclave 는 보조 프로세서이다.
SoC 안에 들어있다.(fabricated)
암호화된 메모리를 사용
하드웨어 난수발생기(random number generator) 를 포함.
Secure Enclave Boot ROM

application processor Boot ROM 과 비슷하게, Secure Enclave Boot ROM 은 변경이 불가능한 code (immutable code) 이다. Secure Enclave 를 위한 "하드웨어 root of trust" 를 확립한다.(establish)
device 가 시작될 때, Secure Enclave Boot ROM 이 수명이 짧은 메모리 보호 키(ephemeral memory protection key)를 만든다.

Secure Enclave 는 데이터 보호 키 관리에게 (Data Protection key management) 모든 암호화 기능(cryptographic operations)들을 제공
심지어 커널이 허락한 상태에도(kernel has been compromised) Data Protection 의 integrity (데이터가 변경, 파괴되지 않은 상태)을 유지.
AP(Application processor) 와 Secure Enclave 사이의 통신은 interrupt-driven mailbox 와 shared memory data buffer 들과 분리되어 있다.
Secure Enclave OS 를 실행

Secure Enclave OS

"L4 마이크로 커널의 Apple-customized version" 에 기초해서 만들어진 OS
Apple 에 의해 sign 된다.
Secure Enclave Boot ROM 가 verify
개인의 소프트웨어 업데이트 프로세스를 통해 update 된다.

device 가 시작될 때, Secure Enclave Boot ROM 이 수명이 짧은 메모리 보호 키(ephemeral memory protection key)를 만든다.

이 키(memory protection key)는 기기의 UID 와 얽히게 만들고(entangle), 기기의 memory space 의 "Secure Enclave 부분"을 암호화 할 때 사용된다.
예외적으로 Apple 의 A7 에선, Secure Enclave memory "memory protection key" 로도 인증(authenticated) 이 가능하다. A11과 A11 이후 그리고 S4 SoC 들에서는 "memory protection key" 와 on-chip SRAM 에 저장된 nonce 들에 의해 인증된다.(authenticated)
그리고, A11과 A11 이후 그리고 S4 SoC 들에서 integrity tree 는 보안이 중요한 Secure Enclave memory 의 replay 를 막기위해 사용되어진다.

Secure Enclave 에 의해 file system 에 저장된 data 는 UID 와 entangled 된 key와 anti-replay counter 로 암호화되어 진다. anti-replay counter 는 dedicated 비휘발성 메모리 IC 안에 저장되어 있다.

A12 와 S4 SoC 들이 있는 기기들에서, Secure Enclave 는 anti-replay counter storage 를 위해 secure storage IC 와 짝을 이룬다. secure storage IC 는 immutable ROM code, 하드웨어 random number generator, 암호화엔진들, physical tamper detection(물리적인 조작 감지) 과 함께 디자인됐다. counter 들을 읽고 업데이트 하기 위해서, Secure Enclave 와 storage IC 는 counter 들에게 exclusive access 를 보장하는 '안전한 protocol '를 사용한다.

Secure Enclave 의 "anti-replay 서비스"들은 event 에 사용된 data 의 철회(revocation of data over events)를 위해 사용되어진다.

이 이벤트들은 anti-replay 의 경계들(boundaries)을 표시 해 준다.
이벤트는 다음것들을 포함한다. 하지만 제한되진 않는다.

Passcode change
Touch ID or Face ID enable/disable
Fingerprint add/delete
Face ID reset
Apple Pay card add/remove
Erase All Content and Settings

Secure Enclave 는 또한 지문을 처리하고, Touch ID 와 Face ID 센서들로 부터 오는 얼굴데이터를 처리하는 책임도 진다. 그리고 맞는지 여부를 결정하고, 그리고나서, 유저를 대신해서 access 또는 구매를 가능하게 한다.

Reference

https://www.apple.com/business/site/docs/iOS_Security_Guide.pdf

[컴][웹] AdoniJS at Start up

아도니스 제이에스 / js

Adonisjs at startup

app.js

_loadPreLoadFiles 이전에 this._registerProviders()와 await this._bootProviders()를 호출한다. 이 때 app.js 에 접근해서 Providers 를 가져오고, 이 Provider 의 register() 와 boot() 을 호출한다.

preloaded files

아도니스js 에서 처음에 load 하는 file 들은 다음과 같다.(_loadPreLoadFiles)

'start/routes',
'start/events',
'start/socket',
'start/kernel',
'start/wsKernel'

이 부분은 Ignitor source 에서 확인할 수 있다.

new Ignitor(require('@adonisjs/fold'))
  .appRoot(__dirname)
  .fireHttpServer()
  .catch(console.error)



class Ignitor {
  constructor (fold) {
    this._fold = fold
    this._appRoot = null
    this._modulesRoot = null
    this._loadCommands = false

    /**
     * Files to be preloaded
     *
     * @type {Array}
     */
    this._preLoadFiles = [
      'start/routes',
      'start/events',
      'start/socket',
      'start/kernel',
      'start/wsKernel'
    ]

    /**
     * Default app file
     *
     * @type {String}
     */
    this._appFile = 'start/app.js'
    ...
  }
  ...
  async fire () {
   ...
   this._registerProviders()
   await this._bootProviders()
   ...
   this._loadPreLoadFiles()
  }

  async fireHttpServer (httpServerCallback) {
    try {
      await this.fire()
      await this._startHttpServer(httpServerCallback)
    } catch (error) {
      this._printError(error)
    }
  }

  /**
   * Return the exported values from the appFile. Also
   * it will validate the exports object to have all
   * required keys.
   *
   * @method _getAppAttributes
   *
   * @return {Object}
   *
   * @private
   */
  _getAppAttributes () {
    return require(path.join(this._appRoot, this._appFile))
  }

  /**
   * Registers an array of providers to the Ioc container. This
   * method will make use of the `appFile` to get the providers
   * list.
   *
   * @method _registerProviders
   *
   * @return {void}
   *
   * @private
   */
  _registerProviders () {
    this._callHooks('before', 'providersRegistered')

    /**
     * Getting list of providers and registering them.
     */
    const { providers, aceProviders } = this._getAppAttributes()
    const providersToRegister = this._loadCommands ? providers.concat(aceProviders) : providers
    this._fold.registrar.providers(providersToRegister).register()

    debug('registered providers')
    this._callHooks('after', 'providersRegistered')
  }

  /**
   * Boot providers
   *
   * @method _bootProviders
   *
   * @return {void}
   *
   * @async
   *
   * @private
   */
  async _bootProviders () {
    this._callHooks('before', 'providersBooted')

    /**
     * The providers set set on `registrar` when they were registered. We
     * use the same set to boot the previously registered providers.
     */
    await this._fold.registrar.boot()

    debug('booted providers')
    this._callHooks('after', 'providersBooted')
  }

  ...
  _loadPreLoadFiles () {
    this._callHooks('before', 'preloading')
    ...

    this._preLoadFiles.forEach((file) => {
      const filePath = path.isAbsolute(file) ? file : path.join(this._appRoot, file)

      /**
       * Require file when it's not optional or when optional
       * file exists
       */
      if (!this._isOptional(file) || this._fileExists(filePath)) {
        require(filePath)
      }
    })

    this._callHooks('after', 'preloading')
  }
}

[컴][웹] 간단하게 주기적으로 message 를 수신하는 web page

EventSource / cors / jsonp / different host / subscribe model /

tornado

EventSource object | Javascript

EventSource object 로 server 에 연결(subscribe) 을 해서 주기적으로 event 를 받을 수 있다. 자세한 동작은 일단 생략한다. ref. 2 에서 어느정도 설명을 해준다.

different domain

domain 이 다른 곳에 event 를 날려주는 server 가 있어도 무리없이 사용할 수 있다. 보통 html(javascript) 에서 다른 domain 인 경우는 보안을 이유로 ajax 요청이 되지 않는다. 그래서 jsonp 등(일반적으로 Cross-Origin Resource Sharing 이라고 이야기하는 방법)을 이용한다.

하지만 이 EventSource 는 그런 것의 제약을 받지 않는다. [ref. 1]

아래처럼 다른 domain 인 경우에는 다른 domain 을 uri 로 적어주면 된다.

var evtSource = new EventSource("//api.example.com/ssedemo.php", { withCredentials: true } );

evtSource.addEventListener("ping", function(e) {
  var newElement = document.createElement("li");
 
  var obj = JSON.parse(e.data);
  newElement.innerHTML = "ping at " + obj.time;
  eventList.appendChild(newElement);
}, false);

이렇게 다른 domain 에 대한 제약이 없어서 좋은 점은 기존의 web server 와 관계없이 새로운 event 처리를 하는 server 를 사용할 수 있다는 점이다. 그러면 주기적인 push event 등을 처리하는 server 를 따로 둘 수 있다.

Server side event vs WebSocket

event driven page 를 만들 땐 SSE 를 사용해도 되고, WebSocket 을 사용해도 된다. 개인적으로는 필요한 부분은 단방향이라서, 양방향 통신이 되는 WebSocket 이 굳이 필요하지 않았다.

아래는 SSE 와 WebSocket 을 간략하게 정리했다.

SSE

2009년 4월 23 일 WHATWG 에서 승인했다. [ref. 7]
하지만 SSE 가 좀 더 간편하다.
그리고 http protocol 위에서 구현되었다.
단방향 통신
client 의 connection 이 종료되었는지 여부를 알 수는 없다.(http 라서 당연한 것일지도.)

WebSocket

반면에 WebSocket 은 Tcp/ip 로 다른 port 를 사용한다. 그래서 이 port 가 firewall 에 의해 막혀 있을 수도 있고,
이 녀석을 이용하려면, protocol 을 또 정의해야 한다. (ref. 5 에서 좀 더 다양한 의견을 확인할 수 있다.)
양방향 통신

from: https://usp.technology/specification/mtp/websocket/websocket-session-handshake.png

Reference

[컴][웹] V8 에서 JavaScript 의 pipeline

v8 engine 의 동작 / 동작원리 / 크롬 자바스크립트 엔진 / 크롬 엔진 / 크롬 렌더링 엔진 /

V8 에서 JavaScript 의 pipeline

v5.9 이전의 pipeline

from: https://v8.dev/blog/ignition-interpreter

원래는 baseline compiler(위의 그림에서 Ignition 과 Full codegen 의 위치에 있는 것이라 여기면 될 듯 하다.) 가 machine code 를 빠르게 만들고, 이 code 가 실행되는 동안에 이 code를 분석하고 일부를 optimizing compiler 가 optimized code 로 다시 compile 한다.

이 때 사용되는 2개의 optimizing compiler 가 crankshaft, turbofan 이다.

TurboFan : Ignition의 bytecode를 바로 최적화된 machine code 로 바꿀 수 있다.[ref. 4]
Crankshaft: 소스코드를 다시 컴파일을 해서 최적화된 machine code 를 만든다.[ref. 4]

참고로, v5.9 부터 Full-codegen 과 Crankshaft 는 사용하지 않는다.[ref. 4]

Ignition 의 등장

그런데 이 상황에서 Ignition 을 만들어서 "baseline compiler 가 machine code 를 만드는 것"을 대신해서 Bytecode 를 만들게 했다.

Ignition 는 처음에 모바일에서 사용하기 위해서 만들었다.[ref. 4] JIT 가 만든 machine code 의 size 가 커서 메모리를 너무 잡아먹었기 때문이다. Ignition 에 의해서 chrome tab 마다 메모리를 약 5% 정도 아꼈다.

Ignition 은 bytecode 를 생성하는 compiler 이고, 이녀석은 이전의 baseline compiler 를 대체한다. Ignition 이 bytecode 를 만들때도 당연히 최적화를 한다.

이 Ignition 은 register machine 이다. 이것은 stack machine 과 다르다. 이건 내 생각이지만, 안드로이드의 경험이 덕을 본듯 하다. (stack-based vs register-based)

Ignition 이 생기면서 고성능의 interpreter 가 생겼고 이 녀석이 Ignition 이 만든 bytecode 를 실행해주는데, 실제웹사이트에서 속도가 이전의 baseline compiler 가 만든 code 의 속도에 근접한다.

References

Ignition · V8 : ignition 에 대한 여러자세한 설명들의 link 들이 있다.
Firing up the Ignition interpreter · V8
Home · v8/v8 Wiki · GitHub
Launching Ignition and TurboFan · V8
TurboFan · V8 : turbofan 에 대한 정보들이 모여 있다.

ARM 의 license

license a specific processor core (e.g. Cortex A8, A9, A15)

license an ARM instruction set architecture (e.g. ARMv7 ISA)

Arm Flexible Access for Startups

A6, Exynos, Snapdragon

References

G1 Garbage Collector 1/2

G1 동작 overview

marking phase

reclaimable heap region

evacuation

G1 발자국

RSets

CSets

G1의 권장 사용 사례

See Also

Reference

Spinlock, Mutext lock

References

리눅스 boot process

Android

See Also

Reference

V8 관련 글들

파이토치에 대한 간략한 설명

파이토치 tutorial

See Also

Reference

오브젝트 스토리지(Object Storage)

Object storage System

오브젝트 스토리지와 블록 스토리지 간의 큰 차이

다른 백업 서비스 이용

블록 볼륨 백업

오브젝트 스토리지 백업

See Also

Reference

GSMA IMEI Database

GSMA IMEI DB 에 기록하는 정보

References

Recurrent Neural Networks, RNN

python 3 버전

결과

git rebase

다른 vcs 와 연관지어서...

redux-saga 동작 분석

createSagaMiddleware()

createStore() / applyMiddleware()

sagaMiddleware.run(rootSaga)

redux-saga 사용

CPU Intensive job 을 Node.js 에서 처리할 때

node js - single thread

cpu intensive job

node js http server timeout

See Also

References

v8 Scanner

Whitespace

See Also

only DNS over HTTPS (DoH)

DNS over TLS (DoT)

접속 Test

Tun2socks

Test

See Also

썬더클랩(Thunderclap)

동작

네트워크 카드의 동작

Thunderclap 방법

How do the Thunderclap vulnerabilities work?

How does this work differ from earlier DMA attacks such as Inception?

Technical details of the Thunderclap platform

V8 의 메모리 관련 글

See Also

iOS 보안-시스템 보안

Secure Boot Chain

시스템 소프트웨어 권한부여(System Software Authorization)

iOS update

installation authorization server

installation authorization server 에서 하는일

boot-time verification