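Distributed transaction

A distributed transaction is a set of data operations performed across two or more data stores, typically databases.
It is usually coordinated across separate nodes connected by a network, but it can also span multiple databases on a single server.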

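Characteristics:

ACID compliance: guarantees atomicity, consistency, isolation, and durability.
Consistency: every participating database must end up reflecting the same, up-to-date state.
Termination guarantee: a distributed transaction must execute completely or not at all.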

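Advantages:

Guaranteed data consistency
Support for complex business processes
Improved system reliability

Disadvantages:

Possible performance overhead
Complexity of implementation and operation
Possible performance degradation caused by network latency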

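Approaches to distributed transaction processing

Two-Phase Commit Protocol (2PC)

The most widely used approach, consisting of two phases:

Prepare phase: the transaction coordinator sends a prepare-to-commit request to every participating node.
Commit phase: if every node is prepared, the coordinator sends a commit request; otherwise it requests a rollback.

Its strength is that every participant reaches the same commit-or-abort decision. Its weakness is that the coordinator must wait for every node to respond, so performance can suffer, and participants that have already voted stay blocked if the coordinator fails before announcing the outcome.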

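class TransactionCoordinator:
    def execute_transaction(self, nodes, transaction):
        # Phase 1 (prepare): ask every participant to vote
        prepare_responses = []
        for node in nodes:
            response = node.prepare(transaction)
            if response != "READY":
                # A single "no" vote aborts the transaction everywhere
                self.abort_transaction(nodes, transaction)
                return "Transaction Failed"
            prepare_responses.append(response)

        # Phase 2 (commit): reached only if every participant voted READY
        for node in nodes:
            node.commit(transaction)
        return "Transaction Committed"

    def abort_transaction(self, nodes, transaction):
        # Assumption: each node exposes rollback(transaction)
        for node in nodes:
            node.rollback(transaction)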
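
To see both phases run end to end, the sketch below pairs a small coordinator function with a hypothetical in-memory participant. `InMemoryNode`, its `can_prepare` flag and `state` field, and `two_phase_commit` are names introduced only for this illustration. A real participant would also write its vote to durable storage before replying, which this sketch omits.

```python
class InMemoryNode:
    """Hypothetical participant that tracks its state in memory only."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare
        self.state = "IDLE"

    def prepare(self, transaction):
        # Vote READY only if this node is able to commit.
        if self.can_prepare:
            self.state = "PREPARED"
            return "READY"
        return "ABORT"

    def commit(self, transaction):
        self.state = "COMMITTED"

    def rollback(self, transaction):
        self.state = "ABORTED"


def two_phase_commit(nodes, transaction):
    """Collect a vote from every node, then commit all or roll back all."""
    votes = [node.prepare(transaction) for node in nodes]
    if all(vote == "READY" for vote in votes):
        for node in nodes:
            node.commit(transaction)
        return "Transaction Committed"
    for node in nodes:
        node.rollback(transaction)
    return "Transaction Failed"


# Every node votes READY, so the transaction commits everywhere.
ok_nodes = [InMemoryNode("a"), InMemoryNode("b")]
ok_result = two_phase_commit(ok_nodes, {"id": 1})

# One node refuses, so every node is rolled back.
bad_nodes = [InMemoryNode("a"), InMemoryNode("b", can_prepare=False)]
bad_result = two_phase_commit(bad_nodes, {"id": 2})
```

In the second run node "a" had already prepared, yet it ends up `ABORTED` along with "b", which is the all-or-nothing behavior the protocol is meant to provide.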

Three-Phase Commit Protocol (3PC)

An extension of 2PC that adds a "pre-commit" phase to improve the handling of some failure scenarios.

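class ThreePhaseCoordinator:
    # can_commit, pre_commit, do_commit, and abort are helper methods
    # that this outline leaves to the implementation.
    def execute_transaction(self, nodes, transaction):
        # Phase 1: vote request (can every node commit?)
        if not self.can_commit(nodes, transaction):
            return self.abort(nodes)

        # Phase 2: pre-commit
        if not self.pre_commit(nodes, transaction):
            return self.abort(nodes)

        # Phase 3: final commit
        return self.do_commit(nodes, transaction)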
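
The coordinator outline above calls four helpers that it does not define. One possible way to fill them in is sketched below, under the assumption that each participant exposes `can_commit`, `pre_commit`, `do_commit`, and `abort`. The `Participant` class, its `willing` flag, and the state names are hypothetical, and the timeout handling that gives 3PC its non-blocking property is left out.

```python
class Participant:
    """Hypothetical 3PC participant; states are tracked in memory only."""

    def __init__(self, willing=True):
        self.willing = willing
        self.state = "INIT"

    def can_commit(self, transaction):
        # Phase 1: vote yes or no.
        self.state = "WAITING" if self.willing else "ABORTED"
        return self.willing

    def pre_commit(self, transaction):
        # Phase 2: acknowledge that a commit decision is coming.
        self.state = "PRECOMMITTED"
        return True

    def do_commit(self, transaction):
        # Phase 3: make the change permanent.
        self.state = "COMMITTED"

    def abort(self):
        self.state = "ABORTED"


class SketchThreePhaseCoordinator:
    def execute_transaction(self, nodes, transaction):
        if not self.can_commit(nodes, transaction):
            return self.abort(nodes)
        if not self.pre_commit(nodes, transaction):
            return self.abort(nodes)
        return self.do_commit(nodes, transaction)

    def can_commit(self, nodes, transaction):
        # The list is built first so every node is asked, even after a "no".
        return all([node.can_commit(transaction) for node in nodes])

    def pre_commit(self, nodes, transaction):
        return all([node.pre_commit(transaction) for node in nodes])

    def do_commit(self, nodes, transaction):
        for node in nodes:
            node.do_commit(transaction)
        return "Transaction Committed"

    def abort(self, nodes):
        for node in nodes:
            node.abort()
        return "Transaction Aborted"


coordinator = SketchThreePhaseCoordinator()
willing_nodes = [Participant(), Participant()]
committed = coordinator.execute_transaction(willing_nodes, "tx-1")

mixed_nodes = [Participant(), Participant(willing=False)]
aborted = coordinator.execute_transaction(mixed_nodes, "tx-2")
```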

Compensating Transactions

For long-running transactions, each step is committed individually; if a later step fails, compensating transactions are run to undo the changes already made.

class CompensatingTransaction:
    def execute_with_compensation(self, steps):
        executed_steps = []
        try:
            for step in steps:
                step.execute()
                executed_steps.append(step)
        except Exception:
            # On failure, run the compensations in reverse order
            for step in reversed(executed_steps):
                step.compensate()
            raise
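
As a usage illustration, the sketch below runs a three-step booking in which the last step fails. The `Step` class, `run_with_compensation`, and the booking steps are hypothetical names chosen for the example; each step simply pairs an action with the action that undoes it.

```python
class Step:
    """Hypothetical step: an action paired with the action that undoes it."""

    def __init__(self, name, action, undo):
        self.name = name
        self.action = action
        self.undo = undo

    def execute(self):
        self.action()

    def compensate(self):
        self.undo()


def run_with_compensation(steps):
    executed = []
    try:
        for step in steps:
            step.execute()
            executed.append(step)
    except Exception:
        # Undo only the steps that completed, newest first.
        for step in reversed(executed):
            step.compensate()
        raise


log = []


def fail_payment():
    raise RuntimeError("payment declined")


steps = [
    Step("seat", lambda: log.append("reserve seat"), lambda: log.append("release seat")),
    Step("hotel", lambda: log.append("reserve hotel"), lambda: log.append("cancel hotel")),
    Step("card", fail_payment, lambda: log.append("refund card")),
]

try:
    run_with_compensation(steps)
    outcome = "completed"
except RuntimeError:
    outcome = "compensated"
```

The failed step itself is never compensated, since it never completed; only the two earlier reservations are undone, in reverse order.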

Distributed timestamping

Each transaction is assigned a unique timestamp, which establishes an ordering among transactions.

class DistributedTimestampManager:
    def __init__(self):
        self.logical_clock = 0

    def get_timestamp(self, node_id):
        # Return a (counter, node_id) tuple rather than a string so that
        # timestamps sort numerically; as strings, "10_a" sorts before "2_a".
        self.logical_clock += 1
        return (self.logical_clock, node_id)

    def order_transactions(self, transactions):
        return sorted(transactions, key=lambda t: t.timestamp)
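
The manager above keeps a single counter in one place. When each node has to generate timestamps on its own, one common alternative is a Lamport logical clock, sketched below. This is a different technique from the code above, shown only for comparison; `LamportClock`, `tick`, and `receive` are illustrative names.

```python
class LamportClock:
    """Per-node logical clock; timestamps are (counter, node_id) tuples."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counter = 0

    def tick(self):
        # Local event: advance the counter and stamp the event.
        self.counter += 1
        return (self.counter, self.node_id)

    def receive(self, remote_timestamp):
        # Message receipt: move past the sender's counter so that the
        # receive event is ordered after the send event.
        remote_counter, _ = remote_timestamp
        self.counter = max(self.counter, remote_counter) + 1
        return (self.counter, self.node_id)


a = LamportClock("A")
b = LamportClock("B")

t1 = a.tick()        # event on A
t2 = b.tick()        # concurrent event on B
t3 = b.receive(t1)   # B receives A's message
t4 = a.tick()        # later event on A

# Tuples compare by counter first; the node id breaks ties.
ordered = sorted([t4, t3, t2, t1])
```

Ties between concurrent events (here `t1` and `t2`) are broken by node id, which yields a total order that every node computes identically.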

Distributed locking

Guaranteeing data consistency requires lock management that works across the distributed environment.

class DistributedLockManager:
    # try_lock and release_locks are assumed to be provided elsewhere.
    def acquire_locks(self, transaction, resources):
        locked_resources = []
        try:
            for resource in resources:
                if self.try_lock(resource, transaction.id):
                    locked_resources.append(resource)
                else:
                    # All-or-nothing: give back what was acquired so far
                    self.release_locks(locked_resources)
                    return False
            return True
        except Exception:
            self.release_locks(locked_resources)
            raise
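
The lock manager above relies on `try_lock` and `release_locks` without defining them. The sketch below is a single-process stand-in that keeps the lock table in a dictionary, just to make the all-or-nothing acquisition observable. `InMemoryLockManager` and the resource names are hypothetical, and a real deployment would keep the lock table in a shared coordination service rather than in local memory.

```python
import threading


class InMemoryLockManager:
    """Single-process stand-in for a distributed lock table."""

    def __init__(self):
        self._owners = {}                # resource -> transaction id
        self._guard = threading.Lock()   # protects the table itself

    def try_lock(self, resource, transaction_id):
        with self._guard:
            if resource in self._owners:
                return False
            self._owners[resource] = transaction_id
            return True

    def release_locks(self, resources, transaction_id):
        with self._guard:
            for resource in resources:
                # Only the owner may release a lock.
                if self._owners.get(resource) == transaction_id:
                    del self._owners[resource]

    def acquire_locks(self, transaction_id, resources):
        acquired = []
        # Locking in sorted order gives every transaction the same
        # acquisition order, which avoids lock-order deadlocks.
        for resource in sorted(resources):
            if self.try_lock(resource, transaction_id):
                acquired.append(resource)
            else:
                self.release_locks(acquired, transaction_id)
                return False
        return True


manager = InMemoryLockManager()
first = manager.acquire_locks("tx1", ["accounts:1", "accounts:2"])
second = manager.acquire_locks("tx2", ["accounts:2", "accounts:3"])  # conflicts on accounts:2
manager.release_locks(["accounts:1", "accounts:2"], "tx1")
third = manager.acquire_locks("tx2", ["accounts:2", "accounts:3"])
```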

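Considerations

Deadlock detection and resolution
Deadlocks are harder to handle in a distributed environment.
A global wait-for graph must be maintained and checked periodically.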

class DeadlockDetector:
    def detect_deadlocks(self, wait_for_graph):
        """Return True if the wait-for graph contains a cycle."""
        visited = set()  # nodes whose search has already started
        path = set()     # nodes on the current depth-first path

        def has_cycle(node):
            if node in path:
                return True   # back edge: a cycle exists
            if node in visited:
                return False

            visited.add(node)
            path.add(node)

            # .get avoids a KeyError for transactions that wait on nothing
            for neighbor in wait_for_graph.get(node, ()):
                if has_cycle(neighbor):
                    return True

            path.remove(node)
            return False

        return any(has_cycle(node) for node in wait_for_graph)
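
Detection only reports that a deadlock exists; resolving it means choosing a victim to abort. The sketch below returns the cycle itself and then removes one transaction from the graph. The victim policy used here (abort the transaction with the largest id, on the assumption that a larger id means a younger transaction) is just one possible choice, and `find_cycle` and `remove_transaction` are hypothetical helper names.

```python
def find_cycle(wait_for_graph):
    """Return one cycle as a list of transaction ids, or None."""
    visited = set()

    def visit(node, path):
        if node in path:
            # The cycle is the part of the path from the repeated node on.
            return path[path.index(node):]
        if node in visited:
            return None
        visited.add(node)
        for neighbor in wait_for_graph.get(node, ()):
            cycle = visit(neighbor, path + [node])
            if cycle:
                return cycle
        return None

    for node in wait_for_graph:
        cycle = visit(node, [])
        if cycle:
            return cycle
    return None


def remove_transaction(wait_for_graph, victim):
    """Drop the victim and every edge that points at it."""
    return {
        node: [n for n in waits if n != victim]
        for node, waits in wait_for_graph.items()
        if node != victim
    }


# T1 waits for T2, T2 for T3, T3 for T1; T4 waits for T1 but is not in the cycle.
graph = {"T1": ["T2"], "T2": ["T3"], "T3": ["T1"], "T4": ["T1"]}

cycle = find_cycle(graph)
victim = max(cycle)  # assumed policy: larger id means younger transaction
remaining = remove_transaction(graph, victim)
still_deadlocked = find_cycle(remaining) is not None
```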

Recovery management
A mechanism is needed to restore a consistent state after a failure:

class RecoveryManager:
    def recover_from_failure(self, log_records):
        # REDO phase: replay committed transactions
        committed_transactions = self.get_committed_transactions(log_records)
        for transaction in committed_transactions:
            self.redo_transaction(transaction)

        # UNDO phase: roll back incomplete transactions
        incomplete_transactions = self.get_incomplete_transactions(log_records)
        for transaction in incomplete_transactions:
            self.undo_transaction(transaction)
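
To make the REDO and UNDO phases concrete, the sketch below applies them to a small log. The record layout (`type`, `tx`, `key`, `old`, `new`) and the `recover` function are assumptions made for this example, not a format defined by the text. It also assumes strict locking, so that no committed write ever overwrote an uncommitted one.

```python
def recover(disk_state, log_records):
    """Return the state after redoing committed and undoing incomplete writes."""
    state = dict(disk_state)
    committed = {r["tx"] for r in log_records if r["type"] == "COMMIT"}

    # REDO: replay committed writes in log order.
    for record in log_records:
        if record["type"] == "WRITE" and record["tx"] in committed:
            state[record["key"]] = record["new"]

    # UNDO: restore old values of uncommitted writes, newest first.
    for record in reversed(log_records):
        if record["type"] == "WRITE" and record["tx"] not in committed:
            if record["old"] is None:
                state.pop(record["key"], None)  # the key did not exist before
            else:
                state[record["key"]] = record["old"]
    return state


log_records = [
    {"type": "WRITE", "tx": "T1", "key": "x", "old": 0, "new": 10},
    {"type": "COMMIT", "tx": "T1"},
    {"type": "WRITE", "tx": "T2", "key": "y", "old": 5, "new": 7},
    # The crash happens here: T2 never committed.
]

# On disk, T1's committed write is missing and T2's uncommitted write is present.
disk_state = {"x": 0, "y": 7}
recovered = recover(disk_state, log_records)
```

After recovery, `x` carries T1's committed value and `y` is back to the value it had before T2 started.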

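Performance optimization
Several techniques can improve transaction processing performance:
- Transaction splitting: break large transactions into smaller units
- Asynchronous replication: update some nodes asynchronously for the sake of performance
- Caching strategy: cache frequently used data locally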
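
As a minimal illustration of transaction splitting, the sketch below breaks one large list of updates into fixed-size batches that could each be committed as a separate, smaller transaction. `split_into_batches` and the sample updates are hypothetical. Splitting gives up atomicity across batches, so it is usually paired with something like the compensation approach described earlier.

```python
def split_into_batches(operations, batch_size):
    """Split a list of operations into consecutive batches."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [
        operations[start:start + batch_size]
        for start in range(0, len(operations), batch_size)
    ]


# Seven updates become three smaller transactions of at most three updates.
updates = [("account", n) for n in range(7)]
batches = split_into_batches(updates, 3)
batch_sizes = [len(batch) for batch in batches]
```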

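Challenges in distributed transaction processing

Handling network partitions
Responding to node failures
Maintaining data consistency
Performance optimization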

## Glossary

Term	Description