deepdfa + linevul 운영(초안)단계 시도해보기!

Notice

Recent Posts

Recent Comments

Link

« 2025/04 »
일	월	화	수	목	금	토
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30

Tags more

Archives

Today

Total

관리 메뉴

agencies

deepdfa + linevul 운영(초안)단계 시도해보기! 본문

Ⅳ. 기타

deepdfa + linevul 운영(초안)단계 시도해보기!

agencies 2024. 11. 20. 18:10

colab 마운트

from google.colab import drive
drive.mount('/content/drive')

모델 평가모드로 진행(strict=false 를 사용해서 걱정된다)

import torch
import torch.nn as nn
from torch.nn import CrossEntropyLoss
from transformers import RobertaModel, RobertaConfig, RobertaTokenizer, RobertaForSequenceClassification
from DeepDFA.DDFA.code_gnn.models.flow_gnn.ggnn import FlowGNNGGNNModule

# FlowGNNGGNNModule 초기화
flowgnn_encoder = FlowGNNGGNNModule(
    feat="_ABS_DATAFLOW",       # 사용하려는 특징 키 (예: "_ABS_DATAFLOW")
    input_dim=128,             # 입력 차원 (특성 개수)
    hidden_dim=128,            # 숨겨진 상태 크기
    n_steps=5,                 # GGNN의 업데이트 스텝 수
    num_output_layers=2,       # 출력 레이어 수
    label_style="graph",       # 그래프 기반 예측
    concat_all_absdf=False,    # 모든 특성 결합 여부
    encoder_mode=True          # 출력 임베딩 활성화
)

# RobertaClassificationHead 클래스 정의
class RobertaClassificationHead(nn.Module):
    """Head for sentence-level classification tasks."""
    def __init__(self, config, extra_dim):
        super().__init__()
        self.dense = nn.Linear(config.hidden_size + extra_dim, config.hidden_size)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)
        self.out_proj = nn.Linear(config.hidden_size, 2)

    def forward(self, features, flowgnn_embed, **kwargs):
        x = features[:, 0, :]  # take <s> token (equiv. to [CLS])
        if flowgnn_embed is not None:
            x = torch.cat((x, flowgnn_embed), dim=1)
        x = self.dropout(x)
        x = self.dense(x)
        x = torch.tanh(x)
        x = self.dropout(x)
        x = self.out_proj(x)
        return x

# Model 클래스 정의
class Model(RobertaForSequenceClassification):   
    def __init__(self, encoder, flowgnn_encoder, config, tokenizer, args):
        super(Model, self).__init__(config=config)
        self.encoder = encoder
        if not args.no_flowgnn:
            self.flowgnn_encoder = flowgnn_encoder
        self.tokenizer = tokenizer
        self.classifier = RobertaClassificationHead(config, 0 if args.no_flowgnn else self.flowgnn_encoder.out_dim)
        self.args = args

    def forward(self, input_embed=None, labels=None, graphs=None, output_attentions=False, input_ids=None):
        if self.args.no_flowgnn:
            flowgnn_embed = None
        elif graphs is not None:
            flowgnn_embed = self.flowgnn_encoder(graphs, {})
        if output_attentions:
            if input_ids is not None:
                outputs = self.encoder.roberta(input_ids, attention_mask=input_ids.ne(1), output_attentions=output_attentions)
            else:
                outputs = self.encoder.roberta(inputs_embeds=input_embed, output_attentions=output_attentions)
            attentions = outputs.attentions
            last_hidden_state = outputs.last_hidden_state
            logits = self.classifier(last_hidden_state, flowgnn_embed)
            prob = torch.softmax(logits, dim=-1)
            if labels is not None:
                loss_fct = CrossEntropyLoss()
                loss = loss_fct(logits, labels)
                return loss, prob, attentions
            else:
                return prob, attentions
        else:
            if input_ids is not None:
                outputs = self.encoder.roberta(input_ids, attention_mask=input_ids.ne(1), output_attentions=output_attentions)[0]
            else:
                outputs = self.encoder.roberta(inputs_embeds=input_embed, output_attentions=output_attentions)[0]
            logits = self.classifier(outputs, flowgnn_embed)
            prob = torch.softmax(logits, dim=-1)
            if labels is not None:
                loss_fct = CrossEntropyLoss()
                loss = loss_fct(logits, labels)
                return loss, prob
            else:
                return prob

# 로버타 토크나이저 로드
tokenizer = RobertaTokenizer.from_pretrained("roberta-base")

# 로버타 설정 생성
config = RobertaConfig.from_pretrained("roberta-base")

# 사전 학습된 RoBERTa 모델 로드
encoder = RobertaModel.from_pretrained("roberta-base")

# Args 클래스 정의
class Args:
    def __init__(self):
        self.no_flowgnn = False  # FlowGNN 사용 여부

args = Args()

# 모델 초기화
model = Model(
    encoder=encoder,
    flowgnn_encoder=flowgnn_encoder,  # FlowGNN 활성화
    config=config,
    tokenizer=tokenizer,
    args=args
)

# 저장된 가중치 로드
model_path = "/content/drive/MyDrive/model.bin"
state_dict = torch.load(model_path, map_location=torch.device('cpu'))



# 가중치 키 확인
print(state_dict.keys())


#model.load_state_dict(state_dict)
model.load_state_dict(state_dict, strict=False)

model.eval()

before 폴더의 0.c를 가지고오

# test.c 파일 읽기
test_code_path = "0.c"
with open(test_code_path, "r") as file:
    test_code = file.read()

print("Test Code:\n", test_code)

input_ids와 attention_mask 까지는 잘 출력되는데 그 밑 부분이 안된다.

# 토크나이저를 사용하여 입력 텐서 생성
inputs = tokenizer(
    test_code,
    max_length=512,           # 입력의 최대 길이
    padding="max_length",     # 최대 길이에 맞춰 패딩
    truncation=True,          # 입력을 자를지 여부
    return_tensors="pt"       # 텐서 형식 반환
)

# input_ids와 attention_mask 생성
input_ids = inputs["input_ids"]
attention_mask = inputs["attention_mask"]

print(input_ids)
print(attention_mask)


# 모델 평가
model.eval()
with torch.no_grad():
    # 모델 예측
    outputs = model(input_ids=input_ids, attention_mask=attention_mask)
    logits = outputs[0]

# 확률 계산
probs = torch.softmax(logits, dim=-1)
predicted_label = torch.argmax(probs, dim=1).item()

# 결과 출력
print("Source Code:", test_code)
print("Predicted Probabilities (Non-Vulnerable, Vulnerable):", probs)
print("Predicted Label:", "Vulnerable (1)" if predicted_label == 1 else "Non-Vulnerable (0)")

GPU 런타임 이슈로 더이상 시도 불가...

'Ⅳ. 기타' 카테고리의 다른 글

CVE database (구축) : 고도화 (combined -> ) (2)	2024.11.25
deepdfa + linevul 운영(초안)단계 시도해보기! 두번째 (2)	2024.11.21
소스코드 글씨를 예쁘게 색칠되도록 변환해주는 온라인 사이트 (0)	2024.11.15
네트워크 및 소프트웨어 드라이브를 자동으로 잡아주는 유틸리티 설치 (0)	2024.11.15
네이버 메일 PDF로 저장하는 방법 (1)	2024.11.06

'Ⅳ. 기타' Related Articles

내 블로그 - 관리자 홈 전환	`Q` `Q`
새 글 쓰기	`W` `W`

글 수정 (권한 있는 경우)	`E` `E`
댓글 영역으로 이동	`C` `C`

이 페이지의 URL 복사	`S` `S`
맨 위로 이동	`T` `T`
티스토리 홈 이동	`H` `H`
단축키 안내	`Shift` + `/` `⇧` + `/`

agencies

agencies

deepdfa + linevul 운영(초안)단계 시도해보기! 본문

deepdfa + linevul 운영(초안)단계 시도해보기!

'Ⅳ. 기타' 카테고리의 다른 글

티스토리툴바

단축키

내 블로그

블로그 게시글

모든 영역