agencies

Extracting Functions from C Code with Regular Expressions!

Ⅰ. Programming


agencies 2024. 12. 2. 18:21
import re

def extract_user_defined_function_names(file_path):
    # Read the whole C source file into one string
    with open(file_path, 'r') as file:
        code = file.read()

    # Regex pattern to match function definitions (not prototypes or calls):
    # a definition looks like "type name(args) {", while a prototype ends in ';'
    definition_pattern = re.compile(
        r'\b(?:(?:static|extern|inline)\s+)*'  # Optional specifiers
        r'(?:'
        r'(?:void|int|long|float|double|char|short|unsigned|signed|struct\s+\w+|enum\s+\w+)'
        r'(?:\s*\*+\s*|\s+)'  # Known return type, then '*' or whitespace
        r'|[a-zA-Z_]\w*\s*\*+\s*'  # Any other type name, pointer return only
        r')'
        r'([a-zA-Z_]\w*)\s*\([^)]*\)\s*\{'  # Function name, arguments, opening brace
    )

    # Extract the names of all definitions; the set removes duplicates
    user_defined_functions = set(definition_pattern.findall(code))

    # Library functions are only called, never defined in the file,
    # so they never match the pattern above
    filtered_functions = sorted(user_defined_functions)

    # Print and return the function names
    print(filtered_functions)
    return filtered_functions

# Call the function with the test file
if __name__ == '__main__':
    extract_user_defined_function_names('./util_tmp.c')

 

This script uses regular expressions to extract all the functions defined in a C source file.
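
To make it easier to see what such a pattern picks up, here is a minimal sketch that runs a reduced definition regex against an inline C snippet instead of a file. The snippet and the names `SAMPLE_C` and `DEFINITION_RE` are made up for illustration (this is not the contents of util_tmp.c), and the pattern only knows a handful of return types.

```python
import re

# Hypothetical C snippet used only for this demonstration.
SAMPLE_C = """
int add(int a, int b);            /* prototype: no body */

static int add(int a, int b) {
    return a + b;
}

void greet(void)
{
    printf("hi\\n");               /* call, not a definition */
}
"""

# Reduced pattern: optional specifier, a known return type, the name,
# a parameter list, then the opening brace of the function body.
DEFINITION_RE = re.compile(
    r'\b(?:(?:static|extern|inline)\s+)?'
    r'(?:void|int|char|long|double)\s+'
    r'([a-zA-Z_]\w*)\s*\([^)]*\)\s*\{'
)

# The prototype ends in ';' and the printf call has no return type,
# so only the two definitions are collected.
names = sorted(set(DEFINITION_RE.findall(SAMPLE_C)))
print(names)  # ['add', 'greet']
```

Requiring the opening brace after the parameter list is what separates definitions from prototypes; requiring a return type in front of the name is what keeps ordinary calls out.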

Reserved words and similar tokens have to be filtered out at this stage; registering more reserved words should help reduce false positives.
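
One possible reading of this note: once the return type is loosened to accept any identifier (so that typedef'd types such as `size_t` are caught), statements like `else if (...) {` start to look like definitions, and a keyword list is needed to reject them. The sketch below assumes that reading. `C_KEYWORDS`, `LOOSE_DEFINITION_RE`, and `find_definitions` are hypothetical names, and the keyword set is deliberately partial.

```python
import re

# Partial list of C reserved words; extending it cuts more false positives.
C_KEYWORDS = {
    "if", "else", "for", "while", "do", "switch", "case", "return",
    "sizeof", "goto", "break", "continue", "default", "typedef",
}

# Loose pattern: any identifier as the return type, then '*' or whitespace,
# then the name, a parameter list, and the opening brace.
LOOSE_DEFINITION_RE = re.compile(
    r'\b([a-zA-Z_]\w*)(?:\s*\*+\s*|\s+)([a-zA-Z_]\w*)\s*\([^)]*\)\s*\{'
)

def find_definitions(code):
    """Return sorted function names, skipping matches built from keywords."""
    names = set()
    for return_type, name in LOOSE_DEFINITION_RE.findall(code):
        if return_type in C_KEYWORDS or name in C_KEYWORDS:
            continue  # e.g. "else if (...) {" is not a function definition
        names.add(name)
    return sorted(names)

# Hypothetical C input for the demonstration.
sample = """
size_t count(const char *s) {
    size_t n = 0;
    if (s) {
        while (*s++) { n++; }
    } else if (n == 0) {
        n = 1;
    }
    return n;
}
char *dup(const char *s) { return 0; }
"""
print(find_definitions(sample))  # ['count', 'dup']
```

Without the keyword check, the loose pattern would also report `if` as a function here, because `else if (n == 0) {` has the same shape as `type name(args) {`.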

 

 

 

Execution result

Test C file: util_tmp.c (0.01MB)