Project

Extending Categories Across News Outlets

content0474 2025. 1. 13. 16:06

The category names currently stored in the DB are as follows.

1 "Main"
2 "Technology"
3 "Business"
4 "Science"
5 "Health"
6 "Politics"
7 "Art"
8 "Sport"

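However, category names differ slightly from one news outlet to another.

For example, The New York Times uses world, technology, business, science, health, politics, entertainment, sport.

If the data is ingested as-is, entertainment is either not saved to the DB at all, or new rows are created: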

1 "Main"
2 "Technology"
3 "Business"
4 "Science"
5 "Health"
6 "Politics"
7 "Art"
8 "Sport"
9 "sport"
10 "technology"

As shown above, some categories are created a second time in lowercase.
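
A minimal sketch of how such lowercase duplicates can arise, assuming the ingest code does a case-sensitive get-or-create on the category name. The post does not show that code, so `get_or_create` here is a hypothetical in-memory stand-in, not Django's method.

```python
# In-memory stand-in for the category table: name -> id.
STANDARD = ["Main", "Technology", "Business", "Science",
            "Health", "Politics", "Art", "Sport"]


def get_or_create(table, name):
    """Case-sensitive lookup: an unseen spelling gets a new row."""
    if name not in table:
        table[name] = len(table) + 1
    return table[name]


table = {name: i for i, name in enumerate(STANDARD, start=1)}

# Names as they arrive from the news source, in lowercase.
for incoming in ["sport", "technology"]:
    get_or_create(table, incoming)

# "Sport" and "sport" now live in two separate rows.
print(table["Sport"], table["sport"])
```

Because "sport" and "Sport" are different strings, the lookup misses and a ninth row is created, which matches the listing above.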

Similar problems seemed likely when pulling data from other outlets, so I decided to extend the existing Category model and map the data.

 

Original code

accounts/models.py

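from django.db import models


class Category(models.Model):
    name = models.CharField(max_length=100, unique=True)

    def __str__(self):
        return self.name


# Celery task that fetches articles for each category
import requests
from celery import shared_task
from decouple import config  # assumption: config() is python-decouple's


@shared_task
def fetch_and_store_news():
    API_KEY = config("NYT_API_KEY")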

    categories = Category.objects.all()
    if not categories.exists():
        print("No categories found in the database.")
        return

    for category in categories:
        section = category.name.lower()
        url = f"https://api.nytimes.com/svc/topstories/v2/{section}.json"
        params = {'api-key': API_KEY}

        response = requests.get(url, params=params)
        if response.status_code == 200:
            data = response.json()
            articles = data.get('results', [])
            print(f"Fetching articles for category: {category.name}")

 

Revised code

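from django.db import models


class Category(models.Model):
    name = models.CharField(max_length=100, unique=True)
    # Maps a news source to the name it uses, e.g. {"NYTimes": "entertainment"}
    source_categories = models.JSONField(default=dict, blank=True)

    def __str__(self):
        return self.name

    @classmethod
    def get_source_category(cls, standard_category_name, news_source):
        """Return the name `news_source` uses for this category, or None."""
        try:
            category = cls.objects.get(name=standard_category_name)
            return category.source_categories.get(news_source, None)
        except cls.DoesNotExist:
            print(f"Category '{standard_category_name}' does not exist.")
            return None


# Celery task that fetches articles for each category
import requests
from celery import shared_task
from decouple import config  # assumption: config() is python-decouple's


@shared_task
def fetch_and_store_news(news_source="NYTimes"):
    API_KEY = config("NYT_API_KEY")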

    categories = Category.objects.all()
    if not categories.exists():
        print("No categories found in the database.")
        return

    for category in categories:
        # Look up the name this news source uses for the category
        source_category = category.get_source_category(category.name, news_source)
        if not source_category:
            print(f"No mapping found for category '{category.name}' in source '{news_source}'.")
            continue

        url = f"https://api.nytimes.com/svc/topstories/v2/{source_category}.json"
        params = {'api-key': API_KEY}

        response = requests.get(url, params=params)
        if response.status_code == 200:
            data = response.json()
            articles = data.get('results', [])
            print(f"Fetching articles for category: {category.name} ({source_category})")

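A source_categories field is added to the Category model, and the mapping is stored as JSON.

The get_source_category method is called when news is fetched through the API.

It takes the standard category name and the news source, and returns the name that source uses.

e.g. Art, NYTimes → entertainment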
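
The lookup can be illustrated without a database. The sketch below uses a plain dict in place of the Category table. The Art → entertainment pair comes from the example above and the Sport entry from the NYTimes list earlier; `lookup_source_category` is a hypothetical stand-in for get_source_category, not the model method itself.

```python
# Standard name -> source_categories, mirroring the JSONField on Category.
CATEGORIES = {
    "Art": {"NYTimes": "entertainment"},
    "Sport": {"NYTimes": "sport"},
    "Health": {},  # no mapping stored yet
}


def lookup_source_category(standard_category_name, news_source):
    """Return the source's own name for the category, or None if unmapped."""
    source_categories = CATEGORIES.get(standard_category_name)
    if source_categories is None:
        return None  # unknown standard category
    return source_categories.get(news_source)


print(lookup_source_category("Art", "NYTimes"))
```

A missing category and a missing mapping both come back as None, which is why the task can skip either case with a single `if not source_category` check.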

 

Mapping

The mappings are entered into the DB directly as JSON (manually).

When running with Docker, create the JSON file and then run

docker-compose exec web python manage.py loaddata fixtures/category_mapping.json

The command above loads the data.
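
For reference, a loaddata fixture is a JSON list of objects with model, pk and fields keys. The sketch below builds and writes such a file. Several things in it are assumptions: the accounts.category label is inferred from accounts/models.py, the pk values follow the category list at the top of this post, and the NYTimes names are the ones listed earlier, so they should be checked against the source's API before loading.

```python
import json
from pathlib import Path

# (pk, standard name, {news source: name used by that source})
ROWS = [
    (2, "Technology", {"NYTimes": "technology"}),
    (7, "Art", {"NYTimes": "entertainment"}),
    (8, "Sport", {"NYTimes": "sport"}),
]


def build_fixture(rows, model_label="accounts.category"):
    """Build the list-of-dicts structure that Django's loaddata reads."""
    return [
        {
            "model": model_label,
            "pk": pk,
            "fields": {"name": name, "source_categories": mapping},
        }
        for pk, name, mapping in rows
    ]


def write_fixture(path, rows):
    """Write the fixture as JSON, creating the parent directory if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(build_fixture(rows), indent=2), encoding="utf-8")
    return path


# Example: write_fixture("fixtures/category_mapping.json", ROWS)
```

Since each entry carries a pk, loading the fixture overwrites the existing row with that id instead of creating a new one.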

 

 

+ Further considerations

As the number of outlets grows, add logic that maps categories automatically instead of by hand.
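
One possible starting point for automatic mapping, as a sketch only: match case-insensitively first, then fall back to a fuzzy match with the standard library's difflib. `suggest_standard_category` and the 0.8 cutoff are assumptions, not part of the project.

```python
from difflib import get_close_matches

STANDARD = ["Main", "Technology", "Business", "Science",
            "Health", "Politics", "Art", "Sport"]


def suggest_standard_category(source_name, standard_names=STANDARD, cutoff=0.8):
    """Suggest the standard category for a source's name, or None."""
    by_lower = {name.lower(): name for name in standard_names}
    key = source_name.strip().lower()

    # 1) Exact match, ignoring case ("technology" -> "Technology").
    if key in by_lower:
        return by_lower[key]

    # 2) Near match, e.g. a plural form ("sports" -> "Sport").
    close = get_close_matches(key, list(by_lower), n=1, cutoff=cutoff)
    return by_lower[close[0]] if close else None


for name in ["technology", "sports", "entertainment"]:
    print(name, "->", suggest_standard_category(name))
```

Names with no textual resemblance to a standard category, such as entertainment for Art, return None here and would still need a manual entry.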

Add code that logs mapping failures.
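
A sketch of what logging a mapping failure could look like, using the standard logging module in place of print. The logger name `news.mapping` and the helper `resolve_source_category` are hypothetical.

```python
import logging

logger = logging.getLogger("news.mapping")


def resolve_source_category(source_categories, category_name, news_source):
    """Return the mapped name, logging a warning when none is stored."""
    source_category = source_categories.get(news_source)
    if not source_category:
        logger.warning(
            "No mapping found for category '%s' in source '%s'.",
            category_name,
            news_source,
        )
        return None
    return source_category
```

Inside the task loop, a None result would be followed by `continue`, as in the revised code, while the warning ends up wherever the project's logging configuration sends it.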