18/08/2023


💡Why I started this project

I came across the tutorial below, which taught me how to build a simple data pipeline. After completing it, I realized it would be an awesome idea to give back to the YouTuber whose content had taught me so much!

Twitter Data Pipeline using Airflow for Beginners | Data Engineering Project

Through some Google searching, I discovered that I could retrieve the comments on specific YouTube videos. This got me thinking about building a simple ETL pipeline that extracts comments through the YouTube Data API and turns them into actionable insights that might help him improve his content.

🏗️How I built the project

Step 1 - Data extraction

Google makes the API easy to try out: it provides an API Explorer that lets me experiment with queries and see the returned data directly within the explorer itself.

[Screenshot: API Explorer]

They also have great documentation that explains how to use the API:

API Reference  |  YouTube Data API  |  Google for Developers

Additionally, they provide starter code that shows how to retrieve data from the API, based on the queries I tried in the API Explorer. Here is the starter code I used👇🏽

# -*- coding: utf-8 -*-

# Sample Python code for youtube.commentThreads.list
# See instructions for running these code samples locally:
# <https://developers.google.com/explorer-help/code-samples#python>

import os

import googleapiclient.discovery

def main():
    # Disable OAuthlib's HTTPS verification when running locally.
    # *DO NOT* leave this option enabled in production.
    os.environ["OAUTHLIB_INSECURE_TRANSPORT"] = "1"

    api_service_name = "youtube"
    api_version = "v3"
    DEVELOPER_KEY = "YOUR_API_KEY"

    youtube = googleapiclient.discovery.build(
        api_service_name, api_version, developerKey = DEVELOPER_KEY)

    request = youtube.commentThreads().list(
        part="snippet,replies",
        videoId="q8q3OFFfY6c"
    )
    response = request.execute()

    print(response)

if __name__ == "__main__":
    main()
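
The starter code only prints one page of raw JSON, so here is a rough sketch of how the extraction step could go further: follow `nextPageToken` to collect every page, and flatten each comment thread into a simple row. The helper names `iter_comment_pages` and `flatten_comment_threads` and the output column names are my own placeholders, not part of Google's sample. The response fields (`topLevelComment`, `textDisplay`, `likeCount`, `totalReplyCount`) are the ones the commentThreads resource returns when `part` includes `snippet`.

```python
from typing import Any, Dict, Iterator, List


def flatten_comment_threads(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn one commentThreads.list response page into flat rows.

    The output keys (author, text, ...) are hypothetical column names.
    """
    rows = []
    for item in response.get("items", []):
        thread = item["snippet"]
        top = thread["topLevelComment"]["snippet"]
        rows.append({
            "author": top.get("authorDisplayName"),
            "text": top.get("textDisplay"),
            "like_count": top.get("likeCount", 0),
            "published_at": top.get("publishedAt"),
            "reply_count": thread.get("totalReplyCount", 0),
        })
    return rows


def iter_comment_pages(youtube, video_id: str,
                       page_size: int = 100) -> Iterator[Dict[str, Any]]:
    """Yield every response page for a video by following nextPageToken.

    `youtube` is the client object returned by googleapiclient.discovery.build.
    """
    page_token = None
    while True:
        params = {"part": "snippet", "videoId": video_id,
                  "maxResults": page_size}
        if page_token:
            params["pageToken"] = page_token
        response = youtube.commentThreads().list(**params).execute()
        yield response
        # The last page has no nextPageToken, which ends the loop.
        page_token = response.get("nextPageToken")
        if not page_token:
            break
```

With the `youtube` client from the starter code, something like `rows = [r for page in iter_comment_pages(youtube, "q8q3OFFfY6c") for r in flatten_comment_threads(page)]` should give one list of rows for the whole video. Keep in mind that every page request counts against the API quota.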