I tried transcribing a Zoom recording (mp4) with AWS Transcribe (speech-to-text).

admin

2 years ago

Overview:
1, Upload the mp4 file to S3
2, The upload triggers Lambda, which runs Transcribe
3, The resulting transcript (.txt) is saved to S3 and downloaded manually

Reference URL

AWS Transcribeを利用した自動文字起こしハンズオン (hands-on: automatic transcription with AWS Transcribe)

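The output had no breaks and was hard to read, so I enabled speaker identification (ShowSpeakerLabels).
That made the JSON output structurally more complicated, so I reformatted it per speaker (aws-transcribe-transcript).
The text also contained half-width spaces that hurt readability, so I replaced them away. The result reads reasonably like a chat log.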

Reference URL
https://qiita.com/sakaia/items/867d42c893064b84dde9
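
As a rough sketch of the per-speaker formatting described above (this is not the aws-transcribe-transcript tool itself, only a minimal stand-in), something like the following could work. It assumes the Transcribe JSON contains results.items and results.speaker_labels.segments, which is the layout I would expect when ShowSpeakerLabels is enabled. The function name format_by_speaker and the sample data are made up for illustration. Joining the tokens directly also avoids the half-width spaces, so no separate replacement step is needed here.

```python
def format_by_speaker(transcribe_output: dict) -> list[str]:
    """Group Transcribe tokens into one line per consecutive speaker turn."""
    results = transcribe_output["results"]

    # Map each spoken token's start_time to the speaker of its segment.
    speaker_at = {}
    for segment in results["speaker_labels"]["segments"]:
        for item in segment["items"]:
            speaker_at[item["start_time"]] = segment["speaker_label"]

    turns = []      # list of [speaker, text]
    current = None  # speaker of the turn being built
    for item in results["items"]:
        content = item["alternatives"][0]["content"]
        if item["type"] == "pronunciation":
            speaker = speaker_at.get(item["start_time"], current)
        else:
            # Punctuation has no timestamp, so it stays with the current speaker.
            speaker = current
        if speaker != current or not turns:
            turns.append([speaker, ""])
            current = speaker
        # Concatenate without spaces, which suits Japanese text.
        turns[-1][1] += content
    return [f"{speaker}: {text}" for speaker, text in turns]


# Hypothetical, heavily trimmed sample of a Transcribe result.
sample = {
    "results": {
        "speaker_labels": {
            "segments": [
                {"speaker_label": "spk_0", "items": [{"start_time": "0.0"}]},
                {"speaker_label": "spk_1",
                 "items": [{"start_time": "1.0"}, {"start_time": "1.5"}]},
            ]
        },
        "items": [
            {"type": "pronunciation", "start_time": "0.0",
             "alternatives": [{"content": "こんにちは"}]},
            {"type": "punctuation", "alternatives": [{"content": "。"}]},
            {"type": "pronunciation", "start_time": "1.0",
             "alternatives": [{"content": "はい"}]},
            {"type": "punctuation", "alternatives": [{"content": "、"}]},
            {"type": "pronunciation", "start_time": "1.5",
             "alternatives": [{"content": "どうぞ"}]},
        ],
    }
}

for line in format_by_speaker(sample):
    print(line)
```

For a real file, the downloaded JSON could be loaded with json.load and passed to the same function.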

Detailed steps:

1, Create the S3 buckets (one for input, one for output). Couldn't these be the same bucket?
in-transcribe-20230912
out-transcribe-20230912

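2, Create the Lambda function. Choose s3get from the blueprints and select the S3 bucket (in-transcribe-20230912). Set the suffix to .mp4 (so it does not run for anything other than mp4 files!)
Attach two policies to the Lambda function: "AmazonS3FullAccess" and "AmazonTranscribeFullAccess"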

Note: an error appeared, but I ignored it!
The Lambda function "Transcribe_function" was created successfully, but the error "Unable to validate the following destination configurations" occurred while creating the trigger.
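
For reference, the trigger configured in the console in step 2 roughly corresponds to an S3 notification configuration like the one below. This is only a sketch and not what the console literally generates: the helper name build_notification_config and the ARN in the usage comment are hypothetical, and the event type s3:ObjectCreated:* is my assumption about what the blueprint selects.

```python
def build_notification_config(lambda_arn: str, suffix: str = ".mp4") -> dict:
    """Build an S3 notification config that invokes a Lambda for matching keys."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": lambda_arn,
                # Assumption: fire on every kind of object creation.
                "Events": ["s3:ObjectCreated:*"],
                # Only keys ending with the suffix trigger the function.
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "suffix", "Value": suffix}]}
                },
            }
        ]
    }


# Usage (needs AWS credentials, so it is left commented out):
# import boto3
# arn = "arn:aws:lambda:ap-northeast-1:123456789012:function:Transcribe_function"  # hypothetical
# boto3.client("s3").put_bucket_notification_configuration(
#     Bucket="in-transcribe-20230912",
#     NotificationConfiguration=build_notification_config(arn),
# )
```

With the suffix filter in place, uploads that do not end in .mp4 should not invoke the function at all.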

import datetime
import urllib.parse

import boto3

transcribe = boto3.client('transcribe')

OUTPUT_BUCKET = 'out-transcribe-20230912'  # S3 bucket that receives the transcript


def lambda_handler(event, context):
    # Get the bucket name and object key of the uploaded file from the S3 event
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
    try:
        transcribe.start_transcription_job(
            # Job name, also used as the output file name; the timestamp avoids duplicates
            TranscriptionJobName=datetime.datetime.now().strftime('%Y%m%d%H%M%S') + '_Transcription',
            LanguageCode='ja-JP',
            Media={
                # s3:// URI: no hard-coded region, and the decoded key can be used as-is
                'MediaFileUri': 's3://' + bucket + '/' + key
            },
            Settings={
                'ShowSpeakerLabels': True,  # enable speaker identification
                'MaxSpeakerLabels': 10      # up to 10 speakers
            },
            OutputBucketName=OUTPUT_BUCKET
        )
    except Exception as e:
        print(e)
        print('Error starting transcription job for object {} in bucket {}. Make sure they exist and your bucket is in the same region as this function.'.format(key, bucket))
        raise
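
One possible variation on the handler above: its job name is only a timestamp, so two uploads within the same second would collide, and the output file name does not say which recording it came from. The sketch below derives the job name from the uploaded file name instead. The helper names parse_s3_event and build_job_name are hypothetical, and the sanitizing step assumes that Transcribe job names allow only letters, digits, ".", "_" and "-" and at most 200 characters.

```python
import datetime
import re
import urllib.parse


def parse_s3_event(event: dict) -> tuple[str, str]:
    """Return (bucket, key) for the first record of an S3 event."""
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    # Keys in S3 events are URL-encoded ("+" stands for a space).
    key = urllib.parse.unquote_plus(record["object"]["key"], encoding="utf-8")
    return bucket, key


def build_job_name(key: str, now: datetime.datetime) -> str:
    """Build a job name from a timestamp plus the sanitized file name."""
    # Drop the folder part and the extension of the key.
    stem = key.rsplit("/", 1)[-1].rsplit(".", 1)[0]
    # Replace every character outside the assumed allowed set.
    safe = re.sub(r"[^0-9a-zA-Z._-]", "_", stem)
    return f"{now:%Y%m%d%H%M%S}_{safe}"[:200]
```

Inside lambda_handler, the result of build_job_name(key, datetime.datetime.now()) could then be passed as TranscriptionJobName. Note that non-ASCII file names collapse to underscores with this approach, so the timestamp still does most of the work for Japanese file names.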

