JS SDK Demo

Ⅰ Overview

This demo shows how to build an interactive, conversational digital human application by integrating a third-party ASR (Automatic Speech Recognition) service and an LLM (Large Language Model).

Ⅱ Key Features

1. Embodied Driving SDK Integration: Drive the 3D digital human's movements and speech with the Embodia AI Embodied Driving SDK.

2. LLM Integration: Connect to a Large Language Model to enable text-based conversations.

3. ASR Integration: Convert voice to text to enable seamless verbal interaction with the digital human.
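
To make the data flow concrete, the sketch below shows the loop these three pieces form: recognized speech is sent to the LLM, and the reply drives the digital human. It is a hypothetical sketch, not the demo's exact source: useAsr and the LLM service's send method are described under Advanced Integration, the import paths are assumptions, and avatar.speak stands in for whatever speech call the Embodied Driving SDK actually exposes.

    // Hypothetical glue code for the interaction loop (not the demo's
    // exact source). useAsr and the LLM service are described under
    // "Advanced Integration"; avatar.speak stands in for whatever
    // speech call the Embodied Driving SDK exposes.
    import { watch } from 'vue'
    import { useAsr } from '@/lib'              // assumed import path
    import { LlmService } from '@/services/llm' // assumed class name

    declare const avatar: { speak: (text: string) => void } // hypothetical SDK handle

    const { start, stop, asrText } = useAsr() // start/stop back the voice button
    const llm = new LlmService('<API Key>', '<LLM Version>')

    // Whenever recognition produces text, ask the LLM for a reply and
    // have the digital human speak it.
    watch(asrText, async (text) => {
      if (!text) return
      const reply = await llm.send(text)
      avatar.speak(reply)
    })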

Ⅲ Environment Requirements

  • Frontend Framework: Vue 3 + TypeScript
  • Build Tool: Vite
  • Digital Human SDK: https://media.xingyun3d.com/xingyun3d/general/litesdk/xmovAvatar.0.1.0-alpha.75.js
  • Speech Recognition: Third-party ASR
  • LLM: OpenAI-compatible API (e.g., ByteDance Volcano Engine Ark)
  • Encryption Library: CryptoJS
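
CryptoJS is listed because third-party ASR providers such as Tencent typically require HMAC-signed requests. The snippet below is an illustration of computing such a signature, not the demo's exact code; the string-to-sign format and digest algorithm are provider-specific.

    // Illustration only: many ASR providers require HMAC-signed
    // requests. The exact string-to-sign and digest algorithm are
    // provider-specific; consult your provider's documentation.
    import CryptoJS from 'crypto-js'

    function signRequest(stringToSign: string, secretKey: string): string {
      // Compute an HMAC digest and encode it as Base64.
      const digest = CryptoJS.HmacSHA256(stringToSign, secretKey)
      return CryptoJS.enc.Base64.stringify(digest)
    }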

Ⅳ Quick Start

1. Download the Demo: Download the archive from the Download Link and extract the files.

2. Install Dependencies: Open a terminal in the project directory and run:

pnpm i

3. Run the Project: Start the development server:

pnpm run dev

4. Access the Application: Open your browser and navigate to http://localhost:5173/.

5. Configure SDK Parameters: Enter the Embodied Driving SDK connection parameters: App ID and App Secret.

Note: These can be obtained from Embodia AI -> Application Management -> View Key.

6. Configure ASR Parameters: Enter the ASR App ID, Secret ID, and Secret Key.

Select the ASR provider from the dropdown. This demo uses Tencent ASR as an example; parameters must be obtained from the provider.

7. Configure LLM Parameters: Enter the LLM Version and API Key.

This demo connects to the Volcano Ark LLM; parameters can be obtained from the Volcano Ark platform.

8. Start Interaction:

  • Type text and click "Send" to chat with the digital human.
  • Alternatively, click the voice recognition button, speak, and once the recognition is complete, the digital human will respond.

Ⅴ Advanced Integration

1. ASR Module Replacement

The ASR functionality is encapsulated in src/lib, and useAsr is the external entry point. Calling useAsr returns the following three APIs:

  • start: Start capturing voice and converting it to text.
  • stop: Stop receiving voice.
  • asrText: The text content recognized from the voice.

To replace the ASR provider, keep the behavior and output of these three APIs consistent; you can modify the internal implementation according to your preferred third-party ASR provider's documentation, as in the sketch below.
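
Here is a minimal sketch of that contract. It assumes a Vue 3 composable returning a reactive Ref; the function bodies are placeholders for provider-specific code.

    // A minimal sketch of the useAsr contract, assuming a Vue 3
    // composable; the bodies are placeholders for provider-specific code.
    import { ref, type Ref } from 'vue'

    interface UseAsrResult {
      start: () => Promise<void> // begin capturing voice and converting it to text
      stop: () => void           // stop receiving voice
      asrText: Ref<string>       // text recognized from the voice input
    }

    export function useAsr(): UseAsrResult {
      const asrText = ref('')

      async function start(): Promise<void> {
        // Request microphone access, open the provider's streaming
        // endpoint, and write transcripts into asrText as they arrive.
      }

      function stop(): void {
        // Close the microphone stream and the provider connection.
      }

      return { start, stop, asrText }
    }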

2. LLM Module Replacement

The LLM module, which handles the connection to the large model, is encapsulated in src/services/llm.ts. The current implementation uses the OpenAI-compatible access mode.

To replace it, modify this file according to your LLM provider's documentation. As long as the class's send method returns the reply as a string, the replacement is complete.
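
For reference, here is a minimal sketch of such a class using the OpenAI-compatible chat completions format. The class name, constructor parameters, and base URL are assumptions rather than the demo's exact code; only the send(text) => Promise<string> contract matters.

    // A minimal sketch of an OpenAI-compatible LLM service. Class name,
    // constructor parameters, and baseUrl are illustrative; replace the
    // endpoint with your provider's (e.g., Volcano Ark).
    export class LlmService {
      constructor(
        private apiKey: string,
        private model: string,
        private baseUrl: string = 'https://api.openai.com/v1',
      ) {}

      async send(prompt: string): Promise<string> {
        const res = await fetch(`${this.baseUrl}/chat/completions`, {
          method: 'POST',
          headers: {
            'Content-Type': 'application/json',
            Authorization: `Bearer ${this.apiKey}`,
          },
          body: JSON.stringify({
            model: this.model,
            messages: [{ role: 'user', content: prompt }],
          }),
        })
        if (!res.ok) throw new Error(`LLM request failed: ${res.status}`)
        const data = await res.json()
        // Return the reply text as a plain string, as the demo expects.
        return data.choices[0].message.content
      }
    }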

Ⅵ FAQ

Q: Why do I get the error "VideoDecoder is not defined"?
A: Some methods in the Embodia AI Embodied Driving SDK rely on VideoDecoder, which browsers only expose in secure contexts, so they do not work over plain HTTP. Access the demo via localhost or over an HTTPS connection.