Multimedia program development

Multimedia program development refers to the technical field that integrates text, images, audio, video and animation, implementing interactive functionality through programming languages. Its development focuses on hardware acceleration, coding efficiency and smoothness of the user experience.


Core development components

Mainstream development tools and languages

Development area | Commonly used languages | Technical frameworks/tools
Web multimedia | JavaScript / TypeScript | HTML5 Canvas, WebGL, Three.js
Mobile apps/games | C++ / C# / Swift | Unity, Unreal Engine, Metal
Back-end audio/video processing | Python / Go / C++ | FFmpeg, OpenCV, GStreamer

Common development processes

  1. Requirements analysis: Determine media types (such as streaming media, interactive games, educational software).
  2. Resource preparation: material collection and format conversion (optimizing file size and resolution).
  3. Programming: Implement playback logic, filter effects or interactive algorithms.
  4. Performance tuning: Perform memory management and multi-thread optimization to ensure high frame rate operation.
  5. Deployment and testing: Cross-platform compatibility testing to ensure that it can operate under different screen sizes and hardware specifications.
Note: When developing multimedia programs involving heavy computation, hardware decoding should be preferred to reduce CPU load.
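Step 4 above (multi-thread optimization) can be sketched with Python's standard concurrent.futures; `process_frame` here is a hypothetical stand-in for a real per-frame workload such as decode-plus-filter:

```python
from concurrent.futures import ThreadPoolExecutor

def process_frame(frame_id):
    # hypothetical per-frame workload; a real implementation would
    # decode and filter the frame, here we just return a tagged result
    return frame_id * 2

def process_clip(frame_ids, workers=4):
    # distribute frames across a thread pool; pool.map preserves order,
    # so the output frames stay in timeline order
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_frame, frame_ids))
```

For CPU-bound filters, swapping in ProcessPoolExecutor avoids the GIL at the cost of pickling overhead.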


DirectX

DirectX is a family of application programming interfaces (APIs) developed by Microsoft that lets software (especially games) communicate directly with hardware such as graphics cards and sound cards. It is a core pillar of multimedia development on Windows and Xbox consoles.


Main API components

DirectX version evolution comparison

Version | Important features | Applicable environment
DirectX 11 | Introduced tessellation and multi-threaded rendering; high stability. | Windows 7 and above
DirectX 12 | Low-level API that greatly reduces CPU overhead and supports multi-core feeding of the GPU. | Windows 10 / 11
DirectX 12 Ultimate | Integrates next-generation technologies such as Ray Tracing and Mesh Shaders. | High-end GPUs and Xbox Series X/S

Development advantages

  1. Hardware abstraction: developers do not need to write vendor-specific code for different brands of graphics cards.
  2. High performance: DirectX 12 lets developers manage GPU resources at a finer granularity and reduce system latency.
  3. Complete ecosystem: tightly integrated with Visual Studio and the Microsoft toolchain, with rich debugging tools (such as PIX).
Note: In modern game development, developers usually call DirectX through engines such as Unity or Unreal Engine rather than writing low-level instructions directly, to improve development efficiency.


Media Foundation

Media Foundation (MF) is Microsoft's multimedia framework introduced with Windows Vista, designed to replace the older DirectShow. It adopts a new pipeline design optimized for high-resolution video, digital rights management (DRM) and more efficient hardware acceleration, and is the core technology for audio/video processing in modern Windows applications.


Core architectural components

Media Foundation breaks down the multimedia processing process into three main levels. This design provides extremely high flexibility of control:

Comparison of technical advantages

Characteristic | Media Foundation | DirectShow (legacy)
High-resolution support | Natively optimized for 4K, 8K and HDR content. | Limited scalability; struggles with ultra-high resolutions.
Hardware acceleration | Deeply integrated with DXVA 2.0; highly efficient. | Depends on the specific filter implementation; performance varies.
Content protection | Built-in Protected Media Path (PMP) supports DRM. | Lacks a unified content-protection mechanism.
Threading model | Asynchronous topologies reduce UI freezes. | Synchronous execution model easily causes interface lag.

Common development interface

  1. Source Reader: a simplified API for developers who only need to obtain decoded frames from a file or camera.
  2. Sink Writer: a quick tool for encoding audio/video data into files of a specific format.
  3. Media Session: a full pipeline controller providing complete control over play, pause, seek and other actions.
Note: Although Media Foundation performs well, its API design is relatively complex and strict. Developers are advised to use Microsoft's MFTrace tool for debugging, to trace the event flow in the media pipeline.


DirectShow

DirectShow is a multimedia framework based on the Component Object Model (COM), mainly used for audio and video capture and playback on the Windows platform. Although Microsoft later launched Media Foundation as its successor, DirectShow is still widely used in industrial cameras, medical imaging, and traditional audio and video software due to its strong compatibility and flexibility.


Filter graph model

The core concept of DirectShow is the Filter Graph, which processes multimedia data by connecting individual filters into a chain:

Core development functions

Function | Description
Media playback | Supports many container formats (such as AVI, WMV, MP4) and codecs.
Image capture | Provides a standard interface to WDM (Windows Driver Model) devices, suitable for USB cameras.
Hardware acceleration | Hardware-accelerated rendering via the Video Mixing Renderer (VMR) or EVR.
Format conversion | Supports resampling, cropping, and color-space conversion (such as YUV to RGB) of real-time video streams.

Development advantages and challenges

  1. Highly modular: developers can write custom filters and insert them into existing filter graphs.
  2. Automatic connection: an Intelligent Connect mechanism can automatically find and combine the required filters.
  3. Learning curve: its deep reliance on COM makes it harder for developers unfamiliar with COM pointers and memory management.
Note: For new development that does not need to support older systems, Microsoft recommends preferring Media Foundation, which has clear advantages in handling high-resolution content and digital rights management (DRM).


Vulkan

Vulkan is a next-generation cross-platform graphics and computing API developed by Khronos Group. Unlike OpenGL, Vulkan is a low-level API designed to provide more direct hardware control, minimize the driver's overhead, and improve the utilization of multi-core processors.


Core design features

Vulkan’s design logic requires developers to assume more management responsibilities in exchange for ultimate performance:

Differences between Vulkan and OpenGL

Characteristic | Vulkan | OpenGL
Driver burden | Very low; most logic is implemented by the developer. | Higher; the driver performs much background management.
Multi-thread support | Native support for parallel command submission. | Mainly single-threaded.
Development complexity | Extremely high; code size is usually several times that of OpenGL. | Medium; friendlier to beginners.
Hardware utilization | High; precise control over GPU compute and memory. | Lower; limited by the API's abstraction level.

key development components

  1. Instance & Physical Device: initialize Vulkan and enumerate the graphics hardware on the system.
  2. Logical Device & Queues: create a logical connection to a physical device and obtain queues that handle graphics, compute or transfer tasks.
  3. Pipeline State Objects (PSO): pre-package render state (such as blend mode and depth test) so that state is not changed dynamically during drawing, which would cause frame drops.
  4. Render Pass: explicitly defines render targets and operation steps, which benefits tile-based rendering optimization on mobile GPUs.
Note: Because of Vulkan's very high entry barrier, it is usually recommended for 3D game-engine cores that need extreme performance (such as id Tech 7) or for cross-platform high-performance computing in scientific simulation.


Machine vision program development

OpenCV

1. What is OpenCV?

OpenCV (Open Source Computer Vision Library) is an open source computer vision and machine learning software library for real-time image processing and analysis.

2. Supported functions

3. Supported platforms

4. Usage examples

# Read the image and display it
import cv2
image = cv2.imread("image.jpg")
cv2.imshow("Image", image)
cv2.waitKey(0)
cv2.destroyAllWindows()

5. Resources and Documents



cv::imread

1. Basic grammar

In OpenCV, the core function for reading images is cv::imread. It loads an image file into a cv::Mat matrix.

#include <opencv2/opencv.hpp>

// Function prototype
cv::Mat cv::imread(const cv::String& filename, int flags = cv::IMREAD_COLOR);

Commonly used flags:

  1. cv::IMREAD_COLOR: load as a 3-channel BGR image (the default).
  2. cv::IMREAD_GRAYSCALE: load as a single-channel grayscale image.
  3. cv::IMREAD_UNCHANGED: load the file as-is, including any alpha channel.


2. Exception checking and handling mechanism

Key point: when cv::imread fails, it does not throw a C++ exception, so a traditional try-catch is ineffective here. When reading fails (wrong path, unsupported format or insufficient permissions), it returns an empty cv::Mat object.

The correct approach is to check with the empty() member function:

#include <opencv2/opencv.hpp>
#include <iostream>

int main() {
    std::string path = "data/image.jpg";
    cv::Mat img = cv::imread(path);

    // Must check if the image is loaded successfully
    if (img.empty()) {
        std::cerr << "Error: Unable to read image file!" << std::endl;
        std::cerr << "Please confirm whether the path is correct:" << path << std::endl;
        return -1;
    }

    // Operations to perform after a successful read
    std::cout << "Image width: " << img.cols << " Height: " << img.rows << std::endl;
    return 0;
}

3. Analysis of common failure reasons

If img.empty() is true, it is usually due to one of the following reasons:

Reason | Explanation and countermeasures
Wrong file path | The most common cause. Check whether the relative path is relative to the executable's directory, or use an absolute path.
Unsupported format | OpenCV needs the corresponding decoder (such as libjpeg or libpng); if it was compiled without that support, the file cannot be read.
Non-ASCII (e.g. Chinese) path | On Windows, older versions or certain build configurations of cv::imread handle non-ASCII paths poorly.
Insufficient permissions | The user running the program lacks operating-system permission to read the file.

4. Advanced solution: reading non-ASCII paths

If reading fails because of a non-ASCII (e.g. Chinese) path on Windows, it is recommended to read the file into a memory buffer first and then decode it with cv::imdecode:


#include <fstream>
#include <vector>

cv::Mat imread_unicode(const std::string& path) {
    // std::ifstream opens the path through the C runtime; on MSVC a
    // std::wstring overload is available for true wide-character paths
    std::ifstream fs(path, std::ios::binary | std::ios::ate);
    if (!fs.is_open()) return cv::Mat();

    std::streamsize size = fs.tellg();
    fs.seekg(0, std::ios::beg);

    std::vector<char> buffer(size);
    if (fs.read(buffer.data(), size)) {
        return cv::imdecode(cv::Mat(buffer), cv::IMREAD_COLOR);
    }
    return cv::Mat();
}


Grouping oscillating point sets

When the ordering of a point set (such as screw-thread edges or a sine wave) is scrambled, the points must first be projected onto the direction of a fitted line and sorted; they can then be grouped correctly by their signed distance (positive/negative offset) relative to the line. Below is an implementation that combines OpenCV and standard C++.


Coordinate point definition and distance sorting

First, implement a function that sorts points by their distance to a specified point. This can be used to locate a starting point or a particular feature point.

#include <vector>
#include <array>
#include <algorithm>
#include <opencv2/opencv.hpp>

using Point2D = std::array<float, 2>;
using Points = std::vector<Point2D>;

namespace GeometryPointsUtil {
    bool FindSortedPointsByDistOfPoint(Points& retPoints, const Points& allPoints, const Point2D& aPoint) {
        if (allPoints.empty()) return false;

        retPoints = allPoints;
        std::sort(retPoints.begin(), retPoints.end(), [&aPoint](const Point2D& p1, const Point2D& p2) {
            float dx1 = p1[0] - aPoint[0];
            float dy1 = p1[1] - aPoint[1];
            float dx2 = p2[0] - aPoint[0];
            float dy2 = p2[1] - aPoint[1];
            // Use sum of squares comparison to avoid sqrt operation overhead
            return (dx1 * dx1 + dy1 * dy1) < (dx2 * dx2 + dy2 * dy2);
        });
        return true;
    }
}

Grouping algorithm for out-of-order points along a fitted line

For oscillating point sets, this function fits a line, sorts the points by their projection onto it, and splits them according to which side of the line they fall on.

std::vector<Points> splitOscillatingPoints(const Points& allPoints) {
    if (allPoints.size() < 2) return {allPoints};

    // 1. Straight line fitting
    std::vector<cv::Point2f> cvPts;
    for (const auto& p : allPoints) cvPts.push_back({p[0], p[1]});
    
    cv::Vec4f line; // (vx, vy, x0, y0)
    cv::fitLine(cvPts, line, cv::DIST_L2, 0, 0.01, 0.01);
    float vx = line[0], vy = line[1], x0 = line[2], y0 = line[3];

    // 2. Projection sorting: ensure that the points are arranged along a straight line
    struct ProjectedPoint {
        Point2D original;
        float t; // projection length
        float side; // algebraic distance to straight line
    };

    std::vector<ProjectedPoint> projected;
    float nx = -vy; // normal vector x
    float ny = vx; // normal vector y

    for (const auto& p : allPoints) {
        float dx = p[0] - x0;
        float dy = p[1] - y0;
        float t = dx * vx + dy * vy; // Displacement projected onto a straight line
        float s = dx * nx + dy * ny; // Distance perpendicular to the straight line (including plus and minus signs)
        projected.push_back({p, t, s});
    }

    std::sort(projected.begin(), projected.end(), [](const ProjectedPoint& a, const ProjectedPoint& b) {
        return a.t < b.t;
    });

    // 3. Grouping based on positive and negative sign transitions
    std::vector<Points> segments;
    if (projected.empty()) return segments;

    Points currentGroup;
    bool lastSide = (projected[0].side >= 0);

    for (const auto& item : projected) {
        bool currentSide = (item.side >= 0);

        if (currentSide != lastSide && !currentGroup.empty()) {
            segments.push_back(currentGroup);
            currentGroup.clear();
        }
        
        currentGroup.push_back(item.original);
        lastSide = currentSide;
    }

    if (!currentGroup.empty()) segments.push_back(currentGroup);
    return segments;
}
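For reference, the same projection-and-sign idea in a dependency-free Python sketch; the principal axis of the covariance matrix replaces cv::fitLine here, which is an approximation rather than a literal port:

```python
import math

def split_oscillating_points(points):
    # points: list of (x, y) tuples in arbitrary order
    n = len(points)
    if n < 2:
        return [list(points)]
    # 1. Line fit: principal-axis direction of the 2x2 covariance matrix
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    sxx = sum((p[0] - cx) ** 2 for p in points)
    syy = sum((p[1] - cy) ** 2 for p in points)
    sxy = sum((p[0] - cx) * (p[1] - cy) for p in points)
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    vx, vy = math.cos(theta), math.sin(theta)   # line direction
    nx, ny = -vy, vx                            # normal direction
    # 2. Project onto the line (t) and the normal (s), sort along the line
    proj = sorted(
        ((p, (p[0] - cx) * vx + (p[1] - cy) * vy,
             (p[0] - cx) * nx + (p[1] - cy) * ny) for p in points),
        key=lambda item: item[1])
    # 3. Cut a new group whenever the signed offset changes side
    groups, current = [], []
    last_side = proj[0][2] >= 0
    for p, _, s in proj:
        side = s >= 0
        if side != last_side and current:
            groups.append(current)
            current = []
        current.append(p)
        last_side = side
    if current:
        groups.append(current)
    return groups
```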

Explanation of implementation points



Halcon

Features

Halcon is a powerful industrial vision software developed by MVTec, specifically designed for image processing and machine vision applications.

Function

Application areas

resource



Video editing program development

Common functions

Common Tools and Library

Application examples



Open source video editing software

1. Shotcut

Shotcut is a free and open source video editing software that supports multiple formats and has many powerful editing tools. Features include:

Applicable platforms: Windows, Mac, Linux

2. OpenShot

OpenShot is an easy-to-use open source video editing tool that is powerful and supports multiple formats. Its main features include:

Applicable platforms: Windows, Mac, Linux

3. Blender

Blender is a well-known open source 3D modeling and animation software with a built-in powerful video editor suitable for video editing and special effects production. Its features include:

Applicable platforms: Windows, Mac, Linux

4. Kdenlive

Kdenlive is a widely used open source video editing software on Linux and also supports Windows. Its main functions include:

Applicable platforms: Windows, Mac, Linux

5. Lightworks

Lightworks offers free and paid versions, with the free version offering basic editing features. Features include:

Applicable platforms: Windows, Mac, Linux

6. Avidemux
7. Cinelerra
8. LiVES
9. Losslesscut
10. Natron
11. Pitivi

The above open source video editing software provides powerful functions that are suitable for different levels of video editing needs, from simple home video editing to professional video production.

Google search volume ranking

Software name | Approximate search volume
OpenShot | 110,000
Kdenlive | 90,500
Shotcut | 49,500
Avidemux | 18,100
Losslesscut | 14,800
Blender VSE | 10,000
Natron | 6,600
Cinelerra | 5,400
Pitivi | 3,600
LiVES | 1,600


Useful program libraries for video editing

FFmpeg

MoviePy (Python)

OpenCV (C++/Python)

GStreamer

AVFoundation (macOS/iOS)

Microsoft Media Foundation (Windows)

Kapwing API / Shotstack / Cloudinary

Adobe Premiere Pro API (Adobe UXP)



OpenShot

Project introduction

OpenShot is a free and open-source video editor; the project repository is OpenShot/openshot-qt, developed mainly in Python with Qt. The project aims to provide an easy-to-use, feature-rich video editing tool suitable for users of all levels.

Features

Technical architecture

OpenShot uses PyQt for the graphical user interface, combined with libopenshot (implemented in C++) to handle the core video-editing logic. Additionally, OpenShot leverages FFmpeg to support decoding and encoding of many formats.

Usage context

OpenShot suits users who want simple yet capable video editing. Whether for amateur video creators or educational purposes, OpenShot provides flexible tools and plug-ins that make editing and creation easy.

Community and contribution

The OpenShot project has an active open source community, and users and developers can contribute code, report issues, or submit new feature suggestions through GitHub. Everyone is welcome to participate and help improve the functionality and stability of OpenShot.

How to get OpenShot

Users can download the source code through the GitHub page, or download the executable file from the OpenShot official website. Detailed installation instructions and documentation are also available on GitHub.



Python Kdenlive automation

How Kdenlive project files (.kdenlive) work

Kdenlive's project files are essentially plain-text files in XML format. To automate opening and importing, the most stable and efficient approach is not to simulate mouse clicks but to use Python to generate or modify the XML directly, then launch Kdenlive to open it. This way the positions of voice, subtitles (SRT) and video on the timeline can be specified precisely.

Automated import script implementation

This script demonstrates how to create a basic Kdenlive project XML structure and write the resource paths you specify into it.
import os
import subprocess

def create_kdenlive_project(project_path, video_path, audio_path, srt_path):
    """
    Create a basic Kdenlive XML project file and import assets
    """
    # Get the absolute path of the file to ensure Kdenlive can read it correctly
    video_abs = os.path.abspath(video_path)
    audio_abs = os.path.abspath(audio_path)
    srt_abs = os.path.abspath(srt_path)

    # Basic Kdenlive MLT structure (simplified version)
    kdenlive_xml = f"""<?xml version="1.0" encoding="UTF-8"?>
<mlt version="7.24.0" title="Auto Generated Project">
  <producer id="video_main" resource="{video_abs}"/>
  <producer id="audio_main" resource="{audio_abs}"/>
  <producer id="subtitle_main" resource="{srt_abs}"/>
  
  <playlist id="main_bin">
    <entry producer="video_main"/>
    <entry producer="audio_main"/>
    <entry producer="subtitle_main"/>
  </playlist>

  <tractor id="main_timeline">
    <multitrack>
      <track name="Video Track">
        <entry producer="video_main" in="0" out="1000"/>
      </track>
      <track name="Audio Track">
        <entry producer="audio_main" in="0" out="1000"/>
      </track>
    </multitrack>
  </tractor>
</mlt>
"""

    with open(project_path, "w", encoding="utf-8") as f:
        f.write(kdenlive_xml)
    print(f"Project file has been generated: {project_path}")

def open_with_kdenlive(project_path, kdenlive_exe_path):
    """
    Start Kdenlive and load the generated project
    """
    try:
        # Use subprocess to open the program and bring in file parameters
        subprocess.Popen([kdenlive_exe_path, project_path])
        print("Starting Kdenlive...")
    except Exception as e:
        print(f"Startup failed: {e}")

if __name__ == "__main__":
    # Set file path
    MY_VIDEO = "input_video.mp4"
    MY_AUDIO = "output_voice.wav"
    MY_SRT = "output_subtitle.srt"
    SAVE_PROJECT = "auto_project.kdenlive"
    
    # Kdenlive executable file path (Windows example, Linux usually uses 'kdenlive' directly)
    KDENLIVE_PATH = r"C:\Program Files\kdenlive\bin\kdenlive.exe"

    # 1. Generate project file
    create_kdenlive_project(SAVE_PROJECT, MY_VIDEO, MY_AUDIO, MY_SRT)
    
    # 2. Start Kdenlive
    open_with_kdenlive(SAVE_PROJECT, KDENLIVE_PATH)

Advanced automation suggestions



MLT multimedia framework

MLT (Media Lovin' Toolkit) core architecture

MLT is an open source multimedia framework and the underlying engine of editing software such as Kdenlive and Shotcut. It adopts non-linear editing (NLE) design, defines video, audio, filters and transitions as XML structures (called MLT XML), and performs real-time preview or rendering through efficient pipelines.

Components of MLT

Using Python to manipulate MLT XML examples

Instead of doing it manually in Kdenlive, you can use Python to generate MLT scripts to automate batch editing.
import subprocess

# Define a simple MLT XML structure
# This XML defines the playback order of two pieces of material.
mlt_xml_content = """<mlt>
  <producer id="clip1" resource="video_part1.mp4" />
  <producer id="clip2" resource="video_part2.mp4" />
  <playlist id="main_track">
    <entry producer="clip1" in="0" out="150" />
    <entry producer="clip2" in="0" out="300" />
  </playlist>
</mlt>
"""

# Write content to file
with open("auto_edit.mlt", "w", encoding="utf-8") as f:
    f.write(mlt_xml_content)

def render_video(mlt_file, output_file):
    """
    Use the melt command line tool to render videos directly (without opening the GUI)
    """
    # melt is the command line interface tool for MLT
    command = [
        "melt",
        mlt_file,
        "-consumer", f"avformat:{output_file}",
        "acodec=aac", "vcodec=libx264", "preset=fast"
    ]
    
    try:
        print(f"Start background rendering: {output_file}...")
        subprocess.run(command, check=True)
        print("Rendering completed!")
    except FileNotFoundError:
        print("Error: The melt executable file cannot be found, please confirm whether the MLT framework is installed.")

if __name__ == "__main__":
    # Perform rendering
    render_video("auto_edit.mlt", "final_result.mp4")

Why choose MLT for automation?



Python clipping automation

This script uses image recognition to locate UI elements. Before running it, capture small screenshots of the "Image and Text into Movie" and "Generate Video" buttons in the Jianying editor interface, save them as btn_start.png and btn_generate.png, and store them in the same directory as the script.


Preparation

Please install the necessary Python libraries first:

pip install pyautogui pyperclip opencv-python

Automation code examples

import os
import time
import pyautogui
import pyperclip

# Set parameters
JIANYING_PATH = r"C:\Users\YourName\AppData\Local\JianyingPro\Apps\JianyingPro.exe" # Please replace it with your actual path
SCRIPT_FILE = "my_script.txt" # Pre-prepared script file
CONFIDENCE_LEVEL = 0.8 # Image recognition accuracy (0-1)

def run_automation():
    # 1. Read the content of the document
    if not os.path.exists(SCRIPT_FILE):
        print("Error: Document file not found")
        return
    with open(SCRIPT_FILE, "r", encoding="utf-8") as f:
        content = f.read()

    # 2. Launch Jianying
    print("Launching Jianying...")
    os.startfile(JIANYING_PATH)
    time.sleep(8) # Wait for the software to fully load

    try:
        # 3. Locate and click the "Picture and Text into Film" button
        start_btn = pyautogui.locateCenterOnScreen('btn_start.png', confidence=CONFIDENCE_LEVEL)
        if start_btn:
            pyautogui.click(start_btn)
            print("You have entered the picture and text interface")
            time.sleep(2)
        else:
            print('Unable to locate the "Picture and Text into Film" button')
            return

        # 4. Process document input
        pyperclip.copy(content) # Copy the document to the clipboard
        pyautogui.click(x=pyautogui.size().width//2, y=pyautogui.size().height//2) # Click the center of the window to ensure focus
        pyautogui.hotkey('ctrl', 'v')
        print("The manuscript has been pasted")
        time.sleep(1)

        # 5. Locate and click "Generate Video"
        gen_btn = pyautogui.locateCenterOnScreen('btn_generate.png', confidence=CONFIDENCE_LEVEL)
        if gen_btn:
            pyautogui.click(gen_btn)
            print("Generating project...")
        else:
            print('Unable to locate the "Generate Video" button')

    except Exception as e:
        print(f"An error occurred: {e}")

if __name__ == "__main__":
    run_automation()

Implementation key details

Step | Description
Image capture | When capturing button images, crop only the text or icon at the center of the button and avoid including background color, to stay robust across UI themes.
Sleep timing | The most common automation failure is clicking before the software has responded. Adjust the time.sleep values to your machine's performance.
Fail-safe | PyAutoGUI has a built-in protection mechanism: quickly moving the mouse to the top-left corner of the screen immediately aborts the program.

Optimization suggestions

  1. Resolution adaptation: if you run the script on a different computer, re-capture the .png button images; recognition may fail when the screen resolution or zoom ratio (DPI) changes.
  2. Window on top: before clicking, it is recommended to use the pygetwindow library to force the Jianying window into the foreground.
  3. Poll instead of fixed waits: rather than a fixed wait time, loop until the button appears and only then click.
Note: UI automation of this kind may break with software updates (interface layout changes). For long-term stable operation, modifying the draft JSON files directly is a more robust approach.
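Suggestion 3 can be factored into a small polling helper; in practice `locate` would wrap a call such as pyautogui.locateCenterOnScreen (a sketch, not part of PyAutoGUI itself):

```python
import time

def wait_for(locate, timeout=10.0, interval=0.5):
    """Poll `locate` until it returns a non-None position or the timeout
    expires. `locate` is any zero-argument callable, e.g.
    lambda: pyautogui.locateCenterOnScreen('btn_generate.png', confidence=0.8)
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        pos = locate()
        if pos is not None:
            return pos
        time.sleep(interval)
    return None  # button never appeared
```

The caller then clicks only when `wait_for` returns a position, instead of sleeping a fixed number of seconds.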


FFmpeg

Introduction

Common functions

Advantages

Usage

Official website and download



FFmpeg automatic deployment

In multimedia development, ensuring that the runtime environment has FFmpeg available is a basic requirement. With Python's standard shutil, urllib and zipfile modules, the environment setup can be automated.


Core logic flow

The code divides into two stages: checking the system PATH, then downloading and unpacking a build.

Python implementation example

import os
import shutil
import platform
import urllib.request
import zipfile

def ensure_ffmpeg():
    # 1. Check whether the system PATH already has ffmpeg
    if shutil.which("ffmpeg"):
        print("FFmpeg already exists in the system path.")
        return True

    print("FFmpeg not detected, ready to start downloading...")
    
    # 2. Download information according to operating system settings (taking Windows as an example)
    if platform.system() == "Windows":
        url = "https://www.gyan.dev/ffmpeg/builds/ffmpeg-release-essentials.zip"
        target_zip = "ffmpeg.zip"
        extract_dir = "ffmpeg_bin"
        
        # Download file
        urllib.request.urlretrieve(url, target_zip)
        
        # decompress
        with zipfile.ZipFile(target_zip, 'r') as zip_ref:
            zip_ref.extractall(extract_dir)
            
        # Locate the extracted bin directory and add it to PATH.
        # The top-level folder name varies by release, so search for
        # it instead of hard-coding the path.
        ffmpeg_path = ""
        for root, dirs, _files in os.walk(extract_dir):
            if os.path.basename(root) == "bin":
                ffmpeg_path = os.path.abspath(root)
                break
        os.environ["PATH"] += os.pathsep + ffmpeg_path
        
        print(f"FFmpeg has been deployed to: {ffmpeg_path}")
        return True
    else:
        print("The current example only supports automatic download for Windows, please install manually for other systems.")
        return False

# Perform checks
ensure_ffmpeg()

Development considerations

Item | Description
Permissions | On Linux or macOS, the downloaded binary may need os.chmod(path, 0o755) to become executable.
Version pinning | Download from a reliable source (such as Gyan.dev or BtbN) and confirm the version is compatible with your code.
Network timeouts | FFmpeg archives are large; wrap the download in try-except to handle network interruptions, or use the requests library to show a progress bar.
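The progress-bar suggestion above can also be met with the reporthook parameter of urllib.request.urlretrieve, without adding requests; a minimal sketch:

```python
def percent_done(block_num, block_size, total_size):
    """Clamp the downloaded fraction to 0-100 for display."""
    if total_size <= 0:
        return None  # server did not send Content-Length
    return min(100, block_num * block_size * 100 // total_size)

def report_progress(block_num, block_size, total_size):
    # signature expected by urllib.request.urlretrieve's reporthook
    p = percent_done(block_num, block_size, total_size)
    if p is not None:
        print(f"\rDownloading FFmpeg: {p}%", end="")

# usage with the url/target_zip variables from the example above:
# urllib.request.urlretrieve(url, target_zip, reporthook=report_progress)
```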

Path management suggestions

  1. Priority order: the program should first check a user-defined path, then the system PATH, and only then fall back to automatic download.
  2. Persistence: place the automatically downloaded FFmpeg under the application's AppData (or the project root) to avoid repeated downloads.
  3. Static builds: for shipped applications, bundling a statically compiled FFmpeg executable is usually more reliable than downloading at runtime.
Note: In production, repeatedly downloading large binaries hurts user experience. Prompt the user on first launch, or bundle FFmpeg in the installer.


Python screen recording

The most common and stable way to record the screen in Python is to combine PyAutoGUI (to capture frames), OpenCV (to encode and store the video) and NumPy (to handle the image data).


1. Preparation

First you need to install the necessary packages. Open a terminal and execute the following commands:

pip install opencv-python pyautogui numpy

2. Core implementation examples

The following code will capture the full screen image and save it as an output.mp4 file. Press the q key on your keyboard to stop recording.

import cv2
import pyautogui
import numpy as np

# Get screen resolution
SCREEN_SIZE = tuple(pyautogui.size())

# Define video encoding format (FourCC)
fourcc = cv2.VideoWriter_fourcc(*"mp4v")

# Create VideoWriter object (file name, encoding, frame rate, resolution)
out = cv2.VideoWriter("output.mp4", fourcc, 20.0, SCREEN_SIZE)

print("Recording... Press the 'q' key to stop.")

try:
    while True:
        # Capture screen
        img = pyautogui.screenshot()
        
        # Convert to NumPy array
        frame = np.array(img)
        
        # Convert color from RGB to BGR (OpenCV standard format)
        frame = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)
        
        # Write frames to video file
        out.write(frame)
        
        # Show a preview window; cv2.waitKey only receives key events
        # while an OpenCV window has focus, so the preview is required
        # for the 'q' key below to work
        cv2.imshow("Preview", frame)

        # Detect keyboard input
        if cv2.waitKey(1) == ord("q"):
            break
finally:
    # Release resources and close the window
    out.release()
    cv2.destroyAllWindows()
    print("The recording is over and the file has been saved.")

3. Description of key components


4. Frequently Asked Questions and Solutions

Problem phenomenon Reasons and suggestions
Video plays too fast The actual recorded FPS is lower than the set value. The writing FPS should be lowered, or a more efficient retrieval library such as mss should be used instead.
Color is abnormal Forgot to do the COLOR_RGB2BGR conversion.
Stuttering when executing the code Capturing a high-resolution screen is very CPU-intensive. It is recommended to lower the screen resolution or record only a specific area.
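The first row of the table can be quantified: if the VideoWriter is configured with a higher FPS than the capture loop actually achieves, playback speed is the ratio of the two. A minimal arithmetic sketch (the function names are illustrative, not part of any library):

```python
def playback_speed_factor(writer_fps: float, actual_capture_fps: float) -> float:
    """How many times faster the video plays back than real time.

    The VideoWriter stamps frames at writer_fps, but frames only arrive at
    actual_capture_fps, so playback is sped up by the ratio of the two.
    """
    return writer_fps / actual_capture_fps

def corrected_writer_fps(measured_frames: int, elapsed_seconds: float) -> float:
    """FPS value to pass to cv2.VideoWriter so playback matches real time."""
    return measured_frames / elapsed_seconds

# Example: 20 FPS was configured but only ~12 frames/s were actually captured.
factor = playback_speed_factor(20.0, 12.0)   # video plays about 1.67x too fast
fps = corrected_writer_fps(360, 30.0)        # 360 frames in 30 s -> write at 12 FPS
```

Measuring the real capture rate once (frames written divided by wall-clock seconds) and re-encoding with that value is usually enough to fix the fast-playback symptom.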


Manim - Python Animation

Manim (Mathematical Animation Engine) is an animation library written in Python, specifically used to create mathematical figures and animations. It can be used to generate high-quality animations that illustrate mathematical concepts, code execution processes, or anything else that can be represented with images and animation.

Main features of Manim

How to use Manim

Manim animation is generally completed by writing Python scripts and then generating video files. Each animation usually contains one or more scenes (Scene), and each scene is composed of different objects (Mobject).

basic example

from manim import *

class MyFirstScene(Scene):
    def construct(self):
        text = Text("Hello, Manim!") # Create a text object
        self.play(Write(text)) # Generate animation

Install Manim

Manim can be installed via pip:

pip install manim


3D graphics and animation program development

OpenGL

OpenGL (Open Graphics Library) is a cross-language, cross-platform application programming interface (API) for rendering 2D and 3D vector graphics. It is maintained by the Khronos Group and is widely used in computer-aided design (CAD), virtual reality, scientific visualization, and video game development.


Drawing pipeline process

OpenGL uses a pipeline architecture to convert 3D data into pixels on the screen. Modern OpenGL core mode relies heavily on shaders:

Technical features and advantages

characteristic illustrate
Cross-platform compatibility Runs on Windows, Linux, macOS (via translation layer) and mobile devices (OpenGL ES).
State machine model OpenGL operates like a huge state machine. Developers set the state (such as current color, bound texture) and then execute drawing instructions.
GLSL language Use C-like OpenGL Shading Language to write GPU programs, which has powerful computing capabilities.
Extension mechanism Allow hardware manufacturers to introduce new graphics card functions through Extension without updating the API standard.

Core mode and immediate mode

  1. Immediate Mode:Early versions used glBegin/glEnd instructions, which were easy to learn but extremely inefficient and have now been deprecated.
  2. Core Profile:Modern development standards mandate the use of buffer objects (VBO/VAO) and shaders to maximize the performance of the hardware.
Note: Although Vulkan has been regarded as the successor of OpenGL, providing lower-level hardware control, OpenGL is still the first choice for learning graphics program development due to its relatively simple entry barrier and rich documentation.


ManimGL

Introduction

ManimGL is an efficient variant of Manim for making mathematical animations, focusing on OpenGL acceleration to improve rendering speed.

Install

Install using pip:

pip install manimgl

Or get the latest version from GitHub:

git clone https://github.com/3b1b/manim.git
cd manim
pip install -e .

Basic use

Render a simple scene using ManimGL:

from manimlib import *

class HelloManim(Scene):
    def construct(self):
        text = Text("Hello, ManimGL!")
        self.play(Write(text))
        self.wait(2)

Run command:

manimgl script.py HelloManim

Main features

FAQ

If you encounter installation or operational problems, try:

Related resources



Blender 3D creation software

Blender is an open source and all-in-one 3D creation software that covers a complete pipeline from modeling, animation, rendering to compositing and video editing. Known for its powerful Cycles rendering engine and flexible Python API, it is a core tool for independent developers and small and medium-sized studios.


Core technology module

Blender's architecture is extremely compact and uses multiple dedicated engines to work together:

Comparison of technical characteristics

  characteristic illustrate
  Cross-platform support It natively supports Windows, macOS (Apple Silicon) and Linux, and the file format (.blend) is universal across all platforms.
  Python API Almost the entire UI and feature set can be controlled through Python scripts, making it easy to develop add-ons.
  Integrated pipeline Built-in video editor (VSE) and compositor (Compositor), no need to switch software to complete post-production.

    Developer automation path

    For developers who need to batch process 3D materials or automate modeling, Blender provides a powerful background mode:

    1. Background rendering:Run blender -b -P script.py from the command line to perform automated tasks without opening the graphical interface.
    2. bpy module:Blender's exclusive Python library can operate every vertex, material and animation frame in the scene.
    3. Custom UI:Developers can use Python to write custom panels and toolbars to optimize specific workflows.
    Note: Blender updates very quickly (about one version every three months). When developing scripts, you need to pay attention to API compatibility changes between different versions.


    Blender Python Mods

    The bpy module is a Python API designed specifically for Blender that allows users to create, modify, and manage 3D images and animations through code within Blender.

    What is bpy

    bpy is short for Blender Python: a set of libraries that let Python scripts operate Blender's core functions. Through bpy, users can:

    Main modules and features of bpy

    bpy contains multiple sub-modules, each with a specific purpose:

    Simple example: Create a cube

    The following is a simple example of using bpy to create a cube:

    import bpy
    
    # Delete existing objects
    bpy.ops.object.select_all(action='SELECT')
    bpy.ops.object.delete(use_global=False)
    
    # Add a cube
    bpy.ops.mesh.primitive_cube_add(size=2, enter_editmode=False, align='WORLD', location=(0, 0, 0))
        

    Why use bpy

    Using bpy allows you to automate repetitive tasks and produce complex models, animations and renders. For professionals such as game designers, architects, and animators, bpy provides powerful tools to optimize workflows.

    References

    To learn more about the bpy module, please refer to the official documentation: Blender Python API Documentation



    Game program development

    Unity

    Unity is a powerful game development engine and platform designed for creating 2D and 3D games, interactive applications, and virtual reality (VR) and augmented reality (AR) experiences. It provides an easy-to-use interface and rich tools, suitable for both beginners and professional developers.

    1. Main features of Unity
    2. Unity’s core components
    3. Application scope of Unity
    4. Advantages of Unity

    Unity is a powerful and flexible development engine that provides developers with a wide range of application scenarios and tool support. Whether you are a beginner or a professional developer, you can use Unity to quickly create high-quality 2D and 3D games and interactive applications.



    Cocos game engine

    Cocos is the world's leading open source mobile game development framework, including the early pure code-driven Cocos2d-x and the modern full-featured editor Cocos Creator. Known for its lightweight, efficient and cross-platform support, it is the preferred tool for developing 2D and 3D mobile games and mini-games (such as WeChat mini-games and TikTok mini-games).


    Core product evolution

    The Cocos family is mainly divided into two important development stages to meet the needs of different development habits:

    Technical advantages and features

    characteristic illustrate
    Extremely cross-platform Supports iOS, Android, Windows, Mac and various web browsers and instant game platforms.
    High performance renderer The bottom layer uses the self-developed GFX abstraction layer, which supports multiple graphics backends such as Vulkan, Metal, DirectX and WebGL.
    Lightweight and compact The engine core is small and packaged games start up quickly, making it suitable for platforms with limited network environments or strict load-time requirements.
    TypeScript support Cocos Creator deeply integrates TypeScript, provides complete type checking and syntax prompts, and reduces the difficulty of maintaining large projects.

    Core functional components

    1. Scene management:Using Node and Component architecture, developers can easily manage complex hierarchical relationships.
    2. Physics engine:Built-in support for a variety of physics backends (such as Box2D, Bullet, Cannon.js), which can be switched according to project needs.
    3. UI system:Provides flexible layout components, coordinate conversion and automatic picture combining functions to greatly optimize interface rendering efficiency.
    4. Animation system:Supports skeletal animation (Spine, DragonBones), key frame animation and self-developed Marionette dynamic state machine.
    Note: Cocos Creator has now evolved to version 3.x, which fully integrates the core technologies of 2D and 3D. Developers can mix and produce 2D UI and 3D scenes in the same project.


    Sound program development

    Speech synthesis development

    core development process

    Developing a speech synthesis system is usually divided into three stages. The first is front-end processing, which converts raw text into linguistic features (such as word segmentation, phonetic transcription, prosody prediction); next, the acoustic model maps these features into an acoustic representation (such as a mel spectrogram); finally, the vocoder turns the acoustic representation into audible waveform audio.
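    As a tiny illustration of the front-end stage, the sketch below expands standalone digits into words, one small piece of text normalization that happens before any acoustic modelling. The mapping table and the normalize_text function are illustrative assumptions, not part of any TTS framework:

```python
import re

# Illustrative digit-to-word table for English front-end normalization
_DIGITS = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
           "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}

def normalize_text(text: str) -> str:
    """Expand digits into words, one small piece of TTS front-end work."""
    # Replace each digit with its word form plus a space, then collapse whitespace
    out = re.sub(r"\d", lambda m: _DIGITS[m.group(0)] + " ", text)
    return re.sub(r"\s+", " ", out).strip()

print(normalize_text("Call 911"))  # -> "Call nine one one"
```

    Real front-ends handle far more (dates, currencies, abbreviations, G2P), but the principle is the same: the acoustic model should never see raw symbols it cannot pronounce.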

    Mainstream development frameworks and libraries

    category Tools/Models Development features
    Open source framework Coqui TTS / ESPnet Modular design supports a large number of pre-trained models and Fine-tuning
    lightweight engine MeloTTS / Kokoro CPU friendly, suitable for edge computing or embedded devices
    Conversation optimization ChatTTS Designed specifically for spoken dialogue, supporting the insertion of laughter, catchphrases and other details
    Research grade model StyleTTS 2 / VITS Based on Generative Adversarial Network (GAN), the sound quality is extremely close to real people

    Custom model training (Fine-tuning)

    To develop a TTS with a specific timbre, you need to prepare a high-quality dataset (usually 1 to 10 hours of recordings with corresponding text). Developers commonly use transfer learning, fine-tuning a large base model, which significantly reduces the amount of data required and improves voice similarity and naturalness.

    API integration development

    For most application developers, calling a mature cloud API directly is the most efficient solution. For example, the ElevenLabs API offers strong emotional expression; the Microsoft Azure Speech SDK provides the most complete SSML (Speech Synthesis Markup Language) support, letting developers precisely control pauses, stress, and tone through tags. In addition, the OpenAI TTS API, with its simple interface and very low inference latency, is popular in real-time interactive applications.

    Technical selection suggestions

    In the early stages of development, it is recommended to balance latency (RTF) against sound quality. For real-time customer service, low-latency streaming is key; for audiobooks, prioritize a model with long-text processing capability and rich prosody. Also check each language's G2P (grapheme-to-phoneme) support, which directly determines pronunciation correctness.
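    The RTF mentioned above is straightforward to compute: it is the ratio of synthesis time to the duration of the audio produced, and streaming is only viable when it stays below 1. A minimal sketch (the function name is illustrative):

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the audio produced.

    RTF < 1 means the engine generates audio faster than it plays back,
    which is the precondition for streaming use cases.
    """
    return processing_seconds / audio_seconds

rtf = real_time_factor(2.5, 10.0)  # 2.5 s of compute for 10 s of speech -> 0.25
streaming_ok = rtf < 1.0
```

    The same ratio is used later in this document for ASR, where processing must likewise run much faster than the speech it transcribes.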



    CosyVoice 2

    CosyVoice 2 is an advanced version of Alibaba’s open source speech synthesis (TTS) model. Compared with the first generation, it has achieved significant breakthroughs in pronunciation accuracy, fine-grained emotion control, and streaming reasoning latency. It not only supports high-quality tone cloning, but also introduces command-controllable technology to make AI speech more "human".


    Core technology upgrade

    CosyVoice 2 uses "text-speech language model" and "Flow Matching" technology to achieve end-to-end speech generation:

    Functional Features Comparison

    Function CosyVoice 2 Description
    Multi-language support Supports Chinese, English, Japanese, Korean and multiple dialects (Cantonese, Sichuan, Shanghainese, Tianjin, etc.).
    emotion/command control Voice emotion and speaking speed can be controlled through commands (such as "Speak happily", "Speak angrily").
    3 seconds super fast cloning Zero-shot high-fidelity sound reproduction can be achieved with just 3 to 10 seconds of sample audio.
    mixed language synthesis Supports mixing multiple languages such as Chinese and English in the same text while keeping the timbre highly consistent.

    Application scenarios and development suggestions

    1. Intelligent customer service and virtual assistant:Leverage its ultra-low latency of 150ms to create a responsive and emotional dialogue system.
    2. Audiobooks and film dubbing:Through fine-grained tone tag control, the emotional ups and downs and speaking styles of different characters can be simulated.
    3. Education and dialect protection:It has a rich built-in dialect data set that can be used for digital dialect teaching or local cultural content creation.
    Note: When deploying CosyVoice 2 locally, it is recommended to use an NVIDIA graphics card with at least 8GB of video memory and the officially recommended vLLM acceleration framework for the best RTF (real-time factor) performance.


    CosyVoice 2 basic usage

    CosyVoice 2 is developed in Python. Since it involves complex audio processing and deep learning environments, it is strongly recommended to install it in an isolated Conda virtual environment. Linux currently has the best official support; Windows users are advised to deploy through WSL2 or a community-modified build.


    1. Environment preparation and installation

    Before starting, please make sure your system has the NVIDIA driver (recommended 8GB or more video memory) and Conda installed.

    1. Create a virtual environment:
      conda create -n cosyvoice2 python=3.10
      conda activate cosyvoice2
    2. Install key dependencies Pynini:

      Pynini is the core component that handles text normalization and must be installed through conda:

      conda install -y -c conda-forge pynini==2.1.5
    3. Copy the project and install dependencies:
      git clone --recursive https://github.com/FunAudioLLM/CosyVoice.git
      cd CosyVoice
      pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/

    2. Model download

    CosyVoice 2 requires downloading pre-trained model weights. You can automate the download via a Python script:

    from modelscope import snapshot_download
    # Download 0.5B main model
    snapshot_download('iic/CosyVoice2-0.5B', local_dir='pretrained_models/CosyVoice2-0.5B')
    # Download text normalization resources
    snapshot_download('iic/CosyVoice-ttsfrd', local_dir='pretrained_models/CosyVoice-ttsfrd')

    3. Basic usage

    CosyVoice 2 offers a variety of modes to suit different needs, from quick dubbing to professional cloning:

    usage pattern Operating Instructions Applicable scenarios
    Start WebUI Run python webui.py and open the visual interface in the browser. Manual dubbing and quick effect testing.
    3 seconds extremely fast reproduction Upload 3-10 seconds of reference audio and corresponding text to achieve sound cloning. Personalized voice package, self-media dubbing.
    Cross-language/dialect Input Chinese text and select Cantonese or Sichuan dialect tone output. Localized content production.
    command control Add a command tag before or inside the text (e.g. [laughter], [angry]). Audiobooks, dramatized voice-overs.

    4. Developer API calling examples

    If you want to integrate CosyVoice 2 into your own Python project (such as Kdenlive's automation script):

    from cosyvoice.cli.cosyvoice import CosyVoice2
    import torchaudio
    
    # Initialize the model
    cosyvoice = CosyVoice2('pretrained_models/CosyVoice2-0.5B')
    
    # Execute inference (taking pre-trained timbre as an example)
    output = cosyvoice.inference_sft('Hello, I am an artificial intelligence voice assistant.', 'Chinese female')
    
    # Save message
    torchaudio.save('output.wav', output['tts_speech'], cosyvoice.sample_rate)
    Note: If you are installing on Windows and encounter sox or compilation errors, please refer to GitHub Issue #1046, or try the one-click installation package.


    CosyVoice 2 long article synthesis

    Text segmentation and grammatical integrity

    The core of long-article generation is preprocessing. Since TTS models usually have an upper limit on inference length (context window), feeding in text that is too long causes garbled or truncated output. The code uses regular expressions to capture end-of-sentence punctuation precisely, ensuring each split point falls on a natural pause and preserving the naturalness of the synthesized speech.

    Tensor splicing technology

    This code joins audio with PyTorch's native torch.cat. Compared with saving each audio segment to a file and merging afterwards, concatenating tensors directly in GPU/CPU memory significantly reduces disk I/O overhead and effectively eliminates the digital noise that can appear between segments.
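    The same splicing idea can be shown with NumPy so it runs without the PyTorch stack: np.concatenate(..., axis=1) plays the role of torch.cat(..., dim=1) for (channels, samples) audio buffers. The shapes below are illustrative:

```python
import numpy as np

# Two mono audio segments shaped (channels, samples), like the model's output tensors
seg_a = np.zeros((1, 22050), dtype=np.float32)  # 1 s of silence at 22050 Hz
seg_b = np.zeros((1, 11025), dtype=np.float32)  # 0.5 s of silence

# Concatenate along the sample axis; torch.cat(segments, dim=1) is the torch equivalent
merged = np.concatenate([seg_a, seg_b], axis=1)
assert merged.shape == (1, 33075)  # 1.5 s total, no intermediate files written
```

    Because the join happens in memory, there is no per-segment encode/decode cycle at the boundaries, which is where click artifacts usually come from.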

    Hardware resources and performance

    The CosyVoice2 model is large, and running it in an environment with an NVIDIA GPU is recommended for the best generation speed. When processing long articles, the system infers segment by segment; if video memory (VRAM) is limited, lower the limit parameter of the segment_text function for a more stable run.
    import os
    import torch
    import torchaudio
    import re
    from cosyvoice.cli.cosyvoice import CosyVoice
    
    # Initialize the CosyVoice2 model
    # Make sure the path points to the folder containing the core weights and configuration files
    cosyvoice = CosyVoice('pretrained_models/CosyVoice2-0.5B')
    
    def segment_text(text, limit=80):
        """
        Divide long articles into segments of appropriate length based on punctuation marks to avoid interruptions in speech generation or memory overflow.
        """
        # Match common Chinese and English end-of-sentence punctuation
        pattern = r'([。!?;.!?\n])'
        parts = re.split(pattern, text)
        
        chunks = []
        current = ""
        for i in range(0, len(parts)-1, 2):
            sentence = parts[i] + parts[i+1]
            if len(current) + len(sentence) <= limit:
                current += sentence
            else:
                if current:
                    chunks.append(current.strip())
                current = sentence
        
        if current:
            chunks.append(current.strip())
        return [c for c in chunks if c]
    
    def run_tts_pipeline(text, spk_id, file_name):
        """
        Perform long text inference and combine information at the Tensor level
        """
        text_list = segment_text(text)
        combined_tensors = []
        
        print(f"Processing, the article has been divided into {len(text_list)} sections")
        
        for idx, segment in enumerate(text_list):
            # Call CosyVoice2 inference interface
            # Can be switched to inference_zero_shot to use reference audio
            result = cosyvoice.inference_sft(segment, spk_id)
            combined_tensors.append(result['tts_speech'])
            print(f"Completed: {idx + 1}/{len(text_list)}")
    
        if combined_tensors:
            # Use torch.cat for seamless splicing
            final_audio = torch.cat(combined_tensors, dim=1)
            # Save as wav using the model's native sampling rate
            torchaudio.save(file_name, final_audio, cosyvoice.sample_rate)
            print(f"Task successful! File saved to: {file_name}")
    
    if __name__ == "__main__":
        long_content = "Paste the content of your long article here, and this code will automatically handle segmentation and merging."
        run_tts_pipeline(long_content, 'Chinese female', 'output_v2.wav')
    


    CosyVoice 2 subtitle synchronization generation

    Timestamp alignment logic

    To generate accurate SRT subtitles, the core is to obtain the accurate length (Duration) of each piece of audio. When PyTorch processes audio tensors, the exact number of seconds can be calculated through the sample rate (Sample Rate) and tensor length, thereby establishing a correspondence between text and the timeline.

    Automated generation process

    While performing speech synthesis, this program will record the start and end time of each text, and automatically format it into a standard SRT file for easy import into Kdenlive or other editing software.
    import torch
    import torchaudio
    import re
    from cosyvoice.cli.cosyvoice import CosyVoice
    
    # Initialize CosyVoice 2
    cosyvoice = CosyVoice('pretrained_models/CosyVoice2-0.5B')
    
    def format_srt_time(seconds):
        """Convert seconds to SRT time format HH:MM:SS,mmm"""
        milliseconds = int((seconds - int(seconds)) * 1000)
        seconds = int(seconds)
        minutes, seconds = divmod(seconds, 60)
        hours, minutes = divmod(minutes, 60)
        return f"{hours:02}:{minutes:02}:{seconds:02},{milliseconds:03}"
    
    def generate_audio_and_srt(full_text, speaker_id, output_wav, output_srt):
        # Split long articles at common Chinese and English end-of-sentence punctuation
        segments = re.split(r'([。!?;.!?\n])', full_text)
        chunks = []
        for i in range(0, len(segments)-1, 2):
            text = (segments[i] + segments[i+1]).strip()
            if text: chunks.append(text)
    
        audio_list = []
        srt_entries = []
        current_time = 0.0
        sample_rate = cosyvoice.sample_rate
    
        print(f"Start processing {len(chunks)} text...")
    
        for i, chunk in enumerate(chunks):
            # Reasoning to generate speech tensor
            output = cosyvoice.inference_sft(chunk, speaker_id)
            audio_tensor = output['tts_speech']
            audio_list.append(audio_tensor)
    
            # Calculate the number of seconds this audio segment lasts (tensor length / sampling rate)
            duration = audio_tensor.shape[1] / sample_rate
            end_time = current_time + duration
    
            # Create SRT entries
            srt_entries.append(
                f"{i+1}\n"
                f"{format_srt_time(current_time)} --> {format_srt_time(end_time)}\n"
                f"{chunk}\n"
            )
    
            current_time = end_time
            print(f"Alignment of segment {i+1} completed")
    
        # Merge and save audio
        combined_audio = torch.cat(audio_list, dim=1)
        torchaudio.save(output_wav, combined_audio, sample_rate)
    
        # Save SRT file
        with open(output_srt, 'w', encoding='utf-8') as f:
            f.write("\n".join(srt_entries))
    
        print(f"Completed! Audio: {output_wav}, subtitles: {output_srt}")
    
    if __name__ == "__main__":
        article = "This is a long article example. [laughter] We can accurately calculate the time of each sentence. In this way, it will be automatically aligned when imported into Kdenlive."
        generate_audio_and_srt(article, 'Chinese female', 'output.wav', 'output.srt')
    

    Things to note when importing into editing software

    When using the generated SRT and WAV files, please note the following points:

    CosyVoice 2 Command Control

    Emotional and nonverbal symbolic control

    CosyVoice 2 supports inserting specific tags into the text to control the emotional expression of the voice or add non-verbal sounds. These tags significantly improve the expressiveness of the synthesized speech, turning the AI from a rigid reader into a voice with emotional ups and downs.

    Core tag list

    When using, please embed the tag directly into the text. It is recommended to leave appropriate spaces before and after the tag to achieve the best connection effect:

    Long article processing and tag embedding

    When dealing with long text, the logic should include preserving these special tags to ensure that the segmentation algorithm does not cut the tags off. The following code shows how to apply these instructions in a long text flow.
    import os
    import torch
    import torchaudio
    import re
    from cosyvoice.cli.cosyvoice import CosyVoice
    
    # Initialize the model
    cosyvoice = CosyVoice('pretrained_models/CosyVoice2-0.5B')
    
    def segment_text_with_tags(text, limit=100):
        """
        Split long text while ensuring tags like [laughter] are not cut
        """
        # Match common Chinese and English punctuation marks and newlines
        pattern = r'([。!?;.!?\n])'
        parts = re.split(pattern, text)
        
        chunks = []
        current = ""
        for i in range(0, len(parts)-1, 2):
            sentence = parts[i] + parts[i+1]
            if len(current) + len(sentence) <= limit:
                current += sentence
            else:
                if current:
                    chunks.append(current.strip())
                current = sentence
        
        if current:
            chunks.append(current.strip())
        return chunks
    
    def generate_expressive_audio(text, spk_id, output_path):
        """
        Generate long speech containing emotional instructions
        """
        segments = segment_text_with_tags(text)
        audio_data = []
    
        for idx, seg in enumerate(segments):
            # Use instruct mode for better tag execution
            # If you use sft mode, basic tags are also supported, but instruct mode is more precise for emotional control.
            output = cosyvoice.inference_instruct(seg, spk_id, 'Control tone and emotion')
            audio_data.append(output['tts_speech'])
            print(f"Processing paragraphs {idx+1}/{len(segments)}")
    
        if audio_data:
            final_wav = torch.cat(audio_data, dim=1)
            torchaudio.save(output_path, final_wav, cosyvoice.sample_rate)
            print(f"Message containing emotion command has been saved: {output_path}")
    
    if __name__ == "__main__":
        #Example: long text embedding emotion tags
        rich_text = "This is great news! [laughter] I can't believe it. [surprise] But if this messes up, [angry] I will be very angry."
        generate_expressive_audio(rich_text, 'Chinese female', 'expressive_output.wav')
    


    Speech recognition development

    Development process and key stages

    Developing an ASR (Automatic Speech Recognition) system usually follows this core path: first, audio preprocessing (such as noise reduction, VAD voice activity detection and feature extraction); then model inference, which converts acoustic signals into text probabilities; finally, post-processing (such as punctuation restoration and inverse text normalization, ITN) produces the final text. Modern development has shifted from traditional HMMs to end-to-end neural network architectures, greatly simplifying development complexity.

    Mainstream ASR development models and frameworks

    category Tools/Models Development features for 2026
    base model OpenAI Whisper (V3) Industry standard, with strong noise immunity and multi-language support, it is most suitable for transcribing long audio files.
    Live streaming NVIDIA Parakeet-TDT Designed for ultra-low latency, supports streaming, and is suitable for AI voice assistants.
    Domestic optimization FunASR / Yating engine It is deeply optimized for Chinese, Chinese-English mixed and Taiwanese accents, and supports timestamp and speaker recognition.
    Deployment framework Faster-Whisper / Sherpa-ONNX Significantly improves inference speed and reduces memory usage, making it suitable for running on edge devices or local servers.

    Technical indicators faced by developers

    When developing an ASR system, focus on monitoring CER (Character Error Rate) to assess accuracy. For real-time applications, RTF (real-time factor) and latency are crucial: speech must be processed much faster than it is spoken. The development focus in 2026 has shifted to "long text memory" and "context awareness", such as integrating an LLM to correct recognition errors in professional terminology or specific industries.
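    CER can be computed as a character-level Levenshtein edit distance divided by the reference length. A minimal, framework-free sketch (the function name is illustrative):

```python
def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = Levenshtein edit distance / reference length, at character level."""
    m, n = len(reference), len(hypothesis)
    # prev[j] holds the edit distance between reference[:i-1] and hypothesis[:j]
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return prev[n] / m if m else 0.0

# One inserted character out of an 18-character reference -> CER of 1/18
cer = character_error_rate("speech recognition", "speech wrecognition")
```

    English systems are usually benchmarked with WER (word error rate) instead; the computation is identical but over word tokens rather than characters.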

    API and cloud integrated development

    If developers want to launch quickly, they usually call cloud APIs. Deepgram and AssemblyAI are favored in 2026 for their low latency and rich metadata (such as emotion detection and key-point summaries). The Microsoft Azure Speech SDK provides the most complete custom model fine-tuning (Custom Speech) interface, allowing developers to upload domain-specific text data to fix inaccurate recognition of specialized vocabulary in fields such as medicine and law.

    Deployment and environment selection recommendations

    For individual developers, it is recommended to use the Hugging Face Transformers library with PyTorch for quick experiments. If the application involves privacy (such as medical records), use Whisper.cpp or Vosk for a completely offline local deployment. To build a large-scale voice service, Triton Inference Server or Docker containerization enables efficient scheduling and scaling of the ASR model.



    JavaScript drawing

    Canvas and Context

    Basic concepts of Canvas

    The HTML5 <canvas> element is an area that can be drawn on with JavaScript, allowing 2D and 3D images to be rendered on a web page. It is a container for programmatic drawing operations such as lines, shapes and pictures, suitable for applications like games and graphics editors that require real-time rendering.

    The following is the basic syntax of the canvas element:

    <canvas id="myCanvas" width="500" height="500"></canvas>

    The role of getContext

    To draw content on a canvas element, you must use the getContext method, which returns the drawing context; the most commonly used option is "2d". It returns a CanvasRenderingContext2D object that provides many drawing methods.

    For example, the following JavaScript code gets the 2D drawing context of the canvas:

    var canvas = document.getElementById("myCanvas");
    var ctx = canvas.getContext("2d");

    Basic drawing operations

    The drawing context obtained with getContext("2d") can perform basic operations such as drawing lines, drawing rectangles, and filling colors. For example:

    Sample code:

    ctx.fillStyle = "blue";
    ctx.fillRect(50, 50, 100, 100); // Draw a blue rectangle
    ctx.strokeStyle = "red";
    ctx.beginPath();
    ctx.moveTo(0, 0);
    ctx.lineTo(200, 200);
    ctx.stroke(); // Draw a red line

    Clear Canvas

    To clear images in the canvas, use the clearRect(x, y, width, height) method. For example, the code to clear the entire canvas is:

    ctx.clearRect(0, 0, canvas.width, canvas.height);

    Dynamic Drawing and Animation

    Using requestAnimationFrame() enables smooth animation. A dynamic effect is drawn by clearing the previous frame before each screen update. Here is a simple animation example:

    let x = 0;    // horizontal position of the square
    const y = 50; // vertical position of the square

    function draw() {
      ctx.clearRect(0, 0, canvas.width, canvas.height);
      ctx.fillRect(x, y, 50, 50); // draw a square
      x += 1; // update the position
      requestAnimationFrame(draw);
    }
    draw();

    Things to note when using Canvas

    The size of the canvas should be set with its HTML width/height attributes; resizing it with CSS may distort the image. Also, canvas is not intended to replace high-resolution images, but to be used for real-time generation and dynamic drawing.
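A minimal illustration of the difference (the element ids are illustrative): the first canvas keeps its default 300×150 drawing buffer and lets CSS stretch it, so the pixels are blown up and the image looks blurry; the second sets the buffer size itself and stays crisp.

```html
<!-- Blurry: buffer is 300×150, CSS stretches it to 600×300 -->
<canvas id="blurry" style="width:600px; height:300px;"></canvas>

<!-- Crisp: the drawing buffer itself is 600×300 -->
<canvas id="crisp" width="600" height="300"></canvas>
```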



    style.transform

    Basic concepts

    style.transform is a CSS property that applies 2D or 3D transformations, such as rotation, scaling, translation and skewing, to elements.

    scale() is the scaling function; its syntax is:

    
    transform: scale(sx [, sy]);
    

    where sx is the horizontal scale factor and sy is the optional vertical scale factor; if sy is omitted, it defaults to sx.

    ---

    JavaScript setting method

    const el = document.getElementById("target");
    el.style.transform = "scale(1.5)";      // both x and y enlarged by 1.5 times
    el.style.transform = "scale(1.5, 0.5)"; // 1.5× horizontally, 0.5× vertically (overwrites the line above)
    ---

    It does not change width and height, but it affects the visual size

    scale() is a "visual transformation": it does not change the element's actual DOM layout properties (such as offsetWidth or clientWidth), but it does change the value returned by getBoundingClientRect().

    el.getBoundingClientRect().width // reflects the effect of scale
    el.offsetWidth // original layout width, not affected by scale
    ---

    Common applications

    ---

    example

    
    <style>
      #box {
        width: 100px;
        height: 100px;
        background: skyblue;
        transition: transform 0.3s;
      }
      #box:hover {
        transform: scale(1.5);
      }
    </style>
    
    <div id="box"></div>
    
    ---

    Things to note



    Canvas drawing pie chart

    Example description

    The following example reads data from an HTML <table> and draws a pie chart with the arc() method of the native <canvas> API, without any external packages.

    ---

    HTML structure

    <table id="dataTable" border="1" style="margin:10px auto;">
      <tr><th>Category</th><th>Value</th></tr>
      <tr><td>Apple</td><td>30</td></tr>
      <tr><td>Banana</td><td>15</td></tr>
      <tr><td>Cherry</td><td>25</td></tr>
      <tr><td>Mango</td><td>20</td></tr>
    </table>
    
    <canvas id="pieCanvas" width="400" height="400" style="display:block; margin:auto; border:1px solid #aaa;"></canvas>
    ---

    JavaScript program

    
    const table = document.getElementById("dataTable");
    const canvas = document.getElementById("pieCanvas");
    const ctx = canvas.getContext("2d");
    
    const labels = [];
    const values = [];
    
    for (let i = 1; i < table.rows.length; i++) { // skip the header row
      const row = table.rows[i];
      labels.push(row.cells[0].textContent);
      values.push(parseFloat(row.cells[1].textContent));
    }
    
    // Compute the total
    const total = values.reduce((a, b) => a + b, 0);
    
    // Draw the pie chart
    let startAngle = 0;
    const centerX = canvas.width / 2;
    const centerY = canvas.height / 2;
    const radius = 120;
    
    // Automatic color matching
    const colors = ["#FF6384", "#36A2EB", "#FFCE56", "#4BC0C0", "#9966FF", "#FF9F40"];
    
    for (let i = 0; i < values.length; i++) {
      const sliceAngle = (values[i] / total) * 2 * Math.PI;
      const endAngle = startAngle + sliceAngle;
    
      // Draw the pie slice
      ctx.beginPath();
      ctx.moveTo(centerX, centerY);
      ctx.arc(centerX, centerY, radius, startAngle, endAngle);
      ctx.closePath();
      ctx.fillStyle = colors[i % colors.length];
      ctx.fill();
    
      // Label text
      const midAngle = startAngle + sliceAngle / 2;
      const textX = centerX + Math.cos(midAngle) * (radius + 20);
      const textY = centerY + Math.sin(midAngle) * (radius + 20);
      ctx.fillStyle = "black";
      ctx.font = "14px sans-serif";
      ctx.textAlign = "center";
      ctx.fillText(labels[i], textX, textY);
    
      startAngle = endAngle;
    }
    
    // Title
    ctx.font = "bold 16px sans-serif";
    ctx.textAlign = "center";
    ctx.fillText("Fruit sales proportion", centerX, centerY - radius - 30);
    
    ---

    Explanation

    ---

    Extensions

    You can add mouse events (such as hovering to zoom in or show percentages), or use requestAnimationFrame() to add animation effects.



    SVG

    Concept

    SVG (Scalable Vector Graphics) is an XML-based vector graphics format that can draw lines, shapes and text on web pages, and supports scaling and animation. Unlike bitmaps, SVG is not distorted when zoomed in or out, making it suitable for applications such as charts, icons, maps and flowcharts.

    Features

    Basic syntax example

    <svg width="200" height="100">
      <rect x="10" y="10" width="50" height="50" fill="blue" />
      <circle cx="100" cy="35" r="25" fill="green" />
      <line x1="150" y1="10" x2="190" y2="60" stroke="red" stroke-width="2" />
      <text x="10" y="90" font-size="14" fill="black">This is SVG</text>
    </svg>

    Common elements

    events and interactions

    <svg width="100" height="100">
      <circle cx="50" cy="50" r="40" fill="orange" onclick="alert('You clicked on the circle')" />
    </svg>

    Animations and styles

    Animations can be applied via CSS or the <animate> tag:

    
    <circle cx="30" cy="50" r="20" fill="blue">
      <animate attributeName="cx" from="30" to="170" dur="2s" repeatCount="indefinite" />
    </circle>
    

    Combined with JavaScript

    
    <svg id="mysvg" width="200" height="100">
      <circle id="c1" cx="50" cy="50" r="30" fill="gray" />
    </svg>
    
    <script>
      document.getElementById("c1").setAttribute("fill", "red");
    </script>
    

    Application scope

    Conclusion

    SVG is one of the most important graphics standards for web front-ends. It scales without loss of resolution, supports interactivity and animation, and integrates seamlessly with HTML/CSS/JavaScript. It is suitable for graphics that must remain precise at any scale.



    SVG reuse pattern

    Purpose

    In SVG, a pattern can be defined once with <symbol> or <defs> and then referenced repeatedly elsewhere with <use>, saving code and improving consistency.

    Basic syntax

    
    <svg width="0" height="0" style="position:absolute">
      <symbol id="star" viewBox="0 0 100 100">
        <!-- No fill attribute here, so each <use> can supply its own fill -->
        <polygon points="50,5 61,39 98,39 68,59 79,91 50,70 21,91 32,59 2,39 39,39"
                 stroke="black" stroke-width="2"/>
      </symbol>
    </svg>
    
    <svg width="200" height="100">
      <use href="#star" x="0" y="0" width="50" height="50" fill="gold"/>
      <use href="#star" x="60" y="0" width="50" height="50" fill="red"/>
      <use href="#star" x="120" y="0" width="50" height="50" fill="blue"/>
    </svg>
    

    Display

    Explanation

    Property inheritance

    On <use> you can set fill, stroke and other inheritable presentation properties, but they only take effect where the referenced content does not set those attributes explicitly; attributes written directly on the referenced shape cannot be overridden by <use>.

    Application scope

    compatibility

    Conclusion

    Through <symbol> + <use>, SVG enables componentized, modular graphics development: definitions can be reused, and their styles and positions are convenient to manage. This is well suited to graphic design and data visualization.



    WebGL

    Concept

    WebGL (Web Graphics Library) is a JavaScript API based on OpenGL ES that uses the HTML5 <canvas> element to perform hardware-accelerated 2D and 3D drawing in the browser, without any plug-ins.

    Features

    Simple example

    Draw a colored triangle:

    <canvas id="glCanvas" width="300" height="300"></canvas>
    <script>
      const canvas = document.getElementById('glCanvas');
      const gl = canvas.getContext('webgl');
    
      if (!gl) {
        alert("Your browser does not support WebGL");
      }
    
      const vertexShaderSource = `
        attribute vec2 a_position;
        void main() {
          gl_Position = vec4(a_position, 0, 1);
        }
      `;
    
      const fragmentShaderSource = `
        void main() {
          gl_FragColor = vec4(1, 0, 0, 1); // red
        }
      `;
    
      function createShader(gl, type, source) {
        const shader = gl.createShader(type);
        gl.shaderSource(shader, source);
        gl.compileShader(shader);
        if (!gl.getShaderParameter(shader, gl.COMPILE_STATUS)) {
          console.error(gl.getShaderInfoLog(shader)); // report shader compile errors
        }
        return shader;
      }
    
      const vertexShader = createShader(gl, gl.VERTEX_SHADER, vertexShaderSource);
      const fragmentShader = createShader(gl, gl.FRAGMENT_SHADER, fragmentShaderSource);
    
      const program = gl.createProgram();
      gl.attachShader(program, vertexShader);
      gl.attachShader(program, fragmentShader);
      gl.linkProgram(program);
    
      gl.useProgram(program);
    
      const positionBuffer = gl.createBuffer();
      gl.bindBuffer(gl.ARRAY_BUFFER, positionBuffer);
      gl.bufferData(gl.ARRAY_BUFFER, new Float32Array([
        0, 1,
       -1, -1,
        1, -1
      ]), gl.STATIC_DRAW);
    
      const posAttribLoc = gl.getAttribLocation(program, "a_position");
      gl.enableVertexAttribArray(posAttribLoc);
      gl.vertexAttribPointer(posAttribLoc, 2, gl.FLOAT, false, 0, 0);
    
      gl.clearColor(0, 0, 0, 1);
      gl.clear(gl.COLOR_BUFFER_BIT);
      gl.drawArrays(gl.TRIANGLES, 0, 3);
    </script>

    Result

    Application scope

    Commonly used kits

    Conclusion

    WebGL provides web developers with GPU-accelerated 3D graphics rendering and is one of the core technologies behind modern web games, digital art, simulation and visualization. Although native WebGL is relatively low-level, higher-level libraries can be used to simplify development.



    Spirograph

    What is a Spirograph?

    A Spirograph is a drawing technique for creating complex geometric patterns: a small circle rolls inside (or around) a fixed circle, and a pen offset from the rolling circle's center traces looping, wavy curves. Such patterns are often used in art and education to show the geometric beauty of mathematics.

    Implementing Spirograph in HTML5

    Here's an example of implementing a Spirograph using HTML5's <canvas> element and JavaScript:
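A minimal sketch follows (the canvas id "spiroCanvas" and the parameter values are illustrative). The hypotrochoid formula generates the curve's points from the fixed radius R, rolling radius r and pen offset d; the canvas code then traces them.

```javascript
// Hypotrochoid point generator: a circle of radius r rolls inside a fixed
// circle of radius R, with the pen at distance d from the rolling center.
function hypotrochoid(R, r, d, steps) {
  const turns = r / gcd(R, r); // enough revolutions for the curve to close
  const points = [];
  for (let i = 0; i <= steps; i++) {
    const t = (i / steps) * 2 * Math.PI * turns;
    points.push({
      x: (R - r) * Math.cos(t) + d * Math.cos(((R - r) / r) * t),
      y: (R - r) * Math.sin(t) - d * Math.sin(((R - r) / r) * t)
    });
  }
  return points;
}

function gcd(a, b) { return b === 0 ? a : gcd(b, a % b); }

// Browser-only part: trace the curve on a <canvas id="spiroCanvas">.
if (typeof document !== "undefined") {
  const canvas = document.getElementById("spiroCanvas");
  const ctx = canvas.getContext("2d");
  const pts = hypotrochoid(90, 52, 60, 2000);
  ctx.translate(canvas.width / 2, canvas.height / 2); // center the figure
  ctx.beginPath();
  ctx.moveTo(pts[0].x, pts[0].y);
  for (const p of pts) ctx.lineTo(p.x, p.y);
  ctx.strokeStyle = "purple";
  ctx.stroke();
}
```

Varying R, r and d produces different numbers of loops and petal shapes.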



    Vector Graphics JavaScript Libraries Comparison

    Library name Syntax expressiveness Graphic types Target users Interaction support Animation support
    Mermaid.js Extremely high (using Markdown-like syntax) Flow chart, sequence chart, Gantt chart, ER chart, Class chart Document visualization, rapid prototyping Limited support Partial support
    D3.js Medium (needs to understand data binding and DOM operations) Almost any graphics (extremely customizable) Advanced data visualization developer Full support Full support
    Cytoscape.js High (nodes and edges defined in JSON) Network diagram, flow chart Bioinformatics, social network analysis Full support Partial support
    Vega / Vega-Lite High (use JSON declarative description of chart) Statistical charts (bar charts, scatter charts, etc.) Data Science, Dashboard Design support Partial support
    Graphviz via Viz.js High (DOT syntax is similar to text programming) Flow chart, graph theory structure Academic use, quick architecture diagram Not supported Not supported
    JSXGraph High (geometric semantics are clear) Geometric figures, coordinate diagrams mathematics education support support


    Chart.js

    Overview

    Chart.js is an open-source, lightweight and powerful JavaScript charting library. It draws various interactive charts on the HTML5 <canvas> element. It is known for its simple API, attractive default styles and highly customizable options, and is suitable for quickly visualizing data on websites or applications.

    ---

    Main features

    ---

    Installation and use

    1. CDN loading

    
    <script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
    

    2. NPM installation

    
    npm install chart.js
    

    3. Basic usage examples

    <canvas id="myChart"></canvas>
    
    <script>
    const ctx = document.getElementById('myChart').getContext('2d');
    
    new Chart(ctx, {
      type: 'bar',
      data: {
        labels: ['red', 'blue', 'yellow', 'green', 'purple', 'orange'],
        datasets: [{
          label: 'votes',
          data: [12, 19, 3, 5, 2, 3],
          backgroundColor: [
            'rgba(255, 99, 132, 0.6)',
            'rgba(54, 162, 235, 0.6)',
            'rgba(255, 206, 86, 0.6)',
            'rgba(75, 192, 192, 0.6)',
            'rgba(153, 102, 255, 0.6)',
            'rgba(255, 159, 64, 0.6)'
          ],
          borderWidth: 1
        }]
      },
      options: {
        responsive: true,
        scales: {
          y: { beginAtZero: true }
        }
      }
    });
    </script>
    ---

    Common chart types

    Chart type / type value / Description
    Line chart / line / Displays time-series or trend data.
    Bar chart / bar / Compares values across categories.
    Pie chart / pie / Shows the proportional distribution of a whole.
    Donut chart / doughnut / A pie-chart variant whose hollow center can display a title.
    Radar chart / radar / Compares multidimensional data.
    Polar area chart / polarArea / Combines aspects of pie and bar charts.
    ---

    version check

    You can check the version of Chart.js using:

    
    console.log(Chart.version);
    
    ---

    Advantages and Disadvantages

    advantage:

    shortcoming:

    ---

    official resources



    Draw a pie chart - Chart.js

    Example description

    The following example demonstrates how to read data from an HTML <table> and use JavaScript to dynamically draw a pie chart. This example uses Chart.js, which is easy to use and supports automatic coloring and animation.

    ---

    HTML structure

    <!-- Table data -->
    <table id="dataTable" border="1" style="margin:10px auto;">
      <tr><th>Category</th><th>Value</th></tr>
      <tr><td>Apple</td><td>30</td></tr>
      <tr><td>Banana</td><td>15</td></tr>
      <tr><td>Cherry</td><td>25</td></tr>
      <tr><td>Mango</td><td>20</td></tr>
    </table>
    
    <!-- Pie chart container -->
    <canvas id="pieChart" width="400" height="400"></canvas>
    
    <!-- Load Chart.js -->
    <script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
    ---

    JavaScript program

    // Read table data
    const table = document.getElementById("dataTable");
    const labels = [];
    const values = [];
    
    for (let i = 1; i < table.rows.length; i++) { // skip the header row
      const row = table.rows[i];
      labels.push(row.cells[0].textContent);
      values.push(parseFloat(row.cells[1].textContent));
    }
    
    // Create the Chart.js pie chart
    const ctx = document.getElementById("pieChart").getContext("2d");
    new Chart(ctx, {
      type: "pie",
      data: {
        labels: labels,
        datasets: [{
          data: values,
          backgroundColor: [
            "rgba(255, 99, 132, 0.7)",
            "rgba(54, 162, 235, 0.7)",
            "rgba(255, 206, 86, 0.7)",
            "rgba(75, 192, 192, 0.7)"
          ],
          borderColor: "white",
          borderWidth: 2
        }]
      },
      options: {
        responsive: true,
        plugins: {
          legend: { position: "bottom" },
          title: { display: true, text: "Fruit sales proportion" }
        }
      }
    });
    
    ---

    Explanation

    ---

    Extended application

    If you want to draw in pure JavaScript (without an external library), you can use CanvasRenderingContext2D.arc() to draw the wedges yourself, as shown in the earlier Canvas pie chart example.



    Example of drawing UML diagram using HTML

    1. Use SVG to draw a simple class diagram

    In HTML you can use <svg> tags to draw basic UML class diagrams. Here's an example of using rectangles and text to represent a simple class.

    <svg width="300" height="200">
        <rect x="50" y="20" width="200" height="30" fill="lightblue" stroke="black"/>
        <text x="60" y="40" font-family="Arial" font-size="16">Class Name</text>
        
        <rect x="50" y="50" width="200" height="50" fill="white" stroke="black"/>
        <text x="60" y="70" font-family="Arial" font-size="14">+ attribute1 : Type</text>
        <text x="60" y="90" font-family="Arial" font-size="14">+ attribute2 : Type</text>
        
        <rect x="50" y="100" width="200" height="50" fill="white" stroke="black"/>
        <text x="60" y="120" font-family="Arial" font-size="14">+ method1() : ReturnType</text>
        <text x="60" y="140" font-family="Arial" font-size="14">+ method2() : ReturnType</text>
    </svg>
    

    2. Customize UML elements using HTML and CSS

    Different UML elements can be defined using HTML and CSS styles. The following example shows how to use <div> elements and CSS to draw a class box and adjust its style to mimic the structure of a UML class diagram.

    <style>
    .class-box {
        width: 200px;
        border: 1px solid black;
        margin: 10px;
    }
    .header {
        background-color: lightblue;
        text-align: center;
        font-weight: bold;
    }
    .attributes, .methods {
        padding: 10px;
        border-top: 1px solid black;
    }
    </style>
    
    <div class="class-box">
        <div class="header">ClassName</div>
        <div class="attributes">
            + attribute1 : Type <br>
            + attribute2 : Type
        </div>
        <div class="methods">
            + method1() : ReturnType <br>
            + method2() : ReturnType
        </div>
    </div>
    

    3. Use mermaid.js to draw more complex UML diagrams

    To draw more complex UML diagrams in HTML, you can use an external JavaScript library such as mermaid.js. It supports a variety of UML diagrams and can be embedded directly in HTML. First reference mermaid.js, then write the UML diagram definitions inside <pre> tags.

    <script type="module">
    import mermaid from 'https://cdn.jsdelivr.net/npm/mermaid@10/dist/mermaid.esm.min.mjs';
    mermaid.initialize({ startOnLoad: true });
    </script>
    
    <pre class="mermaid">
    classDiagram
        Class01 <|-- Class02 : Inheritance
        Class01 : +method1() void
        Class02 : +method2() void
        Class03 : +attribute int
        Class04 : +method() void
    </pre>
    

    With mermaid.js, more complex and clearer UML diagrams can be drawn easily, and different diagram types are supported.



    Mermaid draws complex UML examples

    Example 1: Complex class diagram

    This example shows inheritance, aggregation and association between classes.

    <pre class="mermaid">
    classDiagram
        Animal <|-- Mammal
        Animal <|-- Bird
        Mammal o-- Dog : has-a
        Bird --> Wing : has-a
        class Animal {
            +String name
            +int age
            +eat() void
        }
        class Mammal {
            +hasFur() bool
        }
        class Dog {
            +bark() void
        }
        class Bird {
            +fly() void
        }
        class Wing {
            +wingSpan int
        }
    </pre>
    

    Explanation: This example shows several relationships: inheritance (Animal <|-- Mammal, Animal <|-- Bird), aggregation (Mammal o-- Dog) and association (Bird --> Wing).

    Example 2: Associations and Multiplicity

    This example shows how to represent multiplicity (1..*, 0..1, etc.) and roles between classes.

    <pre class="mermaid">
    classDiagram
        Customer "1" --> "0..*" Order : places
        Order "1" --> "1" Payment : includes
        class Customer {
            +String name
            +String email
            +placeOrder() void
        }
        class Order {
            +int orderId
            +String date
            +calculateTotal() float
        }
        class Payment {
            +float amount
            +String method
            +processPayment() void
        }
    </pre>
    

    Explanation: One Customer places zero or more Orders ("1" --> "0..*" with the role places), and each Order includes exactly one Payment.

    Example 3: Interfaces and abstract classes

    This example shows how to define interfaces and abstract classes in Mermaid.js.

    <pre class="mermaid">
    classDiagram
        class Shape {
            <<abstract>>
            +area() float
            +perimeter() float
        }
        Shape <|-- Rectangle
        Shape <|-- Circle
        class Rectangle {
            +width float
            +height float
            +area() float
            +perimeter() float
        }
        class Circle {
            +radius float
            +area() float
            +perimeter() float
        }
    </pre>
    

    Explanation: Shape is marked <<abstract>> and declares area() and perimeter(); Rectangle and Circle inherit from it and provide concrete implementations.

    Example 4: Complex class and interface implementation

    This example shows a mix of class inheritance and interface implementation.

    <pre class="mermaid">
    classDiagram
        class Flyable {
            <<interface>>
            +fly() void
        }
        class Bird {
            +String species
            +String color
            +sing() void
        }
        class Airplane {
            +String model
            +int capacity
            +takeOff() void
        }
        Bird ..|> Flyable : implements
        Airplane ..|> Flyable : implements
    </pre>
    

    Explanation: Flyable is marked <<interface>>; Bird and Airplane realize it with the dashed realization arrow ..|>.



    Mermaid Test Tool

    Generate results

    flowchart TD
        A[Start] --> B{Do you need to continue?}
        B -- Yes --> C[Perform operation]
        B -- No --> D[End]
        C --> D


    How to check for Mermaid syntax errors

    1. Use Mermaid Live Editor

    Mermaid provides the official Mermaid Live Editor, where you can test and check for syntax errors on the fly. After pasting Mermaid syntax, the editor displays specific error messages for any errors, letting you troubleshoot the problem faster.

    2. Reduce complexity for testing

    If your Mermaid chart is too complex, segmented testing is recommended: first remove some classes or relationships, leaving only the most basic structure, then gradually add elements back. This lets you identify the source of a syntax error more quickly.

    3. Confirm Mermaid.js version

    Different versions of Mermaid.js may have different support for the syntax. Make sure you are using the latest version, or verify in a test environment that your version of Mermaid.js supports the syntax features used.

    4. Check for common errors

    5. Use developer tools to view error messages

    View the JavaScript console in the browser's developer tools. If the Mermaid chart is not generated correctly, specific error messages or tips may be displayed in the console to help you identify syntax errors.

    6. Refer to official documents

    The official Mermaid documentation provides detailed syntax guidelines to help you confirm whether the syntax is used correctly. The documentation is located on the Mermaid.js official website.



    flow chart

    Flowchart overview

    Below is a simple flowchart example illustrating the logical relationship between decisions and actions.

    Flow chart example

    flowchart TD
        A[Start] --> B{Do you need to continue?}
        B -- Yes --> C[Perform operation]
        B -- No --> D[End]
        C --> D

    Example description

    How to use

    Paste the flowchart syntax above into a Mermaid-enabled tool, such as a Markdown editor or the Mermaid online tool, to generate the graph.



    Mermaid.js zoom slider library

    Function description

    This JavaScript library adds zoom-slider functionality to Mermaid.js charts, allowing users to control chart scaling with an <input type="range"> element. The library uses transform: scale() to achieve visual scaling without re-rendering Mermaid.

    Library code (mermaidZoomSlider.js)

    // mermaidZoomSlider.js
    export function setupMermaidZoomSlider({
      sliderId = "zoomSlider",
      diagramContainerId = "mermaidContainer",
      min = 0.1,
      max = 3,
      step = 0.1,
      initial = 1
    } = {}) {
      window.addEventListener("load", () => {
        const slider = document.getElementById(sliderId);
        const container = document.getElementById(diagramContainerId);
    
        if (!slider || !container) {
          console.warn("Mermaid zoom slider: Missing slider or container element");
          return;
        }
    
        // Initialize slider properties
        slider.min = min;
        slider.max = max;
        slider.step = step;
        slider.value = initial;
    
        // Set initial zoom
        container.style.transformOrigin = "top left";
        container.style.transform = `scale(${initial})`;
    
        // Event listener: zoom
        slider.addEventListener("input", () => {
          const scale = parseFloat(slider.value);
          container.style.transform = `scale(${scale})`;
        });
      });
    }

    Usage

    <!-- HTML -->
    <div>
      <input type="range" id="zoomSlider">
    </div>
    <div id="mermaidContainer">
      <pre class="mermaid">
        graph TD;
          A-->B;
          B-->C;
      </pre>
    </div>
    
    <!-- JavaScript module introduction -->
    <script type="module">
      import mermaid from "https://cdn.jsdelivr.net/npm/mermaid@10/dist/mermaid.esm.min.mjs";
      import { setupMermaidZoomSlider } from "./mermaidZoomSlider.js";
    
      mermaid.initialize({ startOnLoad: true });
    
      setupMermaidZoomSlider({
        sliderId: "zoomSlider",
        diagramContainerId: "mermaidContainer",
        min: 0.2,
        max: 3,
        step: 0.1,
        initial: 1
      });
    </script>

    Parameter description

    suggestion

    If you need advanced functions such as dragging and moving, zoom reset, etc., you can further expand this function library, such as integrating mouse drag and zoom reset buttons.



    Various styles of Mermaid.js lines

    Basic line syntax

    In Mermaid.js diagrams, markers such as -->, -.-> and ==> are used to establish connections between nodes. Different symbols represent different line styles.

    Common line styles

    Syntax / Style / Description
    --> / ──> / General solid arrow
    ---> / ───> / Same as --> (the syntax is tolerant of extra dashes)
    -- text --> / ── text ──> / Solid arrow with a text label
    -.-> / -.-> / Dashed arrow
    -. text .-> / -. text .-> / Dashed arrow with text
    ==> / ===> / Thick solid arrow
    == text ==> / == text ==> / Thick arrow with text
    --o / ──○ / Circle endpoint without arrowhead (common in class diagrams)
    --|> / ──▷ / Hollow triangle arrow (inheritance in class diagrams)
    -->|label| / ──> with label / Alternative label syntax using pipes

    Usage examples

    graph TD
      A[Start] --> B[Step 1]
      B -.-> C[Asynchronous processing]
      C ==> D[Strong dependency]
      D -- text --> E[Edge with text]
      E --o F[Circle endpoint]
      F --> G[Solid arrow]

    Other instructions

    Conclusion

    Mermaid.js provides a variety of line syntax styles, allowing users to clearly express processes, logic, and relationships. Through the combination of solid lines, dotted lines, thick lines, and graphic endpoints, simple and well-structured diagrams can be created.



    D3.js

    What is D3.js?

    D3.js (Data-Driven Documents) is an open-source JavaScript library for transforming data into dynamic, interactive visualizations. It uses web standard technologies such as SVG, HTML and CSS, and provides powerful tools for manipulating data and drawing graphics.

    Features of D3.js

    Key features of D3.js

    1. Selecting elements: use CSS-like selectors to select and manipulate DOM elements, e.g. d3.select() and d3.selectAll().
    2. Data binding: bind data to DOM elements and update the view based on the data.
    3. Scales: provides scale functions that conveniently map data values to pixels.
    4. Drawing graphics: use SVG path and shape tools to create circles, rectangles, curves and other diagrams.
    5. Transitions: built-in animation support for smooth data changes.
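The scale idea above can be sketched in plain JavaScript. This is a simplified stand-in written for illustration only, not D3's actual d3.scaleLinear implementation:

```javascript
// Map a data domain [d0, d1] linearly onto a pixel range [r0, r1].
function linearScale([d0, d1], [r0, r1]) {
  return value => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

// Map data values 0..100 onto pixel positions 0..500.
const x = linearScale([0, 100], [0, 500]);
console.log(x(0));   // 0
console.log(x(50));  // 250
console.log(x(100)); // 500
```

D3's real scales additionally handle clamping, ticks, inversion and non-linear variants.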

    Application examples

    D3.js is widely used in various data visualization scenarios, such as:

    learning resources

    To learn D3.js, you can refer to the following resources:

    Conclusion

    D3.js is a powerful and flexible data visualization tool for developers who require highly customized charts and interactive effects. Although the learning curve is slightly higher, once mastered, its application potential is endless.



    D3.js tree diagram example

    Example description

    This example uses D3.js to draw a simple tree diagram, showing how to visualize hierarchical data.

    Sample code

    
    <!DOCTYPE html>
    <html lang="en">
    <head>
      <meta charset="UTF-8">
      <title>D3.js Tree Diagram Example</title>
      <script src="https://d3js.org/d3.v7.min.js"></script>
      <style>
        .node circle {
          fill: steelblue;
        }
        .node text {
          font: 12px sans-serif;
        }
        .link {
          fill: none;
          stroke: #ccc;
          stroke-width: 1.5px;
        }
      </style>
    </head>
    <body>
      <script>
        const width = 800;
        const height = 600;
    
        const treeData = {
          name: "CEO",
          children: [
            {
              name: "CTO",
              children: [
                { name: "Engineering Manager" },
                { name: "Product Manager" }
              ]
            },
            {
              name: "CFO",
              children: [
                { name: "Accountant" },
                { name: "Finance Analyst" }
              ]
            }
          ]
        };
    
        const svg = d3.select("body")
          .append("svg")
          .attr("width", width)
          .attr("height", height)
          .append("g")
          .attr("transform", "translate(40,40)");
    
        const treeLayout = d3.tree().size([height - 100, width - 160]);
    
        const root = d3.hierarchy(treeData);
        treeLayout(root);
    
        svg.selectAll(".link")
          .data(root.links())
          .enter()
          .append("path")
          .attr("class", "link")
          .attr("d", d3.linkHorizontal()
            .x(d => d.y)
            .y(d => d.x)
          );
    
        const nodes = svg.selectAll(".node")
          .data(root.descendants())
          .enter()
          .append("g")
          .attr("class", "node")
          .attr("transform", d => `translate(${d.y},${d.x})`);
    
        nodes.append("circle").attr("r", 5);
    
        nodes.append("text")
          .attr("dy", 3)
          .attr("x", d => d.children ? -10 : 10)
          .style("text-anchor", d => d.children ? "end" : "start")
          .text(d => d.data.name);
      </script>
    </body>
    </html>
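To clarify what d3.hierarchy and root.descendants()/root.links() do in the code above, here is a plain-JavaScript walker over the same treeData (illustration only, not D3's implementation; d3.hierarchy additionally computes depth, height, and parent references):

```javascript
// The same hierarchical data used in the tree diagram example
const treeData = {
  name: "CEO",
  children: [
    { name: "CTO", children: [{ name: "Engineering Manager" }, { name: "Product Manager" }] },
    { name: "CFO", children: [{ name: "Accountant" }, { name: "Finance Analyst" }] }
  ]
};

function descendants(node) {
  // Depth-first traversal collecting every node, root included
  const out = [node];
  (node.children || []).forEach(child => out.push(...descendants(child)));
  return out;
}

function links(node) {
  // One link object per parent-child pair
  const out = [];
  (node.children || []).forEach(child => {
    out.push({ source: node, target: child });
    out.push(...links(child));
  });
  return out;
}

console.log(descendants(treeData).length); // 7 nodes
console.log(links(treeData).length);       // 6 links
```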
    
    

    Results display

    After running this code, you will see a horizontal tree diagram: the root node (CEO) on the left, with curved links connecting it to child and leaf nodes toward the right.

    Applications and extensions

    This example can be extended to more complex hierarchies, or the style adjusted to suit different needs. For example:



    Rectangular treemap (Treemapping)

    Concept

    A rectangular treemap is a visualization technique that displays hierarchical data as nested rectangles. The area of each rectangle encodes a numeric value, such as sales or file size, and each rectangle can be subdivided to represent its subcategories.
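The area-proportional idea can be sketched in a few lines of plain JavaScript. sliceLayout is a hypothetical helper implementing the simplest "slice" layout; D3's d3.treemap provides this and better tiling algorithms (squarify, dice, etc.):

```javascript
// Simplest "slice" treemap layout: split a rectangle horizontally,
// giving each child a width (and thus area) proportional to its value
function sliceLayout(rect, values) {
  const total = values.reduce((a, b) => a + b, 0);
  let x = rect.x;
  return values.map(v => {
    const w = rect.width * (v / total);
    const r = { x: x, y: rect.y, width: w, height: rect.height };
    x += w;
    return r;
  });
}

const rects = sliceLayout({ x: 0, y: 0, width: 100, height: 50 }, [2, 1, 1]);
// Widths are 50, 25, 25: each child's area is proportional to its value
```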

    Application scenarios

    Example (made using D3.js)

    Advantages

    Things to note



    Cytoscape.js network diagram example

    Basic usage

    Cytoscape.js is a JavaScript library for drawing network graphs (Graph). Nodes and edges are defined in JSON; the syntax is simple, and it supports interaction and style customization.

    Simple network diagram example
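A minimal sketch of such a network definition (node and edge ids are placeholders; in the browser, the elements array is passed to the cytoscape() constructor):

```javascript
// Element definitions for a two-node, one-edge graph (Cytoscape.js JSON form)
const elements = [
  { data: { id: 'a', label: 'Node A' } },           // node
  { data: { id: 'b', label: 'Node B' } },           // node
  { data: { id: 'ab', source: 'a', target: 'b' } }  // edge from a to b
];

// In the browser (assumes cytoscape has been loaded from a CDN):
// const cy = cytoscape({
//   container: document.getElementById('cy'),
//   elements: elements,
//   layout: { name: 'circle' }
// });

// Edges are distinguished from nodes by having source/target fields
const edges = elements.filter(e => 'source' in e.data);
```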

    Description



    Cytoscape.js different application examples

    1. Circle Layout

    2. Drag and click interaction

    3. Style by Class

    Description



    JavaScript circuit diagram drawing library

    Library Suitability Features Interactive Notes
    JointJS ★★★★★ High drawing freedom; extensible circuit-component symbols ✔️ Can draw logic circuits and flowcharts; the free version is sufficient for most needs
    GoJS ★★★★☆ Powerful graphics and data-model support ✔️ Commercial software with a free trial; often used for production-line and circuit diagrams
    SVG.js ★★★☆☆ Lightweight; supports precise drawing ✔️ Components (resistors, capacitors, etc.) must be designed yourself; suited to fine-grained control
    Konva.js ★★★☆☆ High-performance Canvas-based drawing ✔️ Suited to interactive design tools with dragging and clicking
    ELK.js ★★☆☆☆ Excellent automatic layout ✖️ Layout algorithm only (can be paired with JointJS)


    JointJS displays basic circuit diagram components



    3D equation plotting

    Function description

    This tool takes x and y as independent variables and draws the equation z = f(x, y) as a 3D surface, with interactive mouse controls for rotation, zooming, and panning.

    HTML structure

    <div id="plot3d" style="width:100%; height:600px;"></div>
    <script src="https://cdn.plot.ly/plotly-latest.min.js"></script>
    <script type="module">
      // Define the function z = f(x, y) (can be replaced by any equation)
      function computeZ(x, y) {
        return Math.sin(x) * Math.cos(y); // z = sin(x) * cos(y)
      }
    
      const xRange = numeric.linspace(-5, 5, 50);
      const yRange = numeric.linspace(-5, 5, 50);
    
      // Build z data: for a Plotly surface, z[i][j] pairs with y[i] and x[j],
      // so the outer loop must run over y
      const zValues = yRange.map(y =>
        xRange.map(x => computeZ(x, y))
      );
    
      const data = [{
        type: 'surface',
        x: xRange,
        y: yRange,
        z: zValues,
        colorscale: 'Viridis'
      }];
    
      const layout = {
        title: 'z = sin(x) * cos(y)',
        autosize: true,
        scene: {
          xaxis: { title: 'X axis' },
          yaxis: { title: 'Y axis' },
          zaxis: { title: 'Z axis' }
        }
      };
    
      Plotly.newPlot('plot3d', data, layout);
    </script>
    
    <!-- numeric.js provides numeric.linspace; module scripts are deferred, so this classic script still runs before the module above executes -->
    <script src="https://cdnjs.cloudflare.com/ajax/libs/numeric/1.2.6/numeric.min.js"></script>

    Operating Instructions

    Function replaceable example

    Description

    This example uses Plotly.js for interactive 3D visualization, with numeric.js generating the numeric grids. You can freely change the body of the computeZ function to draw any three-dimensional surface.
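For instance, a radially symmetric "sombrero" surface can be substituted for the computeZ body (the guard at r = 0 uses the limit sin(r)/r → 1):

```javascript
// Alternative surface for computeZ: z = sin(r) / r with r = sqrt(x^2 + y^2)
function computeZ(x, y) {
  const r = Math.sqrt(x * x + y * y);
  return r === 0 ? 1 : Math.sin(r) / r; // limit value 1 at the origin
}
```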



    JavaScript library for 3D chemical structure drawing

    3Dmol.js

    3Dmol.js is an open source WebGL chemical molecule visualization library designed specifically for browsers, which can draw molecular structures directly on web pages.

    <div id="viewer" style="width:400px;height:400px;"></div>
    <script src="https://3dmol.org/build/3Dmol-min.js"></script>
    <script>
      const viewer = $3Dmol.createViewer("viewer", { backgroundColor: "white" });
      viewer.addModel("C1=CC=CC=C1", "smi"); // benzene as SMILES; note: some 3Dmol.js builds do not support the "smi" format
      viewer.setStyle({}, {stick: {}, sphere: {scale: 0.3}});
      viewer.zoomTo();
      viewer.render();
    </script>

    ChemDoodle Web Components

    ChemDoodle provides 2D and 3D structure drawings, supports a variety of chemical formats, and is suitable for teaching and web applications.

    JSmol

    JSmol is a JavaScript version of Jmol, suitable for displaying large molecules such as proteins or crystal structures.

    Mol*

    Mol* (MolStar) is a high-order structure visualization tool developed by RCSB PDB, specifically designed for biological macromolecules.

    comparison table

    Library Main purpose Open source License notes
    3Dmol.js General-purpose 3D molecular visualization Yes (BSD) Free to use
    ChemDoodle 2D and 3D teaching and display Partially Commercial license for the full product
    JSmol Academic research and teaching Yes (LGPL) Free to use
    Mol* Protein and biomolecule visualization Yes (MIT) Free to use


    3Dmol.js displays benzene molecules

    Instructions for use

    This example uses 3Dmol.js and defines the atomic coordinates of the benzene molecule in XYZ format, so the 3D molecular structure displays correctly.

    Using the SMILES format ("smi") raises the error "Unknown format: smi", because some 3Dmol.js versions do not support that format.

    Usage steps

    1. Save the following program as benzene.html.
    2. Start a local HTTP server (for example, Python's python -m http.server).
    3. Open http://localhost:8000 in a browser to view the result.

    HTML code

    <!DOCTYPE html>
    <html>
    <head>
      <meta charset="utf-8">
      <title>3Dmol.js renders benzene molecules</title>
      <script src="https://3dmol.org/build/3Dmol-min.js"></script>
      <style>
        #viewer {
          width: 600px;
          height: 600px;
          position: relative;
          border: 1px solid #aaa;
        }
      </style>
    </head>
    <body>
    
    <h2>3Dmol.js benzene molecule display (XYZ format)</h2>
    <div id="viewer"></div>
    
    <script>
      document.addEventListener("DOMContentLoaded", function () {
        const viewer = $3Dmol.createViewer("viewer", { backgroundColor: "white" });
    
        const xyzData = `
    12
    benzene
    C 0.0000 1.3968 0.0000
    H 0.0000 2.4903 0.0000
    C -1.2096 0.6984 0.0000
    H -2.1471 1.2451 0.0000
    C -1.2096 -0.6984 0.0000
    H -2.1471 -1.2451 0.0000
    C 0.0000 -1.3968 0.0000
    H 0.0000 -2.4903 0.0000
    C 1.2096 -0.6984 0.0000
    H 2.1471 -1.2451 0.0000
    C 1.2096 0.6984 0.0000
    H 2.1471 1.2451 0.0000
    `;
    
        viewer.addModel(xyzData, "xyz");
        viewer.setStyle({}, {stick: {}, sphere: {scale: 0.3}});
        viewer.zoomTo();
        viewer.render();
      });
    </script>
    
    </body>
    </html>
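The XYZ text passed to addModel has a simple line-oriented structure: line 1 is the atom count, line 2 a free-form comment, and each following line is an element symbol with x, y, z coordinates. A minimal parser sketch (illustration only; 3Dmol.js does its own parsing internally):

```javascript
// Parse an XYZ-format string into { count, comment, atoms } (sketch)
function parseXYZ(text) {
  const lines = text.trim().split('\n');
  const count = parseInt(lines[0], 10);  // line 1: number of atoms
  const comment = lines[1];              // line 2: free-form comment
  const atoms = lines.slice(2, 2 + count).map(line => {
    const [elem, x, y, z] = line.trim().split(/\s+/);
    return { elem, x: parseFloat(x), y: parseFloat(y), z: parseFloat(z) };
  });
  return { count, comment, atoms };
}

const sample = parseXYZ(`2
fragment of benzene
C 0.0 1.3968 0.0
H 0.0 2.4903 0.0`);
// sample.atoms[0].elem === "C", sample.atoms[1].y === 2.4903
```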

    Things to note

    3Dmol.js benzene molecule display (XYZ format)



    Custom Google Maps

    Overview

    The Google Maps JavaScript API lets developers embed interactive maps in web pages and dynamically add custom elements such as markers, layers, and text labels through JavaScript. The following example demonstrates how to display a map and add custom markers.

    ---

    Step 1: Apply for a Google Maps API Key

    Go to the Google Cloud Console, enable the Maps JavaScript API, and create an API key.
    After obtaining it, append it to the script URL as ?key=YOUR_API_KEY when loading the API.

    ---

    Step 2: Create HTML structure

    <!DOCTYPE html>
    <html>
    <head>
      <meta charset="utf-8">
      <title>Google Map with Custom Tags</title>
      <style>
        #map {
          width: 100%;
          height: 500px;
        }
      </style>
    </head>
    <body>
    
    <h3>My Map</h3>
    <div id="map"></div>
    
    <!-- Load Google Maps JS API -->
    <script async
      src="https://maps.googleapis.com/maps/api/js?key=YOUR_API_KEY&callback=initMap">
    </script>
    
    <script>
    function initMap() {
      //Initialize the map
      const center = { lat: 25.033964, lng: 121.564468 }; // Taipei 101
      const map = new google.maps.Map(document.getElementById("map"), {
        zoom: 14,
        center: center
      });
    
      //Create custom tag
      const myTags = [
        { position: { lat: 25.034, lng: 121.565 }, title: "Mark A", content: "This is point A" },
        { position: { lat: 25.036, lng: 121.562 }, title: "Mark B", content: "This is point B" },
        { position: { lat: 25.032, lng: 121.568 }, title: "Mark C", content: "This is point C" }
      ];
    
      //Create an information window (InfoWindow)
      const infoWindow = new google.maps.InfoWindow();
    
      //Add markers to the map
      myTags.forEach(tag => {
        const marker = new google.maps.Marker({
          position: tag.position,
          map: map,
          title: tag.title,
          icon: {
            url: "https://maps.google.com/mapfiles/ms/icons/blue-dot.png"
          }
        });
    
        // Click to display information
        marker.addListener("click", () => {
          infoWindow.setContent("<b>" + tag.title + "</b><br>" + tag.content);
          infoWindow.open(map, marker);
        });
      });
    }
    </script>
    
    </body>
    </html>
    ---

    Step 3: Expandable functionality

    ---

    Commonly used settings

    Property Use
    center Sets the initial center coordinates of the map.
    zoom Map zoom level (1–20).
    mapTypeId Display style: roadmap, satellite, hybrid, or terrain.
    icon Custom marker icon.
    infoWindow Shows an information window when a marker is clicked.
    ---

    Official resources



    Sound in Web

    Playing Do Re Mi in JavaScript with MIDI sounds

    Description

    To play a specific MIDI instrument sound (such as a guitar) in a browser, you can use the Web MIDI API or, more simply, the Web Audio API together with a SoundFont player such as the soundfont-player library.

    Example: Playing Do Re Mi with a guitar tone

    <script src="https://unpkg.com/[email protected]/dist/soundfont-player.js"></script>
    <button onclick="playDoReMi()">Play Do Re Mi</button>
    
    <script>
    async function playDoReMi() {
      const audioCtx = new (window.AudioContext || window.webkitAudioContext)();
      const player = await Soundfont.instrument(audioCtx, 'acoustic_guitar_nylon');
    
      const now = audioCtx.currentTime;
      player.play('C4', now); // Do
      player.play('D4', now + 0.5); // Re
      player.play('E4', now + 1); // Mi
    }
    </script>

    Description

    Supported guitar sounds

    soundfont-player instrument names follow the General MIDI program names; the guitar family includes acoustic_guitar_nylon, acoustic_guitar_steel, electric_guitar_jazz, electric_guitar_clean, electric_guitar_muted, overdriven_guitar, distortion_guitar, and guitar_harmonics.

    Conclusion

    Using soundfont-player together with the Web Audio API, you can implement MIDI-quality instrument playback without installing any plug-ins. Just specify the timbre and pitch to play a scale melody such as "do re mi".



    Playing Do Re Mi in JavaScript with an AudioContext synthesizer

    Description

    If sounds cannot be played through an external SoundFont, the Web Audio API's OscillatorNode can synthesize the tones directly. The example below plays Do Re Mi and approximates a plucked-guitar feel (short notes with a fast decay).

    Sample program: built-in sound playback Do Re Mi

    <button onclick="playDoReMi()">Play Do Re Mi</button>
    
    <script>
    function playTone(frequency, startTime, duration, context) {
      const osc = context.createOscillator();
      const gain = context.createGain();
    
      osc.type = "triangle"; // Synthetic waveform close to guitar sound, can be changed to "square", "sawtooth"
      osc.frequency.value = frequency;
    
      gain.gain.setValueAtTime(0.2, startTime);
      gain.gain.exponentialRampToValueAtTime(0.001, startTime + duration);
    
      osc.connect(gain);
      gain.connect(context.destination);
    
      osc.start(startTime);
      osc.stop(startTime + duration);
    }
    
    function playDoReMi() {
      const context = new (window.AudioContext || window.webkitAudioContext)();
      const now = context.currentTime;
    
      // Frequency of Do Re Mi (C4, D4, E4)
      playTone(261.63, now, 0.4, context); // C4
      playTone(293.66, now + 0.5, 0.4, context); // D4
      playTone(329.63, now + 1.0, 0.4, context); // E4
    }
    </script>
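The hard-coded frequencies above come from twelve-tone equal temperament: f = 440 × 2^((n − 69) / 12), where n is the MIDI note number and A4 (n = 69) is 440 Hz. A small helper makes the mapping explicit:

```javascript
// MIDI note number to frequency (A4 = MIDI 69 = 440 Hz, equal temperament)
function midiToFreq(n) {
  return 440 * Math.pow(2, (n - 69) / 12);
}

// C4 = 60, D4 = 62, E4 = 64
console.log(midiToFreq(60).toFixed(2)); // "261.63"
console.log(midiToFreq(62).toFixed(2)); // "293.66"
console.log(midiToFreq(64).toFixed(2)); // "329.63"
```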

    Features

    Tips for simulating guitar technique

    Conclusion

    The pure Web Audio API is the most stable and broadly compatible approach. For advanced needs, you can add filters or echo effects, or integrate MIDI sound sources.



    OscillatorNode type values and their timbres

    Built-in osc.type values

    osc.type Waveform Timbre characteristics Typical simulated sounds
    "sine" sine wave Purest tone, no harmonics Pure tones, tuning forks, flutes, synth pads
    "square" square wave Strong odd harmonics, sharp timbre Synthesizers, 8-bit sound effects, electronic keyboards
    "sawtooth" sawtooth wave Contains all harmonics; thick, bright tone Strings, guitar, and brass simulation
    "triangle" triangle wave Odd harmonics only; softer sound Woodwinds, soft electric-guitar tones
    "custom" custom waveform Arbitrary user-defined waveforms Special synthesized or sampled-style tones

    Demonstration of usage

    const audioContext = new (window.AudioContext || window.webkitAudioContext)();
    const osc = audioContext.createOscillator();
    osc.type = "sawtooth"; // can also be "sine", "square", or "triangle" ("custom" is set via setPeriodicWave)
    osc.frequency.value = 440; // A4
    osc.connect(audioContext.destination);
    osc.start();

    Supplement: Custom waveform

    
    const real = new Float32Array([0, 1, 0.5, 0.25]);
    const imag = new Float32Array(real.length);
    const wave = audioContext.createPeriodicWave(real, imag);

    osc.setPeriodicWave(wave); // this sets osc.type to "custom" automatically
    // Note: assigning osc.type = "custom" directly throws an InvalidStateError;
    // use setPeriodicWave() instead.
    

    Conclusion

    Different osc.type values simulate different styles of instrument sound. To approximate a guitar, start with sawtooth or triangle and fine-tune with envelopes, filters, and echo.
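The decaying pluck in playTone relies on exponentialRampToValueAtTime, whose behavior the Web Audio specification defines as v(t) = v0 · (v1 / v0)^((t − t0) / (t1 − t0)). An offline sketch of that curve:

```javascript
// Offline sketch of the exponential gain ramp used in playTone
// (formula from the Web Audio API spec for exponentialRampToValueAtTime)
function expRamp(v0, v1, t0, t1, t) {
  return v0 * Math.pow(v1 / v0, (t - t0) / (t1 - t0));
}

// Gain starts at 0.2 and decays toward 0.001 over 0.4 s
console.log(expRamp(0.2, 0.001, 0, 0.4, 0));   // 0.2
console.log(expRamp(0.2, 0.001, 0, 0.4, 0.4)); // 0.001 (approximately)
```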



    Play the Do Re Mi phrase using WebAudioFont

    Description

    WebAudioFont is a recommended open-source JavaScript library supporting thousands of MIDI instrument sounds, including guitar and other instruments, with good sound quality and easy integration.

    Quick Example: Playing Do Re Mi (using piano or guitar sounds)

    A minimal sketch following WebAudioFont's published usage. The CDN paths and the preset name _tone_0000_JCLive_sf2_file (a piano preset) come from the library's own examples; substitute a guitar preset file from the WebAudioFont catalog for a guitar timbre.

    <script src="https://surikov.github.io/webaudiofont/npm/dist/WebAudioFontPlayer.js"></script>
    <script src="https://surikov.github.io/webaudiofontdata/sound/0000_JCLive_sf2_file.js"></script>
    <button onclick="playDoReMi()">Play Do Re Mi</button>

    <script>
    const audioCtx = new (window.AudioContext || window.webkitAudioContext)();
    const player = new WebAudioFontPlayer();
    player.loader.decodeAfterLoading(audioCtx, '_tone_0000_JCLive_sf2_file');

    function playDoReMi() {
      const now = audioCtx.currentTime;
      // queueWaveTable(context, destination, preset, when, midiPitch, duration)
      player.queueWaveTable(audioCtx, audioCtx.destination, _tone_0000_JCLive_sf2_file, now, 60, 0.5);       // Do (C4)
      player.queueWaveTable(audioCtx, audioCtx.destination, _tone_0000_JCLive_sf2_file, now + 0.5, 62, 0.5); // Re (D4)
      player.queueWaveTable(audioCtx, audioCtx.destination, _tone_0000_JCLive_sf2_file, now + 1.0, 64, 0.5); // Mi (E4)
    }
    </script>

    Key explanation

    Interchangeable sounds

    Summary

    Combining WebAudioFont with the Web Audio API makes it easy to play notes with realistic MIDI instrument sounds (such as guitar), avoiding both the thin sound of a single pure oscillator and the silent-failure cases seen earlier with SoundFont players.



