Writing a voice assistant in Python

sqully · Oct 28, 2024

Hi! I think many of you have seen voice assistants for PC on TikTok that can do all sorts of things: open websites, change the brightness, and so on. Well, now we'll write one ourselves; it's actually very quick and easy.
I won't drag this out, so let's get started.
speech_recognition - for speech recognition.

pyttsx3 - for speech synthesis.

webbrowser - for opening websites.

subprocess - for running Windows commands.

pyautogui - for key combinations (switching the input language, minimizing windows).
To install the libraries, run this command in cmd (webbrowser and subprocess ship with Python, so they are not installed through pip; PyAudio is what SpeechRecognition needs for microphone input):

pip install SpeechRecognition pyttsx3 pyautogui PyAudio

Import the libraries:

import speech_recognition as sr
import subprocess
import pyttsx3
import webbrowser
import pyautogui


Set up speech synthesis:

engine = pyttsx3.init('sapi5')
voices = engine.getProperty('voices')
engine.setProperty('voice', voices[0].id)  # Select the voice
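
voices[0] is simply whichever voice the system lists first, which may not be a Russian one. Below is a rough sketch of choosing a voice by name instead. The helper pick_voice and the demo voice names are made up for illustration; the only thing assumed about the real pyttsx3 voice objects is that they have name and id attributes.

```python
from types import SimpleNamespace


def pick_voice(voices, keyword):
    """Return the id of the first voice whose name contains keyword, else None."""
    keyword = keyword.lower()
    for voice in voices:
        if keyword in (voice.name or "").lower():
            return voice.id
    return None


# Stand-in objects so the sketch runs without pyttsx3 installed;
# with the real engine you would pass engine.getProperty('voices').
demo_voices = [
    SimpleNamespace(id="voice-en", name="Microsoft David Desktop - English (United States)"),
    SimpleNamespace(id="voice-ru", name="Microsoft Irina Desktop - Russian"),
]

print(pick_voice(demo_voices, "russian"))  # voice-ru
```

With the real engine this would be something like voice_id = pick_voice(voices, "russian"), followed by engine.setProperty('voice', voice_id) if a match was found.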


Create the text-to-speech function:

def speak(text):
    engine.say(text)
    engine.runAndWait()


Create the speech recognition function:

def recognize_speech():
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        speak("Жду ваших указаний")
        audio = recognizer.listen(source)
    try:
        command = recognizer.recognize_google(audio, language="ru-RU")
        return command
    except sr.UnknownValueError:
        # Speech was not understood
        return ""
    except sr.RequestError:
        # The recognition service could not be reached
        return ""


Here we send the recorded audio to Google's servers and get the text back.

Create the function that parses commands and runs them:

def execute_command(command):
    # Browser-related commands
    if "ВК" in command.upper():
        speak("Открываю ВКонтакте")
        webbrowser.open("https://vk.com")
    elif "github" in command.lower():
        speak("Открываю гитхаб")
        webbrowser.open("https://github.com")
    elif "форум" in command.lower():
        speak("Открываю лолз")
        webbrowser.open("http://lolz.live")
    elif "найди в гугле" in command.lower():
        query = command.lower().replace("найди в гугле", "").strip()
        speak("Ищу в Google " + query)
        webbrowser.open(f"https://www.google.com/search?q={query}")

    # Settings
    elif "яркость" in command.lower():
        # extract_brightness and set_brightness are not shown in this post
        brightness = extract_brightness(command)
        if brightness is not None:
            set_brightness(brightness)
    elif "экрана" in command.lower():
        speak("Открываю настройки дисплея")
        subprocess.Popen("control.exe desk.cpl")
    elif "язык" in command.lower() or "раскладка" in command.lower() or "раскладку" in command.lower():
        speak("Меняю раскладку клавиатуры")
        pyautogui.hotkey('alt', 'shift')
        speak("Готово")
    elif "окно" in command.lower():
        speak("Сворачиваю окно, сэр")
        pyautogui.hotkey('win', 'd')

    # Programs
    elif "калькулятор" in command.lower():
        speak("Открываю калькулятор")
        subprocess.Popen("calc.exe")
    elif "блокнот" in command.lower():
        speak("Открываю блокнот, сэр")
        subprocess.Popen("notepad.exe")
    else:
        speak('Пока что я это не умею')
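
One thing to watch in the "найди в гугле" branch: the query goes into the URL as-is, so spaces and characters like & can break the search. A possible fix, as a small sketch, is to encode the query with urllib.parse.quote_plus from the standard library. The helper name build_search_url is made up here and is not part of the original script.

```python
from urllib.parse import quote_plus


def build_search_url(command, trigger="найди в гугле"):
    """Strip the trigger phrase and return a Google search URL with an encoded query."""
    query = command.lower().replace(trigger, "").strip()
    return "https://www.google.com/search?q=" + quote_plus(query)


print(build_search_url("Найди в гугле python & pip"))
# https://www.google.com/search?q=python+%26+pip
```

In execute_command the result would go straight into webbrowser.open(...).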


Create the main block, which runs the recognize-and-execute loop:

if __name__ == "__main__":
    while True:
        voice_command = recognize_speech()
        if voice_command:
            execute_command(voice_command)
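
As written, the loop never ends and the script can only be stopped by killing it. Here is a sketch of the same loop with a spoken stop word. The function run_assistant and the stop words "стоп" and "выход" are assumptions for illustration; the listen and handle functions are passed in as arguments, so the demo runs on scripted text without a microphone.

```python
def run_assistant(listen, handle, stop_words=("стоп", "выход")):
    """Keep listening until a command contains one of stop_words.

    listen: callable returning the recognized text ("" if nothing was recognized)
    handle: callable that executes one command
    Returns the number of commands handled.
    """
    handled = 0
    while True:
        command = listen()
        if not command:
            continue  # nothing recognized, listen again
        if any(word in command.lower() for word in stop_words):
            return handled
        handle(command)
        handled += 1


# Demo with scripted input instead of a microphone.
scripted = iter(["открой блокнот", "", "калькулятор", "Стоп"])
executed = []
count = run_assistant(lambda: next(scripted), executed.append)
print(count, executed)  # 2 ['открой блокнот', 'калькулятор']
```

In the actual script the call would be run_assistant(recognize_speech, execute_command).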

Full code:

# Libraries
import speech_recognition as sr
import subprocess
import pyttsx3
import webbrowser
import wmi  # not used in the code shown; needs pip install wmi
import os   # not used in the code shown
import pyautogui

# Speech synthesis settings
engine = pyttsx3.init('sapi5')
voices = engine.getProperty('voices')
engine.setProperty('voice', voices[0].id)

# Text-to-speech function
def speak(text):
    engine.say(text)
    engine.runAndWait()

# Speech recognition
def recognize_speech():
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        print("Скажите что-нибудь...")
        speak("Жду ваших указаний")
        audio = recognizer.listen(source)
    try:
        print("Распознаю...")
        command = recognizer.recognize_google(audio, language="ru-RU")
        print("Вы сказали: " + command)
        return command
    except sr.UnknownValueError:
        print("Команда не распознана")
        return ""
    except sr.RequestError:
        print("Не удается подключиться к службе распознавания речи")
        return ""

# Commands
def execute_command(command):
    # Browser, sites
    if "ВК" in command.upper():
        speak("Открываю ВКонтакте")
        webbrowser.open("https://vk.com")
    elif "github" in command.lower():
        speak("Открываю гитхаб")
        webbrowser.open("https://github.com")
    elif "форум" in command.lower():
        speak("Открываю лолз")
        webbrowser.open("http://lolz.live")
    elif "найди в гугле" in command.lower():
        query = command.lower().replace("найди в гугле", "").strip()
        speak("Ищу в Google " + query)
        webbrowser.open(f"https://www.google.com/search?q={query}")

    # Settings
    # extract_brightness, set_brightness, extract_volume and set_volume
    # are not shown in this post
    elif "яркость" in command.lower():
        brightness = extract_brightness(command)
        if brightness is not None:
            set_brightness(brightness)
    elif "экрана" in command.lower():
        speak("Открываю настройки дисплея, сэр")
        subprocess.Popen("control.exe desk.cpl")
    elif "язык" in command.lower() or "раскладка" in command.lower() or "раскладку" in command.lower():
        speak("Меняю раскладку клавиатуры, сэр")
        pyautogui.hotkey('alt', 'shift')
        speak("Готово")
    elif "сделай звук" in command.lower():
        volume = extract_volume(command)
        if volume is not None:
            set_volume(volume)
    elif "окно" in command.lower():
        speak("Сворачиваю окно, сэр")
        pyautogui.hotkey('win', 'd')

    # Programs
    elif "калькулятор" in command.lower():
        speak("Открываю калькулятор")
        subprocess.Popen("calc.exe")
    elif "блокнот" in command.lower():
        speak("Открываю блокнот, сэр")
        subprocess.Popen("notepad.exe")
    else:
        speak('Пока что я это не умею')

if __name__ == "__main__":
    while True:
        voice_command = recognize_speech()
        if voice_command:
            execute_command(voice_command)
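
The script calls extract_brightness and extract_volume, but the post does not include them, so the "яркость" and "сделай звук" commands will fail with a NameError as posted. Below is a guess at what such helpers might look like, not the author's version. It assumes the recognizer returns the number as digits (for example "яркость 70") and treats the value as a percentage. set_brightness and set_volume would still have to be written separately.

```python
import re


def extract_percent(command):
    """Return the first whole number in command clamped to 0..100, or None."""
    match = re.search(r"\d+", command)
    if match is None:
        return None
    return max(0, min(100, int(match.group())))


# The script uses two names; in this sketch both parse the same way.
extract_brightness = extract_percent
extract_volume = extract_percent

print(extract_brightness("яркость 70 процентов"))  # 70
print(extract_volume("сделай звук 150"))           # 100
print(extract_brightness("яркость побольше"))      # None
```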
The voice is hard on the ears, but unfortunately I couldn't find a library with a decent-sounding one.
You can also easily adapt this script to your needs with ChatGPT; it's still around.
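
If you plan to add many commands, the long if/elif chain gets awkward fast. One possible alternative, sketched here, is a table of keywords and actions. The names make_dispatcher and commands are made up for this example, and the demo actions only return strings so the sketch runs anywhere; real ones would call speak(), webbrowser.open(), subprocess.Popen() and so on.

```python
def make_dispatcher(commands, fallback):
    """Build a handler from (keywords, action) pairs; the first match wins."""
    def dispatch(text):
        lowered = text.lower()
        for keywords, action in commands:
            if any(keyword in lowered for keyword in keywords):
                return action()
        return fallback()
    return dispatch


# Demo actions just return strings instead of doing real work.
commands = [
    (("калькулятор",), lambda: "calc"),
    (("блокнот",), lambda: "notepad"),
    (("язык", "раскладка", "раскладку"), lambda: "layout"),
]
dispatch = make_dispatcher(commands, fallback=lambda: "unknown")

print(dispatch("Открой блокнот"))   # notepad
print(dispatch("смени раскладку"))  # layout
print(dispatch("включи музыку"))    # unknown
```

Adding a new command then means adding one more line to the table rather than another elif branch.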

ЧернильныйБро · Oct 28, 2024

Cool, now imagine shoving a neural net in there too

krutyshkin · Oct 28, 2024

and what am I supposed to do with it? yell "launch the dumpster" from the other end of the house so it starts GTA 5 RP while I'm heating up my food?

sha1n · Nov 1, 2024

small mistake in the library install: it should be subprocess.run, otherwise it isn't found
