Showcase and how to get started with a Speech Recognition Bot coded in Java


When I first got into coding, I dreamed about creating my own local version of Siri for my desktop; leaning back in my chair and doing things with just my voice was what I always wanted. After coding in Java for over a year, I decided that it was time to learn more than just using JFrame and creating classes.

"Zero" is a speech recognition bot I coded in Java using the Sphinx4 voice recognition library and the Selenium WebDriver for Chrome. Although it is no Siri by any means, the bot can open and close windows, create tabs, open sites and so on, using just my voice. The demo for the bot and its current commands can be found below:



1) Setting up the voice recognition:



Luckily, there is already documentation on how to get started with Sphinx4. The JAR files for this library can be downloaded here.



As you can see, each recognizer should have an Acoustic Model Path, a Dictionary Path and a Language Model Path set. After running the code, I found that the recognizer was often wrong about what I was saying. As a result, I decided to use a grammar file instead.






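Creating a grammar file was better in my case, because I only wanted my bot to detect certain phrases or commands. A grammar limits the recognizer to the phrases listed in the file, instead of having it guess among every word in the dictionary.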



Aside from making sure the formatting is correct, grammar files are very easy to create. As you can see, all of Zero's commands were typed into this file. When I ran the code again, the recognizer matched my voice against the phrases in the file and returned the one it thought I was saying. Voice recognition was now very accurate, and hence a success. Now it was time to make the commands a reality with Selenium.
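For readers who have not seen one, a grammar file in the JSGF format that Sphinx4 reads is just a header line, a grammar name, and a public rule listing the allowed phrases separated by `|`. Here is a minimal sketch that builds that text from a list of phrases. The phrases, the grammar name `commands` and the rule name `<command>` are made up for illustration; they are not Zero's actual commands.

```java
import java.util.Arrays;
import java.util.List;

public class GrammarSketch {

    // Builds JSGF grammar text with a single public rule that lists
    // every phrase as an alternative.
    public static String buildGrammar(String grammarName, List<String> phrases) {
        StringBuilder sb = new StringBuilder();
        sb.append("#JSGF V1.0;\n\n");                        // JSGF header line
        sb.append("grammar ").append(grammarName).append(";\n\n");
        sb.append("public <command> = ");                    // rule name is an assumption
        sb.append(String.join(" | ", phrases));
        sb.append(";\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical phrases, for illustration only.
        List<String> phrases = Arrays.asList("open browser", "close browser", "new tab");
        System.out.print(buildGrammar("commands", phrases));
    }
}
```

Saving the printed text to a `.gram` file gives you something you can point the recognizer at. For a handful of commands it is just as easy to type the file by hand, which is what I did.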


2) Setting up Selenium Webdriver for Chrome: 


I downloaded the Selenium WebDriver I needed for Java from here. Don't forget to add the JAR files to the project library after downloading.


Since Selenium needs a separate driver executable to control Chrome, I also had to download ChromeDriver 2.3.1. You are going to have to set up the Chrome driver by typing in the following:

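// Replace the placeholder path with the location of your ChromeDriver executable
System.setProperty("webdriver.chrome.driver", "path/to/chromedriver");
// driver is declared elsewhere as an org.openqa.selenium.WebDriver
driver = new ChromeDriver();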


Once this was done, Selenium was ready to go! You can find a lot of Selenium Commands online to help you with opening websites, windows etc.  
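One way to connect the two halves is a small lookup table from recognized phrases to actions. The sketch below is not the exact code from Zero; the class name `CommandDispatcher` and the phrases are hypothetical, and the actions only write to a list so that it runs without Selenium. In a real bot each action would call Selenium instead, for example `driver.get(url)` to open a site.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class CommandDispatcher {

    private final Map<String, Runnable> actions = new HashMap<>();

    // Registers an action under a phrase, stored in normalized form.
    public void register(String phrase, Runnable action) {
        actions.put(normalize(phrase), action);
    }

    // Runs the action for the recognized text.
    // Returns false if no registered phrase matches.
    public boolean dispatch(String recognizedText) {
        Runnable action = actions.get(normalize(recognizedText));
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    // Trims, lowercases, and collapses runs of whitespace to one space.
    public static String normalize(String text) {
        return text.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        CommandDispatcher dispatcher = new CommandDispatcher();

        // Placeholder actions; a real bot would call Selenium here.
        dispatcher.register("open browser", () -> log.add("opened browser"));
        dispatcher.register("new tab", () -> log.add("opened tab"));

        dispatcher.dispatch("Open Browser"); // matches after normalization
        dispatcher.dispatch("play music");   // unknown phrase, ignored
        System.out.println(log);
    }
}
```

Keeping the phrases in a map means that adding a new command is one line in the grammar file plus one `register` call, rather than another branch in a long if/else chain.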












Comments

  1. This comment has been removed by the author.

    Replies
    1. What are the minimum hardware requirements it needs?

    2. Any laptop with a working mic should work. If you need the code, you can find it on my GitHub here: https://github.com/yashhshah/VoiceBot

      If you go to src you can find the code there.

    3. Thank you bro, it's working great. But will it be able to recognize any English word apart from the commands in the grammar file?

    4. Technically yes. But in practice I found the library was not that ideal for recognizing words normally. If anything, you can look at the documentation and try it out for yourself. Don't forget to set the dictionary path to do so!

  2. Thank you very much for replying :) I tried it, and it's not at all accurate without setting the dictionary path... I am doing my final-year project, and your code helped me a lot in understanding Sphinx usage. Can I get your email address to stay in touch with you?
