Using Android Text-to-Speech to Create a Smart Assistant

May 25, 2016

In this tutorial I will show you how to create a simple Android App that listens to the speech of a user and converts it to text. The app will then analyze the text and use it as a command to store data or answer the user.

The application user interface is simple, just a single ImageButton in the center of a full screen gradient background. Each time a user speaks, they press the button and talk.

You can find the final project on Github.

Create Project

Create Application

Create a new project in android Studio, choosing a minimum API level of 18 and adding an Empty Activity. This will be the only activity in the project.

To make the view full screen, open AndroidManifest.xml and set android:theme="@style/Theme.AppCompat.NoActionBar". This will hide the ActionBar from our Activity.

You now have a full screen white layout with an TextView inside. To improve it, add a gradient shape will to the RelativeLayout

Right click on the drawable folder and select New -> Drawable resource file. Call it ‘background’ and replace the code with this:

<?xml version="1.0" encoding="UTF-8"?>
<shape xmlns:android="http://schemas.android.com/apk/res/android"
    android:shape="rectangle" >

    <gradient
        android:type="linear"
        android:startColor="#FF85FBFF"
        android:endColor="#FF008080"
        android:angle="45"/>
</shape>

Feel free to change the colors and angle to your own.

The ImageButton inside the layout uses an image from Material Design Icons. Download and add it as a src.

Update the code inside activity_main.xml to:

<?xml version="1.0" encoding="utf-8"?>
<RelativeLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    android:background="@drawable/background"
    android:id="@+id/rel"
    tools:context="com.example.theodhor.speechapplication.MainActivity">

    <ImageButton
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:id="@+id/microphoneButton"
        android:layout_centerVertical="true"
        android:layout_centerHorizontal="true"
        android:src="@drawable/ic_mic_none_white_48dp"
        android:background="@null"/>

</RelativeLayout>

Speaking

Now the user interface is complete, the next step is the Java code inside MainActivity.

Declare a TextToSpeech variable above the onCreate method:

private TextToSpeech tts;

Inside onCreate add:

tts = new TextToSpeech(this, new TextToSpeech.OnInitListener() {
    @Override
    public void onInit(int status) {
        if (status == TextToSpeech.SUCCESS) {
            int result = tts.setLanguage(Locale.US);
            if (result == TextToSpeech.LANG_MISSING_DATA || result == TextToSpeech.LANG_NOT_SUPPORTED) {
                Log.e("TTS", "This Language is not supported");
            }
            speak("Hello");

        } else {
            Log.e("TTS", "Initilization Failed!");
        }
    }
});

This starts the TextToSpeech service. The speak() method takes a String parameter, which is the text you want Android to speak.

Create the method and add this code:

private void speak(String text){
    if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.LOLLIPOP) {
        tts.speak(text, TextToSpeech.QUEUE_FLUSH, null, null);    
    }else{
        tts.speak(text, TextToSpeech.QUEUE_FLUSH, null);
    }
}

There’s a Build.VERSION check inside the method because tts.speak(param,param,param) is deprecated for API levels over 5.1

After speak() create another method to stop the TextToSpeech service when a user closes the app:

@Override
public void onDestroy() {
    if (tts != null) {
        tts.stop();
        tts.shutdown();
    }
    super.onDestroy();
}

At this stage, the application speaks “Hello” once started. The next step is to make it listen.

Listening

To make the application listen, you will use the microphone button. Add this code to onCreate:

findViewById(R.id.microphoneButton).setOnClickListener(new View.OnClickListener() {
    @Override
    public void onClick(View v) {
        listen();
    }
});

Clicking the ImageButton function will call this function:

private void listen(){
    Intent i = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
    i.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL, RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
    i.putExtra(RecognizerIntent.EXTRA_LANGUAGE, Locale.getDefault());
    i.putExtra(RecognizerIntent.EXTRA_PROMPT, "Say something");

    try {
        startActivityForResult(i, 100);
    } catch (ActivityNotFoundException a) {
        Toast.makeText(MainActivity.this, "Your device doesn't support Speech Recognition", Toast.LENGTH_SHORT).show();
    }
}

This method starts the listening activity which displays as a dialog with a text prompt. The language of the speech is taken from the device, via the Locale.getDefault() method.

The startActivityForResult(i, 100) method waits for the current activity to return a result. 100 is a random code attached to the started activity, and can be any number that suits your use case. When a result returns from the started activity, it contains this code and uses it to differentiate multiple results from each other.

To catch the result from a started activity, add this overriden method:

@Override
protected void onActivityResult(int requestCode, int resultCode, Intent data) {
  super.onActivityResult(requestCode, resultCode, data);
  if(requestCode == 100){
      if (resultCode == RESULT_OK && null != data) {
          ArrayList<String> res = data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);
          String inSpeech = res.get(0);
          recognition(inSpeech);
      }
  }
}

This method catches every result coming from an activity and uses the requestCode for the speech recognizer result. If requestCode is equal to 100,resultCode equals OK and the data from this result is not null. You get the result string from res.get(0)

Create a new method called recognition which will take a String as a parameter:

private void recognition(String text){
    Log.e("Speech",""+text);            
}

At this stage, the application can listen after a user clicks the microphone button, and converts the user’s speech to text. The result is printed in the Error Log.

Learning

To make the app more interesting, in this step you are going to make the application able learn simple things, like your name. To make this possible, you need to use local storage.

Add these lines above the onCreate method:

private SharedPreferences preferences;
private SharedPreferences.Editor editor;
private static final String PREFS = "prefs";
private static final String NAME = "name";
private static final String AGE = "age";
private static final String AS_NAME = "as_name";

Then, inside onCreate add:

preferences = getSharedPreferences(PREFS,0);
editor = preferences.edit();

First you need to make the app ask a question, so change speak("Hello") to speak("What is your name?")

You can use a simple logic here, so when someone asks “What is your name?”, the answer is “My name is Dori.”, taking the name from the answer. A simple way is to split the answer’s string by spaces (” “) and get the value of the last index.

Update the code in the recognition method:

private void recognition(String text){
  Log.e("Speech",""+text);

  //creating an array which contains the words of the answer
  String[] speech = text.split(" ");

  //the last word is our name
  String name = speech[speech.length-1];

  //we got the name, we can put it in local storage and save changes
  editor.putString(NAME,name).apply();  

  //make the app tell our name
  speak("Your name is "+preferences.getString(NAME,null));
}

The recognition method uses all the results from the user’s speech. Since the speech may be different, you can differentiate them using certain words they might contain.

For example, the code inside this method could be:

private void recognition(String text){
    Log.e("Speech",""+text);
    String[] speech = text.split(" ");

    //if the speech contains these words, the user is saying their name
    if(text.contains("my name is")){
        String name = speech[speech.length-1];
        Log.e("Your name", "" + name);
        editor.putString(NAME,name).apply();
        speak("Your name is "+preferences.getString(NAME,null));
    }
}

But this is still a simple interaction with the app. You could make it learn your age, or even give it a name.

Inside the same method, try these simple conditions:

//This must be the age

//Just speak: I am x years old.
if(text.contains("years") && text.contains("old")){
    String age = speech[speech.length-3];
    Log.e("THIS", "" + age);
    editor.putString(AGE, age).apply();
}

//Then ask it for your age
if(text.contains("how old am I")){
    speak("You are "+preferences.getString(AGE,null)+" years old.");
}

The app can tell you the time:

//Ask: What time is it?
if(text.contains("what time is it")){
    SimpleDateFormat sdfDate = new SimpleDateFormat("HH:mm");//dd/MM/yyyy
    Date now = new Date();
    String[] strDate = sdfDate.format(now).split(":");
    if(strDate[1].contains("00"))strDate[1] = "o'clock";
    speak("The time is " + sdfDate.format(now));
}

Smart Speak

In the GitHub project I have included more examples for you to experiment with and build your own Android assistant.

I hope you enjoyed this tutorial and have a useful conversation with you phone. Any questions or comments, please let me know below.