There are a couple of different methods. The easiest one is to find a program called S.A.M. (Software Automatic Mouth). It produces speech from phonemes and is highly programmable. It doesn't require any additional hardware, and it works very well.
Then you have cartridges such as the Currah µSpeech (often written uSpeech). These contain dedicated speech hardware, with a cable that goes from the video output to the cartridge, plus a through port. The speech may be of higher quality than the software solution, but it is less programmable. I think the speech chip is the SP0256 or something like that, which was very popular in a wide range of products for different computers and handheld units.
There may be some more solutions, but they can all be divided into software or hardware. The end user will need to have the right kind of speech synthesis, as these systems are not compatible with each other. Theoretically, it may be possible to detect from software whether the program or the cartridge is present, and then conditionally call the matching speech commands, or none at all if no speech synthesis was found. A few games are said to contain such conditional speech, but I never saw it working myself.
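To make the detect-and-fall-back idea concrete, here is a minimal sketch in Python. It only shows the control flow, not real machine code: the backend names, the `is_present` checks and the `speak` callables are all hypothetical stand-ins, since how you would actually detect S.A.M. or a cartridge depends on the machine and is not something I can confirm.

```python
from typing import Callable, List, Optional, Tuple

# (name, is_present, speak) -- all three are hypothetical placeholders.
Backend = Tuple[str, Callable[[], bool], Callable[[str], None]]


def pick_backend(backends: List[Backend]) -> Optional[Backend]:
    """Return the first backend whose detection check succeeds, else None."""
    for backend in backends:
        _name, is_present, _speak = backend
        if is_present():
            return backend
    return None


def say(text: str, backend: Optional[Backend]) -> bool:
    """Speak through the chosen backend; stay silent if none was found."""
    if backend is None:
        return False  # no speech synthesis available, carry on without it
    _name, _is_present, speak = backend
    speak(text)
    return True


# Stand-in backends: the "speech" is just recorded in a list so the
# dispatch can be observed. Real detection logic is an assumption here.
spoken = []
backends = [
    ("software", lambda: False, lambda t: spoken.append(("software", t))),
    ("cartridge", lambda: True, lambda t: spoken.append(("cartridge", t))),
]

chosen = pick_backend(backends)
say("HELLO", chosen)
```

The point of the design is that the rest of the program only ever calls `say`, so it behaves the same whether speech is available or not. Since the two systems are not compatible, each backend would also need its own way of turning text into whatever phoneme input it expects.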