Get a substring between two strings in PHP

So you need to get a substring between two strings? A quick Google returns about 101,000 results, the first three of which are from StackOverflow and solve the problem for rather specific use cases. I wanted to solve the problem once and for all in a more general-purpose way.

The “Find Between” Function

/**
 * Finds a substring between two strings
 * @param  string $string The string to be searched
 * @param  string $start The start of the desired substring
 * @param  string $end The end of the desired substring
 * @param  bool   $greedy Use last instance of $end (default: false)
 * @return string
 */
function find_between(string $string, string $start, string $end, bool $greedy = false) {
    $start = preg_quote($start, '/');
    $end   = preg_quote($end, '/');
 
    $format = '/(%s)(.*';
    if (!$greedy) $format .= '?';
    $format .= ')(%s)/';
 
    $pattern = sprintf($format, $start, $end);
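    // preg_match() returns 0 (no match) or false (error); bail out early so
    // we never read an undefined $matches[2] and trigger a warning.
    if (!preg_match($pattern, $string, $matches)) {
        return '';
    }

    return $matches[2];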
}

If you’re running PHP 5.6 or lower, you’ll need to remove the scalar type hints (string and bool), which were only introduced in PHP 7.0.

Nothing too flashy here: it escapes the start and end strings with preg_quote(), constructs a regex pattern, and returns whatever comes between.
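
To make the pattern-building step easier to picture, here’s a small sketch that repeats the same steps outside the function and prints the resulting pattern. The helper name build_between_pattern() and the [b]…[/b] markers are made up for illustration; they’re not part of the function above.

```php
<?php
// Hypothetical helper (not part of the post's function): mirrors the
// pattern-building steps of find_between() so the result can be inspected.
function build_between_pattern($start, $end, $greedy = false) {
    // Escape regex metacharacters and the "/" delimiter in both markers.
    $start = preg_quote($start, '/');
    $end   = preg_quote($end, '/');

    // ".*?" is lazy (shortest match); ".*" is greedy (longest match).
    $middle = $greedy ? '.*' : '.*?';

    return sprintf('/(%s)(%s)(%s)/', $start, $middle, $end);
}

echo build_between_pattern('[b]', '[/b]'), PHP_EOL;
// /(\[b\])(.*?)(\[\/b\])/
echo build_between_pattern('[b]', '[/b]', true), PHP_EOL;
// /(\[b\])(.*)(\[\/b\])/
```

The square brackets and the slash come out escaped, so they’re matched literally rather than being read as a character class or as the end of the pattern.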

Usage Example

$string = 'fizz foo bar foo foo';
$start  = 'foo';
$end    = 'foo';
$greedy = false;
var_dump(find_between($string, $start, $end, $greedy));
// string(5) " bar "

By default it’ll find the shortest string possible (i.e. stop at the first instance of the end string), but if you pass true to $greedy, it’ll keep looking until it finds the last instance of the end string.

For example, given foo bar foo foo with foo as both the start and end string, greedy will return " bar foo " and non-greedy will return only " bar " (the surrounding spaces are part of the result).
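
Here’s that comparison as a runnable sketch. It uses a copy of find_between() with a guard that returns an empty string when nothing matches; treat the guard as an assumption of this sketch.

```php
<?php
// Copy of find_between() so this snippet runs on its own. The early return
// for "no match" is an assumption of this sketch.
function find_between(string $string, string $start, string $end, bool $greedy = false) {
    $start = preg_quote($start, '/');
    $end   = preg_quote($end, '/');

    $format = '/(%s)(.*';
    if (!$greedy) $format .= '?';
    $format .= ')(%s)/';

    $pattern = sprintf($format, $start, $end);
    if (!preg_match($pattern, $string, $matches)) {
        return '';
    }

    return $matches[2];
}

$string = 'foo bar foo foo';

// Non-greedy (default): stops at the first "foo" after the start.
var_dump(find_between($string, 'foo', 'foo'));
// string(5) " bar "

// Greedy: runs on until the last "foo" in the string.
var_dump(find_between($string, 'foo', 'foo', true));
// string(9) " bar foo "
```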

A note on WordPress

A common complaint in the WordPress community is the inability to nest shortcodes of the same name, e.g. [short][short]Hello, World![/short][/short] because the regex which parses content for shortcodes is lazy and will stop at the first closing tag.

Assuming [short] wraps its contents in a bold tag, you’d get something like <b>[short]Hello, World!</b>[/short] as the output because the inner shortcode doesn’t get passed to the handling function in the $content parameter as you’d expect it to.
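
You can reproduce the effect with a deliberately simplified pattern. To be clear, this is not the regex WordPress actually uses, just a lazy pattern of the same general shape, and the [short] shortcode is the hypothetical one from the example above.

```php
<?php
// Simplified stand-in for shortcode parsing; NOT WordPress's real regex.
// The lazy ".*?" stops at the first closing tag it finds.
$content = '[short][short]Hello, World![/short][/short]';

// Wrap whatever sits between [short] and [/short] in a bold tag.
$output = preg_replace('/\[short\](.*?)\[\/short\]/', '<b>$1</b>', $content);

echo $output, PHP_EOL;
// <b>[short]Hello, World!</b>[/short]
```

The outer opening tag gets paired with the inner closing tag, which leaves the inner [short] inside the bold tag and the outer [/short] dangling at the end.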

It’s a frustrating behaviour and leads developers to roll their own solutions, such as registering multiple shortcodes (e.g. [short], [short-inner], etc.) or even registering shortcodes dynamically when one with a given prefix is found in the content (e.g. [short], [short-foo], [short-bar], etc.). Neither approach is particularly user-friendly, which defies the whole point of shortcodes.

This might be fixed in the future (see https://core.trac.wordpress.org/ticket/14481) but for now, you’ll have to roll your own solution too.

Conclusion

While this is by no means a new problem and the solution is hardly revolutionary, it at least provides a super-simple way to get the job done. No more writing regex by hand (and no more substr-ing, explode-ing and implode-ing!), just a clear and simple function to solve a common annoyance.

12 Comments

  1. Hey Rich,

    I tried your solution and it wasn’t working for me so I did some debugging and fixed it.

    There is an issue in this block:
    if ($trim) {
        $string = substr($string, strlen($start));
        $string = substr($string, 0, -strlen($start) + 1);
    }

    The second line in this block should be: $string = substr($string, 0, -strlen($end)); because after trimming `start`, you need to trim the length of `end` from the right side of the string.

    But later I found that preg_match also returns the trimmed string, i.e.
    Array
    (
    [0] => quick brown fox jump
    [1] => brown fox
    )

    So I tweaked the function above, and here is the final outcome:

    function find_between($string, $start, $end, $trim = true, $greedy = false)
    {
        $stringOut = '';
        // Pass '/' to preg_quote() so the pattern delimiter is escaped too.
        $pattern = '/' . preg_quote($start, '/') . '(.*';

        if (!$greedy) {
            $pattern .= '?';
        }

        $pattern .= ')' . preg_quote($end, '/') . '/';
        preg_match($pattern, $string, $matches);

        if (count($matches) > 1) {
            if ($trim) {
                // Captured group only: the text between $start and $end.
                $stringOut = $matches[1];
            } else {
                // Full match, including $start and $end.
                $stringOut = $matches[0];
            }
        }

        return ($stringOut === '') ? false : $stringOut;
    }

    Hope this helps,
    Waqar

    • Hi Waseem, turned out it didn’t handle that very well at all so I re-wrote the function and updated the post. Should be a bit more reliable now!

  2. Doesn’t work. Tried all your examples. Even the one in the comments. It throws a Warning, which doesn’t help to retrieve JSON.

    This works fine.

    function get_string_between($string, $start, $end){
        $string = ' ' . $string;
        $ini = strpos($string, $start);
        if ($ini == 0) return '';
        $ini += strlen($start);
        $len = strpos($string, $end, $ini) - $ini;
        return substr($string, $ini, $len);
    }

    • Hi Nick! Which version of PHP are you running? I’ve tested on PHP 7.0, 7.1 and 7.2 but it obviously won’t work on 5.6 or lower because of the typehints. If you remove them from the function it should work as expected :)

  3. Hi Rich,

    Your function is what I need, however when I use the following:

    $string = 'url: bvjie2bvij23bevije2vbei2jvbo2.mp3,';
    $start = 'url: ';
    $end = ',';

    here’s the var_dump:

    string(30) "bvjie2bvij23bevije2vbei2jvbo2."

    Note that it is truncating after the . before mp3 rather than before the , after mp3. I am running PHP 7.2.

    Thanks.

    Alan

    • Hi Alan,

      That looks like a really annoying little misbehaviour! Unfortunately I can’t reproduce it — have a look at this example: http://sandbox.onlinephpfunctions.com/code/9de5a0f5d26fb5db7b2c2f5853430814c3d07819

      Seems to work on PHP 7.x.x although obviously the type hints will cause an error on previous versions.

      My first thought was that maybe the . wasn’t being escaped and the pattern was therefore looking for a single character, which it found in a . literal, but it looks like `preg_quote()` is taking care of that.

      Let me know if you get it working!
