Remove Multiple Spaces in String by Using Constant Memory

Problem Description:

The problem here makes me think: Can we come up with a method to remove multiple spaces in a string in O(N) time and O(1) space where the string is stored as an array rather than a linked list? After I searched out the internet and I found it is impossible but there are still things worth doing on it. I basically have two things to discuss and you are very welcome to leave any comments on them.

A. Multiple spaces could be moved into the tail in O(1) space

Although I say it is impossible to achieve our goal here to remove multiple spaces in string presented in an array. There is one thing we can do better than simple copying one character by one character from the original array to a second array from the beginning with space being examined. By saying we do not have to use the second array from the beginning, I mean two things: (1) we still need it at the end: take (N) spaces; (2) we can use the original array to store all the extra spaces we need to remove: before this, we achieved to use const memory (better than nothing).  Please see the following code:

std::string gao(std::string &strInput, const char cToSkip = ' ')
{
	int iSize = strInput.size();
	if (0 == iSize)
		return "";

	int iDiscardedSpaceCount = 0;
	for (int i = 0; i < iSize; ++i) // expect using o(1) space
	{
		if (cToSkip == strInput[i] && (0 == i || cToSkip == strInput[i - 1]))
			iDiscardedSpaceCount++;
		else
			strInput[i - iDiscardedSpaceCount] = strInput[i];
	}
	if (iSize > iDiscardedSpaceCount)
	{
		if (cToSkip == strInput[iSize - 1 - iDiscardedSpaceCount])
			++iDiscardedSpaceCount;
	}

	return strInput.substr(0, iSize - iDiscardedSpaceCount); // take O(N) space
}

B. Double Linked List Surely is Our Best Choice

What if we can choose the data structure of the input string?  I would be silly if so and I still choose std::string or std::vector. I would surely choose Linked List to store the string, and more specifically, it should be doubly linked list. I implemented all the logic by making a singly list as the following code shows, one can see at the end or the related comment that Why I say doubly linked list is the best choice both here in this post and in my last post here at the end.

struct SListNode
{
	char m_cData;
	SListNode *m_pNext;
	SListNode(char cData) : m_cData(cData), m_pNext(NULL) {};
	SListNode() : m_cData(' '), m_pNext(NULL) {};
};

SListNode * MakeList(const std::string &strData)
{
	SListNode *pHead = NULL;
	SListNode *pData = NULL;
	int iSize = strData.size();
	for (int i = 0; i < iSize; ++i)
	{
		if (NULL == pHead)
		{
			pData = new SListNode(strData[i]);
			pHead = pData;
		}
		else
		{
			pData->m_pNext = new SListNode(strData[i]);
			pData = pData->m_pNext;
		}
	}
	return pHead;
}

void printList(SListNode *pHead)
{
	while (NULL != pHead)
	{
		std::cout << pHead->m_cData ;
		pHead = pHead->m_pNext;
	}
}

void freeList(SListNode *pHead)
{
	SListNode *pNewHead = pHead;
	while(NULL != pNewHead)
	{
		pNewHead = pHead->m_pNext;
		delete pHead;
		pHead = pNewHead;
	}
}

// real O(N) time and O(1) Space
SListNode *gao2(SListNode *pList, const char cToSkip = ' ')
{
	SListNode *pCurrent = pList;
	SListNode *pNewHead = pList;

	while (NULL != pCurrent && cToSkip == pCurrent->m_cData)
	{
		// heading spaces
		pNewHead = pCurrent->m_pNext;
		pCurrent->m_pNext = NULL;
		delete pCurrent;
		pCurrent = pNewHead;
	}

	SListNode *pLast = pCurrent;
	SListNode *pLastLast = pCurrent;
	while(NULL != pCurrent)
	{
		// multiple spaces inside the string
		if ( cToSkip == pCurrent->m_cData && cToSkip == pLast->m_cData)
		{
			pLast->m_pNext = pCurrent->m_pNext;
			pCurrent->m_pNext = NULL;
			delete pCurrent;
			pCurrent = pLast->m_pNext;
		}
		else
		{
			pLastLast = pLast;
			pLast = pCurrent;
			pCurrent = pCurrent->m_pNext;
		}
	}
	// tailing: there could be one more space left
	// that is why doubly linked list is needed here, it is more convenient to achieve
        // the point for pLastLast, and we dont have to keep it in the code logic in such a way here
	if (NULL != pLast && cToSkip == pLast->m_cData)
	{
		pLastLast->m_pNext = NULL;
		delete pLast;
	}
	return pNewHead;
}

Conclusions:

I think coding is quite fun and we can learn a lot from every small piece of code. For this particular problem here “Remove Multiple Spaces in String by Using Constant Memory”. I discussed two points: (1) Multiple spaces could be moved into the tail in O(1) space and (2) Double Linked List Surely is Our Best Choice.

A final little tip: I use default parameter here and make the gao() function more general so that it actually can remove not only multiple spaces but multiple characters you specify when you pass the parameter to the function. :P

Please feel free to leave any comments and share your ideas on this. Thanks.

Written on June 7, 2014