tile: optimize and clean up string functions
This change cleans up the string code in a number of ways:
- For memcpy(), fix bug in prefetch and increase distance to 3 lines;
optimize for unaligned data; do all loads before wh64 to make memcpy
safe for forward-overlapping calls; etc. Performance is improved.
- Use new copy_byte() function on tilegx to spread a single byte value
out into a full word using the shufflebytes instruction.
- Clean up header include ordering to be more canonical, and remove
spurious #undefs of function names.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
diff --git a/arch/tile/lib/memset_32.c b/arch/tile/lib/memset_32.c
index 57dbb3a..9a7837d 100644
--- a/arch/tile/lib/memset_32.c
+++ b/arch/tile/lib/memset_32.c
@@ -12,13 +12,10 @@
* more details.
*/
-#include <arch/chip.h>
-
#include <linux/types.h>
#include <linux/string.h>
#include <linux/module.h>
-
-#undef memset
+#include <arch/chip.h>
void *memset(void *s, int c, size_t n)
{