These functions help you use the Namazu search engine (see http://www.namazu.org/) together with the Namazu PHP module (see ftp://night.fminn.nagano.nagano.jp/php4/).
To install Namazu and Kakasi, first download and install the following RPMs. The version numbers do not need to match exactly; it is usually best to install the most recent versions.
rpm -U kakasi-dict-2_3_4-1_i386.rpm
rpm -U kakasi-2_3_4-1_i386.rpm
rpm -U perl-File-MMagic-1_13-2_noarch.rpm
rpm -U perl-NKF-1.71-2.i386.rpm
rpm -U perl-Text-Kakasi-1_05-1_i386.rpm
rpm -U namazu-2_0_10-1_i386.rpm
rpm -U namazu-devel-2_0_10-1_i386.rpm
rpm -U perl-Search-Namazu-0_13-2_i386.rpm
Next, get the PHP module and install it as follows:
cd /usr/local/src/namazu/
tar xzf php4_namazu-2.1.0.tar.gz
cd /namazu
phpize
./configure --with-namazu
su
make install
Make sure the directory that namazu.so is installed into is the one set as extension_dir in php.ini. If it is not, either move namazu.so there or change extension_dir in php.ini.
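As a rough illustration, the relevant php.ini lines might look like the fragment below. The directory shown is only a placeholder, not a path taken from this document; use whatever directory "make install" reports (running php-config --extension-dir should print the build default). The extension line is what makes PHP load the module at startup.

```ini
; Directory PHP loads extension modules from.
; Placeholder path: replace it with the directory namazu.so was installed into.
extension_dir = "/usr/lib/php4"

; Load the Namazu module when PHP starts.
extension = namazu.so
```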
To build the index for your site, run this command from the command line:
mknmz --indexing-lang=ja -M -k -e --output-dir='/var/mysite/namazu.index/' --replace='s#/web/htdocs/mysite/#/#' /web/htdocs/mysite/
-M means use HTML meta tags
-k means use kakasi for analyzing Japanese
-e means exclude HTML files containing <meta name="ROBOTS" content="NOINDEX">
--output-dir means write all the index files under /var/mysite/namazu.index/ (which must already exist).
You can add --update='/var/mysite/namazu.index/' if the index has been created previously.
--replace='s#htdocs_base#url_base#' rewrites the indexed entries from disk paths into URLs.
You can use --deny and --exclude to have it exclude certain files and sub-directories.
NB. You will want to wrap that command in an administration web page, put it in a cron job, or both.
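A minimal sketch of such a cron wrapper follows. It only uses the mknmz options described above. The function name reindex, the MKNMZ override variable, and the check for existing NMZ.* files (to decide whether to add --update) are assumptions made for illustration, not part of Namazu or of this library.

```shell
#!/bin/sh
# Re-index wrapper intended for cron. A sketch only: adjust before real use.

# Program to run; overridable so the command line can be inspected (assumption).
MKNMZ="${MKNMZ:-mknmz}"

# reindex INDEX_DIR DOC_ROOT
# Runs mknmz with the options described above. --update is added only when
# INDEX_DIR already holds index files (assumed here to be named NMZ.*).
# DOC_ROOT must not contain '#', because '#' delimits the --replace pattern.
reindex() {
    index_dir=$1
    doc_root=$2

    # --output-dir must already exist, so create it if needed.
    mkdir -p "$index_dir" || return 1

    # Build the argument list in the positional parameters.
    set -- --indexing-lang=ja -M -k -e \
        "--output-dir=$index_dir" \
        "--replace=s#$doc_root#/#"
    if ls "$index_dir"/NMZ.* >/dev/null 2>&1; then
        set -- "$@" "--update=$index_dir"
    fi

    "$MKNMZ" "$@" "$doc_root"
}

# When called with two arguments, index immediately.
if [ "$#" -eq 2 ]; then
    reindex "$1" "$2"
fi
```

Saved under a name of your choosing (for example a hypothetical /usr/local/bin/reindex.sh), it could be run nightly with a crontab line such as: 30 3 * * * /usr/local/bin/reindex.sh /var/mysite/namazu.index/ /web/htdocs/mysite/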
$fname is the full path of a CSV file of synonyms.
$synonyms is the array the synonyms are added to.
It returns the number of entries added, or -1 if the file does not exist.
NB. Requires parse_text_data.inc to have been included.
Handles synonyms and other processing of user input. NB. This function requires the mb_convert_kana() function (i.e. PHP must be compiled with multi-byte support).
This function is mostly just example usage, but as the name suggests it should be usable in the typical search engine application.
It returns false if there is any problem (with an error message). If $q is blank it returns false with no error message. Otherwise it returns an array containing nmzid and hlist.
This is a complete example using fclib_namazu_do_typical1().
<?php
include_once "../faq_admin/settings.inc";
include_once $FCLIB_PATH."parse_text_data.inc";
include_once $FCLIB_PATH."namazu.inc";
if(array_key_exists('q',$_REQUEST))$q=$_REQUEST['q'];
else $q='';
$internal="SJIS"; //We assume the cgi inputs are this encoding, and
//that the page output should also be in this encoding.
$idx_file=array("/var/mysite/namazu.index/");
$synonym_files=array("/var/mysite/synonyms.csv");
$ret=fclib_namazu_do_typical1($q,$synonym_files,$idx_file,$internal);
if($ret){
list($nmzid,$hlist)=$ret;
$num=nmz_num_hits($hlist);
//Do some logging. NB. The search_failed version is redundant so not created
//currently - just use the entries in main log that have 3rd column as zero.
fclib_save($amway_faq_data_dir."search.$machine_id.log",date("Y-m-d H:i:s").",$faq,$num,$q\n","at");
//if($num==0)fclib_save($amway_faq_data_dir."search_failed.faq.$machine_id.log",date("Y-m-d H:i:s").",$faq,$num,$q\n","at");
}
else $num=0;
$st=0;$per_page=10; //Use for next/prev pages. For this example it is hard-coded
//to only show the top 10 hits.
?>
<html>
<head>
<title>Search</title>
</head>
<body>
<h2>Search</h2>
<form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF'],ENT_QUOTES,$internal);?>" method="post">
<input type="text" name="q" value="<?php echo htmlspecialchars($q,ENT_QUOTES,$internal);?>">
<input type="submit">
</form>
<?php if($q>''){
echo "<hr>Found $num matches<br>\n";
for($i=$st;$i<($st+$per_page) && $i<$num;$i++){
$subject=mb_convert_encoding(nmz_result_field($hlist,$i,"subject"),$internal,"EUC");
$desc=mb_convert_encoding(nmz_result_field($hlist,$i,"description"),$internal,"EUC");
$uri=nmz_result_field($hlist,$i,"uri");
$size=nmz_result_field($hlist,$i,"size");
$date=nmz_result_field($hlist,$i,"date");
?>
<a href="<?php echo $uri;?>"><font color="#0066CC"> <?php echo $subject;?></font></a><br>
<?php
if($desc!='')echo "$desc<br>\n";
echo "Date: $date<br>\n";
echo "Size: $size bytes<br>\n";
} //end of the loop over hits
} //end of if($q>'')
if(isset($hlist) && $hlist)nmz_free_result($hlist);
if(isset($nmzid) && $nmzid)nmz_close($nmzid);
?>
</body>
</html>