Fixing file download timeouts in Apache Archiva

Note: Apache Archiva has been unmaintained since February 2024; consider using JFrog Artifactory instead.

I have recently been using a self-hosted Apache Archiva instance to proxy Maven repositories, and downloads fail frequently. The Archiva log (logs/archiva.log) shows the following:

2016-11-22 19:52:02,773 [ajp-bio-127.0.0.1-8009-exec-74] WARN  org.apache.archiva.proxy.DefaultRepositoryProxyConnectors [] - Transfer error from repository central for artifact org.mockito:mockito-core:2.2.22::jar , continuing to next repository. Error message: Download failure on resource [https://repo.maven.apache.org/maven2/org/mockito/mockito-core/2.2.22/mockito-core-2.2.22.jar]:GET request of: org/mockito/mockito-core/2.2.22/mockito-core-2.2.22.jar from central failed (cause: java.net.SocketTimeoutException: Read timed out)
2016-11-22 19:52:02,773 [ajp-bio-127.0.0.1-8009-exec-74] ERROR org.apache.archiva.webdav.ArchivaDavResourceFactory [] - Failures occurred downloading from some remote repositories:
        central: Download failure on resource [https://repo.maven.apache.org/maven2/org/mockito/mockito-core/2.2.22/mockito-core-2.2.22.jar]:GET request of: org/mockito/mockito-core/2.2.22/mockito-core-2.2.22.jar from central failed (cause: java.net.SocketTimeoutException: Read timed out)
org.apache.archiva.policies.ProxyDownloadException: Failures occurred downloading from some remote repositories:
        central: Download failure on resource [https://repo.maven.apache.org/maven2/org/mockito/mockito-core/2.2.22/mockito-core-2.2.22.jar]:GET request of: org/mockito/mockito-core/2.2.22/mockito-core-2.2.22.jar from central failed (cause: java.net.SocketTimeoutException: Read timed out)
        at org.apache.archiva.proxy.DefaultRepositoryProxyConnectors.fetchFromProxies(DefaultRepositoryProxyConnectors.java:366) ~[archiva-proxy-2.2.1.jar:?]
        at org.apache.archiva.webdav.ArchivaDavResourceFactory.fetchContentFromProxies(ArchivaDavResourceFactory.java:820) [archiva-webdav-2.2.1.jar:?]
        at org.apache.archiva.webdav.ArchivaDavResourceFactory.processRepository(ArchivaDavResourceFactory.java:629) [archiva-webdav-2.2.1.jar:?]
        at org.apache.archiva.webdav.ArchivaDavResourceFactory.createResource(ArchivaDavResourceFactory.java:325) [archiva-webdav-2.2.1.jar:?]
        at org.apache.archiva.webdav.RepositoryServlet.service(RepositoryServlet.java:126) [archiva-webdav-2.2.1.jar:?]
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:727) [servlet-api-3.0.jar:?]
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303) [tomcat-catalina-7.0.52.jar:7.0.52]
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) [tomcat-catalina-7.0.52.jar:7.0.52]
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220) [tomcat-catalina-7.0.52.jar:7.0.52]
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122) [tomcat-catalina-7.0.52.jar:7.0.52]
        at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:501) [tomcat-catalina-7.0.52.jar:7.0.52]
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170) [tomcat-catalina-7.0.52.jar:7.0.52]
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98) [tomcat-catalina-7.0.52.jar:7.0.52]
        at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950) [tomcat-catalina-7.0.52.jar:7.0.52]
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116) [tomcat-catalina-7.0.52.jar:7.0.52]
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408) [tomcat-catalina-7.0.52.jar:7.0.52]
        at org.apache.coyote.ajp.AjpProcessor.process(AjpProcessor.java:193) [tomcat-coyote-7.0.52.jar:7.0.52]
        at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607) [tomcat-coyote-7.0.52.jar:7.0.52]
        at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:313) [tomcat-coyote-7.0.52.jar:7.0.52]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_111]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_111]
        at java.lang.Thread.run(Thread.java:745) [?:1.7.0_111]

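Clearly, the exception occurs while downloading from https://repo.maven.apache.org/maven2, which is the default remote repository address in Apache Archiva. In my testing so far, this address is often unreliable when accessed from mainland China. For users there, the central repository address https://repo1.maven.org/maven2 has been comparatively more stable, so it is enough to add this address under Remote Repositories.
The screenshots below show the steps: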

[Screenshot: repositorylist]

[Screenshot: add_repository]

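In addition, once the repository has been added, edit its properties and raise Download Timeout from the default 60 seconds to 600 seconds to reduce the number of timeouts.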

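The steps above only partly solve the problem; in practice, failures still occur. They are concentrated on downloading the index file https://repo1.maven.org/maven2/.index/nexus-maven-repository-index.gz. This file is roughly 300-400 MB, so a complete download in a single attempt almost always fails. Worse, Apache Archiva does essentially no fault handling when it processes this file. That leaves two options: patch the source code, or help Apache Archiva complete the download of this file by other means.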

Below, we experiment with a Linux scheduled task (cron) together with nginx and aria2 to assist Apache Archiva with the download.

1. First, install the required software

$ sudo apt-get install nginx

$ sudo apt-get install aria2
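
Before moving on, it can help to confirm that the commands used later in this post are actually on the PATH. The following is only a minimal sketch: the function name missing_tools is made up for this example, and the list of tools (nginx, aria2c, curl, md5sum, flock) is an assumption based on what the later script and cron entry call. Note that the aria2 package provides the aria2c command.

```shell
#!/bin/sh
# missing_tools NAME...: print every command name that cannot be found on PATH.
# (Hypothetical helper, not part of nginx, aria2 or Archiva.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Prints nothing when everything is installed.
missing_tools nginx aria2c curl md5sum flock
```

If the call prints any names, install the corresponding packages before continuing.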

2. Next, configure nginx

$ sudo vim /etc/nginx/sites-enabled/default

The full original configuration file is as follows:

##
# You should look at the following URL's in order to grasp a solid understanding
# of Nginx configuration files in order to fully unleash the power of Nginx.
# http://wiki.nginx.org/Pitfalls
# http://wiki.nginx.org/QuickStart
# http://wiki.nginx.org/Configuration
#
# Generally, you will want to move this file somewhere, and start with a clean
# file but keep this around for reference. Or just disable in sites-enabled.
#
# Please see /usr/share/doc/nginx-doc/examples/ for more detailed examples.
##

# Default server configuration
#
server {
	listen 80 default_server;
	listen [::]:80 default_server;

	# SSL configuration
	#
	# listen 443 ssl default_server;
	# listen [::]:443 ssl default_server;
	#
	# Note: You should disable gzip for SSL traffic.
	# See: https://bugs.debian.org/773332
	#
	# Read up on ssl_ciphers to ensure a secure configuration.
	# See: https://bugs.debian.org/765782
	#
	# Self signed certs generated by the ssl-cert package
	# Don't use them in a production server!
	#
	# include snippets/snakeoil.conf;

	root /var/www/html;

	# Add index.php to the list if you are using PHP
	index index.html index.htm index.nginx-debian.html;

	server_name _;

	location / {
		# First attempt to serve request as file, then
		# as directory, then fall back to displaying a 404.
		try_files $uri $uri/ =404;
	}

	# pass the PHP scripts to FastCGI server listening on 127.0.0.1:9000
	#
	#location ~ \.php$ {
	#	include snippets/fastcgi-php.conf;
	#
	#	# With php7.0-cgi alone:
	#	fastcgi_pass 127.0.0.1:9000;
	#	# With php7.0-fpm:
	#	fastcgi_pass unix:/run/php/php7.0-fpm.sock;
	#}

	# deny access to .htaccess files, if Apache's document root
	# concurs with nginx's one
	#
	#location ~ /\.ht {
	#	deny all;
	#}
}


# Virtual Host configuration for example.com
#
# You can move that to a different file under sites-available/ and symlink that
# to sites-enabled/ to enable it.
#
#server {
#	listen 80;
#	listen [::]:80;
#
#	server_name example.com;
#
#	root /var/www/example.com;
#	index index.html;
#
#	location / {
#		try_files $uri $uri/ =404;
#	}
#}

After modification, it looks like this:

##
# You should look at the following URL's in order to grasp a solid understanding
# of Nginx configuration files in order to fully unleash the power of Nginx.
# http://wiki.nginx.org/Pitfalls
# http://wiki.nginx.org/QuickStart
# http://wiki.nginx.org/Configuration
#
# Generally, you will want to move this file somewhere, and start with a clean
# file but keep this around for reference. Or just disable in sites-enabled.
#
# Please see /usr/share/doc/nginx-doc/examples/ for more detailed examples.
##

# Default server configuration
#
server {
	#listen 80 default_server;
	#listen [::]:80 default_server;
	listen 127.0.0.1:8090 default_server;
	# SSL configuration
	#
	# listen 443 ssl default_server;
	# listen [::]:443 ssl default_server;
	#
	# Note: You should disable gzip for SSL traffic.
	# See: https://bugs.debian.org/773332
	#
	# Read up on ssl_ciphers to ensure a secure configuration.
	# See: https://bugs.debian.org/765782
	#
	# Self signed certs generated by the ssl-cert package
	# Don't use them in a production server!
	#
	# include snippets/snakeoil.conf;

	root /data/nginx/maven_index;

	# Add index.php to the list if you are using PHP
	index index.html index.htm index.nginx-debian.html;

	server_name _;

	location / {
		# First attempt to serve request as file, then
		# as directory, then fall back to displaying a 404.
		try_files $uri $uri/ =404;
		resolver 114.114.114.114 218.85.152.99;
		resolver_timeout 30s;
		
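		# nginx supports neither nested "if" blocks nor multiple conditions in one "if",
		# so the combined condition is emulated with a flag variable as shown below.
		# Mind the space between "if" and the parenthesis; leaving it out causes a syntax error.
		#if ( ( $host ~* "repo1\.maven\.org" ) && ( $request_uri ~* "maven2/\.index" ) ) {
		set $flag 0;
		if ( $host ~* "repo1\.maven\.org" ) {
			set $flag "${flag}1";
		}
		if ( $request_uri ~* "maven2/\.index" ) {
			set $flag "${flag}2";
		}
		# Avoid a forwarding loop: only forward requests that are not addressed to this server
		if ( $host != "127.0.0.1" ) {
			proxy_pass http://$host$request_uri;
		}
		# Keep this block last: when several "if" blocks match, nginx applies the last one,
		# so index requests are answered from the local copy instead of being forwarded
		if ($flag = "012") {
			proxy_pass http://127.0.0.1:8090$request_uri;
		}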
	}

	# pass the PHP scripts to FastCGI server listening on 127.0.0.1:9000
	#
	#location ~ \.php$ {
	#	include snippets/fastcgi-php.conf;
	#
	#	# With php7.0-cgi alone:
	#	fastcgi_pass 127.0.0.1:9000;
	#	# With php7.0-fpm:
	#	fastcgi_pass unix:/run/php/php7.0-fpm.sock;
	#}

	# deny access to .htaccess files, if Apache's document root
	# concurs with nginx's one
	#
	#location ~ /\.ht {
	#	deny all;
	#}
}


# Virtual Host configuration for example.com
#
# You can move that to a different file under sites-available/ and symlink that
# to sites-enabled/ to enable it.
#
#server {
#	listen 80;
#	listen [::]:80;
#
#	server_name example.com;
#
#	root /var/www/example.com;
#	index index.html;
#
#	location / {
#		try_files $uri $uri/ =404;
#	}
#}
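
As a rough illustration of the flag-concatenation trick, the routing that the location block is meant to achieve can be mirrored in a few lines of shell. This is only a sketch for reasoning about the conditions, not part of the nginx setup. The function name route and the labels local, upstream and static are made up for this example, and the substring matching only approximates the case-insensitive regular expressions in the configuration.

```shell
#!/bin/sh
# route HOST URI: print which branch the location block is intended to take.
#   local    -> index request for repo1.maven.org, answered from 127.0.0.1:8090
#   upstream -> any other proxied request, forwarded to the original host
#   static   -> request addressed to 127.0.0.1, served from the nginx root
# (Hypothetical helper that mimics the $flag logic; it does not talk to nginx.)
route() {
    host=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    uri=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    flag=0
    case "$host" in *repo1.maven.org*) flag="${flag}1" ;; esac
    case "$uri" in *maven2/.index*) flag="${flag}2" ;; esac
    if [ "$flag" = "012" ]; then
        echo "local"
    elif [ "$host" != "127.0.0.1" ]; then
        echo "upstream"
    else
        echo "static"
    fi
}

route repo1.maven.org /maven2/.index/nexus-maven-repository-index.gz
route repo1.maven.org /maven2/org/mockito/mockito-core/2.2.22/mockito-core-2.2.22.jar
route 127.0.0.1 /maven2/.index/nexus-maven-repository-index.gz
```

The flag only reaches the value 012 when both conditions hold: a host match alone gives 01 and a URI match alone gives 02, which is what lets a single string comparison stand in for a logical AND.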

Next, restart the nginx service.

3. Set up a scheduled task that periodically checks whether the data on the remote server has been updated

The task script is as follows:

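#!/bin/sh
# nginx_dir must be the directory configured as "root" in the nginx server block,
# otherwise nginx cannot find the downloaded index files under /maven2/.index
nginx_dir="/data/nginx/maven_index"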
mvn_idx_dir="$nginx_dir/maven2/.index"
mvn_idx_dl_dir="$nginx_dir/maven2/.CacheIndex"
mvn_idx_name="nexus-maven-repository-index"
mvn_idx_path_name="$mvn_idx_dir/$mvn_idx_name"
mvn_idx_gz_name="$mvn_idx_dir/$mvn_idx_name.gz"
mvn_idx_prop_name="$mvn_idx_dir/$mvn_idx_name.properties"
mvn_idx_dl_gz_name="$mvn_idx_dl_dir/$mvn_idx_name.gz"
mvn_idx_dl_prop_name="$mvn_idx_dl_dir/$mvn_idx_name.properties"
log_file="/data/nginx/maven_index_log.txt"
dt_fmt="`date "+%Y-%m-%d %H:%M:%S"`"



if [ ! -d $mvn_idx_dl_dir ]; then
	mkdir -p $mvn_idx_dl_dir
fi

if [ ! -d $mvn_idx_dir ]; then
	mkdir -p $mvn_idx_dir
fi

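# Exit codes must be in the range 0-255; "exit -1" is rejected by some sh implementations such as dash
if [ ! -d "$mvn_idx_dl_dir" ]; then
	echo "$dt_fmt	mkdir $mvn_idx_dl_dir failed !" >> "$log_file"
	exit 1
fi

if [ ! -d "$mvn_idx_dir" ]; then
	echo "$dt_fmt	mkdir $mvn_idx_dir failed !" >> "$log_file"
	exit 1
fi

# All aria2c downloads below land in the current directory
cd "$mvn_idx_dl_dir" || exit 1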

dl_idx="true"
dl_prop="true"

#echo mvn_idx_gz_name="$mvn_idx_gz_name"
# Check whether a download is needed before downloading, to avoid unnecessary network traffic
if [ -f "$mvn_idx_gz_name" ]; then
	remote_idx_md5=$(curl -s https://repo1.maven.org/maven2/.index/nexus-maven-repository-index.gz.md5)

	# An MD5 digest is 32 characters long; any other length means the request failed
	if [ "${#remote_idx_md5}" != "32" ] ; then
		echo "$dt_fmt	download $mvn_idx_gz_name.md5 failed !" >> "$log_file"
		exit 1
	fi
	local_idx_md5=$(md5sum "$mvn_idx_gz_name" | cut -b 1-32)

	#echo remote_idx_md5="$remote_idx_md5" local_idx_md5="$local_idx_md5"
	if [ "$remote_idx_md5" = "$local_idx_md5" ] ; then
		dl_idx="false"
		if [ -f "$mvn_idx_prop_name" ]; then
			remote_prop_md5=$(curl -s https://repo1.maven.org/maven2/.index/nexus-maven-repository-index.properties.md5)
			if [ "${#remote_prop_md5}" != "32" ] ; then
				echo "$dt_fmt	download $mvn_idx_prop_name.md5 failed !" >> "$log_file"
				exit 1
			fi
			local_prop_md5=$(md5sum "$mvn_idx_prop_name" | cut -b 1-32)
			if [ "$remote_prop_md5" = "$local_prop_md5" ] ; then
				dl_prop="false"
				echo "$dt_fmt	check file success ,no need to update !" >> "$log_file"
				exit 0
			fi
		fi
	fi
fi

if [ "$dl_idx" = "true" ] ; then
	aria2c -c https://repo1.maven.org/maven2/.index/nexus-maven-repository-index.gz
	aria2c -c https://repo1.maven.org/maven2/.index/nexus-maven-repository-index.gz.md5
	aria2c -c https://repo1.maven.org/maven2/.index/nexus-maven-repository-index.gz.sha1
fi

if [ "$dl_prop" = "true" ] ; then
	aria2c -c https://repo1.maven.org/maven2/.index/nexus-maven-repository-index.properties
	aria2c -c https://repo1.maven.org/maven2/.index/nexus-maven-repository-index.properties.md5
	aria2c -c https://repo1.maven.org/maven2/.index/nexus-maven-repository-index.properties.sha1
fi

# Verify the downloaded files
if [ "$dl_idx" = "true" ] ; then
	dl_idx_md5=$(md5sum $mvn_idx_dl_gz_name | cut -b 1-32)
	dl_idx_f_md5=$(cat $mvn_idx_dl_gz_name.md5)
	if [ "$dl_idx_md5" = "$dl_idx_f_md5" ] ; then
		rm -rf $mvn_idx_gz_name
		mv $mvn_idx_dl_gz_name $mvn_idx_gz_name
		rm -rf $mvn_idx_gz_name.md5 
		mv $mvn_idx_dl_gz_name.md5 $mvn_idx_gz_name.md5
		rm -rf $mvn_idx_gz_name.sha1
		mv $mvn_idx_dl_gz_name.sha1 $mvn_idx_gz_name.sha1
	else
		echo "$dt_fmt	check downloaded index file failed !" >> $log_file
		cd  $nginx_dir
		rm -rf $mvn_idx_dl_dir
	fi
fi

if [ "$dl_prop" = "true" ] ; then
	dl_prop_md5=$(md5sum $mvn_idx_dl_prop_name | cut -b 1-32)
	dl_prop_f_md5=$(cat $mvn_idx_dl_prop_name.md5)
	if [ "$dl_prop_md5" = "$dl_prop_f_md5" ] ; then
		rm -rf $mvn_idx_prop_name
		mv $mvn_idx_dl_prop_name $mvn_idx_prop_name
		rm -rf $mvn_idx_prop_name.md5 
		mv $mvn_idx_dl_prop_name.md5 $mvn_idx_prop_name.md5
		rm -rf $mvn_idx_prop_name.sha1
		mv $mvn_idx_dl_prop_name.sha1 $mvn_idx_prop_name.sha1
	else
		echo "$dt_fmt	check downloaded properties file failed !" >> $log_file
		cd  $nginx_dir
		rm -rf $mvn_idx_dl_dir
	fi
fi

By default, we save the script at /data/nginx/mvn_index_corn.sh
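
The script treats any 32-character response as a valid MD5 digest. A slightly stricter check, sketched below, also requires that every character is a hexadecimal digit, so that an error page that happens to be 32 characters long is not mistaken for a checksum. The function name is_md5 is made up for this example and is not part of the original script.

```shell
#!/bin/sh
# is_md5 STRING: succeed only if STRING is exactly 32 hexadecimal characters.
# (Hypothetical helper; a possible stand-in for the length-only test in the script.)
is_md5() {
    case "$1" in
        *[!0-9a-fA-F]*) return 1 ;;   # contains a character that is not a hex digit
    esac
    [ "${#1}" -eq 32 ]
}

# Example: the MD5 digest of empty input passes, an HTML fragment does not.
is_md5 "d41d8cd98f00b204e9800998ecf8427e" && echo "looks like an MD5 digest"
is_md5 "<html>404 Not Found</html>" || echo "not an MD5 digest"
```

In the script, a call such as is_md5 "$remote_idx_md5" could be used in place of the length comparison before the digest is compared with the local one.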

The script that sets up the scheduled task is as follows:

chmod +x /data/nginx/mvn_index_corn.sh
#write out current crontab
crontab -l > addcron
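#echo new cron into cron file: the task runs every 30 minutes ("*/30"; a plain "30" would mean once an hour at minute 30),
#and flock takes a file lock first to prevent concurrent runs
echo "*/30 * * * * flock -x -w 10 /dev/shm/mvn_index_corn.lock -c \"sh /data/nginx/mvn_index_corn.sh\"" >> addcron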
#install new cron file
crontab addcron
rm addcron

Run the script above.
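
One caveat with appending to the crontab this way is that running the setup script a second time adds the same entry again. A minimal sketch of an idempotent append is shown below; the function name append_once is made up for this example, and it only skips lines that are byte-for-byte identical to an existing line.

```shell
#!/bin/sh
# append_once FILE LINE: append LINE to FILE unless an identical line already exists.
# (Hypothetical helper; grep -x matches whole lines, -F treats LINE as a fixed string.)
append_once() {
    file=$1
    line=$2
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

In the setup script, the echo that appends to addcron could then be replaced by a call of the form append_once addcron "$cron_line", where cron_line is a variable assumed to hold the cron entry.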

4. Configure the proxy server settings in Apache Archiva
